Python - Select columns with specific datatypes

To select columns by data type in pandas, use the DataFrame.select_dtypes() method. It returns a new DataFrame containing only the columns whose dtypes match the include argument (for example object, int64, or float64) and none of the dtypes listed in exclude.

Syntax

DataFrame.select_dtypes(include=None, exclude=None)

Parameters

  • include: A data type or list of data types to include
  • exclude: A data type or list of data types to exclude

At least one of the two parameters must be supplied.
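As a rough sketch of how the two parameters interact, include and exclude can be passed together, and pandas raises a ValueError when neither is given. The DataFrame and its column names below are made up for illustration and are not part of the examples in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with explicitly chosen dtypes
df = pd.DataFrame(
    {
        "units": np.array([1, 2, 3], dtype="int64"),
        "price": np.array([1.5, 2.5, 3.5], dtype="float64"),
        "in_stock": [True, False, True],
    }
)

# Keep numeric columns, but drop the float64 ones
ints_only = df.select_dtypes(include=[np.number], exclude=["float64"])
print(ints_only.columns.tolist())  # ['units']

# Calling the method with neither argument is an error
try:
    df.select_dtypes()
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)
```

Note that the boolean column is not treated as numeric, so it is absent from the result.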

Creating a Sample DataFrame

Let's start by creating a DataFrame with columns of different data types:

import pandas as pd

# Create DataFrame with multiple data types
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Marc', 'Scarlett', 'Kat', 'John'],
        "Roll_Number": [5, 10, 3, 8, 2, 9, 6],
        "Grade": [85.5, 90.0, 78.5, 92.0, 88.0, 76.5, 94.0]
    }
)

print("DataFrame:")
print(dataFrame)
print("\nData types:")
print(dataFrame.dtypes)

Output

DataFrame:
    Student  Roll_Number  Grade
0      Jack            5   85.5
1     Robin           10   90.0
2       Ted            3   78.5
3      Marc            8   92.0
4  Scarlett            2   88.0
5       Kat            9   76.5
6      John            6   94.0

Data types:
Student        object
Roll_Number     int64
Grade         float64
dtype: object
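
The dtype reported for a text column can depend on the pandas version and configuration: it may appear as object, as in the output above, or as a dedicated string dtype. As a hedged sketch, text columns can be identified without hard-coding the dtype name by using the is_string_dtype helper from pandas.api.types. The variable names here are illustrative.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame(
    {
        "Student": ["Jack", "Robin", "Ted"],
        "Roll_Number": [5, 10, 3],
        "Grade": [85.5, 90.0, 78.5],
    }
)

# Collect the columns whose values pandas recognises as strings
text_columns = [name for name in df.columns if is_string_dtype(df[name])]
print(text_columns)  # ['Student']
```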

Selecting Columns by Specific Data Types

Now let's select columns with specific data types:

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Marc', 'Scarlett', 'Kat', 'John'],
        "Roll_Number": [5, 10, 3, 8, 2, 9, 6],
        "Grade": [85.5, 90.0, 78.5, 92.0, 88.0, 76.5, 94.0]
    }
)

# Select columns with object datatype (strings)
object_columns = dataFrame.select_dtypes(include=['object']).columns
print("Object type columns:", object_columns.tolist())

# Select columns with int64 datatype
int_columns = dataFrame.select_dtypes(include=['int64']).columns
print("Integer type columns:", int_columns.tolist())

# Select columns with float64 datatype
float_columns = dataFrame.select_dtypes(include=['float64']).columns
print("Float type columns:", float_columns.tolist())

Output

Object type columns: ['Student']
Integer type columns: ['Roll_Number']
Float type columns: ['Grade']
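
A specific name such as 'int64' matches only that exact dtype. The sketch below, which uses a made-up DataFrame with explicitly sized integer columns, contrasts an exact match with np.integer, which covers integer dtypes of any width.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two integer columns of different widths and one float
df = pd.DataFrame(
    {
        "small": np.array([1, 2, 3], dtype="int32"),
        "large": np.array([10, 20, 30], dtype="int64"),
        "ratio": np.array([0.1, 0.2, 0.3], dtype="float64"),
    }
)

# Exact match: only the int64 column is selected
exact = df.select_dtypes(include=["int64"]).columns.tolist()
print(exact)  # ['large']

# Family match: every NumPy integer dtype is selected
any_int = df.select_dtypes(include=[np.integer]).columns.tolist()
print(any_int)  # ['small', 'large']
```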

Selecting Multiple Data Types

You can also select columns with multiple data types at once:

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Marc', 'Scarlett', 'Kat', 'John'],
        "Roll_Number": [5, 10, 3, 8, 2, 9, 6],
        "Grade": [85.5, 90.0, 78.5, 92.0, 88.0, 76.5, 94.0]
    }
)

# Select both numeric columns (int64 and float64)
numeric_columns = dataFrame.select_dtypes(include=['int64', 'float64'])
print("Numeric columns:")
print(numeric_columns)

# Print the names of the selected numeric columns
print("\nColumn names:", numeric_columns.columns.tolist())

Output

Numeric columns:
   Roll_Number  Grade
0            5   85.5
1           10   90.0
2            3   78.5
3            8   92.0
4            2   88.0
5            9   76.5
6            6   94.0

Column names: ['Roll_Number', 'Grade']
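
The column names returned by the selection can also be used to transform only the matching columns. The following is a minimal sketch, using a shortened version of the sample data, that centers the numeric columns around their means and leaves the text column untouched. The names numeric_names and centered are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Jack", "Robin", "Ted"],
        "Roll_Number": [5, 10, 3],
        "Grade": [85.5, 90.0, 78.5],
    }
)

# Names of the numeric columns ("number" matches every numeric dtype)
numeric_names = df.select_dtypes(include="number").columns

# Column means computed on the numeric subset only
means = df[numeric_names].mean()

# Subtract each column's mean; non-numeric columns are copied unchanged
centered = df.copy()
centered[numeric_names] = df[numeric_names] - means
print(centered)
```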

Using Generic Data Type Categories

Pandas also supports generic data type categories:

import pandas as pd
import numpy as np

# Create DataFrame with various data types
dataFrame = pd.DataFrame(
    {
        "Name": ['Alice', 'Bob', 'Charlie'],
        "Age": [25, 30, 35],
        "Salary": [50000.5, 60000.0, 70000.0],
        "Active": [True, False, True]
    }
)

print("Data types:")
print(dataFrame.dtypes)

# Select all numeric columns
numeric_cols = dataFrame.select_dtypes(include=[np.number]).columns
print("\nNumeric columns:", numeric_cols.tolist())

# Select all non-numeric columns
non_numeric_cols = dataFrame.select_dtypes(exclude=[np.number]).columns
print("Non-numeric columns:", non_numeric_cols.tolist())

Output

Data types:
Name       object
Age         int64
Salary    float64
Active       bool
dtype: object

Numeric columns: ['Age', 'Salary']
Non-numeric columns: ['Name', 'Active']
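
Besides np.number, select_dtypes() accepts string names for other dtype families, such as 'number', 'bool', 'datetime', and 'category'. The sketch below uses a made-up DataFrame to show each one; the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data covering several dtype families
df = pd.DataFrame(
    {
        "Age": [25, 30, 35],
        "Active": [True, False, True],
        "Joined": pd.to_datetime(["2021-01-15", "2022-06-01", "2023-03-20"]),
        "Level": pd.Categorical(["junior", "senior", "junior"]),
    }
)

number_cols = df.select_dtypes(include="number").columns.tolist()
bool_cols = df.select_dtypes(include="bool").columns.tolist()
datetime_cols = df.select_dtypes(include="datetime").columns.tolist()
category_cols = df.select_dtypes(include="category").columns.tolist()

print("number:", number_cols)      # ['Age']
print("bool:", bool_cols)          # ['Active']
print("datetime:", datetime_cols)  # ['Joined']
print("category:", category_cols)  # ['Level']
```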

Comparison

Argument              Purpose                      Typical columns matched
include=['object']    Select object columns        Text stored as Python strings
include=[np.number]   Select all numeric columns   Integer and float types
exclude=[np.number]   Select non-numeric columns   Strings, booleans, dates

Conclusion

The select_dtypes() method is a convenient tool for data preprocessing and analysis. Use include to keep columns of specific data types and exclude to drop them, so that type-specific operations can be applied to just the relevant columns.

Updated on: 2026-03-26T03:04:59+05:30
