Python - Select columns with specific datatypes
To select columns with specific data types in Pandas, use the select_dtypes() method with its include or exclude parameter. The method returns a new DataFrame containing only the columns whose data types match, such as object, int64, or float64.
Syntax
DataFrame.select_dtypes(include=None, exclude=None)
Parameters
- include: List of data types to include
- exclude: List of data types to exclude
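As a brief sketch of how the two parameters interact, the example below passes both at once and then calls the method with neither, which pandas rejects with a ValueError. The DataFrame df and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the two parameters
df = pd.DataFrame(
    {
        "Roll_Number": [5, 10, 3],
        "Grade": [85.5, 90.0, 78.5],
        "Passed": [True, True, False],
    }
)

# Keep numeric columns, then drop the float64 ones from that selection
ints_only = df.select_dtypes(include=["number"], exclude=["float64"])
print(ints_only.columns.tolist())

# Calling the method with neither parameter raises a ValueError
try:
    df.select_dtypes()
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)
```

Boolean columns are not treated as numeric here, so Passed is left out of the result even though it is not explicitly excluded.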
Creating a Sample DataFrame
Let's start by creating a DataFrame with different data types:
import pandas as pd
# Create DataFrame with multiple data types
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Marc', 'Scarlett', 'Kat', 'John'],
        "Roll_Number": [5, 10, 3, 8, 2, 9, 6],
        "Grade": [85.5, 90.0, 78.5, 92.0, 88.0, 76.5, 94.0]
    }
)
print("DataFrame:")
print(dataFrame)
print("\nData types:")
print(dataFrame.dtypes)
DataFrame:
    Student  Roll_Number  Grade
0      Jack            5   85.5
1     Robin           10   90.0
2       Ted            3   78.5
3      Marc            8   92.0
4  Scarlett            2   88.0
5       Kat            9   76.5
6      John            6   94.0

Data types:
Student         object
Roll_Number      int64
Grade          float64
dtype: object
Selecting Columns by Specific Data Types
Now let's select columns with specific data types:
import pandas as pd
# Create DataFrame
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Marc', 'Scarlett', 'Kat', 'John'],
        "Roll_Number": [5, 10, 3, 8, 2, 9, 6],
        "Grade": [85.5, 90.0, 78.5, 92.0, 88.0, 76.5, 94.0]
    }
)
# Select columns with object datatype (strings)
object_columns = dataFrame.select_dtypes(include=['object']).columns
print("Object type columns:", object_columns.tolist())
# Select columns with int64 datatype
int_columns = dataFrame.select_dtypes(include=['int64']).columns
print("Integer type columns:", int_columns.tolist())
# Select columns with float64 datatype
float_columns = dataFrame.select_dtypes(include=['float64']).columns
print("Float type columns:", float_columns.tolist())
Object type columns: ['Student']
Integer type columns: ['Roll_Number']
Float type columns: ['Grade']
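A name such as 'int64' matches that exact dtype only. The sketch below, using a made-up DataFrame with hypothetical column names, contrasts it with NumPy's abstract np.integer type, which should match integer columns of any width.

```python
import numpy as np
import pandas as pd

# Hypothetical columns with explicitly chosen dtypes
df = pd.DataFrame(
    {
        "small_id": np.array([1, 2, 3], dtype="int32"),
        "big_id": np.array([10, 20, 30], dtype="int64"),
        "score": np.array([0.5, 1.5, 2.5], dtype="float64"),
    }
)

# 'int64' selects only the column stored as 64-bit integers
exact = df.select_dtypes(include=["int64"]).columns.tolist()
print("int64 only:", exact)

# np.integer selects every integer column, whatever its width
any_int = df.select_dtypes(include=[np.integer]).columns.tolist()
print("any integer:", any_int)
```

This matters when data is loaded from sources that choose narrower integer types, where filtering on 'int64' alone could silently skip columns.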
Selecting Multiple Data Types
You can also select columns with multiple data types at once:
import pandas as pd
# Create DataFrame
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Marc', 'Scarlett', 'Kat', 'John'],
        "Roll_Number": [5, 10, 3, 8, 2, 9, 6],
        "Grade": [85.5, 90.0, 78.5, 92.0, 88.0, 76.5, 94.0]
    }
)
# Select both numeric columns (int64 and float64)
numeric_columns = dataFrame.select_dtypes(include=['int64', 'float64'])
print("Numeric columns:")
print(numeric_columns)
# List the names of the selected numeric columns
print("\nColumn names:", numeric_columns.columns.tolist())
Numeric columns:
   Roll_Number  Grade
0            5   85.5
1           10   90.0
2            3   78.5
3            8   92.0
4            2   88.0
5            9   76.5
6            6   94.0

Column names: ['Roll_Number', 'Grade']
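Once the numeric columns are isolated, an operation that only makes sense for numbers can be applied to them alone. The following is a minimal sketch with a shortened, made-up version of the student data; the variable names summary and centered are illustrative.

```python
import pandas as pd

# Shortened, hypothetical version of the student data
df = pd.DataFrame(
    {
        "Student": ["Jack", "Robin", "Ted"],
        "Roll_Number": [5, 10, 3],
        "Grade": [85.5, 90.0, 78.5],
    }
)

# Names of the numeric columns only
numeric_names = df.select_dtypes(include=["number"]).columns

# Column means, computed without touching the text column
summary = df[numeric_names].mean()
print(summary)

# Subtract each column's mean from its values (aligned by column name)
centered = df[numeric_names] - summary
print(centered)
```

Selecting by dtype first avoids hard-coding column names, so the same code keeps working if numeric columns are added later.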
Using Generic Data Type Categories
Pandas also supports generic data type categories:
import pandas as pd
import numpy as np
# Create DataFrame with various data types
dataFrame = pd.DataFrame(
    {
        "Name": ['Alice', 'Bob', 'Charlie'],
        "Age": [25, 30, 35],
        "Salary": [50000.5, 60000.0, 70000.0],
        "Active": [True, False, True]
    }
)
print("Data types:")
print(dataFrame.dtypes)
# Select all numeric columns
numeric_cols = dataFrame.select_dtypes(include=[np.number]).columns
print("\nNumeric columns:", numeric_cols.tolist())
# Select all non-numeric columns
non_numeric_cols = dataFrame.select_dtypes(exclude=[np.number]).columns
print("Non-numeric columns:", non_numeric_cols.tolist())
Data types:
Name       object
Age         int64
Salary    float64
Active       bool
dtype: object

Numeric columns: ['Age', 'Salary']
Non-numeric columns: ['Name', 'Active']
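Besides np.number, select_dtypes() accepts string names for other generic categories. The sketch below tries 'number', 'bool', 'datetime', and 'category' on a made-up DataFrame; the column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data covering several dtype families
df = pd.DataFrame(
    {
        "Age": [25, 30, 35],
        "Active": [True, False, True],
        "Joined": pd.to_datetime(["2021-01-05", "2022-03-10", "2023-07-21"]),
        "Level": pd.Categorical(["junior", "senior", "junior"]),
    }
)

# Each string names a whole family of dtypes rather than one exact dtype
numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
bool_cols = df.select_dtypes(include=["bool"]).columns.tolist()
datetime_cols = df.select_dtypes(include=["datetime"]).columns.tolist()
category_cols = df.select_dtypes(include=["category"]).columns.tolist()

print("Numeric:", numeric_cols)
print("Boolean:", bool_cols)
print("Datetime:", datetime_cols)
print("Categorical:", category_cols)
```

Note that 'datetime' here is used with timezone-naive values; timezone-aware columns are a separate category in pandas.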
Comparison
| Argument | Purpose | Columns matched |
|---|---|---|
| include=['object'] | Select object-dtype columns | Strings and other Python objects (categorical columns need 'category') |
| include=[np.number] | Select all numeric columns | int, float types |
| exclude=[np.number] | Select non-numeric columns | strings, booleans, dates |
Conclusion
The select_dtypes() method filters DataFrame columns by data type, which is useful in data preprocessing and analysis. Use include to keep columns of the given types and exclude to drop them. This makes it straightforward to apply type-specific operations to only the relevant columns.
