Drop Empty Columns in Pandas
Pandas DataFrames often contain empty columns filled with NaN values that can clutter your data analysis. Python provides several efficient methods to identify and remove these empty columns to create cleaner, more relevant datasets.
What Are Empty Columns?
In pandas, a column is considered empty when it contains only NaN (Not a Number) values. Note that columns with empty strings, zeros, or spaces are not considered empty since these values may carry meaningful information about your dataset.
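A quick way to see this distinction in practice is to check which columns contain only NaN values with isna().all(). The column names below are illustrative, not from the dataset used later in the article:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "all_nan": [np.nan, np.nan],   # empty: contains only NaN
    "empty_strings": ["", ""],     # not empty: empty strings are values
    "zeros": [0, 0],               # not empty: zeros carry information
})

# Columns where every value is NaN are the "empty" ones
empty_cols = df.columns[df.isna().all()]
print(list(empty_cols))  # ['all_nan']
```

Only the all-NaN column is flagged; empty strings and zeros pass the check.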
Creating a DataFrame with Empty Columns
Let's start by creating a sample DataFrame that includes an empty column filled with NaN values:
import pandas as pd
import numpy as np
dataset = {
"Hostel ID": ["DSC224", "DSC124", "DSC568", "DSC345"],
"Hostel Rating": [8, 6, 10, 5],
"Hostel price": [35000, 32000, 50000, 24000],
"Hostel location": [np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(dataset, index=["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print("Original DataFrame:")
print(df)
Original DataFrame:
Hostel ID Hostel Rating Hostel price Hostel location
Hostel 1 DSC224 8 35000 NaN
Hostel 2 DSC124 6 32000 NaN
Hostel 3 DSC568 10 50000 NaN
Hostel 4 DSC345 5 24000 NaN
Method 1: Using dropna() with how='all'
The dropna() method with how='all' removes columns where all values are NaN:
import pandas as pd
import numpy as np
dataset = {
"Hostel ID": ["DSC224", "DSC124", "DSC568", "DSC345"],
"Hostel Rating": [8, 6, 10, 5],
"Hostel price": [35000, 32000, 50000, 24000],
"Hostel location": [np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(dataset, index=["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
# Drop columns where all values are NaN
cleaned_df = df.dropna(how='all', axis=1)
print("After dropping empty columns:")
print(cleaned_df)
After dropping empty columns:
Hostel ID Hostel Rating Hostel price
Hostel 1 DSC224 8 35000
Hostel 2 DSC124 6 32000
Hostel 3 DSC568 10 50000
Hostel 4 DSC345 5 24000
Note: To modify the original DataFrame instead of creating a new one, use inplace=True.
df.dropna(how='all', axis=1, inplace=True)
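The sketch below, using a small hypothetical DataFrame, shows that with inplace=True the method returns None and the original object itself loses the empty column:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1, 2], "b": [np.nan, np.nan]})

# Modifies df directly; the call returns None, so don't reassign it
df.dropna(how='all', axis=1, inplace=True)
print(df.columns.tolist())  # ['a']
```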
Method 2: Using notnull() with Column Selection
This approach uses notnull() to identify columns that contain at least one non-null value:
import pandas as pd
import numpy as np
dataset = {
"Hostel ID": ["DSC224", "DSC124", "DSC568", "DSC345"],
"Hostel Rating": [8, 6, 10, 5],
"Hostel price": [35000, 32000, 50000, 24000],
"Hostel location": [np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(dataset, index=["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
# Select columns that have at least one non-null value
cleaned_df = df.loc[:, df.notnull().any(axis=0)]
print("Using notnull() method:")
print(cleaned_df)
Using notnull() method:
Hostel ID Hostel Rating Hostel price
Hostel 1 DSC224 8 35000
Hostel 2 DSC124 6 32000
Hostel 3 DSC568 10 50000
Hostel 4 DSC345 5 24000
Comparison of Methods
| Method | Syntax | Best For |
|---|---|---|
| dropna() | df.dropna(how='all', axis=1) | Simple and readable approach |
| notnull() | df.loc[:, df.notnull().any(axis=0)] | More control over selection logic |
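As an example of the extra control the notnull() approach offers, the same pattern can drop mostly-empty columns rather than only fully empty ones. This goes beyond what the article covers; the 60% threshold and column names here are illustrative choices:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "full": [1, 2, 3, 4],
    "half_missing": [1, np.nan, np.nan, 4],
    "all_missing": [np.nan] * 4,
})

# Keep only columns where at least 60% of the values are non-null
threshold = 0.6
cleaned = df.loc[:, df.notnull().mean() >= threshold]
print(cleaned.columns.tolist())  # ['full']
```

Swapping .any(axis=0) for a condition on .mean() is all it takes to express a custom missing-data policy.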
Conclusion
Use dropna(how='all', axis=1) for the simplest approach to remove empty columns. The notnull() method provides more flexibility when you need custom filtering logic. Both methods help maintain clean, relevant DataFrames for better data analysis.
