Python - Remove the missing (NaN) values in the DataFrame
To remove missing values (NaN) from a pandas DataFrame, use the dropna() method. Depending on the arguments you pass, it drops the rows or the columns that contain missing values and, by default, returns a new DataFrame.
Creating a DataFrame with Missing Values
First, let's create a DataFrame with some missing values to demonstrate the concept:
import pandas as pd
import numpy as np
# Create a DataFrame with missing values
data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad'],
    'UnitsSold': [80.0, np.nan, 100.0, np.nan, 80.0]
}
df = pd.DataFrame(data)
print("DataFrame with missing values:")
print(df)
DataFrame with missing values:
          Car      Place  UnitsSold
0        Audi  Bangalore       80.0
1     Porsche     Mumbai        NaN
2  RollsRoyce       Pune      100.0
3         BMW      Delhi        NaN
4    Mercedes  Hyderabad       80.0
Using dropna() to Remove Missing Values
The dropna() method removes all rows containing any missing values by default:
import pandas as pd
import numpy as np
# Create DataFrame with missing values
data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad'],
    'UnitsSold': [80.0, np.nan, 100.0, np.nan, 80.0]
}
df = pd.DataFrame(data)
print("Original DataFrame shape:", df.shape)
print("\nDataFrame after removing NaN values:")
clean_df = df.dropna()
print(clean_df)
print("\nNew DataFrame shape:", clean_df.shape)
Original DataFrame shape: (5, 3)

DataFrame after removing NaN values:
          Car      Place  UnitsSold
0        Audi  Bangalore       80.0
2  RollsRoyce       Pune      100.0
4    Mercedes  Hyderabad       80.0

New DataFrame shape: (3, 3)
dropna() Parameters
The dropna() method provides several parameters for more control:
import pandas as pd
import numpy as np
# Create DataFrame with missing values
data = {
    'A': [1, 2, np.nan],
    'B': [np.nan, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print("\nDrop rows with any NaN (default):")
print(df.dropna())
print("\nDrop columns with any NaN:")
print(df.dropna(axis=1))
print("\nDrop rows only if all values are NaN:")
print(df.dropna(how='all'))
Original DataFrame:
     A    B  C
0  1.0  NaN  7
1  2.0  5.0  8
2  NaN  6.0  9

Drop rows with any NaN (default):
     A    B  C
1  2.0  5.0  8

Drop columns with any NaN:
   C
0  7
1  8
2  9

Drop rows only if all values are NaN:
     A    B  C
0  1.0  NaN  7
1  2.0  5.0  8
2  NaN  6.0  9
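In the output above, how='all' leaves the DataFrame unchanged because no row is entirely NaN. The following is a minimal sketch of the case where it does drop something; the DataFrame and the name result are made up for illustration and are not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely NaN, row 2 is only partly NaN
df = pd.DataFrame({
    'A': [1.0, np.nan, np.nan],
    'B': [4.0, np.nan, 6.0],
})

# how='all' drops a row only when every value in that row is missing
result = df.dropna(how='all')
print(result)
```

Only row 1 is removed here. Row 2 still holds a value in column B, so it is kept, whereas the default how='any' would have dropped it as well.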
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| axis | 0 for rows, 1 for columns | 0 |
| how | 'any' or 'all' | 'any' |
| inplace | Modify original DataFrame | False |
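The inplace parameter is listed above but not demonstrated in the earlier examples. The sketch below contrasts the default behavior with inplace=True; the DataFrame and the variable names are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column A
df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, 6.0]})

# Default (inplace=False): a new DataFrame is returned, df is untouched
cleaned = df.dropna()
shape_before = df.shape
print("Original after default dropna():", shape_before)
print("Returned copy:", cleaned.shape)

# inplace=True: df itself loses the row that contains NaN
df.dropna(inplace=True)
print("Original after inplace dropna():", df.shape)
```

Assigning the result of dropna() to a new variable, as in the earlier examples, keeps the original data available for comparison, which is often the safer choice.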
Conclusion
Use dropna() to remove missing values from DataFrames. By default, it removes rows with any NaN values, but you can customize this behavior using parameters like axis and how.
