Highlight the NaN values in a Pandas DataFrame

Working with incomplete or missing data is a common challenge in data analysis, and the first step in handling it is finding where the values are missing. In a Pandas DataFrame, missing values are usually represented as NaN (Not a Number). None in object (text) columns and NaT in datetime columns are treated as missing too. Missing values can come from errors or gaps during data entry, extraction, or processing.

Fortunately, Pandas provides several ways to detect and handle missing values. This article covers the built-in methods isna(), notna(), and info(), boolean indexing to locate rows with missing entries, and a heatmap visualization built with seaborn.
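A quick check shows why dedicated functions are needed. NaN never compares equal to anything, including itself, so an == test cannot find it. The short sketch below uses made-up sample values. It also shows that isna() treats None and NaN alike, and that pd.isna() accepts scalars such as NaT.

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, not even itself
print(np.nan == np.nan)            # False

# In a float Series, None is converted to NaN
s = pd.Series([1.0, np.nan, None])
print((s == np.nan).sum())         # 0: an equality test finds nothing
print(s.isna().sum())              # 2: isna() finds both missing entries

# In an object (text) column None is stored as-is, yet isna() still flags it
names = pd.Series(['Alice', None, np.nan])
print(names.isna().tolist())       # [False, True, True]

# pd.isna() also works on scalars, including the datetime marker NaT
print(pd.isna(pd.NaT))             # True
```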

Using isna() to Detect Missing Values

The isna() method returns a DataFrame of the same shape as the input. Each element is True where the value is missing (NaN or None) and False elsewhere:

import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing values
data = {'Name': ['Alice', 'Bob', None, 'David', 'Eve'], 
        'Age': [25, None, 30, 28, None],
        'City': ['New York', 'Paris', 'London', None, 'Tokyo']}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print("\nNaN Detection using isna():")
print(df.isna())
Original DataFrame:
    Name   Age      City
0  Alice  25.0  New York
1    Bob   NaN     Paris
2   None  30.0    London
3  David  28.0      None
4    Eve   NaN     Tokyo

NaN Detection using isna():
    Name    Age   City
0  False  False  False
1  False   True  False
2   True  False  False
3  False  False   True
4  False   True  False
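A True/False grid finds the missing values but does not yet highlight them. Here is a minimal sketch of visual highlighting, assuming the result is viewed as HTML (for example in Jupyter). It turns the isna() mask into a grid of CSS strings and passes that grid to the pandas Styler through Styler.apply(..., axis=None). The helper name nan_highlight_css and the yellow color are illustrative choices. Rendering a Styler needs the optional jinja2 package, so that step is guarded.

```python
import numpy as np
import pandas as pd

def nan_highlight_css(frame: pd.DataFrame, color: str = "yellow") -> pd.DataFrame:
    """Hypothetical helper: CSS background for missing cells, empty string elsewhere."""
    css = np.where(frame.isna(), f"background-color: {color}", "")
    return pd.DataFrame(css, index=frame.index, columns=frame.columns)

data = {'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
        'Age': [25, None, 30, 28, None],
        'City': ['New York', 'Paris', 'London', None, 'Tokyo']}
df = pd.DataFrame(data)

css = nan_highlight_css(df)
print(css)

try:
    # axis=None passes the whole DataFrame to the function, which must
    # return CSS strings with the same index and columns
    styled = df.style.apply(nan_highlight_css, axis=None)
    html = styled.to_html()   # display `styled` in Jupyter or save `html` to a file
except ImportError:
    print("Rendering a Styler requires the optional jinja2 package")
```

Pandas 1.5 and later also provide a built-in shortcut for the same effect, df.style.highlight_null(color='yellow'). The custom version above is useful when you want other styling rules alongside the missing-value color.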

Using notna() to Identify Non-Missing Values

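The notna() method is the element-wise inverse of isna(). It returns True for values that are present and False for missing ones. Because True counts as 1, summing the result gives the number of non-missing values in each column: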

import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', None, 'David', 'Eve'], 
        'Age': [25, None, 30, 28, None]}
df = pd.DataFrame(data)

print("Non-NaN Detection using notna():")
print(df.notna())

# Count non-missing values per column
print("\nCount of non-missing values:")
print(df.notna().sum())
Non-NaN Detection using notna():
    Name    Age
0   True   True
1   True  False
2  False   True
3   True   True
4   True  False

Count of non-missing values:
Name    4
Age     3
dtype: int64
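The comparison table at the end of this article lists filtering complete data as notna()'s main use. Here is a short sketch of that, reusing the same sample data. Combining notna() with all(axis=1) keeps the rows that have no missing value in any column. A single-column mask keeps the rows where just that column has a value. The built-in dropna() should match the first filter.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
        'Age': [25, None, 30, 28, None]}
df = pd.DataFrame(data)

# Rows where every column has a value
complete_rows = df[df.notna().all(axis=1)]
print(complete_rows)

# Rows where a single column (here Age) has a value
has_age = df[df['Age'].notna()]
print(has_age)

# dropna() drops any row containing a missing value, so it matches the first filter
print(complete_rows.equals(df.dropna()))   # True
```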

Using info() for DataFrame Summary

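The info() method prints a concise summary of the DataFrame: the index range, each column's dtype, and each column's number of non-null values. A column whose non-null count is lower than the total number of entries contains missing values. In the output, Stock is reported as float64 even though it holds whole numbers. A default integer column cannot store NaN, so Pandas upcasts it to float: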

import pandas as pd
import numpy as np

data = {'Product': ['A', 'B', None, 'D', 'E'], 
        'Price': [10.5, None, 15.0, 12.5, None],
        'Stock': [100, 50, None, 75, 25]}
df = pd.DataFrame(data)

print("DataFrame Info:")
df.info()
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Product  4 non-null      object 
 1   Price    3 non-null      float64
 2   Stock    4 non-null      float64
dtypes: float64(2), object(1)
memory usage: 248.0 bytes
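Because info() reports only non-null counts, the number of missing values per column is the row count minus that figure. The sketch below builds a small per-column report from the same data. It shows the counts info() prints alongside missing counts and percentages. The column names non_null, missing, and missing_pct are arbitrary choices.

```python
import pandas as pd

data = {'Product': ['A', 'B', None, 'D', 'E'],
        'Price': [10.5, None, 15.0, 12.5, None],
        'Stock': [100, 50, None, 75, 25]}
df = pd.DataFrame(data)

summary = pd.DataFrame({
    'non_null': df.count(),              # the same counts info() reports
    'missing': df.isna().sum(),          # total rows minus non-null count
    'missing_pct': df.isna().mean() * 100,
})
print(summary)
```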

Finding Specific NaN Locations

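Combining isna() with any(axis=1) and boolean indexing selects every row that contains at least one missing value. Calling isna().sum() then counts the missing values in each column: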

import pandas as pd
import numpy as np

data = {'A': [1, 2, None, 4], 'B': [None, 6, 7, 8], 'C': [9, 10, 11, None]}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Find rows with any NaN values
rows_with_nan = df[df.isna().any(axis=1)]
print("\nRows containing NaN values:")
print(rows_with_nan)

# Count NaN values per column
print("\nNaN count per column:")
print(df.isna().sum())
Original DataFrame:
     A    B     C
0  1.0  NaN   9.0
1  2.0  6.0  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN

Rows containing NaN values:
     A    B     C
0  1.0  NaN   9.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN

NaN count per column:
A    1
B    1
C    1
dtype: int64
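Filtering by rows narrows the search, but you can also read the exact (row, column) coordinate of each missing cell straight from the isna() mask. One way is NumPy's nonzero(), sketched here on the same sample data.

```python
import numpy as np
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [None, 6, 7, 8], 'C': [9, 10, 11, None]}
df = pd.DataFrame(data)

# Row and column numbers of every True cell in the isna() mask (row-major order)
row_pos, col_pos = np.nonzero(df.isna().to_numpy())

# Translate positions into index labels and column names
nan_positions = [(df.index[r], df.columns[c]) for r, c in zip(row_pos, col_pos)]
print(nan_positions)   # [(0, 'B'), (2, 'A'), (3, 'C')]
```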

Visualizing Missing Data with Heatmap

For larger datasets, a heatmap of the isna() mask gives an intuitive picture of where values are missing, with each missing cell drawn in a contrasting color. This requires the external matplotlib and seaborn libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create a larger sample DataFrame
data = {
    'A': [1, 2, None, 4, 5, None, 7, 8],
    'B': [None, 2, 3, None, 5, 6, None, 8],
    'C': [1, None, 3, 4, None, 6, 7, None],
    'D': [1, 2, 3, 4, 5, 6, 7, 8]
}
df = pd.DataFrame(data)

# Create heatmap of missing values
plt.figure(figsize=(8, 6))
sns.heatmap(df.isna(), cmap='YlOrRd', cbar=True, yticklabels=False)
plt.title('Missing Data Heatmap')
plt.show()
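A heatmap shows the pattern visually. For a numeric view, or when matplotlib and seaborn are not installed, DataFrame.value_counts() on the isna() mask counts how often each combination of missing columns occurs. The sketch below uses the same data as the heatmap.

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4, 5, None, 7, 8],
    'B': [None, 2, 3, None, 5, 6, None, 8],
    'C': [1, None, 3, 4, None, 6, 7, None],
    'D': [1, 2, 3, 4, 5, 6, 7, 8]
}
df = pd.DataFrame(data)

mask = df.isna()

# Each distinct row of True/False values is one missingness pattern
patterns = mask.value_counts()
print(patterns)

# Number of missing cells in each row
print(mask.sum(axis=1).tolist())   # [1, 1, 1, 1, 1, 1, 1, 1]
```

Here every row is missing exactly one value, spread across three distinct patterns (only A, only B, or only C missing). Column D never appears as missing.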

Comparison of Methods

Method     Output Type          Best For
isna()     Boolean DataFrame    Precise location detection
notna()    Boolean DataFrame    Filtering complete data
info()     Text summary         Quick overview
Heatmap    Visual plot          Pattern identification

Conclusion

Identifying NaN values is essential for effective data analysis. Use isna() for precise detection, info() for quick summaries, and heatmaps for visual pattern recognition in large datasets.

Updated on: 2026-03-27T07:48:13+05:30
