How to deal with NaN values while plotting a boxplot using Python Matplotlib?

When the data passed to Matplotlib's boxplot() contains NaN values, the box typically comes out empty, because the quartiles of an array that contains NaN are themselves NaN. The simplest fix is to filter out the NaN values before plotting, using NumPy's isnan() function.

Understanding the Problem

NaN (Not a Number) values represent missing or undefined data. Matplotlib's boxplot() function does not skip them: the median and quartiles are computed over the whole array, so a single NaN turns them into NaN and the box is drawn blank, usually without raising an error.
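The effect can be reproduced with NumPy alone. The sketch below uses made-up sample values and compares np.percentile(), which propagates NaN, with np.nanpercentile(), which ignores it. It illustrates the behavior and is not a copy of Matplotlib's internal code.

```python
import numpy as np

# Made-up sample values for illustration; one entry is missing
sample = np.array([10.0, 20.0, np.nan, 30.0, 40.0])

# np.percentile propagates NaN, so every quartile comes back as NaN
quartiles_with_nan = np.percentile(sample, [25, 50, 75])

# np.nanpercentile ignores NaN entries when computing the quartiles
quartiles_ignoring_nan = np.nanpercentile(sample, [25, 50, 75])

print(quartiles_with_nan)
print(quartiles_ignoring_nan)
```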

Solution: Filtering NaN Values

Remove the NaN values with a boolean mask before creating the boxplot:

import matplotlib.pyplot as plt
import numpy as np

# Set figure size
plt.figure(figsize=(8, 5))

# Create sample data with NaN values
N = 20
data = np.random.normal(50, 15, N)  # Normal distribution
data[5] = np.nan  # Insert NaN value
data[12] = np.nan  # Insert another NaN value

print("Original data shape:", data.shape)
print("Number of NaN values:", np.sum(np.isnan(data)))

# Filter out NaN values
filtered_data = data[~np.isnan(data)]

print("Filtered data shape:", filtered_data.shape)

# Create boxplot with filtered data
plt.boxplot(filtered_data)
plt.title("Boxplot with NaN Values Removed")
plt.ylabel("Values")
plt.show()

Output:

Original data shape: (20,)
Number of NaN values: 2
Filtered data shape: (18,)
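If the same filtering is needed in several places, it can be wrapped in a small helper. The function name drop_nan below is hypothetical and not part of NumPy or Matplotlib; the sketch assumes the input is numeric (or None) and converts it to a float array first, so plain lists work as well.

```python
import numpy as np

def drop_nan(values):
    """Return a 1-D float array holding only the non-NaN entries of `values`."""
    # Converting to float turns None into NaN; ravel() flattens nested input
    arr = np.asarray(values, dtype=float).ravel()
    return arr[~np.isnan(arr)]

# Made-up values: one None and one NaN should both be dropped
cleaned = drop_nan([12.5, None, 7.0, float("nan"), 9.25])
print(cleaned)
```

The returned array can be passed straight to plt.boxplot().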

Multiple Datasets with NaN Values

When dealing with multiple datasets, filter each one separately, since each may contain a different number of NaN values:

import matplotlib.pyplot as plt
import numpy as np

# Create multiple datasets with NaN values
dataset1 = np.random.normal(30, 10, 25)
dataset2 = np.random.normal(45, 8, 25)
dataset3 = np.random.normal(60, 12, 25)

# Add NaN values
dataset1[3] = np.nan
dataset2[7] = np.nan
dataset2[15] = np.nan
dataset3[1] = np.nan

# Filter NaN values from each dataset
filtered_data = [
    dataset1[~np.isnan(dataset1)],
    dataset2[~np.isnan(dataset2)],
    dataset3[~np.isnan(dataset3)]
]

# Create boxplot
plt.figure(figsize=(10, 6))
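plt.boxplot(filtered_data)
# Label the boxes via xticks; boxplot's `labels` keyword was renamed to
# `tick_labels` in Matplotlib 3.9, so this form works across versions
plt.xticks([1, 2, 3], ['Dataset 1', 'Dataset 2', 'Dataset 3'])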
plt.title("Multiple Boxplots with NaN Values Handled")
plt.ylabel("Values")
plt.grid(True, alpha=0.3)
plt.show()
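When the groups are stored as columns of a single 2-D array instead of separate variables, the same mask can be applied column by column. This is a minimal sketch with made-up data; the names matrix and columns are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)              # seeded so the run is reproducible
matrix = rng.normal(50, 10, size=(25, 3))   # hypothetical 25 rows x 3 groups
matrix[3, 0] = np.nan
matrix[7, 1] = np.nan
matrix[15, 1] = np.nan

# One filtered 1-D array per column; lengths may differ after filtering
columns = [col[~np.isnan(col)] for col in matrix.T]

print([len(col) for col in columns])
```

The resulting list can be passed to plt.boxplot() in the same way as filtered_data. Filtering per column keeps the valid values in rows where another column has a NaN.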

Alternative: Using Pandas dropna()

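Pandas provides a dropna() method for removing NaN values explicitly, but for boxplots it is usually not needed: DataFrame.boxplot() skips the NaN values in each column on its own: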

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create DataFrame with NaN values
data = pd.DataFrame({
    'Group A': np.random.normal(40, 12, 30),
    'Group B': np.random.normal(55, 15, 30),
    'Group C': np.random.normal(48, 10, 30)
})

# Add some NaN values
data.loc[5, 'Group A'] = np.nan
data.loc[12, 'Group B'] = np.nan
data.loc[20, 'Group C'] = np.nan

print("NaN values per column:")
print(data.isnull().sum())

# Create boxplot (pandas handles NaN automatically)
plt.figure(figsize=(10, 6))
data.boxplot()
plt.title("Pandas Boxplot (NaN Values Handled Automatically)")
plt.ylabel("Values")
plt.show()

Output:

NaN values per column:
Group A    1
Group B    1
Group C    1
dtype: int64
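To plot DataFrame columns with plt.boxplot() instead of DataFrame.boxplot(), dropna() can be applied to each column. The sketch below uses a small made-up DataFrame and also shows why calling dropna() on the whole DataFrame is usually not what is wanted here: it removes entire rows, including valid values in the other columns.

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame with one NaN in each column
df = pd.DataFrame({
    "Group A": [1.0, 2.0, np.nan, 4.0],
    "Group B": [5.0, np.nan, 7.0, 8.0],
})

# Per-column dropna keeps every valid value in each column
per_column = [df[name].dropna().to_numpy() for name in df.columns]

# DataFrame.dropna() removes whole rows, so valid values are lost too
complete_rows = df.dropna()

print([len(values) for values in per_column])
print(len(complete_rows))
```

Here per_column is a list of arrays that plt.boxplot() accepts directly.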

Comparison of Methods

Method              | Pros                                           | Cons
NumPy filtering     | Explicit control, works with any numeric array | Manual filtering required
Pandas boxplot()    | Automatic handling, clean syntax               | Requires a pandas DataFrame
Matplotlib default  | No extra code                                  | Box is drawn empty when NaN is present

Conclusion

Filter NaN values using data[~np.isnan(data)] before plotting boxplots so that the quartiles are computed from valid values only. Pandas DataFrames skip NaN values automatically in DataFrame.boxplot(), which is convenient when the data is already in a DataFrame.

Updated on: 2026-03-26T19:00:18+05:30
