How to plot masked and NaN values in Matplotlib?
Matplotlib provides several approaches to handle masked and NaN values in data visualization. This is useful when you need to exclude certain data points from your plots based on specific conditions or handle missing data.
Understanding Masked Arrays and NaN Values
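A masked array, created with NumPy's ma module, pairs the data with a boolean mask that hides selected values without deleting them. NaN (Not a Number) is a floating-point value used to mark missing data. Matplotlib skips both masked and NaN points and breaks the line at each one, so a gap appears in the plot.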
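As a small illustration of the difference, the sketch below applies both techniques to the same array. The names data, masked and with_nan are made up for this illustration, and the threshold of 4 is arbitrary.

```python
import numpy as np

data = np.array([1.0, 5.0, 2.0, 8.0, 3.0])

# Masked array: values above 4 are hidden, but the underlying data is kept
masked = np.ma.masked_where(data > 4, data)
print(masked)        # hidden entries are displayed as --
print(masked.mask)   # boolean mask, True where a value is hidden
print(masked.data)   # the original values are still stored

# NaN approach: the selected values are overwritten and cannot be recovered
with_nan = data.copy()
with_nan[data > 4] = np.nan
print(with_nan)

# Masked reductions skip hidden entries; plain reductions propagate NaN
print(masked.mean())                # mean of the visible values 1, 2 and 3
print(np.isnan(with_nan.mean()))    # the NaN spreads into the result
```

The mask can be inspected or cleared later, which is what makes masking reversible, whereas the NaN version has lost the original numbers.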
Example: Plotting with Different Masking Approaches
Let's create a dataset and compare three ways to exclude data points: removing them, masking them, and setting them to NaN. The original data is plotted in grey for reference:
import matplotlib.pyplot as plt
import numpy as np
# Set figure properties
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Create sample data
x = np.linspace(-np.pi/2, np.pi/2, 31)
y = np.cos(x)**3
# Method 1: Remove points where y > 0.7
x2 = x[y <= 0.7]
y2 = y[y <= 0.7]
# Method 2: Mask points where y > 0.7
y3 = np.ma.masked_where(y > 0.7, y)
# Method 3: Set to NaN where y > 0.7
y4 = y.copy()
y4[y > 0.7] = np.nan
# Plot all approaches
plt.plot(x, y, 'o-', color='lightgrey', label='Original data', alpha=0.7)
plt.plot(x2 + 0.3, y2, 'o-', label='Points removed', color='blue')
plt.plot(x + 0.6, y3, 'o-', label='Masked values', color='red')
plt.plot(x + 0.9, y4, 'o-', label='NaN values', color='green')
plt.legend()
plt.title('Comparison: Masked vs NaN vs Removed Data')
plt.xlabel('X values (shifted for clarity)')
plt.ylabel('Y values')
plt.grid(True, alpha=0.3)
plt.show()
Key Differences Between Approaches
| Method | Data Size | Memory Usage | Line at Excluded Points | Best For |
|---|---|---|---|---|
| Remove Points | Reduced | Lower | Connected across the gap | Permanent filtering |
| Masked Arrays | Original | Higher (data plus mask) | Broken | Conditional visibility |
| NaN Values | Original | Same as original (float data only) | Broken | Missing data handling |
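The size and memory columns can be checked directly. The following is a rough sketch that rebuilds the same sample data and reports element counts and byte sizes; the variable names are illustrative, and the byte counts only cover the array buffers, not Python object overhead.

```python
import numpy as np

x = np.linspace(-np.pi / 2, np.pi / 2, 31)
y = np.cos(x) ** 3

removed = y[y <= 0.7]                      # shorter array
masked = np.ma.masked_where(y > 0.7, y)    # data plus a boolean mask
with_nan = np.where(y > 0.7, np.nan, y)    # same shape, float dtype

# getmaskarray always returns a full boolean array, one entry per value
mask_bytes = np.ma.getmaskarray(masked).nbytes
masked_bytes = masked.data.nbytes + mask_bytes

print("removed :", removed.size, "values,", removed.nbytes, "bytes")
print("masked  :", masked.size, "values,", masked_bytes, "bytes (data + mask)")
print("with NaN:", with_nan.size, "values,", with_nan.nbytes, "bytes")
```

For arrays of this size the difference is negligible; it only starts to matter for very large datasets.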
Practical Example with Real Data Scenario
The following, more practical example uses simulated temperature data that contains sensor errors and a missing reading:
import matplotlib.pyplot as plt
import numpy as np
# Simulate temperature data with some extreme values
days = np.arange(1, 31)
temperatures = 20 + 10 * np.sin(days * np.pi / 15) + np.random.normal(0, 2, 30)
# Simulate some sensor errors (extreme values)
temperatures[5] = 60 # Sensor error
temperatures[15] = -20 # Another error
temperatures[25] = np.nan # Missing reading
print("Original temperatures (first 10 days):")
print(temperatures[:10])
# Create masked array for extreme values
temp_masked = np.ma.masked_where((temperatures > 40) | (temperatures < -10), temperatures)
# Plot comparison
plt.figure(figsize=(12, 8))
plt.subplot(2, 1, 1)
plt.plot(days, temperatures, 'ro-', label='Raw data with errors', alpha=0.7)
plt.plot(days, temp_masked, 'bo-', label='Masked extreme values')
plt.ylabel('Temperature (°C)')
plt.title('Temperature Data: Raw vs Masked')
plt.legend()
plt.grid(True, alpha=0.3)
plt.subplot(2, 1, 2)
# Clean data by dropping extreme values and the NaN reading
# (the plotted line then joins the neighbours of each dropped point)
valid_mask = (temperatures <= 40) & (temperatures >= -10) & ~np.isnan(temperatures)
clean_days = days[valid_mask]
clean_temps = temperatures[valid_mask]
plt.plot(clean_days, clean_temps, 'go-', label='Cleaned data')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.title('Cleaned Temperature Data')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
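The masking step can also be written with NumPy's dedicated helpers np.ma.masked_invalid and np.ma.masked_outside. The sketch below is an alternative under assumed inputs: the readings array is invented for illustration, and the plausible range of -10 to 40 is taken from the example above.

```python
import numpy as np

# Hypothetical readings: two sensor glitches and one missing value
readings = np.array([21.5, 22.0, 60.0, 23.1, np.nan, 22.4, -20.0, 21.9])

# Mask NaN/inf first, then anything outside the plausible range [-10, 40]
cleaned = np.ma.masked_invalid(readings)
cleaned = np.ma.masked_outside(cleaned, -10, 40)

print(cleaned)
print("hidden readings:", np.ma.count_masked(cleaned))
print("mean of valid readings:", cleaned.mean())
```

Passing cleaned to plt.plot would leave a gap at each hidden reading, in the same way as temp_masked does in the first subplot.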
Working with NaN Values
NaN values are particularly useful for time series with missing observations, because the plotted line breaks wherever a reading is absent:
import matplotlib.pyplot as plt
import numpy as np
# Create time series with missing data
time = np.arange(0, 20, 0.5)
signal = np.sin(time) + 0.1 * np.random.randn(len(time))
# Introduce missing data (NaN values)
signal[10:15] = np.nan # Missing data period
signal[25] = np.nan # Single missing point
plt.figure(figsize=(10, 6))
plt.plot(time, signal, 'b-o', markersize=4, label='Signal with missing data')
plt.axvspan(time[10], time[14], alpha=0.2, color='red', label='Missing data period')
plt.xlabel('Time')
plt.ylabel('Signal Value')
plt.title('Handling Missing Data with NaN Values')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Total data points: {len(signal)}")
print(f"Missing data points: {np.sum(np.isnan(signal))}")
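The example above hard-codes the indices of the missing period when shading it. As a sketch of how the gaps could be located automatically, the helper below returns the index range of every run of consecutive NaNs. The function nan_runs is hypothetical and not part of NumPy or Matplotlib.

```python
import numpy as np

def nan_runs(values):
    """Return (start, stop) index pairs for each run of consecutive NaNs.

    stop is exclusive, as in Python slicing.
    """
    missing = np.isnan(values).astype(int)
    # Pad with zeros so that runs touching either end are detected
    edges = np.diff(np.concatenate(([0], missing, [0])))
    starts = np.flatnonzero(edges == 1)    # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1)    # 1 -> 0 transitions
    return list(zip(starts.tolist(), stops.tolist()))

time = np.arange(0, 20, 0.5)
signal = np.sin(time)
signal[10:15] = np.nan   # missing data period
signal[25] = np.nan      # single missing point

for start, stop in nan_runs(signal):
    print(f"gap from t={time[start]} to t={time[stop - 1]}")
```

Each pair could then be passed to plt.axvspan(time[start], time[stop - 1], ...) in place of fixed indices. Note that a single missing point yields a zero-width span, so it may need a small margin on either side to be visible.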
Conclusion
Use masked arrays when you need to conditionally hide data while preserving the original values. Use NaN values for genuinely missing data, such as gaps in a time series. Remove points entirely when the filtering is permanent, keeping in mind that the plotted line will then connect the neighbours of each removed point instead of showing a gap.
