Python Pandas - Fill NaN values using an interpolation method
The pandas interpolate() method fills NaN values by estimating each missing value from the valid values around it. By default it uses linear interpolation, placing each estimate on a straight line between the nearest known values before and after the gap.
Creating Sample Data with NaN Values
Let's create a DataFrame with missing values to demonstrate interpolation:
import pandas as pd
import numpy as np
# Create sample data with NaN values
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100.0, np.nan, 120.0, np.nan, 110.0]
}
df = pd.DataFrame(data)
print("Original DataFrame with NaN values:")
print(df)
Original DataFrame with NaN values:
Car Reg_Price Units
0 BMW 2500 100.0
1 Lexus 3500 NaN
2 Audi 2500 120.0
3 Jaguar 2000 NaN
4 Mustang 2500 110.0
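Before interpolating, it can help to confirm where the gaps are. The short check below rebuilds the same sample data and uses isna() to count missing values per column and to list the rows whose Units value is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100.0, np.nan, 120.0, np.nan, 110.0],
})

# Count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Show only the rows where Units is missing
missing_rows = df[df['Units'].isna()]
print(missing_rows)
```

Only the Units column has gaps (two of them, for Lexus and Jaguar), so that is the column interpolation needs to fill.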
Using Linear Interpolation
The default method is linear interpolation, which estimates values along a straight line between known points:
import pandas as pd
import numpy as np
# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'Audi', 'Jaguar', 'Mustang'],
    'Reg_Price': [2500, 3500, 2500, 2000, 2500],
    'Units': [100.0, np.nan, 120.0, np.nan, 110.0]
}
df = pd.DataFrame(data)
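# Fill NaN values using linear interpolation.
# Only the numeric Units column is interpolated: interpolating object
# columns such as 'Car' is deprecated in recent pandas versions.
result = df.copy()
result['Units'] = df['Units'].interpolate()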
print("DataFrame after linear interpolation:")
print(result)
DataFrame after linear interpolation:
Car Reg_Price Units
0 BMW 2500 100.0
1 Lexus 3500 110.0
2 Audi 2500 120.0
3 Jaguar 2000 115.0
4 Mustang 2500 110.0
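The filled values can be checked by hand. For a position x between known points (x0, y0) and (x1, y1), linear interpolation gives y0 + (y1 - y0) * (x - x0) / (x1 - x0), where the default method takes positions as 0, 1, 2, ... regardless of the index. The helper `linear_estimate` below is a hypothetical name used only for this check.

```python
import numpy as np
import pandas as pd

def linear_estimate(x0, y0, x1, y1, x):
    """Hypothetical helper: value at x on the line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

units = pd.Series([100.0, np.nan, 120.0, np.nan, 110.0])
filled = units.interpolate()

# Lexus (position 1) lies between BMW (0, 100.0) and Audi (2, 120.0)
lexus = linear_estimate(0, 100.0, 2, 120.0, 1)
# Jaguar (position 3) lies between Audi (2, 120.0) and Mustang (4, 110.0)
jaguar = linear_estimate(2, 120.0, 4, 110.0, 3)

print(lexus, filled[1])    # 110.0 110.0
print(jaguar, filled[3])   # 115.0 115.0
```

With a single NaN between two known values the estimate is simply their midpoint, which is why Lexus gets 110.0 and Jaguar gets 115.0.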
Different Interpolation Methods
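Pandas supports several interpolation methods, selected with the method parameter. Methods such as 'polynomial', 'spline' and 'nearest' are implemented through SciPy, so SciPy must be installed, and 'polynomial' and 'spline' also require an order argument: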
import pandas as pd
import numpy as np
# Create evenly spaced values with gaps
data = [10, np.nan, 30, np.nan, 50, np.nan, 70]
df = pd.DataFrame(data, columns=['values'])
print("Original data:")
print(df)
print("\nLinear interpolation:")
print(df.interpolate(method='linear'))
print("\nPolynomial interpolation (order=2):")
print(df.interpolate(method='polynomial', order=2))
Original data:
values
0 10.0
1 NaN
2 30.0
3 NaN
4 50.0
5 NaN
6 70.0

Linear interpolation:
values
0 10.0
1 20.0
2 30.0
3 40.0
4 50.0
5 60.0
6 70.0

Polynomial interpolation (order=2):
values
0 10.0
1 20.0
2 30.0
3 40.0
4 50.0
5 60.0
6 70.0
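Both methods agree above because the known values lie on a straight line. On data that follows a curve, the straight-line estimate can be noticeably off, which is the case the polynomial and spline methods are meant for. The sketch below uses the squares 0, 1, 4, 9, 16 with one value removed (made-up illustration data) and only the default linear method, so it runs without SciPy.

```python
import numpy as np
import pandas as pd

# Squares of 0..4 with the value at position 2 (true value 4.0) removed
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])

linear = s.interpolate()       # default method='linear'
estimate = linear[2]
error = estimate - 2 ** 2      # difference from the true square

print(estimate)  # 5.0, the midpoint of 1.0 and 9.0
print(error)     # 1.0
```

The linear method overshoots by 1.0 here; a curve-fitting method such as method='polynomial' with order=2 is the usual alternative, but it needs SciPy, so it is left out of this sketch.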
Comparison of Interpolation Methods
| Method | Description | Best For |
|---|---|---|
| linear | Straight line between points | Evenly spaced data |
| polynomial | Polynomial curve fitting | Smooth curved data |
| spline | Spline interpolation | Complex curved patterns |
| nearest | Nearest neighbor values | Categorical-like data |
Key Parameters
The limit and limit_direction parameters control how many consecutive NaNs are filled and in which direction:
import pandas as pd
import numpy as np
data = [10, np.nan, np.nan, 40, np.nan, 60]
df = pd.DataFrame(data, columns=['values'])
print("Original:")
print(df)
# Limit the number of consecutive NaNs to fill
print("\nLimit=1 (fill only 1 consecutive NaN):")
print(df.interpolate(limit=1))
# limit_direction sets the direction in which NaNs are filled ('forward' is the default)
print("\nForward direction only:")
print(df.interpolate(limit_direction='forward'))
Original:
values
0 10.0
1 NaN
2 NaN
3 40.0
4 NaN
5 60.0

Limit=1 (fill only 1 consecutive NaN):
values
0 10.0
1 20.0
2 NaN
3 40.0
4 50.0
5 60.0

Forward direction only:
values
0 10.0
1 20.0
2 30.0
3 40.0
4 50.0
5 60.0
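The data above has no NaN at either end, so limit_direction='forward' gives the same result as the default. Its effect shows up with leading and trailing gaps. The sketch below, on made-up values, also uses limit_area='inside', which restricts filling to NaNs surrounded by valid values.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, np.nan, 30.0, np.nan])

# Default limit_direction='forward': the leading NaN has no earlier value, so it stays
forward = s.interpolate()
# 'both' also fills the leading NaN (with the first valid value)
both = s.interpolate(limit_direction='both')
# limit_area='inside' keeps NaNs at the edges and fills only the interior gap
inside = s.interpolate(limit_direction='both', limit_area='inside')

print(forward.tolist())  # [nan, 10.0, 20.0, 30.0, 30.0]
print(both.tolist())     # [10.0, 10.0, 20.0, 30.0, 30.0]
print(inside.tolist())   # [nan, 10.0, 20.0, 30.0, nan]
```

Note that the linear method fills edge NaNs by repeating the nearest valid value rather than extrapolating the trend; limit_area='inside' is a way to avoid that when edge values should stay missing.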
Conclusion
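Use interpolate() to fill NaN values by estimating them from the surrounding valid values. Linear interpolation (the default) suits evenly spaced data, while the polynomial and spline methods, which require SciPy, are better suited to smoothly curved data. Use method to choose the technique, and limit and limit_direction to control how many consecutive NaNs are filled and from which side.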
