Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python – How to check missing dates in Pandas
Checking for missing dates in a pandas DataFrame is a common task when working with time series data. We can identify gaps in date ranges using pd.date_range() and the difference() method.
Setting Up the Data
First, let's create a DataFrame with some car purchase dates that have gaps ?
import pandas as pd
# Dictionary of lists with car data
data = {
'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
'Date_of_purchase': ['2020-10-10', '2020-10-12', '2020-10-17', '2020-10-16', '2020-10-19', '2020-10-22']
}
# Create DataFrame
dataFrame = pd.DataFrame(data)
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
Car Date_of_purchase
0 BMW 2020-10-10
1 Lexus 2020-10-12
2 Audi 2020-10-17
3 Mercedes 2020-10-16
4 Jaguar 2020-10-19
5 Bentley 2020-10-22
Converting to DateTime Index
Next, we'll set the date column as the index and convert it to a proper DateTime format ?
import pandas as pd
data = {
'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
'Date_of_purchase': ['2020-10-10', '2020-10-12', '2020-10-17', '2020-10-16', '2020-10-19', '2020-10-22']
}
dataFrame = pd.DataFrame(data)
# Set Date_of_purchase as index
dataFrame = dataFrame.set_index('Date_of_purchase')
# Convert string dates to DateTime objects
dataFrame.index = pd.to_datetime(dataFrame.index)
print("DataFrame with DateTime index:")
print(dataFrame)
DataFrame with DateTime index:
Car
Date_of_purchase
2020-10-10 BMW
2020-10-12 Lexus
2020-10-17 Audi
2020-10-16 Mercedes
2020-10-19 Jaguar
2020-10-22 Bentley
Finding Missing Dates
Now we can identify missing dates by creating a complete date range and finding the difference ?
import pandas as pd
data = {
'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
'Date_of_purchase': ['2020-10-10', '2020-10-12', '2020-10-17', '2020-10-16', '2020-10-19', '2020-10-22']
}
dataFrame = pd.DataFrame(data)
dataFrame = dataFrame.set_index('Date_of_purchase')
dataFrame.index = pd.to_datetime(dataFrame.index)
# Create complete date range
complete_range = pd.date_range(start="2020-10-10", end="2020-10-22")
# Find missing dates
missing_dates = complete_range.difference(dataFrame.index)
print("Missing dates in the range:")
print(missing_dates)
print(f"\nTotal missing dates: {len(missing_dates)}")
Missing dates in the range:
DatetimeIndex(['2020-10-11', '2020-10-13', '2020-10-14', '2020-10-15',
'2020-10-18', '2020-10-20', '2020-10-21'],
dtype='datetime64[ns]', freq=None)
Total missing dates: 7
Alternative Method Using reindex()
You can also use reindex() to identify missing dates and fill them with NaN values ?
import pandas as pd
data = {
'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],
'Date_of_purchase': ['2020-10-10', '2020-10-12', '2020-10-17', '2020-10-16', '2020-10-19', '2020-10-22']
}
dataFrame = pd.DataFrame(data)
dataFrame = dataFrame.set_index('Date_of_purchase')
dataFrame.index = pd.to_datetime(dataFrame.index)
# Create complete date range and reindex
complete_range = pd.date_range(start="2020-10-10", end="2020-10-22")
full_dataFrame = dataFrame.reindex(complete_range)
print("DataFrame with missing dates as NaN:")
print(full_dataFrame)
# Find rows with NaN values (missing dates)
missing_rows = full_dataFrame.isna().any(axis=1)
print(f"\nMissing date positions: {missing_rows.sum()} out of {len(full_dataFrame)}")
DataFrame with missing dates as NaN:
Car
2020-10-10 BMW
2020-10-11 NaN
2020-10-12 Lexus
2020-10-13 NaN
2020-10-14 NaN
2020-10-15 NaN
2020-10-16 Mercedes
2020-10-17 Audi
2020-10-18 NaN
2020-10-19 Jaguar
2020-10-20 NaN
2020-10-21 NaN
2020-10-22 Bentley
Missing date positions: 7 out of 13
Conclusion
Use pd.date_range().difference() to get missing dates directly, or reindex() to create a complete DataFrame with NaN values for missing dates. Both methods help identify gaps in time series data effectively.
