Write a program in Python to filter valid dates in a given series
Filtering valid dates from a Pandas Series involves identifying date strings that follow a specific format. We'll explore three approaches: a loop with regular expressions, Python's filter() combined with Pandas isin(), and the Pandas str.match() method.
Sample Data
Let's start with a Series containing date strings in various formats:
import pandas as pd
dates = ['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2']
data = pd.Series(dates)
print("Original Series:")
print(data)
Original Series:
0    2010-03-12
1      2011-3-1
2    2020-10-10
3        11-2-2
dtype: object
Using Regular Expression with Loop
This method iterates through each element and prints it if it starts with the YYYY-MM-DD shape (four digits, a separator, two digits, a separator, two digits):
import pandas as pd
import re
dates = ['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2']
data = pd.Series(dates)
print("Valid dates (Method 1):")
for i, j in data.items():
    # i is the index label, j is the date string
    if re.match(r"\d{4}\W\d{2}\W\d{2}", j):
        print(i, j)
Valid dates (Method 1):
0 2010-03-12
2 2020-10-10
Using Filter and isin() Methods
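This approach uses Python's filter() function with a lambda to collect the matching strings, then the Pandas isin() method to select those values from the original Series so the original index is kept: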
import pandas as pd
import re
dates = ['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2']
data = pd.Series(dates)
# Filter valid dates using lambda and regular expression
result = pd.Series(filter(lambda x: re.match(r"\d{4}\W\d{2}\W\d{2}", x), data))
# Use isin() to get original indices
valid_dates = data[data.isin(result)]
print("Valid dates (Method 2):")
print(valid_dates)
Valid dates (Method 2):
0    2010-03-12
2    2020-10-10
dtype: object
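The filter-then-isin pattern can also be collapsed into a single boolean mask. The following is a minimal sketch on the same sample data, not a replacement for the method above; `looks_like_date` is a hypothetical helper name introduced only for illustration.

```python
import re
import pandas as pd

data = pd.Series(['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2'])

# Same pattern as in the article, compiled once for reuse
pattern = re.compile(r"\d{4}\W\d{2}\W\d{2}")

def looks_like_date(text):
    """Hypothetical helper: True when the string starts with the date shape."""
    return pattern.match(text) is not None

# map() returns a boolean Series aligned with the original index
mask = data.map(looks_like_date)
valid_dates = data[mask]
print(valid_dates)
```

Because the mask shares the index of `data`, the original positions are preserved without a second lookup through isin().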
Using Pandas str.match() Method
A more Pandas-native approach is the vectorized str.match() method, which returns a boolean mask that can be used to index the Series directly:
import pandas as pd
dates = ['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2']
data = pd.Series(dates)
# Use Pandas string methods for filtering
valid_mask = data.str.match(r"\d{4}\W\d{2}\W\d{2}")
valid_dates = data[valid_mask]
print("Valid dates (Method 3):")
print(valid_dates)
Valid dates (Method 3):
0    2010-03-12
2    2020-10-10
dtype: object
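One detail worth seeing in action: str.match() anchors only at the start of the string, and `\W` accepts any non-word character, not just a hyphen. The sketch below compares it with str.fullmatch() and an explicit hyphen. The sample values are hypothetical near-misses added for illustration, and the stricter pattern is a suggested variant, not the one used in the article.

```python
import pandas as pd

# Hypothetical sample with near-misses added for illustration
data = pd.Series(['2010-03-12', '2010/03/12', '2010-03-123', '2011-3-1'])

# Pattern from the article: prefix match, any non-word separator
loose = data.str.match(r"\d{4}\W\d{2}\W\d{2}")

# Stricter variant: the whole string must match, hyphens only
strict = data.str.fullmatch(r"\d{4}-\d{2}-\d{2}")

print(pd.DataFrame({"value": data, "loose": loose, "strict": strict}))
```

With these inputs the loose pattern accepts the slash-separated string and the one with a trailing extra digit, while the strict variant keeps only the first entry.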
Comparison
| Method | Performance | Readability | Best For |
|---|---|---|---|
| Loop with re.match() | Slower | Medium | Complex validation logic |
| Filter + isin() | Medium | Good | Functional programming style |
| str.match() | Fastest | Excellent | Large datasets |
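The performance column is easiest to judge by timing the three methods on your own data. The sketch below uses the standard timeit module; the function names, the Series size, and the run count are arbitrary choices made for illustration, and the timings will vary with the Pandas version and the data.

```python
import re
import timeit
import pandas as pd

PATTERN = r"\d{4}\W\d{2}\W\d{2}"

# Repeat the sample values to get a larger Series; the size is arbitrary
data = pd.Series(['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2'] * 5000)

def with_loop():
    # Collect the index labels of matching strings, then select them
    keep = [i for i, value in data.items() if re.match(PATTERN, value)]
    return data.loc[keep]

def with_filter_isin():
    matched = pd.Series(list(filter(lambda x: re.match(PATTERN, x), data)))
    return data[data.isin(matched)]

def with_str_match():
    return data[data.str.match(PATTERN)]

for func in (with_loop, with_filter_isin, with_str_match):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

All three functions return the same rows, so any difference in the printed times reflects the method rather than the result.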
Conclusion
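Use str.match() for a concise, Pandas-native way to filter date strings. Note that the regular expression r"\d{4}\W\d{2}\W\d{2}" only checks the shape of the start of each string: four digits, a non-word separator, two digits, another separator, and two digits. It does not require hyphens, does not reject trailing characters, and does not verify that the month and day are real calendar values.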
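A regular expression can confirm the digit counts but not that the date exists on the calendar. One possible way to check both, sketched here under the assumption that only hyphen-separated YYYY-MM-DD strings should pass, is to combine str.fullmatch() with pd.to_datetime() using errors="coerce", which turns unparseable values into NaT. The two extra sample values are hypothetical and added only to show the difference.

```python
import pandas as pd

# Hypothetical sample: the last two entries have the right shape
# but are not real calendar dates
data = pd.Series(['2010-03-12', '2011-3-1', '2020-10-10', '11-2-2',
                  '2021-02-30', '2010-13-01'])

# Shape check: exactly four digits, hyphen, two digits, hyphen, two digits
has_shape = data.str.fullmatch(r"\d{4}-\d{2}-\d{2}")

# Calendar check: values that cannot be parsed become NaT
parsed = pd.to_datetime(data, format="%Y-%m-%d", errors="coerce")
is_real_date = parsed.notna()

valid_dates = data[has_shape & is_real_date]
print(valid_dates)
```

Here February 30 and month 13 pass the shape check but are dropped by the parsing step, leaving only the entries at index 0 and 2.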
