What are the ways to extract features from a DateTime variable using pandas?


Extracting useful information from a DateTime object is an important task in data analysis. The pandas package provides tools for extracting features from DateTime data.

In pandas, the Series.dt accessor gives access to components such as the year, month, and day of each value in a datetime series. Note that dt is an accessor, not a method, so it is written without parentheses.

The Series.dt accessor exposes attributes such as year, month, quarter, and day. In the examples given below, we will use some of these attributes to extract features.
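Before the worked examples, the attributes named above can be sketched together. This is a minimal illustration, not part of the original examples: the variable name `features` and the choice of four timestamps are assumptions made here for brevity.

```python
import pandas as pd

# A short series of timestamps for illustration
s = pd.Series(pd.date_range('2021-01-01 2:30', periods=4, freq='30h10min'))

# Collect several datetime components into one DataFrame of features
features = pd.DataFrame({
    'year': s.dt.year,
    'month': s.dt.month,
    'quarter': s.dt.quarter,
    'day': s.dt.day,
})
print(features)
```

Each column is an ordinary integer series aligned with the original index, so it can be used directly as a model input or a grouping key.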

Example 1

In this example, we create a pandas Series with 10 different timestamps and then access only the day of the month from each timestamp by using the dt.day attribute.

# importing pandas package
import pandas as pd

# creating the pandas Series (lowercase 'h' is the hour alias; uppercase 'H' is deprecated in recent pandas)
s = pd.Series(pd.date_range('2021-01-01 2:30', periods=10, freq='30h10min'))

print(s)

# access day
result = s.dt.day
print("Output:",result)

Output

The output is given below −

0    2021-01-01 02:30:00
1    2021-01-02 08:40:00
2    2021-01-03 14:50:00
3    2021-01-04 21:00:00
4    2021-01-06 03:10:00
5    2021-01-07 09:20:00
6    2021-01-08 15:30:00
7    2021-01-09 21:40:00
8    2021-01-11 03:50:00
9    2021-01-12 10:00:00
dtype: datetime64[ns]

Output:
0    1
1    2
2    3
3    4
4    6
5    7
6    8
7    9
8   11
9   12
dtype: int64

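The first timestamp of the initial series is 2021-01-01 02:30:00, and each following timestamp is incremented by 30 hours 10 minutes. Because the step is longer than one day, some calendar days are skipped (for example, the day goes from 4 to 6). The second part of the above output block displays the day of the month extracted from each timestamp.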
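The spacing between the timestamps can be checked directly. The following is a small sketch, assuming the same series as in Example 1; the names `steps` and `expected` are introduced here only for illustration.

```python
import pandas as pd

s = pd.Series(pd.date_range('2021-01-01 2:30', periods=10, freq='30h10min'))

# Difference between consecutive timestamps; the first entry is NaT, so drop it
steps = s.diff().dropna()

# 30 hours 10 minutes is the same as 1 day 06:10:00
expected = pd.Timedelta(hours=30, minutes=10)

# True when every step matches the requested frequency
print((steps == expected).all())
```

Since every step is more than 24 hours, the day values cannot all be consecutive, which explains the gaps in the dt.day output.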

Example 2

Here we apply the dt.hour attribute to the same series to get the hour from each of the 10 timestamps.

# importing pandas package
import pandas as pd

# creating the pandas Series (lowercase 'h' is the hour alias; uppercase 'H' is deprecated in recent pandas)
s = pd.Series(pd.date_range('2021-01-01 2:30', periods=10, freq='30h10min'))

print(s)

# access hour
result = s.dt.hour
print("Output:",result)

Output

The output is given below −

0    2021-01-01 02:30:00
1    2021-01-02 08:40:00
2    2021-01-03 14:50:00
3    2021-01-04 21:00:00
4    2021-01-06 03:10:00
5    2021-01-07 09:20:00
6    2021-01-08 15:30:00
7    2021-01-09 21:40:00
8    2021-01-11 03:50:00
9    2021-01-12 10:00:00
dtype: datetime64[ns]

Output:
0    2
1    8
2   14
3   21
4    3
5    9
6   15
7   21
8    3
9   10
dtype: int64

As we can see in the above output block, the hour attribute of the Series.dt accessor returns the hour of each timestamp.
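The examples above start from a series that already has a datetime dtype. When timestamps arrive as strings, a conversion step is usually needed first. The sketch below assumes hypothetical raw values and uses pd.to_datetime for the conversion; the names `raw`, `ts`, and `hours` are illustrative only.

```python
import pandas as pd

# Hypothetical raw data: timestamps stored as plain strings
raw = pd.Series(['2021-01-01 02:30:00',
                 '2021-01-02 08:40:00',
                 '2021-01-03 14:50:00'])

# Convert to a datetime dtype first; the .dt accessor is not
# available on a series of strings
ts = pd.to_datetime(raw)

# The same attribute access as in Example 2 now works
hours = ts.dt.hour
print(hours.tolist())
```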

Updated on: 07-Mar-2022
