What are the ways to extract features from a DateTime variable using pandas?
Reading and extracting useful information from a DateTime object is an important task in data analysis. The pandas package provides tools for extracting features from DateTime values.
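In pandas, the Series.dt accessor (an attribute, not a method, so it is written without parentheses) gives access to the components of a time series, such as years, months, and days. It works only on a Series that holds datetime-like values.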
The Series.dt accessor exposes attributes such as year, month, quarter, and day for extracting these features. The examples below use some of these attributes.
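As a quick illustration of the year, month, quarter, and day attributes, the minimal sketch below collects them into a small feature table. The four sample timestamps are made up purely for illustration; each attribute returns a new Series aligned with the original one, so they can be placed side by side in a DataFrame.

```python
import pandas as pd

# Illustrative timestamps chosen to fall in different years, months, and quarters
s = pd.Series(pd.to_datetime([
    "2021-01-15 08:00",
    "2021-05-03 12:30",
    "2022-08-21 18:45",
    "2023-11-30 23:59",
]))

# Each .dt attribute returns a Series with the same index as s
features = pd.DataFrame({
    "year": s.dt.year,
    "month": s.dt.month,
    "quarter": s.dt.quarter,
    "day": s.dt.day,
})
print(features)
```

Building the features as columns of one DataFrame keeps every extracted value on the same row as the timestamp it came from.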
Example 1
In this example, we create a pandas Series of 10 timestamps and then extract the day of the month from each one using the dt.day attribute.
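# importing pandas package
import pandas as pd

# creating a pandas Series of 10 timestamps, 30 hours 10 minutes apart
# ('h' is the hour alias; older code often uses 'H', which pandas 2.2+ deprecates)
s = pd.Series(pd.date_range('2021-01-01 2:30', periods=10, freq='30h10min'))
print(s)

# access the day of the month of each timestamp
result = s.dt.day
print("Output:", result)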
Output
The output is shown below:
0   2021-01-01 02:30:00
1   2021-01-02 08:40:00
2   2021-01-03 14:50:00
3   2021-01-04 21:00:00
4   2021-01-06 03:10:00
5   2021-01-07 09:20:00
6   2021-01-08 15:30:00
7   2021-01-09 21:40:00
8   2021-01-11 03:50:00
9   2021-01-12 10:00:00
dtype: datetime64[ns]
Output: 0     1
1     2
2     3
3     4
4     6
5     7
6     8
7     9
8    11
9    12
dtype: int64
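The first timestamp in the series is 2021-01-01 02:30:00, and each following timestamp is 30 hours and 10 minutes later. The second part of the output shows the day of the month extracted from each timestamp. The days jump from 4 to 6 and from 9 to 11 because the 30-hour steps skip over some calendar days. Depending on the pandas version, the result dtype may be shown as int32 rather than int64.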
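The same series can also yield weekday features, which are often more useful than the raw day of the month. The minimal sketch below reuses the series from Example 1 and applies the dayofweek attribute (Monday is 0, Sunday is 6) and the day_name() method. Note that day_name() is a method and needs parentheses, unlike the attributes used so far.

```python
import pandas as pd

# Same series as Example 1: 10 timestamps, 30 hours 10 minutes apart
s = pd.Series(pd.date_range("2021-01-01 2:30", periods=10, freq="30h10min"))

# dayofweek is an attribute: Monday = 0 ... Sunday = 6
weekday_num = s.dt.dayofweek

# day_name() is a method that returns the weekday as text (e.g. "Friday")
weekday_name = s.dt.day_name()

print(pd.DataFrame({
    "timestamp": s,
    "dayofweek": weekday_num,
    "day_name": weekday_name,
}))
```

Since 2021-01-01 was a Friday, the first row has a dayofweek value of 4.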
Example 2
Applying the dt.hour attribute to the same series returns the hour of each of the 10 timestamps.
# importing pandas package
import pandas as pd

# creating a pandas Series of 10 timestamps, 30 hours 10 minutes apart
s = pd.Series(pd.date_range('2021-01-01 2:30', periods=10, freq='30h10min'))
print(s)

# access the hour (0 to 23) of each timestamp
result = s.dt.hour
print("Output:", result)
Output
The output is shown below:
0   2021-01-01 02:30:00
1   2021-01-02 08:40:00
2   2021-01-03 14:50:00
3   2021-01-04 21:00:00
4   2021-01-06 03:10:00
5   2021-01-07 09:20:00
6   2021-01-08 15:30:00
7   2021-01-09 21:40:00
8   2021-01-11 03:50:00
9   2021-01-12 10:00:00
dtype: datetime64[ns]
Output: 0     2
1     8
2    14
3    21
4     3
5     9
6    15
7    21
8     3
9    10
dtype: int64
As the output shows, the hour attribute of the Series.dt accessor returns the hour of each timestamp as an integer from 0 to 23.
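In practice, timestamps often arrive as strings, for example when read from a CSV file, and the dt accessor raises an AttributeError on such a column. The minimal sketch below converts a string column with pd.to_datetime first and then derives several features at once. The DataFrame and its column names (order_id, placed_at, hour, minute, is_weekend) are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical order log whose timestamps are stored as plain strings
df = pd.DataFrame({
    "order_id": [101, 102, 103],
    "placed_at": ["2021-03-14 09:05", "2021-03-14 17:40", "2021-06-30 23:15"],
})

# .dt only works on datetime-like values, so using it on strings fails
try:
    df["placed_at"].dt.hour
except AttributeError as err:
    print("Before conversion:", err)

# Convert the strings to datetime64 values first
df["placed_at"] = pd.to_datetime(df["placed_at"])

# Derive several features from the converted column
df["hour"] = df["placed_at"].dt.hour
df["minute"] = df["placed_at"].dt.minute
df["is_weekend"] = df["placed_at"].dt.dayofweek >= 5  # Saturday = 5, Sunday = 6
print(df)
```

Comparing dayofweek against 5 gives a boolean weekend flag, a common derived feature that pandas does not expose as a single attribute.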