Processing time with Pandas DataFrame


In this article, we will learn how to generate and process timestamps using the pandas library. We also use the numpy module to build an array of time strings that pandas then converts to timestamps.

Preferable IDE: Jupyter notebook

Before beginning this tutorial we must install the pandas and numpy libraries. A Jupyter notebook is a convenient place to test and run the code. To install pandas, run the following command.

pip install pandas

This command also installs the required dependencies, including numpy, automatically. Once it finishes, restart the kernel so the notebook picks up the new packages.

Once the dependencies are installed, we can import pandas under the alias p.

Here we call the DataFrame constructor and fill a 'time' column with 4 timestamps spaced 3 hours apart, starting from the date passed to date_range(). Printing the 'time' column displays the generated timestamps.

>>> import pandas as p
>>> data_struct = p.DataFrame()
>>> data_struct['time'] = p.date_range('14/7/2019', periods = 4, freq='3H')
>>> print(data_struct['time'])
0 2019-07-14 00:00:00
1 2019-07-14 03:00:00
2 2019-07-14 06:00:00
3 2019-07-14 09:00:00
Name: time, dtype: datetime64[ns]
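A range can also be described by its start and end points instead of a number of periods. The following is a minimal sketch, assuming ISO-formatted date strings and the lowercase "3h" alias (recent pandas versions prefer it over "3H"); the variable names are illustrative only.

```python
import pandas as p

# Four timestamps, three hours apart, starting at midnight.
by_periods = p.date_range("2019-07-14", periods=4, freq="3h")

# The same range, described by its end point instead of a count.
by_end = p.date_range("2019-07-14 00:00", "2019-07-14 09:00", freq="3h")

# Both calls produce the same DatetimeIndex.
print(by_periods.equals(by_end))
```

Using periods is convenient when the number of rows is known in advance, while start and end suit a fixed time window.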

Date and time features are extracted with <column>.dt.<feature name>, for example data_struct['time'].dt.year. The head(4) method then displays the first four rows of the DataFrame, which here is all of them.

>>> data_struct['year'] = data_struct['time'].dt.year
>>> data_struct.head(4)
                 time  year
0 2019-07-14 00:00:00  2019
1 2019-07-14 03:00:00  2019
2 2019-07-14 06:00:00  2019
3 2019-07-14 09:00:00  2019
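The same .dt accessor exposes other components besides the year. The sketch below is a self-contained illustration (the frame name and the extra column names are chosen here, not taken from the article) that pulls out the month, hour, and weekday name in the same way.

```python
import pandas as p

# Rebuild the four 3-hourly timestamps used above ("3h" is the
# lowercase spelling of the hourly alias preferred by recent pandas).
frame = p.DataFrame({"time": p.date_range("2019-07-14", periods=4, freq="3h")})

# Each .dt attribute returns one value per row.
frame["year"] = frame["time"].dt.year
frame["month"] = frame["time"].dt.month
frame["hour"] = frame["time"].dt.hour
frame["weekday"] = frame["time"].dt.day_name()  # method, not attribute

print(frame)
```

Note that day_name() is a method call, while year, month, and hour are plain attributes.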

Here we use the array() function from the numpy module to create an array of time strings. These strings are then converted to timestamps with the to_datetime() function from the pandas library, using a format string that matches their layout.

>>> import numpy as n
>>> dt_timestring = n.array(['14-07-2019 07:26 AM', '13-07-2019 11:01 PM'])
>>> timestamps = [p.to_datetime(date, format ="%d-%m-%Y %I:%M %p", errors ="coerce") for date in dt_timestring]
>>> print(timestamps)
[Timestamp('2019-07-14 07:26:00'), Timestamp('2019-07-13 23:01:00')]
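to_datetime() can also take the whole array at once, which avoids the explicit loop. The sketch below assumes the same format string as above and adds one deliberately invalid entry (not part of the original data) to show what errors="coerce" does with input that cannot be parsed.

```python
import numpy as n
import pandas as p

# Two valid time strings plus one invalid entry added for illustration.
raw = n.array(["14-07-2019 07:26 AM", "13-07-2019 11:01 PM", "not a date"])

# Passing the array converts every element in a single call;
# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = p.to_datetime(raw, format="%d-%m-%Y %I:%M %p", errors="coerce")

print(parsed)
```

The result is a DatetimeIndex rather than a list of Timestamp objects, and the invalid entry shows up as NaT (not a time).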

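Here we use the .set_index() method to make the 'date' values the index of the DataFrame, so they appear as the row labels on the left. Because the column itself is passed in rather than its name, 'date' is also kept as a regular column.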

>>> data_struct1 = p.DataFrame()
>>> data_struct1['date'] = p.date_range('18/07/2019', periods = 5, freq ='2H')
>>> data_struct1= data_struct1.set_index(data_struct1['date'])
>>> print(data_struct1.head(5))
   date
date
2019-07-18 00:00:00 2019-07-18 00:00:00
2019-07-18 02:00:00 2019-07-18 02:00:00
2019-07-18 04:00:00 2019-07-18 04:00:00
2019-07-18 06:00:00 2019-07-18 06:00:00
2019-07-18 08:00:00 2019-07-18 08:00:00
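One benefit of a date index is that rows can be selected by time range with .loc. The following is a hedged sketch: the 'reading' column and its values are hypothetical, added only so there is something to select, and set_index("date") is used with the column name so the dates move into the index instead of being duplicated.

```python
import pandas as p

frame = p.DataFrame({"date": p.date_range("2019-07-18", periods=5, freq="2h")})
frame["reading"] = range(5)  # hypothetical values for illustration

# Passing the column name moves 'date' into the index.
indexed = frame.set_index("date")

# Label-based slicing on a DatetimeIndex includes both end points.
window = indexed.loc["2019-07-18 02:00":"2019-07-18 06:00"]

print(window)
```

Unlike ordinary Python slices, both the start and the end label are included in the result.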

If we want to display only the rows that fall after a given timestamp, we can filter the DataFrame with a comparison on the 'date' column, as shown below -

>>> data_struct2 = p.DataFrame()
>>> data_struct2['date'] = p.date_range('17/07/2019', periods =3, freq ='4H')
>>> print(data_struct2.head(5))
   date
0 2019-07-17 00:00:00
1 2019-07-17 04:00:00
2 2019-07-17 08:00:00
>>> inp = data_struct2[(data_struct2['date'] > '2019-07-17 04:00:00')]
>>> print(inp)
   date
2 2019-07-17 08:00:00
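The same boolean-mask technique works with other comparisons. As a small sketch (variable names are illustrative), the example below compares against an explicit Timestamp and uses Series.between() to keep rows inside a closed time window.

```python
import pandas as p

frame = p.DataFrame({"date": p.date_range("2019-07-17", periods=3, freq="4h")})
cutoff = p.Timestamp("2019-07-17 04:00:00")

# Strictly after the cutoff (same filter as above).
after = frame[frame["date"] > cutoff]

# At or after the cutoff.
on_or_after = frame[frame["date"] >= cutoff]

# Inside a window; between() includes both bounds by default.
in_window = frame[frame["date"].between(p.Timestamp("2019-07-17 00:00"), cutoff)]

print(after, on_or_after, in_window, sep="\n\n")
```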

Conclusion

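In this article, we learnt how to generate timestamps with date_range(), extract features with the .dt accessor, convert time strings with to_datetime(), index a DataFrame by date, and filter rows by timestamp.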

raja
Published on 29-Aug-2019 10:14:37