Working with Date and Time using Pandas


Working with dates and times is a core part of data analysis in Python, and Pandas provides effective tools for processing and examining time series data. Its DatetimeIndex makes it simple to index DataFrames by time and to perform time-based operations on them. Users can build a DatetimeIndex by converting strings or other representations to Pandas datetime objects with pd.to_datetime(), which simplifies time-aware analysis. The library supports resampling, time shifting, and date range creation, so time-based data can be aggregated and combined easily. Pandas also handles time zones, allowing timestamps to be localized to a zone and converted between zones.
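As a minimal sketch of these capabilities, the snippet below parses date strings, builds a date range, shifts values by one row, and converts between time zones. The variable names, the sample values, and the choice of the 'Asia/Kolkata' zone are assumptions made only for illustration.

```python
import pandas as pd

# Parse date strings; a list input yields a DatetimeIndex.
stamps = pd.to_datetime(['2023-07-25', '2023-07-26', '2023-07-27'])

# Build a Series indexed by those timestamps (hypothetical values).
sales = pd.Series([10, 15, 8], index=stamps)

# Generate a regular range of dates: 3 consecutive days.
days = pd.date_range(start='2023-07-25', periods=3, freq='D')

# Shift values forward by one row; the first row becomes NaN.
previous_day = sales.shift(1)

# Attach a time zone to the naive index, then convert to another zone.
utc_sales = sales.tz_localize('UTC')
kolkata_sales = utc_sales.tz_convert('Asia/Kolkata')

print(previous_day)
print(kolkata_sales.index[0])
```

Midnight UTC corresponds to 05:30 in the 'Asia/Kolkata' zone, so the converted index keeps the same instants but displays them in local time.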

Installation Command

You must install Pandas on your computer system before using it. Using Python's package manager, pip, run the following command to do this:

pip install pandas
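
One quick way to check that the installation worked is to import the library and print its version. This is only a sanity check, and the version shown depends on what pip installed:

```python
import pandas as pd

# Print the installed version to confirm that the import works.
print(pd.__version__)
```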

Features of Pandas

  • DataFrame: The DataFrame is a two-dimensional labelled data structure that resembles a spreadsheet or SQL table. It manages data in rows and columns and supports a wide range of data operations.

  • Series: A Series is a one-dimensional labelled array, similar to a list or a NumPy array but with additional functionality. Each column of a DataFrame is a Series, and a Series can hold many kinds of data.

  • Data Alignment: Data operations (like arithmetic) are carried out correctly even when data originates from many sources since Pandas automatically aligns data based on the labels.

  • Data Cleaning: Pandas offers many methods for handling missing data, such as dropna(), which removes NaN values, and fillna(), which fills in missing values with a specified value or method.

  • Data Reshaping: Methods such as pivot_table(), melt(), and stack()/unstack() make it easy to reshape data.

  • Grouping and Aggregation: The groupby() method divides data into groups based on chosen criteria so that aggregation functions such as sum, mean, or max can be applied to each group.

  • Merge, Join, and Concatenate: Functions such as merge(), join(), and concat() combine data from multiple sources.

  • Time Series Analysis: Pandas provides a wide range of features for working with time series data, including date range construction, time-based indexing, and resampling at different frequencies.

  • Data I/O: Pandas can read and write data in many formats, including CSV, Excel, and SQL databases.

  • Label-based indexing: Pandas offers versatile label-based indexing that makes it easy to slice, select, and update data according to labels or conditions.

  • Data Visualization: Pandas is not a plotting library in itself, but its plot() method builds on Matplotlib and its data structures work well with libraries such as Seaborn, so users can produce useful plots and graphs from Pandas data.
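
A few of the features listed above can be sketched together in one short snippet. All data here is hypothetical, including the 'Score' column and the city-to-state lookup table, and is meant only to illustrate the calls.

```python
import pandas as pd

# Hypothetical data: one score is missing (NaN).
scores = pd.DataFrame({
    'Name': ['Rahul', 'Anjali', 'Siddharth', 'Komal'],
    'City': ['Mumbai', 'Goa', 'Mumbai', 'Goa'],
    'Score': [80.0, float('nan'), 90.0, 70.0],
})

# Data cleaning: drop rows with missing values, or fill them with a default.
dropped = scores.dropna()
filled = scores.fillna({'Score': 0.0})

# Grouping and aggregation: mean score per city (NaN is skipped by default).
mean_by_city = scores.groupby('City')['Score'].mean()

# Merge: attach a hypothetical lookup table on the shared 'City' column.
states = pd.DataFrame({'City': ['Mumbai', 'Goa'],
                       'State': ['Maharashtra', 'Goa']})
merged = scores.merge(states, on='City', how='left')

# Data alignment: arithmetic matches values by label, not by position.
a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'x'])
total = a + b  # x -> 1 + 20, y -> 2 + 10

print(mean_by_city)
print(total)
```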

Basic programs using Pandas

  • Creating a DataFrame

  • Creating a DatetimeIndex and Resampling

  • Filtering Data

Creating a DataFrame

Creating a DataFrame is a key first step in data analysis and manipulation with Pandas. A DataFrame is a two-dimensional labelled data structure comparable to a spreadsheet or a SQL table. Organising data into rows and columns makes it easy to manage and analyse.

Algorithm

  • Import the Pandas library.

  • Prepare the data for the DataFrame. It can be a dictionary, a list of dictionaries, a list of lists, or a NumPy array.

  • Use the pd.DataFrame() constructor to create the DataFrame, passing it the data.

  • Optionally, set the row labels with the index argument and the column names with the columns argument.

  • The DataFrame is now available for editing and data analysis.
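
The index and columns arguments mentioned in the steps above can be sketched with a NumPy array as the data source. The 'Height' column and its values are hypothetical and serve only to illustrate the constructor call.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data as a 2-D NumPy array (rows x columns).
values = np.array([[15, 160], [33, 172], [51, 168]])

# 'columns' names the columns; 'index' supplies the row labels.
df = pd.DataFrame(values,
                  columns=['Age', 'Height'],
                  index=['Rahul', 'Anjali', 'Siddharth'])

# Row labels can then be used for label-based selection.
print(df.loc['Anjali', 'Age'])
print(df.shape)
```

When index is omitted, Pandas falls back to a default integer index starting at 0.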

Example

import pandas as pd

# Option 1: a dictionary of columns (keys become column names).
data_dict = {
   'Name': ['Rahul', 'Anjali', 'Siddharth'],
   'Age': [15, 33, 51],
   'City': ['Mumbai', 'Goa', 'Jammu']
}

df1 = pd.DataFrame(data_dict)

# Option 2: a list of dictionaries (each dictionary is one row).
data_list_of_dicts = [
   {'Name': 'Komal', 'Age': 25, 'City': 'Pune'},
   {'Name': 'Bulbul', 'Age': 30, 'City': 'Agra'},
   {'Name': 'Aarush', 'Age': 35, 'City': 'Meerut'}
]

df2 = pd.DataFrame(data_list_of_dicts)

# Option 3: a list of lists (each inner list is one row),
# with the column names supplied separately.
data_list_of_lists = [
   ['Anmol', 27, 'Hyderabad'],
   ['Tarun', 20, 'Mumbai'],
   ['Srijan', 31, 'Chandigarh']
]

df3 = pd.DataFrame(data_list_of_lists, columns=['Name', 'Age', 'City'])

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)
print("\nDataFrame 3:")
print(df3)

Creating a DatetimeIndex and Resampling

A DatetimeIndex is an index made up of timestamps. Once a DataFrame is indexed by date, rows can be selected by time and the data can be resampled, that is, regrouped into a different frequency (for example, from daily to monthly) and summarised with an aggregation function such as the mean or sum.

Algorithm

  • Import the Pandas library.

  • Prepare the data in a DataFrame that has a date or timestamp column.

  • Use pd.to_datetime() to convert the date or timestamp column to datetime values.

  • Use set_index() to make that column the DataFrame's index, which gives the DataFrame a DatetimeIndex.

  • Use resample() to regroup the data at a different frequency, then apply an aggregation function (such as mean or sum) to compute a value for each new period.

Example

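import pandas as pd

data = {
   'Date': ['2023-07-25', '2023-07-26', '2023-07-27', '2023-07-28', '2023-07-29'],
   'Value': [10, 15, 8, 12, 20]
}
df = pd.DataFrame(data)

# Convert the date strings to datetime values.
df['Date'] = pd.to_datetime(df['Date'])

# Use the dates as the index; the DataFrame now has a DatetimeIndex.
df = df.set_index('Date')

# Resample to month-end frequency and average the values in each month.
# Note: pandas 2.2 and later deprecate the alias 'M' in favour of 'ME'.
monthly_data = df.resample('M').mean()

print(df)
print("\nResampled Monthly Data:")
print(monthly_data)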
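
Because all five dates in the example fall within one month, monthly resampling collapses them into a single row. As a rough sketch of other time-based operations on the same values, the snippet below slices by date and resamples into 2-day bins. The 2-day frequency and the variable names are choices made only for illustration.

```python
import pandas as pd

# Same values as the example, indexed directly by a daily date range.
dates = pd.date_range('2023-07-25', periods=5, freq='D')
df = pd.DataFrame({'Value': [10, 15, 8, 12, 20]}, index=dates)

# Label-based slicing on a DatetimeIndex includes both endpoints.
selected = df.loc['2023-07-26':'2023-07-28']

# Resample into 2-day bins and total the values in each bin.
two_day_totals = df.resample('2D').sum()

print(selected)
print(two_day_totals)
```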

Filtering Data

Pandas filters data through boolean indexing. A condition applied to a DataFrame column produces a boolean mask, and indexing the DataFrame with that mask selects the rows that satisfy the condition. Analysts can use this approach to focus on relevant information, explore trends, find patterns, and analyse specific subsets of the data further.

Algorithm

  • Import the Pandas library.

  • Prepare the data in a DataFrame, for example by building it from a dictionary or reading it from a CSV file.

  • Decide on the condition the rows must satisfy.

  • Apply the condition to one or more DataFrame columns to create a boolean mask.

  • Index the DataFrame with the boolean mask to select the rows that satisfy the condition.

Example

import pandas as pd

data = {
   'Name': ['Arushi', 'Shobhit', 'Tarun', 'Dishmeet', 'Evan'],
   'Age': [25, 30, 35, 28, 40],
   'City': ['Mumbai', 'Delhi', 'Goa', 'Bareilly', 'Agra']
}
df = pd.DataFrame(data)

# df['Age'] > 30 is a boolean mask (True where the condition holds);
# indexing with it keeps only the matching rows.
filtered_df = df[df['Age'] > 30]

print(filtered_df)
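
The same idea extends to combined conditions and to date columns. The sketch below reuses the data above and adds a hypothetical 'Joined' column, which is not part of the original example, to show filtering by date.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arushi', 'Shobhit', 'Tarun', 'Dishmeet', 'Evan'],
    'Age': [25, 30, 35, 28, 40],
    'City': ['Mumbai', 'Delhi', 'Goa', 'Bareilly', 'Agra'],
    # Hypothetical joining dates, added only for this sketch.
    'Joined': pd.to_datetime(['2023-01-10', '2023-03-05', '2023-06-20',
                              '2023-07-01', '2023-07-28']),
})

# Combine conditions with & (and) or | (or); each one needs parentheses.
age_and_city = df[(df['Age'] >= 28) & (df['City'].isin(['Delhi', 'Goa']))]

# A datetime column can be compared against a Timestamp.
joined_from_june = df[df['Joined'] >= pd.Timestamp('2023-06-01')]

# The .dt accessor exposes date parts, such as the month number.
joined_in_july = df[df['Joined'].dt.month == 7]

print(age_and_city)
print(joined_from_june)
print(joined_in_july)
```

Python's and/or keywords do not work element-wise on a Series, which is why the masks are combined with & and |.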

Conclusion

Pandas simplifies working with dates and times in Python. With the DatetimeIndex and related functions, users can perform time-based indexing, resampling, and time zone handling effectively. The library's flexibility also makes date calculations, filtering, and time series plotting easier, and it integrates smoothly with other Python tools for exploring and manipulating data. These capabilities make Pandas valuable for analysing time-related data in applications ranging from banking and economics to weather forecasting and social trend analysis.

Updated on: 03-Aug-2023
