- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to Sort Date in Excel using Pandas?
Robust applications such as Excel have made their mark in handling data arrays, but certain intricate manipulations may require a more potent toolset. Specifically, the task of arranging date-based entries can pose unique challenges if the initial data is skewed or necessitates an advanced sorting mechanism. Pandas - a formidable Python library engineered specifically for data manipulation and scrutiny - steps into this gap. This treatise will shed light on how to finesse the sequence of dates in an Excel sheet, using Pandas, with lucid explications for each line of code.
Installing Pandas and OpenPyXL
Before we immerse into the code jungle, it's paramount to have Pandas pre-installed in your environment. Pip, the package installer for Python, does the job succinctly −
pip install pandas
Let's commence by outlining a universal algorithm for our endeavor −
Incorporate the required Python libraries.
Import the CSV file into a DataFrame.
Sort the DataFrame based on a specified column or criteria.
(Optional) Archive the sorted DataFrame back to a CSV file.
Example
In the following example we are going to learn how to sort date in excel using simple pandas library
import pandas as pd data = { 'Date': ['2023-01-01', '2022-01-01', '2023-05-01', '2022-05-01'], 'Year': [2023, 2022, 2023, 2022], 'Month': [1, 1, 5, 5], 'Day': [1, 1, 1, 1], 'Value': [100, 200, 150, 250] } df = pd.DataFrame(data) print(df)
Output
Date Year Month Day Value 0 2023-01-01 2023 1 1 100 1 2022-01-01 2022 1 1 200 2 2023-05-01 2023 5 1 150 3 2022-05-01 2022 5 1 250
Example
sort_values(by=['Date'], key=pd.to_datetime)
This technique comprises of three pivotal steps −
a. Incorporate pandas and import your CSV file into a DataFrame.
b. Organize the DataFrame using sort_values. Within the parentheses, 'by' designates the column to be sorted (here, 'Date'), and 'key' transmutes the 'Date' strings into datetime entities using pd.to_datetime to ensure a precise sorting.
c. Voila! Your DataFrame is now correctly sequenced by date, starting from the earliest and culminating at the most recent.
import pandas as pd # Read the CSV file into a DataFrame data = pd.read_csv('filename.csv') # Sort the DataFrame by the 'Date' column data.sort_values(by='Date', key=pd.to_datetime, inplace=True) # Print the sorted DataFrame print(data)
Output
Date Value 1 2022-01-01 200 3 2022-05-01 250 0 2023-01-01 100 2 2023-05-01 150
sort_values(by=['Year', 'Month', 'Day'])
This method essentially −
Begins by incorporating pandas and importing your CSV data into a DataFrame.
Employs sort_values to arrange the DataFrame. This time, we're sorting by multiple columns, specified in a list under 'by'. The DataFrame will to begin with be sorted by 'Year', at that point inside each year it'll be sorted by 'Month', and within each month it'll be sorted by 'Day'.
Your DataFrame is now neatly sequenced by year, month, and day
Example
import pandas as pd # Read the CSV file into a DataFrame data = pd.read_csv('filename.csv') # Sort the DataFrame by the 'Year', 'Month' and 'Day' columns data.sort_values(by=['Year', 'Month', 'Day'], inplace=True) # Print the sorted DataFrame print(data)
Output
Year Month Day Value 1 2022 1 1 200 3 2022 5 1 250 0 2023 1 1 100 2 2023 5 1 150
sort_values(key=pd.to_datetime)
This approach slightly differs from the others. Here is the working −
You commence by incorporating pandas and importing your CSV data into a DataFrame, akin to the previous methods
Next, you employ sort_values to sequence the DataFrame, but this time without specifying 'by'. Instead, you use 'key' to transmute the DataFrame index into datetime entities (assuming the index is a date), which ensures a correct sorting.
Your DataFrame is now sequenced according to the date index.
Example
import pandas as pd # Read the CSV file into a DataFrame data = pd.read_csv('filename.csv') # Sort the DataFrame by the 'Date' column data.sort_values(by='Date', key=pd.to_datetime, inplace=True) # Set 'Date' as the DataFrame's index data.set_index('Date', inplace=True) # Print the sorted DataFrame print(data)
Output
Date Value 2022-01-01 200 2022-05-01 250 2023-01-01 100 2023-05-01 150
sort_index()
This technique operates as follows −
Kickstart by incorporating pandas and importing your CSV file into a DataFrame.
Use sort_index to organize the DataFrame by its index. This method doesn't need any parameters, as it automatically sorts by the DataFrame's index
Your DataFrame is now sequenced according to its index
Example
import pandas as pd # Read the CSV file into a DataFrame data = pd.read_csv('filename.csv') # Sort the DataFrame by its index data.sort_index(inplace=True) # Print the sorted DataFrame print(data)
Output
Value 0 200 1 100 2 150 3 250
In all the code examples, replace 'filename.csv' with the actual path to your CSV file.
Conclusion
In this article, we have carefully illustrated the procedure of manipulating the sequence of dates in an Excel file with the assistance of Pandas. Python's versatile pandas library is an efficient tool to sort data in a CSV file based on a variety of criteria, such as by column value or index. Archiving the sorted data back into a CSV file retains the structured data for further exploration or future operations, thereby demonstrating the powerful data manipulation capabilities of Python.