How to Sort Date in Excel using Pandas?


Robust applications such as Excel have made their mark in handling data arrays, but certain intricate manipulations may require a more potent toolset. Specifically, the task of arranging date-based entries can pose unique challenges if the initial data is skewed or necessitates an advanced sorting mechanism. Pandas - a formidable Python library engineered specifically for data manipulation and scrutiny - steps into this gap. This treatise will shed light on how to finesse the sequence of dates in an Excel sheet, using Pandas, with lucid explications for each line of code.

Installing Pandas and OpenPyXL

Before we immerse into the code jungle, it's paramount to have Pandas pre-installed in your environment. Pip, the package installer for Python, does the job succinctly −

pip install pandas

Let's commence by outlining a universal algorithm for our endeavor −

  • Incorporate the required Python libraries.

  • Import the CSV file into a DataFrame.

  • Sort the DataFrame based on a specified column or criteria.

  • (Optional) Archive the sorted DataFrame back to a CSV file.

Example

In the following example we are going to learn how to sort date in excel using simple pandas library

import pandas as pd

data = {
   'Date': ['2023-01-01', '2022-01-01', '2023-05-01', '2022-05-01'],
   'Year': [2023, 2022, 2023, 2022],
   'Month': [1, 1, 5, 5],
   'Day': [1, 1, 1, 1],
   'Value': [100, 200, 150, 250]
}
df = pd.DataFrame(data)
print(df)

Output

         Date  Year  Month  Day  Value
0  2023-01-01  2023      1    1    100
1  2022-01-01  2022      1    1    200
2  2023-05-01  2023      5    1    150
3  2022-05-01  2022      5    1    250

Example

  • sort_values(by=['Date'], key=pd.to_datetime)

This technique comprises of three pivotal steps −

  • a. Incorporate pandas and import your CSV file into a DataFrame.

  • b. Organize the DataFrame using sort_values. Within the parentheses, 'by' designates the column to be sorted (here, 'Date'), and 'key' transmutes the 'Date' strings into datetime entities using pd.to_datetime to ensure a precise sorting.

  • c. Voila! Your DataFrame is now correctly sequenced by date, starting from the earliest and culminating at the most recent.

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by the 'Date' column
data.sort_values(by='Date', key=pd.to_datetime, inplace=True)

# Print the sorted DataFrame
print(data)

Output

         Date  Value
1  2022-01-01    200
3  2022-05-01    250
0  2023-01-01    100
2  2023-05-01    150
  • sort_values(by=['Year', 'Month', 'Day'])

This method essentially −

  • Begins by incorporating pandas and importing your CSV data into a DataFrame.

  • Employs sort_values to arrange the DataFrame. This time, we're sorting by multiple columns, specified in a list under 'by'. The DataFrame will to begin with be sorted by 'Year', at that point inside each year it'll be sorted by 'Month', and within each month it'll be sorted by 'Day'.

  • Your DataFrame is now neatly sequenced by year, month, and day

Example

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by the 'Year', 'Month' and 'Day' columns
data.sort_values(by=['Year', 'Month', 'Day'], inplace=True)

# Print the sorted DataFrame
print(data)

Output

   Year  Month  Day  Value
1  2022      1    1    200
3  2022      5    1    250
0  2023      1    1    100
2  2023      5    1    150
  • sort_values(key=pd.to_datetime)

This approach slightly differs from the others. Here is the working −

  • You commence by incorporating pandas and importing your CSV data into a DataFrame, akin to the previous methods

  • Next, you employ sort_values to sequence the DataFrame, but this time without specifying 'by'. Instead, you use 'key' to transmute the DataFrame index into datetime entities (assuming the index is a date), which ensures a correct sorting.

  • Your DataFrame is now sequenced according to the date index.

Example

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by the 'Date' column
data.sort_values(by='Date', key=pd.to_datetime, inplace=True)

# Set 'Date' as the DataFrame's index
data.set_index('Date', inplace=True)

# Print the sorted DataFrame
print(data)

Output

   Date      Value    
2022-01-01    200
2022-05-01    250
2023-01-01    100
2023-05-01    150
  • sort_index()

This technique operates as follows −

  • Kickstart by incorporating pandas and importing your CSV file into a DataFrame.

  • Use sort_index to organize the DataFrame by its index. This method doesn't need any parameters, as it automatically sorts by the DataFrame's index

  • Your DataFrame is now sequenced according to its index

Example

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by its index
data.sort_index(inplace=True)

# Print the sorted DataFrame
print(data)

Output

    Value
0    200
1    100
2    150
3    250

In all the code examples, replace 'filename.csv' with the actual path to your CSV file.

Conclusion

In this article, we have carefully illustrated the procedure of manipulating the sequence of dates in an Excel file with the assistance of Pandas. Python's versatile pandas library is an efficient tool to sort data in a CSV file based on a variety of criteria, such as by column value or index. Archiving the sorted data back into a CSV file retains the structured data for further exploration or future operations, thereby demonstrating the powerful data manipulation capabilities of Python.

Updated on: 09-Aug-2023

657 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements