How to Sort Data by Column in a CSV File in Python?

CSV (comma-separated values) is one of the most common formats for storing and exchanging tabular data, and sorting that data by one or more columns is a frequent task. This guide covers several ways to sort the contents of a CSV file by column in Python. Each approach follows the same three stages: reading the CSV file, sorting the data, and saving the result.

Python is well suited to this kind of data manipulation, thanks to the third-party pandas library and the built-in csv module. This guide relies mainly on pandas.

The steps can be summarized as the following algorithm −

  • Import the necessary modules (pandas and csv).

  • Read the CSV file into a DataFrame.

  • Sort the DataFrame by the desired column(s).

  • Optionally, save the sorted DataFrame back into a CSV file.

What you'll need

Make sure Python 3 is installed and working on your system. This guide also uses the pandas library, which can be installed with the command below:

pip install pandas

The basic pattern reads the file, sorts it by one column, and writes the result −

import pandas as pd

# Load the CSV file into a DataFrame
dataframe = pd.read_csv('filename.csv')

# Sort the DataFrame
sorted_dataframe = dataframe.sort_values('column_name')

# Save the sorted DataFrame into a CSV file
sorted_dataframe.to_csv('sorted_filename.csv', index=False)
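
The built-in csv module mentioned earlier can do the same job without pandas. The following is a minimal sketch under some assumptions: the helper name sort_csv_rows, the numeric flag, and the sample data are all hypothetical, and io.StringIO stands in for real files so the example runs on its own.

```python
import csv
import io

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,age\nJohn,23\nAlice,25\nBob,24\nAmy,22\n"

def sort_csv_rows(source, column, numeric=False):
    """Read CSV rows from a file-like object and return them sorted by one column."""
    reader = csv.DictReader(source)
    # The csv module yields every value as a string, so convert when sorting numbers
    if numeric:
        key = lambda row: float(row[column])
    else:
        key = lambda row: row[column]
    return reader.fieldnames, sorted(reader, key=key)

fieldnames, rows = sort_csv_rows(io.StringIO(csv_text), 'age', numeric=True)

# Write the sorted rows back out in CSV form
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
sorted_csv_text = output.getvalue()
```

With real files, the StringIO objects would be replaced by open('filename.csv', newline='') and open('sorted_filename.csv', 'w', newline='').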

Now, let's look at some additional methods −

  • Utilizing pandas with DataFrame.sort_values() and DataFrame.groupby() −

We start by importing pandas and reading the CSV file into a DataFrame with the read_csv function. To order rows within groups, one option is to split the DataFrame with the groupby function, which forms one group per unique value in the chosen column, and then sort each group with sort_values. The code here reaches the same ordering more directly by passing both columns to sort_values: rows are ordered by the first column, and rows that share a value in that column are ordered by the second.


import pandas as pd

dataframe = pd.read_csv('filename.csv')
sorted_dataframe = dataframe.sort_values(['column_to_group_by', 'column_to_sort_by'])


  column_to_group_by  column_to_sort_by  value
0                  A                  1     10
2                  A                  2     30
4                  A                  3     50
3                  B                  1     40
1                  B                  2     20
5                  B                  3     60
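
For comparison, here is a sketch of the explicit groupby version next to the two-column sort. The sample values are an assumption, chosen only to reproduce the output shown above.

```python
import pandas as pd

# Illustrative data (assumed) that reproduces the output shown above
dataframe = pd.DataFrame({
    'column_to_group_by': ['A', 'B', 'A', 'B', 'A', 'B'],
    'column_to_sort_by': [1, 2, 2, 1, 3, 3],
    'value': [10, 20, 30, 40, 50, 60],
})

# Explicit version: split into groups, sort each group, then recombine.
# groupby iterates over the groups in sorted key order by default.
sorted_groups = [
    group.sort_values('column_to_sort_by')
    for _, group in dataframe.groupby('column_to_group_by')
]
grouped_result = pd.concat(sorted_groups)

# Direct version: a single sort on both columns
direct_result = dataframe.sort_values(['column_to_group_by', 'column_to_sort_by'])
```

Both results contain the same rows in the same order, so the single sort_values call is usually the simpler choice when the only goal is ordering.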

  • Utilizing pandas with DataFrame.sort_index() −

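After importing pandas and reading the CSV file into a DataFrame, we use the sort_index function to sort the DataFrame by its index. This returns a new DataFrame with rows ordered by their index labels. Because read_csv assigns a default integer index in row order, sort_index has a visible effect only after the rows have been reordered or a column has been set as the index.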


import pandas as pd

dataframe = pd.read_csv('filename.csv')
sorted_dataframe = dataframe.sort_index()


    name    age
0   Amy     22
1   Bob     24
2   John    23
3   Alice   25
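
The sketch below shows two situations where sort_index changes the row order. The sample data is assumed for illustration and is built in memory instead of being read from a file.

```python
import pandas as pd

# Illustrative data (assumed), built in memory instead of read from a file
dataframe = pd.DataFrame({
    'name': ['John', 'Alice', 'Bob', 'Amy'],
    'age': [23, 25, 24, 22],
})

# Case 1: use a column as the index, then sort by those labels
by_name = dataframe.set_index('name').sort_index()

# Case 2: sorting by a column shuffles the index; sort_index restores file order
shuffled = dataframe.sort_values('age')
restored = shuffled.sort_index()
```

Here by_name is ordered alphabetically by name, and restored matches the original row order.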

  • Utilizing the sorted() function with key parameter −

We start by importing pandas and reading the CSV file into a DataFrame. The DataFrame is then converted into a list of dictionaries, one per row, which Python's built-in sorted function can sort. The key parameter specifies a function that extracts a comparison key from each row (in this instance, the value of a specific column). Finally, the sorted list is turned back into a DataFrame.


import pandas as pd

dataframe = pd.read_csv('filename.csv')
list_of_dicts = dataframe.to_dict('records')
sorted_list_of_dicts = sorted(list_of_dicts, key=lambda x: x['column_to_sort_by'])
sorted_dataframe = pd.DataFrame(sorted_list_of_dicts)

Given a DataFrame created from the following data:

dataframe = pd.DataFrame({
   'name': ['John', 'Alice', 'Bob', 'Amy'],
   'age': [23, 25, 24, 22]
})

If you use 'age' as 'column_to_sort_by', the sorted DataFrame would be as follows. Because it is rebuilt from a list, the index is renumbered from 0 −


    name  age
0    Amy   22
1   John   23
2    Bob   24
3  Alice   25
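
The key function does not have to be a lambda. As a small sketch, operator.itemgetter from the standard library builds an equivalent key function, and the reverse parameter of sorted flips the order. The records list here is assumed sample data in the shape that to_dict('records') produces.

```python
from operator import itemgetter

# Assumed sample rows, in the shape returned by dataframe.to_dict('records')
records = [
    {'name': 'John', 'age': 23},
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 24},
    {'name': 'Amy', 'age': 22},
]

# itemgetter('age') behaves like lambda x: x['age']
oldest_first = sorted(records, key=itemgetter('age'), reverse=True)

# Sorting by a string column orders the rows alphabetically
by_name = sorted(records, key=itemgetter('name'))
```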

  • Utilizing pandas with DataFrame.sort_values() and inplace=True −

As before, we start by importing pandas and reading the CSV file into a DataFrame. We then use the sort_values function to sort the DataFrame by a specific column. The inplace=True argument makes the sort modify the original DataFrame instead of returning a new sorted one, so the call itself returns None.


import pandas as pd

dataframe = pd.read_csv('filename.csv')
dataframe.sort_values('age', inplace=True)


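With the same name and age data as in the previous method, the DataFrame now holds the following. Each row keeps its original index label −

    name  age
3    Amy   22
0   John   23
2    Bob   24
1  Alice   25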
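
One detail worth checking in code is what an in-place sort returns. The sketch below uses assumed sample data and also shows an optional reset_index call to renumber the rows afterwards.

```python
import pandas as pd

# Illustrative data (assumed), built in memory instead of read from a file
dataframe = pd.DataFrame({
    'name': ['John', 'Alice', 'Bob', 'Amy'],
    'age': [23, 25, 24, 22],
})

# With inplace=True the DataFrame itself is reordered and the call returns None
result = dataframe.sort_values('age', inplace=True)
index_after_sort = list(dataframe.index)

# Optional: renumber the rows from 0 and discard the old index labels
dataframe.reset_index(drop=True, inplace=True)
```

Because the call returns None, writing dataframe = dataframe.sort_values('age', inplace=True) would replace the DataFrame with None, which is an easy mistake to make.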


In conclusion, Python and libraries such as pandas offer several efficient ways to sort the data in a CSV file: sorting by one or more columns with sort_values, sorting by index with sort_index, using the built-in sorted function on row dictionaries, and sorting in place. These methods cover only part of what Python can do with CSV data, but they handle the most common sorting needs.

Updated on: 09-Aug-2023

