Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
How to Sort CSV by multiple columns in Python ?
In Python, to sort a CSV file by multiple columns, we can use the sort_values() method provided by the Python Pandas library. This method sorts DataFrame values by taking column names as arguments.
Common methods for sorting a CSV file by multiple columns include ?
-
sort_values() with inplace ? Sort DataFrame by multiple columns, modifying the original
-
sort_values() without inplace ? Sort DataFrame by multiple columns, creating a new DataFrame
-
sort_index() ? Sort by index after sorting by specified columns
Using sort_values() with inplace
The sort_values() method with inplace=True directly modifies the original DataFrame ?
Example
Let's create a sample CSV data and sort it by multiple columns ?
import pandas as pd
import io
# Create sample CSV data
csv_data = """Car,Date_of_Purchase,Reg_Price
BMW,10/10/2020,1000
Lexus,10/12/2020,750
Audi,10/17/2020,750
Jaguar,10/16/2020,1500
Mustang,10/19/2020,1100
Lamborghini,10/22/2020,1000"""
# Read CSV data into DataFrame
dataFrame = pd.read_csv(io.StringIO(csv_data))
print("Original DataFrame:")
print(dataFrame)
# Sort by multiple columns: Reg_Price and Car
dataFrame.sort_values(["Reg_Price", "Car"], axis=0, ascending=True, inplace=True, na_position='first')
print("\nSorted DataFrame by Reg_Price and Car:")
print(dataFrame)
Original DataFrame:
Car Date_of_Purchase Reg_Price
0 BMW 10/10/2020 1000
1 Lexus 10/12/2020 750
2 Audi 10/17/2020 750
3 Jaguar 10/16/2020 1500
4 Mustang 10/19/2020 1100
5 Lamborghini 10/22/2020 1000
Sorted DataFrame by Reg_Price and Car:
Car Date_of_Purchase Reg_Price
2 Audi 10/17/2020 750
1 Lexus 10/12/2020 750
0 BMW 10/10/2020 1000
5 Lamborghini 10/22/2020 1000
4 Mustang 10/19/2020 1100
3 Jaguar 10/16/2020 1500
Using sort_values() without inplace
Using sort_values() without setting inplace=True returns a new sorted DataFrame, leaving the original unchanged ?
Example
import pandas as pd
import io
# Create sample CSV data
csv_data = """Car,Date_of_Purchase,Reg_Price
BMW,10/10/2020,1000
Lexus,10/12/2020,750
Audi,10/17/2020,750
Jaguar,10/16/2020,1500
Mustang,10/19/2020,1100
Lamborghini,10/22/2020,1000"""
# Read CSV data into DataFrame
dataFrame = pd.read_csv(io.StringIO(csv_data))
# Sort by multiple columns without modifying the original DataFrame
sortedDataFrame = dataFrame.sort_values(["Reg_Price", "Car"], axis=0, ascending=True, na_position='first')
print("Original DataFrame (unchanged):")
print(dataFrame)
print("\nSorted DataFrame:")
print(sortedDataFrame)
Original DataFrame (unchanged):
Car Date_of_Purchase Reg_Price
0 BMW 10/10/2020 1000
1 Lexus 10/12/2020 750
2 Audi 10/17/2020 750
3 Jaguar 10/16/2020 1500
4 Mustang 10/19/2020 1100
5 Lamborghini 10/22/2020 1000
Sorted DataFrame:
Car Date_of_Purchase Reg_Price
2 Audi 10/17/2020 750
1 Lexus 10/12/2020 750
0 BMW 10/10/2020 1000
5 Lamborghini 10/22/2020 1000
4 Mustang 10/19/2020 1100
3 Jaguar 10/16/2020 1500
Using sort_index()
The sort_index() method sorts a DataFrame by its index. After sorting by multiple columns, you can further sort by index to arrange rows in index order ?
Example
import pandas as pd
import io
# Create sample CSV data
csv_data = """Car,Date_of_Purchase,Reg_Price
BMW,10/10/2020,1000
Lexus,10/12/2020,750
Audi,10/17/2020,750
Jaguar,10/16/2020,1500
Mustang,10/19/2020,1100
Lamborghini,10/22/2020,1000"""
# Read CSV data into DataFrame
dataFrame = pd.read_csv(io.StringIO(csv_data))
# Sort by multiple columns first
sortedDataFrame = dataFrame.sort_values(["Reg_Price", "Car"], axis=0, ascending=True, na_position='first')
print("Sorted by columns:")
print(sortedDataFrame)
# Further sort by index to restore original index order
indexSortedDataFrame = sortedDataFrame.sort_index()
print("\nSorted by columns, then by index:")
print(indexSortedDataFrame)
Sorted by columns:
Car Date_of_Purchase Reg_Price
2 Audi 10/17/2020 750
1 Lexus 10/12/2020 750
0 BMW 10/10/2020 1000
5 Lamborghini 10/22/2020 1000
4 Mustang 10/19/2020 1100
3 Jaguar 10/16/2020 1500
Sorted by columns, then by index:
Car Date_of_Purchase Reg_Price
0 BMW 10/10/2020 1000
1 Lexus 10/12/2020 750
2 Audi 10/17/2020 750
3 Jaguar 10/16/2020 1500
4 Mustang 10/19/2020 1100
5 Lamborghini 10/22/2020 1000
Sort Parameters
Key parameters for sort_values() include ?
| Parameter | Description | Example |
|---|---|---|
by |
Column names to sort by | ["Reg_Price", "Car"] |
ascending |
Sort order (True/False or list) | [True, False] |
inplace |
Modify original DataFrame | True |
na_position |
Position of NaN values |
'first' or 'last'
|
Conclusion
Use sort_values() with column names to sort CSV data by multiple columns. Set inplace=True to modify the original DataFrame, or omit it to create a new sorted DataFrame. Use sort_index() for additional index-based sorting.
