Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python Pandas- Create multiple CSV files from existing CSV file
Pandas provides powerful functionality to split CSV files into multiple files based on specific columns. This is useful when you need to segregate data by categories, such as creating separate files for different car brands, departments, or regions.
Sample Input CSV File
Let's create a sample CSV file with car sales data to demonstrate the splitting process:
import pandas as pd
# Create sample data
data = {
'Car': ['BMW', 'Lexus', 'BMW', 'Jaguar', 'Jaguar', 'BMW'],
'Date_of_Purchase': ['10/10/2020', '10/12/2020', '10/17/2020', '10/16/2020', '10/19/2020', '10/22/2020']
}
df = pd.DataFrame(data)
print("Original CSV data:")
print(df)
Original CSV data:
Car Date_of_Purchase
0 BMW 10/10/2020
1 Lexus 10/12/2020
2 BMW 10/17/2020
3 Jaguar 10/16/2020
4 Jaguar 10/19/2020
5 BMW 10/22/2020
Using groupby() to Split CSV Files
The groupby() method groups rows by the specified column values. We can iterate through these groups to create separate CSV files:
import pandas as pd
# Create sample data
data = {
'Car': ['BMW', 'Lexus', 'BMW', 'Jaguar', 'Jaguar', 'BMW'],
'Date_of_Purchase': ['10/10/2020', '10/12/2020', '10/17/2020', '10/16/2020', '10/19/2020', '10/22/2020']
}
df = pd.DataFrame(data)
# Group by Car column and create separate CSV files
for car_name, group in df.groupby('Car'):
filename = f'{car_name}.csv'
group.to_csv(filename, index=False)
print(f"Created {filename} with {len(group)} records")
print(group)
print()
Created BMW.csv with 3 records
Car Date_of_Purchase
0 BMW 10/10/2020
2 BMW 10/17/2020
5 BMW 10/22/2020
Created Jaguar.csv with 2 records
Car Date_of_Purchase
3 Jaguar 10/16/2020
4 Jaguar 10/19/2020
Created Lexus.csv with 1 records
Car Date_of_Purchase
1 Lexus 10/12/2020
Reading from Existing CSV File
To split an existing CSV file, first read it using read_csv(), then apply the same groupby logic:
import pandas as pd
# Read existing CSV file
dataFrame = pd.read_csv("SalesRecords.csv")
print("Input CSV file:")
print(dataFrame)
# Group by Car column and create separate CSV files
for car_name, group in dataFrame.groupby('Car'):
group.to_csv(f'{car_name}.csv', index=False)
print("\nSeparate CSV files created successfully!")
Specifying Custom File Paths
You can specify custom directories for the output files:
import pandas as pd
import os
# Create output directory if it doesn't exist
output_dir = "car_files"
os.makedirs(output_dir, exist_ok=True)
# Read CSV and create separate files in the directory
dataFrame = pd.read_csv("SalesRecords.csv")
for car_name, group in dataFrame.groupby('Car'):
filepath = os.path.join(output_dir, f'{car_name}.csv')
group.to_csv(filepath, index=False)
print(f"Created: {filepath}")
Key Parameters
| Parameter | Description | Example |
|---|---|---|
index=False |
Excludes row indices from CSV | group.to_csv('file.csv', index=False) |
header=True |
Includes column headers | group.to_csv('file.csv', header=True) |
| Custom path | Saves to specific directory | group.to_csv('/path/file.csv') |
Conclusion
Use groupby() with to_csv() to efficiently split large CSV files into smaller, category-based files. This approach is memory-efficient and works well for data segregation tasks. Remember to use index=False to avoid adding unnecessary row indices to your output files.
