Python Pandas- Create multiple CSV files from existing CSV file

Pandas provides powerful functionality to split CSV files into multiple files based on specific columns. This is useful when you need to segregate data by categories, such as creating separate files for different car brands, departments, or regions.

SalesRecords.csv Car | Date_of_Purchase BMW | 10/10/2020 Lexus | 10/12/2020 BMW | 10/17/2020 Jaguar | 10/16/2020 groupby('Car') BMW.csv BMW | 10/10/2020 BMW | 10/17/2020 Lexus.csv Lexus | 10/12/2020 Jaguar.csv Jaguar | 10/16/2020 Jaguar | 10/19/2020

Sample Input CSV File

Let's create a sample CSV file with car sales data to demonstrate the splitting process:

import pandas as pd

# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'BMW', 'Jaguar', 'Jaguar', 'BMW'],
    'Date_of_Purchase': ['10/10/2020', '10/12/2020', '10/17/2020', '10/16/2020', '10/19/2020', '10/22/2020']
}

df = pd.DataFrame(data)
print("Original CSV data:")
print(df)
Original CSV data:
     Car Date_of_Purchase
0    BMW        10/10/2020
1  Lexus        10/12/2020
2    BMW        10/17/2020
3 Jaguar        10/16/2020
4 Jaguar        10/19/2020
5    BMW        10/22/2020

Using groupby() to Split CSV Files

The groupby() method groups rows by the specified column values. We can iterate through these groups to create separate CSV files:

import pandas as pd

# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'BMW', 'Jaguar', 'Jaguar', 'BMW'],
    'Date_of_Purchase': ['10/10/2020', '10/12/2020', '10/17/2020', '10/16/2020', '10/19/2020', '10/22/2020']
}

df = pd.DataFrame(data)

# Group by Car column and create separate CSV files
for car_name, group in df.groupby('Car'):
    filename = f'{car_name}.csv'
    group.to_csv(filename, index=False)
    print(f"Created {filename} with {len(group)} records")
    print(group)
    print()
Created BMW.csv with 3 records
   Car Date_of_Purchase
0  BMW        10/10/2020
2  BMW        10/17/2020
5  BMW        10/22/2020

Created Jaguar.csv with 2 records
     Car Date_of_Purchase
3 Jaguar        10/16/2020
4 Jaguar        10/19/2020

Created Lexus.csv with 1 records
     Car Date_of_Purchase
1  Lexus        10/12/2020

Reading from Existing CSV File

To split an existing CSV file, first read it using read_csv(), then apply the same groupby logic:

import pandas as pd

# Read existing CSV file
dataFrame = pd.read_csv("SalesRecords.csv")
print("Input CSV file:")
print(dataFrame)

# Group by Car column and create separate CSV files
for car_name, group in dataFrame.groupby('Car'):
    group.to_csv(f'{car_name}.csv', index=False)
    
print("\nSeparate CSV files created successfully!")

Specifying Custom File Paths

You can specify custom directories for the output files:

import pandas as pd
import os

# Create output directory if it doesn't exist
output_dir = "car_files"
os.makedirs(output_dir, exist_ok=True)

# Read CSV and create separate files in the directory
dataFrame = pd.read_csv("SalesRecords.csv")

for car_name, group in dataFrame.groupby('Car'):
    filepath = os.path.join(output_dir, f'{car_name}.csv')
    group.to_csv(filepath, index=False)
    print(f"Created: {filepath}")

Key Parameters

Parameter Description Example
index=False Excludes row indices from CSV group.to_csv('file.csv', index=False)
header=True Includes column headers group.to_csv('file.csv', header=True)
Custom path Saves to specific directory group.to_csv('/path/file.csv')

Conclusion

Use groupby() with to_csv() to efficiently split large CSV files into smaller, category-based files. This approach is memory-efficient and works well for data segregation tasks. Remember to use index=False to avoid adding unnecessary row indices to your output files.

Updated on: 2026-03-26T13:45:21+05:30

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements