Article Categories

Selected Reading

Python Pandas- Create multiple CSV files from existing CSV file

Python Server Side Programming Programming

Pandas provides powerful functionality to split CSV files into multiple files based on specific columns. This is useful when you need to segregate data by categories, such as creating separate files for different car brands, departments, or regions.

Sample Input CSV File

Let's create a sample CSV file with car sales data to demonstrate the splitting process:

import pandas as pd

# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'BMW', 'Jaguar', 'Jaguar', 'BMW'],
    'Date_of_Purchase': ['10/10/2020', '10/12/2020', '10/17/2020', '10/16/2020', '10/19/2020', '10/22/2020']
}

df = pd.DataFrame(data)
print("Original CSV data:")
print(df)

Original CSV data:
     Car Date_of_Purchase
0    BMW        10/10/2020
1  Lexus        10/12/2020
2    BMW        10/17/2020
3 Jaguar        10/16/2020
4 Jaguar        10/19/2020
5    BMW        10/22/2020

Using groupby() to Split CSV Files

The groupby() method groups rows by the specified column values. We can iterate through these groups to create separate CSV files:

import pandas as pd

# Create sample data
data = {
    'Car': ['BMW', 'Lexus', 'BMW', 'Jaguar', 'Jaguar', 'BMW'],
    'Date_of_Purchase': ['10/10/2020', '10/12/2020', '10/17/2020', '10/16/2020', '10/19/2020', '10/22/2020']
}

df = pd.DataFrame(data)

# Group by Car column and create separate CSV files
for car_name, group in df.groupby('Car'):
    filename = f'{car_name}.csv'
    group.to_csv(filename, index=False)
    print(f"Created {filename} with {len(group)} records")
    print(group)
    print()

Created BMW.csv with 3 records
   Car Date_of_Purchase
0  BMW        10/10/2020
2  BMW        10/17/2020
5  BMW        10/22/2020

Created Jaguar.csv with 2 records
     Car Date_of_Purchase
3 Jaguar        10/16/2020
4 Jaguar        10/19/2020

Created Lexus.csv with 1 records
     Car Date_of_Purchase
1  Lexus        10/12/2020

Reading from Existing CSV File

To split an existing CSV file, first read it using read_csv(), then apply the same groupby logic:

import pandas as pd

# Read existing CSV file
dataFrame = pd.read_csv("SalesRecords.csv")
print("Input CSV file:")
print(dataFrame)

# Group by Car column and create separate CSV files
for car_name, group in dataFrame.groupby('Car'):
    group.to_csv(f'{car_name}.csv', index=False)
    
print("\nSeparate CSV files created successfully!")

Specifying Custom File Paths

You can specify custom directories for the output files:

import pandas as pd
import os

# Create output directory if it doesn't exist
output_dir = "car_files"
os.makedirs(output_dir, exist_ok=True)

# Read CSV and create separate files in the directory
dataFrame = pd.read_csv("SalesRecords.csv")

for car_name, group in dataFrame.groupby('Car'):
    filepath = os.path.join(output_dir, f'{car_name}.csv')
    group.to_csv(filepath, index=False)
    print(f"Created: {filepath}")

Key Parameters

Parameter	Description	Example
`index=False`	Excludes row indices from CSV	`group.to_csv('file.csv', index=False)`
`header=True`	Includes column headers	`group.to_csv('file.csv', header=True)`
Custom path	Saves to specific directory	`group.to_csv('/path/file.csv')`

Conclusion

Use groupby() with to_csv() to efficiently split large CSV files into smaller, category-based files. This approach is memory-efficient and works well for data segregation tasks. Remember to use index=False to avoid adding unnecessary row indices to your output files.

AmitDiwan

Updated on: 2026-03-26T13:45:21+05:30

1K+ Views

Previous Next