How to Merge All CSV Files into a Single DataFrame – Python Pandas?
Merging multiple CSV files into a single DataFrame is a common task in data analysis. Python's standard-library glob module finds files whose names match a pattern, and the Pandas concat() function combines the DataFrames read from those files into one.
- Using os.path.join() and glob
- Direct glob pattern matching
- Merging files in specific order
Using os.path.join() and glob
The os.path.join() method constructs file paths safely across different operating systems, while glob.glob() finds files matching a pattern.
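As a small illustration of how these two calls fit together, the sketch below builds a pattern with os.path.join() and passes it to glob.glob(). The temporary directory and the file names in it are assumptions made up for the demonstration, so that nothing in your working directory is touched.

```python
import glob
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two empty CSV files and one unrelated file (hypothetical names).
    for name in ("sales1.csv", "sales2.csv", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()

    # os.path.join() inserts the separator used by the current OS.
    pattern = os.path.join(tmp_dir, "sales*.csv")

    # glob.glob() returns the paths that match the pattern; sorted() is
    # applied because the order of the returned list is not guaranteed.
    matches = sorted(glob.glob(pattern))
    matched_names = [os.path.basename(path) for path in matches]

print(matched_names)  # ['sales1.csv', 'sales2.csv']
```

Only the two files whose names fit sales*.csv are returned; notes.txt is ignored.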
Example
Let's first create sample CSV files to demonstrate the merging process:
import pandas as pd
import os
# Create sample CSV files
data1 = {'Car': ['Audi', 'Porsche', 'Rolls Royce'],
         'Place': ['Bangalore', 'Mumbai', 'Pune'],
         'UnitsSold': [80, 110, 100]}
df1 = pd.DataFrame(data1)
df1.to_csv('sales1.csv', index=False)
data2 = {'Car': ['BMW', 'Mercedes', 'Lamborghini'],
         'Place': ['Delhi', 'Hyderabad', 'Chandigarh'],
         'UnitsSold': [95, 80, 80]}
df2 = pd.DataFrame(data2)
df2.to_csv('sales2.csv', index=False)
data3 = {'Car': ['Volvo', 'Hyundai', 'Toyota'],
         'Place': ['Rajasthan', 'Manipur', 'HP'],
         'UnitsSold': [150, 120, 70]}
df3 = pd.DataFrame(data3)
df3.to_csv('sales3.csv', index=False)
print("Sample CSV files created successfully!")
Sample CSV files created successfully!
Now let's merge all CSV files using os.path.join() and glob:
import pandas as pd
import glob
import os
# Build the path pattern for the CSV files
pattern = os.path.join(".", "sales*.csv")
# Get the list of all matching files
files = glob.glob(pattern)
print("Files found:", files)
# Merge all CSV files using concat
merged_df = pd.concat(map(pd.read_csv, files), ignore_index=True)
print("\nMerged DataFrame:")
print(merged_df)
Files found: ['./sales1.csv', './sales2.csv', './sales3.csv']
Merged DataFrame:
Car Place UnitsSold
0 Audi Bangalore 80
1 Porsche Mumbai 110
2 Rolls Royce Pune 100
3 BMW Delhi 95
4 Mercedes Hyderabad 80
5 Lamborghini Chandigarh 80
6 Volvo Rajasthan 150
7 Hyundai Manipur 120
8 Toyota HP 70
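One case the example above does not cover is a pattern that matches no files: pd.concat() raises a ValueError when given an empty sequence. A minimal sketch of one way to guard against that is shown here, assuming a hypothetical helper named merge_csv_files() and a temporary directory with made-up sample data; it is not part of Pandas or glob.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csv_files(directory, pattern="sales*.csv"):
    """Hypothetical helper: read every matching CSV in `directory` into one DataFrame."""
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    if not files:
        # pd.concat() fails on an empty list, so report the problem explicitly.
        raise FileNotFoundError(f"No files match {pattern!r} in {directory!r}")
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Two small sample files (assumed data, for illustration only).
    pd.DataFrame({"Car": ["Audi"], "UnitsSold": [80]}).to_csv(
        os.path.join(tmp_dir, "sales1.csv"), index=False)
    pd.DataFrame({"Car": ["BMW"], "UnitsSold": [95]}).to_csv(
        os.path.join(tmp_dir, "sales2.csv"), index=False)

    merged = merge_csv_files(tmp_dir)

    # A pattern with no matches triggers the explicit error instead.
    try:
        merge_csv_files(tmp_dir, "missing*.csv")
        empty_case = "no error"
    except FileNotFoundError:
        empty_case = "FileNotFoundError"

print(merged)
print("Empty pattern result:", empty_case)
```

Raising early keeps the error message tied to the pattern and directory, which tends to be easier to debug than the generic message from pd.concat().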
Direct glob Pattern Matching
When the files are in a known location, such as the current working directory, you can pass a hardcoded pattern straight to glob.glob():
import pandas as pd
import glob
# Fetch all CSV files directly using glob pattern
files = glob.glob("sales*.csv")
# Use list comprehension to read and concatenate
merged_df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
print("Merged DataFrame using direct glob pattern:")
print(merged_df)
Merged DataFrame using direct glob pattern:
Car Place UnitsSold
0 Audi Bangalore 80
1 Porsche Mumbai 110
2 Rolls Royce Pune 100
3 BMW Delhi 95
4 Mercedes Hyderabad 80
5 Lamborghini Chandigarh 80
6 Volvo Rajasthan 150
7 Hyundai Manipur 120
8 Toyota HP 70
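The pattern passed to glob.glob() is not limited to the * wildcard. The sketch below shows how *, ? and [...] behave, using a temporary directory and file names that are assumptions invented for the demonstration; match() is a hypothetical helper, not a glob function.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Empty placeholder files with hypothetical names.
    for name in ("sales1.csv", "sales2.csv", "sales10.csv",
                 "sales_old.csv", "report.csv"):
        open(os.path.join(tmp_dir, name), "w").close()

    def match(pattern):
        """Return the sorted base names in tmp_dir that match `pattern`."""
        paths = glob.glob(os.path.join(tmp_dir, pattern))
        return sorted(os.path.basename(p) for p in paths)

    any_suffix = match("sales*.csv")           # * matches any run of characters
    one_char = match("sales?.csv")             # ? matches exactly one character
    two_digits = match("sales[0-9][0-9].csv")  # [...] matches one character from a set

print(any_suffix)   # ['sales1.csv', 'sales10.csv', 'sales2.csv', 'sales_old.csv']
print(one_char)     # ['sales1.csv', 'sales2.csv']
print(two_digits)   # ['sales10.csv']
```

A narrower pattern is a simple way to keep unrelated files, such as sales_old.csv here, out of the merged DataFrame.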
Merging Files in Specific Order
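glob.glob() returns files in arbitrary order, so the row order of the merged DataFrame can differ between runs or systems. To merge files in a specific order, sort the file list with sort() before concatenation: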
import pandas as pd
import glob
# Get all CSV files matching the pattern
files = glob.glob("sales*.csv")
# Sort files alphabetically to ensure consistent order
files.sort()
print("Files in sorted order:", files)
# Concatenate in sorted order
merged_df = pd.concat(map(pd.read_csv, files), ignore_index=True)
print("\nMerged DataFrame (sorted file order):")
print(merged_df)
Files in sorted order: ['sales1.csv', 'sales2.csv', 'sales3.csv']
Merged DataFrame (sorted file order):
Car Place UnitsSold
0 Audi Bangalore 80
1 Porsche Mumbai 110
2 Rolls Royce Pune 100
3 BMW Delhi 95
4 Mercedes Hyderabad 80
5 Lamborghini Chandigarh 80
6 Volvo Rajasthan 150
7 Hyundai Manipur 120
8 Toyota HP 70
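Alphabetical sorting works for sales1.csv through sales3.csv, but it places a name such as sales10.csv before sales2.csv. If your files are numbered past 9, one possible approach is a "natural" sort key that compares the digit runs as integers. The natural_key() function below is a hypothetical helper written for this sketch, and the file names are assumed.

```python
import re


def natural_key(path):
    """Split a name into text and integer chunks so 'sales10' sorts after 'sales2'."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. 'sales10.csv' -> ['sales', '10', '.csv'].
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", path)]


# Hypothetical file list, as glob.glob() might return it.
files = ["sales10.csv", "sales2.csv", "sales1.csv"]

alphabetical = sorted(files)
natural = sorted(files, key=natural_key)

print(alphabetical)  # ['sales1.csv', 'sales10.csv', 'sales2.csv']
print(natural)       # ['sales1.csv', 'sales2.csv', 'sales10.csv']
```

The naturally sorted list can then be passed to pd.concat(map(pd.read_csv, files), ignore_index=True) exactly as in the example above.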
Comparison of Methods
| Method | Best For | Advantages |
|---|---|---|
| os.path.join() + glob | Cross-platform compatibility | Safe path handling |
| Direct glob pattern | Simple, known paths | Concise code |
| Sorted file order | Consistent results | Predictable merge order |
Conclusion
Use glob with pd.concat() to efficiently merge multiple CSV files. Sort the file list when order matters, and use os.path.join() for cross-platform path handling.
