Python - Read all CSV files in a folder in Pandas?
Reading all CSV files from a folder is a common data processing task. Python's glob module combined with Pandas' read_csv() method provides an efficient solution for batch processing multiple CSV files.
Setting Up the File Path
First, we need to specify the directory path containing our CSV files. For this example, we'll use a relative path that works across different systems −
import pandas as pd
import glob
import os

# Set the path to your CSV files directory
path = "data/"  # Using relative path for better portability
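Hard-coding a path separator can break when the code moves between operating systems. As a small sketch, the glob pattern can instead be built with os.path.join (the "data" folder name here is just the hypothetical directory from the snippet above):

```python
import os

# "data" is the hypothetical folder name used in the snippet above
pattern = os.path.join("data", "*.csv")
print(pattern)  # 'data/*.csv' on Linux/macOS, 'data\\*.csv' on Windows
```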
Finding CSV Files with Glob
The glob module uses pattern matching to find all files with the .csv extension −
import pandas as pd
import glob
# Create sample CSV files for demonstration
sample_data1 = {'Car': ['Audi', 'Porsche', 'RollsRoyce'],
                'Place': ['Bangalore', 'Mumbai', 'Pune'],
                'UnitsSold': [80, 110, 100]}
sample_data2 = {'Car': ['BMW', 'Mercedes', 'Lamborghini'],
                'Place': ['Delhi', 'Hyderabad', 'Chandigarh'],
                'UnitsSold': [95, 80, 80]}
# In practice, you would have actual CSV files in a directory
# For this demo, we'll simulate finding files
csv_files = ['sales1.csv', 'sales2.csv']
print('CSV files found:', csv_files)
CSV files found: ['sales1.csv', 'sales2.csv']
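The demo above only simulates the search. As a self-contained sketch, the following creates two real CSV files in a temporary folder (the filenames mirror the article's demo but are otherwise assumptions) and lets glob.glob() actually find them:

```python
import glob
import os
import tempfile

import pandas as pd

# Create a temporary folder with two small CSV files so glob
# has something real to match against
tmp = tempfile.mkdtemp()
pd.DataFrame({'Car': ['Audi'], 'UnitsSold': [80]}).to_csv(
    os.path.join(tmp, 'sales1.csv'), index=False)
pd.DataFrame({'Car': ['BMW'], 'UnitsSold': [95]}).to_csv(
    os.path.join(tmp, 'sales2.csv'), index=False)

# Pattern matching: every file ending in .csv inside the folder
csv_files = sorted(glob.glob(os.path.join(tmp, '*.csv')))
print('CSV files found:', [os.path.basename(f) for f in csv_files])
```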
Reading Multiple CSV Files
Loop through each CSV file and read it using pd.read_csv() −
import pandas as pd
# Sample data representing CSV file contents
csv_data = {
    'sales1.csv': pd.DataFrame({
        'Car': ['Audi', 'Porsche', 'RollsRoyce'],
        'Place': ['Bangalore', 'Mumbai', 'Pune'],
        'UnitsSold': [80, 110, 100]
    }),
    'sales2.csv': pd.DataFrame({
        'Car': ['BMW', 'Mercedes', 'Lamborghini'],
        'Place': ['Delhi', 'Hyderabad', 'Chandigarh'],
        'UnitsSold': [95, 80, 80]
    })
}
# Simulate reading CSV files
for filename, data in csv_data.items():
    print(f"\nReading file: {filename}")
    print(data)
Reading file: sales1.csv
Car Place UnitsSold
0 Audi Bangalore 80
1 Porsche Mumbai 110
2 RollsRoyce Pune 100
Reading file: sales2.csv
Car Place UnitsSold
0 BMW Delhi 95
1 Mercedes Hyderabad 80
2 Lamborghini Chandigarh 80
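Rather than hard-coding the file contents as above, the same loop can read files that glob actually found. One convenient variant, sketched here with a temporary folder and made-up filenames, collects the DataFrames into a dict keyed by filename so each one stays addressable:

```python
import glob
import os
import tempfile

import pandas as pd

# Set up a temporary folder with sample CSVs (stand-in for your own data)
tmp = tempfile.mkdtemp()
pd.DataFrame({'Car': ['Audi', 'Porsche'], 'UnitsSold': [80, 110]}).to_csv(
    os.path.join(tmp, 'sales1.csv'), index=False)
pd.DataFrame({'Car': ['BMW'], 'UnitsSold': [95]}).to_csv(
    os.path.join(tmp, 'sales2.csv'), index=False)

# Read every CSV into a dict keyed by its filename
frames = {os.path.basename(f): pd.read_csv(f)
          for f in sorted(glob.glob(os.path.join(tmp, '*.csv')))}
for name, df in frames.items():
    print(name, df.shape)
```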
Combining All CSV Files
Often you'll want to combine all CSV files into a single DataFrame for analysis −
import pandas as pd
# Sample DataFrames representing CSV files
df1 = pd.DataFrame({
    'Car': ['Audi', 'Porsche', 'RollsRoyce'],
    'Place': ['Bangalore', 'Mumbai', 'Pune'],
    'UnitsSold': [80, 110, 100]
})
df2 = pd.DataFrame({
    'Car': ['BMW', 'Mercedes', 'Lamborghini'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh'],
    'UnitsSold': [95, 80, 80]
})
# Combine all DataFrames
all_dataframes = [df1, df2]
combined_df = pd.concat(all_dataframes, ignore_index=True)
print("Combined DataFrame:")
print(combined_df)
Combined DataFrame:
Car Place UnitsSold
0 Audi Bangalore 80
1 Porsche Mumbai 110
2 RollsRoyce Pune 100
3 BMW Delhi 95
4 Mercedes Hyderabad 80
5 Lamborghini Chandigarh 80
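One refinement worth knowing: after concatenation you can no longer tell which file a row came from. A small sketch using DataFrame.assign() (the 'source' column name is our own choice, not anything pandas mandates) preserves that information:

```python
import pandas as pd

df1 = pd.DataFrame({'Car': ['Audi'], 'UnitsSold': [80]})
df2 = pd.DataFrame({'Car': ['BMW'], 'UnitsSold': [95]})

# assign() adds a column recording which file each row came from,
# so the origin survives the concatenation
tagged = [df.assign(source=name)
          for name, df in [('sales1.csv', df1), ('sales2.csv', df2)]]
combined = pd.concat(tagged, ignore_index=True)
print(combined)
```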
Complete Working Example
Here's the complete code structure for reading all CSV files in a folder −
import pandas as pd
import glob
# Set path to your CSV files directory
path = "/path/to/your/csv/files/"
# Find all CSV files in the directory
csv_files = glob.glob(path + "*.csv")
print('CSV files found:', csv_files)
# Read and process each CSV file
dataframes = []
for file in csv_files:
    print(f"\nReading file: {file}")
    df = pd.read_csv(file)
    print(df)
    dataframes.append(df)
# Optional: Combine all DataFrames
if dataframes:
    combined_df = pd.concat(dataframes, ignore_index=True)
    print("\nCombined DataFrame:")
    print(combined_df)
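As an alternative sketch, pathlib can replace both the glob module and manual path joining. Here a temporary folder stands in for your own directory, so the whole example runs end to end:

```python
import tempfile
from pathlib import Path

import pandas as pd

# A temporary folder stands in for your own CSV directory
folder = Path(tempfile.mkdtemp())
pd.DataFrame({'Car': ['Audi'], 'UnitsSold': [80]}).to_csv(
    folder / 'sales1.csv', index=False)
pd.DataFrame({'Car': ['BMW'], 'UnitsSold': [95]}).to_csv(
    folder / 'sales2.csv', index=False)

# Path.glob replaces glob.glob; one concat call combines everything
combined = pd.concat(
    (pd.read_csv(f) for f in sorted(folder.glob('*.csv'))),
    ignore_index=True)
print(combined)
```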
Key Points
- Use glob.glob() with pattern "*.csv" to find CSV files
- Loop through the file list to process each CSV individually
- Use pd.concat() to combine multiple DataFrames if needed
- Use relative paths for better code portability
Conclusion
The combination of glob and pd.read_csv() provides an efficient way to process multiple CSV files. This approach is essential for batch data processing and analysis workflows.
