Python - Write multiple files data to master file
When working with multiple data files, you often need to combine them into a single master file. Python provides several approaches to merge multiple files, from basic file operations to using pandas for structured data.
Basic File Operations Approach
This method reads multiple text files and writes their content to a master file using standard file operations −
import os

# Create a directory for the sample data files
os.makedirs('data_files', exist_ok=True)

# Create sample files
with open('data_files/file1.txt', 'w') as f:
    f.write('John,Developer,50000\n')
    f.write('Alice,Designer,45000\n')

with open('data_files/file2.txt', 'w') as f:
    f.write('Bob,Manager,60000\n')
    f.write('Carol,Analyst,48000\n')

# List all files in the directory (sorted for a predictable merge order)
file_list = sorted(os.listdir('data_files'))
master_file = 'master.txt'

# Create master file with header
with open(master_file, 'w') as output:
    output.write('Name,Position,Salary\n')
    # Read each file and append its content to the master
    for filename in file_list:
        if filename.endswith('.txt'):
            file_path = os.path.join('data_files', filename)
            with open(file_path, 'r') as input_file:
                content = input_file.read()
                output.write(content)

print("Files merged successfully!")
Files merged successfully!
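For large input files, reading each file fully into memory with read() can be avoided by streaming. A minimal sketch using the standard library's shutil.copyfileobj, which copies file contents in fixed-size chunks (the file names and master_stream.txt output name are illustrative, reusing the sample data from the example above):

```python
import os
import shutil

# Recreate the sample files from the example above
os.makedirs('data_files', exist_ok=True)
with open('data_files/file1.txt', 'w') as f:
    f.write('John,Developer,50000\n')
    f.write('Alice,Designer,45000\n')
with open('data_files/file2.txt', 'w') as f:
    f.write('Bob,Manager,60000\n')
    f.write('Carol,Analyst,48000\n')

# Stream each file into the master in chunks, so memory use
# stays constant regardless of how large the input files are
with open('master_stream.txt', 'w') as output:
    output.write('Name,Position,Salary\n')
    for filename in sorted(os.listdir('data_files')):
        if filename.endswith('.txt'):
            with open(os.path.join('data_files', filename)) as src:
                shutil.copyfileobj(src, output)

print("Streamed merge complete!")
```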
Using Pandas for Structured Data
For CSV files or structured data, pandas provides a more efficient approach −
import pandas as pd

# Create sample CSV files
data1 = {'Name': ['John', 'Alice'],
         'Position': ['Developer', 'Designer'],
         'Salary': [50000, 45000]}
df1 = pd.DataFrame(data1)
df1.to_csv('emp_1.csv', index=False)

data2 = {'Name': ['Bob', 'Carol'],
         'Position': ['Manager', 'Analyst'],
         'Salary': [60000, 48000]}
df2 = pd.DataFrame(data2)
df2.to_csv('emp_2.csv', index=False)

# Read and combine the CSV files
dataframes = []
csv_files = ['emp_1.csv', 'emp_2.csv']
for file in csv_files:
    df = pd.read_csv(file)
    dataframes.append(df)

# Concatenate all dataframes into one
combined_df = pd.concat(dataframes, ignore_index=True)

# Write to master file
combined_df.to_csv('master_employees.csv', index=False)
print("Master file created with pandas!")
print(combined_df)
Master file created with pandas!
    Name   Position  Salary
0   John  Developer   50000
1  Alice   Designer   45000
2    Bob    Manager   60000
3  Carol    Analyst   48000
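When the master file should record where each row came from, the read-and-combine loop above can be collapsed into a single pd.concat over a list comprehension, tagging rows with DataFrame.assign. A short sketch reusing the sample files from the example above (the Source column name is an illustrative choice):

```python
import pandas as pd

# Recreate the two sample CSV files from the example above
pd.DataFrame({'Name': ['John', 'Alice'],
              'Position': ['Developer', 'Designer'],
              'Salary': [50000, 45000]}).to_csv('emp_1.csv', index=False)
pd.DataFrame({'Name': ['Bob', 'Carol'],
              'Position': ['Manager', 'Analyst'],
              'Salary': [60000, 48000]}).to_csv('emp_2.csv', index=False)

# Read each file, tag its rows with the file name, and combine
combined = pd.concat(
    [pd.read_csv(f).assign(Source=f) for f in ['emp_1.csv', 'emp_2.csv']],
    ignore_index=True
)
print(combined)
```

The Source column makes it easy to trace or filter rows in the master file back to their original input files.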
Batch Processing Multiple Files
For processing many files automatically, use glob pattern matching −
import glob
import pandas as pd

# Create multiple sample files
for i in range(3):
    data = {'ID': [i*2+1, i*2+2],
            'Value': [10+i, 20+i]}
    df = pd.DataFrame(data)
    df.to_csv(f'data_{i}.csv', index=False)

# Use glob to find all matching CSV files (sorted, since glob's
# order is not guaranteed across operating systems)
csv_files = sorted(glob.glob('data_*.csv'))
print(f"Found files: {csv_files}")

# Read and combine all files
all_data = []
for file in csv_files:
    df = pd.read_csv(file)
    all_data.append(df)

# Merge all data into a single DataFrame
final_df = pd.concat(all_data, ignore_index=True)
final_df.to_csv('combined_data.csv', index=False)
print("Combined data:")
print(final_df)
Found files: ['data_0.csv', 'data_1.csv', 'data_2.csv']
Combined data:
   ID  Value
0   1     10
1   2     20
2   3     11
3   4     21
4   5     12
5   6     22
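The modern pathlib module offers an alternative to the glob module: Path.glob does the same pattern matching and returns Path objects that pd.read_csv accepts directly. A minimal sketch, reusing the sample files created above:

```python
import pandas as pd
from pathlib import Path

# Create the same three sample files as in the glob example
for i in range(3):
    pd.DataFrame({'ID': [i*2+1, i*2+2],
                  'Value': [10+i, 20+i]}).to_csv(f'data_{i}.csv', index=False)

# Path.glob yields matching files lazily; sorted() fixes the order
files = sorted(Path('.').glob('data_*.csv'))

# pd.read_csv accepts Path objects, so no string conversion is needed
final_df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
final_df.to_csv('combined_data.csv', index=False)
print(final_df)
```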
Comparison of Methods
| Method | Best For | Advantages | Limitations |
|---|---|---|---|
| Basic File Operations | Simple text files | No dependencies | Manual handling |
| Pandas | Structured data (CSV) | Built-in data handling | Requires pandas |
| Glob Pattern | Many files | Automatic file discovery | Pattern-based only |
Conclusion
Use pandas for structured data like CSV files as it handles data types and formatting automatically. For simple text files, basic file operations work well. Use glob patterns when processing many files with similar names.