Article Categories

Selected Reading

How to merge multiple excel files into a single file with Python?

Python Server Side Programming Programming

Excel is one of the most popular tools for data analysis and management. Often, we need to merge multiple Excel files into a single file for analysis or sharing with others. Manually merging these files can be time?consuming and prone to errors, especially when dealing with large datasets. Luckily, Python provides an efficient and flexible way to merge multiple Excel files into a single file.

In this article, we will learn how to merge multiple Excel files using Python. We will be using the Pandas library, which is a powerful and easy?to?use data analysis library for Python to merge the different Excel files in Python.

What is Pandas library in Python?

Pandas is a Python library used for working with data sets. It consists of functions for analyzing, cleaning, exploring, and manipulating data. Apart from this, it allows us to analyze big data and make conclusions based on statistical theories. It can also clean messy data sets, and make them readable and relevant.

When merging Excel files with Pandas, we can use the read_excel() function to read the data from each file into a Pandas DataFrame object. Once we have all the data in separate DataFrames, we can use Pandas' built?in functions like merge(), join(), and concat() to combine the data frames into a single DataFrame.

Pandas' merge() function provides the ability to merge DataFrames based on specific columns, similar to the VLOOKUP function in Excel. We can specify which columns to use as keys to join the data frames and the type of join to perform (inner, outer, left, or right).

Steps to merge multiple Excel files into a single file

Step 1: Import the libraries

The first step to merge the different Excel files is to import the required libraries. We will be using Pandas to read and merge the Excel files, and OS to navigate the file system ?

import pandas as pd
import os

Step 2: Identifying the Files to Merge

The next step is to identify the files that we want to merge. We can either specify the exact file names or use a pattern to match multiple files. In this example, we will use the latter approach, where we will match all files in a specific directory that have the extension ".xlsx" ?

import pandas as pd
import os

path = "/path/to/files/"
files = os.listdir(path)
files_xlsx = [f for f in files if f.endswith('.xlsx')]
print("Excel files found:", files_xlsx)

Excel files found: ['file1.xlsx', 'file2.xlsx', 'file3.xlsx']

Step 3: Reading your Excel Files

After identifying the files to merge, we will now read the different Excel files into Pandas data frames. We can use a loop to read all files into separate data frames and then concatenate them into a single data frame.

Let's create sample data to demonstrate the merging process ?

import pandas as pd

# Create sample DataFrames representing different Excel files
df1 = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Quantity': [10, 5],
    'Price': [0.5, 0.25]
})

df2 = pd.DataFrame({
    'Product': ['Orange', 'Grapes'],
    'Quantity': [8, 12],
    'Price': [0.4, 0.3]
})

df3 = pd.DataFrame({
    'Product': ['Mango', 'Pineapple'],
    'Quantity': [15, 7],
    'Price': [0.6, 0.8]
})

print("Sample data from file 1:")
print(df1)

Sample data from file 1:
  Product  Quantity  Price
0   Apple        10   0.50
1  Banana         5   0.25

Step 4: Create a new merged Excel file

The final step to merge the files is to write the merged data frame to a new Excel file. We can use the Pandas to_excel() method to do this ?

# Merge all DataFrames
df_list = [df1, df2, df3]
merged_df = pd.concat(df_list, ignore_index=True)

print("Merged DataFrame:")
print(merged_df)

Merged DataFrame:
    Product  Quantity  Price
0     Apple        10   0.50
1    Banana         5   0.25
2    Orange         8   0.40
3    Grapes        12   0.30
4     Mango        15   0.60
5  Pineapple         7   0.80

Method 1: Using pd.concat() with List Comprehension

This is the most efficient approach for merging multiple Excel files. We read all files into a list and concatenate them in one operation ?

import pandas as pd
import os

# Identifying the files to merge
path = "/path/to/files/"
files = os.listdir(path)
files_xlsx = [f for f in files if f.endswith('.xlsx')]

# Reading and concatenating Excel files in one step
df_list = [pd.read_excel(os.path.join(path, f)) for f in files_xlsx]
merged_df = pd.concat(df_list, ignore_index=True)

# Writing the merged data frame to a new Excel file
output_path = "/path/to/merged_file.xlsx"
merged_df.to_excel(output_path, index=False)

Method 2: Using a for loop with pd.concat()

This approach processes files one by one, which can be useful when dealing with very large files or when you need to perform operations on each file before merging ?

import pandas as pd

# Sample DataFrames representing Excel files
files_data = [
    pd.DataFrame({'Product': ['Apple', 'Banana'], 'Quantity': [10, 5], 'Price': [0.5, 0.25]}),
    pd.DataFrame({'Product': ['Orange', 'Grapes'], 'Quantity': [8, 12], 'Price': [0.4, 0.3]}),
    pd.DataFrame({'Product': ['Mango', 'Pineapple'], 'Quantity': [15, 7], 'Price': [0.6, 0.8]})
]

# Initialize an empty DataFrame
merged_df = pd.DataFrame()

# Reading and concatenating files using a loop
for i, data in enumerate(files_data):
    print(f"Processing file {i+1}:")
    print(data)
    merged_df = pd.concat([merged_df, data], ignore_index=True)

print("\nFinal merged DataFrame:")
print(merged_df)

Processing file 1:
  Product  Quantity  Price
0   Apple        10   0.50
1  Banana         5   0.25
Processing file 2:
  Product  Quantity  Price
0  Orange         8   0.40
1  Grapes        12   0.30
Processing file 3:
    Product  Quantity  Price
0     Mango        15   0.60
1  Pineapple         7   0.80

Final merged DataFrame:
    Product  Quantity  Price
0     Apple        10   0.50
1    Banana         5   0.25
2    Orange         8   0.40
3    Grapes        12   0.30
4     Mango        15   0.60
5  Pineapple         7   0.80

Method 3: Using pd.merge() for Column-based Merging

When you need to merge files based on common columns (similar to SQL joins), use pd.merge() instead of concat() ?

import pandas as pd

# Create sample DataFrames with common columns
df1 = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 5, 8]
})

df2 = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Orange'],
    'Price': [0.5, 0.25, 0.4]
})

# Merge based on common 'Product' column
merged_df = pd.merge(df1, df2, on='Product')
print("Merged DataFrame based on Product column:")
print(merged_df)

Merged DataFrame based on Product column:
  Product  Quantity  Price
0   Apple        10   0.50
1  Banana         5   0.25
2  Orange         8   0.40

Comparison

Method	Best For	Performance	Use Case
`pd.concat()` with list comprehension	Multiple files with same structure	Fastest	Stacking data vertically
For loop with `pd.concat()`	Large files or custom processing	Moderate	When memory is limited
`pd.merge()`	Files with common columns	Depends on data size	Joining based on keys

Conclusion

Merging multiple Excel files into a single file using Python saves time and reduces errors when dealing with large datasets. Use pd.concat() for stacking similar structured files and pd.merge() for combining files based on common columns. With Python's Pandas library, merging Excel files becomes an efficient and automated process.

Tarun Singh

Updated on: 2026-03-27T14:07:27+05:30

12K+ Views

Previous Next