How to merge multiple excel files into a single file with Python?


Excel is one of the most popular tools for data analysis and management. Often, we need to merge multiple Excel files into a single file for analysis or sharing with others. Manually merging these files can be time−consuming and prone to errors, especially when dealing with large datasets. Luckily, Python provides an efficient and flexible way to merge multiple Excel files into a single file.

In this article, we will learn how to merge multiple Excel files using Python. We will be using the Pandas library, which is a powerful and easy−to−use data analysis library for Python to merge the different Excel files in Python.

What is Pandas library in Python?

Pandas is a Python library used for working with data sets. It consists of functions for analyzing, cleaning, exploring, and manipulating data. Apart from this, it allows us to analyze big data and make conclusions based on statistical theories. It can also clean messy data sets, and make them readable and relevant.

When merging Excel files with Pandas, we can use the read_excel() function to read the data from each file into a Pandas DataFrame object. Once we have all the data in separate DataFrames, we can use Pandas' built−in functions like merge(), join(), and concat() to combine the data frames into a single DataFrame.

Pandas' merge() function provides the ability to merge DataFrames based on specific columns, similar to the VLOOKUP function in Excel. We can specify which columns to use as keys to join the data frames and the type of join to perform (inner, outer, left, or right).

Steps to merge multiple Excel files into a single file

Step 1: Import the libraries

The first step to merge the different Excel files is to import the required libraries. We will be using Pandas to read and merge the Excel files, and OS to navigate the file system. Below is the code to import these libraries:

import pandas as pd
import os

Step 2: Identifying the Files to Merge

The next step is to identify the files that we want to merge. We can either specify the exact file names or use a pattern to match multiple files. In this example, we will use the latter approach, where we will match all files in a specific directory that have the extension ".xlsx". Below is the code to get the list of files to merge:

path = "path/to/files/"
files = os.listdir(path)
files_xlsx = [f for f in files if f[-4:] == 'xlsx']

Step 3: Reading your Excel Files

After identifying the files to merge, we will now read the different Excel files into Pandas data frames. We can use a loop to read all files into separate data frames and then concatenate them into a single data frame.

The input Excel files are as follows:

#File1.xlsx
Product	Quantity	Price
Apple	10	0.5
Banana	5	0.25

#File2.xlsx
Product	Quantity	Price
Oranges	8	0.4
Grapes	12	0.3

#File3.xlsx
Product	Quantity	Price
Mango	15	0.6
Pineapple	7	0.8

Below is the code to read the Excel files:

df_list = []
for f in files_xlsx:
    data = pd.read_excel(os.path.join(path, f))
    df_list.append(data)
df = pd.concat(df_list)

Step 4: Create a new merged Excel file

The final step to merge the files is to write the merged data frame to a new Excel file. We can use the Pandas to_excel method to do this. Below is the code to write the merged data frame to a new Excel file:

output_path = "path/to/output_file.xlsx"
df.to_excel(output_path, index=False)

Now let’s see the complete examples to merge multiple excel files into a single file in Python using different approaches like pd.concat and using for loop.

Example 1: Using ‘pd.concat()’

In the below example, we first import the required libraries and identified the files to merge. We then identify the files to merge by listing all files in the specified directory that end with .xlsx. Next, we read each Excel file into a data frame and store them in a list called df_list.

import pandas as pd
import os

# Identifying the files to merge
path = "yourpath/to/files/"
files = os.listdir(path)
files_xlsx = [f for f in files if f.endswith('.xlsx')]

# Reading the Excel files and concatenate them
mydf_list = [pd.read_excel(os.path.join(path, f)) for f in files_xlsx]
mydf = pd.concat(mydf_list)

# Writing the merged data frame to a new Excel file
myoutput_path = "path/to/myoutput_file.xlsx"
mydf.to_excel(myoutput_path, index=False)

Output

The final output after merging three .xlsx files:

Product     Quantity    Price
Apple       10          0.5
Banana      5           0.25
Orange      8           0.4
Grapes      12          0.3
Mango       15          0.6
Pineapple   7           0.8

Example 2: Using a for loop and pd.concat()

In the below example, we first import the required libraries and identified the files to merge, just like in the first example. We then initialize an empty data frame called merged_df. Next, we loop through each Excel file and read it into a data frame. We use the concat() function to concatenate the current data frame with the previously merged data frames and store the result in merged_df.

import pandas as pd
import os

# Identifying the files to merge
path = "yourpath/to/files/"
files = os.listdir(path)
files_xlsx = [f for f in files if f.endswith('.xlsx')]

# Initialize an empty data frame
mymerged_df = pd.DataFrame()

# Reading the Excel files and concatenate them
for file in files_xlsx:
    data = pd.read_excel(os.path.join(path, file))
    mymerged_df = pd.concat([mymerged_df, data], ignore_index=True)

# Writing the merged data frame to a new Excel file
myoutput_path = "yourpath/to/myoutput_file.xlsx"

Output

The final output after merging three .xlsx files:

Product     Quantity    Price
Apple       10          0.5
Banana      5           0.25
Orange      8           0.4
Grapes      12          0.3
Mango       15          0.6
Pineapple   7           0.8

Example 3: Using pd.merge()

In the below example, we first import the required libraries and identified the files to merge, just like the other two examples. Now, we have read the two Excel files using the pd.read_excel() function e the pd.merge() function to merge the two data frames based on a common column that we specify with the on parameter.

import pandas as pd

# read in the first Excel file
mydf1 = pd.read_excel('myfile1.xlsx')

# read in the second Excel file
mydf2 = pd.read_excel('myfile2.xlsx')

# merge the two data frames based on a common column
mymerged_df = pd.merge(mydf1, mydf2, on='common_column')

# write the merged data frame to a new Excel file
mymerged_df.to_excel('mymerged_file.xlsx', index=False)

Output

The final output after merging three .xlsx files:

Product     Quantity    Price
Apple       10          0.5
Banana      5           0.25
Orange      8           0.4
Grapes      12          0.3
Mango       15          0.6
Pineapple   7           0.8

Conclusion

Merging multiple Excel files into a single file using Python can save you time and reduce errors when dealing with large datasets. With the help of Pandas, it is a straightforward process. In this article, we learned how to merge multiple Excel files using Python by following a few simple steps. We provided two examples that demonstrate how to merge multiple Excel files using the concat() function and a loop. Overall, with Python's flexibility and efficiency, merging multiple Excel files has never been easier.

Updated on: 31-Aug-2023

5K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements