How to merge many TSV files by common key using Python Pandas?

If you work with data, you've probably had to deal with the challenge of merging multiple files into one cohesive dataset. This task can be particularly difficult if you're working with tab-separated values (TSV) files. Fortunately, the Python Pandas library provides a straightforward solution for merging TSV files by a common key.

In this article, we'll learn how to merge multiple TSV files using Python Pandas. We'll explore different merging techniques including merge() for joining by common keys and concat() for combining files with identical structures.

What are TSV Files?

TSV files are a type of delimited text file that uses tabs as the delimiter between values. This format is similar to comma-separated values (CSV) files, which use commas as the delimiter. TSV files are often used to represent data that contains commas, such as addresses or names that include a comma. By using tabs instead of commas, TSV files can avoid the confusion that might arise from the presence of commas in the data.

Pandas provides a wide range of functions for manipulating and analyzing data, including functions for reading and writing TSV files using read_csv() with sep='\t' parameter.

Method 1: Using merge() for Common Key Joins

When your TSV files have related data that can be joined by a common column, use the merge() function. Let's create sample data and demonstrate merging by common keys ?

import pandas as pd

# Create sample customers data
customers_data = {
    'customer_id': [1, 2, 3, 4],
    'name': ['John', 'Emily', 'David', 'Sophie'],
    'email': ['john@email.com', 'emily@email.com', 'david@email.com', 'sophie@email.com']
}
customers = pd.DataFrame(customers_data)

# Create sample orders data  
orders_data = {
    'customer_id': [1, 2, 3, 1],
    'order_id': [101, 102, 103, 104],
    'amount': [250.50, 180.75, 320.00, 125.25]
}
orders = pd.DataFrame(orders_data)

# Merge DataFrames by common key
merged_data = pd.merge(customers, orders, on='customer_id')
print(merged_data)
   customer_id    name           email  order_id  amount
0            1    John    john@email.com       101  250.50
1            1    John    john@email.com       104  125.25
2            2   Emily   emily@email.com       102  180.75
3            3   David   david@email.com       103  320.00

Multiple File Merge

For multiple files, merge them step by step ?

import pandas as pd

# Create sample product data
products_data = {
    'order_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Phone', 'Tablet', 'Headphones'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Accessories']
}
products = pd.DataFrame(products_data)

# Merge all three DataFrames
step1 = pd.merge(customers, orders, on='customer_id')
final_merged = pd.merge(step1, products, on='order_id')
print(final_merged)
   customer_id    name           email  order_id  amount product_name      category
0            1    John    john@email.com       101  250.50       Laptop  Electronics
1            1    John    john@email.com       104  125.25   Headphones  Accessories
2            2   Emily   emily@email.com       102  180.75        Phone  Electronics
3            3   David   david@email.com       103  320.00       Tablet  Electronics

Method 2: Using concat() for Identical Structures

When your TSV files have identical column structures and you want to combine them vertically, use concat() ?

import pandas as pd

# Create sample data for three regions
region1_data = {
    'id': [1, 2],
    'name': ['John', 'Emily'],
    'region': ['North', 'North']
}
region1 = pd.DataFrame(region1_data)

region2_data = {
    'id': [3, 4], 
    'name': ['David', 'Sophie'],
    'region': ['South', 'South']
}
region2 = pd.DataFrame(region2_data)

region3_data = {
    'id': [5, 6],
    'name': ['Michael', 'Emma'], 
    'region': ['East', 'East']
}
region3 = pd.DataFrame(region3_data)

# Combine DataFrames using concat()
combined_data = pd.concat([region1, region2, region3], ignore_index=True)
print(combined_data)
   id     name region
0   1     John  North
1   2    Emily  North
2   3    David  South
3   4   Sophie  South
4   5  Michael   East
5   6     Emma   East

Reading and Saving TSV Files

Here's how to work with actual TSV files in practice ?

import pandas as pd

# Reading TSV files
df1 = pd.read_csv('data1.tsv', sep='\t')
df2 = pd.read_csv('data2.tsv', sep='\t')
df3 = pd.read_csv('data3.tsv', sep='\t')

# Merge by common key
merged = pd.merge(df1, df2, on='common_column')
final = pd.merge(merged, df3, on='another_column')

# Save merged data to TSV file
final.to_csv('merged_output.tsv', sep='\t', index=False)

Comparison of Methods

Method Use Case Requirements Output
merge() Join related data Common key column Horizontally combined data
concat() Combine similar data Same column structure Vertically stacked data

Conclusion

Use merge() when combining TSV files with related data through common keys, and concat() when stacking files with identical structures. Both methods provide efficient ways to consolidate multiple TSV files into a single dataset using Pandas.

Updated on: 2026-03-27T14:06:53+05:30

846 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements