How to load a TSV file into a Pandas Dataframe?
A TSV (Tab Separated Values) file is a text format where data columns are separated by tabs. Pandas provides two main methods to load TSV files into DataFrames: read_table() with delimiter='\t' and read_csv() with sep='\t'.
Method 1: Using read_table() with delimiter='\t'
The read_table() function reads general delimited text files and uses a tab as its default separator, so passing delimiter='\t' is optional but makes the intent explicit:
import pandas as pd
# Create sample TSV data for demonstration (\t is a tab character)
tsv_data = """Name\tAge\tCity\tSalary
John\t25\tNew York\t50000
Alice\t30\tLondon\t60000
Bob\t28\tParis\t55000
Carol\t32\tTokyo\t65000"""
# Save the sample data to a TSV file
with open('sample.tsv', 'w') as f:
    f.write(tsv_data)
# Load TSV file using read_table
df1 = pd.read_table('sample.tsv', delimiter='\t')
print(df1)
Name Age City Salary
0 John 25 New York 50000
1 Alice 30 London 60000
2 Bob 28 Paris 55000
3 Carol 32 Tokyo 65000
Using Additional Parameters
You can customize the loading with additional parameters:
# Load with specific index column
df2 = pd.read_table('sample.tsv', delimiter='\t', index_col=0)
print("With index column:")
print(df2)
print("\nWith selected columns:")
# Load only specific columns
df3 = pd.read_table('sample.tsv', delimiter='\t', usecols=['Name', 'Salary'])
print(df3)
With index column:
Age City Salary
Name
John 25 New York 50000
Alice 30 London 60000
Bob 28 Paris 55000
Carol 32 Tokyo 65000
With selected columns:
Name Salary
0 John 50000
1 Alice 60000
2 Bob 55000
3 Carol 65000
Method 2: Using read_csv() with sep='\t'
The read_csv() function can also handle TSV files by specifying the separator:
# Load TSV file using read_csv
df4 = pd.read_csv('sample.tsv', sep='\t')
print(df4)
# Get dataframe shape
print(f"\nDataFrame shape: {df4.shape}")
# Show column names
print(f"Columns: {list(df4.columns)}")
Name Age City Salary
0 John 25 New York 50000
1 Alice 30 London 60000
2 Bob 28 Paris 55000
3 Carol 32 Tokyo 65000
DataFrame shape: (4, 4)
Columns: ['Name', 'Age', 'City', 'Salary']
Handling Large TSV Files
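For large files, you can skip rows or limit the number of rows read:
# Skip the first line of the file (useful when a file starts with a metadata line).
# The sample file has no metadata, so its header is skipped and John's row becomes the header.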
df5 = pd.read_table('sample.tsv', delimiter='\t', skiprows=1)
print("After skipping first row:")
print(df5)
# Load only first 2 rows
df6 = pd.read_table('sample.tsv', delimiter='\t', nrows=2)
print("\nFirst 2 rows only:")
print(df6)
After skipping first row:
John 25 New York 50000
0 Alice 30 London 60000
1 Bob 28 Paris 55000
2 Carol 32 Tokyo 65000
First 2 rows only:
Name Age City Salary
0 John 25 New York 50000
1 Alice 30 London 60000
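Another option for large files is to read the data in chunks with the chunksize parameter, which makes the reader return an iterator of smaller DataFrames instead of one large one. The following is a minimal sketch under a few assumptions: the temporary file location, the chunk size of 2, and the variable names are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Build a small tab-separated sample; the "\t" escapes guarantee real tab characters
rows = [
    "Name\tAge\tCity\tSalary",
    "John\t25\tNew York\t50000",
    "Alice\t30\tLondon\t60000",
    "Bob\t28\tParis\t55000",
    "Carol\t32\tTokyo\t65000",
]
path = os.path.join(tempfile.mkdtemp(), "sample.tsv")  # hypothetical location
with open(path, "w") as f:
    f.write("\n".join(rows) + "\n")

# With chunksize=2, read_csv yields DataFrames of at most 2 rows each,
# so only one chunk needs to be held in memory at a time
chunk_shapes = []
total_salary = 0
for chunk in pd.read_csv(path, sep="\t", chunksize=2):
    chunk_shapes.append(chunk.shape)
    total_salary += int(chunk["Salary"].sum())

print(chunk_shapes)
print(total_salary)
```

Aggregating per chunk, as done here for the salary total, is one common pattern when the whole file would not fit comfortably in memory.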
Comparison
| Method | Function | Best For | Default Delimiter |
|---|---|---|---|
| Method 1 | read_table() | Tab-delimited files | Tab (\t) |
| Method 2 | read_csv() | Any delimited files | Comma (,) |
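The difference in default delimiters can be checked directly. The sketch below reads the same tab-separated text three ways; the in-memory text (read through io.StringIO rather than from a file) and the variable names are assumptions made for illustration.

```python
import io

import pandas as pd

# Tab-separated text held in memory instead of a file on disk
tsv_text = "Name\tAge\tCity\nJohn\t25\tNew York\nAlice\t30\tLondon\n"

# read_table splits on tabs by default
by_table = pd.read_table(io.StringIO(tsv_text))

# read_csv splits on commas by default, so each line ends up in a single column
by_csv_default = pd.read_csv(io.StringIO(tsv_text))

# read_csv with sep='\t' matches read_table
by_csv_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(by_table.shape)                # (2, 3)
print(by_csv_default.shape)          # (2, 1)
print(by_table.equals(by_csv_tab))   # True
```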
Common Parameters
| Parameter | Description |
|---|---|
| delimiter or sep | Specify the field separator |
| index_col | Set a column as the DataFrame index |
| usecols | Load only specific columns |
| skiprows | Skip the specified number of rows |
| nrows | Load only the first n rows |
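These parameters can be combined in a single call. The following sketch is illustrative only: the in-memory text, its leading metadata line, and the variable names are assumptions rather than part of the sample file used earlier.

```python
import io

import pandas as pd

# Hypothetical TSV text whose first line is metadata rather than a header
tsv_text = (
    "Exported for demonstration purposes\n"
    "Name\tAge\tCity\tSalary\n"
    "John\t25\tNew York\t50000\n"
    "Alice\t30\tLondon\t60000\n"
    "Bob\t28\tParis\t55000\n"
    "Carol\t32\tTokyo\t65000\n"
)

df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",                    # fields are separated by tabs
    skiprows=1,                  # skip the metadata line before the header
    usecols=["Name", "Salary"],  # keep only these two columns
    index_col="Name",            # use Name as the index
    nrows=3,                     # read only the first 3 data rows
)

print(df)
```

Here skiprows removes a genuine metadata line, so the header row is preserved and the remaining parameters apply to the actual data.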
Conclusion
Both read_table() and read_csv() can load TSV files effectively. Use read_table() when the file is tab-delimited, since a tab is its default separator, or use read_csv() with sep='\t' if you prefer a single function for all delimited formats.
