How to load a TSV file into a Pandas Dataframe?


Sometimes, the task is to analyze a dataset and use the data from a TSV (Tab Separated Values) file. For this, the TSV file is sometimes converted to a dataframe. A dataframe is a labeled two-dimensional structure that has different types of columns. In this article, using two different examples, this Python library called pandas is used with Python code to read a TSV file and load it into a dataframe. For these examples, a Zomato dataset given on Kaggle is used. The Kaggle dataset was available in CSV (Comma Separated Values) format so it was first downloaded and then it was converted into TSV format using an online software link. In the first example, the Python installed in the computer system is used to run a Python program that is written for reading the TSV file into a dataframe by using a pandas function. In another example, using the Google Colab the method is shown where without having Python installed in the computer, Python and pandas can still be used and a TVS file is read into a dataframe after using another function.

The Zomato.TSV file used

Fig; This tsv file contains 9551 rows and 21 columns.

Example 1: Load a TSV file into a Pandas DataFrame - using read_table function with delimiter='\t'

Design Steps and Coding

  • Step 1 − First import pandas. Pandas is an open-source, easy-to-use, and flexible library that is commonly used for data analysis and manipulation while using datasets in Python.

  • Step 2 − Now read the zomato.tsv file as the dataset given here will be used for loading it into the dataframe.

  • Step 3 − Make a dataframe dff1 and use the read_table function from pandas for reading the TSV file.

  • Step 4 − Use the delimiter='\t' and the path to the zomato.tsv . Print some rows and columns from this dataframe by using the head function.

  • Step 5 − Make a dataframe dff2 and use the read_table function again but select the index column this time.

  • Step 6 − Make a dataframe dff3 and use the read_table function again but print the rows after skipping some rows.

Saving the data file / CSV file required for data analysis

For these examples, we will use the data available on Kaggle. Login to Kaggle and download the CSV file from this link: https://www.kaggle.com/datasets/shrutimehta/zomato-restaurants-data

The dataset is available as a CSV file.

Converting the CSV file to TSV file as TSV file is required for the examples

Use the following online converter to convert the CSV file into TSV format.https://products.groupdocs.app/conversion/csv-to-tsv

Upload the CSV file, convert, and download the TSV file. Now use this zomato.tsv file for the following examples.

Load a TSV file into a Pandas DataFrame - using read_table function with delimiter='\t.

Write the following code in the Python file

import pandas as pdd
dff1 = pdd.read_table("C:/Users/saba2/Desktop/article/articles_py/tsv/zomato.tsv",delimiter='\t')
print(dff1.head())
dff2 = pdd.read_table('C:/Users/saba2/Desktop/article/articles_py/tsv/zomato.tsv', delimiter='\t',index_col=1)
print(dff2.head())
dff3 = pdd.read_table('C:/Users/saba2/Desktop/article/articles_py/tsv/zomato.tsv', delimiter='\t',skiprows=range(3,6))
print(dff3.head())

Output

Run the python file in the Command window

Fig 1: Showing the results using cmd window.

Example 2: Load a TSV file into a Pandas DataFrame - using read_csv function with sep='\t'

Design Steps and Coding

  • Step 1 − Login using your Google account. Go to Google Colab. Open a new Colab Notebook and write the python code in it.

  • Step 2 − Upload the zomato.tsv file that was converted to tsv from a csv file downloaded from Kaggle. The dataset given here will be used for loading it into a dataframe.

  • Step 3 − Now import pandas. Pandas is an open source, easy to use and flexible library that is commonly used for data analysis and manipulation while using datasets in Python.

  • Step 4 − Make a dataframe dff and use the read_csv function from pandas for reading the TSV file.

  • Step 5 − Use the sep='\t' and the name of the file 'zomato.tsv'. Print some rows and columns from this dataframe by using the head function.

  • Step 6 − Print the shape of the dataframe. It will show how many rows and columns are available in the dataset.

  • Step 7 − And then state a few column names to depict and don't print all the columns this time.

  • Step 8 − Execute the program after clicking on the playbutton presented on the given code cell. Check the result as it will be displayed in the colab notebook.

Uploading the data, tsv file

#Uploading the tsv
from google.colab import files
data_to_load = files.upload()

Code for Reading the TSV file and loading the specified columns in the dataframe

#import the required Library
import pandas as pdd
#Select all columns 
#dff = pdd.read_csv("zomato.tsv",sep="\t")
#Select specified columns 
dff = pdd.read_csv("zomato.tsv",sep="\t", usecols = ['Restaurant ID','Restaurant Name','City'])
#print the dataframe header and some rows
dff.head()

Output

   Restaurant ID          Restaurant Name                City
0        6317637         Le Petit Souffle         Makati City 
1        6304287         Izakaya Kikufuji         Makati City
2        6300002   Heat - Edsa Shangri-La    Mandaluyong City
3        6318506                     Ooma    Mandaluyong City
4        6314302              Sambo Kojin    Mandaluyong City

Conclusion

In this python article, two distinct examples are illustrated to show how to load a TSV into a dataframe, are given. First, the method is given where the CSV format dataset from Kaggle is downloaded, then it is converted into a TSV format file and saved. This TSV file would be loaded into a dataframe using the pandas read_table function in the program. In the second example, Google Colab is used for writing the Python program, and the same TSV format dataset is used with Pandas read_csv function to load the data into the dataframe.

Updated on: 11-May-2023

3K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements