- Trending Categories
- Data Structure
- Operating System
- MS Excel
- C Programming
- Social Studies
- Fashion Studies
- Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to load a TSV file into a Pandas Dataframe?
Sometimes, the task is to analyze a dataset and use the data from a TSV (Tab Separated Values) file. For this, the TSV file is sometimes converted to a dataframe. A dataframe is a labeled two-dimensional structure that has different types of columns. In this article, using two different examples, this Python library called pandas is used with Python code to read a TSV file and load it into a dataframe. For these examples, a Zomato dataset given on Kaggle is used. The Kaggle dataset was available in CSV (Comma Separated Values) format so it was first downloaded and then it was converted into TSV format using an online software link. In the first example, the Python installed in the computer system is used to run a Python program that is written for reading the TSV file into a dataframe by using a pandas function. In another example, using the Google Colab the method is shown where without having Python installed in the computer, Python and pandas can still be used and a TVS file is read into a dataframe after using another function.
The Zomato.TSV file used
Fig; This tsv file contains 9551 rows and 21 columns.
Example 1: Load a TSV file into a Pandas DataFrame - using read_table function with delimiter='\t'
Design Steps and Coding
Step 1 − First import pandas. Pandas is an open-source, easy-to-use, and flexible library that is commonly used for data analysis and manipulation while using datasets in Python.
Step 2 − Now read the zomato.tsv file as the dataset given here will be used for loading it into the dataframe.
Step 3 − Make a dataframe dff1 and use the read_table function from pandas for reading the TSV file.
Step 4 − Use the delimiter='\t' and the path to the zomato.tsv . Print some rows and columns from this dataframe by using the head function.
Step 5 − Make a dataframe dff2 and use the read_table function again but select the index column this time.
Step 6 − Make a dataframe dff3 and use the read_table function again but print the rows after skipping some rows.
Saving the data file / CSV file required for data analysis
For these examples, we will use the data available on Kaggle. Login to Kaggle and download the CSV file from this link: https://www.kaggle.com/datasets/shrutimehta/zomato-restaurants-data
The dataset is available as a CSV file.
Converting the CSV file to TSV file as TSV file is required for the examples
Use the following online converter to convert the CSV file into TSV format.https://products.groupdocs.app/conversion/csv-to-tsv
Upload the CSV file, convert, and download the TSV file. Now use this zomato.tsv file for the following examples.
Load a TSV file into a Pandas DataFrame - using read_table function with delimiter='\t.
Write the following code in the Python file
import pandas as pdd dff1 = pdd.read_table("C:/Users/saba2/Desktop/article/articles_py/tsv/zomato.tsv",delimiter='\t') print(dff1.head()) dff2 = pdd.read_table('C:/Users/saba2/Desktop/article/articles_py/tsv/zomato.tsv', delimiter='\t',index_col=1) print(dff2.head()) dff3 = pdd.read_table('C:/Users/saba2/Desktop/article/articles_py/tsv/zomato.tsv', delimiter='\t',skiprows=range(3,6)) print(dff3.head())
Run the python file in the Command window
Fig 1: Showing the results using cmd window.
Example 2: Load a TSV file into a Pandas DataFrame - using read_csv function with sep='\t'
Design Steps and Coding
Step 1 − Login using your Google account. Go to Google Colab. Open a new Colab Notebook and write the python code in it.
Step 2 − Upload the zomato.tsv file that was converted to tsv from a csv file downloaded from Kaggle. The dataset given here will be used for loading it into a dataframe.
Step 3 − Now import pandas. Pandas is an open source, easy to use and flexible library that is commonly used for data analysis and manipulation while using datasets in Python.
Step 4 − Make a dataframe dff and use the read_csv function from pandas for reading the TSV file.
Step 5 − Use the sep='\t' and the name of the file 'zomato.tsv'. Print some rows and columns from this dataframe by using the head function.
Step 6 − Print the shape of the dataframe. It will show how many rows and columns are available in the dataset.
Step 7 − And then state a few column names to depict and don't print all the columns this time.
Step 8 − Execute the program after clicking on the playbutton presented on the given code cell. Check the result as it will be displayed in the colab notebook.
Uploading the data, tsv file
#Uploading the tsv from google.colab import files data_to_load = files.upload()
Code for Reading the TSV file and loading the specified columns in the dataframe
#import the required Library import pandas as pdd #Select all columns #dff = pdd.read_csv("zomato.tsv",sep="\t") #Select specified columns dff = pdd.read_csv("zomato.tsv",sep="\t", usecols = ['Restaurant ID','Restaurant Name','City']) #print the dataframe header and some rows dff.head()
Restaurant ID Restaurant Name City 0 6317637 Le Petit Souffle Makati City 1 6304287 Izakaya Kikufuji Makati City 2 6300002 Heat - Edsa Shangri-La Mandaluyong City 3 6318506 Ooma Mandaluyong City 4 6314302 Sambo Kojin Mandaluyong City
In this python article, two distinct examples are illustrated to show how to load a TSV into a dataframe, are given. First, the method is given where the CSV format dataset from Kaggle is downloaded, then it is converted into a TSV format file and saved. This TSV file would be loaded into a dataframe using the pandas read_table function in the program. In the second example, Google Colab is used for writing the Python program, and the same TSV format dataset is used with Pandas read_csv function to load the data into the dataframe.
Kickstart Your Career
Get certified by completing the courseGet Started