How to add header row to a Pandas Dataframe?
Pandas is a popular data handling and manipulation library for Python that is frequently used in data analysis and data pre-processing. It features a powerful data structure, the Pandas dataframe, which stores two-dimensional, tabular data. In this article we will learn various ways to add a header row (that is, column names) to a Pandas dataframe.
NOTE − The code in this article was tested in a Jupyter notebook, where a bare variable name on the last line of a cell displays its value.
We will see how to add header rows in 5 different ways −
Adding header rows when creating a dataframe with a dictionary
Adding header rows when creating a dataframe with a list of lists
Adding header rows after creating the dataframe
Adding header rows when reading files from a CSV
Adding header rows using set_axis method
Let’s begin by importing Pandas −
import pandas as pd
Method 1: When creating a dataframe with a dictionary
Example
# Add header row while creating the dataframe through a dictionary
data = {
    'course': ['Math', 'English', 'History', 'Science', 'Physics'],
    'instructor': ['John Smith', 'Sarah Johnson', 'Mike Brown', 'Karen Lee', 'David Kim'],
    'batch_size': [43, 25, 19, 51, 48]
}
df1 = pd.DataFrame(data)
df1
Output
    course     instructor  batch_size
0     Math     John Smith          43
1  English  Sarah Johnson          25
2  History     Mike Brown          19
3  Science      Karen Lee          51
4  Physics      David Kim          48
In the code above we initialize dummy data for our dataframe through a dictionary. In each key-value pair, the key is a column name and the value is the list of data for that column. Pandas reads this dictionary and uses its keys as the header row.
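To confirm that the dictionary keys became the header row, we can inspect the dataframe's columns attribute. The snippet below is a small sketch that rebuilds a shortened version of the data above; the names sample and df_check are made up for illustration.

```python
import pandas as pd

# Shortened version of the course data; the keys become the column names
sample = {
    'course': ['Math', 'English'],
    'instructor': ['John Smith', 'Sarah Johnson'],
    'batch_size': [43, 25],
}
df_check = pd.DataFrame(sample)

# The header row is stored in the columns attribute, in key order
header = list(df_check.columns)
print(header)  # ['course', 'instructor', 'batch_size']
```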
Method 2: When creating a dataframe with list of lists
Example
# Add header row while creating the dataframe through lists
data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
columns = ['fruit', 'color', 'quantity']
df2 = pd.DataFrame(data, columns=columns)
df2
Output
    fruit   color  quantity
0   apple     red         5
1  banana  yellow        12
In this method, we have a list of lists where each sub-list holds the values for one row of the dataframe. We make a list of column names and pass it to the pd.DataFrame constructor through the columns argument while initializing the dataframe.
Method 3: After creating the dataframe
Example
# Add header row after creating the dataframe
data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
columns = ['fruit', 'color', 'quantity']
df3 = pd.DataFrame(data)
df3.columns = columns
df3
Output
    fruit   color  quantity
0   apple     red         5
1  banana  yellow        12
In the code above we first initialize a dataframe without a header row, so Pandas labels the columns 0, 1, 2 by default. Then we build a list of the column names we want and assign it to the dataframe's columns attribute to set the header row of the already defined Pandas dataframe.
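One thing to watch for with this approach is that the list must contain exactly one name per column. The sketch below, which uses illustrative variable names, shows the default integer labels Pandas assigns when no header is given and the ValueError raised when the list is too short.

```python
import pandas as pd

rows = [['apple', 'red', 5], ['banana', 'yellow', 12]]
df_demo = pd.DataFrame(rows)

# Without column names, Pandas labels the columns 0, 1, 2
default_header = list(df_demo.columns)

# Assigning too few names fails instead of silently dropping a column
try:
    df_demo.columns = ['fruit', 'color']
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

# A list of the right length replaces the default labels
df_demo.columns = ['fruit', 'color', 'quantity']
print(default_header, list(df_demo.columns))
```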
Method 4: When reading files from a CSV file
Example
When reading a CSV file, Pandas treats the first row as the column names by default. However, a dataset may not contain a header row at all, as in the example below. Let’s assume the dataset is stored as ‘course_data.csv’.
# Incorrect header row
df4 = pd.read_csv('course_data.csv')
df4
Output
      Math     John Smith  43
0  English  Sarah Johnson  25
1  History     Mike Brown  19
2  Science      Karen Lee  51
3  Physics      David Kim  48
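The output shows that Pandas is interpreting the first data row as the header row, which leaves only four rows of data. To tackle this, we specify the column names by passing a list through the ‘names’ argument. When ‘names’ is given and no ‘header’ argument is passed, Pandas treats every line of the file as data.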
Example
# Add header row while reading files from CSV
columns = ['course', 'instructor', 'batch_size']
df4 = pd.read_csv('course_data.csv', names=columns)
df4
Output
    course     instructor  batch_size
0     Math     John Smith          43
1  English  Sarah Johnson          25
2  History     Mike Brown          19
3  Science      Karen Lee          51
4  Physics      David Kim          48
As shown in the output above, Pandas is no longer reading the first data sample as a header row!
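The same behaviour can be reproduced without a file on disk. The sketch below assumes the CSV content is comma-separated with no header line (the raw file is not shown in this article) and feeds it to read_csv through io.StringIO; the names csv_text, df_wrong and df_right are illustrative.

```python
import io
import pandas as pd

# Assumed raw contents of course_data.csv: five data rows, no header line
csv_text = (
    "Math,John Smith,43\n"
    "English,Sarah Johnson,25\n"
    "History,Mike Brown,19\n"
    "Science,Karen Lee,51\n"
    "Physics,David Kim,48\n"
)
columns = ['course', 'instructor', 'batch_size']

# Default behaviour: the first data row is consumed as the header
df_wrong = pd.read_csv(io.StringIO(csv_text))

# Passing names tells Pandas that every line in the file is data
df_right = pd.read_csv(io.StringIO(csv_text), names=columns)

print(len(df_wrong), len(df_right))  # 4 5
```

If a file does contain a header row that we want to replace, passing header=0 together with names discards the original header line instead of reading it as data.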
Method 5: Using set_axis method
Example
We already saw how to add a header row to an existing dataframe in Method 3. Now we will achieve the same using the pd.DataFrame.set_axis method.
# Add header row after creating the dataframe using set_axis
data = [['dog', 'brown', 4], ['cat', 'white', 4], ['chicken', 'white', 2]]
df5 = pd.DataFrame(data)
columns = ['animal', 'color', 'num_legs']
# set_axis returns a new dataframe, so assign the result back
df5 = df5.set_axis(columns, axis=1)
df5
Output
    animal  color  num_legs
0      dog  brown         4
1      cat  white         4
2  chicken  white         2
Here we first initialize a dataframe without a header row using the data above. Then we use the set_axis method to add the column names, passing axis=1 to specify that we are labelling the columns. set_axis returns a new dataframe, so we assign the result back to df5. Older Pandas versions accepted an ‘inplace=True’ flag here, but that argument was deprecated in Pandas 1.5 and removed in Pandas 2.0.
NOTE − Setting axis=0 would set the row labels (the index) instead of the column names, and raises a ValueError unless the list contains exactly one label per row.
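The difference between the two axes can be checked with a short experiment. The sketch below assumes a recent Pandas release, in which set_axis returns a new dataframe; the row labels 'a', 'b', 'c' and the variable names are invented for illustration.

```python
import pandas as pd

animals = pd.DataFrame([['dog', 'brown', 4], ['cat', 'white', 4], ['chicken', 'white', 2]])
names = ['animal', 'color', 'num_legs']

# axis=1 labels the columns; the original dataframe keeps its default labels
labelled = animals.set_axis(names, axis=1)

# axis=0 labels the rows, so it needs one label per row (three here)
relabelled_rows = labelled.set_axis(['a', 'b', 'c'], axis=0)

# A list of the wrong length is rejected with a ValueError
try:
    labelled.set_axis(['a', 'b'], axis=0)
    axis0_error = None
except ValueError as err:
    axis0_error = err

print(list(labelled.columns), list(relabelled_rows.index))
```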
Conclusion
This article showed how to add a header row to a Pandas dataframe. We saw 5 different ways to do so, covering dataframes built from dictionaries and lists, dataframes that already exist, and CSV files that lack a header row.