How to add a header row to a Pandas Dataframe?


Pandas is a popular Python library for data handling and manipulation, frequently used in data analysis and data pre-processing. Its central data structure, the dataframe, stores two-dimensional tabular data. In this article we will learn about various ways to add a header row (that is, column names) to a Pandas dataframe.

NOTE − The code in this article was tested in a Jupyter notebook.

We will see how to add header rows in 5 different ways −

  • Adding header rows when creating a dataframe with a dictionary

  • Adding header rows when creating a dataframe with a list of lists

  • Adding header rows after creating the dataframe

  • Adding header rows when reading files from a CSV

  • Adding header rows using the set_axis method

Let’s begin by importing Pandas.

import pandas as pd

Method 1: When creating a dataframe with a dictionary

Example

# Add header row while creating the dataframe through a dictionary
data = {
    'course': ['Math', 'English', 'History', 'Science', 'Physics'],
    'instructor': ['John Smith', 'Sarah Johnson', 'Mike Brown', 'Karen Lee', 'David Kim'],
    'batch_size': [43, 25, 19, 51, 48],
}
df1 = pd.DataFrame(data)
df1

Output

    course     instructor  batch_size
0     Math     John Smith          43
1  English  Sarah Johnson          25
2  History     Mike Brown          19
3  Science      Karen Lee          51
4  Physics      David Kim          48

In the code above we initialize dummy data for our dataframe through a dictionary. Each key-value pair holds a column name and that column's data, respectively. Pandas reads this dictionary and uses its keys as the header row.
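One way to confirm that the dictionary keys became the header is to inspect the columns attribute. The following is a minimal sketch using a shortened copy of the data above; the variable names df and header are chosen only for illustration.

```python
import pandas as pd

# A shortened copy of the course data, for illustration only
data = {
    'course': ['Math', 'English'],
    'instructor': ['John Smith', 'Sarah Johnson'],
    'batch_size': [43, 25],
}
df = pd.DataFrame(data)

# The dictionary keys become the column labels, in insertion order
header = list(df.columns)
print(header)
```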

Method 2: When creating a dataframe with a list of lists

Example

# Add header row while creating the dataframe through lists
data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
columns = ['fruit', 'color', 'quantity']
df2 = pd.DataFrame(data, columns=columns)
df2

Output

    fruit   color  quantity
0   apple     red         5
1  banana  yellow        12

In this method, we have a list of lists where each sub-list holds the values of one row of the dataframe. We make a list of column names and pass it to the pd.DataFrame constructor through the columns argument while initializing the dataframe.
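To see what the columns argument changes, it can help to build the same dataframe with and without it. In the sketch below, the names no_header and with_header are illustrative. When columns is omitted, Pandas falls back to integer position labels.

```python
import pandas as pd

rows = [['apple', 'red', 5], ['banana', 'yellow', 12]]

# Without the columns argument, Pandas labels the columns 0, 1, 2
no_header = pd.DataFrame(rows)

# With the columns argument, the given names form the header row
with_header = pd.DataFrame(rows, columns=['fruit', 'color', 'quantity'])

default_labels = list(no_header.columns)
named_labels = list(with_header.columns)
print(default_labels)
print(named_labels)
```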

Method 3: After creating the dataframe

Example

# Add header row after creating the dataframe
data = [['apple', 'red', 5], ['banana', 'yellow', 12]]
columns = ['fruit', 'color', 'quantity']
df3 = pd.DataFrame(data)
df3.columns = columns
df3

Output

    fruit   color  quantity
0   apple     red         5
1  banana  yellow        12

In the code above we first initialize a dataframe without specifying any header row, so Pandas assigns default integer column labels. Then we define a list of the column names we want and assign it to the columns attribute of the existing dataframe to set its header row.
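Assigning to the columns attribute only works when the list has exactly one name per column. The sketch below illustrates this with a deliberately short list; the variable mismatch_error is a hypothetical name used only to capture the exception.

```python
import pandas as pd

df = pd.DataFrame([['apple', 'red', 5], ['banana', 'yellow', 12]])

# Two names for three columns: Pandas rejects the assignment
try:
    df.columns = ['fruit', 'color']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)

# One name per column succeeds
df.columns = ['fruit', 'color', 'quantity']
print(list(df.columns))
```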

Method 4: When reading files from a CSV file

Example

When reading a CSV file, Pandas treats the first row as the column names by default. However, a dataset may not contain a header line at all, as in the example below. Let’s assume the dataset is stored as ‘course_data.csv’.

# Incorrect header row
df4 = pd.read_csv('course_data.csv')
df4

Output

      Math     John Smith  43
0  English  Sarah Johnson  25
1  History     Mike Brown  19
2  Science      Karen Lee  51
3  Physics      David Kim  48

The output shows that Pandas is interpreting the first data row as the header row. To fix this, we specify the column names by passing a list through the ‘names’ argument, which also tells Pandas to treat the first line of the file as data.

Example

# Add header row while reading files from CSV
columns = ['course', 'instructor', 'batch_size']
df4 = pd.read_csv('course_data.csv', names=columns)
df4

Output

    course     instructor  batch_size
0     Math     John Smith          43
1  English  Sarah Johnson          25
2  History     Mike Brown          19
3  Science      Karen Lee          51
4  Physics      David Kim          48

As shown in the output above, Pandas is no longer reading the first data sample as a header row!
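The same behaviour can be reproduced without a file on disk. The sketch below feeds a small in-memory string to read_csv through io.StringIO. The string csv_text is a two-line stand-in for course_data.csv, and the variable names are illustrative.

```python
import io
import pandas as pd

# A two-line stand-in for a header-less CSV file
csv_text = "Math,John Smith,43\nEnglish,Sarah Johnson,25\n"

# Default behaviour: the first line is consumed as the header
wrong = pd.read_csv(io.StringIO(csv_text))

# names= supplies the header, so every line is kept as data
right = pd.read_csv(io.StringIO(csv_text),
                    names=['course', 'instructor', 'batch_size'])

# header=None also keeps every line, with integer column labels
numbered = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(wrong.columns), wrong.shape)
print(list(right.columns), right.shape)
print(list(numbered.columns))
```

If a file already contains a header line that should be replaced rather than kept as data, passing header=0 together with names does that.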

Method 5: Using the set_axis method

Example

We already saw how to add header rows to an existing dataframe in Method 3. Now we will achieve the same using the pd.DataFrame.set_axis method.

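# Add header row after creating the dataframe using set_axis
data = [['dog', 'brown', 4],
        ['cat', 'white', 4],
        ['chicken', 'white', 2]]
df5 = pd.DataFrame(data)
columns = ['animal', 'color', 'num_legs']
# set_axis returns a new dataframe, so assign the result back
# (the inplace argument was removed in pandas 2.0)
df5 = df5.set_axis(columns, axis=1)
df5

Output

    animal  color  num_legs
0      dog  brown         4
1      cat  white         4
2  chicken  white         2

Here we first initialize a dataframe without any header rows using the data above. Then we use the set_axis method to add the header rows. We pass axis=1 to specify that we are setting the column names. Because set_axis returns a new dataframe rather than modifying the original, we assign the result back to df5. Older versions of Pandas accepted an ‘inplace’ flag for this purpose, but it was deprecated in version 1.5 and removed in version 2.0.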

NOTE − Setting axis=0 would set the row labels instead of the column names, and raises an error unless the list contains exactly one label per row.
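The effect of the axis argument can be sketched as follows, assuming Pandas 1.0 or later, where set_axis returns a new dataframe by default. The row labels 'a', 'b', 'c' and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['dog', 4], ['cat', 4], ['chicken', 2]])

# axis=1 relabels the columns (the header row)
df = df.set_axis(['animal', 'num_legs'], axis=1)

# axis=0 relabels the rows and needs one label per row
relabeled = df.set_axis(['a', 'b', 'c'], axis=0)
print(list(relabeled.index))

# Two labels for three rows: Pandas raises a ValueError
try:
    df.set_axis(['animal', 'num_legs'], axis=0)
    axis_error = None
except ValueError as exc:
    axis_error = exc

print(type(axis_error).__name__)
```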

Conclusion

This article showed how to add headers to dataframes in Pandas. We saw five ways to do so, covering dataframes built from dictionaries and lists, existing dataframes, and data read from CSV files.

Updated on: 23-Mar-2023
