How to disallow duplicate labels in a Pandas DataFrame?


By default, Pandas allows duplicate labels in a DataFrame. However, duplicates can cause problems because some Pandas methods, such as reindex(), won't work when they are present. In this article, we will see how to stop Pandas from accepting duplicate labels so that they are caught with an error as soon as they appear, rather than surfacing later in the program.

Example

Take a look at the following code. We have a DataFrame in which two columns share the same name, "Name". Pandas still builds and prints the DataFrame without raising any error or warning.

import pandas as pd

df = pd.DataFrame(
   [
      ['John', 89, 'Maths'],
      ['Jacob', 23, 'Physics'],
      ['Tom', 100, 'Chemistry']],
   columns=['Name', 'Name', 'Subjects'])

print("Input DataFrame is:\n", df)

Output

It will produce the following output −

Input DataFrame is:
    Name  Name   Subjects
0   John    89      Maths
1  Jacob    23    Physics
2    Tom   100  Chemistry
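The duplicate goes unnoticed when printing, but it changes how selection behaves. The following is a minimal sketch of how one might check for duplicates and see the effect. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
   [
      ['John', 89, 'Maths'],
      ['Jacob', 23, 'Physics'],
      ['Tom', 100, 'Chemistry']],
   columns=['Name', 'Name', 'Subjects'])

# is_unique is False when at least one column label repeats
has_unique_columns = df.columns.is_unique

# duplicated() marks every repeat of a label after its first occurrence
duplicate_mask = df.columns.duplicated()

# Selecting a duplicated label returns a DataFrame with both columns,
# while selecting a unique label returns a Series
name_selection = df['Name']
subject_selection = df['Subjects']

print("Columns are unique:", has_unique_columns)
print("Duplicate mask:", list(duplicate_mask))
print("Type of df['Name']:", type(name_selection).__name__)
print("Type of df['Subjects']:", type(subject_selection).__name__)
```

Code that expects df['Name'] to be a single column can therefore break in ways that are hard to trace back to the duplicate label.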

Now, let's see how to prevent Pandas from allowing duplicate labels.

We can use .set_flags(allows_duplicate_labels=False), available since Pandas 1.2. It checks both the row index and the column labels of the DataFrame and raises a DuplicateLabelError if any of them are duplicated.

Now, run the same code with allows_duplicate_labels set to False. This time the call to set_flags() fails before the print statement is reached −

import pandas as pd

df = pd.DataFrame(
   [
      ['John', 89, 'Maths'],
      ['Jacob', 23, 'Physics'],
      ['Tom', 100, 'Chemistry']],
   columns=['Name', 'Name', 'Subjects']
).set_flags(allows_duplicate_labels=False)

print("Input DataFrame is:\n", df)

Now, it will catch the duplicate labels and produce the following error −

pandas.errors.DuplicateLabelError: Index has duplicates.
      positions
label
Name     [0, 1]
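If the program should handle the problem instead of stopping, the error can be caught like any other exception. The sketch below assumes the second column was meant to be called "Marks". That name does not come from the original data and is used only for illustration.

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

rows = [
   ['John', 89, 'Maths'],
   ['Jacob', 23, 'Physics'],
   ['Tom', 100, 'Chemistry']]

# Attempt 1: duplicate column labels are rejected by set_flags()
error_caught = False
try:
   pd.DataFrame(
      rows, columns=['Name', 'Name', 'Subjects']
   ).set_flags(allows_duplicate_labels=False)
except DuplicateLabelError as exc:
   error_caught = True
   print("Duplicate labels found:", exc)

# Attempt 2: with unique labels ('Marks' is an assumed name) the call succeeds
df = pd.DataFrame(
   rows, columns=['Name', 'Marks', 'Subjects']
).set_flags(allows_duplicate_labels=False)

# The setting is stored on the DataFrame's flags attribute
flag_value = df.flags.allows_duplicate_labels
print("allows_duplicate_labels:", flag_value)
```

Note that set_flags() returns a new DataFrame with the flag applied, so its result has to be assigned or chained as shown here.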

Updated on: 05-Sep-2023