Hierarchical Data in Pandas


Hierarchical data represents multiple levels of nested groups or categories. For example, a company may have a hierarchy of employees, departments, and locations, and a product may have a hierarchy of categories and subcategories. One of the challenges of working with hierarchical data is representing it in a tabular format that is easy to manipulate and analyze. In this article, we will represent hierarchical data using Pandas' built-in methods 'set_index()' and 'groupby()'.

Python Program to Represent Hierarchical Data using Pandas

First, let's briefly discuss Pandas and its built-in methods mentioned in the previous section:

Pandas

Pandas is an open-source Python library used mainly for data analysis and manipulation. It works with relational and labeled data and supports operations such as cleaning, filtering, grouping, aggregating and merging, which makes it well suited to representing hierarchical data.

To work with Pandas, we need to import it into our code using the following command:

import pandas as pd

Here, 'pd' is an alias that lets us refer to the library by a shorter name.

set_index()

It sets the index of a given dataframe using one or more existing columns. When more than one column is passed, the result is a MultiIndex, which is what we will use in our program to represent hierarchical data. The method is called on a dataframe object.

Syntax

nameOfDataframe.set_index(nameOfKeys, inplace = True)

Parameters

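nameOfKeys specifies the column name, or a list of column names, to use as the index.

inplace specifies whether to modify the original dataframe. Its default value is False, in which case set_index() returns a new dataframe. When it is set to True, the original dataframe is modified directly and the method returns None.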
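
The effect of the 'inplace' parameter can be seen in a small sketch. The sample data and the variable names 'indexed' and 'result' are assumptions made only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Category': ['Fruit', 'Vegetable'],
    'Item': ['Apple', 'Carrot'],
    'Price': [1.0, 0.5],
})

# Without inplace, set_index() returns a new dataframe and leaves df unchanged
indexed = df.set_index(['Category', 'Item'])
print(indexed.index.names)   # names of the two index levels
print(list(df.columns))      # df still has all three columns

# With inplace=True, df itself is modified and the call returns None
result = df.set_index('Category', inplace=True)
print(result)
print(df.index.name)
```

Assigning the returned dataframe to a new name, as in the first call, keeps the original data available for later steps.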

groupby()

This method splits the dataframe into groups according to the specified criteria. It provides a way to handle hierarchical data by dividing it into distinct groups based on the values of one or more columns. Like set_index(), it is called on a dataframe object.

Syntax

nameOfDataframe.groupby(nameOfColumn) 
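
As a minimal sketch of the syntax above, the snippet below groups a small dataframe by one column and then asks each group for its row count and a column total. The sample data and the names 'sizes' and 'totals' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Category': ['Fruit', 'Fruit', 'Vegetable'],
    'Quantity': [10, 15, 8],
})

# groupby() returns a GroupBy object; nothing is computed yet
grouped = df.groupby('Category')
print(type(grouped).__name__)   # DataFrameGroupBy

# Results appear once a per-group operation is requested
sizes = grouped.size()               # number of rows in each group
totals = grouped['Quantity'].sum()   # total quantity in each group
print(sizes)
print(totals)
```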

Example 1

The following example demonstrates how to create a hierarchical DataFrame using a MultiIndex in Pandas.

Approach

  • First, import the pandas library.

  • Then, create a dictionary called 'data' that contains four keys: 'Category', 'Item', 'Price' and 'Quantity'. Each key has a list as its corresponding value.

  • Create a DataFrame 'df' from the 'data' dictionary, where each key becomes a column and the items of its list become the rows.

  • Now, set the columns 'Category' and 'Item' as the index of the DataFrame to create a hierarchical index. Also, set 'inplace' to True, which means the changes are made directly to the 'df' object.

  • Finally, print the DataFrame to display the hierarchical data and exit.

import pandas as pd
# Creating a user-defined hierarchical DataFrame
data = {
   'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
   'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
   'Price': [1.0, 0.8, 0.5, 0.7],
   'Quantity': [10, 15, 8, 12]
}
df = pd.DataFrame(data)
# use 'Category' and 'Item' as a two-level (hierarchical) index
df.set_index(['Category', 'Item'], inplace = True)
# to show the hierarchical data
print(df)

Output

                    Price  Quantity
Category  Item                     
Fruit     Apple       1.0        10
          Orange      0.8        15
Vegetable Carrot      0.5         8
          Broccoli    0.7        12
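
Once the MultiIndex is in place, rows can be selected by level. The following is a sketch that rebuilds the dataframe from this example. The variable names 'fruit', 'carrot_price' and 'orange' are assumptions for illustration.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12],
}).set_index(['Category', 'Item'])

# All rows under the outer label 'Fruit' (the result is indexed by 'Item')
fruit = df.loc['Fruit']
print(fruit)

# A single value, addressed by the full (Category, Item) key and a column
carrot_price = df.loc[('Vegetable', 'Carrot'), 'Price']
print(carrot_price)

# Cross-section on the inner level: rows whose 'Item' is 'Orange'
orange = df.xs('Orange', level='Item')
print(orange)
```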

Example 2

The following example demonstrates the 'groupby()' method, which groups data based on a specific column. We reuse the data from the previous example with slight changes to the code. Here, we group the rows by the unique values in the 'Category' column, which forms a separate group for each category.

import pandas as pd
# Creating a user-defined hierarchical DataFrame
data = {
   'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
   'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
   'Price': [1.0, 0.8, 0.5, 0.7],
   'Quantity': [10, 15, 8, 12]
}
df = pd.DataFrame(data)
# split the rows into groups based on 'Category'
grouped = df.groupby('Category')
# to display each group in turn
for name, group in grouped:
   print(f"Category: {name}") # to represent name of the category 
   print(group) # to print each group
   print()

Output

Category: Fruit
  Category    Item  Price  Quantity
0    Fruit   Apple    1.0        10
1    Fruit  Orange    0.8        15

Category: Vegetable
    Category      Item  Price  Quantity
2  Vegetable    Carrot    0.5         8
3  Vegetable  Broccoli    0.7        12
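
When only one group is needed, looping over every group is not required. The sketch below uses 'get_group()' on the same data. The variable name 'fruit' is an assumption for illustration.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12],
})

grouped = df.groupby('Category')

# Number of groups and their labels
print(grouped.ngroups)
print(list(grouped.groups))

# Fetch the rows of a single group by its label
fruit = grouped.get_group('Fruit')
print(fruit)
```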

Example 3

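In this example, we modify the code of the second example again. We use the groupby() method to group the data by both 'Category' and 'Item' and then apply aggregation functions to the grouped data. The agg() function takes a dictionary as an argument, where the keys are the columns we want to aggregate and the values are the aggregation functions to apply to those columns. The result is stored in a new DataFrame called 'grouped', whose index is a MultiIndex built from the grouping columns. Because each ('Category', 'Item') pair occurs only once in this data, every sum equals the original value. Also note that groupby() sorts the group keys by default, which is why 'Broccoli' appears before 'Carrot' in the output.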

import pandas as pd
# Creating a user-defined hierarchical DataFrame
data = {
   'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
   'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
   'Price': [1.0, 0.8, 0.5, 0.7],
   'Quantity': [10, 15, 8, 12]
}
df = pd.DataFrame(data)
# group by 'Category' and 'Item', then sum 'Price' and 'Quantity' within each group
grouped = df.groupby(['Category', 'Item']).agg({'Price': 'sum', 'Quantity': 'sum'})
# to show the dataframe as hierarchical data
print(grouped)

Output

                    Price  Quantity
Category  Item                     
Fruit     Apple       1.0        10
          Orange      0.8        15
Vegetable Broccoli    0.7        12
          Carrot      0.5         8
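
Aggregation becomes more informative when a group holds several rows. As a sketch under the same data, the snippet below groups by 'Category' alone and applies a different function to each column. The choice of 'mean' for 'Price' and the names 'summary' and 'flat' are assumptions for illustration.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12],
})

# Average price and total quantity per category
summary = df.groupby('Category').agg({'Price': 'mean', 'Quantity': 'sum'})
print(summary)

# reset_index() turns the group labels back into an ordinary column
flat = summary.reset_index()
print(flat)
```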

Conclusion

In this article, we have looked at two built-in Pandas methods, 'set_index()' and 'groupby()'. These methods allow us to easily represent, manipulate, and analyze hierarchical data. The set_index() method builds a MultiIndex to present hierarchical data, while groupby() splits the dataframe into groups that can be displayed or aggregated separately.

Updated on: 25-Jul-2023
