Hierarchical Data in Pandas

Hierarchical data has multiple levels of nested groups or categories, such as a company's departments and their employees, or products organized into categories and subcategories. Pandas represents and analyzes this kind of data with MultiIndex, set_index(), and groupby().

Understanding MultiIndex in Pandas

A MultiIndex is an index with two or more levels. Each label is a tuple with one value per level, which lets a DataFrame organize its rows in a tree-like structure.
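As a minimal sketch of that structure, a MultiIndex can also be built directly from tuples with pd.MultiIndex.from_tuples(). The variable names (index, prices) and the sample values are illustrative only and mirror the data used later in this article.

```python
import pandas as pd

# Build a two-level index directly from (category, item) pairs.
index = pd.MultiIndex.from_tuples(
    [
        ("Fruit", "Apple"),
        ("Fruit", "Orange"),
        ("Vegetable", "Carrot"),
        ("Vegetable", "Broccoli"),
    ],
    names=["Category", "Item"],
)

# Attach the index to a Series of prices (illustrative values).
prices = pd.Series([1.0, 0.8, 0.5, 0.7], index=index, name="Price")

print(index.nlevels)                                 # number of levels
print(list(index.names))                             # level names
print(index.get_level_values("Category").tolist())   # outer-level labels
print(prices)
```

Each entry of the index is a tuple, so a single price is looked up with prices.loc[("Fruit", "Apple")].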

Creating Hierarchical Data with set_index()

The set_index() method moves one or more columns into the index; passing a list of columns produces a hierarchical index. Start with a flat DataFrame:

import pandas as pd

# Creating sample hierarchical data
data = {
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
    Category      Item  Price  Quantity
0      Fruit     Apple    1.0        10
1      Fruit    Orange    0.8        15
2  Vegetable    Carrot    0.5         8
3  Vegetable  Broccoli    0.7        12

Now create a hierarchical index from the Category and Item columns:

import pandas as pd

data = {
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12]
}

df = pd.DataFrame(data)
hierarchical_df = df.set_index(['Category', 'Item'])
print(hierarchical_df)

Output:

                    Price  Quantity
Category  Item                    
Fruit     Apple       1.0        10
          Orange      0.8        15
Vegetable Carrot      0.5         8
          Broccoli    0.7        12
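The following sketch inspects the result of set_index() and shows how to undo it. It assumes the same sample data as above; sorted_df and restored are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
    "Item": ["Apple", "Orange", "Carrot", "Broccoli"],
    "Price": [1.0, 0.8, 0.5, 0.7],
    "Quantity": [10, 15, 8, 12],
})
hierarchical_df = df.set_index(["Category", "Item"])

# The two columns are now index levels, not data columns.
print(hierarchical_df.index.names)
print(hierarchical_df.columns.tolist())

# sort_index() orders rows by Category, then by Item.
# A sorted MultiIndex supports label-range slicing.
sorted_df = hierarchical_df.sort_index()
print(sorted_df)

# reset_index() moves the index levels back into ordinary columns.
restored = sorted_df.reset_index()
print(restored)
```

Note that set_index() returns a new DataFrame by default, so df itself still has its original columns.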

Grouping Hierarchical Data

The groupby() method splits a DataFrame into groups that share a key value, which makes it a natural fit for analyzing each branch of a hierarchy:

import pandas as pd

data = {
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12]
}

df = pd.DataFrame(data)
grouped = df.groupby('Category')

for name, group in grouped:
    print(f"Category: {name}")
    print(group)
    print()

Output:

Category: Fruit
   Category    Item  Price  Quantity
0     Fruit   Apple    1.0        10
1     Fruit  Orange    0.8        15

Category: Vegetable
    Category      Item  Price  Quantity
2  Vegetable    Carrot    0.5         8
3  Vegetable  Broccoli    0.7        12
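Looping over groups is not the only option. As a sketch using the same sample data, get_group() retrieves one group by its key, size() counts rows per group, and groupby(level=...) groups by a level of a hierarchical index instead of a column. The names fruit, sizes, and level_totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
    "Item": ["Apple", "Orange", "Carrot", "Broccoli"],
    "Price": [1.0, 0.8, 0.5, 0.7],
    "Quantity": [10, 15, 8, 12],
})

grouped = df.groupby("Category")

# Fetch a single group by its key instead of iterating over all groups.
fruit = grouped.get_group("Fruit")
print(fruit)

# Count the rows in each group.
sizes = grouped.size()
print(sizes)

# When Category is an index level, group by that level directly.
hierarchical_df = df.set_index(["Category", "Item"])
level_totals = hierarchical_df.groupby(level="Category")["Quantity"].sum()
print(level_totals)
```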

Aggregating Hierarchical Data

Combine groupby() with aggregation functions to summarize each group:

import pandas as pd

data = {
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12]
}

df = pd.DataFrame(data)

# Group by Category and calculate statistics
category_stats = df.groupby('Category').agg({
    'Price': ['mean', 'sum'],
    'Quantity': ['mean', 'sum']
})

print("Category Statistics:")
print(category_stats)

Output:

Category Statistics:
          Price      Quantity     
           mean  sum     mean  sum
Category                         
Fruit       0.9  1.8     12.5   25
Vegetable   0.6  1.2     10.0   20
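Passing a dictionary of lists to agg() yields hierarchical column labels, as the output above shows. The sketch below, using the same sample data, selects one of those columns with a tuple, flattens the labels, and uses named aggregation to get flat column names from the start. The output names (avg_price, total_quantity) and the underscore-joined labels are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
    "Item": ["Apple", "Orange", "Carrot", "Broccoli"],
    "Price": [1.0, 0.8, 0.5, 0.7],
    "Quantity": [10, 15, 8, 12],
})

category_stats = df.groupby("Category").agg({
    "Price": ["mean", "sum"],
    "Quantity": ["mean", "sum"],
})

# Columns are a MultiIndex, so one column is addressed by a tuple.
mean_price = category_stats[("Price", "mean")]
print(mean_price)

# Flatten the two column levels into single strings such as "Price_mean".
flat = category_stats.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat.columns.tolist())

# Named aggregation produces flat, descriptive column names directly.
summary = df.groupby("Category").agg(
    avg_price=("Price", "mean"),
    total_quantity=("Quantity", "sum"),
)
print(summary)
```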

Working with MultiIndex Levels

Use .loc with a single label to select everything under an outer-level key, or with a tuple to select one specific row:

import pandas as pd

data = {
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Vegetable'],
    'Item': ['Apple', 'Orange', 'Carrot', 'Broccoli'],
    'Price': [1.0, 0.8, 0.5, 0.7],
    'Quantity': [10, 15, 8, 12]
}

df = pd.DataFrame(data)
hierarchical_df = df.set_index(['Category', 'Item'])

# Access specific category
print("Fruit category data:")
print(hierarchical_df.loc['Fruit'])
print()

# Access specific item
print("Apple data:")
print(hierarchical_df.loc[('Fruit', 'Apple')])

Output:

Fruit category data:
        Price  Quantity
Item                  
Apple     1.0        10
Orange    0.8        15

Apple data:
Price        1.0
Quantity    10.0
Name: (Fruit, Apple), dtype: float64
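The .loc lookups above start from the outer level. As a sketch on the same sample data, xs() selects by an inner level, a tuple plus a column name returns a single value, and pd.IndexSlice selects several inner labels across all categories. The index is sorted first because label-based slicing on a MultiIndex expects a sorted index.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
    "Item": ["Apple", "Orange", "Carrot", "Broccoli"],
    "Price": [1.0, 0.8, 0.5, 0.7],
    "Quantity": [10, 15, 8, 12],
})
hierarchical_df = df.set_index(["Category", "Item"]).sort_index()

# Select by the inner level without naming the outer one.
carrot = hierarchical_df.xs("Carrot", level="Item")
print(carrot)

# A (row tuple, column) pair returns one scalar value.
apple_price = hierarchical_df.loc[("Fruit", "Apple"), "Price"]
print(apple_price)

# IndexSlice: every Category, but only the listed Items.
idx = pd.IndexSlice
subset = hierarchical_df.loc[idx[:, ["Apple", "Carrot"]], :]
print(subset)
```

In the tuple lookup shown earlier, Quantity prints as 10.0 rather than 10 because a single row is returned as one Series, and a Series holds one dtype, so the integer is upcast to float64 alongside Price.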

Comparison of Methods

Method        Purpose                       Output Type                         Best For
-----------   ---------------------------   ---------------------------------   -------------------------
set_index()   Create MultiIndex             DataFrame with hierarchical index   Organizing data structure
groupby()     Split data into groups        GroupBy object                      Analysis and aggregation
agg()         Apply aggregation functions   Summarized DataFrame                Statistical summaries
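The output types in the table can be checked directly. This is a small sketch on the same sample data; indexed, grouped, and summary are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
    "Item": ["Apple", "Orange", "Carrot", "Broccoli"],
    "Price": [1.0, 0.8, 0.5, 0.7],
    "Quantity": [10, 15, 8, 12],
})

indexed = df.set_index(["Category", "Item"])   # DataFrame with a MultiIndex
grouped = df.groupby("Category")               # lazy GroupBy object
summary = grouped.agg({"Quantity": "sum"})     # summarized DataFrame

print(type(indexed.index).__name__)
print(type(grouped).__name__)
print(type(summary).__name__)
```

A GroupBy object holds no results by itself; values are computed only when an aggregation such as agg() or sum() is applied to it.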

Conclusion

Pandas supports hierarchical data through MultiIndex, set_index(), and groupby(). Use set_index() with a list of columns to build the hierarchy, .loc to select by level, and groupby() with aggregation functions to summarize each group.

Updated on: 2026-03-27T09:18:20+05:30
