Article Categories

Selected Reading

How to Collapse Multiple Columns in Python Pandas?

Python Pandas Server Side Programming Programming

Pandas is a popular data manipulation library in Python that is widely used for working with structured data. One common task when working with data is to clean and transform it in order to prepare it for analysis.

Sometimes, the data might contain multiple columns that have similar information or are related to each other. In such cases, it might be useful to collapse these columns into a single column for easier analysis or visualization.

Pandas provides several methods to collapse multiple columns into a single column. In this tutorial, we will explore the two most common methods: melt() and stack().

Using the melt() Method

The melt() method in Pandas is a powerful tool for transforming wide data into long data. It unpivots a DataFrame from wide format to long format by collapsing multiple columns into key-value pairs.

Syntax

DataFrame.melt(id_vars=None, value_vars=None, var_name=None, value_name='value')

Parameters

id_vars Columns to keep as identifier variables
value_vars Columns to collapse into a single column
var_name Name for the new variable column
value_name Name for the new value column

Example

Let's create a DataFrame with income data across different years and collapse the income columns ?

import pandas as pd

# Create sample data
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'income_2019': [50000, 60000, 70000],
    'income_2020': [55000, 65000, 75000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Melt the income columns
melted_df = df.melt(
    id_vars=['name', 'age'],
    value_vars=['income_2019', 'income_2020'],
    var_name='year',
    value_name='income'
)

print("\nAfter melting income columns:")
print(melted_df)

Original DataFrame:
      name  age  income_2019  income_2020
0    Alice   25        50000        55000
1      Bob   30        60000        65000
2  Charlie   35        70000        75000

After melting income columns:
      name  age         year  income
0    Alice   25  income_2019   50000
1      Bob   30  income_2019   60000
2  Charlie   35  income_2019   70000
3    Alice   25  income_2020   55000
4      Bob   30  income_2020   65000
5  Charlie   35  income_2020   75000

Using the stack() Method

The stack() method is another way to collapse multiple columns. It pivots a level of column labels to create a hierarchical index and returns a Series or DataFrame with a MultiIndex.

Example

Let's demonstrate stack() with a DataFrame that has hierarchical column structure ?

import pandas as pd

# Create DataFrame with hierarchical columns
data = {
    ('Sales', 'Q1'): [100, 120, 90],
    ('Sales', 'Q2'): [110, 130, 95],
    ('Profit', 'Q1'): [20, 25, 18],
    ('Profit', 'Q2'): [22, 28, 19]
}

df = pd.DataFrame(data, index=['Store_A', 'Store_B', 'Store_C'])
df.columns = pd.MultiIndex.from_tuples(df.columns)
print("Original DataFrame with MultiIndex columns:")
print(df)

# Stack the columns
stacked_df = df.stack()
print("\nAfter stacking:")
print(stacked_df)

Original DataFrame with MultiIndex columns:
        Sales     Profit    
           Q1  Q2     Q1  Q2
Store_A   100 110     20  22
Store_B   120 130     25  28
Store_C    90  95     18  19

After stacking:
              Profit  Sales
Store_A Q1        20    100
        Q2        22    110
Store_B Q1        25    120
        Q2        28    130
Store_C Q1        18     90
        Q2        19     95

Simple stack() Example

Here's a simpler example using stack() with regular columns ?

import pandas as pd

# Create simple DataFrame
data = {
    'Product_A': [10, 20, 30],
    'Product_B': [15, 25, 35],
    'Product_C': [12, 22, 32]
}

df = pd.DataFrame(data, index=['Jan', 'Feb', 'Mar'])
print("Original DataFrame:")
print(df)

# Stack the columns
stacked = df.stack()
print("\nAfter stacking:")
print(stacked)

# Convert back to DataFrame
stacked_df = stacked.reset_index()
stacked_df.columns = ['Month', 'Product', 'Sales']
print("\nAs DataFrame:")
print(stacked_df)

Original DataFrame:
     Product_A  Product_B  Product_C
Jan         10         15         12
Feb         20         25         22
Mar         30         35         32

After stacking:
Jan  Product_A    10
     Product_B    15
     Product_C    12
Feb  Product_A    20
     Product_B    25
     Product_C    22
Mar  Product_A    30
     Product_B    35
     Product_C    32
dtype: int64

As DataFrame:
  Month    Product  Sales
0   Jan  Product_A     10
1   Jan  Product_B     15
2   Jan  Product_C     12
3   Feb  Product_A     20
4   Feb  Product_B     25
5   Feb  Product_C     22
6   Mar  Product_A     30
7   Mar  Product_B     35
8   Mar  Product_C     32

Comparison

Method	Best For	Output Type	Control
`melt()`	Wide to long transformation	DataFrame	More control over column names
`stack()`	Hierarchical data structures	Series/DataFrame	Preserves MultiIndex structure

Conclusion

Use melt() for transforming wide data to long format with full control over column naming. Use stack() when working with hierarchical data or when you need to preserve MultiIndex structures. Both methods are essential tools for data reshaping in pandas.

Mukul Latiyan

Updated on: 2026-03-27T14:35:18+05:30

3K+ Views

Previous Next

Article Categories

How to Collapse Multiple Columns in Python Pandas?

Using the melt() Method

Syntax

Parameters

Example

Using the stack() Method

Example

Simple stack() Example

Comparison

Conclusion

Learn More in Our Tutorials