Write a Python code to find a cross tabulation of two dataframes

Cross-tabulation (crosstab) creates a frequency table showing relationships between categorical variables from different DataFrames. Pandas provides the pd.crosstab() function to compute cross-tabulations between two or more factors.

Creating Sample DataFrames

Let's start by creating two DataFrames with related data ?

import pandas as pd

# First DataFrame with Id and Age
df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'Age': [12, 13, 12, 13, 14]})
print("DataFrame 1:")
print(df)

# Second DataFrame with Mark
df1 = pd.DataFrame({'Mark': [80, 90, 80, 90, 85]})
print("\nDataFrame 2:")
print(df1)
DataFrame 1:
   Id  Age
0   1   12
1   2   13
2   3   12
3   4   13
4   5   14

DataFrame 2:
   Mark
0    80
1    90
2    80
3    90
4    85

Creating Cross-Tabulation

Use pd.crosstab() with the index parameter for row labels and columns parameter for column labels ?

import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'Age': [12, 13, 12, 13, 14]})
df1 = pd.DataFrame({'Mark': [80, 90, 80, 90, 85]})

# Create cross-tabulation
crosstab_result = pd.crosstab(index=df['Id'], columns=[df['Age'], df1['Mark']])
print(crosstab_result)
Age   12  13  14
Mark  80  90  85
Id              
1      1   0   0
2      0   1   0
3      1   0   0
4      0   1   0
5      0   0   1

Understanding the Output

The cross-tabulation shows the frequency count of each Id against the combination of Age and Mark. Each cell contains either 0 (no occurrence) or 1 (one occurrence) since each Id appears only once.

Adding Row and Column Totals

import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'Age': [12, 13, 12, 13, 14]})
df1 = pd.DataFrame({'Mark': [80, 90, 80, 90, 85]})

# Cross-tabulation with margins (totals)
crosstab_with_totals = pd.crosstab(index=df['Id'], 
                                   columns=[df['Age'], df1['Mark']], 
                                   margins=True)
print(crosstab_with_totals)
Age     12  13  14  All
Mark    80  90  85     
Id                     
1        1   0   0    1
2        0   1   0    1
3        1   0   0    1
4        0   1   0    1
5        0   0   1    1
All      2   2   1    5

Key Parameters

Parameter Description Example
index Values to group by in rows df['Id']
columns Values to group by in columns [df['Age'], df1['Mark']]
margins Add row/column totals margins=True

Conclusion

Cross-tabulation with pd.crosstab() helps analyze relationships between categorical variables from different DataFrames. Use the margins=True parameter to include totals for better insights.

Updated on: 2026-03-25T16:26:31+05:30

563 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements