Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Write a Python code to find a cross tabulation of two dataframes
Cross-tabulation (crosstab) creates a frequency table showing relationships between categorical variables from different DataFrames. Pandas provides the pd.crosstab() function to compute cross-tabulations between two or more factors.
Creating Sample DataFrames
Let's start by creating two DataFrames with related data ?
import pandas as pd
# First DataFrame with Id and Age
df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'Age': [12, 13, 12, 13, 14]})
print("DataFrame 1:")
print(df)
# Second DataFrame with Mark
df1 = pd.DataFrame({'Mark': [80, 90, 80, 90, 85]})
print("\nDataFrame 2:")
print(df1)
DataFrame 1: Id Age 0 1 12 1 2 13 2 3 12 3 4 13 4 5 14 DataFrame 2: Mark 0 80 1 90 2 80 3 90 4 85
Creating Cross-Tabulation
Use pd.crosstab() with the index parameter for row labels and columns parameter for column labels ?
import pandas as pd
df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'Age': [12, 13, 12, 13, 14]})
df1 = pd.DataFrame({'Mark': [80, 90, 80, 90, 85]})
# Create cross-tabulation
crosstab_result = pd.crosstab(index=df['Id'], columns=[df['Age'], df1['Mark']])
print(crosstab_result)
Age 12 13 14 Mark 80 90 85 Id 1 1 0 0 2 0 1 0 3 1 0 0 4 0 1 0 5 0 0 1
Understanding the Output
The cross-tabulation shows the frequency count of each Id against the combination of Age and Mark. Each cell contains either 0 (no occurrence) or 1 (one occurrence) since each Id appears only once.
Adding Row and Column Totals
import pandas as pd
df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'Age': [12, 13, 12, 13, 14]})
df1 = pd.DataFrame({'Mark': [80, 90, 80, 90, 85]})
# Cross-tabulation with margins (totals)
crosstab_with_totals = pd.crosstab(index=df['Id'],
columns=[df['Age'], df1['Mark']],
margins=True)
print(crosstab_with_totals)
Age 12 13 14 All Mark 80 90 85 Id 1 1 0 0 1 2 0 1 0 1 3 1 0 0 1 4 0 1 0 1 5 0 0 1 1 All 2 2 1 5
Key Parameters
| Parameter | Description | Example |
|---|---|---|
index |
Values to group by in rows | df['Id'] |
columns |
Values to group by in columns | [df['Age'], df1['Mark']] |
margins |
Add row/column totals | margins=True |
Conclusion
Cross-tabulation with pd.crosstab() helps analyze relationships between categorical variables from different DataFrames. Use the margins=True parameter to include totals for better insights.
