Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
When do we use crosstab and pivot_table in Python Pandas?
In this article, we will explore when to use crosstab() and pivot_table() functions in Python Pandas. Both functions create summary tables, but they have different use cases and advantages.
When to Use crosstab vs pivot_table
The key difference lies in the input data format:
pivot_table() - Use when you already have a DataFrame and want to reorganize it. You pass column names as strings to specify index, columns, and values.
crosstab() - Use when you have separate array-like objects (lists, NumPy arrays, Series) that you want to cross-tabulate. No existing DataFrame required.
In general, use pivot_table() if you already have a DataFrame to avoid creating duplicate data structures. Use crosstab() when starting with separate arrays and focusing only on frequency analysis.
pandas.crosstab() Function
The crosstab() function computes a cross-tabulation of two or more factors. By default, it calculates frequency tables unless you specify values and an aggregation function.
Syntax
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Parameters
| Parameter | Description | Default Value |
|---|---|---|
| index | Array-like, Series, or values to group by in rows | Required |
| columns | Array-like, Series, or values to group by in columns | Required |
| values | Array of values to aggregate | None |
| aggfunc | Function to aggregate data (requires 'values') | count |
| margins | Add row/column totals | False |
| normalize | Normalize by dividing by sum of values | False |
Advantages of crosstab()
Can normalize output to show percentages of row or column totals
No DataFrame required as input - accepts array-like objects directly
Default aggregation is count, making it ideal for frequency analysis
Creating a Crosstab Example
Let's create a crosstab from separate NumPy arrays ?
import pandas as pd
import numpy as np
# Creating numpy arrays
animals = np.array(["cat", "cat", "cat", "cat", "rat", "rat", "rat", "rat", "cat", "cat", "cat"])
objects = np.array(["pin", "pin", "pin", "bat", "pin", "pin", "pin", "bat", "bat", "bat", "pin"])
items = np.array(["book", "book", "car", "book", "book", "car", "car", "book", "car", "car", "car"])
# Creating cross-tabulation
result = pd.crosstab(animals, [objects, items],
rownames=['Animals'],
colnames=['Objects', 'Items'])
print(result)
Objects bat pin Items book car book car Animals cat 1 2 2 2 rat 1 0 1 2
pandas.pivot_table() Function
The pivot_table() function creates a spreadsheet-style pivot table from a DataFrame. It's more suitable when you already have structured data in DataFrame format.
Syntax
pandas.pivot_table(data, values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False,
dropna=True, margins_name='All')
Creating a DataFrame and Pivot Table
Let's create a DataFrame and then generate a pivot table from it ?
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({
'Name': ['Virat', 'Rohit', 'Meera', 'Nick', 'Sana'],
'Department': ['IT', 'Finance', 'IT', 'IT', 'Finance'],
'Age': [25, 30, 28, 35, 40],
'Salary': [50000, 55000, 48000, 60000, 65000]
})
print("Original DataFrame:")
print(df)
Original DataFrame:
Name Department Age Salary
0 Virat IT 25 50000
1 Rohit Finance 30 55000
2 Meera IT 28 48000
3 Nick IT 35 60000
4 Sana Finance 40 65000
Now let's create a pivot table to analyze average salary by department ?
import pandas as pd
df = pd.DataFrame({
'Name': ['Virat', 'Rohit', 'Meera', 'Nick', 'Sana'],
'Department': ['IT', 'Finance', 'IT', 'IT', 'Finance'],
'Age': [25, 30, 28, 35, 40],
'Salary': [50000, 55000, 48000, 60000, 65000]
})
# Creating pivot table
pivot_result = pd.pivot_table(df, values='Salary', index='Department', aggfunc='mean')
print(pivot_result)
Salary
Department
Finance 60000
IT 52667
Comparison
| Feature | crosstab() | pivot_table() |
|---|---|---|
| Input Format | Array-like objects | DataFrame required |
| Default Aggregation | Count (frequency) | Mean |
| Normalization | Yes (built-in) | No |
| Best Use Case | Frequency analysis | Existing DataFrame restructuring |
Conclusion
Use crosstab() when working with separate arrays and need frequency analysis or normalization. Use pivot_table() when restructuring existing DataFrames with custom aggregations like mean, sum, or count.
