When do we use crosstab and pivot_table in Python Pandas?

Python Server Side Programming Programming

In this article, we will show you when we use the crosstab() and pivot_table() functions in Python Pandas.

When to use crosstab or pivot_table

The pivot table expects your input data to already be a DataFrame; you pass a DataFrame to the pivot table and specify the index/columns/values by passing the column names as strings. You don't need to pass a DataFrame into a cross tab because you just pass arraylike objects for index/columns/values.

In general, use a pivot table if you already have a DataFrame so you don't have to create the same DataFrame twice. Use crosstab if you start with array-like objects and are only interested in the pivoted data. In most cases, I don't believe it makes much of a difference in which function you use.

pandas.crosstab() function

A simple cross-tabulation of two (or more) factors is computed using this crosstab() method. Unless an array of values and an aggregation function are passed, compute a frequency table of the factors by default.

This function allows us to customize the data in many ways. At first glance, it appears to be very similar to the Pandas pivot table function.

Syntax

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None,
   aggfunc=None, margins=False, margins_name=’All’, dropna=True, normalize=False)

Parameters

Parameter	Description	Default Value
index	array-like, series, or Values to group by in rows
columns	array-like, series, or Values to group by in columns
values	These are an array of values to aggregate
rownames	name of the rows (list)
colnames	name of the columns (list)
aggfunc(optional)	specifies how to aggregate data, if given, requires ‘values’ be specified as well.	count
margins	It Add row/column totals	False
margins_name	It is the name for row/column totals	“All”
dropna	drops columns whose entries are all NaN	True
normalize	It normalizes by dividing all values by the sum of all values	False

The crosstab function has several advantages over the pivot table function, including

The summary can be normalized to display data as a percentage of row or column totals.
The data does not need to be structured as a dataframe before analysis.

Pandas Crosstab vs. Pivot Table

Much of what you can do with a Pandas Crosstab, you can do with a Pandas Pivot Table. The main differences are as follows −

A dataframe is not required as an input to the function. For its rows and columns, it can also accept array-like objects.

The function has the ability to normalize the output dataframe, which means that the numbers displayed can be expressed as a percentage of row or column totals.

The default function is len (count), whereas the pivot table function is NumPy's mean.

Creating a Crosstab in Pandas

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Use the import keyword, to import the pandas, numpy module.
Creating numpy arrays using the array() function(returns an ndarray. The ndarray is an array object that satisfies the given requirements) of the numpy module.
Create a cross tab for the given NumPy arrays using cross_tab() function and print it.

Example

The following program returns a crosstab of the given NumPy arrays using the crosstab() function −


# importing pandas, numpy modules
import pandas
import numpy

# creating numpy arrays
array_1 = numpy.array(["cat", "cat", "cat", "cat", "rat", "rat", "rat", "rat", "cat", "cat", "cat"], dtype=object)
array_2 = numpy.array(["pin", "pin", "pin", "bat", "pin", "pin", "pin", "bat", "bat", "bat", "pin"], dtype=object)
array_3 = numpy.array(["book", "book", "car", "book", "book", "car", "car", "book", "car", "car", "car"], dtype=object)

# creating a cross tab for the given numpy arrays by giving rownames as the array 1 names and column names as array2 and array 3 names
print(pandas.crosstab(array_1, [array_2, array_3], rownames=['array_1'],
colnames=['array_2', 'array_3']))

Output

On executing, the above program will generate the following output −

array_2 bat       pin
array_3 book car book car
array_1
cat      1     2  2    2
rat      1     0  1    2

pandas.pivot_table() function

The pivot_table() function creates a spreadsheet-style pivot table as a DataFrame.

Levels in the pivot table will be saved as MultiIndex objects (hierarchical indexes) on the result DataFrame's index and columns.

Syntax

pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc=’mean’, fill_value=None, margins=False, dropna=True, margins_name=’All’)

Creating a simple dataframe using pandas

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

Use the import keyword, to import the pandas, numpy module with alias names.
Create a dataframe using the DataFrame() function of the pandas module.
Print the input dataframe.

Example

The following program returns a dataframe using the DataFrame() function −


# importing pandas, numpy modules with alias names
import pandas as pd
import numpy as np

# creating a dataframe
inputDataframe = pd.DataFrame({'Name': ['Virat', 'Rohit', 'Meera', 'Nick', 'Sana'], 'Jobrole': ['Developer', 'Analyst', 'Help Desk', 'Database Developer', 'Finance accountant'], 'Age': [25, 30, 28, 25, 40]})

# displaying the dataframe
print(inputDataframe)

Output

On executing, the above program will generate the following output −

   Name  Jobrole              Age
0 Virat  Developer            25
1 Rohit  Analyst              30
2 Meera  Help Desk            28
3 Nick   Database Developer   25
4 Sana   Finance accountant   40

Creating a pivot table of the above dataframe

Example

The following program creates a simple pivot table that has data frame and an index/list of index


# importing pandas, numpy modules with alias names
import pandas as pd
import numpy as np

# creating a dataframe
inputDataframe = pd.DataFrame({'Name': ['Virat', 'Rohit', 'Meera', 'Nick','Sana'], 'Jobrole': ['Developer', 'Analyst', 'Help Desk', 'Database Developer', 'Finance accountant'],'Age': [25, 30, 28, 35, 40]})

# creating a simple pivot table that has data frame and an index/list of index.
result_pivottable = pd.pivot_table(inputDataframe, index =['Name', 'Jobrole',])

# displaying the pivot table
print(result_pivottable)

Output

On executing, the above program will generate the following output −

Name     Jobrole              Age
Meera    Help Desk            28
Nick     Database Developer   35
Rohit    Analyst              30
Sana     Finance accountant   40
Virat    Developer            25

Conclusion

By using some examples, we learned when to use the crosstab() function and when to use the pivot table function in this article.

Vikram Chiluka

Updated on: 31-Oct-2022

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started