When do we use crosstab and pivot_table in Python Pandas?


In this article, we will show you when we use the crosstab() and pivot_table() functions in Python Pandas.

When to use crosstab or pivot_table

The pivot table expects your input data to already be a DataFrame; you pass a DataFrame to the pivot table and specify the index/columns/values by passing the column names as strings. You don't need to pass a DataFrame into a cross tab because you just pass arraylike objects for index/columns/values.

In general, use a pivot table if you already have a DataFrame so you don't have to create the same DataFrame twice. Use crosstab if you start with array-like objects and are only interested in the pivoted data. In most cases, I don't believe it makes much of a difference in which function you use.

pandas.crosstab() function

A simple cross-tabulation of two (or more) factors is computed using this crosstab() method. Unless an array of values and an aggregation function are passed, compute a frequency table of the factors by default.

This function allows us to customize the data in many ways. At first glance, it appears to be very similar to the Pandas pivot table function.

Syntax

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None,
   aggfunc=None, margins=False, margins_name=’All’, dropna=True, normalize=False)

Parameters

Parameter Description Default Value
index array-like, series, or Values to group by in rows
columns array-like, series, or Values to group by in columns
values These are an array of values to aggregate
rownames name of the rows (list)
colnames name of the columns (list)
aggfunc(optional) specifies how to aggregate data, if given, requires ‘values’ be specified as well. count
margins It Add row/column totals False
margins_name It is the name for row/column totals “All”
dropna drops columns whose entries are all NaN True
normalize It normalizes by dividing all values by the sum of all values False

The crosstab function has several advantages over the pivot table function, including

  • The summary can be normalized to display data as a percentage of row or column totals.

  • The data does not need to be structured as a dataframe before analysis.

Pandas Crosstab vs. Pivot Table

Much of what you can do with a Pandas Crosstab, you can do with a Pandas Pivot Table. The main differences are as follows −

A dataframe is not required as an input to the function. For its rows and columns, it can also accept array-like objects.

The function has the ability to normalize the output dataframe, which means that the numbers displayed can be expressed as a percentage of row or column totals.

The default function is len (count), whereas the pivot table function is NumPy's mean.

Creating a Crosstab in Pandas

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

  • Use the import keyword, to import the pandas, numpy module.

  • Creating numpy arrays using the array() function(returns an ndarray. The ndarray is an array object that satisfies the given requirements) of the numpy module.

  • Create a cross tab for the given NumPy arrays using cross_tab() function and print it.

Example

The following program returns a crosstab of the given NumPy arrays using the crosstab() function −

# importing pandas, numpy modules import pandas import numpy # creating numpy arrays array_1 = numpy.array(["cat", "cat", "cat", "cat", "rat", "rat", "rat", "rat", "cat", "cat", "cat"], dtype=object) array_2 = numpy.array(["pin", "pin", "pin", "bat", "pin", "pin", "pin", "bat", "bat", "bat", "pin"], dtype=object) array_3 = numpy.array(["book", "book", "car", "book", "book", "car", "car", "book", "car", "car", "car"], dtype=object) # creating a cross tab for the given numpy arrays by giving rownames as the array 1 names and column names as array2 and array 3 names print(pandas.crosstab(array_1, [array_2, array_3], rownames=['array_1'], colnames=['array_2', 'array_3']))

Output

On executing, the above program will generate the following output −

array_2 bat       pin
array_3 book car book car
array_1
cat      1     2  2    2
rat      1     0  1    2

pandas.pivot_table() function

The pivot_table() function creates a spreadsheet-style pivot table as a DataFrame.

Levels in the pivot table will be saved as MultiIndex objects (hierarchical indexes) on the result DataFrame's index and columns.

Syntax

pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc=’mean’, fill_value=None, margins=False, dropna=True, margins_name=’All’)

Creating a simple dataframe using pandas

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task −

  • Use the import keyword, to import the pandas, numpy module with alias names.

  • Create a dataframe using the DataFrame() function of the pandas module.

  • Print the input dataframe.

Example

The following program returns a dataframe using the DataFrame() function −

# importing pandas, numpy modules with alias names import pandas as pd import numpy as np # creating a dataframe inputDataframe = pd.DataFrame({'Name': ['Virat', 'Rohit', 'Meera', 'Nick', 'Sana'], 'Jobrole': ['Developer', 'Analyst', 'Help Desk', 'Database Developer', 'Finance accountant'], 'Age': [25, 30, 28, 25, 40]}) # displaying the dataframe print(inputDataframe)

Output

On executing, the above program will generate the following output −

   Name  Jobrole              Age
0 Virat  Developer            25
1 Rohit  Analyst              30
2 Meera  Help Desk            28
3 Nick   Database Developer   25
4 Sana   Finance accountant   40

Creating a pivot table of the above dataframe

Example

The following program creates a simple pivot table that has data frame and an index/list of index

# importing pandas, numpy modules with alias names import pandas as pd import numpy as np # creating a dataframe inputDataframe = pd.DataFrame({'Name': ['Virat', 'Rohit', 'Meera', 'Nick','Sana'], 'Jobrole': ['Developer', 'Analyst', 'Help Desk', 'Database Developer', 'Finance accountant'],'Age': [25, 30, 28, 35, 40]}) # creating a simple pivot table that has data frame and an index/list of index. result_pivottable = pd.pivot_table(inputDataframe, index =['Name', 'Jobrole',]) # displaying the pivot table print(result_pivottable)

Output

On executing, the above program will generate the following output −

Name     Jobrole              Age
Meera    Help Desk            28
Nick     Database Developer   35
Rohit    Analyst              30
Sana     Finance accountant   40
Virat    Developer            25

Conclusion

By using some examples, we learned when to use the crosstab() function and when to use the pivot table function in this article.

Updated on: 31-Oct-2022

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements