
- Python Basic Tutorial
- Python - Home
- Python - Overview
- Python - Environment Setup
- Python - Basic Syntax
- Python - Comments
- Python - Variables
- Python - Data Types
- Python - Operators
- Python - Decision Making
- Python - Loops
- Python - Numbers
- Python - Strings
- Python - Lists
- Python - Tuples
- Python - Dictionary
- Python - Date & Time
- Python - Functions
- Python - Modules
- Python - Files I/O
- Python - Exceptions
When do we use crosstab and pivot_table in Python Pandas?
In this article, we will show you when we use the crosstab() and pivot_table() functions in Python Pandas.
When to use crosstab or pivot_table
The pivot table expects your input data to already be a DataFrame; you pass a DataFrame to the pivot table and specify the index/columns/values by passing the column names as strings. You don't need to pass a DataFrame into a cross tab because you just pass arraylike objects for index/columns/values.
In general, use a pivot table if you already have a DataFrame so you don't have to create the same DataFrame twice. Use crosstab if you start with array-like objects and are only interested in the pivoted data. In most cases, I don't believe it makes much of a difference in which function you use.
pandas.crosstab() function
A simple cross-tabulation of two (or more) factors is computed using this crosstab() method. Unless an array of values and an aggregation function are passed, compute a frequency table of the factors by default.
This function allows us to customize the data in many ways. At first glance, it appears to be very similar to the Pandas pivot table function.
Syntax
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name=’All’, dropna=True, normalize=False)
Parameters
Parameter | Description | Default Value |
---|---|---|
index | array-like, series, or Values to group by in rows | |
columns | array-like, series, or Values to group by in columns | |
values | These are an array of values to aggregate | |
rownames | name of the rows (list) | |
colnames | name of the columns (list) | |
aggfunc(optional) | specifies how to aggregate data, if given, requires ‘values’ be specified as well. | count |
margins | It Add row/column totals | False |
margins_name | It is the name for row/column totals | “All” |
dropna | drops columns whose entries are all NaN | True |
normalize | It normalizes by dividing all values by the sum of all values | False |
The crosstab function has several advantages over the pivot table function, including
The summary can be normalized to display data as a percentage of row or column totals.
- The data does not need to be structured as a dataframe before analysis.
Pandas Crosstab vs. Pivot Table
Much of what you can do with a Pandas Crosstab, you can do with a Pandas Pivot Table. The main differences are as follows −
A dataframe is not required as an input to the function. For its rows and columns, it can also accept array-like objects.
The function has the ability to normalize the output dataframe, which means that the numbers displayed can be expressed as a percentage of row or column totals.
The default function is len (count), whereas the pivot table function is NumPy's mean.
Creating a Crosstab in Pandas
Algorithm (Steps)
Following are the Algorithm/steps to be followed to perform the desired task −
Use the import keyword, to import the pandas, numpy module.
Creating numpy arrays using the array() function(returns an ndarray. The ndarray is an array object that satisfies the given requirements) of the numpy module.
Create a cross tab for the given NumPy arrays using cross_tab() function and print it.
Example
The following program returns a crosstab of the given NumPy arrays using the crosstab() function −
# importing pandas, numpy modules import pandas import numpy # creating numpy arrays array_1 = numpy.array(["cat", "cat", "cat", "cat", "rat", "rat", "rat", "rat", "cat", "cat", "cat"], dtype=object) array_2 = numpy.array(["pin", "pin", "pin", "bat", "pin", "pin", "pin", "bat", "bat", "bat", "pin"], dtype=object) array_3 = numpy.array(["book", "book", "car", "book", "book", "car", "car", "book", "car", "car", "car"], dtype=object) # creating a cross tab for the given numpy arrays by giving rownames as the array 1 names and column names as array2 and array 3 names print(pandas.crosstab(array_1, [array_2, array_3], rownames=['array_1'], colnames=['array_2', 'array_3']))
Output
On executing, the above program will generate the following output −
array_2 bat pin array_3 book car book car array_1 cat 1 2 2 2 rat 1 0 1 2
pandas.pivot_table() function
The pivot_table() function creates a spreadsheet-style pivot table as a DataFrame.
Levels in the pivot table will be saved as MultiIndex objects (hierarchical indexes) on the result DataFrame's index and columns.
Syntax
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc=’mean’, fill_value=None, margins=False, dropna=True, margins_name=’All’)
Creating a simple dataframe using pandas
Algorithm (Steps)
Following are the Algorithm/steps to be followed to perform the desired task −
Use the import keyword, to import the pandas, numpy module with alias names.
Create a dataframe using the DataFrame() function of the pandas module.
Print the input dataframe.
Example
The following program returns a dataframe using the DataFrame() function −
# importing pandas, numpy modules with alias names import pandas as pd import numpy as np # creating a dataframe inputDataframe = pd.DataFrame({'Name': ['Virat', 'Rohit', 'Meera', 'Nick', 'Sana'], 'Jobrole': ['Developer', 'Analyst', 'Help Desk', 'Database Developer', 'Finance accountant'], 'Age': [25, 30, 28, 25, 40]}) # displaying the dataframe print(inputDataframe)
Output
On executing, the above program will generate the following output −
Name Jobrole Age 0 Virat Developer 25 1 Rohit Analyst 30 2 Meera Help Desk 28 3 Nick Database Developer 25 4 Sana Finance accountant 40
Creating a pivot table of the above dataframe
Example
The following program creates a simple pivot table that has data frame and an index/list of index
# importing pandas, numpy modules with alias names import pandas as pd import numpy as np # creating a dataframe inputDataframe = pd.DataFrame({'Name': ['Virat', 'Rohit', 'Meera', 'Nick','Sana'], 'Jobrole': ['Developer', 'Analyst', 'Help Desk', 'Database Developer', 'Finance accountant'],'Age': [25, 30, 28, 35, 40]}) # creating a simple pivot table that has data frame and an index/list of index. result_pivottable = pd.pivot_table(inputDataframe, index =['Name', 'Jobrole',]) # displaying the pivot table print(result_pivottable)
Output
On executing, the above program will generate the following output −
Name Jobrole Age Meera Help Desk 28 Nick Database Developer 35 Rohit Analyst 30 Sana Finance accountant 40 Virat Developer 25
Conclusion
By using some examples, we learned when to use the crosstab() function and when to use the pivot table function in this article.
- Related Articles
- Why do we use pandas in python?
- Create a Pivot Table as a DataFrame – Python Pandas
- Create a Pivot Table with multiple columns – Python Pandas
- When do we use findElement() and findElements() in Selenium?
- Why do we use random.seed() in Python?
- Can we use “When” as column name in CREATE TABLE statement?
- How do we use Python in interactive mode?
- How do we use Python in script mode?
- How do we use double quotation in Python?
- Why do we need a copy constructor and when should we use a copy constructor in Java?
- What do you mean by PRIMARY KEY and how can we use it in MySQL table?
- What do you mean by FOREIGN KEY and how can we use it in MySQL table?
- How do we use re.finditer() method in Python regular expression?
- How do we use equivalence (“equality”) operator in Python classes?
- Why do we use re.compile() method in Python regular expression?
