How to Use Pandas cut() and qcut()?

Pandas is a Python library that is used for data manipulation and analysis of structured data. The cut() and qcut() methods of pandas are used for creating categorical variables from numerical data. The cut() method splits numerical data into discrete intervals based on value ranges, while qcut() splits data into quantiles with equal frequencies. In this article, we will understand the functionalities of both methods with practical examples.

The cut() Function

The cut() function divides a continuous variable into discrete bins or intervals based on specified criteria. It creates groups or categories of data based on the range of values present in the input data.

Syntax

pandas.cut(x, bins, labels=None, right=True, include_lowest=False, ...)

Parameters

  • x: The input data, which can be a Pandas Series or a NumPy array.

  • bins: This can be an integer value specifying the number of equal-width bins to create, or a sequence of scalar values defining the bin edges.

  • labels (optional): An array-like object of labels to assign to each bin. If not provided, the labels will be intervals.

  • right (optional): A boolean value indicating whether the intervals should be right-closed (default: True).

  • include_lowest (optional): A boolean value indicating whether to include the lowest value (default: False).

Example 1: Equal-width Bins

Here we divide numeric data into three equal-width bins ?

import pandas as pd

# Example 1: Equal-width bins
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
bins = 3
categories = pd.cut(data, bins)
print(categories)
[(9.91, 40.0], (9.91, 40.0], (9.91, 40.0], (9.91, 40.0], (40.0, 70.0], (40.0, 70.0], (40.0, 70.0], (70.0, 100.0], (70.0, 100.0], (70.0, 100.0]]
Categories (3, interval[float64, right]): [(9.91, 40.0] < (40.0, 70.0] < (70.0, 100.0]]

Example 2: Custom Bin Edges and Labels

We can define custom bin edges and assign meaningful labels ?

import pandas as pd

# Example 2: Custom bin edges and labels
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
bins = [0, 30, 60, 100]
labels = ['Low', 'Medium', 'High']
categories = pd.cut(data, bins, labels=labels)
print(categories)
['Low', 'Low', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'High', 'High', 'High']
Categories (3, object): ['Low' < 'Medium' < 'High']

The qcut() Function

The qcut() function splits the data based on quantiles or percentiles, ensuring each bin contains approximately the same number of data points. This is useful for creating evenly distributed groups.

Syntax

pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')

Parameters

  • x: The input data, which can be a Pandas Series or a NumPy array.

  • q: An integer value specifying the number of quantiles or a sequence of quantiles (values between 0 and 1).

  • labels (optional): An array-like object of labels to assign to each bin.

  • retbins (optional): A boolean value indicating whether to return the bin edges (default: False).

  • precision (optional): An integer value specifying the precision of quantile values (default: 3).

  • duplicates (optional): How to handle duplicate values (default: 'raise').

Example 1: Equal Number of Quantiles

Here we divide data into three equal-sized quantiles ?

import pandas as pd

# Example 1: Equal number of quantiles
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
quantiles = 3
categories = pd.qcut(data, quantiles)
print(categories)
[(9.999, 40.0], (9.999, 40.0], (9.999, 40.0], (9.999, 40.0], (40.0, 70.0], (40.0, 70.0], (40.0, 70.0], (70.0, 100.0], (70.0, 100.0], (70.0, 100.0]]
Categories (3, interval[float64, right]): [(9.999, 40.0] < (40.0, 70.0] < (70.0, 100.0]]

Example 2: Custom Quantiles and Labels

We can define custom quantiles with specific percentile thresholds ?

import pandas as pd

# Example 2: Custom quantiles and labels
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
quantiles = [0, 0.3, 0.6, 1]
labels = ['Low', 'Medium', 'High']
categories = pd.qcut(data, quantiles, labels=labels)
print(categories)
['Low', 'Low', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'High', 'High', 'High']
Categories (3, object): ['Low' < 'Medium' < 'High']

Comparison

Aspect cut() qcut()
Bin Creation Equal-width intervals Equal-frequency quantiles
Data Distribution Bins may have different counts Bins have approximately equal counts
Best For Value-based categorization Percentile-based ranking

Conclusion

Use cut() when you need value-based intervals with specific ranges, and qcut() when you need equal-frequency bins based on data distribution. Both functions are essential for converting continuous data into categorical variables for analysis.

Updated on: 2026-03-27T15:15:26+05:30

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements