Categorical and distribution plots in Python Data Visualization


Seaborn is a Python visualization library built on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics, works directly with pandas and NumPy data structures, and uses statistical routines from SciPy and statsmodels.

Seaborn can show a relationship involving categorical data in several ways. As with relplot() and its axes-level counterparts scatterplot() and lineplot(), there are two ways to create these charts: a family of axes-level functions, each of which plots categorical data in a different way, and the figure-level interface catplot(), which provides uniform higher-level access to them.
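The difference between the two interfaces can be sketched as follows. This is a minimal illustration, not the only way to use them: the small DataFrame is made up for the example, and only its column names mirror those used later in this article.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration; only the column names follow the article.
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 14.5, 20.0, 17.5, 30.0, 25.5],
})

# Axes-level: draws onto the Axes we pass in and returns that same Axes.
fig, ax = plt.subplots()
returned_ax = sns.boxplot(data=df, x="day", y="total_bill", ax=ax)

# Figure-level: creates its own figure and returns a FacetGrid.
grid = sns.catplot(data=df, x="day", y="total_bill", kind="box")

print(type(returned_ax).__name__, type(grid).__name__)
```

Changing kind (for example to "violin" or "strip") switches the underlying axes-level function without changing the rest of the call.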

What is categorical data?

Categorical data is data whose values belong to a limited set of categories or groups. In Python, such data is often stored using the pandas category data type.

Categorical data is often used to store data that can be divided into a fixed number of categories, such as gender (male, female), product type (phone, computer, TV), or blood type (A, B, AB, O). A variable whose categories have no inherent ordering is purely categorical (nominal). A variable whose categories have a distinct ordering is ordinal.
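As a rough sketch of the distinction in pandas, the snippet below stores one unordered and one ordered categorical variable. The blood-type values follow the example above; the sizes variable and its categories are hypothetical and chosen only to show an ordering.

```python
import pandas as pd

# Purely categorical (nominal): blood types have no natural ranking.
blood = pd.Series(["A", "O", "B", "AB", "O"], dtype="category")

# Ordinal: a hypothetical "size" variable whose categories are ranked.
size_type = pd.CategoricalDtype(
    categories=["small", "medium", "large"], ordered=True
)
sizes = pd.Series(["medium", "small", "large", "small"], dtype=size_type)

print(blood.cat.categories.tolist())  # the distinct categories
print(blood.cat.ordered)              # whether an ordering is defined
print(sizes.cat.ordered)
print(sizes.min(), sizes.max())       # min/max only make sense when ordered
```

Seaborn uses the category order of such a column when it lays out the categorical axis, so declaring the order once in the data avoids repeating it in every plot call.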

Now let’s discuss using seaborn to plot categorical data! There are a few main plot types for this −

  • barplot

  • countplot

  • boxplot

  • violinplot

  • stripplot

  • swarmplot

Let’s go through examples of each!

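First, we will import the library, Seaborn, and load a dataset into the DataFrame t used in the examples below. The column names in those examples (sex, day, total_bill) match seaborn's built-in tips dataset, so it is presumably loaded as shown here.

import seaborn as sns

#to plot the graphs inline on jupyter notebook

%matplotlib inline

#load the example dataset (assumed to be 'tips') into t

t = sns.load_dataset('tips')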

Bar plot

A bar plot displays data using bars of different heights. In seaborn, barplot() shows an estimate of a numerical variable for each category: by default the height of each bar is the mean of the values in that category, and the error bars indicate the uncertainty around that estimate.

sns.barplot(x='sex',y='total_bill',data=t)

Here the parameters x and y refer to the names of the variables in the dataset passed in the parameter ‘data’.
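To see what the bar heights stand for, the following sketch compares them with group means computed in pandas. The data is made up for the example; only the column names follow the article.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration; only the column names follow the article.
df = pd.DataFrame({
    "sex": ["Male", "Male", "Male", "Female", "Female"],
    "total_bill": [10.0, 20.0, 30.0, 12.0, 18.0],
})

# Per-category mean computed directly with pandas.
group_means = df.groupby("sex")["total_bill"].mean()

# barplot() aggregates with the mean by default, so the bars match.
fig, ax = plt.subplots()
sns.barplot(data=df, x="sex", y="total_bill", ax=ax)
bar_heights = sorted(bar.get_height() for bar in ax.patches)

print(group_means.to_dict())
print(bar_heights)
```

The estimator parameter of barplot() can swap in another statistic, such as the median, if the mean is not the quantity of interest.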

Count plot

A countplot is a bar plot that shows the number of observations in each category of a categorical variable. It is a specialised version of the bar plot and is useful for quickly seeing how the data is spread across categories. The command for creating a countplot is:

sns.countplot(x='sex',data=t)
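As a small check of what countplot() draws, this sketch compares its bar heights with the counts from pandas value_counts(). The data is made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration.
df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female"]})

# Number of rows per category, computed with pandas.
counts = df["sex"].value_counts()

# countplot() needs only the categorical variable; it counts rows itself.
fig, ax = plt.subplots()
sns.countplot(data=df, x="sex", ax=ax)
bar_heights = sorted(bar.get_height() for bar in ax.patches)

print(counts.to_dict())
print(bar_heights)
```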

Box plot

A box plot is a graphical display of data using a box and whiskers. It is used to visualise a numerical variable's distribution and summary statistics. The box represents the interquartile range (IQR), with a line at the median. By default, the whiskers extend to the most extreme data points that lie within 1.5 × IQR of the box, and points beyond them are drawn individually as outliers.

sns.boxplot(x='day',y='total_bill',data=t,palette='rainbow')
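The quantities a box plot summarises can be computed by hand. The sketch below applies the common 1.5 × IQR rule to a made-up array of bill amounts; it illustrates the rule itself rather than seaborn's internal code.

```python
import numpy as np

# Made-up bill amounts with one clearly extreme value.
bills = np.array([8.0, 10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 50.0])

# The box spans the first to third quartile.
q1, q3 = np.percentile(bills, [25, 75])
iqr = q3 - q1

# Fences: points outside them are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points still inside the fences.
inside = bills[(bills >= lower_fence) & (bills <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(q1, q3, iqr)
print(whisker_low, whisker_high)
print(outliers.tolist())
```

The whis parameter of boxplot() changes the 1.5 multiplier if a different outlier threshold is wanted.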

Violin plot

A violin plot is a graphical display of data that combines a box plot with a kernel density plot. It is used to visualise a numerical variable's distribution and summary statistics. The width of the violin represents the kernel density estimate of the data, and the box inside the violin represents the IQR.

sns.violinplot(x="day", y="total_bill", data=t,palette='rainbow')

Strip plot and swarm plot

A strip plot is a graphical display of data using a scatter plot where one of the variables is categorical. It is used to visualise the distribution of a numerical variable for different categories.

A swarm plot is similar to a strip plot, but the points are adjusted along the categorical axis so that they don't overlap. This gives a better picture of how the values are distributed, although it does not scale well to large numbers of observations.
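A minimal side-by-side sketch of the two functions, stripplot() and swarmplot(), is given here. The data is made up for the example; only the column names follow the article.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with closely spaced values so overlap is visible.
df = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [10.0, 10.5, 11.0, 10.2, 10.8,
                   20.0, 20.3, 19.8, 20.1, 20.6],
})

fig, (ax_strip, ax_swarm) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))

# Strip plot: points get random jitter along the categorical axis.
sns.stripplot(data=df, x="day", y="total_bill", jitter=True, ax=ax_strip)
ax_strip.set_title("stripplot")

# Swarm plot: points are shifted just enough to avoid overlapping.
sns.swarmplot(data=df, x="day", y="total_bill", ax=ax_swarm)
ax_swarm.set_title("swarmplot")
```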

Distribution Plots

Various functions in the distributions module help answer questions about how a variable is distributed, such as its range, central tendency, and skew. histplot(), kdeplot(), ecdfplot(), and rugplot() are axes-level functions. The figure-level functions displot(), jointplot(), and pairplot() group them together.

Displot

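displot() is a figure-level function in the Seaborn library used to visualise the distribution of a numerical variable. By default it draws a histogram; a kernel density estimate (KDE) can be overlaid with kde=True, and kind='kde' or kind='ecdf' selects a KDE or ECDF plot instead. It replaces the older distplot() function, which is deprecated.

Syntax

displot(data=None, *, x=None, y=None, hue=None, row=None, col=None, kind='hist', rug=False, ...)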

Jointplot

jointplot() is a function in the Seaborn library used to visualise the relationship between two numerical variables. By default it creates a scatter plot with a histogram of each variable in the margins. The kind parameter selects other views; for example, kind='reg' adds a fitted regression line and kind='kde' draws density estimates.

Syntax

jointplot(data=None, *, x=None, y=None, hue=None, kind='scatter', height=6, ratio=5, space=0.2, ...)

Pairplot

pairplot() is a function in the Seaborn library used to visualise the relationships between all pairs of numeric variables in a dataset. It creates a grid of axes with a scatter plot for each pair of variables and a histogram or KDE plot of each variable on the diagonal, which allows you to explore the relationships in the data quickly.

Syntax

pairplot(data, *, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind='scatter', diag_kind='auto', markers=None, height=2.5, aspect=1, corner=False, dropna=False, plot_kws=None, diag_kws=None, grid_kws=None, size=None)

Rugplot

A rug plot is a graphical display of data that shows the distribution of a numerical variable along an axis. It is similar to a histogram, but instead of showing the frequency or density of the data, it shows the actual data points as lines or ticks along the axis.

The rug plot is useful for visualising the data distribution and identifying the presence of outliers. It can also be used to compare the distribution of multiple variables.

Syntax

rugplot(data=None, *, x=None, y=None, hue=None, height=0.025, expand_margins=True, palette=None, hue_order=None, hue_norm=None, legend=True, ax=None, **kwargs)

Conclusion

In conclusion, categorical and distribution plots are used to explore and analyze categorical and continuous data, respectively. Categorical plots show how values are spread across the categories of a dataset and make it easy to compare groups. Distribution plots show how the values of a continuous variable are spread, revealing features such as range, skew, and outliers.

Several Python libraries provide categorical and distribution plots, including Matplotlib, Seaborn, and Plotly. Examples of categorical plots include bar plots, pie charts, and box plots, and examples of distribution plots include histograms, kernel density plots, and violin plots. Knowing the different types of plots and when to use each one helps when exploring and analyzing data in Python.

Updated on: 05-May-2023
