Python Pandas - Parallel Coordinates



Parallel Coordinates is a data visualization technique for analyzing high-dimensional datasets. Each multivariate data point is drawn as a line connecting multiple vertical axes, where each vertical axis corresponds to one variable and the height at which the line crosses an axis indicates the value of that variable.

Parallel Coordinates plots help reveal clusters and relationships in multivariate data, and they allow other statistics to be estimated visually. In this tutorial, we will learn how to use Python's Pandas library to create Parallel Coordinates plots and customize them for effective visual analysis.
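Before turning to the Pandas helper, the idea described above can be sketched by hand with plain Matplotlib. This is only an illustration of the technique, not how Pandas necessarily implements it; the variable names (var_a, var_b, var_c) and the values are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() to view the figure
import matplotlib.pyplot as plt

# Three observations with three variables each (illustrative values only)
variables = ["var_a", "var_b", "var_c"]
rows = [
    (5.1, 3.5, 1.4),
    (6.2, 2.9, 4.3),
    (5.9, 3.0, 5.1),
]
positions = list(range(len(variables)))  # x position of each vertical axis

fig, ax = plt.subplots(figsize=(6, 4))

# One line per observation, crossing each axis at that variable's value
for row in rows:
    ax.plot(positions, row)

# One vertical axis per variable
for x in positions:
    ax.axvline(x, color="black", linewidth=1)

ax.set_xticks(positions)
ax.set_xticklabels(variables)
```

Each observation becomes one line, so rows with similar values trace similar paths across the axes, which is what makes clusters visible.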

Pandas parallel_coordinates() Function

Pandas provides the parallel_coordinates() function in its plotting module for generating Parallel Coordinates plots. It takes a DataFrame with multivariate data, the name of the column that holds the class labels, and optional parameters for customization. It returns a matplotlib.axes.Axes object representing the plot.

Syntax

Following is the syntax of the parallel_coordinates() function −

pandas.plotting.parallel_coordinates(frame, class_column, cols=None, ax=None, color=None, use_columns=False, xticks=None, colormap=None, axvlines=True, axvlines_kwds=None, sort_labels=False, **kwargs)

Where,

  • frame: A DataFrame containing the data to be plotted.

  • class_column: The DataFrame column containing class labels.

  • cols: An optional list of column names to include in the plot. By default, all columns except class_column are used.

  • ax: An optional Matplotlib axes object on which the plot is drawn. It defaults to None, in which case the current axes are used.

  • color: A list or tuple of colors to use for the different classes.

  • use_columns: If True, the column names, which must then be numeric, are used as xticks.

  • xticks: A list or tuple of numeric values to use as xticks.

  • colormap: A Matplotlib colormap from which the line colors are selected.

  • axvlines: If True (the default), a vertical line is drawn at each xtick.

  • axvlines_kwds: Optional keyword arguments passed to Matplotlib's axvline() for the vertical lines.

  • sort_labels: If True, the class labels are sorted before colors are assigned, which is useful when assigning colors. It defaults to False.

  • **kwargs: Additional keyword arguments passed to Matplotlib's plot() for each line.
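As a rough illustration of several of these parameters together, the sketch below restricts the plot to two columns with cols, styles the vertical lines with axvlines_kwds, and draws on an existing axes through ax. The column names and values are hypothetical and chosen only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() to view the figure
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Small hand-made frame (hypothetical columns and values)
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0],
    "width": [4.0, 3.0, 2.0, 1.0],
    "depth": [2.0, 2.5, 3.0, 3.5],
    "label": ["b", "a", "b", "a"],
})

fig, ax = plt.subplots(figsize=(6, 4))
result = parallel_coordinates(
    df,
    class_column="label",
    cols=["height", "depth"],       # plot only these two columns
    sort_labels=True,               # sort class labels before colors are assigned
    axvlines_kwds={"linewidth": 0.5, "color": "gray"},  # style of the vertical lines
    ax=ax,                          # draw on the axes created above
)

# The function returns the axes it drew on
print(result is ax)
```

Because the returned object is an ordinary Matplotlib axes, it can be customized further with the usual methods such as set_title() or set_ylabel().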

Example

Let us see an example to understand how to create Parallel Coordinates plots using Pandas.

import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

# Generate sample data
df = pd.DataFrame({ch: np.random.randn(100) for ch in 'ABCDE'})
df['Group'] = np.random.choice(['Group1', 'Group2', 'Group3'], size=len(df))

# Create Parallel Coordinates plot
plt.figure(figsize=(10, 6))
parallel_coordinates(df, class_column='Group')
plt.title('Parallel Coordinates Plot')
plt.xlabel('Attributes')
plt.ylabel('Values')
plt.show()

Following is the output of the above code −

Basic Parallel Coordinates Plot

Customizing Colors in Parallel Coordinates Plots

To customize the colors of the lines in a parallel coordinates plot, you can use the color parameter. This is useful for highlighting specific groups in the class_column.

Example

The following example manually assigns one color per class by using the color parameter of the pandas.plotting.parallel_coordinates() function.

import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

# Generate sample data
df = pd.DataFrame({ch: np.random.randn(50) for ch in 'XYZ'})
df['Category'] = np.random.choice(['Alpha', 'Beta', 'Gamma'], size=len(df))

# Assign custom colors, one per class
plt.figure(figsize=(7, 4))
parallel_coordinates(df, class_column='Category',
                     color=['#FF5733', '#33FF57', '#3357FF'])
plt.title('Customizing Colors in Parallel Coordinates Plot')
plt.xlabel('Attributes')
plt.ylabel('Values')
plt.grid(True)
plt.show()

On executing the above code we will get the following output −

Customizing Colors in Parallel Coordinates Plot
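When a particular class must get a particular color, it helps to know how the color list is matched to the classes. In the Pandas versions this sketch was written against, colors appear to be assigned to classes in the order in which the classes first occur in class_column (unless sort_labels=True), so that behavior is treated as an assumption here. The palette dictionary below is a hypothetical helper, not part of the Pandas API.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() to view the figure
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Small fixed dataset so the class order is easy to follow
df = pd.DataFrame({
    "X": [0.1, 0.4, 0.3, 0.9],
    "Y": [1.2, 0.8, 1.5, 0.2],
    "Z": [0.5, 0.7, 0.1, 0.6],
    "Category": ["Beta", "Alpha", "Beta", "Gamma"],
})

# Hypothetical mapping from class label to the color we want for it
palette = {"Alpha": "#FF5733", "Beta": "#33FF57", "Gamma": "#3357FF"}

# Classes in order of first appearance: Beta, Alpha, Gamma
class_order = df["Category"].drop_duplicates()

# Build the color list in that same order (assumed matching rule, see above)
colors = [palette[label] for label in class_order]

fig, ax = plt.subplots(figsize=(7, 4))
parallel_coordinates(df, class_column="Category", color=colors, ax=ax)
```

Building the list from the data rather than hard-coding its order keeps the class-to-color pairing stable when the rows are reordered or filtered.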

Parallel Coordinates Line Width Customization

You can also adjust the line width and control whether vertical lines are drawn at the axis ticks to make parallel coordinates plots easier to read.

Example

The following example sets the line width of a parallel coordinates plot with lw, a Matplotlib keyword that parallel_coordinates() forwards through **kwargs, and explicitly requests vertical lines at the axis ticks with the axvlines parameter (which is already True by default).

import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

# Generate sample data
df = pd.DataFrame({ch: np.random.randn(50) for ch in 'XYZ'})
df['Category'] = np.random.choice(['Alpha', 'Beta', 'Gamma'], size=len(df))

# Draw the same data side by side with two different line widths
fig, axes = plt.subplots(ncols=2, figsize=(7, 4))
for ax, lw in zip(axes, [1, 3]):
    parallel_coordinates(df, class_column='Category', colormap='cool',
                         ax=ax, lw=lw, axvlines=True)
    ax.set_title(f'Line Width = {lw}')
    ax.get_legend().remove()
plt.show()

Following is the output of the above code −

Parallel Coordinates Plot Line Width Customization
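Since extra keyword arguments are handed to Matplotlib's plot(), other line properties can be set in the same way as lw. The sketch below is one possible variation, assuming the same kind of random sample data as above: it makes the lines semi-transparent with alpha and turns the vertical lines off with axvlines=False. The seed is arbitrary and only there to make the data reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() to view the figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

# Reproducible sample data (arbitrary seed)
rng = np.random.default_rng(0)
df = pd.DataFrame({ch: rng.standard_normal(50) for ch in "XYZ"})
df["Category"] = rng.choice(["Alpha", "Beta", "Gamma"], size=len(df))

fig, ax = plt.subplots(figsize=(7, 4))
parallel_coordinates(
    df,
    class_column="Category",
    colormap="cool",
    ax=ax,
    axvlines=False,  # do not draw vertical lines at the axis ticks
    lw=2,            # forwarded to Matplotlib: line width
    alpha=0.4,       # forwarded to Matplotlib: line transparency
)
ax.set_title("Semi-transparent lines without vertical axes")
```

Transparency tends to help when many lines overlap, because dense regions then appear darker than sparse ones.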