Python Pandas - Andrews Curves



Andrews Curves provide a way to visualize multivariate, high-dimensional data using smooth curves. These curves are generated from a Fourier series in which the attribute values of each data sample act as the coefficients. By coloring the curves according to their class labels, we can see clusters and identify patterns in the data. Curves that belong to similar classes tend to lie closer together, which helps us understand how data points relate to one another.

In this tutorial, we will learn how to create Andrews Curves in Python using the Pandas library.

The pandas.plotting.andrews_curves() Function

Pandas provides the pandas.plotting.andrews_curves() function for generating Andrews curves. It takes a DataFrame with multivariate data, the name of the column that holds the class labels, and optional parameters for customization. It returns a matplotlib.axes.Axes object representing the plot.
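Because the function returns a Matplotlib Axes object, the plot can be adjusted further after it is drawn. The following is a minimal sketch of that idea; the tiny DataFrame, its column names, and the "Agg" backend (chosen so the code runs without a display) are assumptions made for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Small made-up dataset: three numeric features and one class-label column
df = pd.DataFrame({
    "a": [1.0, 1.2, 3.0, 3.1],
    "b": [0.5, 0.4, 2.0, 2.2],
    "c": [2.0, 2.1, 0.3, 0.2],
    "label": ["x", "x", "y", "y"],
})

# The call returns the Axes it drew on
ax = andrews_curves(df, "label")

# Use the returned Axes for further customization
ax.set_title("Andrews Curves (returned Axes)")
ax.set_xlabel("t")

# One curve is drawn per row of the DataFrame
n_curves = len(ax.get_lines())
print(type(ax).__name__, n_curves)
```

Working with the returned Axes keeps the customization tied to this specific plot, which is useful when several figures are open at once.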

Mathematical Form of Andrews Curves

The function for Andrews Curves is based on a Fourier series and is represented as −

f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ...

Where,

  • x coefficients are the values of different attributes (features) for each data sample.

  • t is linearly spaced from -π to +π, which forms a continuous curve.

Syntax

Following is the syntax of the pandas.plotting.andrews_curves() function −

pandas.plotting.andrews_curves(frame, class_column, ax=None, samples=200, color=None, colormap=None, **kwargs)

Where,

  • frame: A DataFrame containing the data to be plotted.

  • class_column: The column name containing class labels. Curves are colored based on these labels.

  • ax: An optional Matplotlib axes object on which the plot will be drawn. By default it is set to None.

  • samples: Number of points for each curve, by default it is set to 200.

  • color: Specify the colors to use for different classes.

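  • colormap: A Matplotlib colormap, or the name of one, from which the class colors are selected.

  • **kwargs: Additional options passed to the Matplotlib plotting method.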
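The ax and samples parameters described above can be sketched as follows. The example draws the same data on two subplots with different numbers of points per curve. The small DataFrame, its column names, and the "Agg" backend are assumptions made so that the sketch is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Small made-up dataset: three numeric features and one class-label column
df = pd.DataFrame({
    "a": [1.0, 1.2, 3.0, 3.1],
    "b": [0.5, 0.4, 2.0, 2.2],
    "c": [2.0, 2.1, 0.3, 0.2],
    "label": ["x", "x", "y", "y"],
})

# Two side-by-side axes to compare coarse and fine sampling
fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, figsize=(9, 3))

# samples controls how many points are used to draw each curve
returned = andrews_curves(df, "label", ax=ax_coarse, samples=10)
andrews_curves(df, "label", ax=ax_fine, samples=200)

ax_coarse.set_title("samples=10")
ax_fine.set_title("samples=200")

# Number of points in the first curve of each subplot
points_coarse = len(ax_coarse.get_lines()[0].get_xdata())
points_fine = len(ax_fine.get_lines()[0].get_xdata())
print(points_coarse, points_fine)
```

A low samples value makes the curves look angular, while a higher value gives smoother curves at the cost of more points to draw.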

Example

Here is a basic example of plotting Andrews Curves for the Iris dataset using the pandas.plotting.andrews_curves() function.

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import andrews_curves

# Load the Iris dataset
url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/iris.csv'
df = pd.read_csv(url)

# Plot Andrews Curves, using the 'Name' column 
plt.figure(figsize=(7, 3))
andrews_curves(df, 'Name')
plt.title("Andrews Curves for Iris Dataset")
plt.show()

Following is the output of the above code −

Andrews Curves for Iris Dataset

Customizing Colors in Andrews Curves

We can customize the Andrews curves plot by specifying colors for each class or applying a Matplotlib colormap.

Example: Using Custom Colors

The following example customizes the Andrews curves plot by passing a list of colors to the color parameter of the pandas.plotting.andrews_curves() function.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import andrews_curves

# Create data
df_random = pd.DataFrame(np.random.randn(100, 3), columns=['Feature1', 'Feature2', 'Feature3'])
df_random['Category'] = np.random.choice(['Class1', 'Class2', 'Class3'], size=100)
                   
# Plot Andrews Curves for the random data
plt.figure(figsize=(7, 4))
andrews_curves(df_random, 'Category', color=['red', 'blue', 'green'])
plt.title("Andrews Curves with Custom Colors")
plt.show()

Following is the output of the above code −

Andrews Curves with Custom Colors

Example: Applying a Colormap

The following example customizes the Andrews curves plot by passing the name of a Matplotlib colormap to the colormap parameter of the pandas.plotting.andrews_curves() function.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import andrews_curves

# Create data
df_random = pd.DataFrame(np.random.randn(100, 3), columns=['Feature1', 'Feature2', 'Feature3'])
df_random['Category'] = np.random.choice(['Class1', 'Class2', 'Class3'], size=100)
                   
# Plot Andrews Curves for the random data
plt.figure(figsize=(7, 4))
andrews_curves(df_random, 'Category', colormap='viridis')
plt.title("Andrews Curves with Colormap")
plt.show()

Following is the output of the above code −

Andrews Curves with Colormap