
Plotting Graphs for the Iris Dataset Using Seaborn and Matplotlib


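The Iris dataset is a widely recognized benchmark in data analysis and visualization. It contains 150 flower samples, each described by four measurements (sepal length, sepal width, petal length, and petal width) and labeled with one of three species. This article shows how to plot the Iris dataset with two Python libraries, Seaborn and Matplotlib, covering data loading, basic preprocessing and summary statistics, and three kinds of visualization: a pairplot, per-species histograms, and a correlation heatmap.

Seaborn ships the Iris data as one of its example datasets, and its pairplot() function draws a scatter plot for every pair of features, colored by species, so the relationships between features and the differences between the three Iris species can be seen at a glance.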

Loading the Iris Dataset

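First, let's import the required libraries and load the dataset. Note that sns.load_dataset() downloads the data from Seaborn's online example-data repository on first use (and caches it locally), so an internet connection is needed: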

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset from Seaborn
iris = sns.load_dataset('iris')

# Display basic information about the dataset
print("Dataset shape:", iris.shape)
print("\nFirst few rows:")
print(iris.head())
print("\nDataset info:")
# info() prints its report directly and returns None, so don't wrap it in print()
iris.info()

Dataset shape: (150, 5)

First few rows:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
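Before plotting, it can help to confirm that the frame has the expected columns and no missing values. The following is a minimal sketch, assuming the column names Seaborn uses for its copy of the data; `validate_iris` and `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration, while `isna()` and `sum()` are standard pandas calls.

```python
import pandas as pd

# Column names used by Seaborn's copy of the Iris data (assumed schema)
EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def validate_iris(df: pd.DataFrame) -> pd.Series:
    """Check that the expected columns exist and return missing-value counts per column."""
    absent = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if absent:
        raise ValueError(f"Missing expected columns: {absent}")
    # isna() marks NaN/None cells as True; sum() counts them column by column
    return df[EXPECTED_COLUMNS].isna().sum()
```

On the Iris frame loaded above, `validate_iris(iris)` should report 0 for every column, consistent with the 150 non-null entries that info() lists.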

Data Preprocessing and Analysis

Let's separate features from the target variable and calculate summary statistics:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset
iris = sns.load_dataset('iris')

# Separate features and target variable
features = iris.drop('species', axis=1)
target = iris['species']

# Calculate summary statistics
summary_stats = features.describe()
print("Summary Statistics:")
print(summary_stats)

print("\nSpecies distribution:")
print(iris['species'].value_counts())

Summary Statistics:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Species distribution:
setosa        50
versicolor    50
virginica     50
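describe() summarizes each feature over all 150 rows at once. To compare the species with each other, one option is to group by the target column before aggregating. A minimal sketch follows; `species_means` is a hypothetical helper, while `groupby()` and `mean(numeric_only=True)` are standard pandas calls.

```python
import pandas as pd


def species_means(df: pd.DataFrame, target: str = "species") -> pd.DataFrame:
    """Return the mean of every numeric feature, computed separately for each class."""
    # The grouping column becomes the index; numeric_only skips any non-numeric columns
    return df.groupby(target).mean(numeric_only=True)
```

Calling `species_means(iris)` gives one row per species, which makes the gap in petal measurements between Setosa and the other two species easy to read off.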

Creating Pairplot Visualization

Now let's create the main visualization using Seaborn's pairplot function:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset
iris = sns.load_dataset('iris')

# Set the style for better visualization
sns.set_style("whitegrid")

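# Create the pairplot; pairplot() builds its own figure, so size it with
# height (inches per subplot) instead of calling plt.figure() first
pairplot = sns.pairplot(iris, hue="species", markers=["o", "s", "D"], height=3)
pairplot.figure.suptitle("Iris Dataset - Pairwise Feature Relationships", y=1.02)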

plt.show()

Individual Feature Distributions

Let's also examine individual feature distributions using histograms:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset
iris = sns.load_dataset('iris')

# Create subplots for individual feature distributions
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Distribution of Iris Features by Species', fontsize=16)

# List of features to plot
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Create histograms for each feature
for i, feature in enumerate(features):
    row = i // 2
    col = i % 2
    
    sns.histplot(data=iris, x=feature, hue='species', kde=True, ax=axes[row, col])
    axes[row, col].set_title(f'{feature.replace("_", " ").title()} Distribution')

plt.tight_layout()
plt.show()

Correlation Matrix

Let's create a correlation heatmap to understand feature relationships:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset
iris = sns.load_dataset('iris')

# Calculate correlation matrix for numeric features
features = iris.drop('species', axis=1)
correlation_matrix = features.corr()

# Create heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=0.5)
plt.title('Iris Dataset - Feature Correlation Matrix')
plt.tight_layout()
plt.show()

print("Correlation Matrix:")
print(correlation_matrix)

Correlation Matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.117570      0.871754     0.817941
sepal_width      -0.117570     1.000000     -0.428440    -0.366126
petal_length      0.871754    -0.428440      1.000000     0.962865
petal_width       0.817941    -0.366126      0.962865     1.000000
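Because a correlation matrix is symmetric, every off-diagonal value in the heatmap above appears twice. A common Seaborn idiom passes a boolean mask to heatmap() so that one triangle is hidden. The sketch below assumes the same styling as above; `lower_triangle_heatmap` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def lower_triangle_heatmap(df: pd.DataFrame):
    """Plot the correlation matrix of the numeric columns, showing each pair only once."""
    corr = df.select_dtypes(include="number").corr()
    # True cells are hidden; k=1 masks only the entries strictly above the diagonal
    mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, mask=mask, annot=True, cmap="coolwarm", center=0,
                square=True, linewidths=0.5, ax=ax)
    ax.set_title("Iris Dataset - Feature Correlation Matrix (lower triangle)")
    fig.tight_layout()
    return fig, corr
```

Selecting numeric columns inside the function means the full frame, species column included, can be passed directly: `lower_triangle_heatmap(iris)`.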

Key Insights

From our visualizations, we can observe several important patterns:

Species Separation: The three Iris species form distinct clusters, especially in the petal measurements.
Feature Correlation: Petal length and petal width are highly correlated (r ≈ 0.96).
Setosa Distinction: Setosa is clearly separable from the other two species, with much smaller petals.
Versicolor vs Virginica: These two species overlap somewhat but can still be largely distinguished by their petal measurements.
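The correlation finding can also be checked in code instead of being read off the heatmap. The sketch below scans only the upper triangle of a correlation matrix, so each pair is considered once and self-correlations are ignored, and returns the pair with the largest absolute coefficient. `strongest_pair` is a hypothetical helper, not part of pandas or Seaborn.

```python
import numpy as np
import pandas as pd


def strongest_pair(corr: pd.DataFrame) -> tuple[str, str, float]:
    """Return the two distinct features with the largest absolute correlation."""
    values = corr.to_numpy()
    # Row/column indices of the entries strictly above the diagonal
    rows, cols = np.triu_indices_from(values, k=1)
    best = np.argmax(np.abs(values[rows, cols]))
    i, j = rows[best], cols[best]
    return corr.index[i], corr.columns[j], float(values[i, j])
```

Applied to the `correlation_matrix` computed earlier, this should return petal_length and petal_width with r ≈ 0.963, matching the printed matrix.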

Conclusion

This article demonstrated how to visualize the Iris dataset using Seaborn and Matplotlib. Pairplots show pairwise feature relationships, histograms show how each feature is distributed within each species, and the correlation heatmap quantifies the linear relationships between features. Together, these techniques form a practical starting point for exploratory data analysis, helping reveal patterns in a dataset before machine learning algorithms are applied.

---

Priya Mishra

Updated on: 2026-03-27T09:43:07+05:30
