Plotting Graphs for the Iris Dataset Using Seaborn and Matplotlib
The Iris dataset is a widely used benchmark in data analysis and visualization. It contains 150 flowers, 50 from each of three species (setosa, versicolor, and virginica), and each flower is described by four measurements: sepal length, sepal width, petal length, and petal width. This article shows how to plot the dataset with two Python libraries, Seaborn and Matplotlib. It covers loading the data, preprocessing and summarizing it, and building visualizations.
Using Seaborn's example copy of the Iris dataset and its pairplot() function, we'll create scatter plots that show how the features relate to each other and how well they distinguish the three species.
Loading the Iris Dataset
First, let's import the required libraries and load the dataset:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the Iris dataset from Seaborn
iris = sns.load_dataset('iris')
# Display basic information about the dataset
print("Dataset shape:", iris.shape)
print("\nFirst few rows:")
print(iris.head())
print("\nDataset info:")
iris.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
Dataset shape: (150, 5)

First few rows:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
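sns.load_dataset() downloads the CSV file from the online seaborn-data repository on GitHub and caches it locally, so the first call needs network access. If that is not available, the sketch below rebuilds an equivalent DataFrame from the copy of Iris bundled with scikit-learn. It assumes scikit-learn is installed; the helper name load_iris_frame is hypothetical, and because the two libraries ship their own copies of the data, individual values may differ slightly from Seaborn's version.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Hypothetical helper: Iris as a DataFrame shaped like sns.load_dataset('iris')."""
    bunch = load_iris(as_frame=True)  # reads the copy bundled with scikit-learn, no download
    df = bunch.data.copy()
    # scikit-learn names columns like 'sepal length (cm)'; rename them to Seaborn's style
    df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    # Map the integer class codes 0/1/2 to species-name strings
    names = {code: str(name) for code, name in enumerate(bunch.target_names)}
    df["species"] = bunch.target.map(names)
    return df


iris = load_iris_frame()
print(iris.shape)  # (150, 5)
```

The rest of the article's code works unchanged if iris comes from this helper instead of sns.load_dataset('iris').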
Data Preprocessing and Analysis
Let's separate the features from the target variable and calculate summary statistics:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the Iris dataset
iris = sns.load_dataset('iris')
# Separate features and target variable
features = iris.drop('species', axis=1)
target = iris['species']
# Calculate summary statistics
summary_stats = features.describe()
print("Summary Statistics:")
print(summary_stats)
print("\nSpecies distribution:")
print(iris['species'].value_counts())
Summary Statistics:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000
Species distribution:
setosa 50
versicolor 50
virginica 50
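describe() summarizes each feature over all 150 flowers, which hides the differences between species. Grouping by the target before aggregating makes those differences visible. The sketch below is one way to do it; the helper name species_profile is hypothetical, and the small DataFrame in the test is made-up toy data used only to exercise the function.

```python
import pandas as pd


def species_profile(df: pd.DataFrame, target: str = "species") -> pd.DataFrame:
    """Mean and standard deviation of each feature, computed separately per species.

    Assumes every column other than `target` is numeric, as in the Iris data.
    """
    return df.groupby(target).agg(["mean", "std"])
```

Called as species_profile(iris).round(2), this returns one row per species with a (feature, statistic) column for each measurement. On the Iris data the petal means differ sharply: mean petal length is roughly 1.46 cm for setosa, 4.26 cm for versicolor, and 5.55 cm for virginica.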
Creating Pairplot Visualization
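Now let's create the main visualization with Seaborn's pairplot() function, which draws a scatter plot for every pair of numeric features and each feature's distribution on the diagonal: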
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the Iris dataset
iris = sns.load_dataset('iris')
# Set the style for better visualization
sns.set_style("whitegrid")
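# Create the pairplot. pairplot() creates its own figure, so an extra plt.figure() call
# would only leave an empty figure behind; size the grid with `height` (inches per subplot)
pairplot = sns.pairplot(iris, hue="species", markers=["o", "s", "D"], height=2.5)
pairplot.figure.suptitle("Iris Dataset - Pairwise Feature Relationships", y=1.02)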
plt.show()
Individual Feature Distributions
Let's also examine each feature's distribution, split by species, using histograms with kernel density estimate (KDE) curves:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the Iris dataset
iris = sns.load_dataset('iris')
# Create subplots for individual feature distributions
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Distribution of Iris Features by Species', fontsize=16)
# List of features to plot
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
# Create histograms for each feature
for i, feature in enumerate(features):
    # Map the flat index 0-3 onto the 2 x 2 grid of axes
    row = i // 2
    col = i % 2
    sns.histplot(data=iris, x=feature, hue='species', kde=True, ax=axes[row, col])
    axes[row, col].set_title(f'{feature.replace("_", " ").title()} Distribution')
plt.tight_layout()
plt.show()
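The same per-species comparison can be drawn with Matplotlib alone, which makes explicit what histplot(hue=...) does: one histogram per group, drawn on shared bins so the bars line up. This is a minimal sketch; the helper name overlay_histograms and the default of 15 bins are assumptions, and the KDE curves are omitted.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def overlay_histograms(df: pd.DataFrame, feature: str, ax, target: str = "species", bins: int = 15):
    """Draw one semi-transparent histogram of `feature` per class of `target` on `ax`."""
    # Compute bin edges from the whole column so every group uses the same bins
    edges = np.histogram_bin_edges(df[feature], bins=bins)
    for name, group in df.groupby(target):
        ax.hist(group[feature], bins=edges, alpha=0.5, label=name)
    ax.set_xlabel(feature.replace("_", " "))
    ax.set_ylabel("Count")
    ax.legend(title=target)
    return ax
```

To reproduce the 2 × 2 layout above, call overlay_histograms(iris, feature, axes[row, col]) inside the same loop in place of sns.histplot.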
Correlation Matrix
Next, let's compute the pairwise correlations between the four numeric features and display them as a heatmap:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the Iris dataset
iris = sns.load_dataset('iris')
# Calculate correlation matrix for numeric features
features = iris.drop('species', axis=1)
correlation_matrix = features.corr()
# Create heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
            square=True, linewidths=0.5)
plt.title('Iris Dataset - Feature Correlation Matrix')
plt.tight_layout()
plt.show()
print("Correlation Matrix:")
print(correlation_matrix)
Correlation Matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.117570      0.871754     0.817941
sepal_width      -0.117570     1.000000     -0.428440    -0.366126
petal_length      0.871754    -0.428440      1.000000     0.962865
petal_width       0.817941    -0.366126      0.962865     1.000000
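DataFrame.corr() uses the Pearson coefficient by default: the covariance of two columns divided by the product of their standard deviations. The code drops the species column first because, in recent pandas versions, corr() raises an error on non-numeric columns unless numeric_only=True is passed. The sketch below computes Pearson's r by hand with NumPy to show what each cell of the matrix contains; the helper name pearson_r is hypothetical.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # The 1/(n-1) factors in the covariance and in both standard deviations cancel
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive linear relation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative linear relation)
```

Applied to the Iris data, pearson_r(iris['petal_length'], iris['petal_width']) should agree with the 0.962865 entry in the matrix above, up to floating-point rounding.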
Key Insights
From our visualizations, we can observe several important patterns:
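Species Separation: The three Iris species form distinct clusters, most clearly in the petal measurements.
Feature Correlation: Petal length and petal width are strongly positively correlated (r ≈ 0.96), and both are also strongly correlated with sepal length (0.87 and 0.82).
Setosa Distinction: Setosa is clearly separable from the other two species because its petals are much shorter and narrower.
Versicolor vs Virginica: These two species overlap, especially in their sepal measurements, but are still largely distinguishable by petal size.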
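The setosa claim can also be checked numerically instead of by eye: if the largest value of a feature among setosa flowers is below the smallest value among the other species, a single threshold on that feature separates them perfectly. A minimal sketch; the helper name separation_gap is hypothetical, and it only tests for a class whose values sit below all the others.

```python
import pandas as pd


def separation_gap(df: pd.DataFrame, feature: str, species: str, target: str = "species"):
    """Return (max of `feature` within `species`, min of `feature` in all other species).

    If the first value is smaller than the second, any threshold between them
    separates `species` from the rest using that single feature.
    """
    inside = df.loc[df[target] == species, feature]
    outside = df.loc[df[target] != species, feature]
    return float(inside.max()), float(outside.min())
```

On the Iris data, separation_gap(iris, 'petal_length', 'setosa') should return approximately (1.9, 3.0): no setosa petal is longer than 1.9 cm, and no versicolor or virginica petal is shorter than 3.0 cm.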
Conclusion
This article demonstrated how to effectively visualize the Iris dataset using Seaborn and Matplotlib. The combination of pairplots, histograms, and correlation heatmaps provides comprehensive insights into feature relationships and species characteristics. These visualization techniques are essential for exploratory data analysis and understanding dataset patterns before applying machine learning algorithms.
---