How can the scikit-learn library be used to load data in Python?

Scikit-learn, commonly known as sklearn, is an open-source Python library that provides tools for implementing machine learning algorithms. These include classification, regression, clustering, and dimensionality reduction, all exposed through a consistent and stable interface. The library is built on top of NumPy, SciPy, and Matplotlib.

Scikit-learn comes with several built-in datasets that are well suited to learning and experimenting with machine learning algorithms. Let's explore how to load and examine data using sklearn:

Loading the Iris Dataset

The Iris dataset is one of the most popular datasets in machine learning. It contains measurements of iris flowers from three different species:

from sklearn.datasets import load_iris

# Load the iris dataset
my_data = load_iris()

# Separate features and target values
X = my_data.data
y = my_data.target

# Get feature and target names
feature_names = my_data.feature_names
target_names = my_data.target_names

print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nFirst 8 rows of the dataset:")
print(X[:8])

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

First 8 rows of the dataset:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]]
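
When only the arrays are needed, load_iris also accepts a return_X_y=True argument, which returns the feature matrix and the target vector directly instead of the container object. A minimal sketch of that variant (the variable names are illustrative):

```python
from sklearn.datasets import load_iris

# return_X_y=True skips the container object and returns (data, target)
X, y = load_iris(return_X_y=True)

print("Feature matrix shape:", X.shape)
print("Target vector shape:", y.shape)
```

This form is convenient when the arrays are passed straight to an estimator and the feature names or description are not required.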

Understanding Dataset Structure

Let's examine the dataset structure and dimensions:

from sklearn.datasets import load_iris

my_data = load_iris()

print("Dataset shape:", my_data.data.shape)
print("Number of features:", len(my_data.feature_names))
print("Number of classes:", len(my_data.target_names))
print("Dataset description:")
print(my_data.DESCR[:200] + "...")

Dataset shape: (150, 4)
Number of features: 4
Number of classes: 3
Dataset description:
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric...
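
The description states that each of the three classes has 50 samples. One way to check this from the data itself, sketched here with NumPy's bincount (the class_counts name is illustrative), is to tally the integer labels in target and pair them with target_names:

```python
import numpy as np
from sklearn.datasets import load_iris

my_data = load_iris()

# target holds integer class labels (0, 1, 2); bincount tallies each label
counts = np.bincount(my_data.target)

# Pair each class name with its sample count
class_counts = {str(name): int(n) for name, n in zip(my_data.target_names, counts)}
print(class_counts)
```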

Other Built-in Datasets

Scikit-learn provides many other datasets for different types of machine learning problems:

from sklearn.datasets import load_wine, load_digits

# Load different datasets
wine_data = load_wine()
digits_data = load_digits()

print("Wine dataset shape:", wine_data.data.shape)
print("Wine features:", len(wine_data.feature_names))

print("\nDigits dataset shape:", digits_data.data.shape)
print("Digits classes:", len(digits_data.target_names))

Wine dataset shape: (178, 13)
Wine features: 13

Digits dataset shape: (1797, 64)
Digits classes: 10
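
The 64 features of the digits dataset are the pixels of 8x8 grayscale images. The object returned by load_digits also exposes an images attribute holding the unflattened version. A short sketch, assuming only NumPy and scikit-learn, that checks the two views agree:

```python
import numpy as np
from sklearn.datasets import load_digits

digits_data = load_digits()

# images holds each sample as an 8x8 grid; data holds the same pixels flattened
print("Images shape:", digits_data.images.shape)

# Flatten each 8x8 image back into a 64-element row and compare with data
flattened = digits_data.images.reshape(len(digits_data.images), -1)
same_pixels = np.array_equal(flattened, digits_data.data)
print("Flattened images match data:", same_pixels)
```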

Key Components of Dataset Objects

Attribute	Description	Example
`data`	Feature matrix (X)	`my_data.data`
`target`	Target values (y)	`my_data.target`
`feature_names`	Names of features	`my_data.feature_names`
`target_names`	Names of target classes	`my_data.target_names`
`DESCR`	Dataset description	`my_data.DESCR`
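
The object returned by the loader functions behaves like a dictionary, so the attributes in the table can also be read by key. A brief sketch showing both access styles (the available_keys and same_array names are illustrative):

```python
from sklearn.datasets import load_iris

my_data = load_iris()

# The returned object behaves like a dictionary, so its keys can be listed
available_keys = sorted(my_data.keys())
print(available_keys)

# Attribute access and key access refer to the same array
same_array = my_data["data"] is my_data.data
print("Same object:", same_array)
```

Listing the keys is a quick way to see which fields a given dataset provides, since not every dataset carries the same set.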

Conclusion

Scikit-learn's built-in datasets provide an easy way to start experimenting with machine learning algorithms. Use load_iris(), load_wine(), or other dataset functions to quickly access structured data with features, targets, and descriptions ready for analysis.

AmitDiwan

Updated on: 2026-03-25T13:18:56+05:30
