How can scikit-learn library be used to load data in Python?
Scikit-learn, usually imported as sklearn, is an open-source Python library that provides tools for machine learning, including classification, regression, clustering, and dimensionality reduction, all behind a consistent interface. The library is built on top of NumPy, SciPy, and Matplotlib.
Scikit-learn ships with several small built-in datasets that are well suited to learning and experimenting with machine learning algorithms. Let's explore how to load and examine data using sklearn:
Loading the Iris Dataset
The Iris dataset is one of the most popular datasets in machine learning. It contains measurements of iris flowers from three different species:
from sklearn.datasets import load_iris
# Load the iris dataset
my_data = load_iris()
# Separate features and target values
X = my_data.data
y = my_data.target
# Get feature and target names
feature_names = my_data.feature_names
target_names = my_data.target_names
print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nFirst 8 rows of the dataset:")
print(X[:8])
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

First 8 rows of the dataset:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]]
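When only the feature matrix and the target vector are needed, the loader can return them directly. The short sketch below uses the return_X_y parameter of load_iris() for this; the variable names are illustrative only.

```python
from sklearn.datasets import load_iris

# return_X_y=True skips the container object and returns a (features, target) tuple
X, y = load_iris(return_X_y=True)

print("X shape:", X.shape)  # one row per flower, one column per measurement
print("y shape:", y.shape)  # one integer class label per flower
```

This form is convenient when passing data straight to an estimator, but it drops the feature names, target names, and description.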
Understanding Dataset Structure
Let's examine the dataset structure and dimensions:
from sklearn.datasets import load_iris
my_data = load_iris()
print("Dataset shape:", my_data.data.shape)
print("Number of features:", len(my_data.feature_names))
print("Number of classes:", len(my_data.target_names))
print("Dataset description:")
print(my_data.DESCR[:200] + "...")
Dataset shape: (150, 4)
Number of features: 4
Number of classes: 3
Dataset description:
.. _iris_dataset:
Iris plants dataset
--------------------
**Data Set Characteristics:**
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric...
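The description mentions 50 instances in each of the three classes. A minimal sketch of how that could be checked from the loaded arrays is shown below; it assumes NumPy is available (scikit-learn depends on it), and the names counts and first_species are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

my_data = load_iris()

# target holds integer labels 0, 1, 2; bincount tallies how often each one occurs
counts = np.bincount(my_data.target)
for name, count in zip(my_data.target_names, counts):
    print(f"{name}: {count} samples")

# target_names is a NumPy array, so integer labels can index into it directly
first_species = my_data.target_names[my_data.target[:3]]
print("Species of the first three rows:", first_species)
```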
Other Built-in Datasets
Scikit-learn provides many other datasets for different types of machine learning problems:
from sklearn.datasets import load_wine, load_digits
# Load different datasets
wine_data = load_wine()
digits_data = load_digits()
print("Wine dataset shape:", wine_data.data.shape)
print("Wine features:", len(wine_data.feature_names))
print("\nDigits dataset shape:", digits_data.data.shape)
print("Digits classes:", len(digits_data.target_names))
Wine dataset shape: (178, 13)
Wine features: 13

Digits dataset shape: (1797, 64)
Digits classes: 10
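The other bundled loaders follow the same pattern, so several datasets can be inspected in one loop. The sketch below picks load_breast_cancer() (classification) and load_diabetes() (regression) as examples; the dictionary names loaders and shapes are illustrative, not part of the library.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes

# Map a readable label to each loader function
loaders = {
    "breast_cancer": load_breast_cancer,  # binary classification
    "diabetes": load_diabetes,            # regression
}

shapes = {}
for name, loader in loaders.items():
    X, y = loader(return_X_y=True)
    shapes[name] = X.shape
    print(f"{name}: {X.shape[0]} samples, {X.shape[1]} features")
```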
Key Components of Dataset Objects
| Attribute | Description | Example |
|---|---|---|
| data | Feature matrix (X) | my_data.data |
| target | Target values (y) | my_data.target |
| feature_names | Names of features | my_data.feature_names |
| target_names | Names of target classes | my_data.target_names |
| DESCR | Dataset description | my_data.DESCR |
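The object returned by the loaders is a dictionary-like container (a Bunch), so each attribute in the table can also be read with key syntax. The following is a small sketch of both access styles; the variable same_object is only there for illustration.

```python
from sklearn.datasets import load_iris

my_data = load_iris()

# List the fields the container exposes (the exact set can vary between versions)
print(sorted(my_data.keys()))

# Attribute access and key access return the same underlying array
same_object = my_data["data"] is my_data.data
print("Same array via key and attribute:", same_object)

# Features and targets always have one entry per sample
print(my_data.data.shape[0] == my_data.target.shape[0])
```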
Conclusion
Scikit-learn's built-in datasets provide an easy way to start experimenting with machine learning algorithms. Use load_iris(), load_wine(), or other dataset functions to quickly access structured data with features, targets, and descriptions ready for analysis.
