How to get dictionary-like objects from dataset using Python Scikit-learn?
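Scikit-learn's dataset loaders, such as load_iris() and fetch_california_housing(), return dictionary-like objects of type Bunch (sklearn.utils.Bunch). A Bunch is a dict subclass whose keys can also be read as attributes, so dataset["data"] and dataset.data refer to the same array. Each Bunch groups the feature data, the targets, and metadata about the dataset in one object.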
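To make "dictionary-like" concrete, here is a simplified stand-in for Bunch. It is not scikit-learn's actual source. It is a minimal sketch, assuming the essential behavior is a dict whose keys can also be read and assigned as attributes. The class name MiniBunch is hypothetical and used only for illustration.

```python
# Hypothetical, simplified stand-in for sklearn.utils.Bunch (illustration only):
# a dict whose keys can also be accessed as attributes.
class MiniBunch(dict):
    def __init__(self, **kwargs):
        super().__init__(kwargs)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails; fall back to the dict.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Attribute assignment stores the value as a dict key.
        self[key] = value


ds = MiniBunch(data=[[1.0, 2.0], [3.0, 4.0]], target=[0, 1])
print(ds["target"])   # dictionary-style access
print(ds.target)      # attribute-style access to the same object
ds.feature_names = ["f0", "f1"]
print(list(ds.keys()))
```

Because the object is still a dict, keys(), items(), and the in operator behave as usual, which is why print(housing.keys()) works on a real Bunch.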
Dictionary-like Object Attributes
The exact set of attributes varies by loader, but most scikit-learn dataset objects contain the following keys:
data: The feature matrix containing the data to learn from.
target: The target values for regression or classification.
DESCR: A full text description of the dataset, including its characteristics.
target_names: The names of the target variable(s) for regression datasets, or the class labels for classification datasets.
feature_names: The names of the feature columns.
frame: A pandas DataFrame holding data and target together when the dataset is loaded with as_frame=True; otherwise None.
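For a classification dataset, target_names holds class labels rather than a column name (California Housing, a regression dataset, has just ['MedHouseVal']). The sketch below assumes the bundled Iris dataset (load_iris, which needs no download) to show how the integer codes in target map back to class names.

```python
from sklearn.datasets import load_iris

# Iris ships with scikit-learn, so no network access is needed
iris = load_iris()

print(iris.data.shape)      # (150, 4)
print(iris.feature_names)
print(iris.target_names)    # class labels

# target holds integer class codes; indexing target_names with them gives labels
first_labels = iris.target_names[iris.target[:3]]
print(first_labels)
```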
Example 1: Accessing Dataset Keys
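Let's load the California Housing dataset and explore its structure. Note that fetch_california_housing() downloads the data on first use and caches it locally (by default under ~/scikit_learn_data):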
from sklearn.datasets import fetch_california_housing
# Loading the California housing dataset
housing = fetch_california_housing()
# Print available keys in the dataset object
print(housing.keys())
dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])
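Since the returned object is a real dict subclass, bracket access, attribute access, and ordinary dict methods all work on it. The sketch below uses the bundled load_iris instead of fetch_california_housing so it runs offline; the same checks should apply to the housing object.

```python
from sklearn.datasets import load_iris
from sklearn.utils import Bunch

iris = load_iris()

print(isinstance(iris, Bunch))    # True
print(isinstance(iris, dict))     # True: Bunch subclasses dict
# Bracket and attribute access return the very same object
print(iris["data"] is iris.data)  # True
# frame is present as a key but is None unless as_frame=True is passed
print(iris.get("frame"))          # None
```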
Example 2: Exploring Dataset Attributes
Now let's examine the details of each attribute:
from sklearn.datasets import fetch_california_housing
# Loading the California housing dataset
housing = fetch_california_housing()
# Shape of data and target
print("Data shape:", housing.data.shape)
print("Target shape:", housing.target.shape)
print()
# Feature and target names
print("Feature names:", housing.feature_names)
print()
print("Target names:", housing.target_names)
print()
# First 300 characters of the description
print("Dataset description (first 300 characters):")
print(housing.DESCR[:300] + "...")
Data shape: (20640, 8)
Target shape: (20640,)
Feature names: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
Target names: ['MedHouseVal']
Dataset description (first 300 characters):
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
:Number of Instances: 20640
:Number of Attributes: 8 numeric, predictive attributes and the target
:Attribute Information:
- MedInc median income in block...
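One practical use of feature_names and target_names is labeling a raw NumPy array yourself, without as_frame=True. This is a sketch assuming pandas is installed; it uses the bundled Iris dataset so it runs offline, and the column name "species" is an arbitrary choice.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Label the raw NumPy feature matrix with the dataset's own column names
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Add a readable label column by indexing target_names with the integer codes
df["species"] = iris.target_names[iris.target]
print(df.head(3))
```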
Example 3: Working with DataFrame Format
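You can load the dataset as a pandas DataFrame using the as_frame=True parameter (pandas must be installed). With this option, data becomes a DataFrame, target becomes a Series, and frame holds both combined: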
from sklearn.datasets import fetch_california_housing
# Loading dataset as DataFrame
housing = fetch_california_housing(as_frame=True)
# Display DataFrame info
print("DataFrame information:")
housing.frame.info()
print()
# Show first few rows
print("First 3 rows:")
print(housing.frame.head(3))
DataFrame information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MedInc 20640 non-null float64
1 HouseAge 20640 non-null float64
2 AveRooms 20640 non-null float64
3 AveBedrms 20640 non-null float64
4 Population 20640 non-null float64
5 AveOccup 20640 non-null float64
6 Latitude 20640 non-null float64
7 Longitude 20640 non-null float64
8 MedHouseVal 20640 non-null float64
dtypes: float64(9)
memory usage: 1.4 MB
First 3 rows:
MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85
Longitude MedHouseVal
0 -122.23 4.526
1 -122.22 3.585
2 -122.24 3.521
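With as_frame=True, the separate data and target attributes become pandas objects too, and frame is simply the feature columns followed by the target column. This sketch checks that on the bundled Iris dataset (so it runs offline); the same layout is visible in the housing output above, where MedHouseVal is the last column.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)

print(type(iris.data).__name__)    # DataFrame
print(type(iris.target).__name__)  # Series
# frame = feature columns followed by the target column
print(list(iris.frame.columns))
print(iris.frame.shape)            # (150, 5)
```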
Common Use Cases
Dictionary-like objects make it easy to:
Separate features (data) and targets (target) for machine learning models
Understand dataset characteristics through DESCR
Create meaningful column names using feature_names
Work with pandas DataFrames using as_frame=True
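The first use case, separating features from targets, feeds directly into a model. The sketch below assumes the bundled Iris dataset, a 75/25 split, and LogisticRegression as an arbitrary example classifier; it also shows return_X_y=True, which skips the Bunch and returns the two arrays directly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
# Features and targets come straight from the Bunch attributes
X, y = iris.data, iris.target

# return_X_y=True returns the same (data, target) pair without the Bunch
X2, y2 = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
print(f"Test accuracy: {accuracy:.3f}")
```

Use the Bunch when you also need metadata such as feature_names or DESCR; return_X_y=True is a shortcut when only the arrays matter.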
Conclusion
Scikit-learn's dictionary-like dataset objects provide a consistent interface for accessing features, targets, and metadata. Use as_frame=True when you prefer working with pandas DataFrames for data exploration and preprocessing.
