How to get dictionary-like objects from a dataset using Python Scikit-learn?

Scikit-learn's dataset loaders return dictionary-like objects called Bunch objects by default. A Bunch exposes its contents both as dictionary keys and as attributes, which gives access to the dataset's features, targets, and metadata.
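To make the "dictionary-like" behavior concrete, here is a minimal sketch that builds a small Bunch by hand with sklearn.utils.Bunch. The toy values are made up for illustration and are not part of any real dataset.

```python
from sklearn.utils import Bunch

# Build a tiny Bunch by hand (toy values, purely illustrative)
bunch = Bunch(data=[[1.0, 2.0], [3.0, 4.0]], target=[0, 1])

# Key access and attribute access return the very same object
same_object = bunch["data"] is bunch.data
print(same_object)

# Bunch subclasses dict, so the usual dict methods are available
print(isinstance(bunch, dict))
print(list(bunch.keys()))
```

Because a Bunch is a dict subclass, anything that accepts a regular dictionary should also accept a dataset object.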

Dictionary-like Object Attributes

Scikit-learn dataset objects typically expose the following key attributes (not every loader provides all of them):

  • data: The feature matrix containing the data to learn.

  • target: The target values for regression or classification.

  • DESCR: Complete description of the dataset including characteristics.

  • target_names: Names of the target variable(s).

  • feature_names: Names of the feature columns.

  • frame: A pandas DataFrame combining data and target when as_frame=True, otherwise None.
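The attributes listed above can be inspected on any loader. The sketch below uses load_iris, a small dataset bundled with scikit-learn that needs no download. It is a stand-in chosen for illustration and is not one of the datasets used elsewhere in this article.

```python
from sklearn.datasets import load_iris

# load_iris ships with scikit-learn, so nothing is fetched from the network
iris = load_iris()

# Feature matrix and target vector
print("data shape:", iris.data.shape)
print("target shape:", iris.target.shape)

# Human-readable names for classes and columns
print("target_names:", list(iris.target_names))
print("feature_names:", iris.feature_names)

# frame stays None unless the loader is called with as_frame=True
print("frame is None:", iris.frame is None)

# DESCR is a plain string holding the dataset description
print("DESCR type:", type(iris.DESCR).__name__)
```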

Example 1: Accessing Dataset Keys

Let's load the California Housing dataset and explore its structure:

from sklearn.datasets import fetch_california_housing

# Loading the California housing dataset
housing = fetch_california_housing()

# Print available keys in the dataset object
print(housing.keys())

Output:

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])

Example 2: Exploring Dataset Attributes

Now let's examine the details of each attribute:

from sklearn.datasets import fetch_california_housing

# Loading the California housing dataset
housing = fetch_california_housing()

# Shape of data and target
print("Data shape:", housing.data.shape)
print("Target shape:", housing.target.shape)
print()

# Feature and target names
print("Feature names:", housing.feature_names)
print()
print("Target names:", housing.target_names)
print()

# First few lines of description
print("Dataset description (first 300 characters):")
print(housing.DESCR[:300] + "...")

Output:

Data shape: (20640, 8)
Target shape: (20640,)

Feature names: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']

Target names: ['MedHouseVal']

Dataset description (first 300 characters):
.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block...

Example 3: Working with DataFrame Format

Passing as_frame=True makes the loader return pandas objects. The frame attribute then holds a DataFrame that combines the features and the target:

from sklearn.datasets import fetch_california_housing

# Loading dataset as DataFrame
housing = fetch_california_housing(as_frame=True)

# Display DataFrame info
print("DataFrame information:")
housing.frame.info()
print()

# Show first few rows
print("First 3 rows:")
print(housing.frame.head(3))

Output:

DataFrame information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB

First 3 rows:
      MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0     8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1     8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2     7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   

   Longitude  MedHouseVal  
0    -122.23        4.526  
1    -122.22        3.585  
2    -122.24        3.521  
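Besides filling in frame, as_frame=True also changes the types of data and target: they become a pandas DataFrame and a pandas Series instead of NumPy arrays. The sketch below checks this on load_diabetes, a bundled regression dataset used here only so the example runs without a download. The same pattern should apply to fetch_california_housing.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Bundled regression dataset, used as a stand-in for illustration
diabetes = load_diabetes(as_frame=True)

features = diabetes.data    # feature columns only
target = diabetes.target    # target column only
frame = diabetes.frame      # features and target side by side

# With as_frame=True these are pandas objects rather than NumPy arrays
print(isinstance(features, pd.DataFrame), features.shape)
print(isinstance(target, pd.Series), target.shape)

# frame has one more column than data: the target
print(frame.shape)
```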

Common Use Cases

Dictionary-like objects make it easy to:

  • Separate features (data) and targets (target) for machine learning models

  • Understand dataset characteristics through DESCR

  • Create meaningful column names using feature_names

  • Work with pandas DataFrames using as_frame=True
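As a rough illustration of the first and third use cases, the sketch below separates features from targets, fits a model, and labels the learned coefficients with feature_names. The bundled load_diabetes dataset, LinearRegression, the 80/20 split, and random_state=0 are all choices made for this example, not recommendations from the article.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Bundled regression dataset, chosen so the example needs no download
diabetes = load_diabetes()

# Separate features and targets straight from the Bunch
X, y = diabetes.data, diabetes.target

# Hold out 20% of the rows for evaluation (split settings are arbitrary)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")

# feature_names pairs each coefficient with a readable label
for name, coef in zip(diabetes.feature_names, model.coef_):
    print(f"{name:>4}: {coef:.1f}")
```

If only the arrays are needed, most loaders also accept return_X_y=True, which returns a (data, target) tuple instead of a Bunch.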

Conclusion

Scikit-learn's dictionary-like dataset objects provide a consistent interface for accessing features, targets, and metadata. Use as_frame=True when you prefer working with pandas DataFrames for data exploration and preprocessing.

Updated on: 2026-03-26T22:13:15+05:30
