How to get dictionary-like objects from a dataset using Python Scikit-learn?


With the Scikit-learn Python library, we can load a dataset as a dictionary-like object (a Bunch), whose keys can also be read as attributes. Some of the interesting attributes of this object are as follows −

  • data − It represents the data to learn.

  • target − It represents the values to predict (the regression target in the California Housing dataset).

  • DESCR − The description of the dataset.

  • target_names − It gives the target names of the dataset.

  • feature_names − It gives the feature names from the dataset.
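To make "dictionary-like" concrete, the sketch below builds a small stand-in container in plain Python. MiniBunch and the toy values are hypothetical and only illustrate the idea of a dictionary whose keys can also be read as attributes; this is not Scikit-learn's own implementation.

```python
class MiniBunch(dict):
    """Hypothetical stand-in: a dict whose keys are also attributes."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        # Store attributes as ordinary dictionary entries
        self[key] = value


# Toy values, invented purely for illustration
toy = MiniBunch(
    data=[[1.0, 2.0], [3.0, 4.0]],
    target=[0.5, 1.5],
    feature_names=["x1", "x2"],
    target_names=["y"],
    DESCR="A two-row toy dataset.",
)

print(list(toy.keys()))
print(toy["data"] is toy.data)  # key access and attribute access agree
```

Both access styles return the same underlying object, which is why the examples in this article can write housing.data instead of housing['data'].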

Example 1

In the example below, we load the California Housing dataset and print the keys of the dictionary-like object it returns.

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset (downloaded on first use)
housing = fetch_california_housing()

# Print the keys of the dictionary-like object
print(housing.keys())

Output

It will produce the following output −

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])

Example 2

We can also inspect the individual attributes of this dictionary-like object as follows −

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
housing = fetch_california_housing()

# Shapes of the feature matrix and the target vector
print(housing.data.shape)
print(housing.target.shape)

# Names of the features and of the target
print(housing.feature_names)
print(housing.target_names)

# Full text description of the dataset
print(housing.DESCR)

Output

It will produce the following output −

(20640, 8)
(20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
['MedHouseVal']
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
   :Number of Instances: 20640
   :Number of Attributes: 8 numeric, predictive attributes and the target
   :Attribute Information:
      - MedInc median income in block group
      - HouseAge median house age in block group
      - AveRooms average number of rooms per household
      - AveBedrms average number of bedrooms per household
      - Population block group population
      - AveOccup average number of household members
      - Latitude block group latitude
      - Longitude block group longitude
   :Missing Attribute Values: None
Omitted due to length of the output…
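These attributes can also be combined, for instance to label a single sample with its feature names. The sketch below assumes the Iris dataset (load_iris) rather than California Housing, only because it ships with Scikit-learn and needs no download; the variable names are illustrative.

```python
from sklearn.datasets import load_iris

# Load a dataset bundled with Scikit-learn (no download needed)
iris = load_iris()

# Pair each feature name with the matching value of the first sample
first_row = dict(zip(iris.feature_names, iris.data[0]))

# Translate the numeric target of the first sample into its class name
first_label = iris.target_names[iris.target[0]]

print(first_row)
print(first_label)
```

The same pattern should work for any loader that exposes feature_names and target_names, although for a regression dataset such as California Housing the target holds values rather than class indices.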

Example 3

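With as_frame=True, the returned object also holds a pandas DataFrame in its frame attribute, combining the features and the target −

# Import the dataset loader (pandas must be installed for as_frame=True)
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset as pandas objects
housing = fetch_california_housing(as_frame=True)

# Summarize the combined DataFrame of features and target
housing.frame.info()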

Output

It will produce the following output −

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype
---  ------       --------------   -----
 0   MedInc       20640 non-null   float64
 1   HouseAge     20640 non-null   float64
 2   AveRooms     20640 non-null   float64
 3   AveBedrms    20640 non-null   float64
 4   Population   20640 non-null   float64
 5   AveOccup     20640 non-null   float64
 6   Latitude     20640 non-null   float64
 7   Longitude    20640 non-null   float64
 8   MedHouseVal  20640 non-null   float64
dtypes: float64(9)
memory usage: 1.4 MB
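When a dataset is loaded with as_frame=True, the data and target attributes become pandas objects as well. The sketch below shows this with the bundled Iris dataset (an assumption made here to avoid a download); the names features and target are illustrative.

```python
from sklearn.datasets import load_iris

# Load a bundled dataset as pandas objects
iris = load_iris(as_frame=True)

features = iris.data    # pandas DataFrame, one column per feature
target = iris.target    # pandas Series holding the target values

print(type(features).__name__, features.shape)
print(type(target).__name__, target.shape)

# The combined frame holds the feature columns followed by the target column
print(list(iris.frame.columns))
```

Keeping features and target separate is convenient when passing them to an estimator, while frame is handier for exploratory summaries such as info().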

Updated on: 04-Oct-2022
