How to transform the Sklearn DIGITS dataset to 2-feature and 3-feature datasets in Python?


The Sklearn DIGITS dataset has 64 features because each digit image is 8 by 8 pixels. We can use Principal Component Analysis (PCA) to project the Scikit-learn DIGITS dataset onto a new feature space with 2 features. Reducing 64 features to 2 greatly shrinks the data, but it also discards some useful information, which can lower the classification accuracy of an ML model trained on the reduced data.
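How much information the two retained components keep can be estimated from the explained_variance_ratio_ attribute of a fitted PCA object. The following is a minimal sketch rather than part of the original example; the variable names ratios and retained are illustrative, and svd_solver="full" is an assumption made here so that the result is deterministic.

```python
from sklearn import datasets, decomposition

# Load the 64-feature DIGITS data
X_digits = datasets.load_digits().data

# Fit a 2-component PCA (svd_solver="full" is an assumption, chosen for determinism)
pca_2 = decomposition.PCA(n_components=2, svd_solver="full")
pca_2.fit(X_digits)

# Fraction of the total variance captured by each retained component
ratios = pca_2.explained_variance_ratio_
retained = ratios.sum()
print('Variance ratio per component:', ratios)
print('Total variance retained:', retained)
```

The closer the total is to 1.0, the less variance was discarded by the reduction.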

Steps to Transform DIGITS Dataset to 2-feature Dataset

We can follow the steps given below to transform the DIGITS dataset to a 2-feature dataset using PCA −

  • First, import the necessary modules from scikit-learn: datasets and decomposition.

  • Load the DIGITS dataset.

  • Initialize principal component analysis (PCA) and call the fit() method to fit it to the data.

  • Transform the dataset to the new dimensions, i.e., a 2-feature dataset.

Example

In the example below, we use the above steps to transform the scikit-learn DIGITS dataset to 2 features with PCA.

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load DIGITS dataset
DIGITS = datasets.load_digits()
X_digits, Y_digits = DIGITS.data, DIGITS.target
print('DIGITS Dataset Size:', X_digits.shape, Y_digits.shape)

# Initialize PCA and fit the data
pca_2 = decomposition.PCA(n_components=2)
pca_2.fit(X_digits)

# Transforming DIGITS data to new dimensions (with 2 features)
X_digits_pca2 = pca_2.transform(X_digits)

# Print the shape of the new dataset
print('New Dataset size after transformations:', X_digits_pca2.shape)

Output

It will produce the following output −

DIGITS Dataset Size: (1797, 64) (1797,)
New Dataset size after transformations: (1797, 2)
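The separate fit() and transform() calls can also be combined into a single fit_transform() call. The sketch below compares the two approaches; it is an illustration rather than part of the original example, the names pca_a, pca_b, X_two_step and X_one_step are made up for it, and svd_solver="full" is an assumption made so that both fits are deterministic and directly comparable.

```python
import numpy as np
from sklearn import datasets, decomposition

X_digits = datasets.load_digits().data

# Two-step version: fit, then transform
pca_a = decomposition.PCA(n_components=2, svd_solver="full")
pca_a.fit(X_digits)
X_two_step = pca_a.transform(X_digits)

# One-step version: fit_transform does both on the same data
pca_b = decomposition.PCA(n_components=2, svd_solver="full")
X_one_step = pca_b.fit_transform(X_digits)

# Compare the two results up to floating-point tolerance
same = np.allclose(X_two_step, X_one_step)
print('Same shape:', X_two_step.shape == X_one_step.shape)
print('Same values (within tolerance):', same)
```

Keeping fit() and transform() separate remains useful when the PCA is fitted on one set of samples and later applied to another.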

Transform DIGITS dataset with 6 classes to 2-feature dataset

The Sklearn DIGITS dataset has 64 features and 10 classes, one for each digit from 0 to 9. We can load only the first 6 classes and use Principal Component Analysis (PCA) to project that subset onto a new feature space with 2 features.

We can follow the steps given below to transform the DIGITS dataset with the first 6 classes to a 2-feature dataset using PCA −

  • First, import the necessary modules from scikit-learn: datasets and decomposition.

  • Load the DIGITS dataset with 6 classes.

  • Initialize principal component analysis (PCA) and call the fit() method to fit it to the data.

  • Transform the dataset to the new dimensions, i.e., a 2-feature dataset.

Example

In the example below, we use the above steps to transform the scikit-learn DIGITS dataset with 6 classes to 2 features with PCA.

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load DIGITS dataset with 6 classes
DIGITS = datasets.load_digits(n_class=6)
X_digits, Y_digits = DIGITS.data, DIGITS.target
print('DIGITS Dataset Size:', X_digits.shape, Y_digits.shape)

# Initialize PCA and fit the data
pca_2 = decomposition.PCA(n_components=2)
pca_2.fit(X_digits)

# Transforming DIGITS data to new dimensions (with 2 features)
X_digits_pca2 = pca_2.transform(X_digits)

# Print the shape of the new dataset
print('New Dataset size after transformations:', X_digits_pca2.shape)

Output

It will produce the following output −

DIGITS Dataset Size: (1083, 64) (1083,)
New Dataset size after transformations: (1083, 2)
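To confirm which digits the n_class=6 subset actually contains, the target labels can be inspected with NumPy. This is a small sketch added for illustration; the names DIGITS_6, classes and counts are hypothetical and do not appear in the example above.

```python
import numpy as np
from sklearn import datasets

# Load only the first 6 classes of the DIGITS dataset
DIGITS_6 = datasets.load_digits(n_class=6)

# Distinct labels in the subset and how many samples each one has
classes, counts = np.unique(DIGITS_6.target, return_counts=True)
print('Classes present:', classes)
print('Samples per class:', counts)
print('Total samples:', counts.sum())
```

The total should agree with the 1083 samples reported in the output above.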

Transform DIGITS dataset to 3-feature dataset using PCA

As before, the Sklearn DIGITS dataset has 64 features because each digit image is 8 by 8 pixels. We can use Principal Component Analysis (PCA) to project the DIGITS dataset onto a new feature space with 3 features. Reducing 64 features to 3 greatly shrinks the data, but it also discards some useful information, which can lower the classification accuracy of an ML model trained on the reduced data.
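One way to make the information loss concrete is to map the reduced data back to 64 features with inverse_transform() and measure how far the result is from the original. The sketch below does this for 2 and 3 components; the helper reconstruction_mse is a hypothetical name introduced here, and svd_solver="full" is an assumption made for deterministic results.

```python
import numpy as np
from sklearn import datasets, decomposition

X_digits = datasets.load_digits().data

def reconstruction_mse(X, n_components):
    """Project X onto n_components, map it back to the original
    feature space, and return the mean squared reconstruction error."""
    pca = decomposition.PCA(n_components=n_components, svd_solver="full")
    X_reduced = pca.fit_transform(X)
    X_restored = pca.inverse_transform(X_reduced)
    return float(np.mean((X - X_restored) ** 2))

mse_2 = reconstruction_mse(X_digits, 2)
mse_3 = reconstruction_mse(X_digits, 3)
print('Reconstruction MSE with 2 components:', mse_2)
print('Reconstruction MSE with 3 components:', mse_3)
```

A lower error for 3 components than for 2 reflects that each additional component preserves more of the original data.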

We can follow the steps given below to transform the DIGITS dataset to a 3-feature dataset using PCA −

  • First, import the necessary modules from scikit-learn: datasets and decomposition.

  • Load the DIGITS dataset.

  • Initialize principal component analysis (PCA) and call the fit() method to fit it to the data.

  • Transform the dataset to the new dimensions, i.e., a 3-feature dataset.

Example

In the example below, we use the above steps to transform the scikit-learn DIGITS dataset to 3 features with PCA.

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load DIGITS dataset
DIGITS = datasets.load_digits()
X_digits, Y_digits = DIGITS.data, DIGITS.target
print('DIGITS Dataset Size:', X_digits.shape, Y_digits.shape)

# Initialize PCA and fit the data
pca_3 = decomposition.PCA(n_components=3)
pca_3.fit(X_digits)

# Transforming DIGITS data to new dimensions (with 3 features)
X_digits_pca3 = pca_3.transform(X_digits)

# Print the shape of the new dataset
print('New Dataset size after transformations:', X_digits_pca3.shape)

Output

It will produce the following output −

DIGITS Dataset Size: (1797, 64) (1797,)
New Dataset size after transformations: (1797, 3)
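The effect of the reduction on classification accuracy can be checked by training the same classifier on the full 64 features and on the PCA-reduced features. The following is a rough sketch under several assumptions that are not in the original article: a k-nearest-neighbours classifier with 5 neighbours, a 75/25 stratified split with random_state=0, svd_solver="full", and a hypothetical helper named knn_accuracy. PCA is placed inside a pipeline so that it is fitted on the training split only.

```python
from sklearn import datasets, decomposition
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X_digits, Y_digits = datasets.load_digits(return_X_y=True)

# Hold out a test split (the split settings are assumptions for this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, Y_digits, test_size=0.25, random_state=0, stratify=Y_digits
)

def knn_accuracy(n_components=None):
    """Test accuracy of k-NN, optionally preceded by PCA with n_components."""
    steps = []
    if n_components is not None:
        steps.append(decomposition.PCA(n_components=n_components, svd_solver="full"))
    steps.append(KNeighborsClassifier(n_neighbors=5))
    model = make_pipeline(*steps)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

acc_full = knn_accuracy()   # all 64 features
acc_3 = knn_accuracy(3)     # 3 PCA features
acc_2 = knn_accuracy(2)     # 2 PCA features
print('Accuracy with 64 features:', acc_full)
print('Accuracy with 3 PCA features:', acc_3)
print('Accuracy with 2 PCA features:', acc_2)
```

The exact numbers depend on the classifier and the split, so they are best read as an indication of the trade-off rather than a benchmark.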
raja
Updated on 04-Oct-2022 08:35:06
