How To Convert Sklearn Dataset To Pandas Dataframe in Python?
Scikit-learn (sklearn) is one of the most popular machine learning libraries for Python. It provides efficient tools for machine learning and statistical modelling, and it ships with several built-in datasets. Its loader functions return these datasets as Bunch objects, dictionary-like containers that hold the data as NumPy arrays, which can be awkward to work with for tasks such as exploratory data analysis.
Pandas is a popular data manipulation library for Python. It offers data structures for storing large, labelled datasets efficiently, along with a wide range of tools for data cleaning, transformation, and analysis.
Below are two approaches we can use to convert a sklearn dataset to a pandas dataframe.
Using the sklearn Bunch object with pd.DataFrame(): In this approach, we pass the data array and feature names stored in the sklearn Bunch object directly to the pd.DataFrame() constructor.
Using the load_iris() method with pd.DataFrame.from_records(): In this approach, we load the iris dataset using the load_iris() method provided by sklearn, convert its rows into a list of tuples, and build the pandas dataframe from those records.
Now that we know both approaches, let's look at each one with an example.
Using sklearn Bunch object
Consider the code shown below.
Example
    from sklearn.datasets import load_iris
    import pandas as pd

    # Load the iris dataset from sklearn
    iris = load_iris()

    # Convert the iris dataset to a pandas dataframe
    df = pd.DataFrame(iris.data, columns=iris.feature_names)

    # Add the target variable to the dataframe
    df['target'] = iris.target

    # Print the first 5 rows of the dataframe
    print(df.head())
Explanation
First, we import the load_iris function from the sklearn.datasets module and the pandas library.
Then, we load the iris dataset into the iris variable using the load_iris() function.
We create a pandas dataframe df using the iris data and feature names. Here, we pass iris.data as the data and iris.feature_names as the columns parameter in the pd.DataFrame() method.
Next, we add the target variable to the pandas dataframe using iris.target and assign it to a new column target in the dataframe df.
Finally, we print the first 5 rows of the pandas dataframe df using the head() method.
Output
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
    0                5.1               3.5                1.4               0.2       0
    1                4.9               3.0                1.4               0.2       0
    2                4.7               3.2                1.3               0.2       0
    3                4.6               3.1                1.5               0.2       0
    4                5.0               3.6                1.4               0.2       0
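The same steps work for other bundled datasets, so they can be wrapped in a small reusable function. The sketch below is one possible way to do that; the function name bunch_to_dataframe and its target_column parameter are hypothetical names chosen for illustration, and it assumes the Bunch exposes data, feature_names, and target attributes, as the iris and wine datasets do.

```python
import pandas as pd
from sklearn.datasets import load_wine


def bunch_to_dataframe(bunch, target_column="target"):
    """Build a DataFrame from a Bunch with data, feature_names and target.

    Hypothetical helper for illustration; not part of sklearn or pandas.
    """
    # Feature matrix becomes the body, feature names become the column labels
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    # Append the labels as one extra column
    df[target_column] = bunch.target
    return df


# Try the helper on a different bundled dataset
wine_df = bunch_to_dataframe(load_wine())
print(wine_df.shape)
```

Datasets that lack a feature_names attribute would need the column names supplied some other way, so this helper is not universal.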
Using load_iris() method
Consider the code shown below.
Example
    from sklearn.datasets import load_iris
    import pandas as pd

    # Load the iris dataset from sklearn
    iris = load_iris()

    # Convert the dataset to a list of tuples
    data = [tuple(row) for row in iris.data]

    # Create a Pandas DataFrame from the list of tuples
    df = pd.DataFrame.from_records(data, columns=iris.feature_names)

    # Add the target variable to the DataFrame
    df['target'] = iris.target

    # Show the first five rows of the DataFrame
    print(df.head())
Explanation
Import the required libraries: We first import the load_iris function from the sklearn.datasets module and the pandas library.
Load the dataset using the load_iris function: We use the load_iris function to load the iris dataset into a variable called iris.
Convert the data into a list of tuples: We turn each row of iris.data into a tuple and collect the tuples in a list called data.
Create the dataframe with the feature names as column names: We pass the list of tuples to the pd.DataFrame.from_records() function and set the column names using the feature_names attribute of the iris dataset.
Add the target variable to the dataframe: We add the target variable to the dataframe by creating a new column called "target" and setting its values to the target attribute of the iris dataset.
Display the first few rows of the dataframe: We use the head() function to display the first five rows of the newly created Pandas dataframe.
Output
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
    0                5.1               3.5                1.4               0.2       0
    1                4.9               3.0                1.4               0.2       0
    2                4.7               3.2                1.3               0.2       0
    3                4.6               3.1                1.5               0.2       0
    4                5.0               3.6                1.4               0.2       0
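Since both examples print the same first five rows, it can be worth confirming that the two approaches agree over the whole dataset. The following is a minimal sketch of such a check; the variable names df_direct, df_records, same_columns, and same_values are chosen here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Approach 1: pass the NumPy array straight to the DataFrame constructor
df_direct = pd.DataFrame(iris.data, columns=iris.feature_names)
df_direct["target"] = iris.target

# Approach 2: go through a list of tuples and from_records()
records = [tuple(row) for row in iris.data]
df_records = pd.DataFrame.from_records(records, columns=iris.feature_names)
df_records["target"] = iris.target

# Compare column labels and every cell value
same_columns = list(df_direct.columns) == list(df_records.columns)
same_values = np.array_equal(df_direct.to_numpy(), df_records.to_numpy())
print(same_columns, same_values)
```

The first approach skips the intermediate Python list, so it is usually the simpler choice when the data is already a NumPy array.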
Conclusion
In conclusion, converting a sklearn dataset to a Pandas dataframe is a simple process that can be done in more than one way. Whether you pass the data array directly to pd.DataFrame() or build the dataframe with pd.DataFrame.from_records(), the resulting Pandas dataframe can be easily manipulated and analysed with the wider set of data science libraries in Python.
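As a further option, scikit-learn itself can return the data as pandas objects. The sketch below assumes scikit-learn 0.23 or later, where load_iris() accepts an as_frame argument and the returned Bunch carries a frame attribute that combines the features and the target.

```python
from sklearn.datasets import load_iris

# Ask sklearn to return pandas objects instead of NumPy arrays
# (assumes scikit-learn 0.23 or later)
iris = load_iris(as_frame=True)

# The frame attribute holds the feature columns plus a "target" column
df = iris.frame

# Show the first five rows of the DataFrame
print(df.head())
```

With this option there is no need to attach the target column manually.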