How To Convert Sklearn Dataset To Pandas Dataframe in Python?

Scikit-learn (sklearn) is one of the most popular machine learning libraries for Python. It provides efficient tools for machine learning and statistical modeling, along with several built-in datasets. The dataset loaders return a Bunch object whose data and target fields are NumPy arrays by default, which can be awkward to work with for tasks such as exploratory data analysis.

Pandas is a popular data manipulation library. Its DataFrame structure stores tabular data with labeled columns, and the library offers a wide range of tools for data cleaning, transformation, and analysis.

Below are the two main approaches to convert a sklearn dataset to a pandas DataFrame:

Using pd.DataFrame() directly: Pass the data array and feature names from the sklearn Bunch object straight to the pd.DataFrame() constructor.
Using pd.DataFrame.from_records(): Load the dataset and convert the data into a list of records before creating a pandas DataFrame.

Method 1: Using pd.DataFrame() Directly

This is the most straightforward approach to convert sklearn datasets to pandas DataFrames:

Example

from sklearn.datasets import load_iris
import pandas as pd

# Load the iris dataset from sklearn
iris = load_iris()

# Convert the iris dataset to a pandas dataframe
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add the target variable to the dataframe
df['target'] = iris.target

# Print the first 5 rows of the dataframe
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0

How It Works

Load the iris dataset using the load_iris() function
Create a pandas DataFrame using iris.data as the data and iris.feature_names as the column names
Add the target variable as a new column using iris.target
Display the first 5 rows using the head() method
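
The target column added in the steps above holds integer class codes. As a minimal sketch, the codes can be mapped to readable labels through iris.target_names, which the Bunch also carries. The column name species is an assumption chosen for illustration and is not part of the dataset:

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()

# Build the DataFrame exactly as in Method 1
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# iris.target_names is an array of class labels; indexing it with the
# integer codes yields one label per row ("species" is an illustrative name)
df['species'] = iris.target_names[iris.target]

print(df[['target', 'species']].drop_duplicates())
```

Keeping both columns lets you use the numeric codes for modeling and the labels for plots and summaries.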

Method 2: Using pd.DataFrame.from_records()

This method converts the data to records first, then creates the DataFrame:

Example

from sklearn.datasets import load_iris
import pandas as pd

# Load the iris dataset from sklearn
iris = load_iris()

# Convert the dataset to a list of tuples
data = [tuple(row) for row in iris.data]

# Create a Pandas DataFrame from the list of tuples
df = pd.DataFrame.from_records(data, columns=iris.feature_names)

# Add the target variable to the DataFrame
df['target'] = iris.target

# Show the first five rows of the DataFrame
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0
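
Both methods print the same first rows. A quick way to check that the full frames match is sketched below, using pd.testing.assert_frame_equal, which raises an AssertionError if values, dtypes, or labels differ. The variable names df_direct and df_records are chosen for illustration:

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()

# Method 1: pass the NumPy array straight to the constructor
df_direct = pd.DataFrame(iris.data, columns=iris.feature_names)

# Method 2: convert each row to a tuple, then build from records
records = [tuple(row) for row in iris.data]
df_records = pd.DataFrame.from_records(records, columns=iris.feature_names)

# Raises AssertionError if the two frames differ in any way
pd.testing.assert_frame_equal(df_direct, df_records)
print("Both methods produced identical DataFrames")
```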

Using Different Sklearn Datasets

You can apply the same approach to other sklearn datasets:

from sklearn.datasets import load_wine
import pandas as pd

# Load wine dataset
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['target'] = wine.target

print("Wine dataset shape:", wine_df.shape)
print("Wine columns:", wine_df.columns.tolist()[:5])  # Show first 5 columns

Wine dataset shape: (178, 14)
Wine columns: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium']
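
Since the conversion steps are the same for every dataset, they can be wrapped in a small helper. The function below is a sketch, and bunch_to_frame is a hypothetical name, not a scikit-learn or pandas function. It assumes the Bunch exposes data, target, and feature_names, which holds for the classic loaders such as load_wine and load_breast_cancer:

```python
from sklearn.datasets import load_wine, load_breast_cancer
import pandas as pd


def bunch_to_frame(bunch, target_name='target'):
    """Hypothetical helper: build a DataFrame from a sklearn Bunch.

    Assumes the Bunch has data, target, and feature_names attributes.
    """
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    df[target_name] = bunch.target
    return df


wine_df = bunch_to_frame(load_wine())
cancer_df = bunch_to_frame(load_breast_cancer())

print("Wine dataset shape:", wine_df.shape)
print("Breast cancer dataset shape:", cancer_df.shape)
```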

Comparison

Method	Performance	Code Simplicity	Best For
`pd.DataFrame()`	Faster (uses the NumPy array as is)	Simple	Most use cases
`pd.DataFrame.from_records()`	Slower (builds a Python tuple per row first)	More complex	Data already in record form, or rows that need preprocessing
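
The performance column can be checked on your own machine. The sketch below times both approaches with the standard timeit module. The function names and the repeat count of 200 are arbitrary choices for illustration, and the absolute numbers will vary by hardware and library versions:

```python
import timeit

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()


def build_direct():
    # Method 1: constructor receives the NumPy array directly
    return pd.DataFrame(iris.data, columns=iris.feature_names)


def build_from_records():
    # Method 2: every row is first converted to a Python tuple
    rows = [tuple(row) for row in iris.data]
    return pd.DataFrame.from_records(rows, columns=iris.feature_names)


# Total time in seconds for 200 calls of each function
t_direct = timeit.timeit(build_direct, number=200)
t_records = timeit.timeit(build_from_records, number=200)

print(f"pd.DataFrame():              {t_direct:.4f} s for 200 runs")
print(f"pd.DataFrame.from_records(): {t_records:.4f} s for 200 runs")
```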

Conclusion

Converting sklearn datasets to pandas DataFrames is straightforward using pd.DataFrame() directly. Passing feature_names as the columns keeps the feature labels, and the target variable can be added as one more column. Use pd.DataFrame.from_records() only when your data is already a sequence of records or needs row-level preprocessing first.
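
As a related aside, and assuming scikit-learn 0.23 or later, the loaders accept an as_frame=True argument that makes them return pandas objects themselves. The sketch below shows it for the iris dataset, where the combined features and target are available on the frame attribute:

```python
from sklearn.datasets import load_iris

# as_frame=True (assumes scikit-learn 0.23 or later) returns pandas objects
iris = load_iris(as_frame=True)

# iris.frame holds the feature columns plus a 'target' column
df = iris.frame

print(df.shape)
print(df.columns.tolist())
```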

Mukul Latiyan

Updated on: 2026-03-27T11:03:36+05:30
