Create a Pandas DataFrame from lists


A Pandas DataFrame is a two-dimensional, mutable table of rows and columns: its values, rows, and columns can be added, removed, or changed after it is created. Creating a DataFrame from lists is a common task in data science. A list is an ordered collection of elements and one of the most commonly used data structures in Python. It can store values of any type, such as numbers, strings, and Boolean values.

In this article, I explain step by step how to create a Pandas DataFrame from lists, with code snippets and an explanation of each step.

What are the key differences between a dataframe and a list?

A list is a basic data structure in Python that can hold a collection of elements of any data type, while a dataframe is a two-dimensional table-like structure, similar to a spreadsheet or SQL table, that stores data in rows and columns. Here are some key differences between a dataframe and a list −

  • Structure − A list is a simple, one-dimensional collection of values, while a dataframe is a two-dimensional table-like structure that has rows and columns.

  • Data types − A list can hold elements of any data type, including numbers, strings, and even other lists, while a dataframe is designed to hold data in a tabular format, with columns of specific data types, such as integers, floats, and strings.

  • Size − Both a list and a dataframe can hold as many elements as fit in memory, but a dataframe is built to handle large tabular datasets efficiently, potentially with millions of rows.

  • Operations − A list supports basic operations such as indexing, slicing, and appending, while a dataframe supports more complex operations such as filtering, joining, and grouping.

  • Data manipulation − A list provides basic functionality for data manipulation, while a dataframe provides powerful tools for data manipulation, such as filtering, sorting, and aggregating data based on specific criteria.
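To make the contrast concrete, here is a minimal sketch that filters the same values once as a plain list and once as a dataframe column. The sample ages are the ones used later in this article; the variable names `over_30_list` and `over_30_df` are only illustrative.

```python
import pandas as pd

# Sample values (the same ages used later in this article).
ages = [32, 25, 41, 29, 36]

# With a list, filtering needs an explicit loop or comprehension.
over_30_list = [age for age in ages if age > 30]

# With a dataframe, a boolean mask selects the matching rows.
df = pd.DataFrame({'Age': ages})
over_30_df = df[df['Age'] > 30]

print(over_30_list)
print(over_30_df)
```

The dataframe result keeps the original row labels, which is one reason filtering and later joining tend to be easier on a dataframe than on a bare list.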

Prerequisites

Before we dive into the task, a few things should be set up on your system −

Recommended setup −

  • pip install pandas bokeh (only Pandas is used in the examples below)

  • Access to a standalone IDE or editor such as VS Code, PyCharm, Atom, or Sublime Text.

  • Online Python environments such as Kaggle.com or Google Cloud Platform can also be used.

  • An up-to-date version of Python. At the time of writing, I used version 3.10.9.

  • Familiarity with Jupyter Notebook.

  • Knowledge of virtual environments is beneficial but not required.

  • It is also expected that the person will have a good understanding of statistics and mathematics.

Steps required

Importing Libraries

To create a DataFrame in Pandas, we need to import the Pandas library. The following code is used to import the Pandas library −

import pandas as pd

Creating Lists

Before we can create a DataFrame using lists, we first need to create lists to store the data. In this section, I will show you how to create lists with real-world examples using simple data.

Creating a List of Names

names = ['John', 'Mary', 'Peter', 'Jane', 'Daniel']

In the code snippet above, we created a list called `names` that contains five string values representing the names of individuals.

Creating a List of Ages

ages = [32, 25, 41, 29, 36]

In the code snippet above, we created a list called `ages` that contains five integer values representing the ages of individuals.

Creating a List of Boolean Values

current_status = [True, False, True, False, True]

In the code snippet above, we created a list called `current_status` that contains five Boolean values representing the current status of individuals.
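Because `zip()`, which is used in the next step, stops at the shortest input, it can be worth confirming that all lists have the same length before combining them. The check below is a small sketch; the names `lengths` and `same_length` are hypothetical helpers, not part of Pandas.

```python
names = ['John', 'Mary', 'Peter', 'Jane', 'Daniel']
ages = [32, 25, 41, 29, 36]
current_status = [True, False, True, False, True]

# Record the length of each list under a readable label.
lengths = {
    'names': len(names),
    'ages': len(ages),
    'current_status': len(current_status),
}

# All lists line up only if there is exactly one distinct length.
same_length = len(set(lengths.values())) == 1

print(lengths)
print(same_length)
```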

Creating a DataFrame from Lists

Once we have the lists containing the data, we can use the `pd.DataFrame()` constructor to create a DataFrame. Each list should become one column, so we first combine the lists row by row with `zip()` and pass the result to `pd.DataFrame()`. The following code creates a DataFrame from the lists −

df = pd.DataFrame(list(zip(names, ages, current_status)), columns=['Name', 'Age', 'Current_Status'])

In the code snippet above, `zip()` pairs up the elements of the three lists position by position, and `list()` turns the result into a list of tuples, one tuple per row. We then passed this list of tuples as the first argument to `pd.DataFrame()`.
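To see what `pd.DataFrame()` actually receives, the intermediate list of tuples can be inspected on its own. This is only an illustration; the variable name `rows` is not used elsewhere in the article.

```python
names = ['John', 'Mary', 'Peter', 'Jane', 'Daniel']
ages = [32, 25, 41, 29, 36]
current_status = [True, False, True, False, True]

# Each tuple holds one value from every list, i.e. one future row.
rows = list(zip(names, ages, current_status))

print(rows[0])    # ('John', 32, True)
print(len(rows))  # 5
```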

The `columns` keyword argument takes a list of column names for the DataFrame. In this case, we used `columns=['Name', 'Age', 'Current_Status']` to name the columns `Name`, `Age` and `Current_Status`, in the same order as the values in each tuple.
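As an alternative to `zip()`, `pd.DataFrame()` also accepts a dictionary that maps column names to lists. This is not the approach used in this article; the sketch below assumes the same three lists and simply compares the two results.

```python
import pandas as pd

names = ['John', 'Mary', 'Peter', 'Jane', 'Daniel']
ages = [32, 25, 41, 29, 36]
current_status = [True, False, True, False, True]

# Approach used in this article: rows built with zip().
df_from_zip = pd.DataFrame(
    list(zip(names, ages, current_status)),
    columns=['Name', 'Age', 'Current_Status'],
)

# Alternative: each dictionary key becomes a column name.
df_from_dict = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Current_Status': current_status,
})

# Compare the values, shape, and dtypes of the two DataFrames.
print(df_from_zip.equals(df_from_dict))
```

With a dictionary, lists of different lengths raise an error instead of being silently shortened, which can make mistakes easier to spot.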

Viewing the DataFrame

After creating the DataFrame, we can use the `.head()` method to view its first few rows (five by default). The following code displays them −

print(df.head())

Since `.head()` returns the first five rows by default and our DataFrame has exactly five rows, the whole table is printed.

Output

     Name  Age  Current_Status
0    John   32            True
1    Mary   25           False
2   Peter   41            True
3    Jane   29           False
4  Daniel   36            True

The output above shows the DataFrame created from the three lists.
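Two quick follow-up checks are often useful at this point: passing a number to `.head()` to limit the rows shown, and looking at `df.dtypes` to see which data type Pandas inferred for each column. The sketch below rebuilds the same DataFrame so that it runs on its own; the name `first_two` is only illustrative.

```python
import pandas as pd

names = ['John', 'Mary', 'Peter', 'Jane', 'Daniel']
ages = [32, 25, 41, 29, 36]
current_status = [True, False, True, False, True]

df = pd.DataFrame(
    list(zip(names, ages, current_status)),
    columns=['Name', 'Age', 'Current_Status'],
)

# head(n) returns only the first n rows.
first_two = df.head(2)
print(first_two)

# dtypes lists the inferred data type of each column.
print(df.dtypes)
```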

Conclusion

In this article, I provided a step-by-step guide to creating a Pandas DataFrame from lists. I demonstrated how to import the Pandas library, create lists, and build a DataFrame from them with `zip()` and the `pd.DataFrame()` constructor. I also showed how to view the first few rows of the DataFrame using the `.head()` method. By following these steps, you should now be able to create a Pandas DataFrame from your own lists.

Updated on: 25-Apr-2023
