Convert a NumPy array to Pandas dataframe with headers


Both pandas and NumPy are widely used open-source libraries in Python. NumPy stands for Numerical Python and is the core library for scientific computing. A NumPy array is an N-dimensional array object; a two-dimensional array arranges its values in rows and columns.

NumPy array
array([[1, 2], [3, 4]])
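As a quick illustration of the rows-and-columns layout, the sketch below inspects the same 2x2 array. The variable name arr is chosen only for this example.

```python
import numpy as np

# The same 2x2 array shown above; "arr" is just an illustrative name
arr = np.array([[1, 2], [3, 4]])

print(arr.ndim)   # number of dimensions
print(arr.shape)  # (number of rows, number of columns)
print(arr.dtype)  # element type; the exact integer type depends on the platform
```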

Pandas provides high-performance data manipulation and analysis tools in Python. It lets us work with tabular data such as spreadsheets, CSV files, and SQL tables. Its main data structures, DataFrame and Series, are used for analyzing data.

A DataFrame is a two-dimensional labeled data structure that represents data in rows and columns. Each column can hold a different data type.

DataFrame:
  Col1 Col2
0    a   i
1    b   j
2    c   k
3    d   l
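To make the point about per-column data types concrete, here is a minimal sketch. The numeric values in Col2 are made up for illustration and are not taken from the table above.

```python
import pandas as pd

# Col1 holds text, Col2 holds floats; the Col2 values are hypothetical
df = pd.DataFrame({
    "Col1": ["a", "b", "c", "d"],
    "Col2": [1.5, 2.5, 3.5, 4.5],
})

# Each column reports its own data type
print(df.dtypes)
```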

In this article, we will convert a NumPy array to a Pandas DataFrame with headers.

Input Output Scenarios

Let’s see the input-output scenarios to understand how to convert a NumPy array to a Pandas dataframe.

Assume we have a two-dimensional NumPy array with a few values. In the output, we will see a DataFrame with column names.

Input numpy array:
[[1 2]
 [3 4]]

Output DataFrame:
   header1  header2
0        1        2
1        3        4

To create a Pandas DataFrame from a NumPy array with headers, we can use the pandas DataFrame() method. Its columns parameter lets us specify the column headers while creating the DataFrame object.

Using the DataFrame() method

The pandas.DataFrame() method creates a DataFrame object from the given data. Following is the syntax:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Where,

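  • data − The values for the DataFrame: a NumPy array, iterable, dict, or another DataFrame.

  • index − The row labels. If omitted and the data carries no labels of its own, pandas uses a default integer index from 0 to n-1.

  • columns − The column labels (headers). If omitted and the data carries no labels of its own, the columns are numbered from 0 to n-1.

  • dtype − The data type to force on the values. The default is None, in which case the type is inferred from the data.

  • copy − Whether to copy the data from the input. The default is None.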
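The parameters above can be sketched as follows. This is a minimal illustration rather than a complete tour of the API; the labels row1 and row2 are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# With no labels given, both axes get default integer labels (0 to n-1)
default_df = pd.DataFrame(arr)

# Hypothetical row labels and headers, plus an explicit dtype
labeled_df = pd.DataFrame(
    arr,
    index=["row1", "row2"],
    columns=["header1", "header2"],
    dtype=float,  # cast the integer values to float64
)

print(default_df)
print(labeled_df)
```

Comparing the two printed frames shows how the index and columns arguments replace the default 0 to n-1 labels.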

Example

In this example, we will create a pandas DataFrame with headers from a NumPy array. First, we create a 2-D NumPy array using the np.array() method, then we convert it to a pandas DataFrame.

import numpy as np
import pandas as pd

# Creating a 2 dimensional numpy array
numpy_array = np.array([[1, 2], [3, 4]])
print("Input numpy array:")
print(numpy_array)

# Convert NumPy array to DataFrame
df = pd.DataFrame(numpy_array, columns = ['header1', 'header2'])
print("Output DataFrame:")
print(df)

Output

Input numpy array:
[[1 2]
 [3 4]]

Output DataFrame:
   header1  header2
0        1        2
1        3        4

By assigning a list of names to the columns parameter of the DataFrame() method, we can specify the headers.
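One thing to watch for is that the number of header names must match the number of columns in the array. The sketch below, which uses a hypothetical third name header3, shows that pandas raises a ValueError when the counts differ.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])  # two columns

try:
    # Three names for a two-column array; "header3" is a hypothetical extra name
    pd.DataFrame(arr, columns=["header1", "header2", "header3"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("Could not build DataFrame:", exc)
```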

Example

Here, we will pass a list of header names to the columns parameter of the DataFrame() method.

import numpy as np
import pandas as pd

# Creating a 2-dimensional NumPy array of random values
numpy_array = np.random.randn(10, 3)
print("Input numpy array:")
print(numpy_array)

headers = ['Acol','Ccol','Bcol']

# Convert NumPy array to DataFrame
df = pd.DataFrame(numpy_array, columns = headers)
print("Output DataFrame:")
print(df)

Output

Input numpy array:
[[ 0.51863463 -1.04180497 -0.53410509]
 [-1.67632426 -1.05587564  1.26963293]
 [ 0.1904154   1.89355907 -0.7596976 ]
 [-1.20464873 -0.45258193 -0.17936747]
 [ 0.17513833  0.78481916 -1.52235579]
 [-1.38108854  0.28470621  0.52897571]
 [-0.62921794  0.95548506  0.03370699]
 [ 0.30533368 -0.09951884  0.38484346]
 [ 0.06951039  0.94497233  0.82353788]
 [ 0.82560537  2.10383935  0.52618909]]
Output DataFrame:
       Acol      Ccol      Bcol
0  0.518635 -1.041805 -0.534105
1 -1.676324 -1.055876  1.269633
2  0.190415  1.893559 -0.759698
3 -1.204649 -0.452582 -0.179367
4  0.175138  0.784819 -1.522356
5 -1.381089  0.284706  0.528976
6 -0.629218  0.955485  0.033707
7  0.305334 -0.099519  0.384843
8  0.069510  0.944972  0.823538
9  0.825605  2.103839  0.526189

The NumPy array is first created from random numbers, and then it is converted into a DataFrame with column labels. Because the values are random, the numbers you see will differ on each run.

Example

In this example, we will create a DataFrame from a dictionary that maps each header to a column sliced from the NumPy array.

# importing packages
import numpy as np
import pandas as pd

# Creating a 2 dimensional numpy array
numpy_array = np.array([[5.8, 2.8], [6.0, 2.2]])
print("Input numpy array:")
print(numpy_array)

# Convert NumPy array to DataFrame
df = pd.DataFrame({'Column1': numpy_array[:, 0], 'Column2': numpy_array[:, 1]})
print("Output DataFrame:")
print(df)

Output

Input numpy array:
[[5.8 2.8]
 [6.  2.2]]
Output DataFrame:
   Column1  Column2
0      5.8      2.8
1      6.0      2.2
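Slicing each column by hand becomes tedious for wider arrays. One possible shortcut, sketched here as an alternative and not as part of the example above, is to pair a list of headers with the transposed array. The headers list is an assumed helper variable.

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[5.8, 2.8], [6.0, 2.2]])
headers = ["Column1", "Column2"]  # assumed helper list, same names as above

# Transposing makes iteration yield columns instead of rows,
# so zip() pairs each header with its column of values
df = pd.DataFrame(dict(zip(headers, numpy_array.T)))
print(df)
```

This should produce the same DataFrame as the explicit slicing version.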

We have successfully created a pandas DataFrame from the numpy array with headers.

Updated on: 30-May-2023
