How can a new column be added to a dataframe using the already present columns in Python?


A dataframe is a two-dimensional data structure in which data is stored in a tabular format, as rows and columns. It can be visualized as an SQL table or an Excel sheet.

It can be created using the following constructor −

pd.DataFrame(data, index, columns, dtype, copy)
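As a rough sketch of how these constructor parameters fit together, the snippet below builds a small dataframe from a nested list. The row labels 'r1', 'r2' and the column labels 'x', 'y' are made-up names used only for illustration.

```python
import pandas as pd

# data: the values, index: row labels, columns: column labels,
# dtype: the data type to force on all the columns
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],
    index=['r1', 'r2'],      # hypothetical row labels
    columns=['x', 'y'],      # hypothetical column labels
    dtype=float,
)
print(df)
```

All parameters are optional; when index or columns is left out, pandas falls back to default integer labels.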

We previously saw a method in which a new column was created as a Series data structure. This Series was assigned to the original dataframe under a new column label, and hence got added to the dataframe.

Let us see how we can create a column using the already present columns of the dataframe. This is useful when we need to perform some computation on the existing columns and store the result in a new column −

Example


import pandas as pd

# Each dictionary value is a Series; 'cd' has an extra index label 'd'
my_data = {'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
           'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
           'ef': pd.Series([56, 78, 32], index=['a', 'b', 'c'])}
my_df = pd.DataFrame(my_data)
print("The dataframe is :")
print(my_df)

# Create a new column 'gh' as the element-wise sum of columns 'ab' and 'ef'
my_df['gh'] = my_df['ab'] + my_df['ef']
print("After adding column 0 and 2 to the dataframe, :")
print(my_df)

Output

The dataframe is :
    ab  cd    ef
a  1.0   1  56.0
b  8.0   2  78.0
c  7.0   0  32.0
d  NaN   9   NaN
After adding column 0 and 2 to the dataframe, :
    ab  cd    ef    gh
a  1.0   1  56.0  57.0
b  8.0   2  78.0  86.0
c  7.0   0  32.0  39.0
d  NaN   9   NaN   NaN

Explanation

  • The pandas library is imported and given the alias ‘pd’ for ease of use.

  • A dictionary of key-value pairs is created, wherein each value is a Series data structure.

  • Three such entries are created, one per column: ‘ab’, ‘cd’ and ‘ef’.

  • The dataframe is created by passing this dictionary as a parameter to the ‘DataFrame’ constructor of the ‘pandas’ library.

  • A new column ‘gh’ is assigned to the dataframe; its values are the row-wise sum of the 0th column (‘ab’) and the 2nd column (‘ef’).

  • The dataframe is printed on the console.
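The same steps can also be sketched with DataFrame.assign, which returns a new dataframe and leaves the original untouched. This is an alternative to the bracket assignment used in the example, not the method the example itself uses; the name new_df is chosen here only for illustration.

```python
import pandas as pd

my_df = pd.DataFrame({
    'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
    'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
    'ef': pd.Series([56, 78, 32], index=['a', 'b', 'c']),
})

# assign() builds a copy with the extra column; my_df itself is not modified
new_df = my_df.assign(gh=my_df['ab'] + my_df['ef'])
print(new_df)
```

This form is convenient when the original dataframe must stay unchanged, or when several steps are chained together.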

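Note − ‘NaN’ stands for ‘Not a Number’, which means that the specific [row, col] position doesn’t have any valid entry. Here, row ‘d’ has no value in the ‘ab’ and ‘ef’ columns, so their sum in the ‘gh’ column is also NaN.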
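To illustrate how missing entries behave in a computed column, here is a minimal sketch assuming a smaller dataframe with only the ‘ab’ and ‘cd’ columns. The column names sum_plain and sum_filled are hypothetical. Series.isna marks the missing entries, and Series.add with fill_value=0 treats a value missing on one side as 0.

```python
import pandas as pd

df = pd.DataFrame({
    'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
    'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
})

# True where 'ab' has no valid entry (only row 'd' here)
missing = df['ab'].isna()

# Plain addition: if either operand is NaN, the result is NaN
df['sum_plain'] = df['ab'] + df['cd']

# add() with fill_value=0: a value missing on one side is treated as 0
df['sum_filled'] = df['ab'].add(df['cd'], fill_value=0)
print(df)
```

Whether filling with 0 is appropriate depends on what a missing entry means in the data; when both operands are missing, the result stays NaN.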

raja
Published on 10-Dec-2020 13:10:38