How can a new column be added to an existing dataframe in Python?


A dataframe is a two-dimensional data structure in which data is stored in tabular form, as rows and columns.

It can be visualized as an SQL table or an Excel sheet. It can be created using the following constructor −

pd.DataFrame(data, index, columns, dtype, copy)
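
As a minimal sketch of how these parameters fit together, the call below passes all five by keyword. The sample values and the names 'rows', 'r1', 'r2', 'x' and 'y' are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data: two rows, two columns.
rows = [[1, 2.5], [3, 4.5]]

df = pd.DataFrame(
    data=rows,            # the values to store
    index=['r1', 'r2'],   # row labels
    columns=['x', 'y'],   # column labels
    dtype=float,          # force a single dtype for all columns
    copy=True,            # copy the input instead of sharing it
)
print(df)
```

In practice most of these arguments are optional; 'data' alone is often enough, and pandas then generates a default integer index.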

A new column can be added to a dataframe in different ways.
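
To illustrate a few of these ways, here is a short sketch using plain list assignment, 'assign' and 'insert'. The column names 'from_list', 'doubled' and 'first' are hypothetical and chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'ab': [1, 8, 7]}, index=['a', 'b', 'c'])

# 1. Assign a list: values are matched by position, so the list
#    must have exactly as many items as the dataframe has rows.
df['from_list'] = [10, 20, 30]

# 2. assign() returns a new dataframe and leaves the original untouched.
df2 = df.assign(doubled=df['ab'] * 2)

# 3. insert() adds the column in place at a chosen position (0 = first).
df.insert(0, 'first', ['x', 'y', 'z'])

print(df)
print(df2)
```

Which one fits best depends on whether the original dataframe should be modified in place and whether the column position matters.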

Let us see one of these ways: a Series data structure is created first and is then passed to the existing dataframe as an additional column.

Let us see the code in action −

Example

import pandas as pd

# Two Series with different indexes; pandas aligns them by label.
my_data = {'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
           'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd'])}
my_df = pd.DataFrame(my_data)
print("The dataframe is :")
print(my_df)
print("Adding a new column to the dataframe by passing it as a Series structure :")
# Assigning a Series to a new key adds it as a column, aligned on the index.
my_df['ef'] = pd.Series([56, 78, 32], index=['a', 'b', 'c'])
print("After adding a new column to the dataframe, :")
print(my_df)

Output

The dataframe is :
    ab  cd
a  1.0   1
b  8.0   2
c  7.0   0
d  NaN   9
Adding a new column to the dataframe by passing it as a Series structure :
After adding a new column to the dataframe, :
    ab  cd    ef
a  1.0   1  56.0
b  8.0   2  78.0
c  7.0   0  32.0
d  NaN   9   NaN

Explanation

  • The pandas library is imported under the alias ‘pd’ for ease of use.

  • A dictionary is created in which each key is a column name and each value is a Series data structure.

  • Each Series is given its own customized index of labels.

  • This dictionary is passed as a parameter to the ‘DataFrame’ constructor of the ‘pandas’ library.

  • The index of the dataframe is the union of the Series indexes; where a column has no value for a label, NaN is stored.

  • A new Series is created with its own values and index.

  • This Series is assigned to the dataframe under the new column name ‘ef’.

  • The values are aligned to the rows of the dataframe by index label, and this way the new column gets bound to the dataframe.

  • The dataframe is printed on the console.
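
The label alignment described in the steps above can be sketched as follows. The Series here is deliberately out of order and carries a label 'z' that the dataframe does not have; the names 'new_col', 'gh' and 'length_error' are hypothetical and used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'cd': [1, 2, 0, 9]}, index=['a', 'b', 'c', 'd'])

# Labels are out of order, 'b' and 'd' are missing, and 'z' is not in df.
new_col = pd.Series([32, 56, 99], index=['c', 'a', 'z'])
df['ef'] = new_col   # matched by label, not by position
print(df)

# A plain list has no labels, so its length must match the row count.
length_error = False
try:
    df['gh'] = [1, 2, 3]   # 3 values for 4 rows
except ValueError:
    length_error = True
print("List of wrong length rejected:", length_error)
```

Rows 'b' and 'd' receive NaN because the Series has no value for them, and the value labelled 'z' is dropped because the dataframe has no such row.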

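Note − The word ‘NaN’ stands for ‘Not a Number’. Pandas uses it to mark a missing value, which means that the specific [row, col] position doesn’t have any valid entry. Because NaN is a floating-point value, the columns ‘ab’ and ‘ef’ are displayed as floats.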
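
As a small sketch of how such missing entries might be detected and replaced, the snippet below uses 'isna' and 'fillna'. Filling with 0 is only an assumption made for illustration; the right replacement value depends on the data.

```python
import pandas as pd

df = pd.DataFrame({'cd': [1, 2, 0, 9]}, index=['a', 'b', 'c', 'd'])
df['ef'] = pd.Series([56, 78, 32], index=['a', 'b', 'c'])

# True wherever the new column has no valid entry.
missing = df['ef'].isna()
print(missing)

# Replace NaN with 0 (an arbitrary choice) and convert back to integers.
filled = df['ef'].fillna(0).astype(int)
print(filled)
```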

raja
Updated on 10-Dec-2020 12:55:27
