Difference Between Shallow Copy and Deep Copy in a Pandas DataFrame


One of the most useful data structures in Pandas is the DataFrame, a 2-dimensional, table-like structure that stores data in rows and columns. It lets users store and manipulate data much as they would in a spreadsheet or an SQL table.

Pandas also provides the Series, a 1-dimensional labelled array that can hold elements of any data type.

Shallow Copy

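A shallow copy, as the name suggests, creates a new DataFrame object that references the original data rather than duplicating it. In other words, the shallow copy's values live in the same memory as the original DataFrame's values. In pandas versions before 3.0 (with Copy-on-Write disabled), any modification made to the values of the shallow copy is reflected in the original DataFrame and vice versa, because both objects share references to the same data. From pandas 3.0 onward, Copy-on-Write is the default: a shallow copy still avoids an eager copy, but the data is copied lazily on the first write, so such changes are no longer propagated between the two objects.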

Syntax

pandas.DataFrame.copy(deep=False)

If 'deep=False', a new DataFrame object is created, but the data and index labels are not copied. Instead, both the original and the new DataFrame refer to the same data and index labels.
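
To make the shared data visible, the following minimal sketch checks whether the two DataFrames are the same Python object and whether one column is backed by the same buffer. It assumes NumPy is installed alongside pandas; the variable names and the single numeric 'Age' column are illustrative choices, not part of the article's examples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25, 28, 22]})
shallow = df.copy(deep=False)

# copy(deep=False) returns a new DataFrame object, not the original one.
is_same_object = shallow is df

# The column values of both objects are backed by the same memory buffer
# (nothing has been written to either DataFrame yet).
shares_buffer = np.shares_memory(df["Age"].to_numpy(), shallow["Age"].to_numpy())

print("same object:", is_same_object)
print("shares buffer:", shares_buffer)
```

The check is done on a numeric column before any write, since that is the state in which a shallow copy has not yet had any reason to duplicate its data.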

Deep Copy

A deep copy is a fully independent copy of a DataFrame, including all its data and metadata. In other words, a deep copy creates a brand new DataFrame object with its own memory space, independent of the original DataFrame.

Syntax

pandas.DataFrame.copy(deep=True)

The parameter "deep" is optional and its default value is True. If 'deep=True', a deep copy of the DataFrame is created: a new DataFrame object is created, and all the data and index labels are copied from the original DataFrame to the new one.
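
As a quick illustration of the default, the sketch below calls copy() without arguments and writes to the result. The variable names and the sample values are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 28, 22]})

# No argument given, so deep=True is used and the data is duplicated.
default_copy = df.copy()

# Write to the copy only.
default_copy.loc[0, "Age"] = 85

original_age = int(df.loc[0, "Age"])
copied_age = int(default_copy.loc[0, "Age"])
print("original:", original_age, "copy:", copied_age)
```

The original keeps its value because the deep copy owns a separate copy of the data.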

Example

In this code, we create a DataFrame, make a shallow copy and a deep copy of it, and then modify all three DataFrames to show how a change in one does or does not appear in the others.

Algorithm

  • Import the pandas library.

  • Define a dictionary containing the DataFrame's data.

  • Create a DataFrame df using the data and the pd.DataFrame() method.

  • Using the copy() method with deep=False and deep=True, make a shallow copy and a deep copy of the original DataFrame.

  • Change one value in each of the three DataFrames so that the effect of each change can be observed.

  • Print the original DataFrame, the shallow copy and the deep copy.

  • Print the ID of the original DataFrame, the shallow copy and the deep copy.

Example

import pandas as pd

# Create a DataFrame
data = {'Name': ['Rahul', 'Priya', 'Amit'],
    'Age': [25, 28, 22],
    'City': ['Mumbai', 'Delhi', 'Kolkata']}
df = pd.DataFrame(data)

# Shallow copy
shallow_copy = df.copy(deep=False)

# Deep copy
deep_copy = df.copy(deep=True)

# Modify one value in each of the three DataFrames
df.loc[0, 'Age'] = 30
shallow_copy.loc[1, 'City'] = 'Chennai'
deep_copy.loc[0, 'Age'] = 85

# Print the original DataFrame and both copies
print("Original DataFrame:\n", df)
print("Shallow Copy:\n", shallow_copy)
print("Deep Copy:\n", deep_copy)

print()
print("Original DataFrame ID:", id(df))
print("Shallow Copy ID:", id(shallow_copy))
print("Deep Copy ID:", id(deep_copy))

Output

Original DataFrame:
     Name  Age     City
0  Rahul   30   Mumbai
1  Priya   28  Chennai
2   Amit   22  Kolkata
Shallow Copy:
     Name  Age     City
0  Rahul   30   Mumbai
1  Priya   28  Chennai
2   Amit   22  Kolkata
Deep Copy:
     Name  Age     City
0  Rahul   85   Mumbai
1  Priya   28    Delhi
2   Amit   22  Kolkata

Original DataFrame ID: 140268239802704
Shallow Copy ID: 140269600767952
Deep Copy ID: 140269600767904

Here, changing the age in the first row of the original DataFrame from 25 to 30 is also reflected in the shallow copy, because both share the same data. The deep copy has its own data, so it is unaffected by this change.

Likewise, changing the city in the second row of the shallow copy to "Chennai" affects both the shallow copy and the original DataFrame. The deep copy, on the other hand, is completely independent, so setting the age in its first row to 85 does not affect the original DataFrame.

The IDs show that the original DataFrame and both copies are distinct Python objects. The shallow copy nevertheless shares the underlying data with the original; only the container object and its metadata are new.

Hence we infer that a deep copy is a completely new object with its own data, whereas a shallow copy is a new object that still refers to the data of the original DataFrame.
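
It may help to contrast a shallow copy with plain assignment, which is the only case where two names refer to the very same DataFrame object. The sketch below is illustrative only: the names 'alias' and 'shallow' and the columns 'Flag' and 'Score' are hypothetical and do not appear in the example above.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 28, 22]})

alias = df                     # plain assignment: same object, nothing is copied
shallow = df.copy(deep=False)  # new DataFrame object that refers to the same data

# Adding a column changes the structure of one object, not the shared values.
shallow["Flag"] = True         # hypothetical column, added to the shallow copy only
alias["Score"] = 0             # hypothetical column, added through the alias

print(alias is df, shallow is df)
print(list(df.columns))
print(list(shallow.columns))
```

The column added through the alias appears in df because both names are the same object, while the column added to the shallow copy stays local to it.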

Example

The following code copies a DataFrame of countries and their populations, and shows how a shallow copy and a deep copy differ when changes are made to the original and to each copy.

Algorithm

  • Import the pandas library.

  • Create a dictionary data with 'Country' and 'Population (Millions)' as keys and corresponding values.

  • Create a DataFrame df_original using the dictionary data.

  • Create a shallow copy of df_original and assign it to df_shallow_copy using the copy() method with deep=False.

  • Create a deep copy of df_original and assign it to df_deep_copy using the copy() method with deep=True.

  • Modify the shallow copy and the deep copy by changing the values in specific rows using loc[].

  • Add a new row to the original DataFrame df_original and to the shallow copy df_shallow_copy, reassigning each name to the resulting DataFrame.

  • Print the original dataframe and its shallow and deep copy.

import pandas as pd

# Create a DataFrame
data = {'Country': ['USA', 'Germany', 'Japan'],
    'Population (Millions)': [328, 83, 126]}
df_original = pd.DataFrame(data)

# Shallow copy
df_shallow_copy = df_original.copy(deep=False)

# Deep copy
df_deep_copy = df_original.copy(deep=True)

# Modify the shallow copy 
df_shallow_copy.loc[0, 'Country'] = 'United States Of America'
df_shallow_copy.loc[1, 'Population (Millions)'] = 82

# Modify the deep copy 
df_deep_copy.loc[2, 'Country'] = 'India'
df_deep_copy.loc[2, 'Population (Millions)'] = 1400

# Add a new row to the original DataFrame
# (DataFrame.append() was removed in pandas 2.0, so pd.concat() is used instead)
new_row = pd.DataFrame([{'Country': 'Canada', 'Population (Millions)': 38}])
df_original = pd.concat([df_original, new_row], ignore_index=True)

# Print the original DataFrame
print("Original DataFrame:")
print(df_original)

# Print the shallow copy
print("\nShallow Copy:")
print(df_shallow_copy)

# Print the deep copy
print("\nDeep Copy:")
print(df_deep_copy)

# Add a new row to the shallow copy DataFrame
new_row_shallow = pd.DataFrame([{'Country': 'Australia', 'Population (Millions)': 25}])
df_shallow_copy = pd.concat([df_shallow_copy, new_row_shallow], ignore_index=True)

# Print the modified shallow copy DataFrame
print("\nModified Shallow Copy:")
print(df_shallow_copy)

Output

Original DataFrame:
                    Country  Population (Millions)
0  United States Of America                    328
1                   Germany                     82
2                     Japan                    126
3                    Canada                     38

Shallow Copy:
                    Country  Population (Millions)
0  United States Of America                    328
1                   Germany                     82
2                     Japan                    126

Deep Copy:
   Country  Population (Millions)
0      USA                    328
1  Germany                     83
2    India                   1400

Modified Shallow Copy:
                    Country  Population (Millions)
0  United States Of America                    328
1                   Germany                     82
2                     Japan                    126
3                 Australia                     25

In the shallow copy, the 'Country' value of the first row was changed from 'USA' to 'United States Of America' and the 'Population (Millions)' value of the second row was changed from 83 to 82. Both changes appear in the shallow copy as well as in the original DataFrame. In the deep copy, changing the third row from Japan to India and updating its population did not affect the original DataFrame.

The rows added to the original and to its shallow copy appear only in the DataFrame they were added to. Adding a row produces a new DataFrame object, so it does not affect the other DataFrame.
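
The reason row additions stay local can be seen in a short sketch: pd.concat() returns a new DataFrame instead of changing its inputs. The name 'extended' is a hypothetical variable introduced here so that the input and the result can be compared side by side.

```python
import pandas as pd

df_original = pd.DataFrame({"Country": ["USA", "Germany", "Japan"],
                            "Population (Millions)": [328, 83, 126]})
df_shallow_copy = df_original.copy(deep=False)

# Build the new row as a one-row DataFrame and concatenate it.
new_row = pd.DataFrame([{"Country": "Australia", "Population (Millions)": 25}])
extended = pd.concat([df_shallow_copy, new_row], ignore_index=True)

# The inputs keep their three rows; only the returned object has four.
print(len(df_original), len(df_shallow_copy), len(extended))
```

In the example above the result is assigned back to the same name, which rebinds that name to the new object and leaves the other DataFrame untouched.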

Differences between shallow copy and deep copy

| Aspect | Shallow Copy | Deep Copy |
| --- | --- | --- |
| Definition | Creates a new object with references to the same data as the original object. | Creates a completely independent copy with its own data and metadata. |
| Data Sharing | Shares data between the original and copied objects. | Does not share data with the original object. |
| Memory Space | Shares memory space with the original object. | Has its own memory space, separate from the original object. |
| Modifiability | Changes made to values in the copy can affect the original object and vice versa. Adding a new row or column affects only the DataFrame it is added to. | Changes made to the copy do not affect the original object, and vice versa. Adding a new row or column likewise affects only the DataFrame it is added to. |
| Performance | Faster and requires less memory, as it avoids duplicating the data. | Slower and requires more memory, due to duplicating the data. |
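
The performance row can be checked with a rough timing sketch. The frame size (200,000 rows by 10 columns) and the repetition count are arbitrary assumptions chosen for this illustration, and the absolute numbers will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# A moderately large numeric frame (about 16 MB of float64 values).
big = pd.DataFrame(np.zeros((200_000, 10)))

# Total time for 10 shallow copies and for 10 deep copies.
shallow_seconds = timeit.timeit(lambda: big.copy(deep=False), number=10)
deep_seconds = timeit.timeit(lambda: big.copy(deep=True), number=10)

print(f"shallow: {shallow_seconds:.6f} s")
print(f"deep:    {deep_seconds:.6f} s")
```

On typical hardware the shallow copies should finish much sooner, since no values are duplicated, but this is a sketch for comparison rather than a benchmark.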

Conclusion

A shallow copy is suitable when we want a new DataFrame that shares the same memory space as the original DataFrame. It is efficient when working with large datasets, since it avoids unnecessary memory duplication. A deep copy is recommended when we need an independent copy of the DataFrame that can be modified without affecting the original.

Updated on: 10-Aug-2023