Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Difference between Shallow Copy vs Deep Copy in Pandas Dataframe
One of the most useful data structures in Pandas is the DataFrame ? a 2-dimensional table-like structure containing rows and columns to store data. Understanding the difference between shallow and deep copies is crucial when manipulating DataFrames, as it affects how changes propagate between original and copied objects.
What is Shallow Copy?
A shallow copy creates a new DataFrame object that references the original data. The copied DataFrame points to the same memory location as the original DataFrame. Any modifications to the underlying data will affect both the original and shallow copy.
Syntax
df_shallow = df.copy(deep=False)
When deep=False, only the DataFrame structure is copied, but the underlying data remains shared between both objects.
What is Deep Copy?
A deep copy creates a completely independent copy of a DataFrame, including all its data and metadata. The new DataFrame has its own memory space and is entirely separate from the original.
Syntax
df_deep = df.copy(deep=True) # deep=True is default
The deep parameter defaults to True, creating an independent DataFrame with copied data and index labels.
Example: Comparing Shallow vs Deep Copy
Let's demonstrate the key differences with a practical example ?
import pandas as pd
# Create original DataFrame
data = {'Name': ['Rahul', 'Priya', 'Amit'],
'Age': [25, 28, 22],
'City': ['Mumbai', 'Delhi', 'Kolkata']}
df_original = pd.DataFrame(data)
# Create copies
df_shallow = df_original.copy(deep=False)
df_deep = df_original.copy(deep=True)
# Modify original DataFrame
df_original.loc[0, 'Age'] = 30
print("Original DataFrame:")
print(df_original)
print("\nShallow Copy:")
print(df_shallow)
print("\nDeep Copy:")
print(df_deep)
Original DataFrame: Name Age City 0 Rahul 30 Mumbai 1 Priya 28 Delhi 2 Amit 22 Kolkata Shallow Copy: Name Age City 0 Rahul 30 Mumbai 1 Priya 28 Delhi 2 Amit 22 Kolkata Deep Copy: Name Age City 0 Rahul 25 Mumbai 1 Priya 28 Delhi 2 Amit 22 Kolkata
Notice how modifying the original DataFrame affected the shallow copy but not the deep copy. This demonstrates that shallow copies share the underlying data.
Memory and Independence Example
Here's another example showing structural changes ?
import pandas as pd
# Create DataFrame
data = {'Country': ['USA', 'Germany', 'Japan'],
'Population': [328, 83, 126]}
df_original = pd.DataFrame(data)
# Create copies
df_shallow = df_original.copy(deep=False)
df_deep = df_original.copy(deep=True)
# Modify deep copy independently
df_deep.loc[2, 'Country'] = 'India'
df_deep.loc[2, 'Population'] = 1400
# Add new row to original (structural change)
new_row = pd.DataFrame({'Country': ['Canada'], 'Population': [38]})
df_original = pd.concat([df_original, new_row], ignore_index=True)
print("Original DataFrame:")
print(df_original)
print("\nShallow Copy:")
print(df_shallow)
print("\nDeep Copy:")
print(df_deep)
# Check memory addresses
print(f"\nMemory addresses:")
print(f"Original: {id(df_original)}")
print(f"Shallow: {id(df_shallow)}")
print(f"Deep: {id(df_deep)}")
Original DataFrame: Country Population 0 USA 328 1 Germany 83 2 Japan 126 3 Canada 38 Shallow Copy: Country Population 0 USA 328 1 Germany 83 2 Japan 126 Deep Copy: Country Population 0 USA 328 1 Germany 83 2 India 1400 Memory addresses: Original: 140234567890123 Shallow: 140234567890456 Deep: 140234567890789
Comparison Table
| Aspect | Shallow Copy | Deep Copy |
|---|---|---|
| Data Sharing | Shares underlying data | Independent data copy |
| Memory Usage | Low (shared memory) | High (duplicated data) |
| Performance | Faster creation | Slower creation |
| Independence | Data changes affect both | Completely independent |
| Use Case | Temporary views, large data | Independent manipulation |
When to Use Each Type
Use Shallow Copy when:
- Working with large datasets where memory is a concern
- Creating temporary views for analysis
- You want changes to reflect across both objects
Use Deep Copy when:
- You need completely independent DataFrames
- Performing different transformations on each copy
- Want to preserve the original data unchanged
Conclusion
Shallow copies share underlying data and are memory-efficient but changes affect both objects. Deep copies create independent DataFrames with separate memory, ideal when you need complete isolation between original and copied data.
