How to add metadata to a DataFrame or Series with Pandas in Python?

Pandas lets you attach metadata to DataFrames and Series. Metadata is additional information that gives your data context and meaning, such as descriptions, units of measurement, data sources, or anything else that helps a reader interpret the values.

What is Metadata in Pandas?

Metadata is information about the data itself - it describes the characteristics, origin, and context of your data. In Pandas, metadata can include data types, units of measurement, descriptions, scaling factors, or any custom information that provides context about your dataset.

Why is Metadata Important?

Metadata is crucial in data analysis because it:

  • Provides context and meaning to raw data
  • Helps understand units of measurement for accurate calculations
  • Documents data sources and transformations
  • Makes datasets more reproducible and shareable

Adding Metadata Using attrs Attribute

Pandas provides the attrs attribute, a dictionary attached to each DataFrame and Series that stores arbitrary metadata. The pandas documentation marks attrs as experimental, so its behavior may change between versions.

Basic Example

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Temperature': [25.5, 30.2, 28.1], 
                   'Humidity': [60, 65, 58]})

# Add metadata using attrs
df.attrs['description'] = 'Weather sensor data'
df.attrs['units'] = {'Temperature': 'Celsius', 'Humidity': 'Percent'}
df.attrs['location'] = 'New York'
df.attrs['sensor_id'] = 'WS001'

print("DataFrame:")
print(df)
print("\nMetadata:")
print(df.attrs)

Output:

DataFrame:
   Temperature  Humidity
0         25.5        60
1         30.2        65
2         28.1        58

Metadata:
{'description': 'Weather sensor data', 'units': {'Temperature': 'Celsius', 'Humidity': 'Percent'}, 'location': 'New York', 'sensor_id': 'WS001'}
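
Because attrs is an ordinary Python dictionary, the usual dict methods apply to it. The following is a minimal sketch that reuses the weather example; the key calibration_date is hypothetical and appears only to show a lookup with a fallback value.

```python
import pandas as pd

df = pd.DataFrame({'Temperature': [25.5, 30.2, 28.1],
                   'Humidity': [60, 65, 58]})
df.attrs['units'] = {'Temperature': 'Celsius', 'Humidity': 'Percent'}

# Add several entries at once
df.attrs.update({'location': 'New York', 'sensor_id': 'WS001'})

# Look up values, with a fallback for keys that may be missing
unit = df.attrs['units'].get('Temperature', 'unknown')
missing = df.attrs.get('calibration_date', 'not recorded')  # hypothetical key

# Remove an entry that is no longer needed
df.attrs.pop('sensor_id')

print(unit)
print(missing)
print(sorted(df.attrs))
```

Using get() with a default avoids a KeyError when a dataset arrives without a particular metadata entry.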

Metadata with Series

You can also add metadata to individual Series objects:

import pandas as pd

# Create a Series
temperatures = pd.Series([25.5, 30.2, 28.1, 26.8], 
                        name='Temperature')

# Add metadata to Series
temperatures.attrs['unit'] = 'Celsius'
temperatures.attrs['measurement_date'] = '2024-01-15'
temperatures.attrs['instrument'] = 'Digital Thermometer'

print("Series:")
print(temperatures)
print("\nSeries Metadata:")
print(temperatures.attrs)

Output:

Series:
0    25.5
1    30.2
2    28.1
3    26.8
Name: Temperature, dtype: float64

Series Metadata:
{'unit': 'Celsius', 'measurement_date': '2024-01-15', 'instrument': 'Digital Thermometer'}
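
One way to put Series metadata to work is to build labels for reports or plot axes from it. The sketch below assumes the 'unit' key from the example above; the variable names label and summary are illustrative only.

```python
import pandas as pd

temperatures = pd.Series([25.5, 30.2, 28.1, 26.8], name='Temperature')
temperatures.attrs['unit'] = 'Celsius'

# Combine the Series name with the stored unit, falling back if no unit is set
label = f"{temperatures.name} ({temperatures.attrs.get('unit', 'unknown unit')})"

# Use the label in a short summary line
summary = f"Mean {label}: {temperatures.mean():.2f}"

print(label)
print(summary)
```

Keeping the unit in attrs rather than in the Series name means the name stays usable as a column label while the unit remains available for display.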

Applying Scaling with Metadata

A common use case is storing scaling factors in metadata for data transformation:

import pandas as pd

# Create original DataFrame
df = pd.DataFrame({'Value1': [10, 20, 30], 'Value2': [40, 50, 60]})

# Add scaling metadata
df.attrs['description'] = 'Sensor readings'
df.attrs['scale_factor'] = 0.1
df.attrs['offset'] = 5

print("Original DataFrame:")
print(df)

# Apply scaling transformation
df_scaled = (df * df.attrs['scale_factor']) + df.attrs['offset']
df_scaled.attrs = df.attrs.copy()  # Copy metadata to scaled DataFrame

print("\nScaled DataFrame:")
print(df_scaled)
print("\nScaling metadata:", df_scaled.attrs['scale_factor'], df_scaled.attrs['offset'])

Output:

Original DataFrame:
   Value1  Value2
0      10      40
1      20      50
2      30      60

Scaled DataFrame:
   Value1  Value2
0     6.0     9.0
1     7.0    10.0
2     8.0    11.0

Scaling metadata: 0.1 5
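
The scaling step can also be wrapped in a small function so the same logic applies to any DataFrame that carries these keys. This is a sketch under stated assumptions, not a pandas feature: apply_scaling is a hypothetical helper, the defaults of 1.0 and 0.0 for missing keys are a design choice, and the 'scaling_applied' key is an invented marker.

```python
from copy import deepcopy

import pandas as pd


def apply_scaling(frame):
    """Return a scaled copy of frame using 'scale_factor' and 'offset' from attrs.

    Hypothetical helper: missing keys fall back to 1.0 and 0.0 (no change).
    """
    factor = frame.attrs.get('scale_factor', 1.0)
    offset = frame.attrs.get('offset', 0.0)
    scaled = frame * factor + offset
    # Assign a deep copy so nested values are not shared with the source frame
    scaled.attrs = deepcopy(frame.attrs)
    # Record that the transformation has been applied (assumed key name)
    scaled.attrs['scaling_applied'] = True
    return scaled


df = pd.DataFrame({'Value1': [10, 20, 30], 'Value2': [40, 50, 60]})
df.attrs['scale_factor'] = 0.1
df.attrs['offset'] = 5

df_scaled = apply_scaling(df)
print(df_scaled)
print(df_scaled.attrs)
print(df.attrs)
```

Marking the result as scaled guards against applying the same factor twice, and the deep copy keeps later changes to the scaled frame's metadata from leaking back into the original.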

Important Notes

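Metadata Persistence: In current pandas versions, many operations that return a new object, including copy() and boolean filtering, carry attrs over to the result. Propagation is not guaranteed for every operation, however, and metadata is lost when a DataFrame is rebuilt from raw values or written to a format such as CSV. Check attrs after a transformation and copy it explicitly when needed.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df.attrs['source'] = 'test_data'

# copy() carries attrs over to the new object
df_copy = df.copy()
print("Metadata after copy():", df_copy.attrs)

# Boolean filtering also carries attrs over in current pandas versions
df_filtered = df[df['A'] > 1]
print("Metadata after filtering:", df_filtered.attrs)

# Rebuilding a DataFrame from the raw values drops attrs
df_rebuilt = pd.DataFrame(df.to_numpy(), columns=df.columns)
print("Metadata after rebuilding:", df_rebuilt.attrs)

# Explicitly preserve metadata
df_rebuilt.attrs = df.attrs.copy()
print("Metadata after manual copy:", df_rebuilt.attrs)

Output:

Metadata after copy(): {'source': 'test_data'}
Metadata after filtering: {'source': 'test_data'}
Metadata after rebuilding: {}
Metadata after manual copy: {'source': 'test_data'}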
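
Metadata also does not survive a round trip through CSV, since the format has no place to store it. One possible workaround, sketched below, is to save attrs separately as JSON and reattach it after loading. In-memory strings stand in for files here, and the approach assumes every value in attrs is JSON-serializable.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df.attrs['source'] = 'test_data'

# Serialize data and metadata separately (strings stand in for two files)
csv_text = df.to_csv(index=False)
attrs_text = json.dumps(df.attrs)

# Reading the CSV back yields a DataFrame with empty attrs
restored = pd.read_csv(io.StringIO(csv_text))
print("After CSV round trip:", restored.attrs)

# Reattach the metadata from the JSON text
restored.attrs = json.loads(attrs_text)
print("After reattaching:", restored.attrs)
```

With real files, the JSON would typically sit next to the CSV under a matching name so the two stay together.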

Conclusion

The attrs attribute provides a convenient way to store metadata with Pandas DataFrames and Series. Because attrs is experimental and not every operation or file format carries it over, verify the metadata on derived objects and copy it explicitly where needed.

Updated on: 2026-03-27T06:59:35+05:30
