
Write a Python program to compute the grouped data covariance of a DataFrame and the grouped covariance between two of its columns


Covariance measures how two variables vary together: a positive value means they tend to rise and fall together, and a negative value means one tends to rise as the other falls. In pandas, you can compute covariance separately for each group of rows by combining groupby() with cov().
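By default, pandas computes the sample covariance, dividing by n - 1 (ddof=1): cov(x, y) = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1). As a minimal sketch, the snippet below applies that formula by hand to the three maths rows from the sample data used in this article and compares the result with Series.cov(). The function sample_cov is a hypothetical helper written for illustration, not part of pandas.

```python
import pandas as pd

# mark1 and mark2 for the three "maths" rows of the sample data
x = [80, 90, 85]
y = [85, 90, 70]

def sample_cov(x, y):
    """Hypothetical helper: sample covariance, dividing by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of each value's deviation from its mean
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1)

manual = sample_cov(x, y)
builtin = pd.Series(x).cov(pd.Series(y))                # ddof=1 (default)
population = pd.Series(x).cov(pd.Series(y), ddof=0)     # divides by n instead

print("manual:", manual)
print("Series.cov():", builtin)
print("population (ddof=0):", population)
```

Both the manual value and Series.cov() should come out at about 12.5, the same maths value that appears in the grouped results below; ddof=0 gives the population covariance instead.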

Understanding Grouped Covariance

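When your data contains a categorical column, computing covariance within each group reveals patterns specific to each category. Called on a grouped DataFrame, cov() returns one covariance matrix per group, covering every pair of the non-grouping numeric columns, and stacks these matrices into a single DataFrame with a two-level row index.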
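Conceptually, groupby().cov() splits the DataFrame by the grouping key and computes one covariance matrix per piece. The loop below is a sketch of that idea using the same sample data; it is meant only as an illustration and is not how pandas implements the grouped operation internally.

```python
import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

# Split by subject and compute one covariance matrix per group
per_group = {}
for name, group in df.groupby('subjects'):
    # Keep only the numeric columns before calling DataFrame.cov()
    per_group[name] = group[['mark1', 'mark2']].cov()

for name, matrix in per_group.items():
    print(name)
    print(matrix)
    print()
```

Each matrix printed here should match the corresponding block of the df.groupby('subjects').cov() result.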

Creating Sample Data

Let's start with a DataFrame containing student marks grouped by subject:

import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})
print("DataFrame is:")
print(df)

DataFrame is:
   subjects  mark1  mark2
0     maths     80     85
1     maths     90     90
2     maths     85     70
3   science     95     75
4   science     93     95
5   science     85     65

Computing Grouped Covariance Matrix

Use groupby() with cov() to get the complete covariance matrix for each group:

import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

group_data = df.groupby('subjects').cov()
print("Grouped data covariance matrix:")
print(group_data)

Grouped data covariance matrix:
              mark1       mark2
subjects                      
maths    mark1  25.0   12.500000
         mark2  12.5  108.333333
science  mark1  28.0   50.000000
         mark2  50.0  233.333333
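Because the grouped result is stacked with a two-level row index (group label, then column name), individual pieces can be read back with .loc or DataFrame.xs(). A minimal sketch, assuming the same df; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

group_data = df.groupby('subjects').cov()

# Full 2x2 matrix for one group (selects on the first index level)
maths_matrix = group_data.loc['maths']

# A single entry: row ('maths', 'mark1'), column 'mark2'
maths_pair = group_data.loc[('maths', 'mark1'), 'mark2']

# mark1/mark2 covariance for every group at once: take the 'mark1'
# rows from the second (unnamed) index level, then the 'mark2' column
pair_by_group = group_data.xs('mark1', level=1)['mark2']

print(maths_matrix)
print(maths_pair)
print(pair_by_group)
```

pair_by_group holds the same per-group values as the apply() approach in the next section, obtained by slicing the full matrix rather than by running a lambda per group.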

Computing Covariance Between Two Specific Columns

To get the covariance between just two columns for each group, use apply() with a lambda function that calls Series.cov():

import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

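# Select the two columns first so that apply() does not also pass the grouping column
result = df.groupby('subjects')[['mark1', 'mark2']].apply(lambda x: x['mark1'].cov(x['mark2']))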
print("Grouped data covariance between two columns:")
print(result)

Grouped data covariance between two columns:
subjects
maths      12.5
science    50.0
dtype: float64
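An alternative that avoids calling a Python function for each group uses the identity cov(x, y) = (mean(x*y) - mean(x) * mean(y)) * n / (n - 1), built from ordinary per-group means. This is a different technique from the apply() approach above and is shown only as a sketch: because it subtracts two nearly equal quantities, it can lose precision when the values are large relative to their spread, so the cov()-based approaches remain the safer default. The temporary column name prod is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

# Add the element-wise product as a temporary column
tmp = df.assign(prod=df['mark1'] * df['mark2'])
grouped = tmp.groupby('subjects')

n = grouped.size()                                   # rows per group
means = grouped[['mark1', 'mark2', 'prod']].mean()   # per-group means

# Sample covariance: (E[xy] - E[x]E[y]) rescaled by n / (n - 1)
pair_cov = (means['prod'] - means['mark1'] * means['mark2']) * n / (n - 1)

print(pair_cov)
```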

Complete Example

Here's the complete solution combining both approaches:

import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

print("DataFrame:")
print(df)
print()

# Grouped covariance matrix
group_data = df.groupby('subjects').cov()
print("Grouped data covariance matrix:")
print(group_data)
print()

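# Covariance between specific columns (select them first so that
# apply() does not also receive the grouping column)
result = df.groupby('subjects')[['mark1', 'mark2']].apply(lambda x: x['mark1'].cov(x['mark2']))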
print("Covariance between mark1 and mark2:")
print(result)

DataFrame:
   subjects  mark1  mark2
0     maths     80     85
1     maths     90     90
2     maths     85     70
3   science     95     75
4   science     93     95
5   science     85     65

Grouped data covariance matrix:
              mark1       mark2
subjects                      
maths    mark1  25.0   12.500000
         mark2  12.5  108.333333
science  mark1  28.0   50.000000
         mark2  50.0  233.333333

Covariance between mark1 and mark2:
subjects
maths      12.5
science    50.0
dtype: float64

Key Points

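groupby().cov() returns a complete covariance matrix for each group, stacked with a two-level row index
Use apply(lambda x: x['col1'].cov(x['col2'])) to get a single value per group for a specific column pair
A positive covariance means the two variables tend to increase together; a negative one means one tends to decrease as the other increases
The diagonal entries of each group's matrix are the variances of the individual columns within that group
cov() computes the sample covariance (dividing by n - 1) by default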
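To check the point about the diagonal, the sketch below compares each group's diagonal entries with groupby().var(), which, like cov(), uses ddof=1 by default. It assumes the same sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'subjects': ['maths', 'maths', 'maths', 'science', 'science', 'science'],
    'mark1': [80, 90, 85, 95, 93, 85],
    'mark2': [85, 90, 70, 75, 95, 65]
})

group_data = df.groupby('subjects').cov()
variances = df.groupby('subjects')[['mark1', 'mark2']].var()  # ddof=1, like cov()

# Print each diagonal entry next to the matching per-group variance
for subject in variances.index:
    matrix = group_data.loc[subject]
    for col in ['mark1', 'mark2']:
        print(subject, col, matrix.loc[col, col], variances.loc[subject, col])
```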

Conclusion

Use groupby().cov() to get the full covariance matrix within each group. When you only need one pair of columns, combine groupby() with apply() and a lambda that calls Series.cov() to get a single covariance value per group.

Vani Nalliappan

Updated on: 2026-03-25T16:22:32+05:30
