Python - Calculate the variance of a column in a Pandas DataFrame
To calculate the variance of the values in a Pandas DataFrame column, use the var() method. Variance is the average squared deviation of the values from their mean, so a larger variance means the data points are more spread out.
Syntax
The basic syntax for calculating the variance of a column is:
DataFrame['column_name'].var()
Creating a DataFrame
First, import the Pandas library and create a DataFrame:
import pandas as pd
# Create DataFrame with car data
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
print("DataFrame1:")
print(dataFrame1)
DataFrame1:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3    Tesla     80
4  Bentley    110
5   Jaguar     90
Calculating Variance of a Single Column
Use the var() method to find the variance of the "Units" column:
import pandas as pd
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
# Calculate variance of Units column
variance = dataFrame1['Units'].var()
print("Variance of Units column:", variance)
Variance of Units column: 586.6666666666666
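To see what var() computes, the result above can be reproduced by hand. The following is a minimal sketch rather than part of the original example; the variable names (units, manual_variance) are assumptions chosen for illustration. It sums the squared deviations from the mean and divides by N-1, which matches the default sample variance.

```python
import pandas as pd

# Same "Units" values as in the DataFrame above
units = pd.Series([100, 150, 110, 80, 110, 90])

# Step 1: mean of the column
mean = units.sum() / len(units)

# Step 2: squared deviation of each value from the mean
squared_deviations = (units - mean) ** 2

# Step 3: divide the sum by N-1 (sample variance, the var() default)
manual_variance = squared_deviations.sum() / (len(units) - 1)

print("Manual variance:", manual_variance)
print("pandas var():", units.var())
```

Both printed values should agree up to floating-point rounding.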
Multiple DataFrame Example
Here's a complete example that calculates the variance for two different DataFrames:
import pandas as pd
# Create DataFrame1
dataFrame1 = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})
print("DataFrame1:")
print(dataFrame1)
# Finding Variance of "Units" column values
print("\nVariance of Units column from DataFrame1 =", dataFrame1['Units'].var())
# Create DataFrame2
dataFrame2 = pd.DataFrame({
    "Product": ['TV', 'PenDrive', 'HeadPhone', 'EarPhone', 'HDD', 'SSD'],
    "Price": [8000, 500, 3000, 1500, 3000, 4000]
})
print("\nDataFrame2:")
print(dataFrame2)
# Finding Variance of "Price" column values
print("\nVariance of Price column from DataFrame2 =", dataFrame2['Price'].var())
DataFrame1:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3    Tesla     80
4  Bentley    110
5   Jaguar     90
Variance of Units column from DataFrame1 = 586.6666666666666
DataFrame2:
     Product  Price
0         TV   8000
1   PenDrive    500
2  HeadPhone   3000
3   EarPhone   1500
4        HDD   3000
5        SSD   4000
Variance of Price column from DataFrame2 = 6766666.666666667
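When one DataFrame holds several numeric columns, calling var() on the DataFrame itself returns one variance per column. The sketch below is illustrative only: it assumes the Units and Price values from the two examples are placed side by side in a hypothetical DataFrame named combined (the pairing of cars with prices is not from the original data), and it passes numeric_only=True so that the text column is skipped.

```python
import pandas as pd

# Hypothetical DataFrame: Units and Price columns reused from the examples,
# paired together purely for illustration
combined = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Tesla', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90],
    "Price": [8000, 500, 3000, 1500, 3000, 4000]
})

# numeric_only=True restricts the calculation to numeric columns;
# the result is a Series indexed by column name
column_variances = combined.var(numeric_only=True)

print(column_variances)
```

Each entry in the result should match what calling var() on that single column returns.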
Key Points
- The var() method calculates sample variance by default (divides by N-1)
- For population variance, use var(ddof=0), which divides by N
- Higher variance indicates more spread out data points
- Variance is always non-negative
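The difference between the sample and population variance can be checked directly. This is a minimal sketch using the "Units" values from the examples; the variable names are assumptions chosen for illustration.

```python
import pandas as pd

units = pd.Series([100, 150, 110, 80, 110, 90])

# Default: ddof=1, so the sum of squared deviations is divided by N-1
sample_variance = units.var()

# ddof=0 divides by N instead, giving the population variance
population_variance = units.var(ddof=0)

print("Sample variance (ddof=1):", sample_variance)
print("Population variance (ddof=0):", population_variance)
```

Because the only difference is the divisor, the population variance equals the sample variance multiplied by (N-1)/N, so it is always the smaller of the two when the data is not constant.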
Conclusion
Use the var() method to calculate the variance of a DataFrame column. This statistical measure helps you understand the dispersion and variability of your data.