Python - Variance of List


In this article we will learn what variance is and how to calculate the variance of a list in Python. Finding the variance is a common task, particularly in data science.

Variance

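Variance tells us how spread out the data is: it measures the average squared distance of the data points from their mean. There are two common definitions. The population variance divides the sum of squared deviations by n, the number of data points, while the sample variance divides by n - 1. The methods below use one or the other, which is why they do not all print the same number for the same list. Let's learn about those methods.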
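In formula form, the population variance and the sample variance of the values x_1, ..., x_n can be written as below. The symbols are our own notation, chosen for illustration.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{(population variance)} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{(sample variance)}
\end{aligned}
```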

Method 1: Using the Statistics Module

In this method we will use Python's built-in statistics module to calculate the variance of the list.

Example

import statistics

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# statistics.variance() returns the sample variance (divides by n - 1)
variance = statistics.variance(data)

print("Original list value:", data)
print("Variance of given list is:", variance)

Output

Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67] 
Variance of given list is: 508.19444444444446

Explanation

In the above program we import the statistics module and call its variance() function. This function computes the sample variance, so it divides the sum of squared deviations by n - 1, where n is the length of the list.
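The statistics module also offers pvariance() for the population variance. The short sketch below uses the same list and prints both values so the effect of the divisor is visible. It is an illustration, not part of the program above.

```python
import statistics

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Sample variance: sum of squared deviations divided by n - 1
sample_var = statistics.variance(data)

# Population variance: sum of squared deviations divided by n
population_var = statistics.pvariance(data)

print("Sample variance:", sample_var)
print("Population variance:", population_var)

# The two values differ only by the factor (n - 1) / n
n = len(data)
print("Ratio:", population_var / sample_var, "expected:", (n - 1) / n)
```

Use variance() when the list is a sample drawn from a larger population, and pvariance() when the list is the whole population.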

Method 2: Using Formula

This is the most basic way to calculate the variance of a list. We simply apply the formula for the population variance directly.

Example

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Mean of the data points
mean = sum(data) / len(data)

# Population variance: average of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)

print("Original list value:", data)
print("Variance of given list is:", variance)

Output

Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67] 
Variance of given list is: 451.7283950617284

Explanation

In the above program we calculate the variance of the list manually. First, we calculate the mean of the data points. Then we sum the squares of the differences between each data point and the mean, and finally divide by the number of data points to get the variance. Because we divide by n rather than n - 1, this is the population variance, which is why the result is smaller than the one from Method 1.
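The divisor is the only thing that separates the population variance (divide by n) from the sample variance (divide by n - 1). A minimal sketch that makes the divisor a parameter is shown below. The function name list_variance and the parameter name ddof are our own choices for illustration.

```python
def list_variance(values, ddof=0):
    """Variance of a list (hypothetical helper for illustration).

    ddof=0 gives the population variance, ddof=1 the sample variance.
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = sum(values) / n
    # Sum of squared deviations from the mean
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


data = [2, 4, 6, 8, 10, 34, 23, 46, 67]
print("Population variance:", list_variance(data))
print("Sample variance:", list_variance(data, ddof=1))
```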

Method 3: Using Pandas

In this method we will use pandas, a library that is very popular for data manipulation. A pandas Series provides the var() method, which calculates the variance of its values.

Example

import pandas as pd

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Series.var() uses ddof=1 by default, so this is the sample variance
variance = pd.Series(data).var()

print("Original list value:", data)
print("Variance of given list is:", variance)

Output

Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67] 
Variance of given list is: 508.19444444444446

Explanation

In the above program we import pandas, convert the list into a pandas Series, and call its var() method. By default var() uses ddof=1, so it returns the sample variance, matching Method 1. Passing ddof=0 gives the population variance instead.

Method 4: Using Welford's Algorithm

In this method we will use Welford's algorithm to calculate the variance of the list values. It computes the variance in a single pass over the data, updating a running mean and a running sum of squared deviations as each value arrives.
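Written as update rules, one common formulation of Welford's algorithm is the following. Here the mean of the first k values is written with a bar and M_k is the running sum of squared deviations. The symbol names are our own.

```latex
\begin{aligned}
\bar{x}_0 &= 0, \qquad M_0 = 0 \\
\bar{x}_k &= \bar{x}_{k-1} + \frac{x_k - \bar{x}_{k-1}}{k} \\
M_k &= M_{k-1} + (x_k - \bar{x}_{k-1})(x_k - \bar{x}_k) \\
\sigma^2 &= \frac{M_n}{n}, \qquad s^2 = \frac{M_n}{n-1}
\end{aligned}
```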

Example

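data = [2, 4, 6, 8, 10, 34, 23, 46, 67]
n = len(data)

# `variance` first accumulates the sum of squared deviations
mean = variance = 0

for i, x in enumerate(data, 1):
    delta = x - mean                # deviation from the old mean
    mean += delta / i               # update the running mean
    variance += delta * (x - mean)  # uses both the old and the new mean

# Divide by n to get the population variance
variance /= n
print("Original list value:", data)
print("Variance of given list is:", variance)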

Output

Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67] 
Variance of given list is: 451.7283950617284

Explanation

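In the above program we use Welford's algorithm to calculate the variance in a single pass through the data. Inside the loop, the variable variance accumulates the sum of squared deviations from the running mean, and dividing it by n at the end gives the population variance. Because the algorithm only keeps a few running values, it does not need the whole data set in memory at once, which makes it a good fit for streamed data. It is also numerically more stable than formulas based on the raw sum of squares.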
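The single-pass property matters most when the values arrive one at a time. As a rough sketch, the same updates can be wrapped in a small accumulator. The class RunningVariance and the generator readings() are hypothetical names introduced here, and readings() merely stands in for a real data stream.

```python
class RunningVariance:
    """Welford's algorithm as an accumulator (hypothetical helper class)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def population_variance(self):
        if self.count < 1:
            raise ValueError("at least one value is required")
        return self.m2 / self.count

    def sample_variance(self):
        if self.count < 2:
            raise ValueError("at least two values are required")
        return self.m2 / (self.count - 1)


def readings():
    """Stand-in for a data stream: yields the values one at a time."""
    yield from [2, 4, 6, 8, 10, 34, 23, 46, 67]


acc = RunningVariance()
for value in readings():
    acc.update(value)

print("Population variance:", acc.population_variance())
print("Sample variance:", acc.sample_variance())
```

Only three numbers are stored no matter how many values pass through, so the accumulator can be queried at any point while the stream is still running.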

Method 5: Using Numba Library

In this method we will use the Numba library to calculate the variance of the list. Numba is a just-in-time (JIT) compiler for Python that is used to accelerate numerical functions. Let's see an example.

Example

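import numba as nb
import numpy as np

# njit compiles the function to machine code the first time it is called
@nb.njit
def calc_variance(data):
    # Start from floats so the variable types stay consistent in the compiled loop
    mean = variance = 0.0
    n = len(data)

    # Welford's algorithm: `variance` accumulates the sum of squared deviations
    for i, x in enumerate(data, 1):
        delta = x - mean
        mean += delta / i
        variance += delta * (x - mean)

    return variance / n

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]
# Numba works best with NumPy arrays, so convert the list before the call
variance = calc_variance(np.array(data, dtype=np.float64))

print("Original list value:", data)
print("Variance of given list is:", variance)

Output

Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 451.7283950617284

Explanation

In the above example we use Numba's njit decorator to compile the function calc_variance with just-in-time (JIT) compilation. The function itself is the same Welford's algorithm as in Method 4, so it returns the population variance. Compilation happens on the first call, so the speed-up shows on large inputs or repeated calls rather than on a short list like this one.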

Method 6: Using SciPy Library

In this method we will use the SciPy library, a powerful library for scientific computing. Its stats module provides a function for calculating the variance of a list. Let's see a program.

Example

from scipy import stats

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# tvar() with no limits uses every value; its default ddof=1 gives the sample variance
variance = stats.tvar(data)

print("Original list value:", data)
print("Variance of given list is:", variance)

Output

Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67] 
Variance of given list is: 508.19444444444446

Explanation

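In the above program we import the scipy.stats module and use its tvar() function to calculate the variance of the list data. With no limits given, tvar() uses all the values and, with its default ddof=1, returns the sample variance.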

Conclusion

In this article we learned what variance is and how to calculate the variance of a list using several methods: the statistics module, the direct formula, pandas, Welford's algorithm, Numba, and SciPy. The statistics, pandas, and SciPy examples return the sample variance, while the formula, Welford's, and Numba examples return the population variance. Welford's algorithm is a memory-efficient technique when the data arrives as a stream. You can use whichever method best suits your requirements.

Updated on: 06-Oct-2023
