Python - Variance of List
In this article we will learn what variance is and how to calculate the variance of a list in Python. Finding the variance is a common task, particularly in data science, and Python offers several ways to do it.
Variance
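Variance tells us how spread out the data is: it is the average of the squared differences between each value and the mean. Dividing the sum of squared differences by n gives the population variance, while dividing by n - 1 gives the sample variance, so different methods can return different numbers for the same list. We can calculate the variance of a list using various methods. Let’s learn about those methods.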
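For reference, the two common definitions can be written out as below, where x_1, ..., x_n are the list values and \bar{x} is their mean. The worked numbers use the list from the examples in this article, [2, 4, 6, 8, 10, 34, 23, 46, 67]; the symbols \sigma^2 (population variance) and s^2 (sample variance) are conventional notation rather than anything defined by the article.

```latex
\begin{align*}
\bar{x}  &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{200}{9} \approx 22.22 \\
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= 8510 - \frac{200^2}{9} \approx 4065.56 \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \approx \frac{4065.56}{9} \approx 451.73 \\
s^2      &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \approx \frac{4065.56}{8} \approx 508.19
\end{align*}
```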
Method 1: Using the Statistics Module
In this method we will use Python's built-in statistics module to calculate the variance of the list.
Example
import statistics

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# statistics.variance() returns the sample variance (divides by n - 1)
variance = statistics.variance(data)

print("Original list value:", data)
print("Variance of given list is:", variance)
Output
Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 508.19444444444446
Explanation
In the program above, we import the statistics module and call its variance() function to calculate the variance. This function returns the sample variance, which divides by n - 1; for the population variance, the module provides pvariance().
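The statistics module offers both flavours of variance, and the short sketch below puts them side by side on the same list. It only uses statistics.variance() and statistics.pvariance() from the standard library; the variable names are chosen for illustration.

```python
import statistics

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Sample variance: sum of squared deviations divided by n - 1.
sample_var = statistics.variance(data)

# Population variance: sum of squared deviations divided by n.
population_var = statistics.pvariance(data)

print("Sample variance:", sample_var)
print("Population variance:", population_var)
```

The two results differ only by the factor (n - 1) / n, so the gap between them shrinks as the list grows.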
Method 2: Using Formula
This is the most basic way to calculate the variance of a list. We simply apply the variance formula directly.
Example
data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Mean of the data points
mean = sum(data) / len(data)

# Population variance: average of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)

print("Original list value:", data)
print("Variance of given list is:", variance)
Output
Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 451.7283950617284
Explanation
In the program above, we calculate the variance of the list manually. First, we calculate the mean of the data points. Then we sum the squares of the differences between each data point and the mean, and finally divide by the number of data points. Because we divide by n rather than n - 1, the result is the population variance, which is why it differs from the value in Method 1.
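The same formula can be wrapped in a small function that handles both divisors. The sketch below is one possible way to do it; the function name variance and the ddof parameter ("delta degrees of freedom", borrowed from the naming used by pandas) are illustrative choices, not part of the original program.

```python
def variance(values, ddof=0):
    """Return the variance of values, dividing by len(values) - ddof.

    ddof=0 gives the population variance, ddof=1 the sample variance.
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


data = [2, 4, 6, 8, 10, 34, 23, 46, 67]
print("Population variance:", variance(data))
print("Sample variance:", variance(data, ddof=1))
```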
Method 3: Using Pandas
In this method we will use pandas, a library that is very popular for data manipulation. It provides the var() method, which calculates the variance of a Series or DataFrame.
Example
import pandas as pd

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Series.var() uses ddof=1 by default, i.e. the sample variance
variance = pd.Series(data).var()

print("Original list value:", data)
print("Variance of given list is:", variance)
Output
Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 508.19444444444446
Explanation
In the program above, we import pandas, convert the list into a pandas Series, and call its var() method to calculate the variance. By default var() uses ddof=1, so it returns the sample variance, matching the result of Method 1.
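If the population variance is wanted instead, Series.var() accepts a ddof argument. The sketch below assumes pandas is installed and simply compares the default with ddof=0; the variable names are illustrative.

```python
import pandas as pd

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]
series = pd.Series(data)

# Default ddof=1: divide by n - 1 (sample variance).
sample_var = series.var()

# ddof=0: divide by n (population variance).
population_var = series.var(ddof=0)

print("Sample variance:", sample_var)
print("Population variance:", population_var)
```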
Method 4: Using Welford's Algorithm
In this method we will use Welford's algorithm to calculate the variance of the list values. It computes the variance in a single pass over the data.
Example
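data = [2, 4, 6, 8, 10, 34, 23, 46, 67]
n = len(data)

mean = 0.0
m2 = 0.0  # running sum of squared differences from the current mean

for i, x in enumerate(data, 1):
    delta = x - mean          # distance from the old mean
    mean += delta / i         # update the running mean
    m2 += delta * (x - mean)  # uses both the old and the new mean

# Dividing by n gives the population variance
variance = m2 / n

print("Original list value:", data)
print("Variance of given list is:", variance)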
Output
Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 451.7283950617284
Explanation
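In the program above, we used Welford's algorithm to calculate the variance of the list in a single pass through the data. The algorithm keeps only a running count, a running mean, and a running sum of squared differences, so it never needs the whole data set in memory at once and also works on data that arrives as a stream. Dividing by n at the end gives the population variance.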
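The single-pass property is easiest to see when the values arrive one at a time. The following sketch wraps the same update step in a small class; the class name RunningVariance and its method names are hypothetical, chosen only for this illustration.

```python
class RunningVariance:
    """Track variance incrementally with Welford's update step."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared differences from the mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def population_variance(self):
        if self.count < 1:
            raise ValueError("need at least one data point")
        return self.m2 / self.count

    def sample_variance(self):
        if self.count < 2:
            raise ValueError("need at least two data points")
        return self.m2 / (self.count - 1)


tracker = RunningVariance()
for value in [2, 4, 6, 8, 10, 34, 23, 46, 67]:
    tracker.update(value)  # values could just as well come from a stream

print("Population variance:", tracker.population_variance())
print("Sample variance:", tracker.sample_variance())
```

Only three numbers are stored no matter how many values are processed, which is the memory advantage over methods that need the full list.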
Method 5: Using Numba Library
In this method we will use the Numba library to calculate the variance of the list. Numba is a just-in-time (JIT) compiler for Python that accelerates numerical functions. Let's see an example.
Example
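import numba as nb
import numpy as np

@nb.njit  # compile the function to machine code on its first call
def calc_variance(data):
    mean = 0.0
    m2 = 0.0  # running sum of squared differences (Welford's algorithm)
    n = len(data)
    for i in range(n):
        x = data[i]
        delta = x - mean
        mean += delta / (i + 1)
        m2 += delta * (x - mean)
    return m2 / n  # population variance

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# Numba works best with typed NumPy arrays, so convert the list first
variance = calc_variance(np.array(data, dtype=np.float64))

print("Original list value:", data)
print("Variance of given list is:", variance)
Output
Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 451.7283950617284
Explanation
In the example above, we used Numba's njit decorator to compile the calc_variance function with just-in-time (JIT) compilation. The function itself applies Welford's algorithm from Method 4 and divides by n, so it returns the population variance. The first call includes the compilation time; later calls run the compiled code, which makes the calculation faster on large inputs.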
Method 6: Using Scipy Library
In this method we will use the SciPy library, a powerful library for scientific computing. Its stats module provides the tvar() function, which we can use to calculate the variance of the list. Let's see a program.
Example
from scipy import stats

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# With no limits given, tvar() uses every value; its default is ddof=1
variance = stats.tvar(data)

print("Original list value:", data)
print("Variance of given list is:", variance)
Output
Original list value: [2, 4, 6, 8, 10, 34, 23, 46, 67]
Variance of given list is: 508.19444444444446
Explanation
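In the program above, we imported the stats module from scipy and used the tvar() function to calculate the variance of the list. tvar() computes a trimmed variance; when no limits are given it uses every value and, with its default ddof=1, returns the sample variance.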
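As a rough sketch of what the "trimmed" part means, the example below passes the limits and ddof arguments of tvar(). It assumes SciPy is installed; the interval (0, 50) is an arbitrary choice for illustration, and the statistics module is used only to cross-check the trimmed result.

```python
import math
import statistics

from scipy import stats

data = [2, 4, 6, 8, 10, 34, 23, 46, 67]

# ddof=0 switches tvar() from the sample to the population variance.
population_var = stats.tvar(data, ddof=0)

# limits=(0, 50) keeps only values inside the closed interval [0, 50],
# so 67 is ignored before the sample variance is computed.
trimmed_var = stats.tvar(data, limits=(0, 50))

# Cross-check: filter the list by hand and use the statistics module.
kept = [x for x in data if 0 <= x <= 50]
matches = math.isclose(float(trimmed_var), statistics.variance(kept))

print("Population variance:", population_var)
print("Trimmed sample variance:", trimmed_var)
print("Matches statistics.variance on kept values:", matches)
```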
Conclusion
In this article we learned what variance is and how to calculate the variance of a list using several methods: the statistics module, the direct formula, pandas, Welford's algorithm, Numba, and SciPy. Welford's algorithm is especially memory efficient because it processes the data in a single pass. Keep in mind that some of these methods return the sample variance and others the population variance, so choose the one that best suits your requirement.