Show Pareto Distribution in Statistics using Python


The Pareto distribution is a power-law probability distribution commonly used to describe social, scientific, geophysical, and actuarial phenomena. It is named after Vilfredo Pareto, an Italian economist, sociologist, and civil engineer. It is often used to model heavy-tailed quantities such as city sizes, website traffic, and citation counts of scientific publications.

The Pareto principle, also known as the 80/20 rule, suggests that in many systems roughly 20% of the inputs account for about 80% of the outcomes. Python offers several libraries for working with probability distributions, including the scipy.stats module. To work with the Pareto distribution in Python, the pareto object from scipy.stats can be used. It takes the shape parameter alpha as its first argument (scipy names it b) and the scale parameter xm through the scale keyword.
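As a rough illustration of the 80/20 rule: when alpha > 1, the share of the total held by the top fraction p of a Pareto-distributed quantity has the closed form p ** (1 - 1 / alpha). The sketch below uses a helper named top_share, which is made up for this illustration and is not part of any library, to show that alpha = log(5) / log(4) ≈ 1.16 gives the 80/20 split.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p (0 < p <= 1).

    Valid only for alpha > 1, where the Pareto mean is finite.
    Hypothetical helper, not part of scipy or NumPy.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return p ** (1 - 1 / alpha)

# Shape parameter at which the top 20% accounts for 80% of the total
alpha_8020 = math.log(5) / math.log(4)

print(round(alpha_8020, 3))                  # 1.161
print(round(top_share(0.2, alpha_8020), 3))  # 0.8
```

Larger values of alpha give a lighter tail, so the top 20% then holds a smaller share of the total.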

Syntax

The following syntax generates 500 random numbers from a Pareto distribution and prints their mean −

import numpy as np
from scipy.stats import pareto
# alpha_value is the shape parameter, scale_value is the scale parameter (xm)
data = pareto.rvs(alpha_value, scale=scale_value, size=500)
print(np.mean(data))

Algorithm

  • Step 1 − Import the libraries.

  • Step 2 − Define the shape parameter (alpha) and the scale parameter (xm).

  • Step 3 − Choose how many random numbers to generate and pass that number as size to the ‘pareto.rvs’ function.

  • Step 4 − Print a summary statistic of the generated numbers, such as the mean or the median.
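The steps above can be sketched as a small function. The name sample_pareto and its seed argument are assumptions made for this illustration; the seed is simply forwarded to scipy's random_state so that repeated runs give the same numbers.

```python
import numpy as np
from scipy.stats import pareto

def sample_pareto(alpha, xm, size, seed=None):
    """Draw `size` values from a Pareto distribution with shape alpha and scale xm.

    Hypothetical helper: `seed` is passed to scipy as `random_state`.
    """
    # scipy names the shape parameter `b`; it is the first positional argument.
    return pareto.rvs(alpha, scale=xm, size=size, random_state=seed)

data = sample_pareto(alpha=2, xm=1, size=500, seed=0)

print(data.shape)         # (500,)
print(data.min() >= 1)    # True: no sample falls below the scale xm
print(np.mean(data))      # summary statistic of the sample
```

Passing scale and size as keywords avoids mixing them up with loc, which is the second positional argument of pareto.rvs.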

Method 1: Using the scipy.stats library

Example 1

To generate 500 random numbers from a Pareto distribution with alpha = 2 and xm = 1, we can use the following code −

import numpy as np
from scipy.stats import pareto
data = pareto.rvs(2, size=500, scale=1)
print(np.mean(data))

Output

1.9138055526628364

This code generates 500 random numbers from a Pareto distribution defined by the shape parameter (alpha = 2) and the scale parameter (xm = 1). The mean of the generated numbers is then calculated and printed. This is a quick way to summarise a sample and a starting point for further analysis in Python.
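For alpha > 1 the theoretical mean of a Pareto distribution is alpha * xm / (alpha - 1), which gives a reference value for the sample mean. The sketch below compares the formula with scipy's pareto.mean. The seed is arbitrary. With alpha = 2 the variance is infinite, so the sample mean can stray noticeably from the theoretical value even for fairly large samples.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 2, 1

# Closed-form mean of a Pareto distribution, valid only for alpha > 1
mean_formula = alpha * xm / (alpha - 1)

# The same quantity as reported by scipy
mean_scipy = pareto.mean(alpha, scale=xm)

# Sample mean of a seeded draw; the seed value is arbitrary
data = pareto.rvs(alpha, scale=xm, size=500, random_state=1)

print(mean_formula)      # 2.0
print(mean_scipy)        # theoretical mean from scipy
print(np.mean(data))     # sample mean, scatters around the theoretical value
```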

Example 2

To generate 700 random numbers from a Pareto distribution with alpha = 3 and xm = 2, we can use the following code −

import numpy as np
from scipy.stats import pareto
data = pareto.rvs(3, size=700, scale=2)
print(np.median(data))

Output

2.517223926313278

This code generates 700 random numbers from a Pareto distribution with the shape parameter set to 3 and the scale parameter set to 2. The median (middle value) of the generated numbers is calculated and printed. The median is a useful measure of the centre of such a skewed sample because it is less affected by extreme values than the mean.
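The median also has a closed form: setting the CDF 1 - (xm / x) ** alpha equal to 0.5 gives xm * 2 ** (1 / alpha). The following sketch, with an arbitrary seed and sample size, checks a sample median against that value and against scipy's pareto.median.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 3, 2

# Closed-form median: the point where the CDF 1 - (xm / x)**alpha equals 0.5
median_formula = xm * 2 ** (1 / alpha)

# The same quantity from scipy
median_scipy = pareto.median(alpha, scale=xm)

# Median of a large seeded sample; the seed and sample size are arbitrary
data = pareto.rvs(alpha, scale=xm, size=100_000, random_state=7)

print(round(median_formula, 4))   # 2.5198
print(np.median(data))            # close to the value above
```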

Example 3

To generate 1000 random numbers from a Pareto distribution with alpha = 5 and xm = 1, we can use the following code −

import numpy as np
from scipy.stats import pareto
data = pareto.rvs(5, size=1000, scale=1)
print(np.median(data))

Output

1.1557246772718455

In this code, 1000 random numbers are generated from a Pareto distribution with the shape parameter set to 5 and the scale parameter set to 1. The median of the generated numbers is then calculated and printed.

Method 2: Using the NumPy library

Example 1

To generate 500 random numbers from a Pareto distribution with alpha = 2 and xm = 1, we can use the following code −

import numpy as np
alpha = 2
xm = 1
size = 500
# np.random.pareto draws Lomax (Pareto II) samples; add 1 and scale by xm
data = (np.random.pareto(alpha, size) + 1) * xm
print(np.mean(data))

Output

1.8557392857152564

This code generates 500 random numbers from a Pareto distribution defined by the shape parameter (alpha = 2) and the scale parameter (xm = 1). It then calculates and prints the mean of the generated numbers.
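One detail matters once xm differs from 1. According to the NumPy documentation, np.random.pareto draws from the Lomax (Pareto II) distribution, and a classical Pareto sample with scale xm is obtained by adding 1 and then multiplying by xm. Adding xm alone only shifts the Lomax sample. The sketch below, which assumes the newer Generator API and an arbitrary seed, compares both forms against the theoretical median.

```python
import numpy as np

alpha, xm, size = 4, 2, 100_000
rng = np.random.default_rng(123)   # arbitrary seed for reproducibility

lomax = rng.pareto(alpha, size)    # Lomax (Pareto II) samples, starting at 0

shifted = lomax + xm               # only moves the Lomax sample to start at xm
classical = (lomax + 1) * xm       # classical Pareto with scale xm

# Theoretical median of the classical Pareto distribution
median_formula = xm * 2 ** (1 / alpha)

print(round(median_formula, 4))    # 2.3784
print(np.median(classical))        # close to the theoretical median
print(np.median(shifted))          # noticeably smaller when xm != 1
```

For xm = 1 the two forms give identical numbers, so the difference only appears for other scales.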

Example 2

To generate 500 random numbers from a Pareto distribution with alpha = 4 and xm = 2, we can use the following code −

import numpy as np
alpha = 4
xm = 2
size = 500
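# np.random.pareto draws Lomax (Pareto II) samples; add 1 and scale by xm
data = (np.random.pareto(alpha, size) + 1) * xm
print(np.mean(data))

Output

The printed value varies from run to run and should be close to the theoretical mean alpha * xm / (alpha - 1) = 8/3 ≈ 2.667.

This code generates 500 random numbers from a Pareto distribution with the shape parameter set to 4 and the scale parameter set to 2. Because np.random.pareto returns Lomax samples, 1 is added to each value and the result is multiplied by xm to obtain a Pareto sample with scale xm. The mean of the generated numbers is then calculated and printed.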

Example 3

To generate 700 random numbers from a Pareto distribution with alpha = 4 and xm = 2, we can use the following code −

import numpy as np
alpha = 4
xm = 2
size = 700
# np.random.pareto draws Lomax (Pareto II) samples; add 1 and scale by xm
data = (np.random.pareto(alpha, size) + 1) * xm
print(np.median(data))

Output

The printed value varies from run to run and should be close to the theoretical median xm * 2 ** (1 / alpha) ≈ 2.378.

In this code, the shape parameter (alpha) of the Pareto distribution is set to 4 and the scale parameter (xm) to 2. Then 700 random numbers are generated from this distribution, and their median is calculated and printed.

Note − All of the above programs produce a different output on every run because they generate random numbers.
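If repeatable output is needed, the random number generator can be seeded. A minimal sketch, assuming scipy's random_state keyword and NumPy's default_rng (the seed value 42 is arbitrary):

```python
import numpy as np
from scipy.stats import pareto

# Same seed, same numbers (the seed value 42 is arbitrary)
first = pareto.rvs(2, scale=1, size=5, random_state=42)
second = pareto.rvs(2, scale=1, size=5, random_state=42)
print(np.array_equal(first, second))   # True

# The NumPy equivalent uses a seeded Generator
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.pareto(2, 5), rng_b.pareto(2, 5)))   # True
```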

Example 4

In this example, we calculate the probability density function (PDF) and the cumulative distribution function (CDF) of the Pareto distribution using the pdf and cdf methods, respectively, and plot both with matplotlib to visualise the distribution.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pareto
 
alpha = 2  # shape parameter
xm = 1     # scale parameter

# Generate random numbers from a Pareto distribution (not used in the plots below)
random_numbers = pareto.rvs(alpha, scale=xm, size=1000)

# 100 evenly spaced x values between the 10th and 90th percentiles
data = np.linspace(pareto.ppf(0.10, alpha, scale=xm), pareto.ppf(0.90, alpha, scale=xm), 100)
pdf = pareto.pdf(data, alpha, scale=xm) # Calculate the PDF
cdf = pareto.cdf(data, alpha, scale=xm) # Calculate the CDF
 
# Plotting the PDF and CDF
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(data, pdf, 'r', lw=2, label='PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Pareto Distribution PDF')
plt.legend()
 
plt.subplot(1, 2, 2)
plt.plot(data, cdf, 'b', lw=2, label='CDF')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.title('Pareto Distribution CDF')
plt.legend() 
plt.show()

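Output

[Figure: two side-by-side plots, "Pareto Distribution PDF" (red curve, y-axis "Probability Density") and "Pareto Distribution CDF" (blue curve, y-axis "Cumulative Probability"), both plotted against x]
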
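The plotted curves can also be checked against the closed-form expressions for x >= xm: the PDF is alpha * xm ** alpha / x ** (alpha + 1) and the CDF is 1 - (xm / x) ** alpha. The sketch below compares them with scipy's values at a few arbitrary evaluation points and confirms that ppf inverts cdf.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 2, 1
x = np.linspace(1.1, 5, 50)   # evaluation points, all above xm

# Closed-form expressions for x >= xm
pdf_formula = alpha * xm**alpha / x ** (alpha + 1)
cdf_formula = 1 - (xm / x) ** alpha

print(np.allclose(pdf_formula, pareto.pdf(x, alpha, scale=xm)))   # True
print(np.allclose(cdf_formula, pareto.cdf(x, alpha, scale=xm)))   # True

# ppf is the inverse of cdf, which is how the plot range above is chosen
q = np.array([0.10, 0.50, 0.90])
x_q = pareto.ppf(q, alpha, scale=xm)
print(np.allclose(pareto.cdf(x_q, alpha, scale=xm), q))           # True
```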
Conclusion

In conclusion, the Pareto distribution is an effective statistical tool for modelling phenomena with a small number of extreme values and a large number of small values. It has many uses across fields, since it can model income disparity, city sizes, and other economic quantities. Using Python modules such as scipy.stats, NumPy and matplotlib, we can quickly compute, fit, visualise and analyse the Pareto distribution and gain useful insight into such data.
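Fitting is mentioned above but not demonstrated, so here is a minimal sketch using the closed-form maximum likelihood estimates for the classical Pareto distribution: the scale estimate is the sample minimum and the shape estimate is n / sum(log(x / xm_hat)). The data are simulated with an arbitrary seed. scipy distributions also provide a generic fit method, which is not used here.

```python
import numpy as np
from scipy.stats import pareto

true_alpha, true_xm = 3, 2

# Seeded sample to fit; the seed and sample size are arbitrary
data = pareto.rvs(true_alpha, scale=true_xm, size=5000, random_state=0)

# Closed-form maximum likelihood estimates for the classical Pareto distribution
xm_hat = data.min()
alpha_hat = data.size / np.sum(np.log(data / xm_hat))

print(xm_hat)      # slightly above the true scale of 2
print(alpha_hat)   # close to the true shape of 3
```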

Updated on: 29-Sep-2023
