Show Pareto Distribution in Statistics using Python
The Pareto distribution is a type of power-law probability distribution commonly employed to describe measurable phenomena, such as social, scientific, geophysical, or actuarial data. It is named after Vilfredo Pareto, an Italian economist, sociologist, and civil engineer. The Pareto distribution is often used to model the distribution of diverse sets of data, such as city sizes, website traffic, and scientific publication citations.
The Pareto principle, also known as the 80/20 rule, states that in many systems roughly 20% of the inputs account for about 80% of the outcome. Python offers several libraries for working with probability distributions, including scipy.stats. To work with the Pareto distribution in Python, use the pareto object from scipy.stats: pass the shape parameter alpha as the first argument (scipy names it b) and the scale parameter xm through the scale keyword.
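As a rough illustration of the 80/20 rule, the sketch below uses the standard closed-form result that, for a Pareto distribution with shape alpha > 1, the largest fraction p of the values accounts for a share p ** (1 - 1 / alpha) of the total. The helper name top_share is hypothetical and not part of any library.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p (requires alpha > 1)."""
    return p ** (1 - 1 / alpha)

# Shape value at which the top 20% account for 80% of the total
alpha_80_20 = math.log(5) / math.log(4)   # about 1.161

print(alpha_80_20)
print(top_share(0.20, alpha_80_20))
```

Larger values of alpha give a lighter tail, so the share held by the top 20% shrinks as alpha grows.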
Syntax
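The following syntax generates 500 random numbers from a Pareto distribution and prints their mean. Pass scale and size as keywords, because the second positional argument of rvs is loc, not size −
import numpy as np
from scipy.stats import pareto

data = pareto.rvs(alpha_value, scale=scale_value, size=500)
print(np.mean(data))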
Algorithm
Step 1 − Import the libraries.
Step 2 − Define the shape parameter (alpha) and scale parameter (xm).
Step 3 − Choose the sample size and generate the random numbers with the pareto.rvs function.
Step 4 − Print a summary statistic (mean or median) of the generated numbers.
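The steps above can be sketched as follows. The parameter values and the random_state seed are assumptions chosen for illustration; the seed is only there so that repeated runs give the same sample.

```python
# Step 1: import the libraries
import numpy as np
from scipy.stats import pareto

# Step 2: define the shape (alpha) and scale (xm) parameters
alpha = 3
xm = 2

# Step 3: choose the sample size and draw the random numbers
size = 500
data = pareto.rvs(alpha, scale=xm, size=size, random_state=0)

# Step 4: print a summary statistic of the sample
print(np.mean(data))
```

Every generated value is at least xm, since xm is the lower bound of the distribution.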
Method 1: Using the scipy.stats library
Example 1
To generate 500 random numbers from a Pareto distribution with alpha = 2 and xm = 1, we can use the following code −
import numpy as np
from scipy.stats import pareto

# shape alpha = 2, scale xm = 1, 500 samples
data = pareto.rvs(2, size=500, scale=1)
print(np.mean(data))
Output
1.9138055526628364
This code generates 500 random numbers from a Pareto distribution with the shape parameter set to 2 and the scale parameter set to 1. The average (mean) of the generated numbers is then calculated and printed. This is a useful first step when summarising a distribution and doing preliminary analysis in Python.
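As a sanity check on the printed mean, the sketch below compares it with the closed-form mean alpha * xm / (alpha - 1), which holds only for alpha > 1. The seed and the larger sample size are assumptions for illustration. Note that with alpha = 2 the variance is infinite, so the sample mean approaches the theoretical value slowly and can differ noticeably between runs.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 2, 1

# Closed-form mean of a Pareto distribution (valid only for alpha > 1)
mean_formula = alpha * xm / (alpha - 1)

# The same quantity from the frozen scipy distribution
mean_scipy = pareto(alpha, scale=xm).mean()

# Sample mean from a large seeded sample
sample_mean = np.mean(pareto.rvs(alpha, scale=xm, size=100_000, random_state=1))

print(mean_formula, mean_scipy, sample_mean)
```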
Example 2
To generate 700 random numbers from a Pareto distribution with alpha = 3 and xm = 2, we can use the following code −
import numpy as np
from scipy.stats import pareto

# shape alpha = 3, scale xm = 2, 700 samples
data = pareto.rvs(3, size=700, scale=2)
print(np.median(data))
Output
2.517223926313278
This code generates 700 random numbers from a Pareto distribution with the shape parameter set to 3 and the scale parameter set to 2. The median (middle value) of the generated numbers is then calculated and printed. The median shows where the centre of the sample lies and is less sensitive to extreme values than the mean.
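The sample median can be compared with the closed-form median of a Pareto distribution, xm * 2 ** (1 / alpha). The sketch below does this for alpha = 3 and xm = 2; the seed and the larger sample size are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 3, 2

# Closed-form median of a Pareto distribution
median_formula = xm * 2 ** (1 / alpha)

# The same quantity computed by scipy
median_scipy = pareto.median(alpha, scale=xm)

# Sample median from a large seeded sample
sample = pareto.rvs(alpha, scale=xm, size=100_000, random_state=2)
sample_median = np.median(sample)

print(median_formula, median_scipy, sample_median)
```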
Example 3
To generate 1000 random numbers from a Pareto distribution with alpha = 5 and xm = 1, we can use the following code −
import numpy as np
from scipy.stats import pareto

# shape alpha = 5, scale xm = 1, 1000 samples
data = pareto.rvs(5, size=1000, scale=1)
print(np.median(data))
Output
1.1557246772718455
In this code, 1000 random numbers are generated from a Pareto distribution with the shape parameter set to 5 and the scale parameter set to 1. The median of the generated numbers is then calculated and printed.
Method 2: Using the NumPy library
Example 1
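To generate 500 random numbers from a Pareto distribution with alpha = 2 and xm = 1, we can use the following code −
import numpy as np

alpha = 2
xm = 1
size = 500

# np.random.pareto draws Lomax (Pareto II) values that start at 0;
# adding 1 and multiplying by xm gives a classical Pareto with scale xm
data = (np.random.pareto(alpha, size) + 1) * xm
print(np.mean(data))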
Output
1.8557392857152564
This code generates 500 random numbers from a Pareto distribution with the shape parameter set to 2 and the scale parameter set to 1. It then calculates and prints the average (mean) of the generated numbers.
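One detail worth checking when using NumPy: its pareto sampler returns Lomax (Pareto II) values, which start at 0 rather than at xm. The sketch below converts them to a classical Pareto sample by adding 1 and multiplying by xm, then compares the sample median with scipy's value. The seeded Generator and the parameter values are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 4, 2
rng = np.random.default_rng(3)   # seeded so the run is repeatable

lomax = rng.pareto(alpha, size=100_000)   # Lomax samples, lower bound 0
classical = (lomax + 1) * xm              # classical Pareto samples, lower bound xm

print(lomax.min(), classical.min())
print(np.median(classical), pareto.median(alpha, scale=xm))
```

When xm = 1 the conversion reduces to adding 1, but for any other scale the multiplication is needed.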
Example 2
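To generate 500 random numbers from a Pareto distribution with alpha = 4 and xm = 2, we can use the following code −
import numpy as np

alpha = 4
xm = 2
size = 500

# np.random.pareto draws Lomax (Pareto II) values that start at 0;
# adding 1 and multiplying by xm gives a classical Pareto with scale xm
data = (np.random.pareto(alpha, size) + 1) * xm
print(np.mean(data))
Output
The printed value changes from run to run. It should fall near the theoretical mean alpha * xm / (alpha - 1) = 8/3 ≈ 2.67.
This code generates 500 random numbers from a Pareto distribution with the shape parameter set to 4 and the scale parameter set to 2, then calculates and prints their mean. Adding xm to the raw NumPy values would only shift the Lomax sample and would not produce a Pareto distribution with scale xm, so the sample is rescaled instead.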
Example 3
To generate 700 random numbers from a Pareto distribution with alpha = 4 and xm = 2, we can use the following code −
import numpy as np

alpha = 4
xm = 2
size = 700

# np.random.pareto draws Lomax (Pareto II) values that start at 0;
# adding 1 and multiplying by xm gives a classical Pareto with scale xm
data = (np.random.pareto(alpha, size) + 1) * xm
print(np.median(data))
Output
The printed value changes from run to run. It should fall near the theoretical median xm * 2 ** (1 / alpha) = 2 * 2 ** 0.25 ≈ 2.38.
In this code, the shape parameter (alpha) of the Pareto distribution is set to 4 and the scale (xm) to 2. Then 700 random numbers are drawn from that distribution, and their median is calculated and printed.
Note − All of the programs above produce different output on every run, because they generate random numbers.
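If repeatable output is needed, a seed can be fixed. The sketch below shows one way to do this with the random_state argument of pareto.rvs and with a seeded NumPy Generator; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import pareto

# scipy: the same random_state gives the same sample
a = pareto.rvs(2, scale=1, size=5, random_state=42)
b = pareto.rvs(2, scale=1, size=5, random_state=42)
print(np.array_equal(a, b))

# NumPy: two Generators created with the same seed give the same sample
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
c = rng1.pareto(2, 5)
d = rng2.pareto(2, 5)
print(np.array_equal(c, d))
```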
Example 4
In this example, we calculate the probability density function (PDF) and cumulative distribution function (CDF) of the Pareto distribution using the pdf and cdf methods, respectively, and plot both with matplotlib to visualize the distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pareto

alpha = 2  # define shape parameter
xm = 1     # define scale parameter

# Generate random numbers from a Pareto distribution
random_numbers = pareto.rvs(alpha, scale=xm, size=1000)

# Evaluation points between the 10th and 90th percentiles
data = np.linspace(pareto.ppf(0.10, alpha, scale=xm),
                   pareto.ppf(0.90, alpha, scale=xm), 100)
pdf = pareto.pdf(data, alpha, scale=xm)  # Calculate the PDF
cdf = pareto.cdf(data, alpha, scale=xm)  # Calculate the CDF

# Plotting the PDF and CDF
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(data, pdf, 'r', lw=2, label='PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Pareto Distribution PDF')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(data, cdf, 'b', lw=2, label='CDF')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.title('Pareto Distribution CDF')
plt.legend()

plt.show()
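Output
[Figure: two side-by-side panels titled "Pareto Distribution PDF" and "Pareto Distribution CDF", each plotted against x.]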
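The PDF and CDF curves in this example follow simple closed forms: for x >= xm, the PDF is alpha * xm ** alpha / x ** (alpha + 1) and the CDF is 1 - (xm / x) ** alpha. As a minimal check, the sketch below evaluates both formulas on a grid and compares them with scipy's pdf and cdf methods. The grid limits are an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import pareto

alpha, xm = 2, 1
x = np.linspace(1.0, 5.0, 50)   # evaluation grid, all points >= xm

# Closed-form PDF and CDF of the Pareto distribution
pdf_formula = alpha * xm**alpha / x**(alpha + 1)
cdf_formula = 1 - (xm / x)**alpha

# Compare with scipy's implementation
pdf_matches = np.allclose(pdf_formula, pareto.pdf(x, alpha, scale=xm))
cdf_matches = np.allclose(cdf_formula, pareto.cdf(x, alpha, scale=xm))

print(pdf_matches, cdf_matches)
```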
Conclusion
In conclusion, the Pareto distribution is an effective statistical tool for modelling phenomena that combine a small number of extreme values with a large number of small ones. It is used in many fields, since it can model income inequality, city sizes, and other heavy-tailed quantities. Using Python modules such as scipy.stats, NumPy and matplotlib, we can quickly compute, fit, visualise and analyse the Pareto distribution and gain useful insight into such data.