Foundations of Probability in Python


Probability is the study of random events and their outcomes. It is an essential concept in fields such as finance, physics, engineering and data science. The probability of an event measures how likely the event is to occur; since most real-world events cannot be predicted with complete certainty, probability serves as a guide rather than a guarantee. In this article, we look at the foundations of probability in Python. Python offers a number of libraries that let us work with probability distributions, perform statistical computations and generate random numbers.

The basic concepts and terms of probability that we need before getting started with Python are:

Sample space − A set of all possible outcomes of a random experiment.
Event − A subset of the sample space, representing a single outcome or a set of outcomes of a random experiment.
Probability − A number between 0 and 1 that represents the likelihood of an event occurring. A probability of 0 indicates that an event is impossible, whereas a probability of 1 indicates that it is certain to occur.

Now that we have understood the basic terminology, let us implement these concepts using Python.

Generating Random Numbers

Here we will see how to generate random numbers in Python using the built-in random module. This module provides functions such as randint, which returns a random integer from an inclusive range, and random, which returns a random float in the range [0.0, 1.0).

Example

import random

# Generate a random integer between 0 and 100 (both endpoints included)
random_integer = random.randint(0, 100)
print("Random integer:", random_integer)

# Generate a random float in the range [0.0, 1.0)
random_float = random.random()
print("Random float:", random_float)

Output

Random integer: 89
Random float: 0.16460963462567613
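
Because these values change on every run, it can help to make a simulation repeatable while testing. The following is a minimal sketch of one way to do that with a seeded generator; the seed value 42 and the variable names are arbitrary choices for illustration, not part of the example above.

```python
import random

# Use a dedicated generator so the seed does not affect other code.
# The seed value 42 is arbitrary and chosen only for illustration.
rng = random.Random(42)
first_run = [rng.randint(0, 100) for _ in range(5)]

# Re-creating the generator with the same seed replays the same sequence.
rng = random.Random(42)
second_run = [rng.randint(0, 100) for _ in range(5)]

print("Same sequence both times:", first_run == second_run)
```

Leaving the seed out (or calling random.Random() with no argument) gives a different sequence on each run, which is usually what you want outside of testing.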

Defining a Sample Space and Calculating the Probability of an Event

The sample space, the set of all possible outcomes, is defined as a list. The event is a second list whose elements form a subset of the sample space. When every outcome is equally likely, the probability of the event is the number of outcomes in the event divided by the number of outcomes in the sample space.

Example

# Define a sample space
sample_space = [1, 2, 3, 4, 5]

# Define an event
event = [1, 2, 3]

# Calculate the probability of the event
probability = len(event) / len(sample_space)
print("Probability of the event occurring is:", probability)

Output

Probability of the event occurring is: 0.6
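
The same calculation can be sketched with sets and exact fractions, which avoids duplicate outcomes and floating-point rounding. The helper name event_probability below is hypothetical, and the sketch assumes that all outcomes in the sample space are equally likely.

```python
from fractions import Fraction

def event_probability(event, sample_space):
    """Return P(event) for equally likely outcomes as an exact fraction."""
    event, sample_space = set(event), set(sample_space)
    # An event must be a subset of the sample space.
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

sample_space = {1, 2, 3, 4, 5}
event = {1, 2, 3}

p = event_probability(event, sample_space)
print("Exact probability:", p)   # 3/5
print("As a float:", float(p))   # 0.6
```

The subset check catches a common mistake: counting outcomes in the event that cannot occur in the experiment at all.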

Computing Conditional Probability

Conditional probability is the probability of event A given that event B has already occurred, written P(A | B). It is computed as P(A and B) / P(B), which requires P(B) to be greater than 0. In simpler terms, conditional probability models how knowing about one event changes the likelihood of another. For example, if you know that it is raining outside, your estimate of the probability of taking an umbrella with you will differ from your estimate without that knowledge.

Example

# Define a sample space
sample_space = [1, 2, 3, 4, 5]

# Define events A and B
event_A = [1, 2, 3]
event_B = [2, 3, 4]

# Calculate the joint probability of A and B
joint_probability = len([x for x in event_A if x in event_B]) / len(sample_space)
print("Joint probability of A and B:", joint_probability)

# Calculate the conditional probability of A given B
conditional_probability = joint_probability / (len(event_B) / len(sample_space))
print("Conditional probability of A given B:", conditional_probability)

Output

Joint probability of A and B: 0.4
Conditional probability of A given B: 0.6666666666666667
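
When all outcomes are equally likely, the size of the sample space cancels out of the formula, so P(A | B) reduces to the number of outcomes in both A and B divided by the number of outcomes in B. The sketch below illustrates this with sets; the helper name prob_given is hypothetical and the equal-likelihood assumption is required.

```python
from fractions import Fraction

def prob_given(event_a, event_b):
    """Return P(A | B) for equally likely outcomes as |A & B| / |B|."""
    a, b = set(event_a), set(event_b)
    if not b:
        raise ValueError("P(A | B) is undefined when B is empty")
    return Fraction(len(a & b), len(b))

event_A = {1, 2, 3}
event_B = {2, 3, 4}

p_a_given_b = prob_given(event_A, event_B)
print("P(A | B) =", p_a_given_b)           # 2/3
print("As a float:", float(p_a_given_b))
```

The result agrees with the value computed from the joint probability: two of the three outcomes in B also belong to A.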

Calculating Expected Value

The expected value of a random variable is a measure of its central tendency. It is the probability-weighted average of the possible outcomes and approximates the average result over many repetitions of a random experiment. For example, if you flip a fair coin many times and score heads as 1 and tails as 0, the average score approaches 0.5.

Example

# Define a probability distribution
probabilities = [0.2, 0.3, 0.5]
outcomes = [10, 20, 30]

# Calculate the expected value
expected_value = sum([p * x for p, x in zip(probabilities, outcomes)])
print("Expected value:", expected_value)

Output

Expected value: 23.0
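
To connect the formula with the idea of "average over many repetitions", the following sketch draws samples from the same distribution and compares their average with the computed expected value. The helper name expected_value, the seed and the sample count are assumptions made for illustration.

```python
import math
import random

def expected_value(outcomes, probabilities):
    """Return the probability-weighted average of the outcomes."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in zip(probabilities, outcomes))

outcomes = [10, 20, 30]
probabilities = [0.2, 0.3, 0.5]
theoretical = expected_value(outcomes, probabilities)

# Draw samples according to the distribution and average them.
rng = random.Random(0)  # arbitrary seed, used only for repeatability
samples = rng.choices(outcomes, weights=probabilities, k=100_000)
simulated = sum(samples) / len(samples)

print("Theoretical expected value:", theoretical)
print("Simulated average:", simulated)
```

The simulated average should land near 23 but will not match it exactly, since it is computed from a finite number of random draws.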

Calculating the Probability of Rolling a 6 on a Fair Die

A die has 6 sides labelled 1 to 6. Using the basics explained above, we are now going to estimate the probability of rolling a 6 on a fair die by simulating a large number of rolls and counting how often a 6 appears.

Example

import random

def roll_die():
   return random.randint(1, 6)

num_trials = 100000
num_sixes = 0

for i in range(num_trials):
   result = roll_die()
   if result == 6:
      num_sixes += 1

prob_six = num_sixes / num_trials
print("Probability of 6 is:", prob_six)

Output

Probability of 6 is: 0.1667

The exact output varies from run to run, but with 100,000 trials it will be close to 1/6 ≈ 0.1667, since each face of a fair die is equally likely.
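
How close the estimate gets depends on the number of trials: the typical error of such a simulation shrinks roughly in proportion to one over the square root of the trial count. The sketch below, which uses a hypothetical helper estimate_prob_six and an arbitrary seed, repeats the experiment with increasing numbers of rolls.

```python
import random

def estimate_prob_six(num_trials, rng):
    """Estimate P(6) as the fraction of simulated rolls that show a 6."""
    sixes = sum(1 for _ in range(num_trials) if rng.randint(1, 6) == 6)
    return sixes / num_trials

rng = random.Random(1)  # arbitrary seed, used only for repeatability
estimates = {}
for n in (1_000, 10_000, 100_000):
    estimates[n] = estimate_prob_six(n, rng)
    error = abs(estimates[n] - 1 / 6)
    print(f"{n:>7} trials: estimate = {estimates[n]:.4f}, error = {error:.4f}")
```

The error does not have to decrease at every step, because each run is random, but larger trial counts tend to give estimates closer to 1/6.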

Calculating the Conditional Probability of Rolling a 6 on the First Die Given That the Sum of the Two Dice Is 7

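Here we make use of the conditional probability explained above. We will estimate the conditional probability of rolling a 6 on the first die given that the sum of the two dice is 7. Once the sum is known to be 7, the first die shows a 6 exactly when the second die shows a 1. Of the six equally likely pairs that sum to 7, namely (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and (6, 1), only one has a 6 on the first die, so the exact answer is 1/6.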

Example

import random

def roll_two_dice():
   die1 = random.randint(1, 6)
   die2 = random.randint(1, 6)
   return (die1, die2)

num_trials = 100000
num_six_given_seven = 0
num_seven = 0

for i in range(num_trials):
   result = roll_two_dice()
   if sum(result) == 7:
      num_seven += 1
      if result[0] == 6:
         num_six_given_seven += 1

prob_six_given_seven = num_six_given_seven / num_seven
print("The Probability of rolling a 6 on the first die given that the sum of the two dice is 7 is:", prob_six_given_seven)

Output

The Probability of rolling a 6 on the first die given that the sum of the two dice is 7 is: 0.16626851409460106
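
Since two dice have only 36 possible outcomes, the simulated value can be checked against an exact calculation. The following sketch enumerates every pair with itertools.product and counts the relevant cases; the variable names are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) pairs.
rolls = list(product(range(1, 7), repeat=2))

# Condition on the sum being 7, then count pairs with a 6 on the first die.
sum_is_seven = [r for r in rolls if sum(r) == 7]
six_first = [r for r in sum_is_seven if r[0] == 6]

exact = Fraction(len(six_first), len(sum_is_seven))
print("Pairs that sum to 7:", sum_is_seven)
print("Exact P(first die is 6 | sum is 7):", exact)   # 1/6
```

Enumeration like this is practical only for small sample spaces; for larger experiments, simulation as shown above is often the simpler route.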

Conclusion

Python provides a variety of tools and libraries that help us work with the foundations of probability. Probability has a wide range of use cases, from AI content detection to card games. The random module is often used for probability-related problems. Combined with libraries like numpy and scipy (and matplotlib and seaborn for visualization), it can be of great advantage when the data is large and stored in files such as CSV. Probability problems can further be combined with statistics to gain more insights. Whether you are a beginner or a practitioner, there will always be more to find out in the field of probability.

Pranay Arora

Updated on: 04-Oct-2023
