Generating Random Integers in a Pandas DataFrame


Generating random integers in a DataFrame with Python's pandas library is a handy data analysis and manipulation technique. Filling a DataFrame with random integers is particularly useful for data simulation, algorithm testing, and generating synthetic datasets, so knowing how to do it adds flexibility to your data analysis workflows. This article covers four approaches.

Method 1: Using the randint() function from NumPy

NumPy's numpy.random.randint() function generates random integers within a given range. The lower bound low is inclusive and the upper bound high is exclusive, so randint(low=0, high=100) returns values from 0 to 99.

In this program, we set the desired size of the DataFrame (its number of rows and columns), generate an array of random integers of that size, and then construct the DataFrame from that array.
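Because the exclusive upper bound is an easy detail to get wrong, here is a minimal sketch that simulates two six-sided dice: high=7 is needed so that 6 can appear. The seed, the number of rolls, and the column names die_1 and die_2 are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seed the legacy global generator so the sketch is reproducible (optional)
np.random.seed(0)

# high=7 is exclusive, so every value falls in 1..6, like a six-sided die
rolls = np.random.randint(low=1, high=7, size=(1000, 2))

# Hypothetical column names, chosen only for illustration
dice = pd.DataFrame(rolls, columns=["die_1", "die_2"])

# Smallest and largest value across the whole DataFrame
print(dice.min().min(), dice.max().max())
```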

Algorithm

Step 1 - Import the pandas and NumPy libraries.

Step 2 - Create the variables row and cols to set the number of rows and columns of the DataFrame.

Step 3 - Use the numpy.random.randint() function to generate an array of random integers in a given range and store it in the variable "data".

Step 4 - Create the DataFrame "df" from the random integers in "data".

Step 5 - Print "df"

Example

import pandas as pd
import numpy as np

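# Number of rows and columns for the DataFrame
row = 5
cols = 5

# 2-D array of random integers from 0 (inclusive) to 100 (exclusive)
data = np.random.randint(low=0, high=100, size=(row, cols))

# Build the DataFrame from the random array
df = pd.DataFrame(data)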

print(df)

Output

    0   1   2   3   4
0  92   5  54   9  32
1  64  12  21  16  98
2  29  36  91  95  74
3   4  10  46  25   8
4  84  24  21  27   9
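Newer NumPy code often uses the Generator API instead of the legacy np.random functions. As a minimal sketch, assuming NumPy 1.17 or later: np.random.default_rng() creates a seeded generator, and its integers() method also excludes high by default unless endpoint=True is passed. The seed and the column labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# A local, seeded generator; results are reproducible for a given seed
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so values fall in 0..100
data = rng.integers(low=0, high=100, size=(5, 5), endpoint=True)

# Illustrative column labels
df = pd.DataFrame(data, columns=["a", "b", "c", "d", "e"])
print(df)
```

A local generator keeps the random state out of global module state, which makes it easier to pass around and to reproduce results in tests.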

Method 2: Using the pandas.DataFrame.sample() method

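The sample() method, available on both DataFrame and Series objects, draws a random sample of existing values. It does not create new numbers; it picks values (optionally with replacement) from data that is already there.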

In the code below, a DataFrame called df is first filled with 5 rows and 3 columns ('A', 'B', 'C') of random integers from 0 to 9 using np.random.randint(). The sample() method is then called on each column to draw 5 values; replace=True allows sampling with replacement, random_state=42 fixes the random seed for reproducibility, and .values drops the sampled index so the new values are assigned by position. Because every column is sampled with the same random_state, the same row positions are drawn for all three columns, so the result is equivalent to resampling whole rows: in the output, rows 1, 3 and 4 are identical. Finally, the updated DataFrame is printed.
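sample() picks positions using its random_state, so giving every column the same random_state=42 selects the same row positions for each column. The sketch below checks that this matches resampling whole rows once with DataFrame.sample(), and then shows one way (an assumption, not part of the original example) to sample the columns independently by giving each its own seed; the seeds 1, 2 and 3 are arbitrary.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
original = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=["A", "B", "C"])

# Column-by-column sampling with a shared random_state, as in the example
per_column = original.copy()
for col in ["A", "B", "C"]:
    per_column[col] = original[col].sample(n=5, replace=True, random_state=42).values

# Resampling whole rows once with the same random_state
whole_rows = original.sample(n=5, replace=True, random_state=42).reset_index(drop=True)

# Both approaches draw the same row positions, so this prints True
print((per_column.to_numpy() == whole_rows.to_numpy()).all())

# Independent columns: give each column its own (arbitrary) seed
independent = original.copy()
for seed, col in enumerate(["A", "B", "C"], start=1):
    independent[col] = original[col].sample(n=5, replace=True, random_state=seed).values
print(independent)
```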

Example

import pandas as pd
import numpy as np

# Set the seed for reproducibility (optional)
np.random.seed(42)

# Create a 5x3 DataFrame of random integers from 0 to 9 with columns 'A', 'B', 'C'
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])

# Resample each column with replacement; .values drops the sampled index
# so that the values are assigned back by position
df['A'] = df['A'].sample(n=5, replace=True, random_state=42).values
df['B'] = df['B'].sample(n=5, replace=True, random_state=42).values
df['C'] = df['C'].sample(n=5, replace=True, random_state=42).values

print(df)

Output

   A  B  C
0  4  3  7
1  7  2  5
2  2  6  7
3  7  2  5
4  7  2  5

Method 3: Using the pandas.DataFrame.apply() method with a lambda function

The code below uses the pandas.DataFrame.apply() method with lambda functions to generate random integers and assign them to the columns of a DataFrame. First, an empty DataFrame called df is created with 5 rows and 3 columns named 'RandomA', 'RandomB', and 'RandomC' (every cell starts as NaN). Then, for each column, apply() with axis=1 calls a lambda once per row; the lambda ignores the row (hence the _ parameter) and returns random.randint(0, 9), an integer from 0 to 9 inclusive. The resulting values are assigned to the corresponding column, and finally the DataFrame is printed to show the generated random integers.

Algorithm

Step 1 - Import the pandas library and the random module.

Step 2 - Set the seed to 42 for reproducibility (optional).

Step 3 - Create an empty DataFrame (all values NaN) with 5 rows and 3 columns named 'RandomA', 'RandomB', and 'RandomC'.

Step 4 - Generate random integers between 0 and 9 for each column using the apply() function and a lambda function.

Step 5 - Assign the generated random values to the respective columns in the DataFrame.

Step 6 - Print the DataFrame.

Example

import pandas as pd
import random

# Set the seed for reproducibility (optional)
random.seed(42)

# Create an empty DataFrame (all values NaN) with 5 rows and 3 named columns
df = pd.DataFrame(index=range(5), columns=['RandomA', 'RandomB', 'RandomC'])

# Generate random integers using apply() and a lambda function
df['RandomA'] = df.apply(lambda _: random.randint(0, 9), axis=1)
df['RandomB'] = df.apply(lambda _: random.randint(0, 9), axis=1)
df['RandomC'] = df.apply(lambda _: random.randint(0, 9), axis=1)

print(df)

Output

   RandomA  RandomB  RandomC
0        1        2        6
1        0        1        0
2        4        8        0
3        3        1        1
4        3        9        3
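The example calls apply() three times, once per column. As a sketch of an alternative that is not part of the original method, a single row-wise apply() can return a pandas Series holding one random integer per column; pandas then assembles those Series into a DataFrame with the same column labels. The seed is optional and illustrative.

```python
import random

import pandas as pd

random.seed(42)  # optional, for reproducibility

columns = ["RandomA", "RandomB", "RandomC"]
empty = pd.DataFrame(index=range(5), columns=columns)

# For each row, return a Series with one random integer (0..9 inclusive) per column;
# row.index holds the column labels, so the result keeps the same columns
df = empty.apply(
    lambda row: pd.Series([random.randint(0, 9) for _ in row.index], index=row.index),
    axis=1,
)
print(df)
```

The values will not match the three-call version for the same seed, because the random numbers are consumed row by row instead of column by column.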

Method 4: Using the pandas.Series.apply() function

The pandas.Series.apply() method is a useful tool in the pandas library: it applies a custom function to each element of a Series object.

Note that the code below does not call Series.apply() itself. It builds the DataFrame with a nested list comprehension that calls a helper function, generate_random_int(), once per cell to produce a random integer from 0 to 100 (both bounds inclusive, since random.randint() includes its upper bound). The result is a 10 × 5 DataFrame consisting entirely of random integers, which is then printed for further analysis or use.

Algorithm

Step 1 - Import the required libraries: pandas for data manipulation and random for generating random integers.

Step 2 - Declare the number of rows and columns for the DataFrame.

Step 3 - Define a function to generate a random integer between 0 and 100.

Step 4 - Create a DataFrame using a nested list comprehension to generate random integers for each cell.

Step 5 - Print the DataFrame to display the generated random integers.

Step 6 - End the program.

Example

import pandas as pd
import random

# Set the number of rows and columns for the DataFrame
num_rows = 10
num_cols = 5

# Helper that returns a random integer from 0 to 100 (inclusive)
def generate_random_int():
    return random.randint(0, 100)

# Build the DataFrame with a nested list comprehension: one random integer per cell
df = pd.DataFrame([[generate_random_int() for _ in range(num_cols)] for _ in range(num_rows)])

print(df)

Output

    0    1   2   3   4
0  23   77  66  60  19
1  51   31  79  51  88
2   6   38  73  38  64
3   5   79  97  25  43
4  24   53   6  23   6
5  63   82  47  56  10
6  72   91   4  84  32
7  81   74  17  21  44
8  28  100  43  31  58
9  64   57  16  15  14
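The code above never calls Series.apply(). A minimal sketch that does, assuming the same 10 × 5 shape: each column starts from a placeholder Series (its values are ignored), and Series.apply() replaces every element with a call to generate_random_int(). Using range(num_rows) as the placeholder is an illustrative choice.

```python
import random

import pandas as pd

num_rows = 10
num_cols = 5

def generate_random_int():
    # random.randint includes both bounds: 0..100
    return random.randint(0, 100)

# Placeholder Series with one element per row; its values are ignored
placeholder = pd.Series(range(num_rows))

# Series.apply() calls the function once per element, producing a new Series per column
df = pd.DataFrame(
    {col: placeholder.apply(lambda _: generate_random_int()) for col in range(num_cols)}
)
print(df)
```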

Conclusion

To sum up, there are several ways to create random integers in a pandas DataFrame. The options covered here are NumPy's randint() function, pandas.DataFrame.sample(), pandas.DataFrame.apply(), and pandas.Series.apply(). Each method has its own advantages, and the best choice depends on the use case at hand. If the goal is to fill a DataFrame or column directly with random integers, randint() is the simplest choice, and because it is vectorized it is also much faster than calling a Python function per cell with apply().

On the other hand, if you need to randomly sample existing rows or values, sample() is suitable. For situations that require more complex, per-row or per-element logic involving random integers, the apply() functions can be used effectively.

Updated on: 10-Aug-2023
