Generating Random Integers in a Pandas DataFrame

Generating random integers in a Pandas DataFrame is a fundamental technique for data simulation, testing algorithms, and creating synthetic datasets. This article explores four different approaches to populate DataFrames with random integer values.

Method 1: Using NumPy's randint() Function

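The most straightforward approach uses NumPy's randint() function to generate a matrix of random integers, which is then converted to a DataFrame. The high bound is exclusive, and because no seed is set in this example, the values differ on each run: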

import pandas as pd
import numpy as np

# Set dimensions
rows = 5
cols = 3

# Generate random integers from 0 to 99 (high=100 is exclusive)
random_data = np.random.randint(low=0, high=100, size=(rows, cols))

# Create DataFrame
df = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
print(df)

    A   B   C
0  37  12  72
1   9  75   5
2  79  64  16
3   1  76  71
4   6  25  50
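For repeatable results, NumPy also provides a Generator interface through np.random.default_rng(). The sketch below is one possible variation on Method 1 rather than part of the original example; the seed value 42 and the name seeded_df are arbitrary choices made here for illustration:

```python
import numpy as np
import pandas as pd

# Assumption: 42 is an arbitrary seed; any integer gives a repeatable stream
rng = np.random.default_rng(42)

# integers() excludes `high` by default, so values fall in the range 0 to 99
seeded_df = pd.DataFrame(
    rng.integers(low=0, high=100, size=(5, 3)),
    columns=['A', 'B', 'C']
)
print(seeded_df)
```

Creating a new generator with the same seed and repeating the call should yield the same DataFrame, which is convenient in tests.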

Method 2: Using DataFrame Constructor with Random Generation

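Build the rows with a nested list comprehension that calls random.randint(), then pass them to the DataFrame constructor. Unlike NumPy's randint(), random.randint() includes both endpoints: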

import pandas as pd
import random

# Set seed for reproducibility
random.seed(42)

# Define dimensions
num_rows = 4
num_cols = 3

# Create DataFrame with random integers
df = pd.DataFrame(
    [[random.randint(0, 50) for _ in range(num_cols)] for _ in range(num_rows)],
    columns=['X', 'Y', 'Z']
)

print(df)

    X   Y   Z
0  49  15  35
1  16  17  46
2  26   9   1
3  17  35  45
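The comprehension approach is useful when each column needs its own rules. As a rough sketch, the following gives every column a different range; the bounds dictionary and the name ranged_df are hypothetical and exist only for this illustration:

```python
import random

import pandas as pd

random.seed(42)

# Hypothetical per-column bounds; random.randint includes both endpoints
bounds = {'X': (0, 10), 'Y': (100, 200), 'Z': (-5, 5)}
num_rows = 4

# Build one list of random integers per column
ranged_df = pd.DataFrame({
    name: [random.randint(low, high) for _ in range(num_rows)]
    for name, (low, high) in bounds.items()
})
print(ranged_df)
```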

Method 3: Using apply() with Lambda Functions

Fill one column at a time by calling apply() with axis=1, which invokes the lambda once per row and returns one random integer for each:

import pandas as pd
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Create empty DataFrame structure
df = pd.DataFrame(index=range(4), columns=['Col1', 'Col2', 'Col3'])

# Fill columns with random integers using apply
df['Col1'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
df['Col2'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
df['Col3'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)

print(df)

   Col1  Col2  Col3
0     7     6    12
1    13     8     8
2    14     8     6
3     2    13     8
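Since the lambda ignores the row it receives, the same kind of column can usually be filled with a single vectorized call instead. The following is a sketch of that alternative, not the method shown above; the name vec_df is chosen here for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Start from an index only, then add one column per loop iteration
vec_df = pd.DataFrame(index=range(4))
for name in ['Col1', 'Col2', 'Col3']:
    # One call produces every row of the column; high=20 is exclusive
    vec_df[name] = np.random.randint(1, 20, size=len(vec_df))

print(vec_df)
```

apply() remains the better fit when the generated value actually depends on other data in the row.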

Method 4: Using Random Choice for Specific Values

Generate random integers from a specific set of values using random.choice(), which picks one element from the list on each call:

import pandas as pd
import random

# Define possible values
values = [10, 20, 30, 40, 50]
random.seed(42)

# Create DataFrame with random choices
df = pd.DataFrame({
    'Category_A': [random.choice(values) for _ in range(5)],
    'Category_B': [random.choice(values) for _ in range(5)],
    'Category_C': [random.choice(values) for _ in range(5)]
})

print(df)

   Category_A  Category_B  Category_C
0          50          20          40
1          20          20          50
2          30          10          10
3          20          40          50
4          20          10          40
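When some values should appear more often than others, NumPy's Generator.choice() accepts a probability for each value. This is a minimal sketch under assumed weights; the weights list and the name choice_df are made up for illustration and are not from the example above:

```python
import numpy as np
import pandas as pd

values = [10, 20, 30, 40, 50]
# Hypothetical weights, one per value; they must sum to 1
weights = [0.4, 0.3, 0.15, 0.1, 0.05]

rng = np.random.default_rng(42)

# Draw a 5x3 grid of values in one call, sampling with replacement
choice_df = pd.DataFrame(
    rng.choice(values, size=(5, 3), p=weights),
    columns=['Category_A', 'Category_B', 'Category_C']
)
print(choice_df)
```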

Comparison

Method               Best For                Performance  Flexibility
NumPy randint()      Large datasets          Fastest      Medium
List comprehension   Custom logic            Medium       High
apply() with lambda  Column-wise operations  Slower       High
random.choice()      Specific value sets     Medium       Medium
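The performance column can be checked informally with timeit. The sketch below times three of the approaches on a hypothetical 1,000 x 3 frame; the function names and sizes are assumptions made for this illustration, and the measured times will vary with hardware and library versions:

```python
import random
import timeit

import numpy as np
import pandas as pd

ROWS, COLS = 1_000, 3  # hypothetical size chosen for illustration


def with_numpy():
    # Single vectorized call for the whole matrix
    return pd.DataFrame(np.random.randint(0, 100, size=(ROWS, COLS)))


def with_list_comprehension():
    # random.randint includes both endpoints, so 99 matches NumPy's high=100
    return pd.DataFrame(
        [[random.randint(0, 99) for _ in range(COLS)] for _ in range(ROWS)]
    )


def with_apply():
    # Row-wise apply calls the lambda once per row for every column
    df = pd.DataFrame(index=range(ROWS), columns=range(COLS))
    for col in df.columns:
        df[col] = df.apply(lambda row: np.random.randint(0, 100), axis=1)
    return df


for func in (with_numpy, with_list_comprehension, with_apply):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.4f} s for 3 runs")
```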

Conclusion

For generating random integers in Pandas DataFrames, NumPy's randint() offers the best performance for large datasets. Use a list comprehension or apply() when you need custom per-row or per-column logic, and random.choice() when working with predefined value sets.

Jaisshree

Updated on: 2026-03-27T12:05:59+05:30
