Generating Random Integers in a Pandas DataFrame

Generating random integers in a Pandas DataFrame is a fundamental technique for data simulation, testing algorithms, and creating synthetic datasets. This article explores four different approaches to populate DataFrames with random integer values.

Method 1: Using NumPy's randint() Function

The most straightforward approach uses NumPy's randint() function to generate a matrix of random integers, which is then converted to a DataFrame:

import pandas as pd
import numpy as np

# Set dimensions
rows = 5
cols = 3

# Generate random integers from 0 to 99 (the high bound is exclusive)
random_data = np.random.randint(low=0, high=100, size=(rows, cols))

# Create DataFrame
df = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
print(df)

Sample output (no seed is set, so the values differ on each run):

    A   B   C
0  37  12  72
1   9  75   5
2  79  64  16
3   1  76  71
4   6  25  50
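
The example above draws different numbers on every run. As a minimal sketch of a reproducible variant, the block below uses NumPy's Generator interface (np.random.default_rng) in place of np.random.randint; the seed value 42 is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

# A seeded Generator reproduces the same table on every run
rng = np.random.default_rng(seed=42)

# integers() excludes `high` by default, so values fall in the range 0..99
random_data = rng.integers(low=0, high=100, size=(5, 3))

df = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
print(df)
```

Passing endpoint=True to integers() would make the upper bound inclusive, if that is the behavior you need.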

Method 2: Using DataFrame Constructor with Random Generation

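Build the rows with a nested list comprehension around Python's random.randint() and pass them directly to the DataFrame constructor. Unlike NumPy's randint(), random.randint() includes both endpoints: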

import pandas as pd
import random

# Set seed for reproducibility
random.seed(42)

# Define dimensions
num_rows = 4
num_cols = 3

# Create DataFrame with random integers
df = pd.DataFrame(
    [[random.randint(0, 50) for _ in range(num_cols)] for _ in range(num_rows)],
    columns=['X', 'Y', 'Z']
)

print(df)

Sample output:

    X   Y   Z
0  49  15  35
1  16  17  46
2  26   9   1
3  17  35  45

Method 3: Using apply() with Lambda Functions

Fill each column in turn by calling apply() with axis=1, which invokes the lambda once per row and returns one random integer for each row:

import pandas as pd
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Create empty DataFrame structure
df = pd.DataFrame(index=range(4), columns=['Col1', 'Col2', 'Col3'])

# Fill columns with random integers using apply
df['Col1'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
df['Col2'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
df['Col3'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)

print(df)

Sample output:

   Col1  Col2  Col3
0     7     6    12
1    13     8     8
2    14     8     6
3     2    13     8
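
When the reason for working column by column is that each column needs its own range, the same effect can be sketched without apply() by generating each column as one array. The bounds dictionary and its ranges below are hypothetical values chosen for illustration, and the Generator interface is used in place of np.random.randint.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n_rows = 4

# Hypothetical per-column (low, high) bounds; high is exclusive
bounds = {'Col1': (1, 20), 'Col2': (100, 200), 'Col3': (-5, 6)}

# One vectorized call per column instead of one Python call per row
df = pd.DataFrame({
    name: rng.integers(low, high, size=n_rows)
    for name, (low, high) in bounds.items()
})
print(df)
```

This keeps the per-column flexibility while avoiding a Python-level function call for every row.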

Method 4: Using Random Choice for Specific Values

Generate random integers from a specific set of values using random.choice():

import pandas as pd
import random

# Define possible values
values = [10, 20, 30, 40, 50]
random.seed(42)

# Create DataFrame with random choices
df = pd.DataFrame({
    'Category_A': [random.choice(values) for _ in range(5)],
    'Category_B': [random.choice(values) for _ in range(5)],
    'Category_C': [random.choice(values) for _ in range(5)]
})

print(df)

Sample output:

   Category_A  Category_B  Category_C
0          50          20          40
1          20          20          50
2          30          10          10
3          20          40          50
4          20          10          40
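
For larger tables, the same idea can be sketched with NumPy's Generator.choice(), which fills the whole table in one call and also accepts per-value probabilities. The weights below are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

values = [10, 20, 30, 40, 50]
# Illustrative probabilities, one per value; they must sum to 1
weights = [0.4, 0.3, 0.1, 0.1, 0.1]

# Draw a 5x3 table in a single call, sampling with replacement
data = rng.choice(values, size=(5, 3), p=weights)

df = pd.DataFrame(data, columns=['Category_A', 'Category_B', 'Category_C'])
print(df)
```

Omitting the p argument gives a uniform draw, matching the behavior of random.choice().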

Comparison

Method               Best For                Performance  Flexibility
NumPy randint()      Large datasets          Fastest      Medium
List comprehension   Custom logic            Medium       High
apply() with lambda  Column-wise operations  Slower       High
random.choice()      Specific value sets     Medium       Medium
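
The performance column can be checked on your own machine with a rough timing sketch like the one below. The table size and repeat count are arbitrary assumptions, and absolute timings will vary with hardware and library versions.

```python
import random
import timeit

import numpy as np
import pandas as pd

ROWS, COLS = 10_000, 3  # arbitrary size chosen for illustration


def with_numpy():
    # Vectorized: one call generates the whole matrix (high is exclusive)
    return pd.DataFrame(np.random.randint(0, 100, size=(ROWS, COLS)))


def with_list_comprehension():
    # Pure Python: one random.randint() call per cell (both ends inclusive)
    return pd.DataFrame(
        [[random.randint(0, 99) for _ in range(COLS)] for _ in range(ROWS)]
    )


for func in (with_numpy, with_list_comprehension):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```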

Conclusion

For generating random integers in Pandas DataFrames, NumPy's randint() offers the best performance for large datasets. Use apply() when you need column-specific logic, and random.choice() when working with a predefined set of values.

Updated on: 2026-03-27T12:05:59+05:30
