Generating Random Integers in Pandas Dataframe
Generating random integers in a Pandas DataFrame is a fundamental technique for data simulation, testing algorithms, and creating synthetic datasets. This article explores four different approaches to populate DataFrames with random integer values.
Method 1: Using NumPy's randint() Function
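The most straightforward approach uses NumPy's randint() function to generate a matrix of random integers, which is then converted to a DataFrame. The high bound is exclusive, so the call below draws values from 0 to 99, and because no seed is set, the values differ on each run: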
import pandas as pd
import numpy as np
# Set dimensions
rows = 5
cols = 3
# Generate random integers from 0 to 99 (high=100 is exclusive)
random_data = np.random.randint(low=0, high=100, size=(rows, cols))
# Create DataFrame
df = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
print(df)
A B C
0 37 12 72
1 9 75 5
2 79 64 16
3 1 76 71
4 6 25 50
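For repeatable results, one option is NumPy's newer Generator interface. The sketch below assumes a seed of 42 and the same 5x3 shape as above; the name `rng` is an illustrative choice, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same sequence on every run
rng = np.random.default_rng(seed=42)

# integers() excludes `high` by default, so this draws from 0 to 99
random_data = rng.integers(low=0, high=100, size=(5, 3))

df = pd.DataFrame(random_data, columns=['A', 'B', 'C'])
print(df)
```

Passing endpoint=True to integers() makes the upper bound inclusive, which can be easier to read when the range is meant to include 100.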
Method 2: Using DataFrame Constructor with Random Generation
Alternatively, build the rows with a list comprehension that calls Python's random.randint(), whose bounds are both inclusive, and pass the result to the DataFrame constructor:
import pandas as pd
import random
# Set seed for reproducibility
random.seed(42)
# Define dimensions
num_rows = 4
num_cols = 3
# Create DataFrame with random integers
df = pd.DataFrame(
    [[random.randint(0, 50) for _ in range(num_cols)] for _ in range(num_rows)],
    columns=['X', 'Y', 'Z']
)
print(df)
X Y Z
0 49 15 35
1 16 17 46
2 26 9 1
3 17 35 45
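The flexibility of this approach shows when each column needs its own range. The following is a minimal sketch; the `column_ranges` mapping and its bounds are hypothetical values chosen for illustration.

```python
import random

import pandas as pd

random.seed(42)

# Hypothetical per-column bounds; random.randint includes both endpoints
column_ranges = {'X': (0, 9), 'Y': (10, 19), 'Z': (100, 199)}
num_rows = 4

# Build one list of random integers per column, each with its own range
df = pd.DataFrame({
    name: [random.randint(low, high) for _ in range(num_rows)]
    for name, (low, high) in column_ranges.items()
})
print(df)
```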
Method 3: Using apply() with Lambda Functions
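Fill the DataFrame one column at a time with the apply() method. With axis=1 the lambda runs once per row, so each call below produces one random integer per row for the column being assigned: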
import pandas as pd
import numpy as np
# Set seed for reproducibility
np.random.seed(42)
# Create empty DataFrame structure
df = pd.DataFrame(index=range(4), columns=['Col1', 'Col2', 'Col3'])
# Fill columns with random integers using apply
df['Col1'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
df['Col2'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
df['Col3'] = df.apply(lambda x: np.random.randint(1, 20), axis=1)
print(df)
Col1 Col2 Col3
0 7 6 12
1 13 8 8
2 14 8 6
3 2 13 8
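apply() is most useful when the random draw depends on other data in the row. As a sketch under assumed inputs, the example below draws each value between 1 and a per-row upper bound; the `limit` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Hypothetical per-row upper bounds
df = pd.DataFrame({'limit': [5, 10, 50, 100]})

# Series.apply calls the lambda once per value in 'limit';
# endpoint=True makes the upper bound inclusive
df['value'] = df['limit'].apply(
    lambda hi: int(rng.integers(1, hi, endpoint=True))
)
print(df)
```

When the logic is this simple, integers() also accepts an array of bounds, which avoids the per-row Python call; apply() earns its cost mainly when the per-row rule cannot be expressed as an array operation.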
Method 4: Using Random Choice for Specific Values
Generate random integers from a specific set of values using random.choice():
import pandas as pd
import random
# Define possible values
values = [10, 20, 30, 40, 50]
random.seed(42)
# Create DataFrame with random choices
df = pd.DataFrame({
    'Category_A': [random.choice(values) for _ in range(5)],
    'Category_B': [random.choice(values) for _ in range(5)],
    'Category_C': [random.choice(values) for _ in range(5)]
})
print(df)
Category_A Category_B Category_C
0 50 20 40
1 20 20 50
2 30 10 10
3 20 40 50
4 20 10 40
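If some values should appear more often than others, the standard library's random.choices() accepts weights and returns several draws at once. This is a sketch only; the `weights` list is a hypothetical example.

```python
import random

import pandas as pd

random.seed(42)

values = [10, 20, 30, 40, 50]
# Hypothetical weights: 10 is drawn most often, 50 least often
weights = [5, 4, 3, 2, 1]

# choices() samples with replacement and returns a list of k items
df = pd.DataFrame({
    'Category_A': random.choices(values, weights=weights, k=5),
    'Category_B': random.choices(values, weights=weights, k=5),
})
print(df)
```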
Comparison
| Method | Best For | Performance | Flexibility |
|---|---|---|---|
| NumPy randint() | Large datasets | Fastest | Medium |
| List comprehension | Custom logic | Medium | High |
| apply() with lambda | Column-wise operations | Slower | High |
| random.choice() | Specific value sets | Medium | Medium |
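The performance column can be checked on your own machine with timeit. The sketch below is one rough way to do that; the function names and the 2,000-row size are assumptions made for illustration, and absolute timings will vary by hardware and library version.

```python
import random
import timeit

import numpy as np
import pandas as pd

ROWS, COLS = 2_000, 3  # hypothetical size; increase for a clearer gap

def with_numpy():
    # One vectorized call generates the whole array
    return pd.DataFrame(np.random.randint(0, 100, size=(ROWS, COLS)))

def with_list_comprehension():
    # One Python-level call per cell
    return pd.DataFrame(
        [[random.randint(0, 99) for _ in range(COLS)] for _ in range(ROWS)]
    )

def with_apply():
    # One lambda call per row for each column, as in Method 3
    df = pd.DataFrame(index=range(ROWS), columns=range(COLS))
    for col in range(COLS):
        df[col] = df.apply(lambda row: np.random.randint(0, 100), axis=1)
    return df

timings = {}
for func in (with_numpy, with_list_comprehension, with_apply):
    timings[func.__name__] = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {timings[func.__name__]:.4f} s for 3 runs")
```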
Conclusion
For generating random integers in Pandas DataFrames, NumPy's randint() offers the best performance for large datasets because it generates the whole array in a single vectorized call. Use a list comprehension or apply() when you need custom per-column or per-row logic, and random.choice() when working with predefined value sets.
