Generating Random Integers in Pandas Dataframe
Generating random integers in a DataFrame with Python's Pandas library is a useful technique for data analysis and manipulation. Filling a DataFrame with random integers is particularly valuable for tasks like data simulation, algorithm testing, and generating synthetic datasets. This article walks through four ways to do it.
Method 1: Using the randint() function from NumPy
The randint() function in NumPy's random module generates random integers within a given range. The lower bound (low) is included and the upper bound (high) is excluded.
In this program, we choose the number of rows and columns for the DataFrame, generate a NumPy array of that shape filled with random integers from 0 to 99, and then construct the DataFrame from that array.
Algorithm
Step 1 - Import the pandas and numpy libraries.
Step 2 - Create the variables "row" and "cols" to set the number of rows and columns of the DataFrame.
Step 3 - Use the numpy.random.randint() function to create random integers in a given range.
Step 4 - Create the DataFrame "df" from the generated random integers.
Step 5 - Print "df".
Example
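import pandas as pd
import numpy as np

# Number of rows and columns in the DataFrame
row = 5
cols = 5

# Random integers from 0 (inclusive) to 100 (exclusive)
data = np.random.randint(low=0, high=100, size=(row, cols))

# Build the DataFrame from the random array
df = pd.DataFrame(data)
print(df)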
Output
    0   1   2   3   4
0  92   5  54   9  32
1  64  12  21  16  98
2  29  36  91  95  74
3   4  10  46  25   8
4  84  24  21  27   9
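Because the example above sets no seed, its output changes on every run. The following is a minimal sketch of a reproducible variant, assuming NumPy's Generator API (np.random.default_rng, available in NumPy 1.17 and later). The seed value 42 and the column names are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

rows, cols = 5, 5

# Seeded generator: the same seed reproduces the same numbers
# (42 is an arbitrary, illustrative seed)
rng = np.random.default_rng(42)

# integers() excludes the upper bound by default, like np.random.randint()
data = rng.integers(low=0, high=100, size=(rows, cols))

# Column names are hypothetical, added only for readability
df = pd.DataFrame(data, columns=[f"col_{i}" for i in range(cols)])
print(df)
```

Keeping the generator in its own variable, rather than seeding NumPy globally, limits the seed's effect to the code that uses rng.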
Method 2: Using the pandas.DataFrame.sample() method
The sample() method returns a random sample of items from a DataFrame or Series.
In the code below, a DataFrame called 'df' is built with 5 rows and 3 columns ('A', 'B', and 'C') of random integers from 0 to 9. The sample() method is then applied to each column, and the sampled values are assigned back to 'A', 'B', and 'C'. The sample size is n=5, replace=True allows the same value to be drawn more than once, and random_state=42 fixes the random seed for reproducibility. Because all three calls use the same random_state, the same row positions are drawn for every column, so each output row is a complete row of the original DataFrame. Note that sample() only redraws values that already exist; the random integers themselves come from np.random.randint(). Finally, the updated DataFrame is printed.
Example
import pandas as pd
import numpy as np

# Set the seed for reproducibility (optional)
np.random.seed(42)

# Create a 5x3 DataFrame of random integers with named columns
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])

# Resample each column with replacement using sample()
df['A'] = df['A'].sample(n=5, replace=True, random_state=42).values
df['B'] = df['B'].sample(n=5, replace=True, random_state=42).values
df['C'] = df['C'].sample(n=5, replace=True, random_state=42).values
print(df)
Output
   A  B  C
0  4  3  7
1  7  2  5
2  2  6  7
3  7  2  5
4  7  2  5
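The example above passes the same random_state to every sample() call. The sketch below illustrates what that implies: identical seeds select identical row labels, while a different seed per column gives each column its own draw. The per-column seeds 0, 1, and 2 and the name resampled are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])

# Same seed for two columns: sample() picks the same row labels for both
idx_a = df['A'].sample(n=5, replace=True, random_state=42).index
idx_b = df['B'].sample(n=5, replace=True, random_state=42).index
print(list(idx_a) == list(idx_b))

# A different seed per column (0, 1, 2 are arbitrary) gives each column its own draw
resampled = pd.DataFrame({
    name: df[name].sample(n=5, replace=True, random_state=seed).to_numpy()
    for seed, name in enumerate(df.columns)
})
print(resampled)
```

Either way, every value in the result already existed in the corresponding original column, since sample() selects from existing data rather than generating new integers.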
Method 3: Using the pandas.DataFrame.apply() method with a lambda function
The code below uses the pandas.DataFrame.apply() method together with lambda functions to generate random integers and assign them to columns of a Pandas DataFrame. An empty DataFrame called df is created with 5 rows and 3 columns. Calling apply() with axis=1 runs the lambda function once per row, and each call returns a random integer from 0 to 9 (both inclusive, since random.randint() includes its upper bound). The generated integers are assigned to the corresponding columns, namely 'RandomA', 'RandomB', and 'RandomC'. Lastly, the data frame is printed to show the generated random integers.
Algorithm
Step 1 - Import the pandas library and the random module.
Step 2 - Set the seed to 42 for reproducibility (optional).
Step 3 - Create a DataFrame with 5 rows and 3 columns named 'RandomA', 'RandomB', and 'RandomC'.
Step 4 - Generate random integers between 0 and 9 for each column using the apply() function and a lambda function.
Step 5 - Assign the generated random values to the respective columns in the DataFrame.
Step 6 - Print the DataFrame.
Example
import pandas as pd
import random

# Set the seed for reproducibility (optional)
random.seed(42)

# Create an empty data frame with 5 rows and 3 named columns
df = pd.DataFrame(index=range(5), columns=['RandomA', 'RandomB', 'RandomC'])

# Generate random integers between 0 and 9 using apply() and a lambda function
df['RandomA'] = df.apply(lambda _: random.randint(0, 9), axis=1)
df['RandomB'] = df.apply(lambda _: random.randint(0, 9), axis=1)
df['RandomC'] = df.apply(lambda _: random.randint(0, 9), axis=1)
print(df)
Output
   RandomA  RandomB  RandomC
0        1        2        6
1        0        1        0
2        4        8        0
3        3        1        1
4        3        9        3
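The example above calls apply() once per column, working row by row. As a possible alternative, a single apply() call with the default axis=0 can fill every column in one pass. This is a minimal sketch, assuming NumPy's default_rng generator in place of the random module; the seed 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Seeded NumPy generator (an assumption; the original example uses the random module)
rng = np.random.default_rng(42)

df = pd.DataFrame(index=range(5), columns=['RandomA', 'RandomB', 'RandomC'])

# With the default axis=0, apply() passes each column to the lambda as a Series.
# Returning a Series with the same index fills the whole column at once.
df = df.apply(
    lambda col: pd.Series(rng.integers(0, 10, size=len(col)), index=col.index)
)
print(df)
```

Returning a Series that carries the column's own index keeps the row labels aligned regardless of the frame's shape.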
Method 4: Using the pandas.Series.apply() function
The pandas.Series.apply() function is a useful method in the pandas library. It applies a custom function to each element of a Series object.
In the code below, the generate_random_int() function returns a random integer from 0 to 100 (both inclusive). The DataFrame is built with a nested list comprehension that calls this function once per cell, so every element receives its own random draw. Note that the example as written fills the DataFrame through the list comprehension and does not call apply() itself; the same kind of function could equally be passed to Series.apply() to fill a column element by element. Finally, the resulting DataFrame is printed.
Algorithm
Step 1 - Import the required libraries: pandas for data manipulation and random for generating random integers.
Step 2 - Declare the number of rows and columns for the DataFrame.
Step 3 - Define a function to generate a random integer between 0 and 100.
Step 4 - Create a DataFrame using a nested list comprehension to generate random integers for each cell.
Step 5 - Print the DataFrame to display the generated random integers.
Step 6 - End the program.
Example
import pandas as pd
import random

# Set the number of rows and columns for the data frame
num_rows = 10
num_cols = 5

# Define the function that generates a random number
def generate_random_int():
    return random.randint(0, 100)

# Build the data frame, calling the function once per cell
df = pd.DataFrame([[generate_random_int() for _ in range(num_cols)] for _ in range(num_rows)])
print(df)
Output
    0    1   2   3   4
0  23   77  66  60  19
1  51   31  79  51  88
2   6   38  73  38  64
3   5   79  97  25  43
4  24   53   6  23   6
5  63   82  47  56  10
6  72   91   4  84  32
7  81   74  17  21  44
8  28  100  43  31  58
9  64   57  16  15  14
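For comparison, here is a minimal sketch of how Series.apply() itself could be used for the same task. It assumes a placeholder frame of zeros as a starting point, and the optional argument on generate_random_int() is an adaptation introduced here, because Series.apply() passes each element to the function it calls.

```python
import random

import pandas as pd

num_rows, num_cols = 10, 5

def generate_random_int(_=None):
    """Return a random integer from 0 to 100 inclusive; the argument is ignored."""
    return random.randint(0, 100)

# Start from a placeholder frame of zeros, then overwrite it column by column
df = pd.DataFrame(0, index=range(num_rows), columns=range(num_cols))
for col in df.columns:
    # Series.apply() calls the function once per element of the column
    df[col] = df[col].apply(generate_random_int)
print(df)
```

The placeholder values are never used; they only give each column the right length so that apply() has elements to iterate over.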
Conclusion
To sum up, there are several ways to create random integers in a Pandas data frame. Commonly used options include NumPy's randint() function, pandas.DataFrame.sample(), pandas.DataFrame.apply(), and pandas.Series.apply(). Each method has its advantages, and the best approach depends on the specific use case. If the goal is to generate random integers directly for a data frame or one of its columns, the randint() function is an ideal choice.
On the other hand, if randomly sampling existing rows or values is more pertinent, sample() is suitable. For situations requiring more complex, per-element operations involving random integers, the apply() functions can be used effectively.