Write Python code to select a random row from a given DataFrame
Sometimes you need to select a random row from a Pandas DataFrame for sampling or testing purposes. You can do this by generating a random index and passing it to iloc or loc, or by calling the DataFrame's built-in sample() method.
Sample DataFrame
Let's start with a sample DataFrame to demonstrate the methods:
import pandas as pd
data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)
print("DataFrame is")
print(df)
Output:
DataFrame is
   Id     Name
0   1     Adam
1   2  Michael
2   3    David
3   4     Jack
4   5    Peter
Method 1: Using iloc with randrange()
Generate a random index using randrange() and select the row with iloc:
import pandas as pd
import random as r
data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)
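# Number of rows in the DataFrame
rows = df.shape[0]
print("Total number of rows:", rows)
# randrange(rows) returns a random integer position from 0 to rows - 1
random_row = r.randrange(rows)
print("Random row is")
# iloc selects by integer position; ':' keeps all columns
print(df.iloc[random_row, :])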
Output (the selected row varies between runs):
Total number of rows: 5
Random row is
Id          3
Name    David
Name: 2, dtype: object
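Because randrange() draws from Python's global random generator, the selected row changes on every run. A minimal sketch, assuming you want a repeatable pick (for example in tests): create a dedicated random.Random instance with a fixed seed, so the same position is drawn each time. The seed value 7 is an arbitrary choice.

```python
import random

import pandas as pd

data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)

# A dedicated generator with a fixed seed (7 is arbitrary) gives a repeatable sequence
rng = random.Random(7)
first_pick = rng.randrange(len(df))

# Re-creating the generator with the same seed reproduces the same position
rng_again = random.Random(7)
second_pick = rng_again.randrange(len(df))

print("Position picked twice:", first_pick, second_pick)
print(df.iloc[first_pick])
```

Using a separate Random instance instead of random.seed() avoids changing the global generator that other code may rely on.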
Method 2: Using sample() Method
The built-in sample() method selects random rows directly, without generating an index yourself, and returns them as a DataFrame:
import pandas as pd
data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)
# n=1 draws a single row; the result is a one-row DataFrame
random_row = df.sample(n=1)
print("Random row using sample():")
print(random_row)
Output (the selected row varies between runs):
Random row using sample():
   Id     Name
1   2  Michael
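sample() also accepts a random_state argument, which fixes the seed so repeated calls return the same row. The sketch below assumes the same five-row DataFrame; the seed 42 is arbitrary. It also shows how to turn the one-row DataFrame into a Series with .iloc[0] when you want the same shape as Methods 1 and 3.

```python
import pandas as pd

data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)

# random_state fixes the seed, so repeated calls return the same row
row_a = df.sample(n=1, random_state=42)
row_b = df.sample(n=1, random_state=42)
print(row_a)

# sample() returns a one-row DataFrame; .iloc[0] extracts that row as a Series
row_series = row_a.iloc[0]
print(row_series)
```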
Method 3: Using choice() with Index
Use numpy.random.choice() to pick a random label from the DataFrame's index, then select that row with loc:
import pandas as pd
import numpy as np
data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)
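# choice() picks one label at random from the DataFrame's index
random_index = np.random.choice(df.index)
# loc selects by label, so this also works with a non-default index
random_row = df.loc[random_index]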
print("Random row using choice():")
print(random_row)
Output (the selected row varies between runs):
Random row using choice():
Id         4
Name    Jack
Name: 3, dtype: object
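Newer NumPy code usually creates a Generator with np.random.default_rng() instead of calling the legacy np.random functions. A minimal sketch of the same selection using that API, assuming the same DataFrame; the seed 0 is arbitrary and only makes the pick repeatable.

```python
import numpy as np
import pandas as pd

data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)

# default_rng() creates a NumPy Generator; the seed (0) is arbitrary and makes runs repeatable
rng = np.random.default_rng(0)

# Generator.choice() picks one label from the index; loc selects the row by that label
random_index = rng.choice(df.index)
random_row = df.loc[random_index]
print(random_row)
```

Because choice() draws labels rather than positions, pairing it with loc stays correct even when the index is not 0 to n - 1.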
Comparison
| Method | Code Complexity | Return Type | Best For |
|---|---|---|---|
| iloc + randrange() | Medium | Series | Learning purposes |
| sample() | Low | DataFrame | Most scenarios |
| choice() + loc | Medium | Series | When using NumPy |
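The Return Type column comes down to how the row is indexed. A short sketch, assuming the same five-row DataFrame: a scalar position passed to iloc returns a Series, while a list containing one position keeps the result as a one-row DataFrame, the same shape that sample(n=1) returns.

```python
import random

import pandas as pd

data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)

pos = random.randrange(len(df))

as_series = df.iloc[pos]    # scalar position -> Series (column names become its index)
as_frame = df.iloc[[pos]]   # list with one position -> one-row DataFrame

print(type(as_series).__name__)  # prints Series
print(type(as_frame).__name__)   # prints DataFrame
```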
Conclusion
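The sample() method is the simplest and most idiomatic way to select a random row from a DataFrame, and its random_state argument makes the selection reproducible. Use iloc with randrange() when you want to understand the underlying mechanics of random selection, or numpy.random.choice() with loc when your code already relies on NumPy.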
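If you select random rows in several places, you may want to wrap sample() in a small function. The sketch below uses a hypothetical helper named pick_random_row (not part of pandas) that returns the row as a Series and fails with a clear message on an empty DataFrame; the seed parameter is simply forwarded to random_state.

```python
import pandas as pd


def pick_random_row(df, seed=None):
    """Return one random row of df as a Series (hypothetical helper, not a pandas API).

    seed is passed to sample() as random_state; None gives a different row per call.
    """
    if df.empty:
        # sample(n=1) cannot draw from an empty DataFrame, so fail with a clear message
        raise ValueError("cannot pick a row from an empty DataFrame")
    return df.sample(n=1, random_state=seed).iloc[0]


data = {'Id': [1, 2, 3, 4, 5], 'Name': ['Adam', 'Michael', 'David', 'Jack', 'Peter']}
df = pd.DataFrame(data)
print(pick_random_row(df))
print(pick_random_row(df, seed=1))
```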