Generate a list using given frequency list

Generating a list from a frequency list is a common programming task: given elements paired with counts, build a list in which each element appears as many times as its count. This can be useful for generating passwords with specific character distributions, creating datasets for machine learning, or building recommendation systems.

Basic Approach Using Lists

The simplest method is to repeat each element according to its frequency:

# Generate list from frequency pairs
frequency_list = [('a', 4), ('b', 2), ('c', 1), ('d', 3)]
result = []

for element, freq in frequency_list:
    result.extend([element] * freq)

print("Generated list:", result)
print("Length:", len(result))

Generated list: ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd']
Length: 10
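The loop above can also be wrapped in a small reusable function. The following is a minimal sketch, not a library API: the name `expand_frequencies` and the decision to reject negative or non-integer frequencies are assumptions made for illustration.

```python
from itertools import chain, repeat


def expand_frequencies(pairs):
    """Return a list in which each element appears `freq` times.

    `expand_frequencies` is a hypothetical helper name, not a library function.
    """
    pairs = list(pairs)  # allow generators, since the pairs are read twice
    for element, freq in pairs:
        if not isinstance(freq, int) or freq < 0:
            raise ValueError(f"invalid frequency for {element!r}: {freq!r}")
    # repeat(element, freq) yields the element freq times; chain joins the runs
    return list(chain.from_iterable(repeat(element, freq) for element, freq in pairs))


frequency_list = [('a', 4), ('b', 2), ('c', 1), ('d', 3)]
print(expand_frequencies(frequency_list))
# ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd']
```

Validating up front means a bad frequency raises an error instead of silently contributing nothing, which is what `[element] * -1` would do.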

Using Counter from collections

The Counter class provides a convenient way to work with frequency data:

from collections import Counter
import random

# Create Counter from frequency data
freq_data = {'a': 4, 'b': 2, 'c': 1, 'd': 3}
counter = Counter(freq_data)

# Generate elements based on frequency
generated_list = []
for element, frequency in counter.items():
    generated_list.extend([element] * frequency)

# Shuffle for randomness
random.shuffle(generated_list)
print("Shuffled list:", generated_list)

Shuffled list: ['d', 'a', 'b', 'a', 'c', 'a', 'd', 'b', 'a', 'd']
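When the frequencies are already in a Counter, the explicit loop may not be needed: `Counter.elements()` returns an iterator that repeats each key as many times as its count. A short sketch using the same frequency data as above:

```python
from collections import Counter

counter = Counter({'a': 4, 'b': 2, 'c': 1, 'd': 3})

# elements() yields each key as many times as its count,
# skipping keys whose count is zero or negative
expanded = list(counter.elements())
print(expanded)
# ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd']

# Rebuilding a Counter from the list recovers the original frequencies
assert Counter(expanded) == counter
```

The resulting list can be passed to `random.shuffle` in the same way as the loop-built list.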

Proportional Generation

To generate a list of a specific size, scale each frequency by its share of the total:

from collections import Counter
import random

# Frequency data
freq_list = [('red', 5), ('blue', 3), ('green', 2)]
freq_dict = dict(freq_list)
counter = Counter(freq_dict)

# Generate list of specific size (20 elements)
total_freq = sum(counter.values())
target_size = 20
result = []

for element, frequency in counter.items():
    count = int((frequency / total_freq) * target_size)
    result.extend([element] * count)

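# Handle rounding shortfall: int() truncates, so the list may come up short.
# Fill the remaining slots with picks weighted by the original frequencies.
shortfall = target_size - len(result)
result.extend(random.choices(list(counter.keys()), weights=list(counter.values()), k=shortfall))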

random.shuffle(result)
print("Generated list:", result)
print("Actual frequencies:", Counter(result))

Generated list: ['blue', 'red', 'green', 'red', 'blue', 'red', 'red', 'green', 'red', 'blue', 'red', 'green', 'red', 'blue', 'red', 'red', 'green', 'blue', 'blue', 'red']
Actual frequencies: Counter({'red': 10, 'blue': 6, 'green': 4})
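When the target size is not a multiple of the total frequency, truncation leaves slots unfilled, and filling them at random makes the final counts vary between runs. One deterministic alternative is the largest-remainder method, sketched below. The helper name `proportional_counts` is hypothetical, and the sketch assumes non-negative integer frequencies.

```python
from collections import Counter


def proportional_counts(frequencies, target_size):
    """Scale integer frequencies to counts that sum exactly to target_size.

    Hypothetical helper using the largest-remainder method.
    """
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    # divmod gives the whole-number share and the leftover, using exact integer math
    shares = {k: divmod(v * target_size, total) for k, v in frequencies.items()}
    counts = {k: whole for k, (whole, _) in shares.items()}
    shortfall = target_size - sum(counts.values())
    # Hand the leftover slots to the elements with the largest remainders
    by_remainder = sorted(shares, key=lambda k: shares[k][1], reverse=True)
    for k in by_remainder[:shortfall]:
        counts[k] += 1
    return counts


counts = proportional_counts({'red': 5, 'blue': 3, 'green': 2}, 7)
print(counts)
# {'red': 4, 'blue': 2, 'green': 1}

# Expand the counts into the actual list
result = list(Counter(counts).elements())
print(len(result))
# 7
```

Here the exact shares are 3.5, 2.1 and 1.4; after truncation one slot remains, and it goes to 'red' because its remainder is the largest.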

Using NumPy for Weighted Random Sampling

NumPy provides efficient random sampling with weights:

import numpy as np

# Elements and their weights
elements = ['apple', 'banana', 'cherry']
weights = [0.5, 0.3, 0.2]  # Probabilities; must sum to 1.0

# Generate random sample
sample_size = 15
generated = np.random.choice(elements, size=sample_size, p=weights)

print("Generated sample:", generated.tolist())
print("Frequencies:", {elem: list(generated).count(elem) for elem in elements})

Generated sample: ['apple', 'banana', 'apple', 'apple', 'cherry', 'banana', 'apple', 'apple', 'banana', 'apple', 'apple', 'cherry', 'banana', 'apple', 'apple']
Frequencies: {'apple': 9, 'banana': 4, 'cherry': 2}
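If NumPy is not available, the standard library's `random.choices` offers comparable weighted sampling and accepts raw frequencies directly, without normalizing them to probabilities. The sketch below assumes raw counts of 5, 3 and 2, which mirror the 0.5/0.3/0.2 weights above; the seed value is arbitrary and is there only to make a run repeatable.

```python
import random
from collections import Counter

elements = ['apple', 'banana', 'cherry']
frequencies = [5, 3, 2]  # raw counts; they do not need to sum to 1

rng = random.Random(42)  # seeded only for repeatability; the seed is arbitrary
sample = rng.choices(elements, weights=frequencies, k=15)

print("Generated sample:", sample)
print("Frequencies:", Counter(sample))
```

As with the NumPy version, this samples with replacement, so the observed frequencies only approximate the weights.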

Comparison of Methods

Method	Best For	Randomness	Exact Frequencies
Basic List Extension	Simple cases	No (unless shuffled)	Yes
Counter + Shuffle	Working with frequency data	Yes (order only)	Yes
Proportional Generation	Fixed output size	Yes	Approximate
NumPy Sampling	Large datasets	Yes	Approximate

Common Use Cases

Password Generation: Create passwords with specific character type distributions
Dataset Creation: Generate balanced or imbalanced datasets for machine learning
Simulation: Model real-world distributions in statistical simulations
Testing: Create test data with known frequency patterns
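As an illustration of the password use case, the sketch below draws a fixed number of characters from each character class and then shuffles them. The per-class counts are a hypothetical policy chosen for the example. It uses the `secrets` module rather than `random`, since `random` is not intended for security-sensitive values.

```python
import secrets
import string

# Hypothetical policy: how many characters to draw from each class
char_frequencies = [
    (string.ascii_lowercase, 6),
    (string.ascii_uppercase, 3),
    (string.digits, 2),
    (string.punctuation, 1),
]

chars = []
for alphabet, freq in char_frequencies:
    # secrets.choice draws from the operating system's secure random source
    chars.extend(secrets.choice(alphabet) for _ in range(freq))

# Shuffle with SystemRandom so the classes do not appear in a fixed order
secrets.SystemRandom().shuffle(chars)
password = ''.join(chars)
print(password, len(password))
```

The frequencies are exact here (always two digits, for example), while the specific characters and their positions are random.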

Conclusion

Use basic list extension or Counter when you need exact frequencies, proportional generation when the output must have a fixed size, and NumPy for large-scale weighted random sampling. Choose the method based on whether you need exact frequencies or probabilistic sampling.

Atharva Shah

Updated on: 2026-03-27T13:17:00+05:30
