Generate a list using a given frequency list

Generating a list from a given frequency list is a common programming task: given elements paired with counts, build a list in which each element appears as many times as its count. This can be useful for generating passwords with specific character distributions, creating datasets for machine learning, or building recommendation systems.

Basic Approach Using Lists

The simplest method is to repeat each element according to its frequency:

# Generate list from frequency pairs
frequency_list = [('a', 4), ('b', 2), ('c', 1), ('d', 3)]
result = []

for element, freq in frequency_list:
    result.extend([element] * freq)

print("Generated list:", result)
print("Length:", len(result))

Output:

Generated list: ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd']
Length: 10
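
The same expansion can also be written as a single expression. The following is a minimal sketch using itertools from the standard library; the helper name expand_frequencies is hypothetical and chosen only for illustration.

```python
from itertools import chain, repeat

def expand_frequencies(pairs):
    """Return a list in which each element appears freq times, in input order."""
    # repeat(element, freq) yields the element freq times without
    # building an intermediate list for every pair
    return list(chain.from_iterable(repeat(element, freq) for element, freq in pairs))

frequency_list = [('a', 4), ('b', 2), ('c', 1), ('d', 3)]
print(expand_frequencies(frequency_list))
```

For short frequency lists the loop with extend is just as readable; the itertools form mainly helps when the pairs come from a generator.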

Using Counter from collections

The Counter class provides a convenient way to work with frequency data:

from collections import Counter
import random

# Create Counter from frequency data
freq_data = {'a': 4, 'b': 2, 'c': 1, 'd': 3}
counter = Counter(freq_data)

# Generate elements based on frequency
generated_list = []
for element, frequency in counter.items():
    generated_list.extend([element] * frequency)

# Shuffle for randomness
random.shuffle(generated_list)
print("Shuffled list:", generated_list)

Output (order varies between runs):

Shuffled list: ['d', 'a', 'b', 'a', 'c', 'a', 'd', 'b', 'a', 'd']
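
Counter also provides an elements() method that performs the expansion directly, so the explicit loop is optional. A short sketch, reusing the same frequency data as above:

```python
from collections import Counter

freq_data = {'a': 4, 'b': 2, 'c': 1, 'd': 3}
counter = Counter(freq_data)

# elements() yields each key as many times as its count, grouped by key
# in insertion order; keys with a count below one are skipped
expanded = list(counter.elements())
print(expanded)
```

The result can be passed to random.shuffle in the same way as the list built by the loop.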

Proportional Generation

Generate a list of a specific size based on frequency proportions:

from collections import Counter
import random

# Frequency data
freq_list = [('red', 5), ('blue', 3), ('green', 2)]
freq_dict = dict(freq_list)
counter = Counter(freq_dict)

# Generate list of specific size (20 elements)
total_freq = sum(counter.values())
target_size = 20
result = []

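for element, frequency in counter.items():
    # Integer floor division avoids floating-point truncation surprises
    count = frequency * target_size // total_freq
    result.extend([element] * count)

# Handle rounding differences: fill leftover slots using the same weights
remaining = target_size - len(result)
result.extend(random.choices(list(counter), weights=list(counter.values()), k=remaining))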

random.shuffle(result)
print("Generated list:", result)
print("Actual frequencies:", Counter(result))

Output (list order varies between runs):

Generated list: ['blue', 'red', 'green', 'red', 'blue', 'red', 'red', 'green', 'red', 'blue', 'red', 'green', 'red', 'blue', 'red', 'red', 'green', 'blue', 'blue', 'red']
Actual frequencies: Counter({'red': 10, 'blue': 6, 'green': 4})
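
When the target size does not divide evenly, the leftover slots can be assigned deterministically instead of randomly. The sketch below uses the largest-remainder method, which is a swapped-in technique rather than the approach shown above. It assumes non-negative integer frequencies, and the function name allocate_proportionally is hypothetical.

```python
def allocate_proportionally(frequencies, target_size):
    """Split target_size across keys in proportion to their frequencies.

    Uses the largest-remainder method, so the counts always sum to target_size.
    """
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    # (floor share, remainder) for each key, computed with integer arithmetic
    shares = {key: divmod(freq * target_size, total) for key, freq in frequencies.items()}
    counts = {key: quotient for key, (quotient, _) in shares.items()}
    leftover = target_size - sum(counts.values())
    # Give the leftover slots to the keys with the largest remainders
    by_remainder = sorted(shares, key=lambda key: shares[key][1], reverse=True)
    for key in by_remainder[:leftover]:
        counts[key] += 1
    return counts

print(allocate_proportionally({'red': 5, 'blue': 3, 'green': 2}, 20))
# {'red': 10, 'blue': 6, 'green': 4}
print(allocate_proportionally({'red': 5, 'blue': 3, 'green': 2}, 7))
# {'red': 4, 'blue': 2, 'green': 1}
```

The returned counts can then be expanded into a list with any of the earlier methods and shuffled if a random order is needed.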

Using NumPy for Weighted Random Sampling

NumPy provides efficient random sampling with weights:

import numpy as np

# Elements and their weights
elements = ['apple', 'banana', 'cherry']
weights = [0.5, 0.3, 0.2]  # Probabilities sum to 1.0

# Generate random sample
sample_size = 15
generated = np.random.choice(elements, size=sample_size, p=weights)

print("Generated sample:", generated.tolist())
print("Frequencies:", {elem: list(generated).count(elem) for elem in elements})

Output (varies between runs):

Generated sample: ['apple', 'banana', 'apple', 'apple', 'cherry', 'banana', 'apple', 'apple', 'banana', 'apple', 'apple', 'cherry', 'banana', 'apple', 'apple']
Frequencies: {'apple': 9, 'banana': 4, 'cherry': 2}
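
If NumPy is not available, the standard library's random.choices performs the same kind of weighted sampling and accepts raw counts as weights, so they do not need to sum to 1.0. A minimal sketch; the counts below are an assumed integer version of the weights above, and the seed value is arbitrary.

```python
import random
from collections import Counter

# Raw counts (assumed for illustration), not normalized probabilities
counts = {'apple': 5, 'banana': 3, 'cherry': 2}

# A dedicated Random instance with a fixed seed makes reruns repeatable
rng = random.Random(42)
sample = rng.choices(list(counts), weights=list(counts.values()), k=15)

print("Generated sample:", sample)
print("Frequencies:", Counter(sample))
```

As with the NumPy version, the resulting frequencies only approximate the requested proportions, especially for small samples.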

Comparison of Methods

Method | Best For | Randomness | Exact Frequencies
Basic List Extension | Simple cases | No (unless shuffled) | Yes
Counter + Shuffle | Working with frequency data | Yes | Yes
Proportional Generation | Fixed output size | Yes | Approximate
NumPy Sampling | Large datasets | Yes | Approximate
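
The difference between exact and approximate frequencies can be checked by comparing counts. The sketch below is an illustration under assumed data (the frequency pairs from the first example) and uses only the standard library.

```python
import random
from collections import Counter

frequency_list = [('a', 4), ('b', 2), ('c', 1), ('d', 3)]
expected = Counter(dict(frequency_list))

# Exact method: repeat each element, then shuffle
exact = [element for element, freq in frequency_list for _ in range(freq)]
random.shuffle(exact)

# Probabilistic method: draw the same number of items, using the counts as weights
elements, weights = zip(*frequency_list)
sampled = random.choices(elements, weights=weights, k=len(exact))

print("Exact method matches:", Counter(exact) == expected)  # always True
print("Sampled counts:", Counter(sampled))  # usually differs from expected
```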

Common Use Cases

  • Password Generation: Create passwords with specific character type distributions
  • Dataset Creation: Generate balanced or imbalanced datasets for machine learning
  • Simulation: Model real-world distributions in statistical simulations
  • Testing: Create test data with known frequency patterns

Conclusion

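Use basic list extension when you need exact frequencies in a fixed order, Counter with a shuffle when the frequencies are already stored as a mapping and the order should be random, proportional generation when the output must have a fixed size, and NumPy for large-scale weighted random sampling. Choose the method based on whether you need exact frequencies or probabilistic sampling.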

Updated on: 2026-03-27T13:17:00+05:30
