How can the preprocessed data be shuffled using TensorFlow and Python?

TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms and deep learning applications. It operates on multi-dimensional arrays called tensors and works alongside NumPy for efficient mathematical operations.

The tensorflow package can be installed on Windows using the following command:

pip install tensorflow

In this tutorial, we'll demonstrate how to shuffle preprocessed data using TensorFlow's dataset operations. We'll use the Iliad dataset containing text translations from William Cowper, Edward (Earl of Derby), and Samuel Butler.

Dataset Preparation

The text files have been preprocessed to remove document headers, footers, line numbers, and chapter titles. The model will be trained to identify the translator from a single line of text.
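The shuffling code in this tutorial expects a list named labeled_data_sets, with one labeled dataset per translator. The sketch below shows one way such a list could look. It is a minimal illustration, not the preprocessing used for the Iliad files: the sample_lines values are hypothetical placeholders, and the real datasets would be read from the three text files.

```python
import tensorflow as tf

# Hypothetical placeholder lines; the real data comes from the three
# preprocessed translation files (one file per translator).
sample_lines = [
    ["placeholder line a from translator 0", "placeholder line b from translator 0"],
    ["placeholder line a from translator 1", "placeholder line b from translator 1"],
    ["placeholder line a from translator 2", "placeholder line b from translator 2"],
]

# Build one dataset of (text, label) pairs per translator.
labeled_data_sets = []
for index, lines in enumerate(sample_lines):
    labels = [index] * len(lines)
    labeled_data_sets.append(tf.data.Dataset.from_tensor_slices((lines, labels)))

# Concatenation keeps the original order: all of label 0, then 1, then 2.
combined = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    combined = combined.concatenate(labeled_dataset)

labels_in_order = [int(label) for _, label in combined.as_numpy_iterator()]
print("Labels before shuffling:", labels_in_order)
```

Because concatenation leaves each translator's lines in one contiguous block, the combined dataset has to be shuffled before it is used for training.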

Shuffling Preprocessed Data

Here's how to combine labeled datasets and shuffle them using TensorFlow:

print("Combine the labelled dataset and reshuffle it")
BUFFER_SIZE = 50000
BATCH_SIZE = 64
VALIDATION_SIZE = 5000

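# Combine all labeled datasets.
# labeled_data_sets is assumed to be a list of tf.data.Dataset objects,
# one per translator, each yielding (text, label) pairs.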
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    all_labeled_data = all_labeled_data.concatenate(labeled_dataset)

# Shuffle the combined dataset
all_labeled_data = all_labeled_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

print("Displaying a few samples of input data")
for text, label in all_labeled_data.take(8):
    print("The sentence is :", text.numpy())
    print("The label is :", label.numpy())

The output of the above code is:

Combine the labelled dataset and reshuffle it
Displaying a few samples of input data
The sentence is : b'But I have now both tasted food, and given'
The label is : 0
The sentence is : b'All these shall now be thine: but if the Gods'
The label is : 1
The sentence is : b'Their spiry summits waved. There, unperceived'
The label is : 0
The sentence is : b'"I pray you, would you show your love, dear friends,'
The label is : 1
The sentence is : b'Entering beneath the clavicle the point'
The label is : 0
The sentence is : b'But grief, his father lost, awaits him now,'
The label is : 1
The sentence is : b'in the fore-arm where the sinews of the elbow are united, whereon he'
The label is : 2
The sentence is : b'For, as I think, I have already chased'
The label is : 0

Key Parameters

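  • BUFFER_SIZE (50000): Size of the shuffle buffer. Larger values provide better randomization but use more memory. A buffer at least as large as the dataset gives a full uniform shuffle.

  • reshuffle_each_iteration=False: Keeps the same shuffled order every time the dataset is iterated, instead of reshuffling on each epoch. A fixed order also keeps any later split of the dataset (for example, setting aside the first VALIDATION_SIZE elements for validation) consistent across epochs.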

  • Labels: 0, 1, 2 represent the three different translators (Cowper, Derby, Butler).
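The effect of reshuffle_each_iteration=False can be checked on a toy dataset. The sketch below is an illustration under stated assumptions: it uses the integers 0 to 9 in place of the Iliad lines, a small VALIDATION_SIZE of 3 instead of 5000, and a seed argument (not used in the tutorial code) so that repeated runs produce the same order. The take() and skip() split is one possible use of VALIDATION_SIZE, which the tutorial code defines but does not use.

```python
import tensorflow as tf

VALIDATION_SIZE = 3  # hypothetical small value; the tutorial defines 5000

# Toy stand-in for the combined labeled dataset: the integers 0..9.
dataset = tf.data.Dataset.range(10)

# A buffer as large as the dataset gives a full shuffle. The seed is an
# assumption added here so that repeated runs give the same order.
shuffled = dataset.shuffle(10, seed=42, reshuffle_each_iteration=False)

# With reshuffling disabled, every pass over the dataset sees the same order.
first_pass = [int(x) for x in shuffled.as_numpy_iterator()]
second_pass = [int(x) for x in shuffled.as_numpy_iterator()]

# A fixed order means take() and skip() produce non-overlapping subsets.
validation = [int(x) for x in shuffled.take(VALIDATION_SIZE).as_numpy_iterator()]
training = [int(x) for x in shuffled.skip(VALIDATION_SIZE).as_numpy_iterator()]

print("First pass: ", first_pass)
print("Second pass:", second_pass)
print("Validation: ", validation)
print("Training:   ", training)
```

If reshuffle_each_iteration were left at its default of True, each pass would draw a new order, and the validation and training subsets could overlap from one epoch to the next.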

How Shuffling Works

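The shuffle() method fills a buffer with buffer_size elements, then repeatedly picks a random element from the buffer and replaces it with the next element from the dataset. Elements are therefore randomized only within a sliding window of buffer_size elements. Shuffling prevents the model from learning patterns based on data order, which matters here because concatenate() places each translator's lines in one contiguous block.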
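A buffer-based shuffle can be modelled in plain Python to show why the buffer size matters. The function below is a simplified sketch, not TensorFlow's implementation: buffered_shuffle is a hypothetical helper that keeps a buffer of buffer_size items, emits a random item from it, and refills the freed slot from the input.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in buffer-shuffled order (simplified model, not TensorFlow code)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    source = iter(items)
    buffer = []

    # Fill the buffer with the first buffer_size elements.
    for item in source:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            break

    # Emit a random buffer slot, then refill that slot from the source.
    for item in source:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item

    # The source is exhausted: drain the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


data = list(range(10))
small_buffer = list(buffered_shuffle(data, buffer_size=3, seed=0))
full_buffer = list(buffered_shuffle(data, buffer_size=len(data), seed=0))
no_shuffle = list(buffered_shuffle(data, buffer_size=1, seed=0))

print("buffer_size=3: ", small_buffer)
print("buffer_size=10:", full_buffer)
print("buffer_size=1: ", no_shuffle)
```

In this model, a buffer of size 1 leaves the order unchanged, and with a buffer of size 3 the first emitted item can only be one of the first three inputs. Only a buffer at least as large as the dataset lets any item appear in any position.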

[Figure: TensorFlow dataset shuffling process. Dataset 1, Dataset 2, and Dataset 3 are joined by concatenate() into a combined dataset, which shuffle() turns into a shuffled dataset. Buffer size: 50,000. The buffer size controls randomization quality: higher means a better shuffle and more memory, lower means less memory and a less random order.]

Conclusion

Use TensorFlow's shuffle() method to randomize dataset order for better model training. The buffer size controls randomization quality while concatenate() combines multiple datasets before shuffling.

Updated on: 2026-03-25T15:25:39+05:30
