How can TensorFlow be used to pre-process the flower training dataset?
TensorFlow can preprocess the flower training dataset through the Keras preprocessing API. Its image_dataset_from_directory function loads images from a directory tree, infers labels from the subdirectory names, splits the data into training and validation sets, resizes the images, and groups them into batches.
About the Flower Dataset
The flower dataset contains 3,670 photos of flowers divided into 5 classes: daisy, dandelion, roses, sunflowers, and tulips. Each class has its own subdirectory, which is exactly the layout that image_dataset_from_directory expects.
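The directory convention the function relies on can be sketched with plain Python. The folder names below are the real flower classes; the temporary location and empty folders are just for illustration:

```python
import pathlib
import tempfile

# One subdirectory per class, each holding that class's images.
classes = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

root = pathlib.Path(tempfile.mkdtemp()) / "flower_photos"
for name in classes:
    (root / name).mkdir(parents=True, exist_ok=True)

# image_dataset_from_directory infers labels from these folder names,
# sorted alphabetically.
print(sorted(p.name for p in root.iterdir()))
```

Passing the path of `flower_photos` as `data_dir` is all the function needs; no label file is required.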
Preprocessing the Dataset
Here's how to preprocess the flower dataset using TensorFlow's Keras preprocessing API:
import tensorflow as tf

# Set parameters
data_dir = "path/to/flower_photos"  # Replace with actual path
img_height = 180
img_width = 180
batch_size = 32

print("Pre-processing the dataset using keras.preprocessing")

# Create training dataset
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Create validation dataset
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Display class names
class_names = train_ds.class_names
print("The class names are:")
print(class_names)
The output shows the dataset split and class names:

Pre-processing the dataset using keras.preprocessing
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
The class names are:
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
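The 2,936 / 734 split in that output follows directly from validation_split=0.2 applied to the 3,670 files found:

```python
total_files = 3670      # images found in flower_photos
validation_split = 0.2

# image_dataset_from_directory reserves this fraction of the file list
# for validation; the rest is used for training.
val_count = int(total_files * validation_split)
train_count = total_files - val_count

print(train_count, val_count)  # 2936 734, matching the output above
```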
Key Parameters Explained
| Parameter | Purpose | Value |
|---|---|---|
| validation_split | Fraction of the data reserved for validation | 0.2 (20%) |
| seed | Random seed for a reproducible split | 123 |
| image_size | Size all images are resized to | (180, 180) |
| batch_size | Number of images per batch | 32 |
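One detail worth knowing: image_dataset_from_directory yields pixel values in the range [0, 255]. A common follow-up step, not part of the loading call itself, is rescaling to [0, 1] with a Rescaling layer. The batch below is a random stand-in for one loaded image batch:

```python
import numpy as np
import tensorflow as tf

# Rescaling multiplies every pixel by 1/255, mapping [0, 255] to [0, 1].
normalization_layer = tf.keras.layers.Rescaling(1.0 / 255)

# Hypothetical stand-in for one batch: (batch, height, width, channels).
fake_batch = np.random.randint(0, 256, size=(4, 180, 180, 3)).astype("float32")
normalized = normalization_layer(fake_batch)

print(float(tf.reduce_min(normalized)), float(tf.reduce_max(normalized)))
```

On a real dataset this is typically applied with `train_ds.map(lambda x, y: (normalization_layer(x), y))` or by placing the layer first in the model.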
Performance Optimization
For better performance during training, configure the dataset for caching and prefetching:

AUTOTUNE = tf.data.AUTOTUNE

# cache keeps loaded images in memory after the first epoch, shuffle
# re-orders them each epoch, and prefetch prepares the next batches
# while the model is still training on the current one.
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
print("Dataset optimized with caching and prefetching")
Dataset optimized with caching and prefetching
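These transformations leave the number of batches unchanged; they only affect how and when batches are produced. The same chain can be tried on a tiny synthetic dataset (the tensors below are hypothetical stand-ins for image batches):

```python
import tensorflow as tf

# Toy stand-in for the flower data: 10 (feature, label) pairs, batched
# in twos, giving 5 batches in total.
ds = tf.data.Dataset.from_tensor_slices((tf.zeros([10, 4]), tf.range(10)))
ds = ds.batch(2)

# Same optimization chain as above.
AUTOTUNE = tf.data.AUTOTUNE
ds = ds.cache().shuffle(5).prefetch(buffer_size=AUTOTUNE)

num_batches = sum(1 for _ in ds)
print(num_batches)  # still 5 batches after cache/shuffle/prefetch
```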
Conclusion
TensorFlow's image_dataset_from_directory efficiently preprocesses the flower dataset by automatically splitting data into training and validation sets, resizing images, and creating batches. This preprocessing step is essential for building robust image classification models.
