How can TensorFlow be used to preprocess the flower training dataset?

TensorFlow can preprocess the flower training dataset through the Keras preprocessing API. The image_dataset_from_directory function loads images from a directory tree, splits them into training and validation sets, resizes them to a uniform size, and groups them into batches.


About the Flower Dataset

The flower dataset contains about 3,700 images of flowers (3,670 files) divided into 5 classes: daisy, dandelion, roses, sunflowers, and tulips. Each class has its own subdirectory, making it a perfect fit for the image_dataset_from_directory function, which infers class labels from the directory names.
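To make the expected layout concrete, the sketch below builds a miniature, empty copy of that directory structure in a temporary folder (the path and the absence of actual image files are illustrative only) and lists the class subdirectories the loader would discover:

```python
import pathlib
import tempfile

# Hypothetical miniature of the layout image_dataset_from_directory expects:
# flower_photos/
#   daisy/  dandelion/  roses/  sunflowers/  tulips/
root = pathlib.Path(tempfile.mkdtemp()) / "flower_photos"
for cls in ["daisy", "dandelion", "roses", "sunflowers", "tulips"]:
    (root / cls).mkdir(parents=True)

# One subdirectory per class; the directory names become the class labels
subdirs = sorted(p.name for p in root.iterdir() if p.is_dir())
print(subdirs)  # ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
```

In the real dataset each of these subdirectories holds the JPEG images for that class.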

Preprocessing the Dataset

Here's how to preprocess the flower dataset using TensorFlow's Keras preprocessing API:

import tensorflow as tf

# Set parameters
data_dir = "path/to/flower_photos"  # Replace with actual path
img_height = 180
img_width = 180
batch_size = 32

print("Pre-processing the dataset using keras.preprocessing")

# Create training dataset
# (in newer TensorFlow releases this is also available as
# tf.keras.utils.image_dataset_from_directory)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Create validation dataset
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Display class names
class_names = train_ds.class_names
print("The class names are:")
print(class_names)

The output shows the dataset split and the class names:

Pre-processing the dataset using keras.preprocessing
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
The class names are:
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

Key Parameters Explained

Parameter        | Purpose                                                  | Value
validation_split | Fraction of the data held out for validation             | 0.2 (20%)
seed             | Random seed so the train/validation split is reproducible | 123
image_size       | Target size all images are resized to                    | (180, 180)
batch_size       | Number of images per batch                               | 32
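One more preprocessing step usually follows: image_dataset_from_directory yields float32 pixel values in the [0, 255] range, while most models train better on values in [0, 1]. Keras provides tf.keras.layers.Rescaling(1./255) for this; the arithmetic is plain division, sketched here with NumPy on a fake 2x2 image so it runs without TensorFlow:

```python
import numpy as np

# A fake 2x2 RGB "image" with raw pixel values in [0, 255]
img = np.array([[[0, 128, 255]] * 2] * 2, dtype=np.float32)

# Same arithmetic as tf.keras.layers.Rescaling(1./255)
scaled = img / 255.0

print(scaled.shape)             # (2, 2, 3)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

In a real pipeline the Rescaling layer is applied either as the first layer of the model or via train_ds.map().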

Performance Optimization

For better performance during training, configure the dataset for caching and prefetching:

AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

print("Dataset optimized with caching and prefetching")

Output:

Dataset optimized with caching and prefetching
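cache(), shuffle(), and prefetch() are ordinary tf.data transformations, so their effect is easiest to see on a small synthetic dataset rather than the flower images. The sketch below builds a trivial pipeline (element values are illustrative; shuffle is omitted so the output is deterministic):

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# A tiny stand-in for the image dataset: six integers, batched in pairs
ds = tf.data.Dataset.range(6).batch(2)

# cache() keeps elements in memory after the first epoch;
# prefetch() overlaps data preparation with model execution
ds = ds.cache().prefetch(buffer_size=AUTOTUNE)

batches = [batch.numpy().tolist() for batch in ds]
print(batches)  # [[0, 1], [2, 3], [4, 5]]
```

The transformations change how and when elements are produced, not their values, which is why the batches come out unchanged.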

Conclusion

TensorFlow's image_dataset_from_directory efficiently preprocesses the flower dataset by automatically splitting data into training and validation sets, resizing images, and creating batches. This preprocessing step is essential for building robust image classification models.

Updated on: 2026-03-25T16:12:11+05:30
