How can TensorFlow be used to split the flower dataset into training and validation?

The flower dataset can be split into training and validation sets using TensorFlow's Keras utilities. The image_dataset_from_directory function loads images from a directory tree and splits them into training and validation subsets in a single step.


About the Flower Dataset

The flower dataset contains approximately 3,700 images of flowers organized into 5 subdirectories, one per class: daisy, dandelion, roses, sunflowers, and tulips. This directory-per-class layout is exactly what image_dataset_from_directory expects, which makes the dataset well suited to supervised image classification.
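If you don't already have the dataset locally, it can be downloaded and extracted with tf.keras.utils.get_file. A short sketch, using the archive URL from the official TensorFlow tutorials:

```python
import pathlib

import tensorflow as tf

# Download and extract the flower photos archive (~218 MB); the result is
# cached under ~/.keras/datasets, so subsequent calls are free.
dataset_url = ("https://storage.googleapis.com/download.tensorflow.org/"
               "example_images/flower_photos.tgz")
data_dir = tf.keras.utils.get_file("flower_photos", origin=dataset_url,
                                   untar=True)
data_dir = pathlib.Path(data_dir)

# One subdirectory per class.
print(sorted(p.name for p in data_dir.iterdir() if p.is_dir()))
```

The resulting data_dir path can be passed directly to image_dataset_from_directory in the code below.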

Splitting the Dataset

Here's how to split the flower dataset into training and validation sets using TensorFlow:

import tensorflow as tf

# Define parameters
batch_size = 32
img_height = 180
img_width = 180
data_dir = "path/to/flower/dataset"  # Replace with actual path

print("The data is being split into training and validation set")

# Create training dataset (80% of data)
# tf.keras.utils is the current home of this helper
# (tf.keras.preprocessing.image_dataset_from_directory is a deprecated alias)
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

# Create validation dataset (20% of data)
val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

Output

The data is being split into training and validation set
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
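The class labels are inferred from the subdirectory names and exposed via the dataset's class_names attribute. A minimal, self-contained sketch; a tiny synthetic directory tree stands in for the real download so the example runs anywhere:

```python
import pathlib
import tempfile

import numpy as np
import tensorflow as tf

# Build a tiny directory tree that mimics the flower layout:
# one folder per class, a few random PNG images in each.
root = pathlib.Path(tempfile.mkdtemp())
for cls in ["daisy", "dandelion", "roses", "sunflowers", "tulips"]:
    (root / cls).mkdir()
    for i in range(4):
        img = np.random.randint(0, 256, (180, 180, 3), dtype=np.uint8)
        tf.io.write_file(str(root / cls / f"{i}.png"), tf.io.encode_png(img))

train_ds = tf.keras.utils.image_dataset_from_directory(
    str(root),
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(180, 180),
    batch_size=32,
)

# Class names come from the folder names, sorted alphanumerically.
print(train_ds.class_names)
# ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
```

With the real flower dataset the output is identical, since the attribute depends only on the folder names.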

Key Parameters

  • validation_split=0.2: Reserves 20% of images for validation
  • subset: Specifies whether to return training or validation data
  • seed=123: Ensures reproducible splits across runs
  • image_size: Resizes all images to specified dimensions
  • batch_size: Number of images per batch for efficient processing

How It Works

The image_dataset_from_directory function automatically:

  • Scans the directory structure to identify classes from folder names
  • Loads images efficiently using tf.data.Dataset
  • Splits the file list according to validation_split (the shuffle is seeded, so the split is reproducible)
  • Resizes images to the target dimensions
  • Organizes data into batches for training
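Because both calls use the same seed and validation_split, the two subsets are complementary: together they cover every file exactly once. The sketch below checks this on a small synthetic tree (a stand-in for the real dataset, so it runs without the download):

```python
import pathlib
import tempfile

import numpy as np
import tensorflow as tf

# 3 classes x 10 images = 30 files in total.
root = pathlib.Path(tempfile.mkdtemp())
for cls in ["daisy", "roses", "tulips"]:
    (root / cls).mkdir()
    for i in range(10):
        img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
        tf.io.write_file(str(root / cls / f"{i}.png"), tf.io.encode_png(img))

def load(subset):
    # Same directory, same split fraction, same seed -> complementary subsets.
    return tf.keras.utils.image_dataset_from_directory(
        str(root),
        validation_split=0.2,
        subset=subset,
        seed=123,
        image_size=(64, 64),
        batch_size=8,
    )

train_ds, val_ds = load("training"), load("validation")

def count(ds):
    return sum(int(images.shape[0]) for images, labels in ds)

print(count(train_ds), count(val_ds))  # 24 6  (24 + 6 = 30 files)
```

The same arithmetic explains the output above: 20% of 3,670 files is 734 for validation, leaving 2,936 for training.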

Conclusion

TensorFlow's image_dataset_from_directory provides a convenient way to split image datasets. The 80/20 training/validation split ensures sufficient data for both learning and evaluation while maintaining reproducibility through seed control.

Updated on: 2026-03-25T16:11:53+05:30
