How can Tensorflow be used to split the flower dataset into training and validation?
The flower dataset can be split into training and validation sets using TensorFlow's Keras preprocessing API. The image_dataset_from_directory function provides an easy way to load images from directories and automatically split them into training and validation sets.
About the Flower Dataset
The flower dataset contains approximately 3,700 images of flowers organized into 5 subdirectories, with one subdirectory per class: daisy, dandelion, roses, sunflowers, and tulips. This structure makes it perfect for supervised learning tasks.
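The loader relies entirely on this one-folder-per-class layout. As a minimal illustration (using a tiny temporary stand-in directory rather than the real ~3,700 images), the structure can be sketched and the class names recovered from the folder names, which is exactly how the loader assigns labels:

```python
import pathlib
import tempfile

# Build a tiny stand-in for the flower dataset layout: one folder per class.
root = pathlib.Path(tempfile.mkdtemp()) / "flower_photos"
for cls in ["daisy", "dandelion", "roses", "sunflowers", "tulips"]:
    (root / cls).mkdir(parents=True)
    # A placeholder file standing in for the real .jpg images.
    (root / cls / "img_0.jpg").touch()

# Class names are inferred from the subdirectory names (sorted alphabetically).
class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
print(class_names)  # ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
```

Any dataset arranged this way, not just the flower images, can be loaded with the same code.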
Splitting the Dataset
Here's how to split the flower dataset into training and validation sets using TensorFlow:
import tensorflow as tf

# Define parameters
batch_size = 32
img_height = 180
img_width = 180
data_dir = "path/to/flower/dataset"  # Replace with actual path

print("The data is being split into training and validation set")

# Create training dataset (80% of data)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

# Create validation dataset (20% of data)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
Output
The data is being split into training and validation set
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Key Parameters
- validation_split=0.2: Reserves 20% of images for validation
- subset: Specifies whether to return training or validation data
- seed=123: Ensures reproducible splits across runs
- image_size: Resizes all images to specified dimensions
- batch_size: Number of images per batch for efficient processing
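The file counts in the output follow directly from validation_split: with 3,670 files and a split of 0.2, one fifth of the (shuffled) file list is reserved for validation and the rest is used for training. The arithmetic can be checked in plain Python:

```python
total_files = 3670
validation_split = 0.2

# The validation subset gets the split fraction of the file list;
# the remainder goes to training.
num_val = int(total_files * validation_split)
num_train = total_files - num_val

print(num_train, num_val)  # 2936 734
```

These match the "Using 2936 files for training" and "Using 734 files for validation" lines in the output above.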
How It Works
The image_dataset_from_directory function automatically:
- Scans the directory structure to identify classes from folder names
- Loads images efficiently using tf.data.Dataset
- Applies the specified train/validation split, shuffled by the given seed
- Resizes images to the target dimensions
- Organizes data into batches for training
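The split step can be mimicked in plain Python (a sketch of the idea, not TensorFlow's internal code): shuffle the file list with a fixed seed, then slice off the validation fraction. Because the shuffle is seeded, the same seed always yields the same partition:

```python
import random

def split_files(files, validation_split=0.2, seed=123):
    """Shuffle deterministically, then slice into train/validation parts."""
    files = sorted(files)
    rng = random.Random(seed)  # seeded RNG -> reproducible shuffle
    rng.shuffle(files)
    n_val = int(len(files) * validation_split)
    n_train = len(files) - n_val
    return files[:n_train], files[n_train:]

files = [f"img_{i}.jpg" for i in range(10)]
train_a, val_a = split_files(files, seed=123)
train_b, val_b = split_files(files, seed=123)

assert (train_a, val_a) == (train_b, val_b)  # same seed -> same split
print(len(train_a), len(val_a))  # 8 2
```

This is why the same seed=123 must be passed to both calls in the listing above: it guarantees the two subsets are complementary rather than overlapping.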
Conclusion
TensorFlow's image_dataset_from_directory provides a convenient way to split image datasets. The 80/20 training/validation split ensures sufficient data for both learning and evaluation while maintaining reproducibility through seed control.
