How can Tensorflow be used to split the flower dataset into training and validation?
The flower dataset can be split into training and validation sets using TensorFlow's Keras preprocessing API. The image_dataset_from_directory function provides an easy way to load images from directories and automatically split them into training and validation sets.
About the Flower Dataset
The flower dataset contains approximately 3,700 images of flowers organized into 5 subdirectories, with one subdirectory per class: daisy, dandelion, roses, sunflowers, and tulips. This structure makes it perfect for supervised learning tasks.
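The loader relies entirely on this one-folder-per-class layout. As a minimal illustration (using a tiny temporary stand-in directory rather than the real ~3,700 images), the structure can be sketched and the class names recovered from the folder names, which is exactly how the loader assigns labels:

```python
import pathlib
import tempfile

# Build a tiny stand-in for the flower dataset layout: one folder per class.
root = pathlib.Path(tempfile.mkdtemp()) / "flower_photos"
for cls in ["daisy", "dandelion", "roses", "sunflowers", "tulips"]:
    (root / cls).mkdir(parents=True)
    # A placeholder file standing in for the real .jpg images.
    (root / cls / "img_0.jpg").touch()

# Class names are inferred from the subdirectory names (sorted alphabetically).
class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
print(class_names)  # ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
```

Any dataset arranged this way, not just the flower images, can be loaded with the same code.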
Splitting the Dataset
Here's how to split the flower dataset into training and validation sets using TensorFlow:
import tensorflow as tf

# Define parameters
batch_size = 32
img_height = 180
img_width = 180
data_dir = "path/to/flower/dataset"  # Replace with actual path

print("The data is being split into training and validation set")

# Create training dataset (80% of data)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

# Create validation dataset (20% of data)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
Output
The data is being split into training and validation set
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Key Parameters
- validation_split=0.2: Reserves 20% of images for validation
- subset: Specifies whether to return training or validation data
- seed=123: Ensures reproducible splits across runs
- image_size: Resizes all images to specified dimensions
- batch_size: Number of images per batch for efficient processing
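The file counts in the output follow directly from validation_split: with 3,670 files and a split of 0.2, one fifth of the (shuffled) file list is reserved for validation and the rest is used for training. The arithmetic can be checked in plain Python:

```python
total_files = 3670
validation_split = 0.2

# The validation subset gets the split fraction of the file list;
# the remainder goes to training.
num_val = int(total_files * validation_split)
num_train = total_files - num_val

print(num_train, num_val)  # 2936 734
```

These match the "Using 2936 files for training" and "Using 734 files for validation" lines in the output above.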
How It Works
The image_dataset_from_directory function automatically:
- Scans the directory structure to identify classes from folder names
- Loads images efficiently using tf.data.Dataset
- Applies the specified train/validation split, shuffled by the given seed
- Resizes images to the target dimensions
- Organizes data into batches for training
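The split step can be mimicked in plain Python (a sketch of the idea, not TensorFlow's internal code): shuffle the file list with a fixed seed, then slice off the validation fraction. Because the shuffle is seeded, the same seed always yields the same partition:

```python
import random

def split_files(files, validation_split=0.2, seed=123):
    """Shuffle deterministically, then slice into train/validation parts."""
    files = sorted(files)
    rng = random.Random(seed)  # seeded RNG -> reproducible shuffle
    rng.shuffle(files)
    n_val = int(len(files) * validation_split)
    n_train = len(files) - n_val
    return files[:n_train], files[n_train:]

files = [f"img_{i}.jpg" for i in range(10)]
train_a, val_a = split_files(files, seed=123)
train_b, val_b = split_files(files, seed=123)

assert (train_a, val_a) == (train_b, val_b)  # same seed -> same split
print(len(train_a), len(val_a))  # 8 2
```

This is why the same seed=123 must be passed to both calls in the listing above: it guarantees the two subsets are complementary rather than overlapping.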
Conclusion
TensorFlow's image_dataset_from_directory provides a convenient way to split image datasets. The 80/20 training/validation split ensures sufficient data for both learning and evaluation while maintaining reproducibility through seed control.
