How can TensorFlow be used to pre-process the flower training dataset?
TensorFlow can preprocess the flower training dataset through the Keras preprocessing API. Its image_dataset_from_directory function loads images from a directory tree, infers labels from the subdirectory names, splits the data into training and validation sets, resizes the images, and groups them into batches.
About the Flower Dataset
The flower dataset contains 3,670 photos of flowers divided into 5 classes: daisy, dandelion, roses, sunflowers, and tulips. Each class has its own subdirectory, which is exactly the layout that image_dataset_from_directory expects.
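The directory convention the function relies on can be sketched with plain Python. The folder names below are the real flower classes; the temporary location and empty folders are just for illustration:

```python
import pathlib
import tempfile

# One subdirectory per class, each holding that class's images.
classes = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

root = pathlib.Path(tempfile.mkdtemp()) / "flower_photos"
for name in classes:
    (root / name).mkdir(parents=True, exist_ok=True)

# image_dataset_from_directory infers labels from these folder names,
# sorted alphabetically.
print(sorted(p.name for p in root.iterdir()))
```

Passing the path of `flower_photos` as `data_dir` is all the function needs; no label file is required.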
Preprocessing the Dataset
Here's how to preprocess the flower dataset using TensorFlow's Keras preprocessing API:
import tensorflow as tf

# Set parameters
data_dir = "path/to/flower_photos"  # Replace with actual path
img_height = 180
img_width = 180
batch_size = 32

print("Pre-processing the dataset using keras.preprocessing")

# Create training dataset
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Create validation dataset
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Display class names
class_names = train_ds.class_names
print("The class names are:")
print(class_names)
The output shows the dataset split and class names:

Pre-processing the dataset using keras.preprocessing
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
The class names are:
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
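The 2,936 / 734 split in that output follows directly from validation_split=0.2 applied to the 3,670 files found:

```python
total_files = 3670      # images found in flower_photos
validation_split = 0.2

# image_dataset_from_directory reserves this fraction of the file list
# for validation; the rest is used for training.
val_count = int(total_files * validation_split)
train_count = total_files - val_count

print(train_count, val_count)  # 2936 734, matching the output above
```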
Key Parameters Explained
| Parameter | Purpose | Value |
|---|---|---|
| validation_split | Fraction of the data reserved for validation | 0.2 (20%) |
| seed | Random seed for a reproducible split | 123 |
| image_size | Size all images are resized to | (180, 180) |
| batch_size | Number of images per batch | 32 |
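One detail worth knowing: image_dataset_from_directory yields pixel values in the range [0, 255]. A common follow-up step, not part of the loading call itself, is rescaling to [0, 1] with a Rescaling layer. The batch below is a random stand-in for one loaded image batch:

```python
import numpy as np
import tensorflow as tf

# Rescaling multiplies every pixel by 1/255, mapping [0, 255] to [0, 1].
normalization_layer = tf.keras.layers.Rescaling(1.0 / 255)

# Hypothetical stand-in for one batch: (batch, height, width, channels).
fake_batch = np.random.randint(0, 256, size=(4, 180, 180, 3)).astype("float32")
normalized = normalization_layer(fake_batch)

print(float(tf.reduce_min(normalized)), float(tf.reduce_max(normalized)))
```

On a real dataset this is typically applied with `train_ds.map(lambda x, y: (normalization_layer(x), y))` or by placing the layer first in the model.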
Performance Optimization
For better performance during training, configure the dataset for caching and prefetching:

AUTOTUNE = tf.data.AUTOTUNE

# cache keeps loaded images in memory after the first epoch, shuffle
# re-orders them each epoch, and prefetch prepares the next batches
# while the model is still training on the current one.
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
print("Dataset optimized with caching and prefetching")
Dataset optimized with caching and prefetching
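These transformations leave the number of batches unchanged; they only affect how and when batches are produced. The same chain can be tried on a tiny synthetic dataset (the tensors below are hypothetical stand-ins for image batches):

```python
import tensorflow as tf

# Toy stand-in for the flower data: 10 (feature, label) pairs, batched
# in twos, giving 5 batches in total.
ds = tf.data.Dataset.from_tensor_slices((tf.zeros([10, 4]), tf.range(10)))
ds = ds.batch(2)

# Same optimization chain as above.
AUTOTUNE = tf.data.AUTOTUNE
ds = ds.cache().shuffle(5).prefetch(buffer_size=AUTOTUNE)

num_batches = sum(1 for _ in ds)
print(num_batches)  # still 5 batches after cache/shuffle/prefetch
```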
Conclusion
TensorFlow's image_dataset_from_directory efficiently preprocesses the flower dataset by automatically splitting data into training and validation sets, resizing images, and creating batches. This preprocessing step is essential for building robust image classification models.
