The ‘tf.Data’ helps in customizing the model building pipeline, by shuffling the data in the dataset so that all types of data get evenly distributed (if possible).
We will be using the flowers dataset, containing images of several thousands of flowers. It contains 5 sub-directories, and there is one sub-directory for every class.
We are using the Google Colaboratory to run the below code. Google Colab or Colaboratory helps run Python code over the browser and requires zero configuration and free access to GPUs (Graphical Processing Units). Colaboratory has been built on top of Jupyter Notebook.
print("Defining customized input pipeline") list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False) list_ds = list_ds.shuffle(image_count, reshuffle_each_iteration=False) for f in list_ds.take(5): print(f.numpy()) class_names = np.array(sorted([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])) print(class_names) print("The dataset is split into training and validation set") val_size = int(image_count * 0.2) train_ds = list_ds.skip(val_size) val_ds = list_ds.take(val_size) print("Length of each subset is displayed below") print(tf.data.experimental.cardinality(train_ds).numpy()) print(tf.data.experimental.cardinality(val_ds).numpy())
Code credit: https://www.tensorflow.org/tutorials/load_data/images
Defining customized input pipeline b'/root/.keras/datasets/flower_photos/dandelion/14306875733_61d71c64c0_n.jpg' b'/root/.keras/datasets/flower_photos/dandelion/8935477500_89f22cca03_n.jpg' b'/root/.keras/datasets/flower_photos/sunflowers/3001531316_efae24d37d_n.jpg' b'/root/.keras/datasets/flower_photos/daisy/7133935763_82b17c8e1b_n.jpg' b'/root/.keras/datasets/flower_photos/tulips/17844723633_da85357fe3.jpg' ['daisy' 'dandelion' 'roses' 'sunflowers' 'tulips'] The dataset is split into training and validation set Length of each subset is displayed below 2936 734