How can TensorFlow be used to create a plot that visualizes training and validation accuracy in the trained IMDB dataset in Python?
TensorFlow is an open-source machine learning framework from Google. It is used with Python to implement machine learning algorithms, deep learning applications, and much more, for both research and production.
The 'tensorflow' package can be installed using the following command:
pip install tensorflow
The 'IMDB' dataset contains 50,000 movie reviews labeled by sentiment. It is commonly used for Natural Language Processing and text classification tasks.
We are using Google Colaboratory to run the code below. Google Colab lets you run Python code in the browser with zero configuration and free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.
Complete Training and Visualization Example
Here's a complete example showing how to train a model on the IMDB dataset and visualize the training and validation accuracy:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np

# Load the IMDB dataset, keeping only the 10,000 most frequent words
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# Prepare the data: multi-hot encode each review into a 10,000-dimensional vector
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

# Set aside the first 10,000 samples as a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

# Build the model
model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(10000,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model, recording validation metrics after every epoch
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
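The vectorize_sequences helper above multi-hot encodes each review: it produces a 10,000-dimensional vector with 1.0 at every word index that appears in the review. A minimal standalone sketch of the same encoding on toy data (no TensorFlow needed):

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per review; set 1.0 at each word index that occurs in it
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

# Toy "reviews" given as lists of word indices
toy = [[1, 3, 3], [0, 9999]]
encoded = vectorize_sequences(toy)

print(encoded.shape)          # (2, 10000)
print(encoded[0, 3])          # 1.0 (duplicate indices collapse to a single 1.0)
print(encoded[1, 9999])       # 1.0
```

Note that word counts are discarded: a word appearing three times is encoded the same as a word appearing once.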
Visualizing Training and Validation Accuracy
After training the model, we can extract the accuracy metrics from the history object and create a visualization:
# Extract training and validation accuracy
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)
# Create the accuracy plot
plt.figure(figsize=(10, 6))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()
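history.history also records the loss after every epoch, and a companion loss plot follows the same pattern. The sketch below uses placeholder numbers in place of history.history['loss'] and ['val_loss'], since it assumes the history object from the training run above:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder values standing in for history.history['loss'] / ['val_loss']
loss = [0.50, 0.30, 0.22, 0.17, 0.13]
val_loss = [0.38, 0.30, 0.28, 0.29, 0.31]
epochs = range(1, len(loss) + 1)

plt.figure(figsize=(10, 6))
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='upper right')  # loss curves fall, so 'upper right' stays clear
plt.grid(True)
plt.savefig('loss_plot.png')   # save instead of plt.show() when not in a notebook
```

In a notebook you would call plt.show() instead of plt.savefig(); saving to a file is shown here so the script also works in a plain terminal.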
Understanding the Plot
The resulting plot shows two curves:
- Training accuracy (blue circles): how well the model performs on the training data
- Validation accuracy (blue line): how well the model performs on unseen validation data
If training accuracy keeps improving while validation accuracy plateaus or decreases, the model is overfitting.
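The overfitting check above can also be automated: find the epoch where validation accuracy peaks, then retrain for that many epochs. A small sketch using placeholder accuracy values (not real results from the run above):

```python
# Placeholder per-epoch validation accuracies standing in for
# history.history['val_accuracy']; not real training results.
val_acc = [0.80, 0.86, 0.88, 0.89, 0.885, 0.87, 0.86]

# Epochs are 1-indexed, so add 1 to the position of the maximum
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
print(best_epoch)  # 4
```

Here validation accuracy peaks at epoch 4 and then declines, so retraining for 4 epochs would be a reasonable choice.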
Key Components Explained
history.history: Dictionary containing training metrics recorded during model training
epochs: Range representing the number of training iterations
matplotlib.pyplot: Used to create the visualization plot
Legend placement: 'lower right' usually keeps the legend clear of the curves, since accuracy typically rises toward the upper part of the plot
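For reference, history.history is a plain Python dictionary mapping metric names to per-epoch lists. With TensorFlow 2 the keys are 'loss', 'accuracy', 'val_loss', and 'val_accuracy' (older versions used 'acc'/'val_acc'). A sketch of its shape with placeholder values:

```python
# Shape of history.history after a 3-epoch run; all values are placeholders.
history_dict = {
    'loss':         [0.50, 0.30, 0.22],
    'accuracy':     [0.78, 0.89, 0.93],
    'val_loss':     [0.38, 0.30, 0.28],
    'val_accuracy': [0.84, 0.87, 0.88],
}

# Every metric list has one entry per epoch
print(sorted(history_dict))           # ['accuracy', 'loss', 'val_accuracy', 'val_loss']
print(len(history_dict['accuracy']))  # 3
```

Checking history.history.keys() before plotting is a quick way to confirm which metric names your TensorFlow version recorded.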
Conclusion
Visualizing training and validation accuracy helps identify overfitting and determine the optimal number of epochs. The matplotlib library provides an effective way to plot these metrics from TensorFlow's training history.
