How can TensorFlow be used to train a model on the Iliad dataset using Python?
TensorFlow is an open-source machine learning framework from Google. It is used with Python to implement algorithms, deep learning applications, and much more, in both research and production. Its core data structure is the tensor, a multi-dimensional array similar to a NumPy array, and it includes optimizations that make complicated mathematical operations on tensors fast.
The tensorflow package can be installed from the command line using pip −
pip install tensorflow
We will be using the Iliad dataset, which contains text data of three translation works from William Cowper, Edward (Earl of Derby), and Samuel Butler. The model is trained to identify the translator when a single line of text is given. The text files have been preprocessed by removing document headers, footers, line numbers and chapter titles.
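Before training, every line of text must be paired with an integer label identifying its translator. The tutorial builds this with tf.data.TextLineDataset; the pure-Python sketch below only illustrates the labeling scheme, and the file names and helper function are assumptions for illustration:

```python
# File names follow the TensorFlow text-loading tutorial; one file per translator.
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

def labeled_lines(lines_per_file):
    """Pair every line with the integer label of its source file.
    lines_per_file: list of lists of strings, one inner list per translator."""
    examples = []
    for label, lines in enumerate(lines_per_file):
        for line in lines:
            examples.append((line, label))
    return examples

# Every line from the third file (Samuel Butler) gets label 2:
data = labeled_lines([["line a"], ["line b"], ["line c", "line d"]])
# data == [("line a", 0), ("line b", 1), ("line c", 2), ("line d", 2)]
```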
Dataset Configuration and Model Training
The following code demonstrates training a text classification model on the Iliad dataset. The model learns to identify which translator wrote a given line of text −
# configure_dataset and create_model are helper functions defined
# earlier in the tutorial (see the code-credit link below).
from tensorflow.keras import losses

vocab_size += 2  # reserve ids for padding and out-of-vocabulary tokens
print("Configure the dataset for better performance")
train_data = configure_dataset(train_data)
validation_data = configure_dataset(validation_data)

print("Train the model")
model = create_model(vocab_size=vocab_size, num_labels=3)
model.compile(
    optimizer='adam',
    loss=losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

print("Fit the training data to the model")
history = model.fit(train_data, validation_data=validation_data, epochs=3)

print("Finding the accuracy and loss associated with training")
loss, accuracy = model.evaluate(validation_data)
print("The loss is : ", loss)
print("The accuracy is : {:2.2%}".format(accuracy))
The output of the training process shows −
Configure the dataset for better performance
Train the model
Fit the training data to the model
Epoch 1/3
697/697 [==============================] - 35s 17ms/step - loss: 0.6891 - accuracy: 0.6736 - val_loss: 0.3718 - val_accuracy: 0.8404
Epoch 2/3
697/697 [==============================] - 8s 11ms/step - loss: 0.3149 - accuracy: 0.8713 - val_loss: 0.3621 - val_accuracy: 0.8422
Epoch 3/3
697/697 [==============================] - 8s 11ms/step - loss: 0.2165 - accuracy: 0.9162 - val_loss: 0.4002 - val_accuracy: 0.8404
Finding the accuracy and loss associated with training
79/79 [==============================] - 1s 2ms/step - loss: 0.4002 - accuracy: 0.8404
The loss is : 0.40021833777427673
The accuracy is : 84.04%
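The helper function create_model used above is defined earlier in the tutorial. As an illustration, it could look something like the sketch below: an embedding layer, pooling over the sequence, and a dense output layer that emits raw logits (which is why the loss is compiled with from_logits=True). The exact architecture and layer sizes here are assumptions, not the tutorial's definitive code:

```python
import tensorflow as tf

def create_model(vocab_size, num_labels):
    """Hypothetical sketch: embed token ids, average-pool over the
    sequence, and emit one raw logit per translator."""
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 64),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(num_labels),  # logits, no softmax
    ])

model = create_model(vocab_size=100, num_labels=3)
logits = model(tf.constant([[1, 2, 3, 4]]))  # one sequence of 4 token ids
# logits has shape (1, 3): one score per translator
```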
How the Training Process Works
The training process involves several key steps −
Dataset Configuration: The dataset is optimized for better performance using batching and prefetching techniques.
Model Creation: A neural network model is created with the vocabulary size and number of labels (3 translators).
Model Compilation: The model is compiled with Adam optimizer and sparse categorical crossentropy loss.
Training: The model is trained for 3 epochs, showing improving accuracy from 67% to 91% on training data.
Evaluation: Final validation accuracy reaches 84.04% with a loss of 0.40.
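To make the loss from the compilation step concrete: sparse categorical crossentropy with from_logits=True takes raw scores (logits) and an integer label, applies a softmax, and returns the negative log-probability of the true class. A small pure-Python illustration (the function name below is ours, not a Keras API):

```python
import math

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example, computed from raw logits
    (what from_logits=True tells Keras to expect)."""
    # softmax: exponentiate and normalize the logits
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # negative log-probability of the true class
    return -math.log(probs[label])

# A confident, correct prediction yields a small loss;
# the same logits scored against a wrong label yield a large one.
low = sparse_categorical_crossentropy([4.0, 0.5, 0.1], label=0)
high = sparse_categorical_crossentropy([4.0, 0.5, 0.1], label=1)
```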
Key Training Metrics
| Epoch | Training Loss | Training Accuracy | Validation Loss | Validation Accuracy |
|---|---|---|---|---|
| 1 | 0.6891 | 67.36% | 0.3718 | 84.04% |
| 2 | 0.3149 | 87.13% | 0.3621 | 84.22% |
| 3 | 0.2165 | 91.62% | 0.4002 | 84.04% |
Code credit − https://www.tensorflow.org/tutorials/load_data/text
Conclusion
The TensorFlow model successfully learns to classify text by translator, reaching 84% validation accuracy in just 3 epochs. Note that training accuracy keeps climbing (to 91.62%) while validation accuracy plateaus around 84%, a gap that hints at mild overfitting; even so, the example demonstrates TensorFlow's effectiveness for natural language processing tasks.
