How can TensorFlow be used to download and explore the Iliad dataset using Python?

TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to build and train machine learning and deep learning models, both in research and in production.

The 'tensorflow' package can be installed on Windows (and other platforms) with pip using the following command −

pip install tensorflow

A tensor is the core data structure in TensorFlow. Tensors flow along the edges of a diagram known as the 'data flow graph', connecting the operations at its nodes. Tensors are multidimensional arrays or lists that can be identified using three main attributes −

  • Rank − The number of dimensions of the tensor

  • Type − The data type of the tensor's elements

  • Shape − The number of elements along each dimension (for a matrix, the number of rows and columns)
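To make rank and shape concrete, here is a minimal pure-Python sketch that computes both for regularly nested lists. The helper names shape_of and rank_of are hypothetical and are not part of TensorFlow; on a real tf.Tensor the same information is available through its shape and dtype attributes and through tf.rank().

```python
def shape_of(nested):
    """Return the shape of a regularly nested list as a tuple of sizes."""
    shape = []
    level = nested
    # Walk down the first element of each level, recording its length.
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    return tuple(shape)


def rank_of(nested):
    """Rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape_of(nested))


scalar = 7                        # rank 0, shape ()
vector = [1.0, 2.0, 3.0]          # rank 1, shape (3,)
matrix = [[1, 2, 3], [4, 5, 6]]   # rank 2, shape (2, 3)

for value in (scalar, vector, matrix):
    print(rank_of(value), shape_of(value))
```

The sketch assumes every sub-list at a given level has the same length, which is also what a dense tensor requires.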

About the Iliad Dataset

We will be using the Iliad dataset, which contains the text of three English translations of Homer's Iliad, by William Cowper, Edward, Earl of Derby, and Samuel Butler. The dataset is typically used to train a model that identifies the translator when given a single line of text. The text files have been preprocessed by removing document headers, footers, line numbers and chapter titles.
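The translator-identification task needs each line paired with a label that says which file it came from. The following is a minimal sketch of that labeling step in plain Python. The function label_lines, the TRANSLATORS list and the placeholder lines are all hypothetical and stand in for the real file contents.

```python
# Label order is an assumption: 0 = Cowper, 1 = Derby, 2 = Butler.
TRANSLATORS = ['cowper', 'derby', 'butler']


def label_lines(lines_by_translator):
    """Pair every non-empty line with the integer index of its translator."""
    examples = []
    for label, name in enumerate(TRANSLATORS):
        for line in lines_by_translator.get(name, []):
            line = line.strip()
            if line:  # skip blank lines
                examples.append((line, label))
    return examples


# Placeholder text, not actual lines from the translations.
sample = {
    'cowper': ['placeholder line A', ''],
    'derby': ['placeholder line B'],
    'butler': ['placeholder line C', 'placeholder line D'],
}

labeled = label_lines(sample)
print(labeled)
```

With the real files, the lists in the dictionary would be the lines read from each downloaded text file.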

Downloading the Dataset

The following code downloads the Iliad dataset files using TensorFlow's utility functions −

import tensorflow as tf
from tensorflow.keras import utils
import pathlib

print("Loading the Illiad dataset")
DIRECTORY_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/illiad/'
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

print("Iterating through the name of the files")
for name in FILE_NAMES:
    text_dir = utils.get_file(name, origin=DIRECTORY_URL + name)

parent_dir = pathlib.Path(text_dir).parent
print("The list of files in the directory")
print(list(parent_dir.iterdir()))

The output of the above code is −

Loading the Illiad dataset
Iterating through the name of the files
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/illiad/cowper.txt
819200/815980 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/illiad/derby.txt
811008/809730 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/illiad/butler.txt
811008/807992 [==============================] - 0s 0us/step
The list of files in the directory
[PosixPath('/root/.keras/datasets/derby.txt'), PosixPath('/root/.keras/datasets/cowper.txt'), PosixPath('/root/.keras/datasets/butler.txt')]

Exploring the Downloaded Files

Once downloaded, you can explore the content of these text files to understand the data structure −

# Read and display sample content from one file
sample_file = pathlib.Path(text_dir).parent / 'cowper.txt'

with open(sample_file, 'r', encoding='utf-8') as f:
    sample_text = f.read(200)  # Read first 200 characters
    print("Sample text from Cowper's translation:")
    print(sample_text)

# Check file sizes
for name in FILE_NAMES:
    file_path = pathlib.Path(text_dir).parent / name
    size = file_path.stat().st_size
    print(f"{name}: {size} bytes")
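Since the task works on single lines of text, line and word counts are often more informative than byte sizes. Below is a minimal sketch of such a summary. The function summarize_text_file is a hypothetical helper, and the demo runs on a small temporary file so that it works without the download.

```python
import pathlib
import tempfile


def summarize_text_file(path):
    """Return line and word counts for a UTF-8 text file."""
    line_count = 0
    word_count = 0
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line_count += 1
            word_count += len(line.split())  # whitespace-separated words
    return {'lines': line_count, 'words': word_count}


# Demo on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = pathlib.Path(tmp) / 'demo.txt'
    demo_path.write_text('first line here\nsecond line\n', encoding='utf-8')
    demo_summary = summarize_text_file(demo_path)

print(demo_summary)
```

For the downloaded dataset, the same function could be called on the path of each name in FILE_NAMES inside the cache directory.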

Key Points

  • The tf.keras.utils.get_file() function downloads files and caches them locally

  • Files are stored in the ~/.keras/datasets/ directory by default

  • The dataset contains three English translations of Homer's Iliad

  • Each text file has been preprocessed for machine learning tasks
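The caching behavior can be checked without calling TensorFlow at all. The sketch below guesses where a file would be cached. It assumes get_file() was called with its default cache_dir and cache_subdir arguments and that the Keras home directory is ~/.keras unless the KERAS_HOME environment variable is set. The helper expected_cache_path is hypothetical, and the real location may differ, for example if the home directory is not writable.

```python
import os
import pathlib


def expected_cache_path(file_name, keras_home=None):
    """Guess where get_file() would cache file_name with default arguments."""
    if keras_home is None:
        # Assumption: KERAS_HOME overrides the default ~/.keras location.
        default_home = os.path.join(os.path.expanduser('~'), '.keras')
        keras_home = os.environ.get('KERAS_HOME', default_home)
    return pathlib.Path(keras_home) / 'datasets' / file_name


for name in ['cowper.txt', 'derby.txt', 'butler.txt']:
    path = expected_cache_path(name)
    status = 'cached' if path.exists() else 'not downloaded yet'
    print(f"{path}: {status}")
```

If a file is reported as cached, a later get_file() call with the same name should reuse it instead of downloading it again.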

Conclusion

TensorFlow provides convenient utilities to download and explore text datasets like the Iliad collection. The utils.get_file() function handles downloading and caching, making it easy to access preprocessed text data for natural language processing tasks.

Updated on: 2026-03-25T15:25:10+05:30
