How can TensorFlow be used to load the Iliad dataset using Python?


TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms, deep learning applications, and much more, both in research and in production.

A tensor is the core data structure in TensorFlow. Tensors are the edges of a flow diagram known as the ‘Data flow graph’: they carry data between the operations in the graph. A tensor is essentially a multidimensional array.

They can be identified using three main attributes −

  • Rank − It tells about the dimensionality of the tensor. It can be understood as the order of the tensor, or the number of dimensions it has. A tensor can be one-dimensional, two-dimensional or n-dimensional.

  • Type − It tells about the data type associated with the elements of the tensor, such as float32, int64 or string.

  • Shape − It is the number of elements along each dimension of the tensor. For a two-dimensional tensor, this is the number of rows and columns.
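To make the three attributes concrete, here is a minimal sketch that inspects a small tensor. The variable names (`matrix`, `rank`, `dtype`, `shape`) are illustrative only and do not come from the Iliad example.

```python
import tensorflow as tf

# A small two-dimensional tensor: 2 rows, 3 columns, 32-bit floats.
matrix = tf.constant([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])

rank = int(tf.rank(matrix))      # number of dimensions
dtype = matrix.dtype             # data type of the elements
shape = matrix.shape.as_list()   # size along each dimension

print(rank, dtype.name, shape)   # prints: 2 float32 [2, 3]
```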

We will be using the Iliad dataset, which contains text data from three translations of Homer's Iliad, by William Cowper, Edward, Earl of Derby, and Samuel Butler. The model is trained to identify the translator when given a single line of text. The text files have been preprocessed, which includes removing the document header and footer, line numbers, and chapter titles.

We are using Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

Example

Following is the code snippet −

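import tensorflow as tf

# Assumption: FILE_NAMES (a list of the text file names) and parent_dir
# (a pathlib.Path to the folder containing them) were defined earlier,
# when the files were downloaded.

def labeler(example, index):
   # Pair a line of text with its label, cast to a 64-bit integer
   return example, tf.cast(index, tf.int64)

print("An empty list has been created")
labeled_data_sets = []
print("Iterate through the file names and create a dataset from text file using ‘TextLineDataset’ method")
for i, file_name in enumerate(FILE_NAMES):
   # Each line of the file becomes one example in the dataset
   lines_dataset = tf.data.TextLineDataset(str(parent_dir/file_name))
   # Attach the file's index as the label of every line
   labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))
   labeled_data_sets.append(labeled_dataset)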

Code credit − https://www.tensorflow.org/tutorials/load_data/text

Output

An empty list has been created
Iterate through the file names and create a dataset from text file using ‘TextLineDataset’ method

Explanation

  • The ‘TextLineDataset’ is used, which creates a ‘tf.data.Dataset’ from a text file.

  • Every example is a line of text from the original file.

  • In contrast, ‘text_dataset_from_directory’ treats the entire contents of a file as a single example.

  • TextLineDataset is useful when working with text data that is line-based.

  • The code iterates through the files and loads each one into its own dataset.

  • Every example should be individually labeled, so ‘tf.data.Dataset.map’ is used to apply a labeler function to every row.

  • The map operation iterates over every example in the dataset and returns (example, label) pairs as output.
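The points above can be tried without downloading anything. The following is a minimal sketch, not the tutorial's own code: it writes two small text files to a temporary folder and labels their lines in the same way. The file names and contents in `SAMPLE_FILES` and the helper `build_labeled_datasets` are hypothetical stand-ins for the real translation files and loop.

```python
import pathlib
import tempfile

import tensorflow as tf

# Hypothetical stand-ins for the downloaded translation files.
SAMPLE_FILES = {
    "first.txt": ["line one of first", "line two of first"],
    "second.txt": ["line one of second"],
}

def labeler(example, index):
    """Pair a line of text with an int64 label."""
    return example, tf.cast(index, tf.int64)

def build_labeled_datasets(parent_dir, file_names):
    """Create one labeled dataset per file; the label is the file's index."""
    labeled = []
    for i, file_name in enumerate(file_names):
        lines = tf.data.TextLineDataset(str(parent_dir / file_name))
        labeled.append(lines.map(lambda ex: labeler(ex, i)))
    return labeled

with tempfile.TemporaryDirectory() as tmp:
    parent_dir = pathlib.Path(tmp)
    for name, lines in SAMPLE_FILES.items():
        (parent_dir / name).write_text("\n".join(lines) + "\n")

    datasets = build_labeled_datasets(parent_dir, list(SAMPLE_FILES))
    # Read the (text, label) pairs while the temporary files still exist.
    pairs = [
        (text.numpy().decode("utf-8"), int(label.numpy()))
        for ds in datasets
        for text, label in ds
    ]

for text, label in pairs:
    print(label, text)
```

Every line from the first file carries label 0 and every line from the second carries label 1, which is the information a model would later use to tell the translators apart.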

raja
Published on 19-Jan-2021 07:45:39