How can the Iliad dataset be prepared for training using Python?
TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms and deep learning applications, both in research and in production.
The ‘tensorflow’ package can be installed on Windows using the following command −
pip install tensorflow
A tensor is the core data structure in TensorFlow. Tensors flow along the edges of a graph whose nodes are operations; this graph is known as the ‘Data flow graph’. A tensor is essentially a multidimensional array or list.
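To make the idea of a tensor as a multidimensional array concrete, here is a minimal sketch in plain Python. It does not use TensorFlow; the helper name shape_of is hypothetical and only illustrates how rank and shape describe nested lists.

```python
def shape_of(nested):
    """Return the shape of a regularly nested list as a tuple of sizes."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        # Descend into the first element; stop if the list is empty.
        nested = nested[0] if nested else None
    return tuple(shape)


scalar = 7                        # rank 0: a single value
vector = [1, 2, 3]                # rank 1: one axis
matrix = [[1, 2, 3], [4, 5, 6]]   # rank 2: rows and columns

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    shape = shape_of(value)
    print(name, "rank:", len(shape), "shape:", shape)
```

The rank is the number of axes and the shape is the size along each axis. A TensorFlow tensor built from the same nested lists would be described the same way.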
We will be using the Iliad dataset, which contains text from three English translations of the Iliad, by William Cowper, Edward (Earl of Derby), and Samuel Butler. The model is trained to identify the translator when given a single line of text. The text files have already been preprocessed, which includes removing the document header and footer, line numbers, and chapter titles.
We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.
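Since the model has to predict the translator from a single line, each line needs to be paired with a label. The sketch below shows one way such (text, label) pairs could be structured in plain Python. The names TRANSLATORS and label_lines and the sample strings are hypothetical placeholders, not part of the original code or dataset.

```python
# Translator names in a fixed order; each name's position is its integer label.
TRANSLATORS = ["cowper", "derby", "butler"]


def label_lines(lines_by_translator):
    """Pair every line with the integer index of its translator."""
    labeled = []
    for label, translator in enumerate(TRANSLATORS):
        for line in lines_by_translator.get(translator, []):
            labeled.append((line, label))
    return labeled


# Placeholder strings, not real lines from the translations.
sample = {
    "cowper": ["a line from Cowper", "another line from Cowper"],
    "derby": ["a line from Derby"],
    "butler": ["a line from Butler"],
}

labeled_lines = label_lines(sample)
for text, label in labeled_lines:
    print(label, text)
```

In a real pipeline the pairs would usually also be shuffled so that lines from the three translators are mixed.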
Following is the code snippet −
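# Requires the tensorflow-text package. all_labeled_data is the labeled
# dataset of (text, label) pairs built when the Iliad files were loaded.
import tensorflow_text as tf_text

print("Prepare the dataset for training")
tokenizer = tf_text.UnicodeScriptTokenizer()

print("Defining a function named 'tokenize' to tokenize the text data")
def tokenize(text, unused_label):
    # Lower-case the text, then split it into tokens
    lower_case = tf_text.case_fold_utf8(text)
    return tokenizer.tokenize(lower_case)

# Apply the tokenize function to every element of the dataset
tokenized_ds = all_labeled_data.map(tokenize)

print("Iterate over the dataset and print a few samples")
for text_batch in tokenized_ds.take(6):
    print("Tokens: ", text_batch.numpy())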
Code credit − https://www.tensorflow.org/tutorials/load_data/text
Output −
Prepare the dataset for training
Defining a function named 'tokenize' to tokenize the text data
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201: batch_gather (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2017-10-25.
Instructions for updating:
`tf.batch_gather` is deprecated, please use `tf.gather` with `batch_dims=-1` instead.
Iterate over the dataset and print a few samples
Tokens: [b'but' b'i' b'have' b'now' b'both' b'tasted' b'food' b',' b'and' b'given']
Tokens: [b'all' b'these' b'shall' b'now' b'be' b'thine' b':' b'but' b'if' b'the' b'gods']
Tokens: [b'their' b'spiry' b'summits' b'waved' b'.' b'there' b',' b'unperceived']
Tokens: [b'"' b'i' b'pray' b'you' b',' b'would' b'you' b'show' b'your' b'love' b',' b'dear' b'friends' b',']
Tokens: [b'entering' b'beneath' b'the' b'clavicle' b'the' b'point']
Tokens: [b'but' b'grief' b',' b'his' b'father' b'lost' b',' b'awaits' b'him' b'now' b',']
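A ‘tokenize’ function is defined that lower-cases each line and splits it into tokens. The tokenizer splits on whitespace and on changes in Unicode script, so words and punctuation marks become separate tokens.
This function is applied to every element of the dataset using the ‘map’ method.
The first six tokenized samples are displayed on the console.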
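The tokenization step described above can be approximated in plain Python, which may help when TensorFlow is not at hand. This is only a rough sketch: the regular expression is a hypothetical stand-in for the Unicode script tokenizer and will not match the library token for token on every input (for example, it always splits consecutive punctuation marks apart). The sample pairs are illustrative as well.

```python
import re

# Hypothetical stand-in for the library tokenizer: a run of word characters
# forms one token, and each punctuation mark is a token of its own.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text, unused_label=None):
    """Lower-case the text, then split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.casefold())


# Illustrative (text, label) pairs mirroring the structure of the labeled dataset.
labeled_lines = [
    ("But I have now both tasted food, and given", 0),
    ("All these shall now be thine: but if the Gods", 1),
]

# Equivalent of mapping the function over the whole dataset.
tokenized = [tokenize(text, label) for text, label in labeled_lines]
for tokens in tokenized:
    print("Tokens:", tokens)
```

As in the original function, the label is accepted but ignored, so only the tokens are carried forward to the next preparation step.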