How can TensorFlow Text be used to preprocess text data?

TensorFlow Text is a package that works alongside the TensorFlow library. It is not bundled with TensorFlow and must be installed separately (the PyPI package is named tensorflow-text) before it can be imported. It is used to preprocess data for text-based models.

The Keras Sequential API is helpful for building a sequential model, that is, a plain stack of layers where every layer has exactly one input tensor and one output tensor.

A neural network that contains at least one convolutional layer is known as a convolutional neural network (CNN). A CNN can be used to build a learning model.

TensorFlow Text contains a collection of text-related classes and ops that can be used with TensorFlow 2.0. It can be used to perform the preprocessing that sequence models regularly require.

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

Example

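import tensorflow as tf
import tensorflow_text as text  # imported as in the original tutorial; not used in this snippet

print("Converting to UTF-8 encoding")
# Build a string tensor whose elements are UTF-16-BE encoded bytes.
docs = tf.constant([u'Everything not saved will be lost.'.encode('UTF-16-BE'), u'Sad?'.encode('UTF-16-BE')])
# Transcode every element of the tensor from UTF-16-BE to UTF-8.
utf8_docs = tf.strings.unicode_transcode(docs, input_encoding='UTF-16-BE', output_encoding='UTF-8')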

Code credit: https://www.tensorflow.org/tutorials/tensorflow_text/intro

Output

Converting to UTF-8 encoding

Explanation

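Python's ‘encode’ method first encodes each string as UTF-16-BE bytes, and the resulting byte strings are stored in a string tensor.
The tf.strings.unicode_transcode op then transcodes every element of that tensor from UTF-16-BE to UTF-8. The script prints only the message shown in the output; the transcoded strings are held in utf8_docs.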
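
To make the idea of transcoding concrete, here is a minimal sketch in plain Python, using only the standard library. It is an illustration of the concept, not TensorFlow's implementation: the helper name transcode is hypothetical, and its parameter names are only chosen to mirror the keyword arguments used in the example. It decodes the UTF-16-BE bytes back to text and re-encodes that text as UTF-8.

```python
# Plain-Python illustration of transcoding; this is NOT how TensorFlow
# implements tf.strings.unicode_transcode, only the same idea in miniature.

def transcode(data: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Hypothetical helper: decode bytes from one encoding, re-encode in another."""
    return data.decode(input_encoding).encode(output_encoding)

# The same two sentences used in the TensorFlow example.
sentences = ["Everything not saved will be lost.", "Sad?"]

# Step 1: encode each string as UTF-16-BE bytes (two bytes per character here).
utf16_docs = [s.encode("UTF-16-BE") for s in sentences]

# Step 2: transcode each byte string from UTF-16-BE to UTF-8.
utf8_docs = [transcode(b, "UTF-16-BE", "UTF-8") for b in utf16_docs]

# Show how the byte length changes while the text stays the same.
for raw, out in zip(utf16_docs, utf8_docs):
    print(len(raw), "bytes as UTF-16-BE ->", len(out), "bytes as UTF-8:", out)
```

Because these sentences contain only ASCII characters, each UTF-8 result is half the size of its UTF-16-BE input. In the TensorFlow version the same conversion is applied to every element of the tensor in a single op call.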

AmitDiwan

Updated on: 2021-02-22T07:26:04+05:30
