How can TensorFlow Text be used to preprocess text for sequence modelling?

TensorFlow Text contains a collection of text-related classes and ops that can be used with TensorFlow 2.0. The library handles the pre-processing that text-based models regularly require, and includes other features useful for sequence modelling that are not provided by core TensorFlow.

When these ops are used for text pre-processing, the pre-processing is carried out inside the TensorFlow graph. This means the user doesn’t need to worry about tokenization in training being different from tokenization at inference, or about managing separate pre-processing scripts.

TensorFlow Text can be installed using the following command:

pip install -q tensorflow-text

TensorFlow Text requires TensorFlow 2.0, and is compatible with eager mode and graph mode.

Some ops require strings to be in UTF-8 encoding. If a different encoding is used, the transcode op in core TensorFlow (tf.strings.unicode_transcode) can be used to convert the strings into UTF-8. The same op can be used to coerce a string to structurally valid UTF-8 if the input may be invalid.
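The sketch below shows both uses of tf.strings.unicode_transcode with core TensorFlow only. The UTF-16-BE source encoding and the sample byte strings are assumptions chosen for illustration.

```python
import tensorflow as tf

# Strings stored as UTF-16-BE, transcoded to UTF-8.
utf16_docs = tf.constant([
    "Everything not saved will be lost.".encode("UTF-16-BE"),
    "Sad☹".encode("UTF-16-BE"),
])
utf8_docs = tf.strings.unicode_transcode(
    utf16_docs, input_encoding="UTF-16-BE", output_encoding="UTF-8"
)

# Input that claims to be UTF-8 but contains an invalid byte (0xFF).
# Transcoding UTF-8 to UTF-8 with errors="replace" substitutes a
# replacement character for the invalid byte, so the result is
# structurally valid UTF-8.
maybe_invalid = tf.constant([b"abc\xff"])
coerced = tf.strings.unicode_transcode(
    maybe_invalid,
    input_encoding="UTF-8",
    output_encoding="UTF-8",
    errors="replace",
)

print(utf8_docs.numpy()[1].decode("utf-8"))
print(coerced.numpy()[0].decode("utf-8"))
```

The errors argument also accepts "strict" and "ignore", which raise an error on invalid input or drop the invalid bytes respectively.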