How to encode multiple strings of the same length using TensorFlow and Python?

Multiple strings of the same length can be encoded by passing a tf.Tensor of Unicode code points to tf.strings.unicode_encode. When the strings vary in length, a tf.RaggedTensor should be used as the input instead. If a tensor holds multiple strings in padded or sparse format, it needs to be converted to a tf.RaggedTensor before calling unicode_encode.


Let us see how to represent Unicode strings in TensorFlow and manipulate them with the Unicode equivalents of standard string operations. In this article, each string is represented as a sequence of Unicode code points (integers), which unicode_encode converts into an encoded string.

We are using Google Colaboratory to run the code below. Google Colab runs Python code in the browser, requires zero configuration, and offers free access to GPUs.

Setting Up TensorFlow

First, let's import TensorFlow and set up our environment:

import tensorflow as tf
print("TensorFlow version:", tf.__version__)

Output (the version number depends on your installation):

TensorFlow version: 2.13.0

Encoding Strings of Same Length

When encoding multiple strings of the same length, a tf.Tensor can be used as input:

import tensorflow as tf

# Unicode code points for "cat", "dog", "cow"
same_length_strings = tf.constant([[99, 97, 116], [100, 111, 103], [99, 111, 119]])

print("Encoding multiple strings of the same length using tf.Tensor:")
encoded = tf.strings.unicode_encode(same_length_strings, output_encoding='UTF-8')
print(encoded)

Output:

Encoding multiple strings of the same length using tf.Tensor:
tf.Tensor([b'cat' b'dog' b'cow'], shape=(3,), dtype=string)
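
As a sanity check on what the encoding step does, the following pure-Python sketch (no TensorFlow required) reproduces the same byte strings from the same code points. The names rows and encode_row are illustrative only and are not part of any TensorFlow API.

```python
# Illustrative cross-check: build UTF-8 byte strings from code points
# using only the Python standard library.
rows = [[99, 97, 116], [100, 111, 103], [99, 111, 119]]

def encode_row(code_points):
    # chr() maps a code point to a one-character str;
    # str.encode() then produces the UTF-8 bytes.
    return "".join(chr(cp) for cp in code_points).encode("utf-8")

encoded_rows = [encode_row(row) for row in rows]
print(encoded_rows)  # [b'cat', b'dog', b'cow']
```

Each row of integers becomes one byte string, which matches the contents of the tf.Tensor shown in the output of the TensorFlow example.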

Encoding Strings of Varying Length

For strings with different lengths, we need to use tf.RaggedTensor:

import tensorflow as tf

# Create a RaggedTensor for varying length strings
batch_chars_ragged = tf.ragged.constant([
    [99, 97, 116],        # "cat" - 3 chars
    [100, 111, 103, 115], # "dogs" - 4 chars  
    [99, 111, 119]        # "cow" - 3 chars
])

print("Encoding strings with varying length using tf.RaggedTensor:")
encoded_ragged = tf.strings.unicode_encode(batch_chars_ragged, output_encoding='UTF-8')
print(encoded_ragged)

Output:

Encoding strings with varying length using tf.RaggedTensor:
tf.Tensor([b'cat' b'dogs' b'cow'], shape=(3,), dtype=string)
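
In practice, ragged code points often come from decoding existing strings rather than from typing integers by hand. The sketch below assumes that workflow: it decodes a small batch with tf.strings.unicode_decode, which returns a tf.RaggedTensor for a batch of strings, and then encodes it back. The variable names are illustrative.

```python
import tensorflow as tf

# Start from ordinary strings of different lengths.
words = tf.constant(["cat", "dogs", "cow"])

# Decoding a batch of strings yields a RaggedTensor of code points.
code_points = tf.strings.unicode_decode(words, input_encoding="UTF-8")
print(code_points.to_list())
# [[99, 97, 116], [100, 111, 103, 115], [99, 111, 119]]

# Encoding the RaggedTensor recovers the original strings.
round_trip = tf.strings.unicode_encode(code_points, output_encoding="UTF-8")
print(round_trip.numpy().tolist())
# [b'cat', b'dogs', b'cow']
```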

Converting Padded/Sparse Tensors

When working with padded or sparse tensors, convert them to tf.RaggedTensor first:

import tensorflow as tf

# Example with padded tensor (using -1 as padding)
batch_chars_padded = tf.constant([
    [99, 97, 116, -1],    # "cat" + padding
    [100, 111, 103, 115], # "dogs"
    [99, 111, 119, -1]    # "cow" + padding
])

print("Converting padded tensor to RaggedTensor and encoding:")
ragged_from_padded = tf.RaggedTensor.from_tensor(batch_chars_padded, padding=-1)
encoded_from_padded = tf.strings.unicode_encode(ragged_from_padded, output_encoding='UTF-8')
print(encoded_from_padded)

Output:

Converting padded tensor to RaggedTensor and encoding:
tf.Tensor([b'cat' b'dogs' b'cow'], shape=(3,), dtype=string)
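
The sparse case follows the same pattern. The sketch below is a minimal illustration, assuming the sparse tensor stores each row's code points contiguously from column 0 (the layout that tf.RaggedTensor.from_sparse expects). The variable names and the sample data are illustrative.

```python
import tensorflow as tf

# Sparse representation of "cat", "dogs", "cow": only the positions
# that hold a code point are listed in `indices`.
batch_chars_sparse = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [0, 2],
             [1, 0], [1, 1], [1, 2], [1, 3],
             [2, 0], [2, 1], [2, 2]],
    values=[99, 97, 116, 100, 111, 103, 115, 99, 111, 119],
    dense_shape=[3, 4],
)

# Convert to a RaggedTensor first, then encode.
ragged_from_sparse = tf.RaggedTensor.from_sparse(batch_chars_sparse)
encoded_from_sparse = tf.strings.unicode_encode(
    ragged_from_sparse, output_encoding="UTF-8")
print(encoded_from_sparse.numpy().tolist())
# [b'cat', b'dogs', b'cow']
```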

Summary

Input Type           | Use Case               | Method
tf.Tensor            | Same length strings    | Direct encoding
tf.RaggedTensor      | Varying length strings | Direct encoding
Padded/Sparse Tensor | Mixed format data      | Convert to RaggedTensor first
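
The summary above can be folded into one small dispatch function. This is only a sketch: encode_code_points and its padding parameter are hypothetical names introduced here for illustration, not part of TensorFlow. The TensorFlow calls it relies on are tf.RaggedTensor.from_sparse, tf.RaggedTensor.from_tensor, and tf.strings.unicode_encode.

```python
import tensorflow as tf

def encode_code_points(code_points, padding=None):
    """Hypothetical helper: encode a batch of code points as UTF-8 strings."""
    if isinstance(code_points, tf.SparseTensor):
        # Sparse input: convert to ragged before encoding.
        code_points = tf.RaggedTensor.from_sparse(code_points)
    elif padding is not None and not isinstance(code_points, tf.RaggedTensor):
        # Padded dense input: strip trailing padding values first.
        code_points = tf.RaggedTensor.from_tensor(
            tf.convert_to_tensor(code_points), padding=padding)
    # Dense (same length) and ragged inputs are encoded directly.
    return tf.strings.unicode_encode(code_points, output_encoding="UTF-8")

dense = encode_code_points([[99, 97, 116], [100, 111, 103]])
padded = encode_code_points([[99, 97, 116, -1], [100, 111, 103, 115]], padding=-1)
ragged = encode_code_points(tf.ragged.constant([[99, 111, 119], [100, 111, 103, 115]]))

print(dense.numpy().tolist())   # [b'cat', b'dog']
print(padded.numpy().tolist())  # [b'cat', b'dogs']
print(ragged.numpy().tolist())  # [b'cow', b'dogs']
```

Requiring the caller to pass the padding value explicitly is a deliberate choice in this sketch, since a dense tensor alone does not say which value, if any, marks padding.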

Conclusion

Use tf.Tensor for encoding strings of equal length, and tf.RaggedTensor for varying lengths. Always convert padded or sparse tensors to tf.RaggedTensor before encoding using tf.strings.unicode_encode().

Updated on: 2026-03-26T13:12:43+05:30
