How can TensorFlow be used to view a sample of the vectorized data using Python?

TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms and deep learning applications, in both research and production.

The tensorflow package can be installed on Windows using the following command −

pip install tensorflow

A tensor is the data structure that TensorFlow operates on. In the flow diagram known as the data flow graph, tensors are the edges that carry data between operations. A tensor is a multidimensional array.
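To make "multidimensional array" concrete, here is a minimal plain-Python sketch that reports the rank and shape of nested lists. The helper `nested_shape` is hypothetical and written only for this illustration; it is not part of TensorFlow, and it assumes the lists are regularly nested.

```python
def nested_shape(value):
    """Return the shape of a regularly nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]  # descend one level; assumes all rows have equal length
    return tuple(shape)

scalar = 7                                  # rank 0: a single number
vector = [20, 21, 58]                       # rank 1: a list of numbers
matrix = [[20, 21, 58], [49, 107, 3497]]    # rank 2: a list of lists

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    shape = nested_shape(value)
    print(name, "rank:", len(shape), "shape:", shape)
```

A vectorized sentence is a rank 1 tensor, and a batch of equally long vectorized sentences is a rank 2 tensor.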

We will be using the Iliad dataset, which contains the text of three translations of the work, by William Cowper, Edward, Earl of Derby, and Samuel Butler. The model is trained to identify the translator when given a single line of text. The text files have been preprocessed by removing document headers, footers, line numbers and chapter titles.

What is Text Vectorization?

Text vectorization converts text into a numerical format that machine learning models can process. Each distinct word or token is mapped to a unique integer, so a sentence becomes a sequence of integers.
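The idea can be sketched in plain Python without any TensorFlow calls. Everything here is an assumption made for illustration: the function names, the toy sentences, and the convention of reserving 0 for padding and 1 for unknown words. It is not the preprocessing used later in this article.

```python
import re

def tokenize(text):
    # Lowercase and keep runs of letters; punctuation is dropped in this sketch.
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(lines, first_id=2):
    # IDs below first_id are reserved (assumed: 0 = padding, 1 = unknown word).
    vocab = {}
    for line in lines:
        for token in tokenize(line):
            if token not in vocab:
                vocab[token] = len(vocab) + first_id
    return vocab

def vectorize(text, vocab, unknown_id=1):
    # Replace each token by its integer ID; unseen tokens get unknown_id.
    return [vocab.get(token, unknown_id) for token in tokenize(text)]

corpus = ["the ships sailed at dawn", "the men sailed home"]  # made-up lines
vocab = build_vocab(corpus)
encoded = vectorize("The men sailed at night", vocab)

print(vocab)
print(encoded)
```

The word "night" never appears in the toy corpus, so it is encoded with the unknown ID rather than raising an error.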

Viewing Vectorized Sample Data

Here's how to examine a sample of vectorized data using TensorFlow −

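# Assumes all_labeled_data (a dataset of (text, label) pairs) and
# preprocess_text were defined in earlier steps.
print("Look at sample data after processing it")
# Take the first (text, label) pair from the dataset
example_text, example_label = next(iter(all_labeled_data))

print("The sentence is : ", example_text.numpy())
# Convert the raw sentence into a vector of integer token IDs
vectorized_text, example_label = preprocess_text(example_text, example_label)

print("The vectorized sentence is : ", vectorized_text.numpy())
print("Run the pre-process function on the data")

# Apply the same preprocessing to every element of the dataset
all_encoded_data = all_labeled_data.map(preprocess_text)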

Output

Look at sample data after processing it
The sentence is : b'But I have now both tasted food, and given'
The vectorized sentence is : [ 20 21 58 49 107 3497 909 2 4 540]
Run the pre-process function on the data

Understanding the Results

The output shows the transformation process −

  • Original text: "But I have now both tasted food, and given" − human-readable sentence

  • Vectorized text: [20 21 58 49 107 3497 909 2 4 540] − numerical representation where each number is the vocabulary index of one token. There are ten numbers for nine words, which suggests the comma is encoded as a token of its own
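The sentence has nine words but the vector has ten entries. One plausible explanation, sketched below, is that the tokenizer splits punctuation into separate tokens. The regular expression is an assumption chosen for illustration, not the tokenizer used by `preprocess_text`, and the token-to-ID pairing it prints is therefore only a guess based on position.

```python
import re

def split_words_and_punctuation(text):
    # A run of word characters is one token; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

sentence = "But I have now both tasted food, and given"
tokens = split_words_and_punctuation(sentence)
ids = [20, 21, 58, 49, 107, 3497, 909, 2, 4, 540]  # copied from the output above

print(len(sentence.split()), "words,", len(tokens), "tokens")
for token, token_id in zip(tokens, ids):
    print(f"{token_id:>5}  {token!r}")
```

Under this assumption the comma lines up with ID 2 and "and" with ID 4, both small values, while "tasted" lines up with 3497.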

Key Points

  • Each word gets mapped to a unique integer based on the vocabulary

  • Common words like "I" (21 in the output above) get lower numbers, because more frequent tokens receive smaller indices in this vocabulary

  • Rare words like "tasted" (3497 in the output above) get higher numbers

  • This numerical format allows neural networks to process text data
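The link between frequency and index size can be sketched with a vocabulary built from token counts. This is a plain-Python illustration under stated assumptions: the function name, the toy token list, and the choice to start IDs at 2 are all hypothetical, and the article's own vocabulary may be built differently.

```python
from collections import Counter

def build_frequency_vocab(tokens, first_id=2):
    counts = Counter(tokens)
    # Most frequent first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda token: (-counts[token], token))
    return {token: first_id + rank for rank, token in enumerate(ranked)}

# Made-up token list: "and" and "the" occur three times, everything else once.
tokens = "and the ships and the men and the gods tasted food".split()
vocab = build_frequency_vocab(tokens)

for token, token_id in sorted(vocab.items(), key=lambda item: item[1]):
    print(token_id, token)
```

The frequent tokens "and" and "the" end up with the smallest IDs, and the tokens seen only once are pushed toward the end of the range.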

Conclusion

Viewing vectorized data shows how the preprocessing step converts text into a numerical format. This transformation is essential for training machine learning models on text data, because models can only process numerical inputs.

Updated on: 2026-03-25T15:26:35+05:30
