How can TensorFlow be used to segment the word code points of a ragged tensor back into sentences?
The word code points held in a ragged tensor can be regrouped into sentences with tf.RaggedTensor.from_row_lengths. Segmentation refers to the act of splitting text into word-like units. This is easy when space characters separate words, but some languages such as Chinese and Japanese don’t use spaces, and others such as German contain long compounds that need to be split in order to analyse their meaning.
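Once the characters have been split into words, the word code points are segmented back into sentences. tf.RaggedTensor.from_row_lengths takes the per-word code points as its values and the number of words in each sentence as its row_lengths, and returns a nested ragged tensor in which each row is a sentence made up of its words. The result is then encoded back to UTF-8 strings with tf.strings.unicode_encode.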
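The grouping by row lengths can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not TensorFlow's implementation: the helper name group_by_row_lengths is hypothetical, and the input values are assumed from the sample sentences used in this article.

```python
def group_by_row_lengths(values, row_lengths):
    """Split a flat list into consecutive rows of the given lengths.

    Hypothetical helper that mimics, for plain lists, the grouping
    described for tf.RaggedTensor.from_row_lengths.
    """
    if sum(row_lengths) != len(values):
        raise ValueError("row_lengths must sum to the number of values")
    rows, start = [], 0
    for length in row_lengths:
        rows.append(values[start:start + length])
        start += length
    return rows


# One list of code points per word (assumed sample data).
word_char_codepoint = [
    [72, 101, 108, 108, 111],             # Hello
    [44, 32],                             # ", "
    [116, 104, 101, 114, 101],            # there
    [46],                                 # .
    [19990, 30028],                       # 世界
    [12371, 12435, 12395, 12385, 12399],  # こんにちは
]
# Number of words in each sentence.
sentence_num_words = [4, 2]

sentences = group_by_row_lengths(word_char_codepoint, sentence_num_words)

# Turn each word's code points back into text.
decoded = [["".join(chr(cp) for cp in word) for word in sentence]
           for sentence in sentences]
print(decoded)
```

The first four words form the first sentence and the remaining two form the second, because the row lengths are 4 and 2.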
Let us understand how to represent Unicode strings using Python and manipulate them using the Unicode equivalents of the standard string ops. The Unicode strings are first separated into tokens based on script detection; the step shown here groups those tokens back into sentences.
We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and offers free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.
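import tensorflow as tf

# word_char_codepoint and sentence_num_words come from the earlier steps:
# the code points of each word, and the number of words in each sentence.
print("Segment the word code points back to sentences")
print("Check if code point for a character in a word is present in the sentence")
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint,
    row_lengths=sentence_num_words)
print(sentence_word_char_codepoint)
print("Encoding it back to UTF-8")
# In a notebook, the value of the last expression is displayed automatically.
tf.strings.unicode_encode(sentence_word_char_codepoint, 'UTF-8').to_list()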
Code credit: https://www.tensorflow.org/tutorials/load_data/unicode
Segment the word code points back to sentences
Check if code point for a character in a word is present in the sentence
<tf.RaggedTensor [[[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46]], [[19990, 30028], [12371, 12435, 12395, 12385, 12399]]]>
Encoding it back to UTF-8
[[b'Hello', b', ', b'there', b'.'], [b'\xe4\xb8\x96\xe7\x95\x8c', b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf']]
- The word code points are segmented back into sentences with tf.RaggedTensor.from_row_lengths.
- The number of words in each sentence (sentence_num_words) determines where each sentence row ends.
- The grouped code points are encoded back to UTF-8 with tf.strings.unicode_encode.
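The grouping and re-encoding step described in this article depends on variables built in earlier steps. To try it in isolation, the following is a self-contained sketch that assumes TensorFlow 2.x is installed and defines small stand-in inputs by hand. The literal code points and word counts are assumptions chosen to match the sample sentences, not the output of the earlier segmentation code.

```python
import tensorflow as tf

# Stand-in for the per-word code points produced by the earlier steps
# (assumed values: "Hello", ", ", "there", ".", "世界", "こんにちは").
word_char_codepoint = tf.ragged.constant([
    [72, 101, 108, 108, 111],
    [44, 32],
    [116, 104, 101, 114, 101],
    [46],
    [19990, 30028],
    [12371, 12435, 12395, 12385, 12399],
])
# Stand-in for the number of words in each sentence.
sentence_num_words = [4, 2]

# Group the words into sentences: the first 4 words form row 0,
# the next 2 words form row 1.
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint,
    row_lengths=sentence_num_words)

# Encode each word's code points as a UTF-8 byte string.
encoded = tf.strings.unicode_encode(
    sentence_word_char_codepoint, 'UTF-8').to_list()

# Decode the byte strings to Python str for easier reading.
decoded = [[word.decode('utf-8') for word in sentence]
           for sentence in encoded]
print(decoded)
```

Decoding the byte strings at the end is only for display; the encoded result itself stays as UTF-8 bytes, as in the output shown in this article.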