How can TensorFlow be used to prepare the dataset with Stack Overflow questions using Python?


TensorFlow is an open-source machine learning framework developed by Google. It is used with Python to implement machine learning algorithms and deep learning applications, both in research and in production.

The ‘tensorflow’ package can be installed on Windows using the below line of code −

pip install tensorflow

A tensor is the data structure used in TensorFlow. Tensors flow along the edges of a flow diagram and connect its operations. This flow diagram is known as the ‘Data flow graph’. Tensors are multidimensional arrays.

We are using Google Colaboratory to run the below code. Google Colab or Colaboratory runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory has been built on top of Jupyter Notebook. Following is the code snippet −

Example

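from tensorflow.keras.layers import TextVectorization  # TensorFlow 2.6+

# Keep only the 10,000 most frequent tokens in the vocabulary
VOCAB_SIZE = 10000
print("The preprocessing of text begins")
# Bag-of-words layer: one 0/1 entry per vocabulary token
# ('multi_hot' was called 'binary' before TensorFlow 2.6)
binary_vectorize_layer = TextVectorization(
   max_tokens=VOCAB_SIZE,
   output_mode='multi_hot')
# Pad or truncate every example to 250 token indices
MAX_SEQUENCE_LENGTH = 250
int_vectorize_layer = TextVectorization(
   max_tokens=VOCAB_SIZE,
   output_mode='int',
   output_sequence_length=MAX_SEQUENCE_LENGTH)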

Code credit −  https://www.tensorflow.org/tutorials/load_data/text

Output

The preprocessing of text begins

Explanation

  • The data is standardized, tokenized, and vectorized using the ‘TextVectorization’ layer.

  • Standardization involves pre-processing the text, typically by removing punctuation and HTML elements.

  • Tokenization involves splitting the sentences into words, by splitting on whitespace.

  • Vectorization involves converting the tokens into numbers so that they can be fed to a neural network.

  • The binary mode produces a bag-of-words representation of each question, whereas the ‘int’ mode produces a fixed-length sequence of token indices that preserves word order.
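
The three steps above can be illustrated with a short plain-Python sketch. It is not how ‘TextVectorization’ is implemented, and it does not use TensorFlow. The helper names (standardize, tokenize, build_vocab, to_binary, to_int_sequence), the sample questions, and the choice of index 0 for padding and index 1 for unknown tokens are all assumptions made for this illustration.

```python
import re
from collections import Counter

PAD_ID = 0  # assumed index used to pad short sequences
UNK_ID = 1  # assumed index for tokens missing from the vocabulary

def standardize(text):
    """Lowercase the text, replace HTML tags with spaces, drop punctuation."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"[^\w\s]", "", text)

def tokenize(text):
    """Split standardized text into words on whitespace."""
    return text.split()

def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to indices 2, 3, ... (0 and 1 are reserved)."""
    counts = Counter(tok for t in texts for tok in tokenize(standardize(t)))
    frequent = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return {tok: i + 2 for i, tok in enumerate(frequent)}

def to_binary(text, vocab, max_tokens):
    """Bag-of-words: a 0/1 vector marking which vocabulary indices occur."""
    vector = [0] * max_tokens
    for tok in tokenize(standardize(text)):
        vector[vocab.get(tok, UNK_ID)] = 1
    return vector

def to_int_sequence(text, vocab, length):
    """Ordered token indices, truncated or padded to a fixed length."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(standardize(text))]
    ids = ids[:length]
    return ids + [PAD_ID] * (length - len(ids))

# Hypothetical sample questions, only for demonstration
questions = [
    "<p>How do I sort a list in Python?</p>",
    "Why is my Java loop slow?",
]
vocab = build_vocab(questions, max_tokens=50)
print(to_binary("How do I sort a dict?", vocab, 50))
print(to_int_sequence("How do I sort a dict?", vocab, 8))
```

The binary vector records only which words occur, so word order is lost, while the integer sequence keeps the order and pads the remaining positions with zeros.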

Updated on: 18-Jan-2021
