How can text data be embedded into dimensional vectors using Python?


Tensorflow is a machine learning framework that is provided by Google. It is an open-source framework used in conjunction with Python to implement algorithms, deep learning applications and much more. It is used in research and for production purposes.

Keras was developed as a part of the research for the project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System). Keras is a deep learning API, which is written in Python. It is a high-level API that has a productive interface that helps solve machine learning problems. It runs on top of the Tensorflow framework. It was built to help experiment in a quick manner. It provides essential abstractions and building blocks that are essential in developing and encapsulating machine learning solutions.

Keras is already present within the Tensorflow package. It can be accessed using the below line of code.

import tensorflow
from tensorflow import keras

The Keras functional API helps create models that are more flexible in comparison to models created using sequential API. The functional API can work with models that have non-linear topology, can share layers and work with multiple inputs and outputs. A deep learning model is usually a directed acyclic graph (DAG) that contains multiple layers. The functional API helps build the graph of layers.

We are using Google Colaboratory to run the below code. Google Colab or Colaboratory helps run Python code over the browser and requires zero configuration and free access to GPUs (Graphical Processing Units). Colaboratory has been built on top of Jupyter Notebook. Following is the code snippet wherein we will embed every word in the title to a 64-dimensional vector −

Example

print("Number of unique issue tags")
num_tags = 12
print("Size of vocabulary while preprocessing text data")
num_words = 10000
print("Number of classes for predictions")
num_classes = 4
title_input = keras.Input(
   shape=(None,), name="title"
)
print("Variable length int sequence")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(
   shape=(num_tags,), name="tags"
)
print("Embed every word in the title to a 64-dimensional vector")
title_features = layers.Embedding(num_words, 64)(title_input)
print("Embed every word into a 64-dimensional vector")
body_features = layers.Embedding(num_words, 64)(body_input)
print("Reduce sequence of embedded words into single 128-dimensional vector")
title_features = layers.LSTM(128)(title_features)
print("Reduce sequence of embedded words into single 132-dimensional vector")
body_features = layers.LSTM(32)(body_features)
print("Merge available features into a single vector by concatenating it")
x = layers.concatenate([title_features, body_features, tags_input])
print("Use logistic regression to predict the features")
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_classes, name="class")(x)
print("Instantiate a model that predicts priority and class")
model = keras.Model(
   inputs=[title_input, body_input, tags_input],
   outputs=[priority_pred, department_pred],
)

Code credit − https://www.tensorflow.org/guide/keras/functional

Output

Number of unique issue tags
Size of vocabulary while preprocessing text data
Number of classes for predictions
Variable length int sequence
Embed every word in the title to a 64-dimensional vector
Embed every word into a 64-dimensional vector
Reduce sequence of embedded words into single 128-dimensional vector
Reduce sequence of embedded words into single 132-dimensional vector
Merge available features into a single vector by concatenating it
Use logistic regression to predict the features
Instantiate a model that predicts priority and class

Explanation

  • The functional API can be used to work with multiple inputs and outputs.

  • This can’t be done with sequential API.

Updated on: 18-Jan-2021

117 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements