How can text data be embedded into high-dimensional vectors using Python?

Text embedding is the process of converting text data into numerical vectors that machine learning models can understand. Python provides powerful libraries like TensorFlow and Keras to create embeddings that capture semantic meaning in high-dimensional space.

TensorFlow is Google's open-source machine learning framework that works seamlessly with Python for implementing deep learning applications. Keras, now integrated within TensorFlow, provides a high-level API for building neural networks quickly and efficiently.

Understanding Text Embeddings

Text embeddings transform words or sentences into dense numerical vectors where similar words have similar vector representations. This allows models to understand relationships between words mathematically.
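To see what "similar words have similar vectors" means mathematically, cosine similarity is a common measure of how close two word vectors are. The sketch below uses hand-picked toy vectors purely for illustration; real embedding values are learned during training.

```python
import numpy as np

# Toy 4-dimensional embeddings (hand-picked for illustration;
# real embeddings are learned during training).
vec_cat = np.array([0.8, 0.1, 0.7, 0.2])
vec_dog = np.array([0.7, 0.2, 0.6, 0.3])
vec_car = np.array([0.1, 0.9, 0.2, 0.8])

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vec_cat, vec_dog))  # close to 1: related words
print(cosine_similarity(vec_cat, vec_car))  # much smaller: unrelated words
```

A trained model would place "cat" and "dog" near each other in this way because they appear in similar contexts.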

Setting Up the Environment

First, let's import the necessary libraries:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Creating Text Embeddings with Keras

Here's how to embed text data into dense numerical vectors using Keras Embedding layers:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print("Setting up parameters for text embedding")
num_tags = 12
num_words = 10000  # Vocabulary size
num_classes = 4
embedding_dim = 64  # Dimension of embedding vectors

# Create input layers for different text components
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body") 
tags_input = keras.Input(shape=(num_tags,), name="tags")

print("Creating embedding layers...")

# Embed every word in the title to a 64-dimensional vector
title_features = layers.Embedding(num_words, embedding_dim)(title_input)
print(f"Title embedding shape: {title_features.shape}")

# Embed every word in the body to a 64-dimensional vector  
body_features = layers.Embedding(num_words, embedding_dim)(body_input)
print(f"Body embedding shape: {body_features.shape}")

# Reduce sequence of embedded words using LSTM layers
title_features = layers.LSTM(128)(title_features)
body_features = layers.LSTM(32)(body_features)

print("Combining features...")
# Merge all features into a single vector
combined_features = layers.concatenate([title_features, body_features, tags_input])

# Add prediction layers
priority_pred = layers.Dense(1, name="priority")(combined_features)
department_pred = layers.Dense(num_classes, name="department")(combined_features)

# Create the model
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

print("Model created successfully!")
print("Model summary:")
model.summary()
Output

Setting up parameters for text embedding
Creating embedding layers...
Title embedding shape: (None, None, 64)
Body embedding shape: (None, None, 64)
Combining features...
Model created successfully!
Model summary:
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 title (InputLayer)             [(None, None)]       0           []                               
                                                                                                  
 body (InputLayer)              [(None, None)]       0           []                               
                                                                                                  
 embedding (Embedding)          (None, None, 64)     640000      ['title[0][0]']                  
                                                                                                  
 embedding_1 (Embedding)        (None, None, 64)     640000      ['body[0][0]']                   
                                                                                                  
 lstm (LSTM)                    (None, 128)          98816       ['embedding[0][0]']              
                                                                                                  
 lstm_1 (LSTM)                  (None, 32)           12416       ['embedding_1[0][0]']            
                                                                                                  
 tags (InputLayer)              [(None, 12)]         0           []                               
                                                                                                  
 concatenate (Concatenate)      (None, 172)          0           ['lstm[0][0]',                   
                                                                  'lstm_1[0][0]',                 
                                                                  'tags[0][0]']                   
                                                                                                  
 priority (Dense)               (None, 1)            173         ['concatenate[0][0]']            
                                                                                                  
 department (Dense)             (None, 4)            692         ['concatenate[0][0]']            
                                                                                                  
==================================================================================================
Total params: 1,392,097
Trainable params: 1,392,097
Non-trainable params: 0
__________________________________________________________________________________________________
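Before training, each output needs its own loss function. The sketch below rebuilds the same model compactly and compiles it; the choice of losses (binary cross-entropy for priority, categorical cross-entropy for department) and the dummy data are illustrative assumptions, not part of the original example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_tags, num_words, num_classes, embedding_dim = 12, 10000, 4, 64

# Same architecture as above, written compactly.
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")

title_features = layers.LSTM(128)(layers.Embedding(num_words, embedding_dim)(title_input))
body_features = layers.LSTM(32)(layers.Embedding(num_words, embedding_dim)(body_input))
x = layers.concatenate([title_features, body_features, tags_input])

model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[layers.Dense(1, name="priority")(x),
             layers.Dense(num_classes, name="department")(x)],
)

# One loss per named output; these losses are illustrative assumptions.
model.compile(
    optimizer="adam",
    loss={"priority": keras.losses.BinaryCrossentropy(from_logits=True),
          "department": keras.losses.CategoricalCrossentropy(from_logits=True)},
)

# Dummy batch: 2 samples, title length 5, body length 9.
titles = np.random.randint(num_words, size=(2, 5))
bodies = np.random.randint(num_words, size=(2, 9))
tags = np.random.randint(2, size=(2, num_tags)).astype("float32")

priority, department = model.predict([titles, bodies, tags], verbose=0)
print(priority.shape, department.shape)  # (2, 1) (2, 4)
```

Because the model accepts variable-length sequences (`shape=(None,)`), titles and bodies in a batch can have different lengths from one batch to the next.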

How Embedding Works

The Embedding layer creates a lookup table that maps each word index to a dense vector. Key parameters include:

  • input_dim: Size of vocabulary (10,000 unique words)
  • output_dim: Dimension of embedding vectors (64 dimensions)
  • input_length: Length of input sequences (None for variable length)
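The lookup-table behaviour can be verified directly: the layer stores a single weight matrix of shape `(input_dim, output_dim)`, and passing an index returns the corresponding row. This is a minimal sketch using the same parameters as above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# An Embedding layer is a trainable lookup table with one row per
# vocabulary index and one column per embedding dimension.
emb = layers.Embedding(input_dim=10000, output_dim=64)

# Looking up index 7 builds the layer and returns row 7 of the table.
vec = emb(tf.constant([7]))

table = emb.get_weights()[0]
print(table.shape)                            # (10000, 64)
print(np.allclose(vec.numpy()[0], table[7]))  # True: lookup == table row
```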

Processing Pipeline

Step         | Input               | Output                          | Purpose
Text Input   | Raw text            | Word indices (integer sequences)| Convert text to numbers
Embedding    | Integer sequences   | Dense vectors                   | Create semantic representations
LSTM         | Sequence of vectors | Single vector                   | Capture sequence patterns
Dense Layer  | Combined features   | Predictions                     | Make final classifications
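The first pipeline step, converting raw text into word indices, can be handled by Keras's `TextVectorization` layer. The tiny corpus below is purely illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

# TextVectorization learns a vocabulary from a corpus and then maps
# raw strings to integer word indices (the "Text Input" step).
vectorizer = layers.TextVectorization(max_tokens=10000, output_mode="int")
vectorizer.adapt(["server is down", "printer is broken", "server is slow"])

indices = vectorizer(tf.constant(["server is broken"]))
print(indices.shape)  # (1, 3): one sentence of three word indices
```

These integer sequences are exactly what the Embedding layer in the model above expects as input.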

Conclusion

Text embedding transforms words into meaningful numerical vectors using Keras Embedding layers. The functional API allows combining multiple text inputs with LSTM layers to capture sequential patterns and make predictions on multi-input, multi-output models.

Updated on: 2026-03-25T14:49:15+05:30
