How can text data be embedded into dimensional vectors using Python?
Text embedding is the process of converting text data into numerical vectors that machine learning models can understand. Python provides powerful libraries like TensorFlow and Keras to create embeddings that capture semantic meaning in high-dimensional space.
TensorFlow is Google's open-source machine learning framework that works seamlessly with Python for implementing deep learning applications. Keras, now integrated within TensorFlow, provides a high-level API for building neural networks quickly and efficiently.
Understanding Text Embeddings
Text embeddings transform words or sentences into dense numerical vectors where similar words have similar vector representations. This allows models to understand relationships between words mathematically.
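This idea can be illustrated without any framework. Below is a minimal NumPy sketch, where the tiny vocabulary and the hand-made 4-dimensional vectors are invented purely for illustration (a real model learns these numbers), showing an embedding as a lookup table in which related words end up with similar vectors:

```python
import numpy as np

# Toy vocabulary mapping words to row indices of the embedding matrix.
vocab = {"cat": 0, "dog": 1, "car": 2}

# Hand-made 4-dimensional embedding matrix (illustrative values only).
embeddings = np.array([
    [0.90, 0.80, 0.10, 0.00],  # "cat"
    [0.85, 0.75, 0.20, 0.10],  # "dog" (close to "cat")
    [0.10, 0.00, 0.90, 0.80],  # "car" (far from both animals)
])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

cat, dog, car = (embeddings[vocab[w]] for w in ("cat", "dog", "car"))
print(cosine(cat, dog))  # high: related words, similar vectors
print(cosine(cat, car))  # low: unrelated words, dissimilar vectors
```

Because "cat" and "dog" were given nearby vectors, their cosine similarity is close to 1, while "cat" and "car" score much lower; a trained embedding layer arrives at this kind of geometry automatically.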
Setting Up the Environment
First, import the necessary libraries:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
Creating Text Embeddings with Keras
Here's how to embed text data into dimensional vectors using Keras Embedding layers:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print("Setting up parameters for text embedding")
num_tags = 12
num_words = 10000 # Vocabulary size
num_classes = 4
embedding_dim = 64 # Dimension of embedding vectors
# Create input layers for different text components
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")
print("Creating embedding layers...")
# Embed every word in the title to a 64-dimensional vector
title_features = layers.Embedding(num_words, embedding_dim)(title_input)
print(f"Title embedding shape: {title_features.shape}")
# Embed every word in the body to a 64-dimensional vector
body_features = layers.Embedding(num_words, embedding_dim)(body_input)
print(f"Body embedding shape: {body_features.shape}")
# Reduce sequence of embedded words using LSTM layers
title_features = layers.LSTM(128)(title_features)
body_features = layers.LSTM(32)(body_features)
print("Combining features...")
# Merge all features into a single vector
combined_features = layers.concatenate([title_features, body_features, tags_input])
# Add prediction layers
priority_pred = layers.Dense(1, name="priority")(combined_features)
department_pred = layers.Dense(num_classes, name="department")(combined_features)
# Create the model
model = keras.Model(
inputs=[title_input, body_input, tags_input],
outputs=[priority_pred, department_pred],
)
print("Model created successfully!")
print("Model summary:")
model.summary()
Output
Setting up parameters for text embedding
Creating embedding layers...
Title embedding shape: (None, None, 64)
Body embedding shape: (None, None, 64)
Combining features...
Model created successfully!
Model summary:
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape              Param #    Connected to
==================================================================================================
 title (InputLayer)          [(None, None)]            0          []
 body (InputLayer)           [(None, None)]            0          []
 embedding (Embedding)       (None, None, 64)          640000     ['title[0][0]']
 embedding_1 (Embedding)     (None, None, 64)          640000     ['body[0][0]']
 lstm (LSTM)                 (None, 128)               98816      ['embedding[0][0]']
 lstm_1 (LSTM)               (None, 32)                12416      ['embedding_1[0][0]']
 tags (InputLayer)           [(None, 12)]              0          []
 concatenate (Concatenate)   (None, 172)               0          ['lstm[0][0]',
                                                                   'lstm_1[0][0]',
                                                                   'tags[0][0]']
 priority (Dense)            (None, 1)                 173        ['concatenate[0][0]']
 department (Dense)          (None, 4)                 692        ['concatenate[0][0]']
==================================================================================================
Total params: 1,392,097
Trainable params: 1,392,097
Non-trainable params: 0
__________________________________________________________________________________________________
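The code so far only defines the architecture. As a sketch of how the same model could be compiled and trained, here is a hedged example using randomly generated dummy data; the loss functions, optimizer, sample counts, and sequence lengths below are assumptions for illustration, not part of the original example:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_tags, num_words, num_classes, embedding_dim = 12, 10000, 4, 64

# Rebuild the same multi-input, multi-output model as above
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")
title_features = layers.LSTM(128)(layers.Embedding(num_words, embedding_dim)(title_input))
body_features = layers.LSTM(32)(layers.Embedding(num_words, embedding_dim)(body_input))
combined = layers.concatenate([title_features, body_features, tags_input])
priority_pred = layers.Dense(1, name="priority")(combined)
department_pred = layers.Dense(num_classes, name="department")(combined)
model = keras.Model(inputs=[title_input, body_input, tags_input],
                    outputs=[priority_pred, department_pred])

# One loss per output; from_logits=True because the Dense heads
# have no activation function.
model.compile(
    optimizer="adam",
    loss={
        "priority": keras.losses.BinaryCrossentropy(from_logits=True),
        "department": keras.losses.CategoricalCrossentropy(from_logits=True),
    },
)

# Dummy data: 32 samples of random word indices and random targets
title_data = np.random.randint(num_words, size=(32, 10))
body_data = np.random.randint(num_words, size=(32, 100))
tags_data = np.random.randint(2, size=(32, num_tags)).astype("float32")
priority_targets = np.random.random(size=(32, 1))
dept_targets = np.random.randint(2, size=(32, num_classes)).astype("float32")

history = model.fit(
    {"title": title_data, "body": body_data, "tags": tags_data},
    {"priority": priority_targets, "department": dept_targets},
    epochs=1, batch_size=16, verbose=0,
)
print("training loss:", history.history["loss"][0])
```

With real data, the random arrays would be replaced by tokenized titles and bodies (word indices) and actual tag/label targets.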
How Embedding Works
The Embedding layer creates a lookup table that maps each word index to a dense vector. Key parameters include:
- input_dim: Size of vocabulary (10,000 unique words)
- output_dim: Dimension of embedding vectors (64 dimensions)
- input_length: Optional fixed length of input sequences (left unset here, allowing variable-length input)
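A minimal, standalone illustration of this lookup, where the batch of word indices is an arbitrary toy input:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Embedding layer: vocabulary of 10,000 words, 64-dimensional vectors
embedding = layers.Embedding(input_dim=10000, output_dim=64)

# A batch of 2 sequences, each 5 word indices long (arbitrary toy values)
word_indices = np.array([[12, 404, 7, 7, 999],
                         [3, 3, 58, 2047, 1]])
vectors = embedding(word_indices)

print(vectors.shape)  # (2, 5, 64): one 64-d vector per word index
# Identical indices map to identical vectors: it is a lookup table
print(bool(np.allclose(vectors[0, 2], vectors[0, 3])))  # both index 7
```

Note that the output gains one dimension: each scalar word index is replaced by a 64-dimensional vector.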
Processing Pipeline
| Step | Input | Output | Purpose |
|---|---|---|---|
| Text Input | Raw words | Integer sequences | Convert text to numbers |
| Embedding | Integer sequences | Dense vectors | Create semantic representations |
| LSTM | Sequence of vectors | Single vector | Capture sequence patterns |
| Dense Layer | Combined features | Predictions | Make final classifications |
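The four steps in this table can be traced end to end with a small NumPy sketch. The vocabulary and the random weights below are invented for illustration, and mean pooling stands in for the LSTM step (both reduce a sequence of vectors to a single vector):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: text -> word indices (toy vocabulary, an assumption of this sketch)
vocab = {"deep": 0, "learning": 1, "is": 2, "fun": 3}
sentence = "deep learning is fun".split()
indices = np.array([vocab[w] for w in sentence])

# Step 2: indices -> dense vectors via an embedding lookup table
embedding_matrix = rng.normal(size=(len(vocab), 8))  # 4 words x 8 dims
vectors = embedding_matrix[indices]                  # shape (4, 8)

# Step 3: sequence of vectors -> single vector (mean pooling stands in
# for the LSTM used in the article)
sentence_vector = vectors.mean(axis=0)               # shape (8,)

# Step 4: dense layer -> class scores for 4 classes
W = rng.normal(size=(8, 4))                          # random, untrained weights
scores = sentence_vector @ W
print("predicted class:", int(np.argmax(scores)))
```

Since the weights here are random rather than trained, the predicted class is meaningless; the point is how each step's output shape feeds the next.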
Conclusion
Text embedding transforms words into meaningful numerical vectors using Keras Embedding layers. The functional API allows combining multiple text inputs with LSTM layers to capture sequential patterns and make predictions on multi-input, multi-output models.
