How to deploy a model in Python using TensorFlow Serving?
Deploying machine learning models is crucial for making AI applications functional in production environments. TensorFlow Serving provides a robust, high-performance solution for serving trained models efficiently to handle real-time requests.
In this article, we will explore how to deploy a TensorFlow model using TensorFlow Serving, from installation to testing the deployed model.
What is TensorFlow Serving?
TensorFlow Serving is a flexible, high-performance serving system for machine learning models designed for production environments. It allows you to deploy new algorithms and experiments while keeping the same server architecture and APIs.
Installation and Setup
Installing TensorFlow Serving
Install the TensorFlow Serving API using pip:
pip install tensorflow-serving-api
Installing TensorFlow Serving via Docker
For a complete setup, use Docker to install TensorFlow Serving:
docker pull tensorflow/serving
Preparing and Saving Your Model
Before deployment, save your trained model in the SavedModel format that TensorFlow Serving understands:
import tensorflow as tf
from tensorflow import keras
import numpy as np
# Create a simple model for demonstration
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Generate sample training data
X_train = np.random.random((100, 4))
y_train = np.random.randint(0, 3, (100,))
# Train the model
model.fit(X_train, y_train, epochs=5, verbose=0)
# Save the model in SavedModel format
model_path = "./saved_model/1"
tf.saved_model.save(model, model_path)
print(f"Model saved to {model_path}")
Model saved to ./saved_model/1
Starting TensorFlow Serving
Start the TensorFlow Serving server using Docker, publishing both the gRPC port (8500) and the REST port (8501) so that both of the prediction examples below can connect:
docker run -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source=$(pwd)/saved_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving
Making Predictions via REST API
Once the server is running, you can make predictions using the REST API:
import requests
import json
import numpy as np
# Prepare sample input data
input_data = np.random.random((1, 4)).tolist()
# Create the request payload
data = {
    "signature_name": "serving_default",
    "instances": input_data
}
# Send POST request to TensorFlow Serving
url = "http://localhost:8501/v1/models/my_model:predict"
response = requests.post(url, json=data)
if response.status_code == 200:
    predictions = response.json()["predictions"]
    print("Prediction:", predictions[0])
else:
    print("Error:", response.status_code, response.text)
Making Predictions via gRPC
For better performance, use the gRPC protocol:
import grpc
import numpy as np
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import tensorflow as tf
# Create gRPC channel
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# Prepare request
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
# Add input data (the input tensor key depends on your model's layer names;
# inspect them with: saved_model_cli show --dir ./saved_model/1 --all)
input_data = np.random.random((1, 4)).astype(np.float32)
request.inputs['dense_input'].CopyFrom(
    tf.make_tensor_proto(input_data, shape=input_data.shape))
# Get prediction (the output tensor key also depends on your model's layer names)
response = stub.Predict(request, timeout=10.0)  # 10-second timeout
output = tf.make_ndarray(response.outputs['dense_1'])
print("Prediction:", output[0])
Model Versioning
TensorFlow Serving supports model versioning by organizing models in numbered directories:
saved_model/
├── 1/                  # Version 1
│   ├── saved_model.pb
│   └── variables/
└── 2/                  # Version 2
    ├── saved_model.pb
    └── variables/
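By default, TensorFlow Serving loads the highest-numbered version directory. The sketch below illustrates that numeric (not lexicographic) selection logic with a hypothetical helper, `latest_version`, run against a throwaway directory layout; it is for illustration only and is not part of TensorFlow Serving's API:

```python
import os
import tempfile

def latest_version(model_dir):
    """Return the highest numeric subdirectory, mimicking how
    TensorFlow Serving picks the default model version."""
    versions = [int(d) for d in os.listdir(model_dir) if d.isdigit()]
    if not versions:
        raise ValueError(f"No version directories found in {model_dir}")
    return max(versions)

# Demonstrate with a temporary directory containing three version folders
with tempfile.TemporaryDirectory() as root:
    for v in ("1", "2", "10"):
        os.makedirs(os.path.join(root, v))
    picked = latest_version(root)
    print(picked)  # → 10, because versions compare as integers, not strings
```

Note that string comparison would have chosen "2" over "10"; comparing as integers is what makes version 10 win.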
Comparison of Serving Methods
| Method | Protocol | Performance | Best For |
|---|---|---|---|
| REST API | HTTP | Good | Web applications, debugging |
| gRPC | gRPC | Excellent | High-throughput applications |
| TensorFlow Lite | N/A | Mobile-optimized | Mobile and edge devices |
Monitoring and Scaling
For production deployments, consider these strategies:
- Load Balancing: Use multiple TensorFlow Serving instances behind a load balancer
- Containerization: Deploy using Docker and Kubernetes for scalability
- Monitoring: Track metrics like latency, throughput, and error rates
- Health Checks: Implement health check endpoints for monitoring
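For health checks, TensorFlow Serving exposes a model status endpoint at GET /v1/models/<model_name> on the REST port. The sketch below parses its response shape; since it requires a live server, the sample payload here is constructed for illustration rather than captured from a real deployment:

```python
import json

# Sample of the response shape returned by TensorFlow Serving's status
# endpoint (GET http://localhost:8501/v1/models/my_model); constructed
# here for illustration, not captured from a live server.
SAMPLE_STATUS = json.dumps({
    "model_version_status": [
        {"version": "1", "state": "AVAILABLE",
         "status": {"error_code": "OK", "error_message": ""}}
    ]
})

def is_model_healthy(status_json):
    """Return True if at least one model version is in the AVAILABLE state."""
    statuses = json.loads(status_json).get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in statuses)

print(is_model_healthy(SAMPLE_STATUS))  # → True
```

Against a running server, you would fetch the payload with requests.get(f"http://localhost:8501/v1/models/my_model") and pass response.text to the same check.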
Conclusion
TensorFlow Serving provides a robust solution for deploying machine learning models in production. Use the REST API for web applications and gRPC for high-performance scenarios. Proper model versioning and monitoring ensure reliable production deployments.
