How to deploy a model in Python using TensorFlow Serving?

Deploying machine learning models is crucial for making AI applications functional in production environments. TensorFlow Serving provides a robust, high-performance solution for serving trained models efficiently to handle real-time requests.

In this article, we will explore how to deploy a TensorFlow model using TensorFlow Serving, from installation to testing the deployed model.

What is TensorFlow Serving?

TensorFlow Serving is a flexible, high-performance serving system for machine learning models designed for production environments. It allows you to deploy new algorithms and experiments while keeping the same server architecture and APIs.

Installation and Setup

Installing TensorFlow Serving

Install the TensorFlow Serving API using pip:

pip install tensorflow-serving-api

Installing TensorFlow Serving via Docker

For a complete setup, pull the TensorFlow Serving image with Docker:

docker pull tensorflow/serving

Preparing and Saving Your Model

Before deployment, save your trained model in the SavedModel format that TensorFlow Serving understands:

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Create a simple model for demonstration
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    keras.layers.Dense(3, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Generate sample training data
X_train = np.random.random((100, 4))
y_train = np.random.randint(0, 3, (100,))

# Train the model
model.fit(X_train, y_train, epochs=5, verbose=0)

# Save the model in SavedModel format
model_path = "./saved_model/1"
tf.saved_model.save(model, model_path)
print(f"Model saved to {model_path}")
Running this prints only the final message, since verbose=0 suppresses the per-epoch training output:

Model saved to ./saved_model/1
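Before pointing the server at the export, it can be worth a quick sanity check: a valid SavedModel directory contains a saved_model.pb file and a variables/ subdirectory. A minimal helper (the function name is our own) might look like this:

```python
import os

def is_saved_model_dir(path):
    """Return True if `path` looks like a SavedModel export directory."""
    return (os.path.isfile(os.path.join(path, "saved_model.pb"))
            and os.path.isdir(os.path.join(path, "variables")))
```

For example, `is_saved_model_dir("./saved_model/1")` should return True after the export above succeeds.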

Starting TensorFlow Serving

Start the TensorFlow Serving server using Docker, publishing both the REST port (8501) and the gRPC port (8500, used in the gRPC example below):

docker run -p 8501:8501 -p 8500:8500 \
  --mount type=bind,source=$(pwd)/saved_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving

Making Predictions via REST API

Once the server is running, you can make predictions using the REST API:

import requests
import json
import numpy as np

# Prepare sample input data
input_data = np.random.random((1, 4)).tolist()

# Create the request payload
data = {
    "signature_name": "serving_default",
    "instances": input_data
}

# Send POST request to TensorFlow Serving
url = "http://localhost:8501/v1/models/my_model:predict"
response = requests.post(url, data=json.dumps(data))

if response.status_code == 200:
    predictions = response.json()["predictions"]
    print("Prediction:", predictions[0])
else:
    print("Error:", response.status_code, response.text)
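Besides the :predict endpoint, TensorFlow Serving's REST API exposes a model-status endpoint (GET /v1/models/{model}, optionally /versions/{n}) that is useful for debugging and health checks. A small URL builder, assuming the default host and port from the Docker command above (the helper function itself is our own sketch):

```python
def model_status_url(model_name, host="localhost:8501", version=None):
    """Build the TensorFlow Serving REST model-status URL."""
    base = f"http://{host}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return base
```

With the server running, `requests.get(model_status_url("my_model")).json()` returns the state of each loaded version (e.g. AVAILABLE).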

Making Predictions via gRPC

For better performance, use the gRPC protocol:

import grpc
import numpy as np
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import tensorflow as tf

# Create gRPC channel
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Prepare request
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'

# Add input data (the input key depends on your model's signature;
# inspect it with: saved_model_cli show --dir saved_model/1 --all)
input_data = np.random.random((1, 4)).astype(np.float32)
request.inputs['dense_input'].CopyFrom(
    tf.make_tensor_proto(input_data, shape=input_data.shape))

# Get prediction (the output key also depends on the model's signature)
response = stub.Predict(request, 10.0)  # 10-second timeout
output = tf.make_ndarray(response.outputs['dense_1'])
print("Prediction:", output[0])

Model Versioning

TensorFlow Serving supports model versioning by organizing models in numbered directories:

saved_model/
├── 1/          # Version 1
│   ├── saved_model.pb
│   └── variables/
└── 2/          # Version 2
    ├── saved_model.pb
    └── variables/
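By default the server automatically loads and serves the highest-numbered version it finds, so deploying a new model is just a matter of exporting into the next numbered directory. A helper that computes that path (our own sketch) could be:

```python
import os

def next_version_dir(base_dir):
    """Return the path for the next numeric model version under base_dir."""
    if os.path.isdir(base_dir):
        versions = [int(d) for d in os.listdir(base_dir) if d.isdigit()]
    else:
        versions = []
    return os.path.join(base_dir, str(max(versions, default=0) + 1))
```

For example, with versions 1/ and 2/ already present, `next_version_dir("./saved_model")` yields "./saved_model/3", which can be passed to tf.saved_model.save().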

Comparison of Serving Methods

Method            Protocol   Performance        Best For
REST API          HTTP       Good               Web applications, debugging
gRPC              gRPC       Excellent          High-throughput applications
TensorFlow Lite   N/A        Mobile-optimized   Mobile and edge devices

Monitoring and Scaling

For production deployments, consider these strategies:

  • Load Balancing: Use multiple TensorFlow Serving instances behind a load balancer
  • Containerization: Deploy using Docker and Kubernetes for scalability
  • Monitoring: Track metrics like latency, throughput, and error rates
  • Health Checks: Implement health check endpoints for monitoring
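Request batching is another useful scaling lever: TensorFlow Serving can group concurrent requests into a single batch when started with --enable_batching and a batching parameters file. A minimal configuration sketch (the values here are illustrative and should be tuned to your workload):

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```

The file is passed to the server with --batching_parameters_file, alongside --enable_batching.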

Conclusion

TensorFlow Serving provides a robust solution for deploying machine learning models in production. Use REST API for web applications and gRPC for high-performance scenarios. Proper model versioning and monitoring ensure reliable production deployments.

Updated on: 2026-03-27T07:30:08+05:30
