
Gen AI on AWS - EC2
Amazon EC2 (Elastic Compute Cloud) is a general-purpose computing service that provides virtual machines for running a wide range of workloads. EC2 is an important building block for training, deploying, and running Generative AI models, which typically require high-performance computing (HPC) resources.
EC2 offers high computing power, scalability, flexibility, and cost-effectiveness, all of which are useful when training and deploying Generative AI models.
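As a quick illustration (a minimal sketch, assuming the AWS CLI is installed and configured), you can list GPU-backed instance types such as the g5 family to see the kind of accelerated capacity EC2 offers for Gen AI workloads −
# List GPU-backed instance types in the g5 family (assumes the AWS CLI is configured)
aws ec2 describe-instance-types \
    --filters "Name=instance-type,Values=g5.*" \
    --query "InstanceTypes[].InstanceType" \
    --output table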
Using AWS Elastic Inference with an EC2 Instance
AWS Elastic Inference lets us scale GPU-powered inference for Gen AI models without managing dedicated GPU servers or GPU instances.
AWS Elastic Inference allows us to attach just the amount of GPU acceleration we need to an EC2 instance or an Amazon SageMaker instance.
Implementation Example
In the following example, we will use AWS Elastic Inference with an EC2 instance and a pre-trained Generative AI model like GPT or GAN.
The prerequisites for implementing this example are as follows −
- An Elastic Inference Accelerator (attachable to EC2).
- A pre-trained Generative AI model (e.g., GAN, GPT) that you want to use for inference.
- AWS CLI and an Elastic Inference-enabled Deep Learning AMI for EC2 instances (a CLI configuration sketch follows this list).
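If the AWS CLI is not yet configured on your machine, you can set it up with your access keys and default region. A minimal sketch (the values shown are placeholders) −
# One-time AWS CLI configuration (the values below are placeholders)
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: us-east-1
# Default output format [None]: json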
Now, follow the steps given below −
Step 1: Set Up Elastic Inference with EC2
When you launch an EC2 instance for inference tasks, you will need to attach an Elastic Inference Accelerator. Let's see how we can do this −
To launch an EC2 instance with Elastic Inference −
- First, go to the EC2 console and click on Launch Instance.
- Choose an Elastic Inference-enabled AMI, for example, the AWS Deep Learning AMI.
- Next, select an instance type (e.g., t2.medium). Remember not to select a GPU instance, because you will attach an Elastic Inference accelerator instead.
- Finally, under Elastic Inference Accelerator, select an appropriate accelerator (e.g., eia2.medium, which provides moderate GPU power).
Attaching the Elastic Inference accelerator at launch time gives the instance the GPU power it needs for inference without the cost of a dedicated GPU instance.
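If you prefer the AWS CLI over the console, the following sketch shows an equivalent launch. It assumes the AWS CLI is configured and uses placeholder values for the AMI ID, key pair, subnet, and security group; the --elastic-inference-accelerators option attaches the accelerator at launch time −
# Launch a CPU instance with an eia2.medium Elastic Inference accelerator attached
# (the AMI ID, key pair, subnet, and security group below are placeholders)
aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type t2.medium \
    --key-name my-key-pair \
    --subnet-id subnet-xxxxxxxx \
    --security-group-ids sg-xxxxxxxx \
    --elastic-inference-accelerators Type=eia2.medium
Note that Elastic Inference requires the instance to run inside a VPC, and the subnet and security group must allow the instance to reach the Elastic Inference service endpoint.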
Step 2: Install Necessary Libraries
Once your EC2 instance with the Elastic Inference accelerator attached is up and running, install the following libraries −
# Update and install pip
sudo apt-get update
sudo apt-get install -y python3-pip

# Install torch, torchvision, and the AWS Elastic Inference Client
pip3 install torch torchvision
pip3 install awscli --upgrade
pip3 install elastic-inference
Step 3: Load a Pre-Trained Generative AI Model (e.g., GPT)
For this example, we will use a pre-trained GPT-2 model (Generative Pre-trained Transformer) from Hugging Face.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2 model and tokenizer from Hugging Face
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Move the model to a GPU if one is available
if torch.cuda.is_available():
    model.to('cuda')

# Set the model to evaluation mode for inference
model.eval()
The model is now loaded and ready to perform inference using Elastic Inference.
Step 4: Define a Function to Run Real-Time Inference
We define a function to generate text using the GPT-2 model.
def generate_text(prompt, max_length=50):
    # Tokenize the input prompt
    inputs = tokenizer.encode(prompt, return_tensors="pt")

    # Move the input to the GPU if one is available
    if torch.cuda.is_available():
        inputs = inputs.to('cuda')

    # Generate text using GPT-2
    with torch.no_grad():
        outputs = model.generate(inputs, max_length=max_length, num_return_sequences=1)

    # Decode and return the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text
Step 5: Testing the Model
Let us test the model by running inference. The function generates text based on a prompt and returns the result.
prompt = "In the future, artificial intelligence will" generated_text = generate_text(prompt) print("Generated Text:\n", generated_text)