
Gen AI on AWS - EC2
Amazon EC2 (Elastic Compute Cloud) is a general-purpose computing service that provides virtual machines for running a wide range of workloads. EC2 is an important building block for training, deploying, and running Generative AI models, which typically require high-performance computing (HPC) resources.
EC2 offers high computing power, scalability, flexibility, and cost-effectiveness, all of which are useful when training and deploying Generative AI models.
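As a quick illustration (a minimal sketch, assuming the AWS CLI is installed and configured), you can list GPU-backed instance types such as the g5 family to see the kind of accelerated capacity EC2 offers for Gen AI workloads −
# List GPU-backed instance types in the g5 family (assumes the AWS CLI is configured)
aws ec2 describe-instance-types \
    --filters "Name=instance-type,Values=g5.*" \
    --query "InstanceTypes[].InstanceType" \
    --output table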
Using AWS Elastic Inference with an EC2 Instance
AWS Elastic Inference lets us scale GPU-powered inference for Gen AI models without managing dedicated GPU servers or GPU instances.
AWS Elastic Inference allows us to attach just the amount of GPU acceleration we need to an EC2 instance or an Amazon SageMaker instance.
Implementation Example
In the following example, we will use AWS Elastic Inference with an EC2 instance and a pre-trained Generative AI model like GPT or GAN.
The prerequisites for implementing this example are as follows −
- An Elastic Inference Accelerator (attachable to EC2).
- A pre-trained Generative AI model (e.g., GAN, GPT) that you want to use for inference.
- AWS CLI and an Elastic Inference-enabled Deep Learning AMI for EC2 instances (a CLI configuration sketch follows this list).
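If the AWS CLI is not yet configured on your machine, you can set it up with your access keys and default region. A minimal sketch (the values shown are placeholders) −
# One-time AWS CLI configuration (the values below are placeholders)
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: us-east-1
# Default output format [None]: json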
Now, follow the steps given below −
Step 1: Set Up Elastic Inference with EC2
When you launch an EC2 instance for inference tasks, you will need to attach an Elastic Inference Accelerator. Let's see how we can do this −
To launch an EC2 instance with Elastic Inference −
- First, go to the EC2 console and click on Launch Instance.
- Choose an Elastic Inference-enabled AMI, for example, the AWS Deep Learning AMI.
- Next, select an instance type (e.g., t2.medium). Remember not to select a GPU instance, because you will attach an Elastic Inference accelerator instead.
- Finally, under Elastic Inference Accelerator, select an appropriate accelerator (e.g., eia2.medium, which provides moderate GPU power).
Attaching the Elastic Inference accelerator at launch time gives the instance the GPU power it needs for inference without the cost of a dedicated GPU instance.
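If you prefer the AWS CLI over the console, the following sketch shows an equivalent launch. It assumes the AWS CLI is configured and uses placeholder values for the AMI ID, key pair, subnet, and security group; the --elastic-inference-accelerators option attaches the accelerator at launch time −
# Launch a CPU instance with an eia2.medium Elastic Inference accelerator attached
# (the AMI ID, key pair, subnet, and security group below are placeholders)
aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type t2.medium \
    --key-name my-key-pair \
    --subnet-id subnet-xxxxxxxx \
    --security-group-ids sg-xxxxxxxx \
    --elastic-inference-accelerators Type=eia2.medium
Note that Elastic Inference requires the instance to run inside a VPC, and the subnet and security group must allow the instance to reach the Elastic Inference service endpoint.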
Step 2: Install Necessary Libraries
Once your EC2 instance with the Elastic Inference accelerator attached is up and running, install the following libraries −
# Update and install pip
sudo apt-get update
sudo apt-get install -y python3-pip

# Install torch, torchvision, and the AWS Elastic Inference Client
pip3 install torch torchvision
pip3 install awscli --upgrade
pip3 install elastic-inference
Step 3: Load a Pre-Trained Generative AI Model (e.g., GPT)
For this example, we will use a pre-trained GPT-2 model (Generative Pre-trained Transformer) from Hugging Face.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2 model and tokenizer from Hugging Face
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Move the model to a GPU if one is available
if torch.cuda.is_available():
    model.to('cuda')

# Set the model to evaluation mode for inference
model.eval()
The model is now loaded and ready to perform inference using Elastic Inference.
Step 4: Define a Function to Run Real-Time Inference
We define a function to generate text using the GPT-2 model.
def generate_text(prompt, max_length=50):
    # Tokenize the input prompt
    inputs = tokenizer.encode(prompt, return_tensors="pt")

    # Move the input to the GPU if one is available
    if torch.cuda.is_available():
        inputs = inputs.to('cuda')

    # Generate text using GPT-2
    with torch.no_grad():
        outputs = model.generate(inputs, max_length=max_length, num_return_sequences=1)

    # Decode and return the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text
Step 5: Testing the Model
Let us test the model by running inference. The function generates text based on a prompt and returns the result.
prompt = "In the future, artificial intelligence will" generated_text = generate_text(prompt) print("Generated Text:\n", generated_text)