Amazon SageMaker - Training ML Models



You can train machine learning models easily by using Amazon SageMaker's fully managed training service.

To train an ML model, you can either use Amazon SageMaker's built-in algorithms or bring your own model. In both cases, Amazon SageMaker lets you run training jobs efficiently.

How to Train Models Using Amazon SageMaker?

Let's understand how you can train models using Amazon SageMaker with the help of the following Python program −

Step 1: Prepare Your Data

First, prepare your data and store it in Amazon S3 in CSV format or any other suitable format. Amazon SageMaker reads data from S3 for training jobs.
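For instance, if the CSV files are on your local machine, you can upload them to S3 with the SageMaker session's upload_data helper. The bucket name, key prefixes, and file names below are placeholders used only for illustration −

import sagemaker

# Create a SageMaker session (uses your configured AWS credentials)
session = sagemaker.Session()

# Upload local CSV files to S3; the return values are the S3 URIs
# "your-bucket" and the file names are placeholders
train_s3_uri = session.upload_data(
    path="train.csv", bucket="your-bucket", key_prefix="train"
)
validation_s3_uri = session.upload_data(
    path="validation.csv", bucket="your-bucket", key_prefix="validation"
)

print(train_s3_uri)        # e.g. s3://your-bucket/train/train.csv
print(validation_s3_uri)   # e.g. s3://your-bucket/validation/validation.csv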

Step 2: Define the Estimator

Now, you need to define the estimator. You can use the Estimator object to configure the training job. For this example, we'll train a model using the built-in XGBoost algorithm as follows −

import sagemaker
from sagemaker import get_execution_role
from sagemaker.inputs import TrainingInput

# Define your Amazon SageMaker session and role
session = sagemaker.Session()
role = get_execution_role()

# Retrieve the built-in XGBoost container image for your region
xgboost_image = sagemaker.image_uris.retrieve(
    "xgboost", session.boto_region_name, version="1.5-1"
)

# Define the XGBoost estimator
xgboost = sagemaker.estimator.Estimator(
    image_uri=xgboost_image,
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://your-bucket/output",
    sagemaker_session=session,
)

# Set hyperparameters
xgboost.set_hyperparameters(objective="binary:logistic", num_round=100)

Step 3: Specify Training Data

Next, specify the training data. You can use the TrainingInput class to point to the location of your data in S3 as follows −

# Specify training and validation data in S3
train_input = TrainingInput(
    s3_data="s3://your-bucket/train", content_type="csv"
)
validation_input = TrainingInput(
    s3_data="s3://your-bucket/validation", content_type="csv"
)

Step 4: Train the Model

Finally, start the training job by calling the fit method as follows −

# Train the model
xgboost.fit({"train": train_input, "validation": validation_input})

When you call fit, Amazon SageMaker automatically provisions the requested resources, runs the training job, and saves the model artifacts to the specified S3 output location.
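Once the job completes, you can, for example, print the S3 location of the trained model artifact directly from the estimator. This is a small illustrative check using the xgboost estimator defined above −

# S3 URI of the trained model artifact (model.tar.gz) in the output path
print(xgboost.model_data)

# Status of the most recent training job run by this estimator
print(xgboost.latest_training_job.describe()["TrainingJobStatus"])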

Distributed Training with Amazon SageMaker

Amazon SageMaker supports distributed training, which enables you to scale training across multiple instances. This is useful when you are working with large datasets or deep learning models. Amazon SageMaker provides framework containers such as TensorFlow and PyTorch that support distributed training.

To enable distributed training, increase the instance_count parameter in the Estimator object. For framework estimators such as TensorFlow and PyTorch, you can also pass a distribution configuration that tells SageMaker how to coordinate the instances.

Example

Given below is an example using TensorFlow −

from sagemaker.tensorflow import TensorFlow

# Define the TensorFlow estimator with distributed training
tensorflow_estimator = TensorFlow(
    entry_point="train.py",
    role=role,
    instance_count=2,
    instance_type="ml.p3.2xlarge",
    framework_version="2.3",
    py_version="py37",
    # Distribute training across the instances using parameter servers
    distribution={"parameter_server": {"enabled": True}},
)

# Train the model on multiple instances
tensorflow_estimator.fit({"train": train_input, "validation": validation_input})

In this example, Amazon SageMaker uses two ml.p3.2xlarge instances for distributed training, which can significantly reduce training time for large models.
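The entry_point script train.py contains your own training code. The sketch below is a minimal illustration only: it assumes one CSV file per channel with the label in the first column, and it uses the SM_CHANNEL_TRAIN, SM_CHANNEL_VALIDATION, and SM_MODEL_DIR environment variables that SageMaker sets inside the training container −

# train.py - minimal illustrative sketch, not a production script
import os
import pandas as pd
import tensorflow as tf

# SageMaker mounts each input channel and sets these environment variables
train_dir = os.environ["SM_CHANNEL_TRAIN"]
validation_dir = os.environ["SM_CHANNEL_VALIDATION"]
model_dir = os.environ["SM_MODEL_DIR"]

# Assumption: one CSV per channel, label in the first column
train_df = pd.read_csv(os.path.join(train_dir, "train.csv"), header=None)
val_df = pd.read_csv(os.path.join(validation_dir, "validation.csv"), header=None)

x_train, y_train = train_df.iloc[:, 1:].values, train_df.iloc[:, 0].values
x_val, y_val = val_df.iloc[:, 1:].values, val_df.iloc[:, 0].values

# A small binary classifier, just to show the training loop
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)

# Save the model where SageMaker expects it so it is uploaded to S3
model.save(os.path.join(model_dir, "1"))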
