How to use Boto3 library in Python to run a Glue Job?

AWS Glue is a serverless ETL service that helps you prepare and transform data. You can trigger Glue jobs programmatically using the Boto3 library in Python, which provides access to AWS services through the AWS SDK.

Prerequisites

Before running a Glue job, ensure you have:

AWS credentials configured (via AWS CLI, IAM roles, or environment variables)
An existing AWS Glue job created in your AWS account
IAM permissions to start Glue job runs (for example, the glue:StartJobRun action)
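The second prerequisite can be checked from code before starting a run. The following is a minimal sketch, assuming you already have a Glue client (for example, one created with boto3.session.Session().client('glue')); the helper name glue_job_exists is hypothetical and not part of Boto3.

```python
def glue_job_exists(glue_client, job_name):
    """Return True if a Glue job named job_name exists, False otherwise.

    glue_client is assumed to be a Boto3 Glue client created elsewhere.
    """
    try:
        # get_job raises EntityNotFoundException when the job is missing
        glue_client.get_job(JobName=job_name)
        return True
    except glue_client.exceptions.EntityNotFoundException:
        return False
```

Other failures, such as missing credentials or insufficient permissions, are deliberately not caught here, so they surface to the caller instead of being reported as a missing job.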

Approach to Run a Glue Job

Step 1 − Import boto3 and botocore.exceptions to handle AWS service errors.

Step 2 − Define the job name (mandatory) and arguments (optional). Some jobs require specific arguments, passed as a dictionary whose keys start with -- and whose values are strings:

arguments = {'--argument1': 'value1', '--argument2': 'value2'}
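If your parameters start out as an ordinary dictionary, a small helper can put them into this shape. This is only a sketch: the function name to_glue_arguments is hypothetical, and it assumes that converting every value with str() is acceptable for your job.

```python
def to_glue_arguments(params):
    """Convert a plain dict into the '--key': 'value' form used for Glue arguments."""
    arguments = {}
    for key, value in params.items():
        # Keep keys that already carry the -- prefix; add it otherwise
        name = key if key.startswith("--") else "--" + key
        # Glue job arguments are passed as strings, so convert other types
        arguments[name] = str(value)
    return arguments


print(to_glue_arguments({"argument1": "value1", "--argument2": 2}))
# prints {'--argument1': 'value1', '--argument2': '2'}
```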

Step 3 − Create an AWS session using boto3.session.Session(). Ensure your default profile includes the region name.

Step 4 − Create a Glue client using session.client('glue').

Step 5 − Call the start_job_run() method with JobName and, optionally, Arguments.

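Step 6 − On success, the method returns a response containing the job run ID (JobRunId) and request metadata. The call returns as soon as the run has been started; it does not wait for the job to finish.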

Example

Here's a complete example to run an AWS Glue job:

import boto3
from botocore.exceptions import ClientError

def run_glue_job(job_name, arguments=None):
    """
    Starts an AWS Glue job run

    Args:
        job_name (str): Name of the Glue job to run
        arguments (dict): Optional job arguments

    Returns:
        dict: Job run response with JobRunId
    """
    session = boto3.session.Session()
    glue_client = session.client('glue')

    try:
        # Default is None (not {}) so calls never share one mutable dict
        response = glue_client.start_job_run(
            JobName=job_name,
            Arguments=arguments or {}
        )
        return response
    except ClientError as e:
        # Chain the original error so the AWS error details stay in the traceback
        raise RuntimeError("boto3 client error in run_glue_job: " + str(e)) from e

# Run the Glue job
result = run_glue_job("run_s3_file_job")
print(result)

Output

{'JobRunId': 'jr_5f8136286322ce5b7d0387e28df6742abc6f5e6892751431692ffd717f45fc00',
 'ResponseMetadata': {
     'RequestId': '36c48542-a060-468b-83cc-b067a540bc3c',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {
         'date': 'Sat, 13 Feb 2021 13:36:50 GMT',
         'content-type': 'application/x-amz-json-1.1',
         'content-length': '82',
         'connection': 'keep-alive',
         'x-amzn-requestid': '36c48542-a060-468b-83cc-b067a540bc3c'
     },
     'RetryAttempts': 0
 }
}
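In most pipelines only the JobRunId from this response is needed. The sketch below pulls it out and checks the HTTP status first; the helper name extract_job_run_id is hypothetical, and the sample dictionary is a trimmed copy of the output above.

```python
def extract_job_run_id(response):
    """Return the JobRunId from a start_job_run response dictionary."""
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    if status != 200:
        raise RuntimeError(f"Unexpected HTTP status from start_job_run: {status}")
    return response["JobRunId"]


# Trimmed copy of the sample output shown above
sample_response = {
    "JobRunId": "jr_5f8136286322ce5b7d0387e28df6742abc6f5e6892751431692ffd717f45fc00",
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 0},
}
print(extract_job_run_id(sample_response))
```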

Running a Job with Arguments

Some Glue jobs require specific parameters. Pass them using the Arguments parameter:

import boto3
from botocore.exceptions import ClientError

def run_glue_job_with_args():
    session = boto3.session.Session()
    glue_client = session.client('glue')
    
    job_arguments = {
        '--input_path': 's3://my-bucket/input/',
        '--output_path': 's3://my-bucket/output/',
        '--job-bookmark-option': 'job-bookmark-enable'
    }
    
    try:
        response = glue_client.start_job_run(
            JobName="data_transformation_job",
            Arguments=job_arguments
        )
        print(f"Job started with ID: {response['JobRunId']}")
        return response['JobRunId']
    except ClientError as e:
        print(f"Error starting Glue job: {e}")
        return None

# Execute the function
job_id = run_glue_job_with_args()
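One AWS-specific error worth handling separately is ConcurrentRunsExceededException, which Glue raises when the job already has its maximum number of concurrent runs. The following is a rough sketch of a retry wrapper, assuming a Glue client created as in the examples above; the function name, the attempt count, and the delay are illustrative choices, not recommendations from AWS.

```python
import time


def start_job_run_with_retry(glue_client, job_name, arguments=None,
                             max_attempts=3, delay_seconds=30):
    """Start a Glue job run, retrying while the job is at its concurrency limit."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = glue_client.start_job_run(
                JobName=job_name,
                Arguments=arguments or {}
            )
            return response["JobRunId"]
        except glue_client.exceptions.ConcurrentRunsExceededException:
            # Give up after the last attempt and let the caller see the error
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

A fixed delay keeps the sketch short; a real pipeline might prefer exponential backoff or a scheduler-level retry instead.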

Key Points

JobName is mandatory and must match an existing Glue job
Arguments are optional and job-specific
start_job_run() returns a JobRunId as soon as the run is started; use it to track the run's status
Always handle ClientError exceptions for AWS-specific errors
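To track a run with its JobRunId, you can poll get_job_run() until the run reaches a final state. This is a minimal sketch, assuming a Glue client created as in the examples above; the function name, the polling interval, and the set of states treated as final are assumptions of this sketch, so check the Glue documentation for the full list of run states.

```python
import time

# States this sketch treats as final (an assumption; see the Glue docs)
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job_run(glue_client, job_name, run_id,
                     poll_seconds=30, max_polls=120):
    """Poll a Glue job run until it finishes and return its final state."""
    for _ in range(max_polls):
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        # Still starting or running: wait before asking again
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job run {run_id} did not finish after {max_polls} polls")
```

The max_polls limit is there so that a stuck run cannot block the calling script forever.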

Conclusion

Using Boto3 to run AWS Glue jobs provides programmatic control over your ETL processes. The start_job_run() method returns a job run ID that you can use to monitor the run's progress and to coordinate job execution in your data pipelines.

Ashish Anand

Updated on: 2026-03-25T18:12:59+05:30
