How to use Boto3 library in Python to run a Glue Job?
AWS Glue is a serverless ETL service that helps you prepare and transform data. You can trigger Glue jobs programmatically using the Boto3 library in Python, which provides access to AWS services through the AWS SDK.
Prerequisites
Before running a Glue job, ensure you have:
- AWS credentials configured (via AWS CLI, IAM roles, or environment variables)
- An existing AWS Glue job created in your AWS account
- Proper IAM permissions to execute Glue jobs
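The first two prerequisites can be checked from Python before starting anything. The following is a minimal sketch, not a definitive check: `preflight` is a hypothetical helper name, and it assumes you pass in clients created with `session.client('sts')` and `session.client('glue')`. It relies on the `get_caller_identity()` call of STS and the `get_job()` call of Glue; any credential or lookup error is simply allowed to propagate.

```python
def preflight(sts_client, glue_client, job_name):
    """Confirm that credentials resolve and that the Glue job exists.

    Hypothetical helper: both clients are passed in by the caller, so the
    function itself does not create a session.
    """
    # Fails with a credentials error if no AWS credentials are configured
    identity = sts_client.get_caller_identity()
    # Fails with a ClientError (EntityNotFoundException) if the job is missing
    job = glue_client.get_job(JobName=job_name)["Job"]
    return {
        "account": identity["Account"],
        "arn": identity["Arn"],
        "job_name": job["Name"],
    }
```

With real clients this might be called as `preflight(session.client('sts'), session.client('glue'), 'run_s3_file_job')`. It does not verify the permission to start a run; that only becomes visible when `start_job_run()` is called.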
Approach to Run a Glue Job
Step 1 − Import boto3 and botocore.exceptions to handle AWS service errors.
Step 2 − Define the job name (mandatory) and arguments (optional). Some jobs require specific arguments passed as a dictionary:
arguments = {'--argument1': 'value1', '--argument2': 'value2'}
Step 3 − Create an AWS session using boto3.session.Session(). Ensure your default profile includes the region name.
Step 4 − Create a Glue client using session.client('glue').
Step 5 − Use start_job_run() method with JobName and optional Arguments.
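Step 6 − The method returns a response containing the job run ID (JobRunId) and response metadata once the run has been started. The call is asynchronous: it does not wait for the job itself to finish.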
Example
Here's a complete example to run an AWS Glue job:
import boto3
from botocore.exceptions import ClientError


def run_glue_job(job_name, arguments=None):
    """
    Starts an AWS Glue job run

    Args:
        job_name (str): Name of the Glue job to run
        arguments (dict): Optional job arguments

    Returns:
        dict: Job run response with JobRunId
    """
    # Uses the default profile, which must include a region name
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # start_job_run only starts the run; it does not wait for completion
        response = glue_client.start_job_run(
            JobName=job_name,
            # None as the default avoids a shared mutable default argument
            Arguments=arguments or {}
        )
        return response
    except ClientError as e:
        raise Exception("boto3 client error in run_glue_job: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in run_glue_job: " + str(e)) from e


# Run the Glue job
result = run_glue_job("run_s3_file_job")
print(result)
Output
{'JobRunId': 'jr_5f8136286322ce5b7d0387e28df6742abc6f5e6892751431692ffd717f45fc00',
 'ResponseMetadata': {
     'RequestId': '36c48542-a060-468b-83cc-b067a540bc3c',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {
         'date': 'Sat, 13 Feb 2021 13:36:50 GMT',
         'content-type': 'application/x-amz-json-1.1',
         'content-length': '82',
         'connection': 'keep-alive',
         'x-amzn-requestid': '36c48542-a060-468b-83cc-b067a540bc3c'
     },
     'RetryAttempts': 0
 }
}
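The example above wraps every ClientError in a generic exception, which hides the AWS error code. A ClientError carries that code in `e.response['Error']['Code']`, and some codes documented for StartJobRun (for instance ConcurrentRunsExceededException) may be worth retrying. The sketch below illustrates one way to do that. `error_code`, `start_with_retry`, and the choice of retryable codes are assumptions made for illustration, and the broad `except Exception` is used only so the sketch does not depend on botocore being importable; in real code you would catch ClientError.

```python
import time

# Assumption: these StartJobRun error codes are treated as transient here
RETRYABLE_CODES = {
    "ConcurrentRunsExceededException",
    "OperationTimeoutException",
    "InternalServiceException",
}


def error_code(exc):
    """Return the AWS error code carried by a ClientError-like exception, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def start_with_retry(glue_client, job_name, attempts=3, delay_seconds=30):
    """Start a Glue job run, retrying only on codes listed in RETRYABLE_CODES."""
    for attempt in range(1, attempts + 1):
        try:
            return glue_client.start_job_run(JobName=job_name)["JobRunId"]
        except Exception as exc:  # ClientError in practice
            # Re-raise non-retryable errors and the final failed attempt
            if error_code(exc) not in RETRYABLE_CODES or attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

Errors such as EntityNotFoundException (the job name does not exist) are re-raised immediately, since retrying cannot fix them.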
Running a Job with Arguments
Some Glue jobs require specific parameters. Pass them using the Arguments parameter:
import boto3
from botocore.exceptions import ClientError


def run_glue_job_with_args():
    session = boto3.session.Session()
    glue_client = session.client('glue')
    # Argument names start with "--"; both names and values are strings
    job_arguments = {
        '--input_path': 's3://my-bucket/input/',
        '--output_path': 's3://my-bucket/output/',
        # Built-in Glue parameter; its name uses hyphens, not underscores
        '--job-bookmark-option': 'job-bookmark-enable'
    }
    try:
        response = glue_client.start_job_run(
            JobName="data_transformation_job",
            Arguments=job_arguments
        )
        print(f"Job started with ID: {response['JobRunId']}")
        return response['JobRunId']
    except ClientError as e:
        print(f"Error starting Glue job: {e}")
        return None


# Execute the function
job_id = run_glue_job_with_args()
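The Arguments parameter is a mapping of strings to strings, with each name prefixed by `--`. If your parameters start out as an ordinary Python dict with mixed value types, a small conversion step can help. This is a minimal sketch; `to_glue_arguments` is a hypothetical helper name, not part of Boto3.

```python
def to_glue_arguments(params):
    """Convert a plain dict into the string-to-string mapping used for Arguments.

    Hypothetical helper: adds the "--" prefix where it is missing and
    converts every value to a string.
    """
    arguments = {}
    for key, value in params.items():
        name = key if key.startswith("--") else "--" + key
        arguments[name] = str(value)
    return arguments


job_arguments = to_glue_arguments({
    "input_path": "s3://my-bucket/input/",
    "--output_path": "s3://my-bucket/output/",
    "batch_size": 500,
})
print(job_arguments)
```

The resulting dictionary can be passed directly as `Arguments=job_arguments` to `start_job_run()`.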
Key Points
- JobName is mandatory and must match an existing Glue job
- Arguments are optional and job-specific
- The function returns a JobRunId for tracking job status
- Always handle ClientError exceptions for AWS-specific errors
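Because `start_job_run()` returns as soon as the run is started, tracking the status means polling with the returned JobRunId. The sketch below shows one possible polling loop built on the Glue client's `get_job_run()` call. `wait_for_job_run`, the polling interval, and the poll limit are assumptions chosen for illustration, and the client is passed in by the caller.

```python
import time

# Job run states after which Glue will not change the state again
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def wait_for_job_run(glue_client, job_name, run_id, poll_seconds=30, max_polls=120):
    """Poll a Glue job run until it reaches a terminal state and return that state.

    Hypothetical helper: raises TimeoutError if the run is still in progress
    after max_polls checks.
    """
    for _ in range(max_polls):
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job run {run_id} did not finish after {max_polls} polls")
```

A caller would typically treat any returned state other than SUCCEEDED as a failure and inspect the job run's error message or logs.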
Conclusion
Using Boto3 to run AWS Glue jobs provides programmatic control over your ETL processes. The start_job_run() method starts the job asynchronously and returns a job run ID that you can use to monitor job progress and handle job execution in your data pipelines.
