How to use Boto3 to paginate through the job runs of a job present in AWS Glue
In this article, we will see how to paginate through all the job runs of a job present in AWS Glue using the boto3 library. This is useful when dealing with jobs that have many runs and you need to retrieve them efficiently in smaller chunks.
Understanding Pagination Parameters
The pagination function accepts several optional parameters along with the required JobName:
max_items - Total number of records to return. If more records are available, a NextToken will be provided for continuation.
page_size - Number of records per page.
starting_token - Token from a previous response to continue pagination.
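As a minimal sketch of how these three options map onto the PaginationConfig dictionary that is passed to paginate, the helper below keeps only the options that were actually set. The name build_pagination_config is hypothetical and is not part of boto3; it is only an illustration.

```python
def build_pagination_config(max_items=None, page_size=None, starting_token=None):
    """Build a PaginationConfig dict, omitting options left as None.

    Hypothetical helper for illustration; it is not part of boto3.
    """
    options = {
        'MaxItems': max_items,            # cap on the total records returned
        'PageSize': page_size,            # records requested per API call
        'StartingToken': starting_token,  # token from a previous response
    }
    # Drop unset options so only explicit choices reach the paginator.
    return {key: value for key, value in options.items() if value is not None}
```

The result could then be passed as paginator.paginate(JobName="glue_test_job", PaginationConfig=build_pagination_config(max_items=10)).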
Step-by-Step Implementation
Step 1: Import Required Libraries
Import boto3 and the botocore exception class used to handle AWS service errors:
import boto3
from botocore.exceptions import ClientError
Step 2: Create AWS Session and Client
Set up the AWS session and create a Glue client:
session = boto3.session.Session()
glue_client = session.client('glue')
Step 3: Create Paginator Object
Use the get_paginator method to create a paginator for job runs:
paginator = glue_client.get_paginator('get_job_runs')
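Once the paginator exists, each page it produces is a dictionary whose runs sit under the JobRuns key. The sketch below flattens those pages into a stream of individual job runs. The function name iter_job_runs is an assumption made for illustration, and the paginator argument is expected to be the object created in Step 3.

```python
def iter_job_runs(paginator, job_name, page_size=None):
    """Yield individual job-run dicts from every page, one at a time.

    Illustrative helper (not part of boto3); `paginator` is assumed to be
    the object returned by glue_client.get_paginator('get_job_runs').
    """
    config = {'PageSize': page_size} if page_size is not None else {}
    for page in paginator.paginate(JobName=job_name, PaginationConfig=config):
        # Each page is a dict; the runs live under the 'JobRuns' key.
        for run in page.get('JobRuns', []):
            yield run
```

Because this is a generator, only one page of results needs to be held in memory at a time, for example: for run in iter_job_runs(paginator, "glue_test_job", page_size=50): print(run['Id']).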
Complete Example
Here's the complete implementation to paginate through job runs:
import boto3
from botocore.exceptions import ClientError


def paginate_through_jobruns(job_name, max_items=None, page_size=None, starting_token=None):
    """Yield pages of job runs for the given AWS Glue job."""
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        paginator = glue_client.get_paginator('get_job_runs')
        response = paginator.paginate(
            JobName=job_name,
            PaginationConfig={
                'MaxItems': max_items,
                'PageSize': page_size,
                'StartingToken': starting_token
            }
        )
        # The paginator is lazy: API calls happen while iterating, so the
        # iteration stays inside the try block to catch service errors.
        yield from response
    except ClientError as e:
        raise Exception("boto3 client error in paginate_through_jobruns: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in paginate_through_jobruns: " + str(e)) from e


# Example usage: return at most 1 job run, requesting up to 5 per API call
response = paginate_through_jobruns("glue_test_job", max_items=1, page_size=5)
for page in response:
    print(page)
Sample Output
Each page of the paginated response is a dictionary containing job run details:
{
    'JobRuns': [
        {
            'Id': 'jr_435b66cfe451adf5fa7c7f914be3c87d199616f52bd13bdd91bb1269f02db705',
            'Attempt': 0,
            'JobName': 'glue_test_job',
            'StartedOn': datetime.datetime(2021, 1, 25, 22, 19, 56, 52000, tzinfo=tzlocal()),
            'LastModifiedOn': datetime.datetime(2021, 1, 25, 22, 21, 50, 603000, tzinfo=tzlocal()),
            'CompletedOn': datetime.datetime(2021, 1, 25, 22, 21, 50, 603000, tzinfo=tzlocal()),
            'JobRunState': 'SUCCEEDED',
            'Arguments': {
                '--additional-python-modules': 'pandas==1.1.5',
                '--enable-glue-datacatalog': 'true',
                '--job-bookmark-option': 'job-bookmark-disable'
            },
            'AllocatedCapacity': 2,
            'ExecutionTime': 107,
            'MaxCapacity': 2.0,
            'WorkerType': 'G.1X',
            'NumberOfWorkers': 2,
            'GlueVersion': '2.0'
        }
    ],
    'NextToken': 'eyJleHBpcmF0aW9uIjp7InNlY29uZHMiOjE2MTc0NTQ0NDgsIm5hbm9zIjo2OTUwMDAwMDB9...',
    'ResponseMetadata': {
        'RequestId': '1874370e-***********-40d',
        'HTTPStatusCode': 200
    }
}
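Pages shaped like the one above can be reduced to a quick summary. The sketch below counts runs per JobRunState and adds up ExecutionTime (reported by Glue in seconds) across any iterable of pages. The function name summarize_job_runs is hypothetical, and the 'UNKNOWN' fallback label is an assumption for runs that lack a state field.

```python
from collections import Counter


def summarize_job_runs(pages):
    """Count runs per state and total the execution time across pages.

    Illustrative helper; `pages` is any iterable of page dicts that
    carry a 'JobRuns' list, such as the pages yielded by a paginator.
    """
    states = Counter()
    total_seconds = 0
    for page in pages:
        for run in page.get('JobRuns', []):
            # 'UNKNOWN' is an assumed label for runs without a state.
            states[run.get('JobRunState', 'UNKNOWN')] += 1
            total_seconds += run.get('ExecutionTime', 0)
    return dict(states), total_seconds
```

For instance, states, seconds = summarize_job_runs(paginate_through_jobruns("glue_test_job")) would give a per-state count and the total run time for the job.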
Key Points
Use NextToken from the response to continue pagination for subsequent requests.
The response includes job run details like state, execution time, and worker configuration.
Proper error handling ensures graceful failure in case of AWS service issues.
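One way to continue from a token is sketched below. It relies on build_full_result(), which merges the pages of a single paginate call and includes a NextToken key only when MaxItems cut the results short. The function names fetch_batch and fetch_all_in_batches are assumptions made for illustration, and the paginator argument is expected to come from glue_client.get_paginator('get_job_runs').

```python
def fetch_batch(paginator, job_name, batch_size, starting_token=None):
    """Return (job_runs, next_token) for one batch of at most batch_size runs.

    Illustrative helper, not part of boto3.
    """
    config = {'MaxItems': batch_size}
    if starting_token is not None:
        config['StartingToken'] = starting_token
    result = paginator.paginate(
        JobName=job_name, PaginationConfig=config
    ).build_full_result()
    # build_full_result() adds 'NextToken' only when more records remain.
    return result.get('JobRuns', []), result.get('NextToken')


def fetch_all_in_batches(paginator, job_name, batch_size):
    """Collect every run by calling fetch_batch until no token is returned."""
    runs, token = fetch_batch(paginator, job_name, batch_size)
    while token is not None:
        more, token = fetch_batch(paginator, job_name, batch_size, token)
        runs.extend(more)
    return runs
```

In practice the token returned by fetch_batch could be stored between invocations, so that a later call resumes where the previous one stopped instead of starting over.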
Conclusion
Boto3 pagination helps retrieve AWS Glue job runs efficiently in manageable chunks. Use the get_paginator method with an appropriate PaginationConfig to handle large result sets without loading every record into memory at once.
