How to use Boto3 to paginate through all jobs present in AWS Glue
In this article, we will see how to paginate through all jobs present in AWS Glue.
Example
Problem Statement: Use the boto3 library in Python to paginate through the AWS Glue jobs created in your account.
Approach/Algorithm to solve this problem
Step 1: Import boto3, and import ClientError from botocore.exceptions to handle exceptions.
Step 2: The function takes three optional parameters: max_items, page_size and starting_token.
max_items denotes the total number of records to return. If more records are available than max_items, a NextToken is provided in the response so that pagination can be resumed.
page_size denotes the number of records requested per page.
starting_token resumes pagination; pass it the NextToken from a previous response.
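To make the interaction of these three settings concrete, here is a toy, offline model in plain Python. It is not botocore's implementation: it slices an in-memory list, and it uses an integer offset as a hypothetical stand-in for the opaque string token that AWS actually returns. The name simulate_pagination is illustrative only.

```python
from typing import List, Optional, Tuple


def simulate_pagination(records: List[str],
                        max_items: Optional[int] = None,
                        page_size: Optional[int] = None,
                        starting_token: Optional[int] = None
                        ) -> Tuple[List[List[str]], Optional[int]]:
    """Toy model of MaxItems / PageSize / StartingToken (not botocore's code).

    Returns the pages that would be yielded, plus an offset to resume from,
    or None when every record has been returned.
    """
    start = starting_token or 0            # resume point (hypothetical int token)
    remaining = records[start:]
    # MaxItems caps the total across all pages; None means "no cap"
    limit = len(remaining) if max_items is None else min(max_items, len(remaining))
    # PageSize only controls how the capped records are split into pages
    size = page_size or limit or 1
    taken = remaining[:limit]
    pages = [taken[i:i + size] for i in range(0, len(taken), size)]
    # A resume token is only needed if records are left over
    next_token = start + limit if start + limit < len(records) else None
    return pages, next_token
```

With max_items=2 and page_size=5, the model yields a single page of two records plus a resume token, which is the same shape as the output shown at the end of this article: PageSize sets the request size, while MaxItems decides when to stop.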
Step 3: Create an AWS session using the boto3 library. Make sure region_name is set in the default profile; if it is not, pass region_name explicitly when creating the session.
Step 4: Create an AWS client for Glue.
Step 5: Create a paginator object for get_jobs, which returns the details of all jobs.
Step 6: Call the paginate function and pass max_items, page_size and starting_token in the PaginationConfig parameter.
Step 7: The paginator returns records according to max_items and page_size.
Step 8: Handle ClientError, as well as any other exception, if something goes wrong while paginating.
Example Code
Use the following code to paginate through the jobs created in your account:
```python
from typing import Optional

import boto3
from botocore.exceptions import ClientError


def paginate_through_jobs(max_items: Optional[int] = None,
                          page_size: Optional[int] = None,
                          starting_token: Optional[str] = None):
    # Uses the region from the default profile; pass region_name=... otherwise
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        paginator = glue_client.get_paginator('get_jobs')
        page_iterator = paginator.paginate(PaginationConfig={
            'MaxItems': max_items,
            'PageSize': page_size,
            'StartingToken': starting_token})
        # paginate() is lazy: the API calls happen while iterating, so the
        # pages are collected here to let the except blocks catch any errors
        return list(page_iterator)
    except ClientError as e:
        raise Exception("boto3 client error in paginate_through_jobs: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in paginate_through_jobs: " + str(e)) from e


a = paginate_through_jobs(2, 5)
print(*a)
```
Output
{'Jobs': [{'Name': 'PythonShellTest1', 'Role': 'arn:aws:iam::7***********:role/dev-edl-glue-role', 'CreatedOn': datetime.datetime(2021, 1, 6, 19, 59, 19, 387000, tzinfo=tzlocal()), 'LastModifiedOn': datetime.datetime(2021, 2, 9, 21, 47, 31, 614000, tzinfo=tzlocal()), 'ExecutionProperty': {'MaxConcurrentRuns': 1}, 'Command': {'Name': 'pythonshell', 'ScriptLocation': 's3://pythonShellTest/test1/*', 'PythonVersion': '3'}, 'DefaultArguments': {'--job-bookmark-option': 'job-bookmark-disable', '--job-language': 'python'}, 'MaxRetries': 0, 'AllocatedCapacity': 0, 'Timeout': 2880, 'MaxCapacity': 0.0625, 'GlueVersion': '1.0'}, {'Name': 'pythonSHELL_14012021', 'Role': 'arn:aws:iam::7*************:role/dev-edl-glue-role', 'CreatedOn': datetime.datetime(2021, 1, 14, 20, 22, 40, 965000, tzinfo=tzlocal()), 'LastModifiedOn': datetime.datetime(2021, 1, 14, 20, 22, 40, 965000, tzinfo=tzlocal()), 'ExecutionProperty': {'MaxConcurrentRuns': 1}, 'Command': {'Name': 'pythonshell', 'DefaultArguments': {'--job-bookmark-option': 'job-bookmark-disable'}, 'MaxRetries': 0, 'AllocatedCapacity': 0, 'Timeout': 2880, 'MaxCapacity': 0.0625, 'GlueVersion': '1.0'}], 'NextToken': 'eyJleHBpcmF0aW9uIjp7InNlY29uZHMiOjE2MTc0NTUzOTYsIm5hbm9zIjo1MjUwMDAwMDB9LCJsYXN0RXZhbHVhdGVkS2V5Ijp7ImpvYk5hbWUiOnsicyI6IlRpY2tkYXRhLXBlcmZvcm1hbmNldGVzdC1qZXR0ZWxhIn0sImFjY291bnRJZCI6eyJzIjoiNzgyMjU4NDg1ODQxIn0sImpvYklkIjp7InMiOiJqXzkyZGQ5ZDNhMWRkOGY2NTJkYzA4MzNmMTM0ZTRiNDRhNmE0YzEzNWY0ZTYwZTkwNmYyOTBhY2NiZDZiMWIxZTcifX19', 'ResponseMetadata': {'RequestId': '3be6708e-*************-389', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Fri, 02 Apr 2021 13:09:56 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '3182', 'connection': 'keep-alive', 'x-amzn-requestid': '3be6708e-*************-8389'}, 'RetryAttempts': 0}}