How to use Boto3 to paginate through all jobs present in AWS Glue

In this article, we will see how to paginate through all jobs present in AWS Glue using the boto3 library in Python.

Problem Statement

Use the boto3 library in Python to paginate through all AWS Glue jobs created in your account.

Understanding Pagination Parameters

Before implementing the solution, let's look at the key pagination parameters. They are passed to the paginator through PaginationConfig as MaxItems, PageSize and StartingToken:

  • max_items − Total number of items to return across all pages. If more items are available, a NextToken is provided so that pagination can be resumed later.

  • page_size − Number of items requested per page (per API call).

  • starting_token − A NextToken from a previous response, used to resume pagination from the point where it stopped.
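To see what these parameters automate, here is a minimal sketch of the same pagination done by hand with the underlying get_jobs call: request a page, pass the returned NextToken back as NextToken, and stop when no token comes back. It assumes glue_client is a boto3 Glue client; list_all_job_names is a hypothetical helper name, and MaxResults plays the role that page_size plays for the paginator.

```python
def list_all_job_names(glue_client, max_results=100):
    """Collect every Glue job name by following NextToken manually (hypothetical helper)."""
    names = []
    kwargs = {"MaxResults": max_results}  # number of jobs requested per call
    while True:
        response = glue_client.get_jobs(**kwargs)
        names.extend(job["Name"] for job in response.get("Jobs", []))
        next_token = response.get("NextToken")
        if not next_token:
            # No token means this was the last page
            return names
        # Ask the next request to continue where this one stopped
        kwargs["NextToken"] = next_token
```

With real credentials this could be called as list_all_job_names(boto3.client('glue')). The paginator shown in the rest of this article does the same token bookkeeping for you.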

Implementation Steps

Follow these steps to paginate through AWS Glue jobs:

  1. Import required libraries (boto3 and botocore.exceptions)

  2. Create AWS session and Glue client

  3. Create a paginator for the get_jobs operation

  4. Configure the pagination parameters and iterate through the pages

  5. Handle ClientError exceptions, which are raised while the pages are being fetched

Example Code

Use the following code to paginate through all jobs created in your AWS account:

import boto3
from botocore.exceptions import ClientError

def paginate_through_jobs(max_items=None, page_size=None, starting_token=None):
    session = boto3.session.Session()
    glue_client = session.client('glue')
    paginator = glue_client.get_paginator('get_jobs')

    # Add only the pagination parameters that were actually supplied
    pagination_config = {}
    if max_items is not None:
        pagination_config['MaxItems'] = max_items
    if page_size is not None:
        pagination_config['PageSize'] = page_size
    if starting_token is not None:
        pagination_config['StartingToken'] = starting_token

    # paginate() is lazy: no request is sent until the pages are iterated
    return paginator.paginate(PaginationConfig=pagination_config)

# Example usage
jobs_paginator = paginate_through_jobs(max_items=2, page_size=5)

# Iterate through pages; API errors such as ClientError surface here
try:
    for page in jobs_paginator:
        print("Jobs in this page:", len(page['Jobs']))
        for job in page['Jobs']:
            print(f"Job Name: {job['Name']}")
except ClientError as e:
    raise RuntimeError("boto3 client error while paginating Glue jobs: " + str(e)) from e

Output

Jobs in this page: 2
Job Name: PythonShellTest1
Job Name: pythonSHELL_14012021

Key Points

  • The paginator returns a PageIterator object that can be used to iterate through pages of results.

  • Each page contains a Jobs list with job details and metadata like NextToken for continuation.

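  • Always handle ClientError exceptions when working with AWS services. With a paginator, these errors are raised while the pages are being iterated, not when paginate() is called.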

  • Pagination parameters are optional − omit them to get all available jobs.
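The NextToken mentioned above is what makes starting_token useful: you can read a fixed-size batch now and continue from the same point later. The following is a minimal sketch under that assumption; fetch_job_batch is a hypothetical helper, glue_client is assumed to be a boto3 Glue client, and it relies on the PageIterator method build_full_result(), which merges all pages and adds a NextToken key only when MaxItems stopped pagination early.

```python
def fetch_job_batch(glue_client, batch_size, starting_token=None):
    """Return up to batch_size jobs and a token for resuming (hypothetical helper)."""
    pagination_config = {"MaxItems": batch_size}
    if starting_token is not None:
        pagination_config["StartingToken"] = starting_token
    paginator = glue_client.get_paginator("get_jobs")
    page_iterator = paginator.paginate(PaginationConfig=pagination_config)
    # Merge all pages of this batch into a single result dictionary;
    # "NextToken" is present only if more jobs remain
    result = page_iterator.build_full_result()
    return result.get("Jobs", []), result.get("NextToken")
```

Store the returned token and pass it back as starting_token on the next call; a token of None means every job has been read.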

Alternative Approach: Processing All Jobs

If you need to process all jobs without pagination limits:

import boto3
from botocore.exceptions import ClientError

def get_all_glue_jobs():
    session = boto3.session.Session()
    glue_client = session.client('glue')
    
    try:
        paginator = glue_client.get_paginator('get_jobs')
        all_jobs = []
        
        for page in paginator.paginate():
            all_jobs.extend(page['Jobs'])
        
        return all_jobs
        
    except ClientError as e:
        # Chain the original error so its traceback is preserved
        raise RuntimeError("Error retrieving Glue jobs: " + str(e)) from e

# Get all jobs
jobs = get_all_glue_jobs()
print(f"Total jobs found: {len(jobs)}")
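Collecting everything into all_jobs keeps every job in memory at once. If each job only needs to be processed once, a generator is a lighter alternative. This is a minimal sketch, assuming glue_client is a boto3 Glue client; iter_glue_jobs is a hypothetical helper name.

```python
def iter_glue_jobs(glue_client, page_size=None):
    """Yield Glue job dictionaries one at a time (hypothetical helper)."""
    pagination_config = {}
    if page_size is not None:
        pagination_config["PageSize"] = page_size
    paginator = glue_client.get_paginator("get_jobs")
    for page in paginator.paginate(PaginationConfig=pagination_config):
        # The generator holds only the current page, not the full job list
        yield from page.get("Jobs", [])
```

For example, for job in iter_glue_jobs(boto3.client('glue'), page_size=50): print(job['Name']) prints every job name while fetching 50 jobs per request.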

Conclusion

Use a boto3 paginator to retrieve AWS Glue jobs without tracking NextToken yourself. The paginator fetches results one page at a time, and the PageSize, MaxItems and StartingToken settings let you control how many jobs each request returns, how many are returned in total, and where a later run resumes.

---
Updated on: 2026-03-25T18:54:30+05:30
