How to use Boto3 to get the details of a job that is bookmarked in AWS Glue Data Catalog?
AWS Glue uses job bookmarks to track the data a job has already processed, so that later runs don't reprocess it. You can use boto3 to retrieve the bookmark details of any job that has a stored bookmark by calling the Glue client's get_job_bookmark() method.
Prerequisites
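Before retrieving job bookmark details, make sure that:
- The job exists in AWS Glue, has job bookmarks enabled, and has a stored bookmark (that is, it has run with bookmarks enabled)
- You have AWS credentials configured with permission to call glue:GetJobBookmark
- The job name is spelled exactly as in AWS Glue (it is case-sensitive)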
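If you are not sure whether bookmarks are enabled, you can inspect the job definition first. The sketch below assumes the setting lives in the job's DefaultArguments under the --job-bookmark-option special parameter; it only reads the job definition, so an option passed for an individual run is not seen. The helper name get_bookmark_option is hypothetical, and glue_client is expected to be a boto3 Glue client (the get_job() call it makes is part of that client).

```python
def get_bookmark_option(glue_client, job_name):
    """Return the job's default --job-bookmark-option value, or None if unset.

    glue_client is assumed to be a boto3 Glue client, or any object with a
    compatible get_job(JobName=...) method.
    """
    job = glue_client.get_job(JobName=job_name)["Job"]
    # DefaultArguments can be absent when the job defines no default arguments
    default_args = job.get("DefaultArguments", {})
    return default_args.get("--job-bookmark-option")


# Usage sketch (requires boto3 and AWS credentials):
# import boto3
# glue_client = boto3.client("glue")
# option = get_bookmark_option(glue_client, "book-job")
# if option != "job-bookmark-enable":
#     print("Bookmarks are not explicitly enabled in the job's default arguments")
```

Returning None instead of guessing a default keeps the helper from assuming how Glue treats a job that never sets the option.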
Approach
Step 1: Import boto3 and the botocore ClientError exception for error handling.
Step 2: Define the name of the job whose bookmark you want (it must be an existing job with a stored bookmark).
Step 3: Create an AWS session, optionally passing a region name.
Step 4: Create a Glue client from the session.
Step 5: Call get_job_bookmark() with the JobName parameter.
Step 6: Handle exceptions for jobs that don't exist or have no bookmark, as well as other errors.
Example
The following example retrieves the bookmark details of a job named book-job:
import boto3
from botocore.exceptions import ClientError

def get_bookmarked_job_details(bookmarked_job_name, region_name=None):
    # region_name=None uses the default region from your AWS config or environment
    session = boto3.session.Session(region_name=region_name)
    glue_client = session.client('glue')
    try:
        # Fetch the bookmark entry for the job
        return glue_client.get_job_bookmark(JobName=bookmarked_job_name)
    except ClientError as e:
        if e.response['Error']['Code'] == 'EntityNotFoundException':
            print(f"Job '{bookmarked_job_name}' not found or not bookmarked")
            return None
        # Re-raise any other AWS error with context, keeping the original traceback
        raise Exception("boto3 client error: " + str(e)) from e

# Retrieve bookmark details for 'book-job'
result = get_bookmarked_job_details("book-job")
if result:
    print(result)
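Output
The printed dictionary is formatted for readability below; the RequestId is masked and the HTTP headers are abbreviated.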
{
'JobBookmarkEntry': {
'JobName': 'book-job',
'Version': 8,
'Run': 2,
'Attempt': 2,
'PreviousRunId': 'jr_dee547c2f78422e34136aa12c85de010b823787833eee04fbf34bc9b8cb4f7b9',
'RunId': 'jr_a035fe15daa31e9a751f02876c26e5d11a829f2689803a9e9643bd61f70273e4',
'JobBookmark': '{"gdf":{"jsonClass":"HadoopDataSourceJobBookmarkState","timestamps":{"RUN":"1","HIGH_BAND":"900000","CURR_LATEST_PARTITION":"0"}}}'
},
'ResponseMetadata': {
'RequestId': 'bacf1497-***************996f05b3c1',
'HTTPStatusCode': 200,
'HTTPHeaders': {...},
'RetryAttempts': 0
}
}
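The example above always asks for the job's current bookmark. The boto3 reference for get_job_bookmark() also lists an optional RunId parameter; the sketch below assumes that parameter is available in your boto3 version and only sends it when a run ID is given. The helper name get_job_bookmark_entry is hypothetical, and glue_client is expected to be a boto3 Glue client.

```python
def get_job_bookmark_entry(glue_client, job_name, run_id=None):
    """Return the JobBookmarkEntry dict for a job, optionally for one run.

    glue_client is assumed to be a boto3 Glue client, or any object with a
    compatible get_job_bookmark(**kwargs) method.
    """
    request = {"JobName": job_name}
    # RunId is optional; leaving it out makes the same call as the main example
    if run_id is not None:
        request["RunId"] = run_id
    response = glue_client.get_job_bookmark(**request)
    return response["JobBookmarkEntry"]


# Usage sketch (requires boto3 and AWS credentials):
# import boto3
# glue_client = boto3.client("glue")
# entry = get_job_bookmark_entry(glue_client, "book-job")
# print(entry["RunId"], entry["PreviousRunId"])
```

Building the request dict first keeps RunId out of the call entirely when it isn't needed, rather than sending an empty value.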
Key Response Fields
The JobBookmarkEntry in the response contains the bookmark information:
- JobName: Name of the bookmarked job
- Version: Job version number
- Run: Current run number
- Attempt: Attempt number of the run
- RunId: Unique identifier for the current run
- PreviousRunId: Unique identifier of the previous job run
- JobBookmark: JSON string containing the processing state details
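Because JobBookmark arrives as a JSON string rather than a nested dictionary, it has to be decoded before you can read the processing state. The sketch below pulls the fields listed above out of a get_job_bookmark() response and decodes the bookmark with json.loads. The helper name summarize_bookmark is hypothetical; the top-level key "gdf" in the sample appears to be the name of the job's data source (its transformation context), which is an assumption based on this output only.

```python
import json


def summarize_bookmark(response):
    """Extract the main fields from a get_job_bookmark() response dict."""
    entry = response["JobBookmarkEntry"]
    raw_state = entry.get("JobBookmark")
    # JobBookmark is a JSON-encoded string; decode it (empty dict if missing)
    state = json.loads(raw_state) if raw_state else {}
    return {
        "job_name": entry.get("JobName"),
        "version": entry.get("Version"),
        "run": entry.get("Run"),
        "attempt": entry.get("Attempt"),
        "run_id": entry.get("RunId"),
        "previous_run_id": entry.get("PreviousRunId"),
        "state": state,
    }


# Sample values copied from the output shown earlier in this article
sample = {
    "JobBookmarkEntry": {
        "JobName": "book-job",
        "Version": 8,
        "Run": 2,
        "Attempt": 2,
        "RunId": "jr_a035fe15daa31e9a751f02876c26e5d11a829f2689803a9e9643bd61f70273e4",
        "JobBookmark": '{"gdf":{"jsonClass":"HadoopDataSourceJobBookmarkState",'
                       '"timestamps":{"RUN":"1","HIGH_BAND":"900000","CURR_LATEST_PARTITION":"0"}}}',
    }
}

summary = summarize_bookmark(sample)
print(summary["run"], summary["state"]["gdf"]["timestamps"]["HIGH_BAND"])  # 2 900000
```

Using entry.get() means fields missing from a response (PreviousRunId is absent in this trimmed sample) come back as None instead of raising KeyError.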
Error Handling
Common exceptions include:
- EntityNotFoundException: The job doesn't exist or isn't bookmarked
- AccessDeniedException: Insufficient permissions
- InvalidInputException: Invalid job name format
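One way to act on these error codes is to read the code from the ClientError and map it to a readable explanation, as sketched below. The helper name classify_glue_error and the explanation strings are hypothetical; it reads error.response["Error"]["Code"], the same field the main example checks, and falls back gracefully when that field is missing.

```python
def classify_glue_error(error):
    """Return (error_code, explanation) for a botocore ClientError.

    Any object with a ClientError-style ``response`` dict also works.
    """
    code = error.response.get("Error", {}).get("Code", "Unknown")
    explanations = {
        "EntityNotFoundException": "The job doesn't exist or has no bookmark.",
        "AccessDeniedException": "The caller lacks permission, e.g. glue:GetJobBookmark.",
        "InvalidInputException": "The job name or another input is invalid.",
    }
    return code, explanations.get(code, "Unhandled error; inspect the full message.")


# Usage sketch (requires boto3 and AWS credentials):
# from botocore.exceptions import ClientError
# try:
#     glue_client.get_job_bookmark(JobName="book-job")
# except ClientError as e:
#     code, explanation = classify_glue_error(e)
#     print(code, explanation)
```

Codes not in the mapping are reported as unhandled rather than guessed, so throttling or service errors still surface with their original code.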
Conclusion
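Use get_job_bookmark() to retrieve AWS Glue job bookmark details, and always handle EntityNotFoundException for jobs that don't exist or have no bookmark. The JobBookmarkEntry in the response reports the job's run and attempt numbers, the current and previous run IDs, and the serialized bookmark state that records which data has already been processed.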
