How to use Boto3 to get the details of a single crawler?

AWS Glue crawlers automatically discover and catalog data stored in various sources such as Amazon S3, databases, and data warehouses. Using Boto3, the AWS SDK for Python, you can programmatically retrieve detailed information about a specific crawler.

Prerequisites

Before using this code, ensure you have −

  • AWS credentials configured (via AWS CLI, IAM roles, or environment variables)
  • Boto3 library installed: pip install boto3
  • Appropriate IAM permissions for AWS Glue operations

Syntax

glue_client.get_crawler(Name=crawler_name)

Parameters

  • Name (string, required) − The name of the crawler to retrieve details for

Example

The following example demonstrates how to get the details of a single crawler −

import boto3
from botocore.exceptions import ClientError

def get_one_crawler_details(crawler_name: str) -> dict:
    # Create a Glue client from the default session
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # Retrieve the full metadata for the named crawler
        return glue_client.get_crawler(Name=crawler_name)
    except ClientError as error:
        raise RuntimeError(
            f"boto3 client error in get_one_crawler_details: {error}"
        ) from error

# Get details for a specific crawler
print(get_one_crawler_details("crawler_for_s3_file_job"))

Output

{'Crawler': {'Name': 'crawler_for_s3_file_job', 'Role': 'glue-role',
'Targets': {'S3Targets': [{'Path': 's3://test/', 'Exclusions': []}],
'JdbcTargets': [], 'DynamoDBTargets': [], 'CatalogTargets': []},
'DatabaseName': 'default', 'Classifiers': [], 'SchemaChangePolicy':
{'UpdateBehavior': 'UPDATE_IN_DATABASE', 'DeleteBehavior':
'DEPRECATE_IN_DATABASE'}, 'State': 'READY', 'TablePrefix': 'prod_scdk_',
'CrawlElapsedTime': 0, 'CreationTime': datetime.datetime(2018, 9, 24,
20, 42, 7, tzinfo=tzlocal()), 'LastUpdated': datetime.datetime(2020, 4,
27, 14, 49, 12, tzinfo=tzlocal()), 'LastCrawl': {'Status': 'SUCCEEDED',
'LogGroup': '/aws-glue/crawlers', 'LogStream':
'crawler_for_s3_file_job', 'MessagePrefix': '************-90ad1',
'StartTime': datetime.datetime(2020, 4, 27, 14, 49, 19,
tzinfo=tzlocal())}, 'Version': 15}, 'ResponseMetadata': {'RequestId':
'8c7dcbde-***********************-774', 'HTTPStatusCode': 200,
'HTTPHeaders': {'date': 'Sun, 28 Feb 2021 11:34:32 GMT', 'content-type':
'application/x-amz-json-1.1', 'content-length': '805', 'connection':
'keep-alive', 'x-amzn-requestid': '8c7dcbde-**********************774'},
'RetryAttempts': 0}}

Key Response Fields

The response contains important crawler information −

  • Name − The crawler's name
  • State − Current state (READY, RUNNING, STOPPING)
  • Targets − Data sources the crawler scans (S3, JDBC, DynamoDB)
  • DatabaseName − Target database in AWS Glue Data Catalog
  • LastCrawl − Information about the most recent crawl run
  • Role − IAM role used by the crawler

Error Handling

Common exceptions you might encounter −

  • EntityNotFoundException − Crawler with the specified name doesn't exist
  • InvalidInputException − Invalid crawler name format
  • OperationTimeoutException − Request timeout

Conclusion

The get_crawler() function provides comprehensive details about AWS Glue crawlers, including their configuration, state, and execution history. This information is essential for monitoring and managing your data discovery workflows.
