How to use Boto3 to get the details of all the databases from the AWS Glue Data Catalog?

The AWS Glue Data Catalog stores metadata for databases, tables, and partitions. With Boto3, the AWS SDK for Python, you can retrieve the details of all databases in your Glue Data Catalog using the Glue client's get_databases() method.

Prerequisites

Before using this code, ensure you have:

  • AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
  • IAM permission to call the Glue operation used here (the glue:GetDatabases action)
  • Boto3 library installed: pip install boto3
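
To make the IAM prerequisite concrete, here is a minimal sketch of an identity-based policy that allows only the glue:GetDatabases action. The helper name build_read_databases_policy is hypothetical, and the wildcard resource is an assumption made for brevity; a real policy would normally be scoped to specific catalog and database ARNs.

```python
import json


def build_read_databases_policy():
    """Return a minimal IAM policy document that allows glue:GetDatabases.

    Hypothetical helper for illustration only. "Resource": "*" is an
    assumption chosen to keep the sketch short, not a recommendation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetDatabases"],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console
    print(json.dumps(build_read_databases_policy(), indent=2))
```

The printed JSON can be attached to the user or role whose credentials Boto3 picks up.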

Basic Implementation

Here's how to retrieve database definitions from the AWS Glue Data Catalog:

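import boto3
from botocore.exceptions import ClientError

def get_all_databases():
    # Create a Glue client from the default session (configured credentials and region)
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # A single call returns one page of results; a 'NextToken' key in the
        # response means more databases remain to be fetched
        response = glue_client.get_databases()
        return response
    except ClientError as e:
        # Chain the original exception so the AWS error details stay in the traceback
        raise Exception("boto3 client error in get_all_databases: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in get_all_databases: " + str(e)) from e

# Execute the function
result = get_all_databases()
print(result)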

Sample Output

{
    'DatabaseList': [
        {
            'Name': 'QA-test', 
            'CreateTime': datetime.datetime(2020, 11, 18, 14, 24, 46, tzinfo=tzlocal())
        },
        {
            'Name': 'custdb', 
            'CreateTime': datetime.datetime(2020, 8, 31, 20, 30, 9, tzinfo=tzlocal())
        },
        {
            'Name': 'default', 
            'Description': 'Default Hive database',
            'LocationUri': 'hdfs://ip-example.ec2.internal:8020/user/hive/warehouse', 
            'CreateTime': datetime.datetime(2018, 5, 25, 16, 4, 54, tzinfo=tzlocal())
        }
    ],
    'NextToken': 'eyJsYXN0RXZhbHVhdGVkS2V5...',
    'ResponseMetadata': {
        'RequestId': 'fa0a2069-example-a0617',
        'HTTPStatusCode': 200,
        'RetryAttempts': 0
    }
}
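
The raw response is a plain dictionary, so the database names and the "more pages remain" signal can be read directly from it. The following is a small sketch of that; summarize_response is a hypothetical helper name, and the sample dictionary is a trimmed-down version of the output above.

```python
def summarize_response(response):
    """Return (names, has_more) for one get_databases() response dict.

    names    : the 'Name' of every entry in 'DatabaseList'
    has_more : True when the response carries a non-empty 'NextToken'
    """
    names = [db["Name"] for db in response.get("DatabaseList", [])]
    has_more = bool(response.get("NextToken"))
    return names, has_more


if __name__ == "__main__":
    # Trimmed-down sample response; only the keys the helper reads are kept
    sample = {
        "DatabaseList": [
            {"Name": "QA-test"},
            {"Name": "custdb"},
            {"Name": "default", "Description": "Default Hive database"},
        ],
        "NextToken": "eyJsYXN0RXZhbHVhdGVkS2V5...",
    }
    names, has_more = summarize_response(sample)
    print(names)
    print("More pages available:", has_more)
```

When has_more is True, the single call did not return every database, which is the case the paginated version handles.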

Enhanced Version with Pagination

A single get_databases() call returns only one page of results. For accounts with many databases, follow the NextToken value from each response to retrieve all of them:

import boto3
from botocore.exceptions import ClientError

def get_all_databases_paginated():
    session = boto3.session.Session()
    glue_client = session.client('glue')
    
    all_databases = []
    next_token = None
    
    try:
        while True:
            # Pass NextToken only when a previous page supplied one
            if next_token:
                response = glue_client.get_databases(NextToken=next_token)
            else:
                response = glue_client.get_databases()
            
            all_databases.extend(response['DatabaseList'])
            
            # A missing NextToken means this was the last page
            next_token = response.get('NextToken')
            if not next_token:
                break
        
        return {'DatabaseList': all_databases, 'Count': len(all_databases)}
        
    except ClientError as e:
        # Chain the original exception so the AWS error details are preserved
        raise Exception(f"AWS Glue error: {e}") from e
    except Exception as e:
        raise Exception(f"Unexpected error: {e}") from e

# Get all databases with pagination
result = get_all_databases_paginated()
print(f"Found {result['Count']} databases")
for db in result['DatabaseList']:
    print(f"- {db['Name']}: {db.get('Description', 'No description')}")
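
As an alternative to the manual NextToken loop, Boto3 clients expose built-in paginators through get_paginator(), and the Glue client offers one for get_databases. The sketch below is a swapped-in variant, not a replacement for the code above; list_databases_with_paginator is a hypothetical helper name, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
def list_databases_with_paginator(glue_client):
    """Collect every database using the client's built-in paginator.

    glue_client is expected to be a Boto3 Glue client, for example the
    object returned by boto3.client("glue").
    """
    paginator = glue_client.get_paginator("get_databases")
    databases = []
    # paginate() yields one response dict per page and follows NextToken itself
    for page in paginator.paginate():
        databases.extend(page.get("DatabaseList", []))
    return databases


# Usage with a real client (requires AWS credentials and a region):
#   import boto3
#   for db in list_databases_with_paginator(boto3.client("glue")):
#       print(db["Name"])
```

The paginator removes the token bookkeeping, while the explicit loop remains useful when you want to stop early or inspect each token.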

Key Response Fields

Field        | Description            | Always Present?
-------------|------------------------|----------------
Name         | Database name          | Yes
Description  | Database description   | No
LocationUri  | Physical location URI  | No
CreateTime   | Creation timestamp     | Usually
Parameters   | Key-value parameters   | No
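
Because most of these fields are optional, code that reads a database entry should supply defaults rather than index every key directly. Here is a minimal sketch of one way to do that; the DatabaseInfo dataclass and to_database_info function are hypothetical names introduced for illustration, and the chosen defaults are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class DatabaseInfo:
    """Normalized view of one 'DatabaseList' entry (hypothetical type)."""
    name: str
    description: str = ""
    location_uri: Optional[str] = None
    create_time: Optional[datetime] = None
    parameters: Dict[str, str] = field(default_factory=dict)


def to_database_info(db):
    """Convert one entry of 'DatabaseList' into a DatabaseInfo."""
    return DatabaseInfo(
        name=db["Name"],                        # the only key read without a fallback
        description=db.get("Description", ""),  # optional fields get defaults
        location_uri=db.get("LocationUri"),
        create_time=db.get("CreateTime"),
        parameters=dict(db.get("Parameters", {})),
    )


if __name__ == "__main__":
    info = to_database_info({"Name": "custdb"})
    print(info.name, repr(info.description), info.location_uri)
```

Downstream code can then rely on every attribute existing, even for entries that carry only a name.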

Conclusion

Use Boto3's get_databases() method to retrieve AWS Glue Data Catalog database metadata. Implement pagination for large datasets and proper error handling for production use.
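
For the error-handling point, it often helps to look at the AWS error code instead of only the message text. botocore's ClientError keeps the service response on its response attribute, with the code under response["Error"]["Code"]. The sketch below reads that code; the function names are hypothetical, and treating OperationTimeoutException and InternalServiceException as retryable is an assumption about a sensible policy, not a rule from the Glue documentation.

```python
# Assumption: these two Glue error codes are worth retrying in this sketch
RETRYABLE_CODES = {"OperationTimeoutException", "InternalServiceException"}


def glue_error_code(exc):
    """Return the AWS error code carried by an exception, or None.

    Works with botocore.exceptions.ClientError, which stores the parsed
    service response in exc.response; other exceptions yield None.
    """
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def is_retryable(exc):
    """Return True when the exception's error code is in RETRYABLE_CODES."""
    return glue_error_code(exc) in RETRYABLE_CODES


# Usage inside an except block around get_databases():
#   except ClientError as e:
#       if is_retryable(e):
#           ...  # back off and try again
#       else:
#           raise
```

Branching on the code lets a caller retry transient failures while still surfacing permission or input errors immediately.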

Updated on: 2026-03-25T18:18:32+05:30
