How to use Boto3 to get the details of all the databases from AWS Glue Data Catalog?
The AWS Glue Data Catalog stores metadata for databases, tables, and partitions. Using Boto3, Python's AWS SDK, you can retrieve details of all databases in your Glue Data Catalog with the get_databases() method.
Prerequisites
Before using this code, ensure you have:
- AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
- Appropriate IAM permissions for Glue operations
- Boto3 library installed:
pip install boto3
Basic Implementation
Here's how to retrieve database definitions from the AWS Glue Data Catalog:
import boto3
from botocore.exceptions import ClientError


def get_all_databases():
    """Return the raw response of a single get_databases() call."""
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # One call returns one page; more pages exist if 'NextToken' is present
        response = glue_client.get_databases()
        return response
    except ClientError as e:
        raise Exception("boto3 client error in get_all_databases: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in get_all_databases: " + str(e)) from e


# Execute the function
result = get_all_databases()
print(result)
Sample Output
{
    'DatabaseList': [
        {
            'Name': 'QA-test',
            'CreateTime': datetime.datetime(2020, 11, 18, 14, 24, 46, tzinfo=tzlocal())
        },
        {
            'Name': 'custdb',
            'CreateTime': datetime.datetime(2020, 8, 31, 20, 30, 9, tzinfo=tzlocal())
        },
        {
            'Name': 'default',
            'Description': 'Default Hive database',
            'LocationUri': 'hdfs://ip-example.ec2.internal:8020/user/hive/warehouse',
            'CreateTime': datetime.datetime(2018, 5, 25, 16, 4, 54, tzinfo=tzlocal())
        }
    ],
    'NextToken': 'eyJsYXN0RXZhbHVhdGVkS2V5...',
    'ResponseMetadata': {
        'RequestId': 'fa0a2069-example-a0617',
        'HTTPStatusCode': 200,
        'RetryAttempts': 0
    }
}
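The response is a plain dictionary, so the database names and the pagination marker can be read from it directly. The following is a minimal sketch: the helper name `summarize_response` and the trimmed `sample` dictionary are illustrative and not part of Boto3.

```python
def summarize_response(response):
    """Return (names, has_more) for one get_databases() response dict."""
    # 'DatabaseList' holds one dict per database; 'Name' identifies it
    names = [db['Name'] for db in response.get('DatabaseList', [])]
    # A non-empty 'NextToken' signals that further pages are available
    has_more = bool(response.get('NextToken'))
    return names, has_more


# Trimmed copy of the sample output, used here instead of a live AWS call
sample = {
    'DatabaseList': [
        {'Name': 'QA-test'},
        {'Name': 'custdb'},
        {'Name': 'default', 'Description': 'Default Hive database'},
    ],
    'NextToken': 'eyJsYXN0RXZhbHVhdGVkS2V5...',
}

names, has_more = summarize_response(sample)
print(names)
print(has_more)
```

When `has_more` is true, the single call has returned only part of the catalog and the remaining pages still need to be fetched.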
Enhanced Version with Pagination
For accounts with many databases, use pagination to retrieve all results:
import boto3
from botocore.exceptions import ClientError


def get_all_databases_paginated():
    """Follow NextToken until every page of databases has been collected."""
    session = boto3.session.Session()
    glue_client = session.client('glue')
    all_databases = []
    next_token = None
    try:
        while True:
            # Pass NextToken only after the first page has been fetched
            if next_token:
                response = glue_client.get_databases(NextToken=next_token)
            else:
                response = glue_client.get_databases()
            all_databases.extend(response['DatabaseList'])
            # A missing or empty NextToken means this was the last page
            next_token = response.get('NextToken')
            if not next_token:
                break
        return {'DatabaseList': all_databases, 'Count': len(all_databases)}
    except ClientError as e:
        raise Exception(f"AWS Glue error: {e}") from e
    except Exception as e:
        raise Exception(f"Unexpected error: {e}") from e


# Get all databases with pagination
result = get_all_databases_paginated()
print(f"Found {result['Count']} databases")
for db in result['DatabaseList']:
    print(f"- {db['Name']}: {db.get('Description', 'No description')}")
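As an alternative to tracking NextToken by hand, Boto3 clients expose paginators through get_paginator(), and the Glue get_databases operation can be paginated this way. The sketch below is one possible arrangement, not the only one: the function names `list_databases_with_paginator` and `main` are illustrative, and the client is passed in as an argument so the helper can also be exercised with a stub object.

```python
def list_databases_with_paginator(glue_client):
    """Collect every database using the client's built-in paginator."""
    # paginate() yields one response dict per page and handles NextToken itself
    paginator = glue_client.get_paginator('get_databases')
    databases = []
    for page in paginator.paginate():
        databases.extend(page.get('DatabaseList', []))
    return databases


def main():
    # boto3 is imported here so the helper above stays usable with a stub client
    import boto3
    glue_client = boto3.client('glue')
    for db in list_databases_with_paginator(glue_client):
        print(db['Name'])
```

Calling `main()` requires AWS credentials and Glue permissions, as listed in the prerequisites.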
Key Response Fields
| Field | Description | Always Present? |
|---|---|---|
| Name | Database name | Yes |
| Description | Database description | No |
| LocationUri | Physical location URI | No |
| CreateTime | Creation timestamp | Yes |
| Parameters | Key-value parameters | No |
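Because several of these fields can be absent, code that reads a database entry is safer using dict.get() with a fallback than indexing directly. The following is a minimal sketch under that assumption: the helper name `describe_database`, the fallback strings, and the `example` entry (which uses a UTC timestamp for reproducibility) are all hypothetical. It reads CreateTime defensively as well, which costs nothing.

```python
from datetime import datetime, timezone


def describe_database(db):
    """Build a one-line summary of a database dict, tolerating missing fields."""
    name = db['Name']  # listed above as always present
    description = db.get('Description', 'No description')
    location = db.get('LocationUri', 'No location')
    created = db.get('CreateTime')
    # CreateTime arrives as a datetime object; isoformat() gives a stable string
    created_text = created.isoformat() if created else 'unknown'
    param_count = len(db.get('Parameters', {}))
    return (f"{name} | {description} | {location} | "
            f"created {created_text} | {param_count} parameter(s)")


# Hypothetical entry for illustration only
example = {
    'Name': 'custdb',
    'CreateTime': datetime(2020, 8, 31, 20, 30, 9, tzinfo=timezone.utc),
}
print(describe_database(example))
```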
Conclusion
Use Boto3's get_databases() method to retrieve AWS Glue Data Catalog database metadata. A single call may return only part of the catalog, so follow NextToken when there are many databases, and handle ClientError explicitly in production code.