- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to get the table definition in a database from AWS Glue Data Catalog using Boto3
Let's see how a user can retrieve the definition of all tables or some of the tables that are matching as per regular expression in a database from AWS Glue Data Catalog.
Example
Retrieve table definition of all tables in a database 'QA-test' and table as 'security', and 'employee'.
Approach/Algorithm to solve this problem
Step 1: Import boto3 and botocore exceptions to handle the exceptions.
Step 2: database_name is the mandatory parameter while regular_expression_for_table_name is the optional parameter. If the user wants to fetch the details of all tables, then there's no need to provide the regular expression. However, if only limited tables need to be fetched, then a regular expression is required. Be careful while writing the regular expression as the result might change if it is not written properly.
Step 3: Create an AWS session using boto3 lib. Make sure region_name is mentioned in default profile. If it is not mentioned, then explicitly pass the region_name while creating the session.
Step 4: Create an AWS client for glue.
Step 5: Now use get_tables function and pass the database_name as DatabaseName and regular_expression_for_table_name as Expression parameter.
Step 6: It returns the definition of all or matched tables in a given database.
Step 7: Handle the generic exception if something went wrong while checking the job.
Example Code
The following code retrieves all the table definitions from a given database −
import boto3 from botocore.exceptions import ClientError def retrieves_tables_detail(database_name, regular_expression_for_table_name=None) session = boto3.session.Session() glue_client = session.client('glue') try: response = glue_client.get_tables(DatabaseName = database_name, Expression=regular_expression_for_table_name) return response except ClientError as e: raise Exception( "boto3 client error in retrieves_tables_detail: " + e.__str__()) except Exception as e: raise Exception( "Unexpected error in retrieves_tables_detail: " + e.__str__()) print(retrieves_tables_detail('QA-test'))
Output
{'TableList': [ {'Name': 'security', 'DatabaseName': 'QA-test', 'Owner': 'owner', 'CreateTime': datetime.datetime(2020, 9, 10, 22, 27, 24, tzinfo=tzlocal()), 'UpdateTime': datetime.datetime(2021, 3, 1, 11, 43, 49, tzinfo=tzlocal()), 'LastAccessTime': datetime.datetime(2020, 9, 10, 22, 27, 24, tzinfo=tzlocal()), 'Retention': 0, 'StorageDescriptor': {'Columns': [{'Name': 'assettypecode', 'Type': 'string'}, {'Name': 'industrysector', 'Type': 'varchar'}, {'Name': 'securitycode', 'Type': 'char'}, {'Name': 'contractsize', 'Type': 'string'}, {'Name': 'conversionperiodenddate', 'Type': 'string'}, {'Name': 'conversionperiodstartdate', 'Type': 'string'}, {'Name': 'expirationdate', 'Type': 'string'}, {'Name': 'issuercountrycode', 'Type': 'string'}, {'Name': 'issuercountrydesc', 'Type': 'string'}, {'Name': 'originalissuedate', 'Type': 'string'}, {'Name': 'securitynamelong', 'Type': 'string'}, {'Name': 'issueshortname', 'Type': 'string'}, {'Name': 'gicssector', 'Type': 'string'}, {'Name': 'maturitydate', 'Type': 'string'}, {'Name': 'optioncode', 'Type': 'string'}, {'Name': 'optiontypename', 'Type': 'string'}, {'Name': 'paramount', 'Type': 'string'}, {'Name': 'priceindex', 'Type': 'string'}, {'Name': 'countrycoderisk', 'Type': 'string'}, {'Name': 'countrydescrisk', 'Type': 'string'}, {'Name': 'countrycode', 'Type': 'string'}], 'Location': 's3://test/security/', 'InputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat', 'OutputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe', 'Parameters': {'serialization.format': '1'}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'security', 'averageRecordSize': '181', 'classification': 'parquet', 'compressionType': 'none', 'objectCount': '5', 'recordCount': '154800', 'sizeKey': '20337230', 'typeOfData': 'file'}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'EXTERNAL_TABLE', 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'security', 'averageRecordSize': '181', 'classification': 'parquet', 'compressionType': 'none', 'objectCount': '5', 'recordCount': '154800', 'sizeKey': '20337230', 'typeOfData': 'file'}, 'CreatedBy': 'arn:aws:sts::*********:assumed-role/glue-role/AWS-Crawler'}, 'VersionId': '2'}, {'Name': 'employee', 'DatabaseName': 'QA-test', 'Owner': 'owner', 'CreateTime': datetime.datetime(2020, 9, 10, 22, 27, 24, tzinfo=tzlocal()), 'UpdateTime': datetime.datetime(2021, 3, 1, 11, 43, 49, tzinfo=tzlocal()), 'LastAccessTime': datetime.datetime(2020, 9, 10, 22, 27, 24, tzinfo=tzlocal()), 'Retention': 0, 'StorageDescriptor': {'Columns': [{'Name': 'assettypecode', 'Type': 'string'}, {'Name': 'industrysector', 'Type': 'varchar'}, {'Name': 'code', 'Type': 'char'}, {'Name': 'size', 'Type': 'string'}, {'Name': 'countrycode', 'Type': 'string'}, {'Name': 'countrydesc', 'Type': 'string'}], 'Location': 's3://test/security/', 'InputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat', 'OutputFormat': 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe', 'Parameters': {'serialization.format': '1'}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'security', 'averageRecordSize': '181', 'classification': 'parquet', 'compressionType': 'none', 'objectCount': '5', 'recordCount': '154800', 'sizeKey': '20337230', 'typeOfData': 'file'}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'EXTERNAL_TABLE', 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'employee', 'averageRecordSize': '181', 'classification': 'parquet', 'compressionType': 'none', 'objectCount': '5', 'recordCount': '154800', 'sizeKey': '20337230', 'typeOfData': 'file'}, 'CreatedBy': 'arn:aws:sts::*********:assumed-role/glue-role/AWS-Crawler'}, 'VersionId': '2'}, 'ResponseMetadata': {'RequestId': '431db171-*******************0', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 01 Mar 2021 06:15:30 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '3916', 'connection': 'keep-alive', 'x-amzn-requestid': '431db171-*****************0'}, 'RetryAttempts': 0}}
- Related Articles
- How to use Boto3 to get the table definition of a database from AWS Glue Data Catalog?
- How to use Boto3 to get the specified version table definition of a database from AWS Glue Data Catalog?
- How to use Boto3 to get the details of a database from AWS Glue Data Catalog?
- How to get the details of a trigger from AWS Glue Data catalog using Boto3
- How to use Boto3 to delete a table from AWS Glue Data catalog?
- How to get the details of a user-defined function in a database from AWS Glue Data catalog using Boto3
- How to use Boto3 to get the details of a classifier from AWS Glue Data catalog?
- How to use Boto3 to get the details of a connection from AWS Glue Data catalog?
- How to use Boto3 to get the security configuration/encryption settings of a catalog from AWS Glue Data Catalog?
- How to use Boto3 to delete a specific version of table from AWS Glue Data catalog?
- How to use Boto3 to delete a crawler from AWS Glue Data Catalog?
- How to use Boto3 get the details of all the databases from AWS Glue Data Catalog?
- How to get the details of multiple function definitions in a database from AWS Data catalog using Boto3
- How to use Boto3 to delete a database from AWS Data Catalog?
- How to get the details of all the triggers associated with a job from AWS Glue Data catalog using Boto3
