BigQuery - Table Metadata



While being able to analyze and understand the scope and content of organizational data is important, it is also essential for SQL developers to understand aspects of performance and storage cost using BigQuery.

This is when querying BigQuery table metadata can be useful for developers and invaluable for organizations seeking to leverage the SQL engine to its fullest extent.

For those who have not worked extensively with metadata: Metadata is, as the name implies, data about data. Typically this relates to statistics surrounding things like resource performance or monitoring.

BigQuery offers several metadata stores that users can query to better understand how their project consumes resources, some of which include the following −

  • INFORMATION_SCHEMA
  • __TABLES__ view
  • BigQuery audit logs

Each of these tables can be queried in the same way as stored data.

For both the INFORMATION_SCHEMA and __TABLES__ it is helpful to note there is a different syntax regarding the table reference.

Instead of following the typical: project.dataset.table notation, both INFORMATION_SCHEMA and __TABLES__ have elements that are referenced following the closing backtick.

INFORMATION_SCHEMA is a data source that has several offshoot resources like COLUMNS or JOBS_BY_PROJECT

For instance, referencing the INFORMATION_SCHEMA looks like this −

Project Dataset Table

It will fetch the following output

Project Dataset Table

TABLES View

The TABLES view provides information at a table-level, such as table creation time and the user who last accessed the table. It is important to note that the TABLES view is accessible at the dataset level.

Bigquery TABLES View

It will fetch the following output

Bigquery TABLES View Output

Creating a Schema Based on an Existing Table

One helpful use case for INFORMATION_SCHEMA.COLUMNS is the ability to create a schema based on an existing table using this query −

Creating a Schema Based on an Existing Table
Advertisements