BigQuery - Monitoring Usage & Performance



Understanding usage and performance constraints is critical, especially for those working within an organization. Maintaining constant visibility regarding users and data sources that are consuming disproportionate amounts of slot time can help BigQuery administrators make informed decisions about limiting access to resources and spawn productive conversations among teams who run resource-intensive jobs.

Performance Dashboards

To achieve more transparent real-time monitoring, BigQuery has made performance dashboards available in the monitoring tab within the BigQuery Studio UI.

Note − Performance is charted each time a query is executed.

Monitoring Usage & Performance

Additionally, BigQuery provides an execution graph for a more visually-friendly interpretation of query performance.

Bigquery Execution Graph

On a project level, administrators can view BigQuery data within the "Monitoring" dashboard. Monitoring is available at project level and dataset level.

Project Level Monitoring

The project level shows how many queries are currently running or "in flight".

Project Level Monitoring

Dataset Level Monitoring

At a dataset level, we can see stored bytes per table.

Dataset Level Monitoring

To access and manipulate these charts, follow these steps −

  • Navigate to BigQuery
  • Select Administration
  • Click Monitoring
  • If desired: Toggle live data

Within this UI, users can also access Operational Health, which offers information in the form of two different views: A summary view and a more detailed view.

Some top line metrics within Operational Health summary tables that both administrators and users would find helpful include −

  • Slot usage
  • Shuffle (reservation re-allocation) usage
  • Concurrency (simultaneous jobs)
  • Bytes processed
  • Job duration
  • Total storage

Note − Usage units are bytes (can be converted to gigabytes or terabytes if necessary).

For a more granular view, users can query the INFORMATION_SCHEMA views using the metadata query techniques from earlier in this tutorial.

Common Errors of BigQuery

Despite being armed with education and information regarding BigQuery features, like any development process, errors are possible and, in fact, are likely inevitable. New users are particularly susceptible to BigQuery errors given their inexperience with the platform.

Types of BigQuery Errors

BigQuery errors present in two categories −

  • Code-based errors
  • Platform-based errors

Although BigQuery's SQL dialect (Google SQL) is intended to be universally understood and reminiscent of other SQL dialects, syntax errors can occur and, frankly, can be quite frustrating.

BigQuery Syntax Errors

Here is a non-exhaustive list of BigQuery syntax mistakes −

  • Using apostrophes instead of backticks.
  • Omitting either the dataset or table in a FROM clause.
  • Using UNNEST() incorrectly (on a STRUCT instead of an ARRAY).
  • Forgetting a GROUP BY clause when using an aggregate function like AVG().
  • Forgetting a comma between column names.

Platform-based Errors

Platform-based errors stem from misunderstanding BigQuery's execution constraints and may include −

  • Running more than 20 concurrent queries.
  • Not writing the results of a large query to a table resulting in a "result too large" error.
  • Not running a large query as a batch job.
  • Overwriting or truncating a table instead of appending a result.

Unfortunately, it is impossible to flag nearly every error a new BigQuery SQL developer will encounter. However, the above represent a broad variety of issues you may encounter.

Advertisements