
- BigQuery - Home
- BigQuery - Overview
- BigQuery - Initial Setup
- BigQuery vs Local SQL Engines
- BigQuery - Google Cloud Console
- BigQuery - Google Cloud Hierarchy
- What is Dremel?
- What is BigQuery Studio?
- BigQuery - Datasets
- BigQuery - Tables
- BigQuery - Views
- BigQuery - Create Table
- BigQuery - Basic Schema Design
- BigQuery - Alter Table
- BigQuery - Copy Table
- Delete and Recover Table
- BigQuery - Populate Table
- Standard SQL vs Legacy SQL
- BigQuery - Write First Query
- BigQuery - CRUD Operations
- Partitioning & Clustering
- BigQuery - Data Types
- BigQuery - Complex Data Types
- BigQuery - STRUCT Data Type
- BigQuery - ARRAY Data Type
- BigQuery - JSON Data Type
- BigQuery - Table Metadata
- BigQuery - User-defined Functions
- Connecting to External Sources
- Integrate Scheduled Queries
- Integrate BigQuery API
- BigQuery - Integrate Airflow
- Integrate Connected Sheets
- Integrate Data Transfers
- BigQuery - Materialized View
- BigQuery - Roles & Permissions
- BigQuery - Query Optimization
- BigQuery - BI Engine
- Monitoring Usage & Performance
- BigQuery - Data Warehouse
- Challenges & Best Practices
BigQuery - Query Optimization
BigQuery is powered by cloud computing, however it does not imply that compute power is infinite. It also does not mean that every query will run for the exact same length of time, no matter the time of day or how many processes are competing for slots.
What is Query Optimization?
Optimization is a buzzword that gets used often by those in data engineering and other programming disciplines.
With SQL, optimization comes in two forms −
- Code-based Optimization
- Platform-based Optimization
Code-based optimization is complex to conceptualize and execute. Therefore, it is beyond the scope of this tutorial. Instead, we will focus on tools within BigQuery that enable users to accurately track and actively curb over usage.
Through increased visibility and creative slot allocation, it is possible to maintain a BigQuery project with multiple users and have adequate storage and slot space for all. This is achieved through −
- Tracking usage through execution graphs and data lineage tools.
- Running queries in different modes to lessen the amount of data processed at a given time.
- Leveraging tools like BI Engine to segment repeat offenders (tables) to proactively limit the scope of data processed during high-volume traffic hours.
Batch vs. Interactive Modes
Upon writing your first few queries in the BigQuery Studio SQL environment, it may seem like all queries run in the same way. And, in a sense, you're correct. All queries utilize a certain amount of slots which is expressed in slot hours during execution. However, there are actually two different ways to run BigQuery queries to conserve both processing and costs.
The majority of BigQuery queries are executed in what is called interactive mode. In fact, this is the default execution state for BigQuery queries. And the UI doesn't make the ability to change modes visible or obvious. To view or change a query's execution mode requires navigating to query settings.
Once in that view, you can configure your next query run. In addition to selecting a query mode, developers also have the option to choose how to save a query's results, being presented with options ranging from a temporary table, a new BigQuery table or overwriting an existing table's contents. Below these options is the menu for selecting batch vs. interactive.
While interactive mode executes queries instantly, batch mode allows users to −
- Queue desired BigQuery jobs.
- Run lower-priority queries without impacting higher-priority jobs (that likely consume more resources).
Running a batch mode job helps users circumvent a BigQuery query execution restriction: Users can run, at most, 20 concurrent queries.
If a batch job is competing with an in-progress interactive job for slots, the batch job will be put on hold or "queued" until space is available. This helps both conserve resources and avoid hitting the hard rate limits imposed on interactive queries.