
- BigQuery - Home
- BigQuery - Overview
- BigQuery - Initial Setup
- BigQuery vs Local SQL Engines
- BigQuery - Google Cloud Console
- BigQuery - Google Cloud Hierarchy
- What is Dremel?
- What is BigQuery Studio?
- BigQuery - Datasets
- BigQuery - Tables
- BigQuery - Views
- BigQuery - Create Table
- BigQuery - Basic Schema Design
- BigQuery - Alter Table
- BigQuery - Copy Table
- Delete and Recover Table
- BigQuery - Populate Table
- Standard SQL vs Legacy SQL
- BigQuery - Write First Query
- BigQuery - CRUD Operations
- Partitioning & Clustering
- BigQuery - Data Types
- BigQuery - Complex Data Types
- BigQuery - STRUCT Data Type
- BigQuery - ARRAY Data Type
- BigQuery - JSON Data Type
- BigQuery - Table Metadata
- BigQuery - User-defined Functions
- Connecting to External Sources
- Integrate Scheduled Queries
- Integrate BigQuery API
- BigQuery - Integrate Airflow
- Integrate Connected Sheets
- Integrate Data Transfers
- BigQuery - Materialized View
- BigQuery - Roles & Permissions
- BigQuery - Query Optimization
- BigQuery - BI Engine
- Monitoring Usage & Performance
- BigQuery - Data Warehouse
- Challenges & Best Practices
BigQuery - Google Cloud Hierarchy
Before proceeding, it's important to grasp the fundamental concepts and vocabulary relating to BigQuery and its associated processes.
First, it's important to understand that even though cloud computing offers nearly limitless processing capabilities, BigQuery users will encounter issues if they are required to do the following activities −
- Performing a computationally-taxing SQL operation like a cross join or cartesian join.
- Attempting to run a large query without specifying a destination table.
- Running a large query at times of peak usage (if using BigQuery as an enterprise user).
- On-demand or "ad hoc" queries can and will create gridlock, especially if competing with scheduled processes for execution slots.
Google Cloud Hierarchy
If you anticipate creating and populating data sources within BigQuery, it is important to note the Google Cloud hierarchy −
- Organization
- Project
- Dataset
- Table
1. Organization Layer
Unless you are an account owner, executive or decision maker, you're unlikely to have to worry about the organization layer. Think of it as the entity that contains the other elements you'll encounter when navigating BigQuery Studio and writing SQL queries within the SQL environment.
2. Multiple Projects Inside Google Cloud Organization
Any Google Cloud organization can have multiple projects. Sometimes companies or enterprise users (we'll purposefully avoid the term "organization" here to avoid confusion) create different projects to separate staging and production environments.
Other times, these high-level users create distinct projects to have more control over potentially sensitive data like personally identifiable information (PII) and confidential revenue information.
In either case, when you begin using BigQuery, you'll create or receive permissions to access BigQuery as a user with specific permissions and role scopes.
3. Datasets and Tables within Project
Within the project, the most important entities to remember are datasets and tables. To clarify, a dataset contains a table or multiple tables. In order to maintain accuracy in technical discussions, do your best to avoid using these terms interchangeably.
Other elements you'll see within a dataset include −
- Routines
- Models
- Views
These additional data elements will be discussed in more depth in the following chapters.