BigQuery - Complex Data Types



BigQuery, in addition to supporting "regular" data types like STRING, INTEGER and BOOLEAN, also provides support for so-called complex data. Often this is also referred to as nested data, since the data does not fit into a conventionally flat table and must live in a subset of a column.

Complex Data Structures Are Common

Support for nested schemas allows for more streamlined loading processes. Although Google Cloud lists various tutorials regarding the following data types as "advanced", nested data is very common.

Knowing how to flatten and work with these data types is a marketable skill for any SQL-oriented developer.

The reason these data structures are common are because source data, like a JSON output from an API, often returns data in this format −

[{data: "id": '125467", "name": "Acme Inc.", "locations": 
   {"store_no": 4, "employee_count": 15}}]
  • Notice how both "id" and "name" are on the same level of reference. These could each be accessed using "data" as a key.
  • However, to get fields like "store_no" and "employee_count", it would be necessary to not only access the data key, but also to flatten the "locations" array.

This is where BigQuery's complex data type support is helpful. Instead of a data engineer needing to write a script that would iterate through and unnest "locations", this data could be loaded into a BigQuery table as is.

Complex Data Structures in BigQuery

BigQuery supports the following three types of complex or nested data −

  • STRUCT
  • ARRAY
  • JSON

Strategies for handling these types will be explained in the next chapters.

Advertisements