BigQuery - Data Types



Understanding BigQuery's usage and interpretation of data types is essential when it comes to loading and querying data. As demonstrated in the schema chapter, every column loaded into BigQuery must have a defined, acceptable type.

BigQuery accepts many data types that other SQL dialects use, and also offers data types unique to BigQuery.

Common Data Types Across SQL Dialects

Data types that are common across SQL dialects include −

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN

BigQuery also facilitates special data types, like JSON arrays, which will be discussed in a later chapter.

Note − If a schema is not provided or a type is not specified during a load process, BigQuery will infer the type. Though, for integrations that rely on unpredictable upstream data, this is not necessarily a positive outcome.

Like partitioning and clustering columns, types must be specified at the time of table creation.

Alter Types Using the CAST() Function

It is possible to alter types (temporarily or permanently) within a query. To do this, use the CAST() function. CAST uses the following syntax −

CAST(column_name AS column_type)

For Instance −

CAST(id AS STRING)

Like specifying the wrong type, trying to force CAST an incompatible type can be frustrating and lead to infrastructure breaks.

For a more proven way to convert types, use SAFE_CAST(), which will return NULL for incompatible rows, instead of breaking entirely −

SAFE_CAST(id AS STRING)

Developing a solid understanding of how BigQuery will interpret a given input's type is essential to creating robust SQL queries.

BigQuery Data Types STRING

Among the most common data types SQL developers work with, STRING types in BigQuery are typically easy to identify. However, there are occasional quirks regarding a string's manipulation or interpretation.

Generally, a string type consists of alphanumeric characters. While a string type can include integer-like digits and symbols, if STRING is the specified type, this information will be stored like a conventional string.

One tricky situation new developers encounter is working with a row that contains an INTEGER or FLOAT value with a symbol, like in the case of currency.

While it might be assumed $5.00 is a FLOAT because of the decimal point, the U.S. dollar sign makes it a string. Therefore, when loading a row with a dollar sign, BigQuery will expect this to be defined in your schema as a STRING type.

DATE Data Type

A subset of the STRING type is the DATE data type.

  • Even though BigQuery has its own designation for date values, the date values themselves are represented as strings by default.
  • For those who work with Pandas in Python, this is similar to how date fields are represented as objects within a data frame.
  • There are a variety of functions dedicated solely to parsing STRING data.

STRING Functions

Notable STRING functions include −

  • LOWER() − Converts everything within a string to lowercase
  • UPPER() − The inverse of lower; converts values to uppercase
  • INITCAP() − Capitalizes only the first letter of each sentence; i.e. sentence case
  • CONCAT() − Combines string elements

Importantly, data types in BigQuery default to STRING. In addition to conventional string elements, this also includes lists for which a developer does not specify a REPEATED mode.

INTEGER & FLOAT

Chances are, if you're working with BigQuery on an enterprise (business) scale, you'll work with data involving numbers. This could be anything from attendance data to operating revenue.

In any case, a STRING data type does not make sense for these use cases. This is not just because such numbers don't have a currency symbol like a dollar sign or euro accompanying them, but because, to generate useful insights, it requires utilizing functions specific to numeric data.

Note The distinguishing mark between an INTEGER and FLOAT is simple: "."

In many SQL dialects, when it comes to specifying numeric values, developers are required to tell a SQL engine how many digits to expect. This is where conventional SQL gets designations like BIGINT.

BigQuery encodes FLOAT types as 64-bit entities. This is why, when you CAST() a column to a FLOAT type, you do so like this −

CAST(column_name AS FLOAT64) 

Notable INTEGER and FLOAT Functions

Other notable INTEGER and FLOAT functions include −

  • ROUND()
  • AVG()
  • MAX()
  • MIN()

Points to Note

An important caveat of the FLOAT type is that in addition to specifying period separated digits, the FLOAT type is also the default type for NULL.

If using SAFE_CAST(), it may be a good idea to include additional logic to convert any returned NULL values from FLOAT to a desired type.

Advertisements