Pre-Defined Data Types in Apache Cassandra

Introduction

Apache Cassandra is an open-source, distributed NoSQL database system known for managing large amounts of structured data. Knowing which data types it offers, and what each one stores, is essential to modeling data for it effectively.

This article walks through the built-in data types, the collection types, and user-defined types, with short examples along the way.

Built-In Data Types in Apache Cassandra

Apache Cassandra provides a wide range of pre-defined data types: numeric types, text types, date and time types, a counter type, and a few special-purpose types such as uuid and inet.

Numeric Types

Cassandra offers several numeric types to suit different computation needs: tinyint, smallint, int, bigint, decimal, float and double. The smallest integer type is tinyint, a signed 8-bit integer; the largest fixed-width integer type is bigint, a signed 64-bit integer.

The decimal type stores exact, variable-precision decimal numbers, while float (32-bit) and double (64-bit) store floating-point numbers that trade exactness for range and speed. Choosing the narrowest type that fits your values keeps storage compact and avoids rounding surprises.
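The ranges and the exact-versus-approximate distinction above can be checked with a little arithmetic. The following is a minimal sketch in plain Python, not Cassandra code; the helper names `signed_range` and `fits` are hypothetical and exist only for this illustration:

```python
from decimal import Decimal

# Bit widths of Cassandra's fixed-width signed integer types.
INTEGER_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}


def signed_range(bits):
    """Return (min, max) for a two's-complement signed integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """Hypothetical helper: would `value` fit in the named integer type?"""
    low, high = signed_range(INTEGER_BITS[type_name])
    return low <= value <= high


for name, bits in INTEGER_BITS.items():
    low, high = signed_range(bits)
    print(f"{name:<8} {low} .. {high}")

# Binary floating point cannot represent 0.1 exactly; a decimal type can.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
print(float_sum_is_exact)    # False
print(decimal_sum_is_exact)  # True
```

A value such as 128 does not fit in tinyint, so `fits("tinyint", 128)` returns False while `fits("smallint", 128)` returns True.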

Text Types

The text types store string data and are among the most frequently used types in a schema.

Text − This is the standard string type and holds characters encoded as UTF-8. A single value can in theory be as large as 2 GB, although in practice values should be kept much smaller. The type name in CQL is 'text'. For example, a CQL statement held in a Java string might be −

String exampleCode = "CREATE TABLE my_table ( "+ " id UUID PRIMARY KEY, "+ " name text, "+ " description text "+ ");";

Varchar − varchar is an alias for text: both store UTF-8 strings with the same limits, and Cassandra treats them identically. The type name in CQL is 'varchar'. For instance −

CREATE TABLE users ( id UUID PRIMARY KEY, name VARCHAR );

ASCII − Unlike text and varchar, the ascii type accepts only US-ASCII characters, and Cassandra rejects a value that contains anything else. It is useful when working with legacy systems or with data that needs strict control over character encoding. The type name in CQL is 'ascii'. For example −

CREATE TABLE users (
   id UUID PRIMARY KEY,
   name ASCII
);
INSERT INTO users (id, name) VALUES (uuid(), 'V Sharma');
SELECT * FROM users;

UUID − Universally Unique Identifier (UUID) columns store standardized 128-bit identifiers, suited to globally unique values such as session keys and transaction IDs. Cassandra has two UUID types, `uuid` and `timeuuid`. A 'uuid' column accepts any UUID, while a 'timeuuid' column accepts only time-based (version 1) UUIDs and is used where sorting in time order matters. Example −

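-- "source" is a hypothetical partition key added for illustration:
-- ORDER BY works only on clustering columns within a single partition.
CREATE TABLE events (
   source text,
   event_id timeuuid,
   event_name text,
   event_time timestamp,
   PRIMARY KEY (source, event_id)
);
INSERT INTO events (source, event_id, event_name, event_time) VALUES ('app', now(), 'Event 1', toTimestamp(now()));
INSERT INTO events (source, event_id, event_name, event_time) VALUES ('app', now(), 'Event 2', toTimestamp(now()));
INSERT INTO events (source, event_id, event_name, event_time) VALUES ('app', now(), 'Event 3', toTimestamp(now()));
-- Newest first: timeuuid values sort by the time embedded in them.
SELECT * FROM events WHERE source = 'app' ORDER BY event_id DESC;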
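To see why a time-based UUID can be ordered by time while a random one cannot, it helps to look inside the value. The sketch below uses Python's standard uuid module rather than Cassandra itself; `timeuuid_to_datetime` is a hypothetical helper written for this illustration:

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


random_id = uuid.uuid4()  # random UUID, the usual content of a uuid column
time_id = uuid.uuid1()    # time-based UUID, the kind a timeuuid column requires

print(random_id.version)  # 4
print(time_id.version)    # 1
print(timeuuid_to_datetime(time_id))  # creation time of time_id, in UTC
```

Passing `random_id` to `timeuuid_to_datetime` raises ValueError, which mirrors the fact that only version 1 UUIDs carry a creation time.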

Inet − This type stores an IP address in either IPv4 or IPv6 format. For example −

CREATE TABLE users ( id UUID PRIMARY KEY, name text, ip_address inet );

Date Types

Date and time types give you precise control over how dates and times are represented in your data, which matters most in time-series and other time-based data models.

Date − It represents a calendar day without a time of day. Date literals are written as strings in the form 'yyyy-mm-dd'. To set a date value, you could write −

`INSERT INTO table_name (column1) VALUES ('2022-10-14');`

Time − This type stands for a time of day without reference to a particular date. It is stored as the number of nanoseconds since midnight, ranging from 0 to 86399999999999. For instance −

`INSERT INTO table_name (column1) VALUES ('12:30:00');`
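The nanoseconds-since-midnight representation is easy to verify by hand. A minimal sketch of the arithmetic in plain Python follows; the function name `nanos_since_midnight` is an assumption made for this example, not a Cassandra function:

```python
NANOS_PER_SECOND = 1_000_000_000


def nanos_since_midnight(hours, minutes, seconds=0, nanos=0):
    """Convert a wall-clock time to nanoseconds since midnight."""
    return ((hours * 60 + minutes) * 60 + seconds) * NANOS_PER_SECOND + nanos


half_past_noon = nanos_since_midnight(12, 30)
last_instant = nanos_since_midnight(23, 59, 59, 999_999_999)

print(half_past_noon)  # 45000000000000
print(last_instant)    # 86399999999999
```

The last representable instant of the day matches the upper bound quoted above.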

Timestamp − Often used for time-series data modeling, this pre-defined data type keeps track of when specific events occurred down to the millisecond. An example would be −

`INSERT INTO table_name (column1) VALUES ('2022-10-14 12:30');`

Duration − This type measures a span of time as three separate components: months, days, and nanoseconds. Keeping the components separate lets it represent spans such as one month exactly, since months and days do not have a fixed length.
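As a rough illustration of the three-component model, the sketch below folds years into months and hours into nanoseconds while leaving days on their own. It is plain Python; the `Duration` tuple and `make_duration` helper are hypothetical stand-ins, not part of any Cassandra driver:

```python
from typing import NamedTuple

NANOS_PER_HOUR = 3_600 * 1_000_000_000


class Duration(NamedTuple):
    """Hypothetical stand-in for a duration value: three independent parts."""
    months: int
    days: int
    nanoseconds: int


def make_duration(years=0, months=0, days=0, hours=0):
    """Fold years into months and hours into nanoseconds; days stay separate."""
    return Duration(years * 12 + months, days, hours * NANOS_PER_HOUR)


span = make_duration(years=1, months=2, days=10, hours=12)
print(span)  # Duration(months=14, days=10, nanoseconds=43200000000000)
```

Days are not converted to nanoseconds and months are not converted to days, because neither conversion has a single correct answer.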

Counter Type

The Counter data type in Apache Cassandra allows you to perform increment and decrement operations on a counter column. Here's the syntax and code for using the Counter type −

To create a table with a Counter column, you need to specify the data type as 'counter' in the column definition. For example −

CREATE TABLE my_table ( id UUID PRIMARY KEY, counter_column COUNTER );

To increment or decrement the value of a counter column, use an UPDATE statement that adds to or subtracts from the column's current value. Recent Cassandra versions also accept the shorthand '+= n' and '-= n'.

To increment the value of a counter column −

UPDATE my_table SET counter_column = counter_column + 1 WHERE id = 123e4567-e89b-12d3-a456-426614174000;

To decrement the value of a Counter column −

UPDATE my_table SET counter_column = counter_column - 1 WHERE id = 123e4567-e89b-12d3-a456-426614174000;

Keep in mind that counter values are not guaranteed to be exact: counter updates are not idempotent, so an update that times out and is retried may be applied twice. Retrieving the value of a counter column works the same way as for regular columns. For example −

SELECT counter_column FROM my_table WHERE id = 123e4567-e89b-12d3-a456-426614174000;
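One reason counter values can drift is that an increment, unlike an overwrite, gives a different result when it is applied twice. The following plain-Python sketch simulates a client that retries a write it wrongly believes was lost; the function names are hypothetical and nothing here talks to Cassandra:

```python
def apply_increment(current, delta):
    """A counter update: the result depends on the value already stored."""
    return current + delta


def apply_overwrite(current, new_value):
    """A regular column write: the result ignores the value already stored."""
    return new_value


# The first attempt was applied, but the client never saw the acknowledgement
# and sent the same request again.
counter = 10
for _attempt in range(2):
    counter = apply_increment(counter, 1)

regular = 10
for _attempt in range(2):
    regular = apply_overwrite(regular, 11)

print(counter)  # 12, one more than intended
print(regular)  # 11, the retry was harmless
```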

Collection Data Types in Apache Cassandra

In Apache Cassandra, collection data types provide a way to store multiple values within a single column. The three main collection data types in Cassandra are sets, lists, and maps.

Sets are collections of unique elements, which Cassandra returns in sorted order, while lists keep their elements in order and allow duplicates. Maps consist of key-value pairs with unique keys, where keys and values can be of almost any data type supported by Cassandra.
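The difference between the three collection types shows up in what comes back after duplicate values are written. Here is a small analogy using Python's own collections, offered as a sketch of the semantics rather than of Cassandra's implementation; the variable names stand for hypothetical columns:

```python
# Values written to three hypothetical columns, in the order the client sent them.
written_tags = ["urgent", "billing", "urgent"]
written_scores = [7, 3, 7]
written_prefs = [("theme", "dark"), ("lang", "en"), ("theme", "light")]

# set: duplicates collapse, and elements come back sorted.
tags = sorted(set(written_tags))

# list: order and duplicates are both preserved.
scores = list(written_scores)

# map: one value per key, a later write to a key replaces the earlier one,
# and entries come back sorted by key.
prefs = dict(sorted(dict(written_prefs).items()))

print(tags)    # ['billing', 'urgent']
print(scores)  # [7, 3, 7]
print(prefs)   # {'lang': 'en', 'theme': 'light'}
```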

User-Defined Data Types in Apache Cassandra

User-defined data types (UDTs) in Apache Cassandra allow you to create your own custom data structures, which can be used as column types in your database schema. This provides the flexibility to model complex and nested data relationships within a single column.

To define a UDT, you need to specify the name of the type and its fields along with their corresponding data types. For example −

CREATE TYPE address (
   street text,
   city text,
   state text,
   zip int
);

In this example, we define a UDT called "address" that consists of four fields: street (text), city (text), state (text), and zip (int).

Once a UDT is defined, it can be used as a column type when creating tables. Here's an example −

CREATE TABLE users (
   id UUID PRIMARY KEY,
   name text,
   email text,
   home_address frozen<address>
);

In this table definition, the column "home_address" has the UDT "address" as its type. The keyword "frozen" tells Cassandra to treat the UDT as a single atomic value, so the whole address is written and replaced as one unit.

When inserting or updating rows in this table, you provide the "home_address" value as a UDT literal: a list of field: value pairs inside curly braces. For example −

INSERT INTO users (id, name, email, home_address)
VALUES
(uuid(), 'John Doe', 'john.doe@example.com',
{street: '123 Main St', city: 'New York', state: 'NY', zip: 10001});

You can also nest UDTs within other UDTs or even use them in collections like sets, lists or maps.

Overall, user-defined data types in Apache Cassandra offer great flexibility for modeling complex data structures in your database schema. By defining your own custom types, you can create more meaningful and intuitive representations of your data, making it easier to work with and query.

Conclusion

Understanding the pre-defined data types in Apache Cassandra is crucial for effective data modelling and storage. By leveraging these built-in data types such as numeric, text, date, and counter types, along with collections and user-defined types, developers can create flexible and scalable architectures that meet their specific requirements.

With Apache Cassandra's high performance and fault-tolerant features, it remains a leading choice for building distributed systems and handling large-scale data storage needs. Take advantage of the extensive querying capabilities provided by this NoSQL database to optimize your application's performance even further.

sudhir sharma

Updated on: 31-Jan-2024
