Apache Drill - SQL Operations



Apache Drill is an open-source SQL-On-Everything engine. It allows SQL queries to be executed on any kind of data source, ranging from a simple CSV file to an advanced SQL and NoSQL database servers.

To execute a query in a Drill shell, open your terminal move to the Drill installed directory and then type the following command.

$ bin/drill-embedded

Then you will see the response as shown in the following program.

0: jdbc:drill:zk = local>

Now you can execute your queries. Otherwise you can run your queries through web console application to the url of http://localhost:8047. If you need any help info type the following command.

$ 0: jdbc:drill:zk = local> !help

Primitive Data Types

Apache Drill supports the following list of data types.

Sr.No Datatype & Description
1

BIGINT

8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

2

BINARY

Variable-length byte string

3

BOOLEAN

True or false

4

DATE

Years, months, and days in YYYY-MM-DD format since 4713 BC

5

DECIMAL(p,s), DEC(p,s), or NUMERIC(p,s)*

38-digit precision number, precision is p, and scale is s

6

INTEGER or INT

4-byte signed integer in the range -2,147,483,648 to 2,147,483,647

7

INTERVAL

A day-time or year-month interval

8

SMALLINT

2-byte signed integer in the range -32,768 to 32,767

9

FLOAT

4-byte floating point number

10

DOUBLE

8-byte floating point number

11

TIME

24-hour based time in hours, minutes, seconds format: HH:mm:ss

12

TIMESTAMP

JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS

13

CHARACTER VARYING, CHAR or VARCHAR

variable-length string. The default limit is 1 character. The maximum character limit is 2,147,483,647.

Let us continue with simple examples on the data types.

Date, Time and Timestamp

Apache Drill supports time functions in the range from 1971 to 2037. The processing logic of data types can be easily tested by “VALUES()” statement. The following query returns date, time and timestamp for the given values.

Query

0: jdbc:drill:zk = local> select DATE '2016-04-07',
   TIME '12:12:23',TIMESTAMP '2016-04-07 12:12:23' from (values(1));

Result

+-------------+-----------+------------------------+
|   EXPR$0    |  EXPR$1   |         EXPR$2         |
+-------------+-----------+------------------------+
| 2016-04-07  | 12:12:23  | 2016-04-07 12:12:23.0  |
+-------------+-----------+------------------------+

Interval

  • The INTERVALYEAR and INTERVALDAY internal types represent a period of time.
  • The INTERVALYEAR type specifies values from a year to a month.
  • The INTERVALDAY type specifies values from a day to seconds.

INTERVALYEAR Query

0: jdbc:drill:zk = local> select timestamp '2016-04-07 12:45:50' +
   interval '10' year(2) from (values(1));

Result

+------------------------+
|         EXPR$0         |
+------------------------+
| 2026-04-07 12:45:50.0  |
+———————————————————————-+
1 row selected (0.251 seconds)

In the above query, INTERVAL keyword followed by 10 adds 10 years to the timestamp. The 2 in parentheses in YEAR(2) specifies the precision of the year interval, 2 digits in this case to support the ten interval.

INTERVALDAY Query

0: jdbc:drill:zk = local> select timestamp '2016-04-07 12:45:52' +
   interval '1' day(1) from (values(1));

Result

+------------------------+
|         EXPR$0         |
+------------------------+
| 2016-04-08 12:45:52.0  |
+———————————————————————-+

Here INTERVAL ‘1’ indicates that two days will be added from that specified day.

Operators

The following operators are used in Apache Drill to perform the desired operations.

Sr.No Operators & Description
1

Logical Operators

AND, BETWEEN, IN , LIKE , NOT , OR

2

Comparison Operators

<, > , <= , >= , = , <> , IS NULL , IS NOT NULL , IS FALSE , IS NOT FALSE , IS TRUE , IS NOT TRUE, Pattern Matching Operator - LIKE

3

Math Operators

+,-,*,/

4

Subquery Operators

EXISTS, IN

Drill Scalar Functions

Apache Drill scalar functions supports Math and Trig functions. Most scalar functions use data types such as INTEGER, BIGINT, FLOAT and DOUBLE.

Math Functions

The following table describes the list of “Math functions” in Apache Drill.

Sr.No Function & Description
1

ABS(x)

Returns the absolute value of the input argument x.

2

CBRT(x)

Returns the cubic root of x.

3

CEIL(x)

Returns the smallest integer not less than x.

4

CEILING(x)

Same as CEIL.

5

DEGREES(x)

Converts x radians to degrees.

6

EXP(x)

Returns e (Euler's number) to the power of x.

7

FLOOR(x)

Returns the largest integer not greater than x.

8

LOG(x, y)

Returns log base x to the y power.

Now let’s run queries for the scalar functions. The Drill scalar functions can be easily tested by the values() statement, otherwise you can also use the select statement.

Trig Functions

Apache Drill supports the following trig functions and these functions’ return type is a floating point value.

Sr.No Function & Description
1

SIN(x)

Sine of angle x in radians

2

ACOS(x)

Inverse cosine of angle x in radians

3

COSH()

Hyperbolic cosine of hyperbolic angle x in radians

Data Type Conversion

In Apache Drill, you can cast or convert data to the required type for moving data from one data source to another. Drill also supports the following functions for casting and converting data types −

Function Return type Description

CAST(x AS y)

Data type of y Converts the data type of x to y

CONVERT_TO(x,y)

Data type of y Converts binary data (x) to Drill internal types (y) based on the little or big endian encoding of the data.

CONVERT_FROM(x,y)

Data type of y Converts binary data (x) from Drill internal types (y) based on the little or big endian encoding of the data.

Date - Time Functions

Apache Drill supports time functions based on the Gregorian calendar and in the range from 1971 to 2037. The following table describes the list of Date/Time functions.

Function Return Type Description

AGE(x [, y ] )

INTERVALDAY

or

INTERVALYEAR

Returns interval between two timestamps or subtracts a timestamp from midnight of the current date.

CURRENT_DATE

DATE Returns current date

CURRENT_TIME

TIME Returns current time

CURRENT_TIMESTAMP

TIMESTAMP Returns current timestamp

DATE_SUB(x,y)

DATE, TIMESTAMP Subtracts an interval (y) from a date or timestamp expression (x).

EXTRACT(x FROM y)

DOUBLE

Extracts a time unit from a date or timestamp expression (y).

This must be one of the following values: SECOND, MINUTE, HOUR, DAY, MONTH, and YEAR.

LOCALTIME

TIME Returns the local current time.

LOCALTIMESTAMP

TIMESTAMP Returns the local current timestamp.

NOW()

TIMESTAMP Returns current timestamp

TIMEOFDAY()

VARCHAR Returns current timestamp for UTC time zone.

String Manipulation Function

Apache Drill supports the following list of string functions.

Function Return type Description

BYTE_SUBSTR(x,y [, z ] )

BINARY or VARCHAR Returns in binary format a substring y of the string x.

CHAR_LENGTH(x)

INTEGER Returns the length of the alphanumeric argument x.

CONCAT(x,y)

VARCHAR Combines the two alphanumeric values x and y. Has the same effect as the || operator.

INITCAP(x)

VARCHAR Returns x in which the first character is capitalized.

LENGTH(x)

INTEGER Returns the length in bytes of the alphanumeric value x.

LOWER(x)

VARCHAR Converts all upper-case letters of x to lower-case letters.

LPAD(x,y [ , z ] )

VARCHAR

The value of x is filled in the front (the left-hand side) with the value of z until the total length of the value is equal y’s length.

If no z value then blanks are used to fill the position.

LTRIM(x)

VARCHAR Removes all blanks that appear at the beginning of x.

POSITION( x IN y)

INTEGER Returns the start position of the string x in the string y.

REGEXP_REPLACE(x,y,z)

VARCHAR Substitutes new text for substrings that match Java regular expression patterns. In the string x, y is replaced by z. Y is the regular expression.

RPAD(x,y,z)

VARCHAR The value of x is filled in the front (the right-hand side) with the value of z just until the total length of the value is equal to that of y.

RTRIM(x)

VARCHAR Removes all blanks from the end of the value of x.

STRPOS(x,y)

INTEGER Returns the start position of the string y in the string x.

SUBSTR(x,y,z)

VARCHAR Extracts characters from position 1 - x of x an optional y times.

UPPER(x)

VARCHAR Converts all lower-case letters of x to upper-case letters.

Null Handling Function

Apache Drill supports the following list of null handling functions.

Function Return type Description

COALESCE(x, y [ , y ]... )

Data type of y Returns the first non-null argument in the list of y’s.

NULLIF(x,y )

Data type of y Returns the value of the x if x and y are not equal, and returns a null value if x and y are equal.

The following table describes the list of “Math functions” in Apache Drill.

Sr.No Function & Description
1

E()

Returns 2.718281828459045.

2

LOG(x)

Returns the natural log (base e) of x.

3

LOG10(x)

Returns the common log of x.

4

LSHIFT(x, y)

Shifts the binary x by y times to the left.

5

MOD(x, y)

Returns the remainder of x divided by y.

6

NEGATIVE(x)

Returns x as a negative number.

7

PI

Returns pi.

8

POW(x, y)

Returns the value of x to the y power.

9

RADIANS

Converts x degrees to radians.

10

RAND

Returns a random number from 0-1.

11

ROUND(x)

Rounds to the nearest integer.

12

ROUND(x, y)

Rounds x to y decimal places. Return type is decimal.

13

RSHIFT(x, y)

Shifts the binary x by y times to the right.

14

SIGN(x)

Returns the sign of x.

15

SQRT(x)

Returns the square root of x.

16

TRUNC(x)

Apache Drill supports the following trig functions and these functions’ return type is a floating point value.

Sr.No. Function & Description
1

COS(x)

Cosine of angle x in radians

2

TAN(x)

Tangent of angle x in radians

3

ASIN(x)

Inverse sine of angle x in radians

4

ATAN(x)

Inverse tangent of angle x in radians

5

SINH()

Hyperbolic sine of hyperbolic angle x in radians

6

TANH()

Hyperbolic tangent of hyperbolic angle x in radians

The following table describes the list of Date/Time functions.

Function Return type Description

DATE_ADD(x,y)

DATE, TIMESTAMP

Returns the sum of the sum of a date/time and a number of days/hours, or of a date/time and date/time interval.

Where,

x - date, time or timestamp

y - integer or an interval expression.

DATE_PART(x,y)

DOUBLE

Returns a field of a date, time, timestamp, or interval.

where,

x - year, month, day, hour, minute, or second

y - date, time, timestamp, or interval literal

UNIX_TIMESTAMP ( [ x] )

BIGINT

If x is specified as timestamp then the number of seconds since the UNIX epoch and the timestamp x is returned.

If x is not specified then it returns the number of seconds since the UNIX epoch (January 1, 1970 at 00:00:00).

Advertisements