Apache Drill - Querying Data



In this chapter, we will learn about how Apache Drill allows us to query plain text files such as CSV, TSV and PSV.

CSV File

Create a CSV file named “students.csv” as shown in the following program −

001,Adam,23 new street
002,Amit,12 old street
003,Bob,10 cross street
004,David,15 express avenue
005,Esha,20 garden street
006,Ganga,25 north street
007,Jack,2 park street
008,Leena,24 south street
009,Mary,5 west street
010,Peter,16 park avenue

After saving the file, you can return to the terminal again and type the following query to view that CSV file.

Query

0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/students.csv`;

Result

+-------------------------------------+
|              columns                 |
+-------------------------------------+
| ["001","Adam","23 new street"]      |
| ["002","Amit","12 old street"]      |
| ["003","Bob","10 cross street"]     |
| ["004","David","15 express avenue"] |
| ["005","Esha","20 garden street"]   |
| ["006","Ganga","25 north street"]   |
| ["007","Jack","2 park street"]      |
| ["008","Leena","24 south street"]   |
| ["009","Mary","5 west street"]      |
| ["010","Peter","16 park avenue"]    |
+—————————————————————————————————————+

From this output we can conclude that, Apache Drill recognizes each row as an array of values and returns one column for each row.

Finding Columns[n]

The COLUMNS[n] syntax is used to return CSV rows in a column by the column format, where n starts from 0 to n-1.

Query

0: jdbc:drill:zk = local>select columns[0],columns[1],columns[2] from
   dfs.`/Users/../workspace/Drill-samples/students.csv`;

Result

+---------+---------+--------------------+
|  EXPR$0 |  EXPR$1 |       EXPR$2       |
+---------+---------+--------------------+
|   001   |  Adam   | 23 new street      |
|   002   |  Amit   | 12 old street      |
|   003   |  Bob    | 10 cross street    |
|   004   |  David  | 15 express avenue  |
|   005   |  Esha   | 20 garden street   |
|   006   |  Ganga  | 25 north street    |
|   007   |  Jack   | 2 park street      |
|   008   |  Leena  | 24 south street    |
|   009   |  Mary   | 5 west street      |
|   010   |  Peter  | 16 park avenue     |
+---------+---------+--------------------+

If you want to assign an alias name for columns, use the following query −

Query

0: jdbc:drill:zk = local>select columns[0] as ID,columns[1] as Name,columns[2] as Address from
   dfs.`/Users/../workspace/Drill-samples/students.csv`;

Result

+------+-------+--------------------+
|  ID  | Name  |      Address       |
+------+-------+--------------------+
|  001 | Adam  | 23 new street      |
|  002 | Amit  | 12 old street      |
|  003 | Bob   | 10 cross street    |
|  004 | David | 15 express avenue  |
|  005 | Esha  | 20 garden street   |
|  006 | Ganga | 25 north street    | 
|  007 | Jack  | 2 park street      |
|  008 | Leena | 24 south street    |
|  009 | Mary  | 5 west street      |
|  010 | Peter | 16 park avenue     |
+------+-------+--------------------+

Create Table

Like in JSON, you can create table for plain text files. Following is a sample query −

0: jdbc:drill:zk = local> create table CSV as select * from
   dfs.`/Users/../workspace/Drill-samples/students.csv`;

Result

+----------+----------------------------+
| Fragment | Number of records written  |
+----------+----------------------------+
|   0_0    |            10              |
+-----------+---------------------------+

To view the file contents, type the following query −

Query

0: jdbc:drill:zk = local> select * from CSV;

Result

+-------------------------------------+
|              columns                |
+-------------------------------------+
| ["001","Adam","23 new street"]      |
| ["002","Amit","12 old street"]      |
| ["003","Bob","10 cross street"]     |
| ["004","David","15 express avenue"] |
| ["005","Esha","20 garden street"]   |
| ["006","Ganga","25 north street"]   | 
| ["007","Jack","2 park street"]      |
| ["008","Leena","24 south street"]   |
| ["009","Mary","5 west street"]      |
| ["010","Peter","16 park avenue"]    |
+-------------------------------------+

TSV File

Create a TSV file named “students.tsv” as shown in the following program −

   ID         Name               Age                 Standard Remark
001  id  "name is Adam" "for the age of 12" "studying sixth std" "Having good marks"

Now we can execute this TSV file in Apache Drill by using the following query −

Query

0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/student.tsv`;

Result

+-------------------------------------------------------------------------------------------------------------------+
|                                               columns                                                             |
+-------------------------------------------------------------------------------------------------------------------+
| [  "ID"   ,     "Name"    ,         "Age"      ,  "Standard"          ,     "Marks"         , "Addr", "pincode"] |
| ["001 id ", "name is adam", "for the age of 12", "studying sxith std\" \"Having good marks"]                      |
+-------------------------------------------------------------------------------------------------------------------+

Create Table

As shown in the CSV file above, you can also create a table for the TSV file.

Query

0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/student.tsv`;

Result

+-----------+----------------------------+
| Fragment  |  Number of records written |
+-----------+----------------------------+
|    0_0    |             2              |
+-----------+----------------------------+
1 row selected (0.347 seconds)

Query

0: jdbc:drill:zk = local> select * from TSV;

Result

+------------------------------------------------------------------------------------------------------------------+
|                                        columns                                                                   |
+------------------------------------------------------------------------------------------------------------------+
| [   "ID"  ,     "Name"    ,         "Age"      ,     "Standard"       ,      "Marks"        , "Addr", "pincode"] |
| ["001 id ", "name is adam", "for the age of 12", "studying sxith std\" \"Having good marks"]                     |
+------------------------------------------------------------------------------------------------------------------+

PSV (Pipe Separated Value) File

Create a psv file named “sample.tbl” as shown in the following program.

Tutorialspoint|Apache|Drill|article

Now we can execute this PSV file in Drill,

Query

0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/sample.tbl`;

Result

+------------------------------------------------+
|                   columns                      |
+------------------------------------------------+
| ["Tutorialspoint","Apache","Drill","article"]  |
+------------------------------------------------+

Now, similar to the CSV and TSV files, try for yourself to create a table for PSV file.

Advertisements