
- Apache Drill Tutorial
- Apache Drill - Home
- Apache Drill - Introduction
- Apache Drill - Fundamentals
- Apache Drill - Architecture
- Apache Drill - Installation
- Apache Drill - SQL Operations
- Apache Drill - Query using JSON
- Window Functions using JSON
- Querying Complex Data
- Data Definition Statements
- Apache Drill - Querying Data
- Querying Data using HBase
- Querying Data using Hive
- Apache Drill - Querying Parquet Files
- Apache Drill - JDBC Interface
- Apache Drill - Custom Function
- Apache Drill - Contributors
- Apache Drill Useful Resources
- Apache Drill - Quick Guide
- Apache Drill - Useful Resources
- Apache Drill - Discussion
Apache Drill - Querying Data
In this chapter, we will learn about how Apache Drill allows us to query plain text files such as CSV, TSV and PSV.
CSV File
Create a CSV file named “students.csv” as shown in the following program −
001,Adam,23 new street 002,Amit,12 old street 003,Bob,10 cross street 004,David,15 express avenue 005,Esha,20 garden street 006,Ganga,25 north street 007,Jack,2 park street 008,Leena,24 south street 009,Mary,5 west street 010,Peter,16 park avenue
After saving the file, you can return to the terminal again and type the following query to view that CSV file.
Query
0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/students.csv`;
Result
+-------------------------------------+ | columns | +-------------------------------------+ | ["001","Adam","23 new street"] | | ["002","Amit","12 old street"] | | ["003","Bob","10 cross street"] | | ["004","David","15 express avenue"] | | ["005","Esha","20 garden street"] | | ["006","Ganga","25 north street"] | | ["007","Jack","2 park street"] | | ["008","Leena","24 south street"] | | ["009","Mary","5 west street"] | | ["010","Peter","16 park avenue"] | +—————————————————————————————————————+
From this output we can conclude that, Apache Drill recognizes each row as an array of values and returns one column for each row.
Finding Columns[n]
The COLUMNS[n] syntax is used to return CSV rows in a column by the column format, where n starts from 0 to n-1.
Query
0: jdbc:drill:zk = local>select columns[0],columns[1],columns[2] from dfs.`/Users/../workspace/Drill-samples/students.csv`;
Result
+---------+---------+--------------------+ | EXPR$0 | EXPR$1 | EXPR$2 | +---------+---------+--------------------+ | 001 | Adam | 23 new street | | 002 | Amit | 12 old street | | 003 | Bob | 10 cross street | | 004 | David | 15 express avenue | | 005 | Esha | 20 garden street | | 006 | Ganga | 25 north street | | 007 | Jack | 2 park street | | 008 | Leena | 24 south street | | 009 | Mary | 5 west street | | 010 | Peter | 16 park avenue | +---------+---------+--------------------+
If you want to assign an alias name for columns, use the following query −
Query
0: jdbc:drill:zk = local>select columns[0] as ID,columns[1] as Name,columns[2] as Address from dfs.`/Users/../workspace/Drill-samples/students.csv`;
Result
+------+-------+--------------------+ | ID | Name | Address | +------+-------+--------------------+ | 001 | Adam | 23 new street | | 002 | Amit | 12 old street | | 003 | Bob | 10 cross street | | 004 | David | 15 express avenue | | 005 | Esha | 20 garden street | | 006 | Ganga | 25 north street | | 007 | Jack | 2 park street | | 008 | Leena | 24 south street | | 009 | Mary | 5 west street | | 010 | Peter | 16 park avenue | +------+-------+--------------------+
Create Table
Like in JSON, you can create table for plain text files. Following is a sample query −
0: jdbc:drill:zk = local> create table CSV as select * from dfs.`/Users/../workspace/Drill-samples/students.csv`;
Result
+----------+----------------------------+ | Fragment | Number of records written | +----------+----------------------------+ | 0_0 | 10 | +-----------+---------------------------+
To view the file contents, type the following query −
Query
0: jdbc:drill:zk = local> select * from CSV;
Result
+-------------------------------------+ | columns | +-------------------------------------+ | ["001","Adam","23 new street"] | | ["002","Amit","12 old street"] | | ["003","Bob","10 cross street"] | | ["004","David","15 express avenue"] | | ["005","Esha","20 garden street"] | | ["006","Ganga","25 north street"] | | ["007","Jack","2 park street"] | | ["008","Leena","24 south street"] | | ["009","Mary","5 west street"] | | ["010","Peter","16 park avenue"] | +-------------------------------------+
TSV File
Create a TSV file named “students.tsv” as shown in the following program −
ID Name Age Standard Remark 001 id "name is Adam" "for the age of 12" "studying sixth std" "Having good marks"
Now we can execute this TSV file in Apache Drill by using the following query −
Query
0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/student.tsv`;
Result
+-------------------------------------------------------------------------------------------------------------------+ | columns | +-------------------------------------------------------------------------------------------------------------------+ | [ "ID" , "Name" , "Age" , "Standard" , "Marks" , "Addr", "pincode"] | | ["001 id ", "name is adam", "for the age of 12", "studying sxith std\" \"Having good marks"] | +-------------------------------------------------------------------------------------------------------------------+
Create Table
As shown in the CSV file above, you can also create a table for the TSV file.
Query
0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/student.tsv`;
Result
+-----------+----------------------------+ | Fragment | Number of records written | +-----------+----------------------------+ | 0_0 | 2 | +-----------+----------------------------+ 1 row selected (0.347 seconds)
Query
0: jdbc:drill:zk = local> select * from TSV;
Result
+------------------------------------------------------------------------------------------------------------------+ | columns | +------------------------------------------------------------------------------------------------------------------+ | [ "ID" , "Name" , "Age" , "Standard" , "Marks" , "Addr", "pincode"] | | ["001 id ", "name is adam", "for the age of 12", "studying sxith std\" \"Having good marks"] | +------------------------------------------------------------------------------------------------------------------+
PSV (Pipe Separated Value) File
Create a psv file named “sample.tbl” as shown in the following program.
Tutorialspoint|Apache|Drill|article
Now we can execute this PSV file in Drill,
Query
0: jdbc:drill:zk = local> select * from dfs.`/Users/../workspace/Drill-samples/sample.tbl`;
Result
+------------------------------------------------+ | columns | +------------------------------------------------+ | ["Tutorialspoint","Apache","Drill","article"] | +------------------------------------------------+
Now, similar to the CSV and TSV files, try for yourself to create a table for PSV file.