Apache Tajo - SQL Statements

In the previous chapter, you have understood how to create tables in Tajo. This chapter explains about the SQL statement in Tajo.

Create Table Statement

Before moving to create a table, create a text file “students.csv” in Tajo installation directory path as follows −

students.csv

Id	Name	Address	Age	Marks
1	Adam	23 New Street	21	90
2	Amit	12 Old Street	13	95
3	Bob	10 Cross Street	12	80
4	David	15 Express Avenue	12	85
5	Esha	20 Garden Street	13	50
6	Ganga	25 North Street	12	55
7	Jack	2 Park Street	12	60
8	Leena	24 South Street	12	70
9	Mary	5 West Street	12	75
10	Peter	16 Park Avenue	12	95

After the file has been created, move to the terminal and start the Tajo server and shell one by one.

Create Database

Create a new database using the following command −

Query

default> create database sampledb; 
OK

Connect to the database “sampledb” which is now created.

default> \c sampledb 
You are now connected to database "sampledb" as user “user1”.

Then, create a table in “sampledb” as follows −

Query

sampledb>  create external table mytable(id int,name text,address text,age int,mark int) 
   using text with('text.delimiter' = ',') location ‘file:/Users/workspace/Tajo/students.csv’;

Result

The above query will generate the following result.

OK

Here, the external table is created. Now, you just have to enter the file location. If you have to assign the table from hdfs then use hdfs instead of file.

Next, the “students.csv” file contains comma separated values. The text.delimiter field is assigned with ‘,’.

You have now created “mytable” successfully in “sampledb”.

Show Table

To show tables in Tajo, use the following query.

Query

sampledb> \d  
mytable  
sampledb> \d mytable

Result

The above query will generate the following result.

table name: sampledb.mytable 
table uri: file:/Users/workspace/Tajo/students.csv 
store type: TEXT 
number of rows: unknown 
volume: 261 B
Options: 
'timezone' = 'Asia/Kolkata' 
'text.null' = '\\N' 
'text.delimiter' = ','  
schema: 
id INT4 
name TEXT 
address TEXT 
age INT4 
mark INT4

List table

To fetch all the records in the table, type the following query −

Query

sampledb> select * from mytable;

Result

The above query will generate the following result.

Insert Table Statement

Tajo uses the following syntax to insert records in table.

Syntax

create table table1 (col1 int8, col2 text, col3 text);   
--schema should be same for target table schema  
Insert overwrite into table1 select * from table2;  
                     (or)  
Insert overwrite into LOCATION '/dir/subdir' select * from table;

Tajo’s insert statement is similar to the INSERT INTO SELECT statement of SQL.

Query

Let’s create a table to overwrite table data of an existing table.

sampledb> create table test(sno int,name text,addr text,age int,mark int);  
OK 
sampledb> \d

Result

The above query will generate the following result.

mytable 
test

Insert Records

To insert records in the “test” table, type the following query.

Query

sampledb> insert overwrite into test select * from mytable;

Result

The above query will generate the following result.

Progress: 100%, response time: 0.518 sec

Here, “mytable" records overwrite the “test” table. If you don’t want to create the “test” table, then straight away assign the physical path location as mentioned in an alternative option for insert query.

Fetch records

Use the following query to list out all the records in the “test” table −

Query

sampledb> select * from test;

Result

The above query will generate the following result.

This statement is used to add, remove or modify columns of an existing table.

To rename the table use the following syntax −

Alter table table1 RENAME TO table2;

Query

sampledb> alter table test rename to students;

Result

The above query will generate the following result.

OK

To check the changed table name, use the following query.

sampledb> \d  
mytable 
students

Now the table “test” is changed to “students” table.

Add Column

To insert new column in the “students” table, type the following syntax −

Alter table <table_name> ADD COLUMN <column_name> <data_type>

Query

sampledb> alter table students add column grade text;

Result

The above query will generate the following result.

OK

Set Property

This property is used to change the table’s property.

Query

sampledb> ALTER TABLE students SET PROPERTY 'compression.type' = 'RECORD',
   'compression.codec' = 'org.apache.hadoop.io.compress.Snappy Codec' ; 
OK

Here, compression type and codec properties are assigned.

To change the text delimiter property, use the following −

Query

ALTER TABLE students  SET PROPERTY ‘text.delimiter'=',';  
OK

Result

The above query will generate the following result.

sampledb> \d students  
table name: sampledb.students 
table uri: file:/tmp/tajo-user1/warehouse/sampledb/students 
store type: TEXT 
number of rows: 10 
volume: 228 B 
Options: 
'compression.type' = 'RECORD' 
'timezone' = 'Asia/Kolkata' 
'text.null' = '\\N' 
'compression.codec' = 'org.apache.hadoop.io.compress.SnappyCodec' 
'text.delimiter' = ','  
schema: 
id INT4 
name TEXT 
addr TEXT 
age INT4 
mark INT4 
grade TEXT

The above result shows that the table’s properties are changed using the “SET” property.

Select Statement

The SELECT statement is used to select data from a database.

The syntax for the Select statement is as follows −

SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...] 
   [FROM <table reference> [[AS] <table alias name>] [, ...]] 
   [WHERE <condition>] 
   [GROUP BY <expression> [, ...]] 
   [HAVING <condition>] 
   [ORDER BY <expression> [ASC|DESC] [NULLS (FIRST|LAST)] [, …]]

Where Clause

The Where clause is used to filter records from the table.

Query

sampledb> select * from mytable where id > 5;

Result

The above query will generate the following result.

The query returns the records of those students whose id is greater than 5.

Query

sampledb> select * from mytable where name = ‘Peter’;

Result

The above query will generate the following result.

Progress: 100%, response time: 0.117 sec 
 
  id,  name,  address,   age  
------------------------------- 
10,  Peter,  16 park avenue , 12

The result filters Peter’s records only.

Distinct Clause

A table column may contain duplicate values. The DISTINCT keyword can be used to return only distinct (different) values.

Syntax

SELECT DISTINCT column1,column2 FROM table_name;

Query

sampledb> select distinct age from mytable;

Result

The above query will generate the following result.

Progress: 100%, response time: 0.216 sec  
age 
------------------------------- 
13 
12

The query returns the distinct age of students from mytable.

Group By Clause

The GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups.

Syntax

SELECT column1, column2 FROM table_name WHERE [ conditions ] GROUP BY column1, column2;

Query

select age,sum(mark) as sumofmarks from mytable group by age;

Result

The above query will generate the following result.

age,  sumofmarks 
------------------------------- 
13,  145 
12,  610

Here, the “mytable" column has two types of ages — 12 and 13. Now the query groups the records by age and produces the sum of marks for the corresponding ages of students.

Having Clause

The HAVING clause enables you to specify conditions that filter which group results appear in the final results. The WHERE clause places conditions on the selected columns, whereas the HAVING clause places conditions on the groups created by the GROUP BY clause.

Syntax

SELECT column1, column2 FROM table1 GROUP BY column HAVING [ conditions ]

Query

sampledb> select age from mytable group by age  having  sum(mark) > 200;

Result

The above query will generate the following result.

age 
------------------------------- 
12

The query groups the records by age and returns the age when the condition result sum(mark) > 200.

Order By Clause

The ORDER BY clause is used to sort the data in ascending or descending order, based on one or more columns. The Tajo database sorts query results in ascending order by default.

Syntax

SELECT column-list FROM table_name  
[WHERE condition]  
[ORDER BY column1, column2, .. columnN] [ASC | DESC];

Query

sampledb> select * from mytable where mark > 60 order by name desc;

Result

The above query will generate the following result.

The query returns the names of those students in descending order whose marks are greater than 60.

Create Index Statement

The CREATE INDEX statement is used to create indexes in tables. Index is used for fast retrieval of data. Current version supports index for only plain TEXT formats stored on HDFS.

Syntax

CREATE INDEX [ name ] ON table_name ( { column_name | ( expression ) }

Query

create index student_index on mytable(id);

Result

The above query will generate the following result.

id 
———————————————

To view assigned index for the column, type the following query.

default> \d mytable  
table name: default.mytable 
table uri: file:/Users/deiva/workspace/Tajo/students.csv 
store type: TEXT 
number of rows: unknown  
volume: 307 B 
Options: 
   'timezone' = 'Asia/Kolkata' 
   'text.null' = '\\N' 
   'text.delimiter' = ','  
schema: 
id INT4 
name TEXT 
address TEXT 
age INT4 
mark INT4   
Indexes:  
"student_index" TWO_LEVEL_BIN_TREE (id ASC NULLS LAST )

Here, TWO_LEVEL_BIN_TREE method is used by default in Tajo.

Drop Table Statement

The Drop Table Statement is used to drop a table from the database.

Syntax

drop table table name;

Query

sampledb> drop table mytable;

To check if the table has been dropped from the table, type the following query.

sampledb> \d mytable;

Result

The above query will generate the following result.

ERROR: relation 'mytable' does not exist

You can also check the query using “\d” command to list out the available Tajo tables.