Apache Tajo - OpenStack Swift Integration


Swift is a distributed and consistent object/blob store. Swift offers cloud storage software so that you can store and retrieve lots of data with a simple API. Tajo supports Swift integration.

The following are the prerequisites of Swift Integration −

  • Swift
  • Hadoop


Add the following changes to the hadoop “core-site.xml” file −

   <description>File system implementation for Swift</description> 

   <description>Split size in KB</description> 

This will be used for Hadoop to access the Swift objects. After you made all the changes move to the Tajo directory to set Swift environment variable.


Open the Tajo configuration file and add set the environment variable as follows −

$ vi conf/tajo-env.h  
export TAJO_CLASSPATH = $HADOOP_HOME/share/hadoop/tools/lib/hadoop-openstack-x.x.x.jar 

Now, Tajo will be able to query the data using Swift.

Create Table

Let’s create an external table to access Swift objects in Tajo as follows −

default> create external table swift(num1 int, num2 text, num3 float) 
   using text with ('text.delimiter' = '|') location 'swift://bucket-name/table1';

After the table has been created, you can run the SQL queries.