Apache Tajo - Configuration Settings



Tajo's configuration is based on Hadoop's configuration system. This chapter explains the Tajo configuration settings in detail.

Basic Settings

Tajo uses the following two configuration files:

  • catalog-site.xml: configuration for the catalog server.
  • tajo-site.xml: configuration for the other Tajo modules.

Distributed Mode Configuration

Distributed mode runs on the Hadoop Distributed File System (HDFS). Follow the steps below to configure Tajo for distributed mode.

tajo-site.xml

This file is located in the /path/to/tajo/conf directory and holds the configuration for the Tajo modules other than the catalog server. To run Tajo in distributed mode, apply the following changes to tajo-site.xml.

<property> 
   <name>tajo.rootdir</name> 
   <value>hdfs://hostname:port/tajo</value> 
</property>
  
<property> 
   <name>tajo.master.umbilical-rpc.address</name> 
   <value>hostname:26001</value> 
</property> 
 
<property> 
   <name>tajo.master.client-rpc.address</name> 
   <value>hostname:26002</value> 
</property>
  
<property> 
   <name>tajo.catalog.client-rpc.address</name> 
   <value>hostname:26005</value> 
</property>   
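
The snippets in this chapter omit the enclosing root element. Because Tajo follows Hadoop's configuration format, the properties in a real file are wrapped in a single configuration element. A minimal sketch of a complete tajo-site.xml for distributed mode might look like the following. The host names namenode.example.com and master.example.com and the NameNode port 9000 are hypothetical placeholders, and the catalog server is assumed to run on the master host. Substitute the values for your own cluster.

```xml
<?xml version="1.0"?>
<!-- Sketch of tajo-site.xml; host names and the NameNode port are assumptions. -->
<configuration>
   <!-- Root directory for Tajo data on HDFS (hypothetical NameNode address) -->
   <property>
      <name>tajo.rootdir</name>
      <value>hdfs://namenode.example.com:9000/tajo</value>
   </property>

   <!-- RPC endpoints of the Tajo master (hypothetical host name) -->
   <property>
      <name>tajo.master.umbilical-rpc.address</name>
      <value>master.example.com:26001</value>
   </property>

   <property>
      <name>tajo.master.client-rpc.address</name>
      <value>master.example.com:26002</value>
   </property>

   <!-- Address clients use to reach the catalog server (assumed to be on the master host) -->
   <property>
      <name>tajo.catalog.client-rpc.address</name>
      <value>master.example.com:26005</value>
   </property>
</configuration>
```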

Master Node Configuration

Tajo uses HDFS as its primary storage type. Add the following property to tajo-site.xml.

<property> 
   <name>tajo.rootdir</name> 
   <value>hdfs://namenode_hostname:port/path</value> 
</property> 

Catalog Configuration

If you want to customize the catalog service, copy $path/to/Tajo/conf/catalog-site.xml.template to $path/to/Tajo/conf/catalog-site.xml and add any of the following settings as needed.

For example, to use the Hive catalog store with Tajo, configure it as follows:

<property> 
   <name>tajo.catalog.store.class</name> 
   <value>org.apache.tajo.catalog.store.HCatalogStore</value> 
</property> 

To store the catalog in MySQL, apply the following changes:

<property> 
   <name>tajo.catalog.store.class</name> 
   <value>org.apache.tajo.catalog.store.MySQLStore</value> 
</property> 

<property> 
   <name>tajo.catalog.jdbc.connection.id</name> 
   <value><mysql user name></value> 
</property>
 
<property> 
   <name>tajo.catalog.jdbc.connection.password</name> 
   <value><mysql user password></value> 
</property>
 
<property> 
   <name>tajo.catalog.jdbc.uri</name> 
   <value>jdbc:mysql://<mysql host name>:<mysql port>/<database name for tajo>?createDatabaseIfNotExist=true</value> 
</property> 
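
As an illustration, a filled-in catalog-site.xml for the MySQL store might look like the sketch below. Every concrete value here is an assumption made for the example: the user name tajo, the password changeme, the host mysql.example.com, and the database name tajo_catalog are hypothetical, and 3306 is simply MySQL's default port. The JDBC URI is written on a single line with no spaces around the equals sign, since it is passed to the driver as-is.

```xml
<?xml version="1.0"?>
<!-- Sketch of catalog-site.xml for a MySQL-backed catalog; all values are hypothetical. -->
<configuration>
   <property>
      <name>tajo.catalog.store.class</name>
      <value>org.apache.tajo.catalog.store.MySQLStore</value>
   </property>

   <!-- MySQL account used by the catalog server (hypothetical credentials) -->
   <property>
      <name>tajo.catalog.jdbc.connection.id</name>
      <value>tajo</value>
   </property>

   <property>
      <name>tajo.catalog.jdbc.connection.password</name>
      <value>changeme</value>
   </property>

   <!-- Hypothetical host and database; 3306 is the MySQL default port -->
   <property>
      <name>tajo.catalog.jdbc.uri</name>
      <value>jdbc:mysql://mysql.example.com:3306/tajo_catalog?createDatabaseIfNotExist=true</value>
   </property>
</configuration>
```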

Similarly, you can register the other catalog stores that Tajo supports in the configuration file.

Worker Configuration

By default, the TajoWorker stores temporary data on the local file system. The locations are defined in the tajo-site.xml file as a comma-separated list of directories:

<property> 
   <name>tajo.worker.tmpdir.locations</name> 
   <value>/disk1/tmpdir,/disk2/tmpdir,/disk3/tmpdir</value> 
</property> 

To increase the number of tasks that each worker can run, set the worker's resource properties as follows:

<property> 
   <name>tajo.worker.resource.cpu-cores</name> 
   <value>12</value> 
</property>
 
<property> 
   <name>tajo.task.resource.min.memory-mb</name> 
   <value>2000</value> 
</property>
  
<property> 
   <name>tajo.worker.resource.disks</name> 
   <value>4</value> 
</property> 

To run the Tajo worker in dedicated mode, set the following property:

<property> 
   <name>tajo.worker.resource.dedicated</name> 
   <value>true</value> 
</property> 