
Apache Tajo - Configuration Settings
Tajo’s configuration is based on Hadoop’s configuration system. This chapter explains Tajo configuration settings in detail.
Basic Settings
Tajo uses the following two config files −
- catalog-site.xml − configuration for the catalog server.
- tajo-site.xml − configuration for other Tajo modules.
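Because both files follow Hadoop's configuration format, each setting shown in this chapter is a property element nested inside a single configuration root element. A minimal sketch of that layout, using a placeholder value:

```xml
<?xml version="1.0"?>
<!-- Sketch of a Hadoop-style configuration file such as tajo-site.xml. -->
<configuration>
  <!-- Each setting is one name/value pair. -->
  <property>
    <name>tajo.rootdir</name>
    <!-- Placeholder value; replace it with your own HDFS URI. -->
    <value>hdfs://hostname:port/tajo</value>
  </property>
</configuration>
```

The property snippets in the rest of this chapter go between the opening and closing configuration tags.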
Distributed Mode Configuration
In distributed mode, Tajo runs on top of the Hadoop Distributed File System (HDFS). The following steps configure a distributed setup.
tajo-site.xml
This file is located in the /path/to/tajo/conf directory and holds the configuration for the Tajo modules other than the catalog server. To run Tajo in distributed mode, apply the following changes to “tajo-site.xml”.
<property>
   <name>tajo.rootdir</name>
   <value>hdfs://hostname:port/tajo</value>
</property>
<property>
   <name>tajo.master.umbilical-rpc.address</name>
   <value>hostname:26001</value>
</property>
<property>
   <name>tajo.master.client-rpc.address</name>
   <value>hostname:26002</value>
</property>
<property>
   <name>tajo.catalog.client-rpc.address</name>
   <value>hostname:26005</value>
</property>
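As an illustration, here is the same set of properties with the placeholders filled in. The host names namenode.example.com and tajo-master.example.com and the HDFS port 9000 are hypothetical. The sketch assumes that the root directory points at the HDFS NameNode, while the RPC addresses point at the host running the Tajo master and the catalog server.

```xml
<!-- Hypothetical values for illustration only; substitute your own hosts and ports. -->
<property>
  <name>tajo.rootdir</name>
  <!-- Assumed NameNode host and port. -->
  <value>hdfs://namenode.example.com:9000/tajo</value>
</property>
<property>
  <name>tajo.master.umbilical-rpc.address</name>
  <!-- Assumed Tajo master host. -->
  <value>tajo-master.example.com:26001</value>
</property>
<property>
  <name>tajo.master.client-rpc.address</name>
  <value>tajo-master.example.com:26002</value>
</property>
<property>
  <name>tajo.catalog.client-rpc.address</name>
  <!-- Assumes the catalog server runs on the master host. -->
  <value>tajo-master.example.com:26005</value>
</property>
```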
Master Node Configuration
Tajo uses HDFS as its primary storage type. The configuration is as follows and should be added to “tajo-site.xml”.
<property>
   <name>tajo.rootdir</name>
   <value>hdfs://namenode_hostname:port/path</value>
</property>
Catalog Configuration
If you want to customize the catalog service, copy $path/to/Tajo/conf/catalog-site.xml.template to $path/to/Tajo/conf/catalog-site.xml and add any of the following settings as needed.
For example, to use the Hive catalog store with Tajo, the configuration should look like the following −
<property>
   <name>tajo.catalog.store.class</name>
   <value>org.apache.tajo.catalog.store.HCatalogStore</value>
</property>
If you need to store the catalog in MySQL, apply the following changes −
<property>
   <name>tajo.catalog.store.class</name>
   <value>org.apache.tajo.catalog.store.MySQLStore</value>
</property>
<property>
   <name>tajo.catalog.jdbc.connection.id</name>
   <value><mysql user name></value>
</property>
<property>
   <name>tajo.catalog.jdbc.connection.password</name>
   <value><mysql user password></value>
</property>
<property>
   <name>tajo.catalog.jdbc.uri</name>
   <value>jdbc:mysql://<mysql host name>:<mysql port>/<database name for tajo>?createDatabaseIfNotExist=true</value>
</property>
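For reference, a filled-in version of the MySQL settings might look like the sketch below. The user name, password, host name, and database name are all hypothetical, and 3306 is simply the MySQL default port.

```xml
<!-- Hypothetical credentials and host; replace them with your own values. -->
<property>
  <name>tajo.catalog.store.class</name>
  <value>org.apache.tajo.catalog.store.MySQLStore</value>
</property>
<property>
  <name>tajo.catalog.jdbc.connection.id</name>
  <value>tajo_user</value>
</property>
<property>
  <name>tajo.catalog.jdbc.connection.password</name>
  <value>change_me</value>
</property>
<property>
  <name>tajo.catalog.jdbc.uri</name>
  <!-- The URI must not contain spaces. -->
  <value>jdbc:mysql://mysql.example.com:3306/tajo_catalog?createDatabaseIfNotExist=true</value>
</property>
```

If you append further query parameters to the JDBC URI, remember that the & separator has to be written as &amp; inside an XML file.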
Similarly, you can register any of the other catalog stores that Tajo supports in the configuration file.
Worker Configuration
By default, the TajoWorker stores temporary data on the local file system. The directories are given as a comma-separated list in the “tajo-site.xml” file as follows −
<property>
   <name>tajo.worker.tmpdir.locations</name>
   <value>/disk1/tmpdir,/disk2/tmpdir,/disk3/tmpdir</value>
</property>
To let each worker run more tasks concurrently, adjust its resource settings as follows −
<property>
   <name>tajo.worker.resource.cpu-cores</name>
   <value>12</value>
</property>
<property>
   <name>tajo.task.resource.min.memory-mb</name>
   <value>2000</value>
</property>
<property>
   <name>tajo.worker.resource.disks</name>
   <value>4</value>
</property>
To run the Tajo worker in dedicated mode, use the following configuration −
<property>
   <name>tajo.worker.resource.dedicated</name>
   <value>true</value>
</property>