Apache Oozie - Property File



Oozie workflows can be parameterized. The parameters come from a configuration file called a property file. The same workflow can run multiple jobs by supplying a different .properties file for each job.

Suppose we want to change the JobTracker URL, a script name, or the value of a parameter. Instead of editing the workflow itself, we can put these values in a config file (.properties) and pass it when running the workflow.

Property File

Variables such as ${nameNode} can be used within the workflow definition. At run time, each variable is replaced with the value defined for it in the .properties file.
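The substitution described above can be imitated with a few lines of shell. This is only an illustration of the idea, not how Oozie implements it: Oozie resolves the variables itself when the job is submitted. The helper name lookup and the temporary directory are assumptions made for this sketch.

```shell
#!/bin/sh
# Illustration only: mimics how a ${name} placeholder is resolved from a
# .properties file. Oozie does this itself; no Oozie command is run here.

workdir=$(mktemp -d)

# Sample property file (values are the tutorial's own placeholders).
cat > "$workdir/job1.properties" <<'EOF'
nameNode = hdfs://rootname
jobTracker = xyz.com:8088
EOF

# lookup KEY FILE: print the value assigned to KEY, trimming spaces around '='.
# (hypothetical helper, defined only for this sketch)
lookup() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

line='<name-node>${nameNode}</name-node>'
value=$(lookup nameNode "$workdir/job1.properties")

# Replace the literal text ${nameNode} with the value read from the file.
resolved=$(printf '%s\n' "$line" | sed "s|\${nameNode}|$value|")

printf '%s\n' "$resolved"
# prints: <name-node>hdfs://rootname</name-node>
```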

The following is an example of a property file that we will use in our workflow example.

File name -- job1.properties

# properties
nameNode = hdfs://rootname
jobTracker = xyz.com:8088
script_name_external = hdfs_path_of_script/external.hive
script_name_orc = hdfs_path_of_script/orc.hive
script_name_copy = hdfs_path_of_script/Copydata.hive
database = database_name

To use this property file, update the workflow so that it refers to the parameters by name, as in the following workflow definition.

<!-- This is a comment -->
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
   <start to = "Create_External_Table" />
   <action name = "Create_External_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>${script_name_external}</script>
      </hive>
      <ok to = "Create_orc_Table" />
      <error to = "kill_job" />
   </action>
   
   <action name = "Create_orc_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>${script_name_orc}</script>
      </hive>
      <ok to = "Insert_into_Table" />
      <error to = "kill_job" />
   </action>
   
   <action name = "Insert_into_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>${script_name_copy}</script>
         <param>database=${database}</param>
      </hive>
      <ok to = "end" />
      <error to = "kill_job" />
   </action>
   
   <kill name = "kill_job">
      <message>Job failed</message>
   </kill>
   <end name = "end" />
</workflow-app>

To use the property file with this workflow, pass it with the -config option when running the workflow.

oozie job -oozie http://host_name:8080/oozie \
   -config edgenode_path/job1.properties \
   -D oozie.wf.application.path=hdfs://Namenodepath/pathof_workflow_xml/workflow.xml \
   -run

Note − The property file should be on the local file system of the edge node (not in HDFS), whereas the workflow and Hive scripts must be in HDFS.
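Because only the property file changes between jobs, running the same workflow for several jobs comes down to repeating the submission with a different -config value. The sketch below only prints the commands instead of executing them, since host_name, edgenode_path and the HDFS path are placeholders. The file job2.properties and the helper build_cmd are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: one workflow, several property files. Commands are printed, not run,
# because the host and paths below are the tutorial's placeholders.

OOZIE_URL="http://host_name:8080/oozie"
WF_PATH="hdfs://Namenodepath/pathof_workflow_xml/workflow.xml"

# build_cmd FILE: print the submission command for one property file.
# (hypothetical helper, defined only for this sketch)
build_cmd() {
    printf 'oozie job -oozie %s -config %s -D oozie.wf.application.path=%s -run\n' \
        "$OOZIE_URL" "$1" "$WF_PATH"
}

# job2.properties is a hypothetical second file holding different values
# for the same parameter names.
commands=$(for props in edgenode_path/job1.properties edgenode_path/job2.properties; do
    build_cmd "$props"
done)

printf '%s\n' "$commands"
```

Each printed line differs only in the property file, which is the point of keeping job-specific values out of workflow.xml.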

At run time, every parameter written as ${...} is replaced by its corresponding value from the .properties file.

A single property file can also define more parameters than a given workflow requires, and no error is thrown. This makes it possible to run more than one workflow with the same properties file. However, if the property file lacks a parameter that the workflow requires, an error occurs.
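A missing parameter can be caught before submission with a simple local check. The following is a minimal sketch, not an Oozie feature: it compares the ${name} placeholders in a workflow with the keys in a property file. It assumes plain identifiers only (EL function calls such as ${wf:id()} are ignored), and the shortened sample files are created just for the demonstration.

```shell
#!/bin/sh
# Sketch of a pre-submission check (not part of Oozie): report placeholders
# used in the workflow that the property file does not define.

workdir=$(mktemp -d)

# Shortened sample workflow fragment using three parameters.
cat > "$workdir/workflow.xml" <<'EOF'
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<script>${script_name_copy}</script>
EOF

# Sample property file: "database" is extra, "script_name_copy" is missing.
cat > "$workdir/job1.properties" <<'EOF'
# properties
nameNode = hdfs://rootname
jobTracker = xyz.com:8088
database = database_name
EOF

# Names used as ${name} in the workflow (plain identifiers only).
used=$(grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$workdir/workflow.xml" \
    | tr -d '${}' | sort -u)

# Keys defined in the property file (comment and blank lines do not match).
defined=$(sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_.]*\)[[:space:]]*=.*/\1/p' \
    "$workdir/job1.properties" | sort -u)

# Collect every used name that has no matching key.
missing=""
for name in $used; do
    printf '%s\n' "$defined" | grep -qx "$name" || missing="$missing $name"
done

echo "missing:$missing"
# prints: missing: script_name_copy
```

The extra key database is not reported, which mirrors the behaviour described above: surplus parameters are harmless, while an undefined one is a problem.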
