Apache Oozie - Coordinator



Coordinator applications allow users to schedule complex workflows, including workflows that are scheduled regularly. Oozie Coordinator models the workflow execution triggers in the form of time, data or event predicates. The workflow job mentioned inside the Coordinator is started only after the given conditions are satisfied.

Coordinators

As we did for workflows in the previous chapter, let’s learn the concepts of coordinators through an example.

The first two Hive actions of the workflow in our example create the tables. We don’t need these steps when the workflow runs repeatedly at a given frequency under a coordinator, so let’s modify the workflow that our coordinator will call.

In a real-life scenario, data flows continuously into the external table, and as soon as it is loaded there it is processed into the ORC table.

Modified Workflow

<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
   <start to = "Insert_into_Table" />
   
   <action name = "Insert_into_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>${script_name_copy}</script>
         <param>${database}</param>
      </hive>
      <ok to = "end" />
      <error to = "kill_job" />
   </action>
   
   <kill name = "kill_job">
      <message>Job failed</message>
   </kill>
   <end name = "end" />

</workflow-app>

Now let’s write a simple coordinator to use this workflow.

<coordinator-app xmlns = "uri:oozie:coordinator:0.2" name =
   "coord_copydata_from_external_orc" frequency = "5 * * * *" start =
   "2016-01-18T01:00Z" end = "2025-12-31T00:00Z" timezone = "America/Los_Angeles">
   
   <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>FIFO</execution>
      <throttle>1</throttle>
   </controls>
   
   <action>
      <workflow>
         <app-path>pathof_workflow_xml/workflow.xml</app-path>
      </workflow>
   </action>
	
</coordinator-app>

The attributes used in the code above are defined as follows −

  • start − The start datetime for the job. Actions are materialized starting at this time.

  • end − The end datetime for the job, after which actions stop being materialized.

  • timezone − The timezone of the coordinator application.

  • frequency − How often actions are materialized, given either as a number of minutes or, as in the example above, as a cron-like expression.
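
As a hedged illustration of the frequency attribute: besides a cron-like expression, Oozie also accepts an EL expression such as ${coord:minutes(5)}, which materializes an action every five minutes. The sketch below reuses the placeholder workflow path from this chapter. The coordinator name and the start date are assumptions chosen only for illustration.

```xml
<!-- Sketch only: the name and dates are illustrative assumptions -->
<coordinator-app xmlns="uri:oozie:coordinator:0.2" name="coord_every_5_minutes"
   frequency="${coord:minutes(5)}"
   start="2016-01-18T01:00Z" end="2025-12-31T00:00Z"
   timezone="America/Los_Angeles">

   <action>
      <workflow>
         <!-- Placeholder path, as used elsewhere in this chapter -->
         <app-path>pathof_workflow_xml/workflow.xml</app-path>
      </workflow>
   </action>

</coordinator-app>
```

The related EL functions coord:hours(n), coord:days(n) and coord:months(n) can be used in the same way for coarser schedules.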

Control Information

  • timeout − The maximum time, in minutes, that a materialized action waits for the additional conditions to be satisfied before being discarded. A timeout of 0 means that all the other conditions (such as input events) must already be satisfied at materialization time; otherwise the action times out immediately and is discarded. A timeout of -1 means no timeout: the materialized action waits indefinitely for the other conditions to be satisfied. The default value is -1.

  • concurrency − The maximum number of actions for this job that can be running at the same time. A value greater than 1 lets Oozie materialize and submit multiple instances of the coordinator application at once, which allows operations to catch up on delayed processing. The default value is 1.

  • execution − Specifies the execution order if multiple instances of the coordinator job have satisfied their execution criteria. Valid values are −

    • FIFO (oldest first) default.
    • LIFO (newest first).
    • LAST_ONLY (discards all older materializations).

  • throttle − The maximum number of coordinator actions that are allowed to be in WAITING state concurrently. The default value is 12.

(Ref of definitions − http://oozie.apache.org/docs/3.2.0-incubating/CoordinatorFunctionalSpec.html#a6.3._Synchronous_Coordinator_Application_Definition)
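
To see where the timeout matters, it helps to give the coordinator a data condition to wait for. The following is a minimal sketch, not the tutorial's own setup: it assumes the external data lands in hourly HDFS directories (the dataset name, the URI layout and the dates are hypothetical) and lets each materialized action wait up to 10 minutes for its input before timing out.

```xml
<!-- Sketch only: dataset name, URI layout and dates are hypothetical -->
<coordinator-app xmlns="uri:oozie:coordinator:0.2" name="coord_wait_for_external_data"
   frequency="${coord:hours(1)}"
   start="2016-01-18T01:00Z" end="2025-12-31T00:00Z" timezone="UTC">

   <controls>
      <!-- Wait at most 10 minutes for the input data -->
      <timeout>10</timeout>
      <!-- Allow two actions to run at the same time -->
      <concurrency>2</concurrency>
      <!-- If several actions are ready, run only the newest one -->
      <execution>LAST_ONLY</execution>
   </controls>

   <datasets>
      <dataset name="external_data" frequency="${coord:hours(1)}"
         initial-instance="2016-01-18T01:00Z" timezone="UTC">
         <uri-template>hdfs://Namenodepath/pathof_external_data/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      </dataset>
   </datasets>

   <input-events>
      <!-- The action waits until the current hourly instance is available -->
      <data-in name="input" dataset="external_data">
         <instance>${coord:current(0)}</instance>
      </data-in>
   </input-events>

   <action>
      <workflow>
         <app-path>pathof_workflow_xml/workflow.xml</app-path>
      </workflow>
   </action>

</coordinator-app>
```

With a timeout of 0 instead, an action whose input directory is not yet available at materialization time would time out immediately.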

The above coordinator runs at the given frequency, i.e. at minute 5 of every hour, much like a cron job.

To run this coordinator, use the following command.

oozie job --oozie http://host_name:8080/oozie --config edgenode_path/job1.properties -D oozie.coord.application.path=hdfs://Namenodepath/pathof_coordinator_xml/coordinator.xml -d "2 minute" -run

Here -d "2 minute" ensures that the coordinator starts only 2 minutes after the job was submitted.

The above coordinator calls the workflow, which in turn calls the Hive script. This script inserts the data from the external table into the Hive managed table.
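
The job configuration passed with the config option can also be written as an XML file instead of a .properties file. The sketch below shows what such a file might look like for this example. The file name job1.xml and every value in it are assumptions (placeholders in the style used in this chapter), and the property names simply mirror the parameters referenced by the workflow above.

```xml
<!-- job1.xml: a hypothetical XML counterpart of job1.properties -->
<configuration>
   <!-- Location of the coordinator definition in HDFS -->
   <property>
      <name>oozie.coord.application.path</name>
      <value>hdfs://Namenodepath/pathof_coordinator_xml/coordinator.xml</value>
   </property>
   <!-- Parameters referenced by the workflow; all values are placeholders -->
   <property>
      <name>nameNode</name>
      <value>hdfs://Namenodepath</value>
   </property>
   <property>
      <name>jobTracker</name>
      <value>jobtracker_host:port</value>
   </property>
   <property>
      <name>script_name_copy</name>
      <value>pathof_hive_script/copy_data.hive</value>
   </property>
   <property>
      <name>database</name>
      <value>database_name</value>
   </property>
</configuration>
```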

Coordinator Job Status

At any time, a coordinator job is in one of the following statuses − PREP, RUNNING, PREPSUSPENDED, SUSPENDED, PREPPAUSED, PAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED.

Valid coordinator job status transitions are −

  • PREP − PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED

  • RUNNING − SUSPENDED | PAUSED | SUCCEEDED | DONEWITHERROR | KILLED | FAILED

  • PREPSUSPENDED − PREP | KILLED

  • SUSPENDED − RUNNING | KILLED

  • PREPPAUSED − PREP | KILLED

  • PAUSED − SUSPENDED | RUNNING | KILLED

  • When a coordinator job is submitted, Oozie parses the coordinator job XML. Oozie then creates a record for the coordinator with status PREP and returns a unique ID. The coordinator is also started immediately if the pause time is not set.

  • When a user requests to suspend a coordinator job that is in status PREP, Oozie puts the job in status PREPSUSPENDED. Similarly, when the pause time is reached for a coordinator job in status PREP, Oozie puts the job in status PREPPAUSED.

  • Conversely, when a user requests to resume a PREPSUSPENDED coordinator job, Oozie puts the job in status PREP. And when the pause time is reset for a coordinator job in status PREPPAUSED, Oozie puts the job in status PREP.

  • When a coordinator job starts, Oozie puts the job in status RUNNING and starts materializing workflow jobs based on the job frequency.

  • When a user requests to kill a coordinator job, Oozie puts the job in status KILLED and sends a kill request to all submitted workflow jobs. If any coordinator action finishes with a status other than KILLED, Oozie puts the coordinator job into DONEWITHERROR.

  • When a user requests to suspend a coordinator job that is in status RUNNING, Oozie puts the job in status SUSPENDED and suspends all the submitted workflow jobs.

  • When the pause time is reached for a coordinator job that is in status RUNNING, Oozie puts the job in status PAUSED.

Conversely, when a user requests to resume a SUSPENDED coordinator job, Oozie puts the job in status RUNNING. And when the pause time is reset for a coordinator job in status PAUSED, Oozie puts the job in status RUNNING.

A coordinator job creates workflow jobs (commonly called coordinator actions) only for the duration of the coordinator job and only while the coordinator job is in RUNNING status. If the coordinator job has been suspended, then on resume it creates all the coordinator actions that should have been created while it was suspended. Actions are not lost; they are only delayed.

When the coordinator job materialization finishes and all the workflow jobs finish, Oozie updates the coordinator status accordingly. For example, if all the workflows are SUCCEEDED, Oozie puts the coordinator job into SUCCEEDED status. However, if any workflow job finishes with a status other than SUCCEEDED (e.g. KILLED, FAILED or TIMEDOUT), Oozie puts the coordinator job into DONEWITHERROR. If all coordinator actions are TIMEDOUT, Oozie likewise puts the coordinator job into DONEWITHERROR.

(Reference − http://oozie.apache.org/docs/)

Parametrization of a Coordinator

The workflow parameters can be passed to a coordinator as well using the .properties file. These parameters are resolved using the configuration properties of Job configuration used to submit the coordinator job.

If a configuration property used in the definition is not provided with the job configuration used to submit a coordinator job, the value of the parameter will be undefined and the job submission will fail.
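
As a sketch of what such parametrization might look like, the coordinator below takes its start time, end time and workflow path from the job configuration and passes the database name on to the workflow. The property names start_time, end_time and workflow_path, and all the values listed in the comment, are hypothetical. If any of them were missing from the configuration used at submission, the submission would fail as described above.

```xml
<!-- Assumed entries in job1.properties (hypothetical names and values):
       start_time=2016-01-18T01:00Z
       end_time=2025-12-31T00:00Z
       workflow_path=hdfs://Namenodepath/pathof_workflow_xml
       database=database_name
-->
<coordinator-app xmlns="uri:oozie:coordinator:0.2" name="coord_copydata_from_external_orc"
   frequency="5 * * * *" start="${start_time}" end="${end_time}"
   timezone="America/Los_Angeles">

   <action>
      <workflow>
         <!-- Resolved from the job configuration at submission time -->
         <app-path>${workflow_path}</app-path>
         <configuration>
            <!-- Hands the database parameter to the workflow explicitly -->
            <property>
               <name>database</name>
               <value>${database}</value>
            </property>
         </configuration>
      </workflow>
   </action>

</coordinator-app>
```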
