Apache Oozie - CLI and Extensions



By now, you have a good understanding of Oozie workflows, coordinators and bundles. In the last part of this tutorial, let’s touch on some of the other important concepts in Oozie.

Command Line Tools

We have seen a few commands earlier to run workflow, coordinator and bundle jobs. Oozie provides a command line utility, oozie, to perform job and admin tasks.

oozie version : show client version

Following are the options of the job sub-command −

oozie job <OPTIONS> :
-action <arg> coordinator rerun on action ids (requires -rerun); coordinator log
   retrieval on action ids (requires -log)
-auth <arg> select authentication type [SIMPLE|KERBEROS]
-change <arg> change a coordinator/bundle job
-config <arg> job configuration file '.xml' or '.properties'
-D <property=value> set/override value for given property
-date <arg> coordinator/bundle rerun on action dates (requires -rerun)
-definition <arg> job definition
-doas <arg> doAs user, impersonates as the specified user
-dryrun Supported in Oozie-2.0 or later versions ONLY - dryrun or test run a
   coordinator job, job is not queued
-info <arg> info of a job
-kill <arg> kill a job
-len <arg> number of actions (default TOTAL ACTIONS, requires -info)
-localtime use local time (default GMT)
-log <arg> job log
-nocleanup do not clean up output-events of the coordinator rerun actions (requires
   -rerun)
-offset <arg> job info offset of actions (default '1', requires -info)
-oozie <arg> Oozie URL
-refresh re-materialize the coordinator rerun actions (requires -rerun)
-rerun <arg> rerun a job (coordinator requires -action or -date; bundle requires 
   -coordinator or -date)
-resume <arg> resume a job
-run run a job
-start <arg> start a job
-submit submit a job
-suspend <arg> suspend a job
-value <arg> new endtime/concurrency/pausetime value for changing a coordinator
   job; new pausetime value for changing a bundle job
-verbose verbose mode
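
The options above can be combined into day-to-day commands. The following is a minimal sketch rather than a definitive script: the server URL, the job IDs and the job.properties file name are assumed values, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set, so the sketch can be tried without an Oozie server.

```shell
#!/bin/sh
# Assumed values for illustration; replace them with your own.
OOZIE_URL="http://localhost:8080/oozie"
WF_ID="0000001-000000000000000-oozie-wf-W"      # hypothetical workflow job ID
COORD_ID="0000002-000000000000000-oozie-wf-C"   # hypothetical coordinator job ID

# Hypothetical helper: print each command by default;
# run the script with DRY_RUN=0 to execute the commands for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Submit and start a job described by job.properties (assumed file name).
run oozie job -oozie "$OOZIE_URL" -config job.properties -run

# Inspect, pause, continue and stop an existing job.
run oozie job -oozie "$OOZIE_URL" -info "$WF_ID"
run oozie job -oozie "$OOZIE_URL" -log "$WF_ID"
run oozie job -oozie "$OOZIE_URL" -suspend "$WF_ID"
run oozie job -oozie "$OOZIE_URL" -resume "$WF_ID"
run oozie job -oozie "$OOZIE_URL" -kill "$WF_ID"

# Rerun coordinator actions 1 to 3, re-materializing them first.
run oozie job -oozie "$OOZIE_URL" -rerun "$COORD_ID" -action 1-3 -refresh

# Change the concurrency of a coordinator job.
run oozie job -oozie "$OOZIE_URL" -change "$COORD_ID" -value concurrency=2
```

The commands are listed together only for reference; in practice you would pick the one you need instead of running them all in sequence.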

To check the status of multiple jobs, use the jobs sub-command with the following options.

oozie jobs <OPTIONS> :
-auth <arg> select authentication type [SIMPLE|KERBEROS]
-doas <arg> doAs user, impersonates as the specified user.
-filter <arg> user=<U>;name=<N>;group=<G>;status=<S>;...
-jobtype <arg> job type: 'wf' (default) or 'coordinator' (coordinator is supported in
   Oozie-2.0 or later versions ONLY)
-len <arg> number of jobs (default '100')
-localtime use local time (default GMT)
-offset <arg> jobs offset (default '1')
-oozie <arg> Oozie URL
-verbose verbose mode
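
The listing options above belong to the jobs sub-command of the oozie utility. As a rough sketch of how the filter and paging options fit together, the example below builds two listing commands. The server URL and the user name joe are assumptions, and run and join_filter are hypothetical helpers: run only prints each command unless DRY_RUN=0 is set, and join_filter joins key=value pairs with the semicolon that -filter expects.

```shell
#!/bin/sh
OOZIE_URL="http://localhost:8080/oozie"   # assumed server URL

# Hypothetical helper: print each command by default;
# run the script with DRY_RUN=0 to execute the commands for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: join key=value pairs with ';' for the -filter option.
# The body runs in a subshell so the IFS change does not leak out.
join_filter() (
  IFS=';'
  printf '%s\n' "$*"
)

# At most ten running workflow jobs ('wf' is the default job type).
run oozie jobs -oozie "$OOZIE_URL" -filter "status=RUNNING" -len 10

# Running coordinator jobs that belong to the assumed user 'joe'.
run oozie jobs -oozie "$OOZIE_URL" -jobtype coordinator \
  -filter "$(join_filter user=joe status=RUNNING)"
```

When typing a filter by hand, quote it, because an unquoted semicolon would be interpreted by the shell as a command separator.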

To check the status of the Oozie system itself, run the admin sub-command as follows −

oozie admin -oozie http://localhost:8080/oozie -status

Validating a Workflow XML −

oozie validate myApp/workflow.xml

It performs an XML Schema validation on the specified workflow XML file.
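
Validation is most useful as a gate before submission. The sketch below shows one possible way to do that, under the assumption that oozie validate exits with a non-zero status when the file is invalid. The server URL and the job.properties file name are also assumed, and run and validate_then_run are hypothetical helpers; run only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
OOZIE_URL="http://localhost:8080/oozie"   # assumed server URL
WORKFLOW_XML="myApp/workflow.xml"

# Hypothetical helper: print each command by default;
# run the script with DRY_RUN=0 to execute the commands for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Validate first and submit only if validation succeeds.
# Assumption: 'oozie validate' returns a non-zero exit status on an invalid file.
validate_then_run() {
  if run oozie validate "$WORKFLOW_XML"; then
    run oozie job -oozie "$OOZIE_URL" -config job.properties -run
  else
    echo "$WORKFLOW_XML failed validation; job not submitted" >&2
    return 1
  fi
}

validate_then_run
```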

Action Extensions

We have seen the hive extension. Similarly, Oozie provides more action extensions; a few of them are described below −

Email Action

The email action allows sending emails from an Oozie workflow application. An email action must provide "to" addresses, optional "cc" addresses, a subject and a body. Multiple recipients of an email can be provided as comma-separated addresses.

All the values specified in the email action can be parameterized (templated) using EL expressions.

Example

<workflow-app name = "sample-wf" xmlns = "uri:oozie:workflow:0.1">
...
   <action name = "an-email">
      <email xmlns = "uri:oozie:email-action:0.1">
         <to>julie@xyz.com,max@abc.com</to>
         <cc>jax@xyz.com</cc>
         <subject>Email notifications for ${wf:id()}</subject>
         <body>The wf ${wf:id()} successfully completed.</body>
      </email>
      <ok to = "main_job"/>
      <error to = "kill_job"/>
   </action>
...
</workflow-app>

Shell Action

The shell action runs a Shell command. The workflow job will wait until the Shell command completes before continuing to the next action.

To run the Shell job, you have to configure the shell action with the job-tracker, name-node and Shell exec elements as well as the necessary arguments and configuration. A shell action can be configured to create or delete HDFS directories before starting the Shell job.

The shell launcher configuration can be specified with a file, using the job-xml element, and inline, using the configuration elements.

Example

How to run any shell script?

<workflow-app xmlns = 'uri:oozie:workflow:0.3' name = 'shell-wf'>
   <start to = 'shell1' />
   
   <action name = 'shell1'>
      <shell xmlns = "uri:oozie:shell-action:0.1">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
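         <!-- exec is required: the command to run, here the base name of the script shipped via file -->
         <exec>file_name</exec>
         <file>path_of_file_name</file>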
      </shell>
      <ok to = "end" />
      <error to = "fail" />
   </action>
   
   <kill name = "fail">
      <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
   </kill>
	
   <end name = 'end' />
</workflow-app>

Similarly, Oozie offers many more actions, such as ssh, sqoop and java.

Additional Resources

The official Oozie documentation website is the best resource to understand Oozie in detail.
