Apache Oozie - Bundle



The Oozie Bundle system lets the user define and execute a set of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user can use the data dependencies of coordinator applications to create an implicit data application pipeline.

The user can start, stop, suspend, resume, and rerun jobs at the bundle level, which gives better and easier operational control.

Bundle

Let’s extend our workflow and coordinator example to a bundle.

<bundle-app xmlns = 'uri:oozie:bundle:0.1' 
   name = 'bundle_copydata_from_external_orc'>
   
   <controls>
      <kick-off-time>${kickOffTime}</kick-off-time>
   </controls>
   
   <coordinator name = 'coord_copydata_from_external_orc' >
      <app-path>pathof_coordinator_xml</app-path>
      <configuration>
         <property>
            <name>startTime1</name>
            <value>time to start</value>
         </property>
      </configuration>
   </coordinator>

</bundle-app>

Kick-off-time − The time when a bundle should start and submit coordinator applications.

There can be more than one coordinator in a bundle.
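
As a rough illustration of a bundle with more than one coordinator, the sketch below builds the same element structure as the example above, with one coordinator element per entry. The bundle name, the coordinator names, and the paths are hypothetical placeholders, and the helper build_bundle is not part of Oozie.

```python
import xml.etree.ElementTree as ET


def build_bundle(name, coordinators):
    """Build a bundle-app element with one <coordinator> child per entry.

    `coordinators` is a list of (coordinator_name, app_path) pairs.
    This helper is illustrative only; it is not an Oozie API.
    """
    root = ET.Element(
        "bundle-app",
        {"xmlns": "uri:oozie:bundle:0.1", "name": name},
    )
    for coord_name, app_path in coordinators:
        coord = ET.SubElement(root, "coordinator", {"name": coord_name})
        ET.SubElement(coord, "app-path").text = app_path
    return root


# Hypothetical names and paths, used only to show the shape of the document.
bundle = build_bundle(
    "bundle_example",
    [
        ("coord_a", "/hypothetical/path/coord_a"),
        ("coord_b", "/hypothetical/path/coord_b"),
    ],
)
print(ET.tostring(bundle, encoding="unicode"))
```

Each pair in the list becomes one coordinator element, so adding a coordinator to the bundle means adding one more entry.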

Bundle Job Status

At any time, a bundle job is in one of the following statuses: PREP, RUNNING, PREPSUSPENDED, SUSPENDED, PREPPAUSED, PAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED.

Valid bundle job status transitions are −

  • PREP − PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED

  • RUNNING − SUSPENDED | PAUSED | SUCCEEDED | DONEWITHERROR | KILLED | FAILED

  • PREPSUSPENDED − PREP | KILLED

  • SUSPENDED − RUNNING | KILLED

  • PREPPAUSED − PREP | KILLED

  • PAUSED − SUSPENDED | RUNNING | KILLED

  • When a bundle job is submitted, Oozie parses the bundle job XML. Oozie then creates a record for the bundle with status PREP and returns a unique ID.

  • When a user requests to suspend a bundle job that is in PREP status, Oozie puts the job in status PREPSUSPENDED. Similarly, when the pause time is reached for a bundle job in PREP status, Oozie puts the job in status PREPPAUSED.

  • Conversely, when a user requests to resume a PREPSUSPENDED bundle job, Oozie puts the job back in status PREP. Likewise, when the pause time is reset for a bundle job in PREPPAUSED status, Oozie puts the job back in status PREP.

  • There are two ways a bundle job can be started: when the kick-off-time (defined in the bundle XML) is reached, or when the user sends a start request for the bundle. The default kick-off-time is null, which means the coordinators start immediately.

  • When a bundle job starts, Oozie puts the job in status RUNNING and submits all the coordinator jobs.

  • When a user requests to kill a bundle job, Oozie puts the job in status KILLED and it sends kill to all submitted coordinator jobs.

  • When a user requests to suspend a bundle job that is not in PREP status, Oozie puts the job in status SUSPENDED and suspends all submitted coordinator jobs.

  • When the pause time is reached for a bundle job that is not in PREP status, Oozie puts the job in status PAUSED. When the pause time is reset, Oozie puts the job back in status RUNNING.
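
The valid transitions listed above can be written down as a small lookup table. The following is a minimal sketch for checking whether a status change is allowed according to that list; the names VALID_TRANSITIONS and can_transition are made up for this illustration and do not reflect how Oozie implements this internally.

```python
# Transition table copied from the list of valid bundle job status
# transitions. Illustration only, not Oozie's internal implementation.
VALID_TRANSITIONS = {
    "PREP": {"PREPSUSPENDED", "PREPPAUSED", "RUNNING", "KILLED"},
    "RUNNING": {
        "SUSPENDED", "PAUSED", "SUCCEEDED",
        "DONEWITHERROR", "KILLED", "FAILED",
    },
    "PREPSUSPENDED": {"PREP", "KILLED"},
    "SUSPENDED": {"RUNNING", "KILLED"},
    "PREPPAUSED": {"PREP", "KILLED"},
    "PAUSED": {"SUSPENDED", "RUNNING", "KILLED"},
}


def can_transition(current, target):
    """Return True if the list above allows moving from current to target.

    Statuses with no entry in the table (SUCCEEDED, DONEWITHERROR,
    KILLED, FAILED) have no outgoing transitions here.
    """
    return target in VALID_TRANSITIONS.get(current, set())


print(can_transition("PREP", "RUNNING"))      # allowed by the list
print(can_transition("SUSPENDED", "PAUSED"))  # not in the list
```

Keeping the rules in a single dictionary makes it easy to compare the table line by line with the list it was copied from.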

When all the coordinator jobs finish, Oozie updates the bundle status accordingly. If all coordinators reach the same terminal state, the bundle job moves to that same status. For example, if all coordinators are SUCCEEDED, Oozie puts the bundle job into SUCCEEDED status. However, if the coordinator jobs do not all finish with the same status, Oozie puts the bundle job into DONEWITHERROR.
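
The rule in the paragraph above can be sketched as a short function. It assumes every coordinator has already finished, so each input is a terminal status; the function name bundle_final_status is hypothetical and is not part of Oozie.

```python
def bundle_final_status(coordinator_statuses):
    """Derive the bundle status from finished coordinator statuses.

    Assumes every status in the input is already terminal. If all
    coordinators share one status, the bundle takes that status;
    otherwise the bundle is DONEWITHERROR.
    """
    distinct = set(coordinator_statuses)
    if not distinct:
        raise ValueError("at least one coordinator status is required")
    if len(distinct) == 1:
        return distinct.pop()
    return "DONEWITHERROR"


print(bundle_final_status(["SUCCEEDED", "SUCCEEDED"]))
print(bundle_final_status(["SUCCEEDED", "KILLED"]))
```

The first call has coordinators that agree, so the bundle takes their status; the second has mixed results, so it falls through to DONEWITHERROR.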
