Apache Oozie is a workflow scheduler in which programs of many kinds can be pipelined in a desired order to run in Hadoop's distributed environment. Oozie also provides a mechanism to run a job on a given schedule.
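As a first taste of that scheduling mechanism, the sketch below shows a minimal coordinator definition that triggers a workflow once a day. The app name, dates, and path are illustrative assumptions, not values from this tutorial:

```xml
<!-- Hypothetical coordinator-app: runs the workflow at my-wf once per day (UTC) -->
<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS path of the workflow application to launch on each run -->
            <app-path>${nameNode}/user/${user.name}/apps/my-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```

Coordinators are covered in depth later in the tutorial; this snippet only illustrates the idea of a declarative schedule.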
This tutorial explains Apache Oozie, the scheduler system used to run and manage Hadoop jobs. Oozie is tightly integrated with the Hadoop stack and supports various Hadoop jobs such as Hive, Pig, and Sqoop, as well as system-specific jobs such as Java programs and shell scripts.
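To make this concrete, here is a minimal (hypothetical) workflow.xml that runs a single shell action; the names my-shell-wf, shell-node, and myscript.sh are illustrative assumptions:

```xml
<!-- Hypothetical workflow-app with one shell action -->
<workflow-app name="my-shell-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- The script to execute, shipped with the workflow -->
            <exec>myscript.sh</exec>
            <file>${appPath}/myscript.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The same structure applies to Hive, Pig, and Sqoop actions; only the action element inside the workflow changes.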
This tutorial explores the fundamentals of Apache Oozie, such as workflows, coordinators, bundles, and property files, along with some examples. By the end of this tutorial, you will have enough understanding to schedule and run Oozie jobs on a Hadoop cluster in a distributed environment.
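A property file supplies the parameters a workflow or coordinator references. The following job.properties sketch uses placeholder host names and paths, chosen here only for illustration:

```properties
# Hypothetical job.properties for an Oozie workflow submission
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
# HDFS location of the workflow application (workflow.xml and its files)
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-wf
```

With such a file in place, a job is typically submitted from the command line, for example: `oozie job -oozie http://localhost:11000/oozie -config job.properties -run`.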
This tutorial has been prepared for professionals who work with Big Data analytics and want to understand how to schedule complex Hadoop jobs using Apache Oozie.
This tutorial is intended to make you comfortable getting started with Oozie; it does not detail every available feature. For those details, the official Oozie documentation is the best place to visit.
Before proceeding with this tutorial, you should have a conceptual understanding of cron jobs and schedulers.