Friday, February 20, 2015

Apache Oozie - Schedule your job using powerful workflow engine

Apache Oozie
Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services. More specifically, this includes:

  * XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
  * Support for different job types such as Hadoop MapReduce, Pipes, Streaming, Pig, Hive and custom Java applications.
  * Workflow scheduling based on frequency and/or data availability.
  * Monitoring capability, automatic retry and failure handling of jobs.
  * Extensible and pluggable architecture to allow arbitrary grid programming paradigms.
  * Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.

Oozie Engines:
Oozie has three engines: the Workflow Engine, the Coordinator Engine and the Bundle Engine.

Prerequisites:
JAVA_HOME (Java)
M2_HOME (Maven)
HADOOP_HOME (Hadoop)
PIG_HOME (Pig)
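
Before going further, it can help to confirm that these variables are actually set in the current shell. The following is a minimal sketch, not part of the Oozie distribution; the function name check_prereq_env is made up for illustration, and it only checks that each variable is non-empty, not that it points to a valid installation.

```shell
# Hypothetical helper: report which prerequisite variables are unset or empty.
# Prints the missing names (one per line) and returns non-zero if any is missing.
check_prereq_env() {
    missing=0
    for name in JAVA_HOME M2_HOME HADOOP_HOME PIG_HOME; do
        # eval gives portable indirect expansion in POSIX sh
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_prereq_env || echo "Set the variables listed above first"
```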

Download Oozie

Make sure the commands below work:
$ java -version
$ javac -version
$ mvn -version


Extract the Oozie archive
$ sudo cp ~/Downloads/oozie-*.tar.gz /usr/local/
$ sudo su -
$ cd /usr/local
$ tar -xzf oozie-3.3.2.tar.gz


Building Oozie
The simplest way to build Oozie is to run the mkdistro.sh script:
$ cd oozie-3.3.2
$ ./bin/mkdistro.sh -DskipTests


Oozie Server Setup
Copy the built binaries to a directory named ‘oozie’ under /usr/local
$ cd ..
$ cp -R oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/ oozie

Create the required libext directory
$ cd oozie
$ mkdir libext


Copy all the required jars from hadooplibs to the libext directory using the following command:
$ cp ../oozie-3.3.2/hadooplibs/target/oozie-3.3.2-hadooplibs.tar.gz .
$ tar xzvf oozie-3.3.2-hadooplibs.tar.gz
$ cp oozie-3.3.2/hadooplibs/hadooplib-1.1.1.oozie-3.3.2/* libext/


Get ExtJS 2.2. This library is not bundled with Oozie and must be downloaded separately; it is used for the Oozie Web Console:
$ cd libext
$ wget http://extjs.com/deploy/ext-2.2.zip
$ cd ..
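
A quick sanity check of libext before preparing the WAR can save a failed setup run. This is a sketch under assumptions: check_libext is a hypothetical helper name, and it only verifies that the ExtJS archive and at least one jar are present, not that the jars are the right ones.

```shell
# Hypothetical helper: check that a libext directory holds the ExtJS archive
# and at least one jar. Prints one message per problem; returns non-zero
# if anything is missing.
check_libext() {
    dir=${1:-libext}
    status=0
    if [ ! -f "$dir/ext-2.2.zip" ]; then
        echo "missing: $dir/ext-2.2.zip"
        status=1
    fi
    # ls exits non-zero when the glob matches no file
    if ! ls "$dir"/*.jar >/dev/null 2>&1; then
        echo "missing: no .jar files in $dir"
        status=1
    fi
    return "$status"
}

# Usage (from the oozie directory): check_libext libext
```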


Update ../hadoop/conf/core-site.xml as follows (Hadoop version 1.2.x):

<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>localhost</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>hadoop</value>
</property>


Note: here, ‘hduser’ is the username, and it belongs to the ‘hadoop’ group.
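
The property names follow the pattern hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups, so a different Oozie user needs different names. As a rough sketch, the snippet can be generated for any user; proxyuser_properties is a hypothetical helper name, and its output still has to be pasted inside the <configuration> element of core-site.xml by hand.

```shell
# Hypothetical helper: print the two proxyuser properties for a given user.
# Arguments: user, allowed hosts, allowed groups.
proxyuser_properties() {
    user=$1
    hosts=$2
    group_list=$3
    cat <<EOF
<property>
<name>hadoop.proxyuser.${user}.hosts</name>
<value>${hosts}</value>
</property>
<property>
<name>hadoop.proxyuser.${user}.groups</name>
<value>${group_list}</value>
</property>
EOF
}

# Example with the values used in this post
proxyuser_properties hduser localhost hadoop
```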


Prepare the WAR file
$ ./bin/oozie-setup.sh prepare-war

INFO: Oozie is ready to be started


Give hduser ownership of the oozie directory
$ chown -R hduser:hadoop /usr/local/oozie


Create sharelib on HDFS
$ su hduser
$ cd /usr/local/oozie/
$ ./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310


Create the Oozie DB
$ ./bin/ooziedb.sh create -sqlfile oozie.sql -run

The SQL commands have been written to: oozie.sql


To start Oozie as a daemon process, run oozie-start.sh:
$ ./bin/oozie-start.sh

To start Oozie as a foreground process instead, run oozie-run.sh:
$ ./bin/oozie-run.sh

To stop Oozie, run oozie-stop.sh:
$ ./bin/oozie-stop.sh


Note: the Oozie log is written to /usr/local/oozie/logs/oozie.log


URL for Oozie Web Console is http://localhost:11000/oozie

Check the Oozie status; it should be NORMAL.
$ bin/oozie admin -oozie http://localhost:11000/oozie -status
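
The server can take a little while to come up after starting, so a single status check may fail at first. Below is a minimal polling sketch, assuming the status command prints a line containing NORMAL once the server is ready. The function name wait_for_oozie and the OOZIE_BIN and OOZIE_URL variables are made up for illustration; their defaults match the paths used in this post.

```shell
# Hypothetical helper: poll the Oozie server until it reports NORMAL
# or the attempts run out.
# Arguments: number of attempts (default 10), seconds between attempts (default 3).
wait_for_oozie() {
    attempts=${1:-10}
    delay=${2:-3}
    oozie_bin=${OOZIE_BIN:-bin/oozie}
    oozie_url=${OOZIE_URL:-http://localhost:11000/oozie}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$oozie_bin" admin -oozie "$oozie_url" -status 2>/dev/null | grep -q "NORMAL"; then
            echo "Oozie is up"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Oozie did not report NORMAL after $attempts attempts" >&2
    return 1
}

# Usage (from /usr/local/oozie): wait_for_oozie 20 5
```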

Try the Oozie examples: see the Oozie Examples post on this blog for the ones I tried.

Oozie Client Setup (may be required on a remote machine)
$ cd ..
$ cp oozie/oozie-client-3.3.2.tar.gz .
$ tar xvzf oozie-client-3.3.2.tar.gz
$ mv oozie-client-3.3.2 oozie-client
$ cd oozie-client/bin

Add /home/hduser/oozie-client/bin to PATH in .bashrc or /etc/profile and restart your terminal.
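
One way to make the PATH change without editing the file by hand is to append an export line only if it is not already there. This is a sketch, not the only way to do it; add_path_entry is a hypothetical helper name.

```shell
# Hypothetical helper: append a PATH export for the given directory to a
# profile file, unless that exact line is already present.
# Arguments: directory to add, profile file to edit.
add_path_entry() {
    dir=$1
    profile=$2
    line="export PATH=\"\$PATH:$dir\""
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq "$line" "$profile" 2>/dev/null; then
        echo "$line" >> "$profile"
    fi
}

# Usage: add_path_entry /home/hduser/oozie-client/bin "$HOME/.bashrc"
```

Running it twice leaves a single entry in the file, so it is safe to keep in a setup script.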

References : Oozie Installation : (Apache Oozie, Rohit Blog, CloudBlog)
