Apache Oozie
Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services. More specifically, this includes:
* XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
* Support for different types of jobs, such as Hadoop MapReduce, Pipes, Streaming, Pig, Hive, and custom Java applications.
* Workflow scheduling based on frequency and/or data availability.
* Monitoring capability, automatic retry, and failure handling of jobs.
* Extensible and pluggable architecture to allow arbitrary grid programming paradigms.
* Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.
Oozie Engines:
Oozie has three engines: the Workflow Engine, the Coordinator Engine, and the Bundle Engine.
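To give a feel for what the Workflow Engine consumes, here is a minimal sketch of a workflow definition that does nothing except start and end. The helper name write_noop_workflow, the app name "no-op-wf", and the schema version 0.2 are illustrative choices on my part, not something this installation requires.

```shell
# Write a minimal no-op workflow definition to the given path.
# write_noop_workflow and the app name "no-op-wf" are illustrative choices.
write_noop_workflow() {
  cat > "$1" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.2" name="no-op-wf">
    <start to="end"/>
    <end name="end"/>
</workflow-app>
EOF
}
```

For example, write_noop_workflow workflow.xml creates the file in the current directory. A real workflow would add action nodes (Pig, Hive, MapReduce and so on) between the start and end nodes.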
Prerequisites:
JAVA_HOME (Java)
M2_HOME (Maven)
HADOOP_HOME (Hadoop)
PIG_HOME (Pig)
Download Oozie
Make sure the commands below work:
$ java -version
$ javac -version
$ mvn -version
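Alongside the version checks, it can help to confirm that the environment variables from the prerequisites list are actually set. The following is a rough sketch in plain sh; check_prereqs is a hypothetical helper name, not an Oozie or Hadoop command.

```shell
# Report which of the required environment variables are unset or empty.
# check_prereqs is a hypothetical helper, not part of Oozie.
check_prereqs() {
  missing=0
  for var in JAVA_HOME M2_HOME HADOOP_HOME PIG_HOME; do
    # Look up the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var"
      missing=1
    fi
  done
  # Exit status 0 when everything is set, 1 otherwise.
  return "$missing"
}
```

Running check_prereqs before the build prints one "missing: NAME" line per unset variable and returns a non-zero status if any are missing.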
Extract the Oozie archive
$ sudo cp ~/Downloads/oozie-*.tar.gz /usr/local/
$ sudo su -
$ cd /usr/local
$ tar -xzf oozie-3.3.2.tar.gz
Building Oozie
The simplest way to build Oozie is to run the mkdistro.sh script:
$ cd oozie-3.3.2
$ ./bin/mkdistro.sh -DskipTests
Oozie Server Setup
Copy the built binaries to a directory named ‘oozie’, which serves as the Oozie home directory
$ cd ..
$ cp -R oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/ oozie
Create the required libext directory
$ cd oozie
$ mkdir libext
Copy all the required jars from hadooplibs to the libext directory using the following command:
$ cp ../oozie-3.3.2/hadooplibs/target/oozie-3.3.2-hadooplibs.tar.gz .
$ tar xzvf oozie-3.3.2-hadooplibs.tar.gz
$ cp oozie-3.3.2/hadooplibs/hadooplib-1.1.1.oozie-3.3.2/* libext/
Get ExtJS 2.2: this library is used for the Oozie Web Console. It is not bundled with Oozie and needs to be downloaded separately:
$ cd libext
$ wget http://extjs.com/deploy/ext-2.2.zip
$ cd ..
Update ../hadoop/conf/core-site.xml as follows (Hadoop version 1.2.x):
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>localhost</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>hadoop</value>
</property>
Note: Here, ‘hduser’ is the username, and it belongs to the ‘hadoop’ group.
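A quick way to confirm the edit landed is to look for both property names in the file. This is a small sketch only: has_proxyuser is a hypothetical helper, and it checks that the names are present, not that the values are right.

```shell
# Succeed only if both proxyuser property names for the given user
# appear in the given file. has_proxyuser is a hypothetical helper;
# it does not validate the <value> entries.
has_proxyuser() {
  file=$1
  user=$2
  # -F treats the pattern as a fixed string, so the dots match literally.
  grep -qF "<name>hadoop.proxyuser.$user.hosts</name>" "$file" &&
    grep -qF "<name>hadoop.proxyuser.$user.groups</name>" "$file"
}
```

For example, has_proxyuser ../hadoop/conf/core-site.xml hduser && echo "proxyuser entries found". The Hadoop daemons generally need a restart before they pick up changes to core-site.xml.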
Prepare the WAR file
$ ./bin/oozie-setup.sh prepare-war
INFO: Oozie is ready to be started
Give hduser ownership of the oozie directory
$ chown -R hduser:hadoop /usr/local/oozie
Create the sharelib on HDFS
$ su hduser
$ cd /usr/local/oozie/
$ ./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310
Create the Oozie DB
$ ./bin/ooziedb.sh create -sqlfile oozie.sql -run
The SQL commands have been written to: oozie.sql
To start Oozie as a daemon process, run oozie-start.sh; to start it as a foreground process, run oozie-run.sh; to stop it, run oozie-stop.sh:
$ ./bin/oozie-start.sh    # start as a daemon
$ ./bin/oozie-run.sh      # or start in the foreground
$ ./bin/oozie-stop.sh     # stop the server
Note: the Oozie log is written to /usr/local/oozie/logs/oozie.log
The URL for the Oozie Web Console is http://localhost:11000/oozie
Check the Oozie status; it should be NORMAL.
$ bin/oozie admin -oozie http://localhost:11000/oozie -status
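The server can take a little while to come up after oozie-start.sh, so the status check may fail on the first try. Below is a minimal sketch of a retry loop, assuming the status output contains the word NORMAL once the server is ready; wait_for_normal is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Poll a status command until its output contains NORMAL, or give up.
# wait_for_normal is a hypothetical helper. Arguments: maximum number
# of attempts, delay between attempts in seconds, then the command to run.
wait_for_normal() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the status command and look for NORMAL in its output.
    if "$@" 2>/dev/null | grep -q "NORMAL"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_normal 10 3 bin/oozie admin -oozie http://localhost:11000/oozie -status tries up to ten times, three seconds apart, and returns non-zero if the server never reports NORMAL.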
Try the Oozie examples: see the Oozie Examples post on this blog.
Oozie client setup (may be required on a remote machine):
$ cd ..
$ cp oozie/oozie-client-3.3.2.tar.gz .
$ tar xvzf oozie-client-3.3.2.tar.gz
$ mv oozie-client-3.3.2 oozie-client
$ cd oozie-client/bin
Add /home/hduser/oozie-client/bin to PATH in .bashrc or /etc/profile and restart your terminal.
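The PATH change can be scripted so that running it twice does not add a duplicate line. This is only a sketch: add_to_path is a hypothetical helper, and it appends to whichever profile file you pass in.

```shell
# Append an "export PATH" line for the given directory to a profile file,
# unless that exact line is already present.
# add_to_path is a hypothetical helper. Arguments: directory, profile file.
add_to_path() {
  dir=$1
  profile=$2
  line="export PATH=\"\$PATH:$dir\""
  # Create the file if it does not exist yet.
  touch "$profile"
  # -x matches whole lines only, -F treats the line as a fixed string.
  grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

For example, add_to_path /home/hduser/oozie-client/bin ~/.bashrc, followed by opening a new terminal so the change takes effect.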
References: Oozie installation guides (Apache Oozie, Rohit Blog, CloudBlog)