Thursday, February 12, 2015

Ubuntu - Install Hive and configure Hive Metastore with MySQL

 Install Hive and configure Hive Metastore with MySQL

Hive is designed for data summarization, ad-hoc querying, and analysis of large volumes of data. Hive is a Data Warehousing package built on top of Hadoop.

HiveQL -> HiveQueryLanguage will create MapReduce job for your task and you can assign the number of reducer and bytes to be consume by reducer tasks etc..

Hive Vs MapReduce
Hive provides no additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via Hive interpreter.

Hive is powerful that in Hive, you can join table data with log file and query to see the result using regexp pattern.

1) Download hive

2) Extract Hive (Hope you followed "Install Hadoop" with user hduser)
sudo su
cd /usr/local; cp /home//Download/apache-hive-0.14.0-bin.tar.gz /usr/local/
tar -xvzf apache-hive-0.14.0-bin.tar.gz
mv apache-hive-0.14.0-bin hive-0.14
chown -R hduser:hadoop hive-0.14 


6) Update /etc/profile PATH or su hduser; vi ~/.bashrc
export HIVE_HOME=/usr/local/hive-0.14
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin

7) Running Hive

Note : Make sure HADOOP_HOME is installed and configured.

In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w in HDFS before you can create a table in Hive.


$HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse


8) Remove .template extension from all the files stored. (Ex : mv hive-env.sh.template hive-env.sh)
cd $HIVE_HOME/conf

9) Enter into hive console
$hive (Press Enter to hive shell)

Note : If you want to proceed with the default hive metastore with derby db in your local machine experimental proceed $hive and start creating your table and try HiveQueryLanguage..(HQL). Go to Step 18

10) Configure the hive metaStore pointing MySQL (Recommended)
This will install the jar for mysql connector through java, cd /usr/share/java/ to see the mysql-connector-java.jar)
sudo apt-get install mysql-server-5.5
sudo apt-get install libmysql-java
sudo ln -s /usr/share/java/mysql.jar /usr/local/hive-0.14/lib/libmysql-java.jar


11) sudo sysv-rc-conf mysql on (Start mysql upon machine start. If sysv-rc-conf missing in your ubuntu then "sudo apt-get install sysv-rc-conf")

 Create the initial database schema using the hive-schema-.sql file located in the $HIVE_HOME/scripts/metastore/upgrade/mysql directory.
 

$ mysql -uroot -proot
mysql>CREATE DATABASE metastore;
mysql>USE metastore;
mysql> SOURCE $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-.sql


 You also need a MySQL user account for Hive to use to access the metastore. It is very important to prevent this user account from creating or altering tables in the metastore database schema. Replace metastorehost(your remote metastore server) with localhost if your metastore server is same.

mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'metastorehost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;

 This step shows the configuration properties you need to set in hive-site.xml to configure the metastore service to communicate with the MySQL database, and provides sample settings. Though you can use the same $HIVE_HOME/conf/hive-site.xml or $HIVE_HOME/conf/hive-default.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.

12) Update hive-site.xml or hive-default.xml under $HIVE_HOME/conf with below
   <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

13) Start Hive Metastore Service
hive --service metastore &

14) Create vi /home/hduser/user.txt in your local file system and add below content or download user.txt
userid,username,city,state,country
1,John,Montgomery,Alabama,US
2,David,Phoenix,Arizona,US
3,Sarah,Sacramento,California,US
4,Anoop,Montgomery,Alabama,US
5,Gubs,Villupuram,TamilNadu,India

Note : In hive, table names are all case insensitive


15) Go to hive prompt and we'll create the table users in the Hive MetaStore to map data from user.txt

$ hive (Enter)
hive>CREATE TABLE user(id INT, name STRING, City STRING, State STRING, Country STRING) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
 LINES TERMINATED BY '\n' STORED AS TEXTFILE;
hive>show tables;


16) The following command maps user.txt data to the users table by loading data from user.txt.
  (LOCAL => keyword needed to load file from local into hive. Not necessary if your file is in hdfs)
  You can add OVERWRITE prior to 'INTO TABLE' if you want to overwrite existing user table content
  You can use
hadoop fs -put <localfilesystempath> <hdfsfiledirectory>

Load data into hive from your local directory. Remove LOCAL keyword if you file is in hdfs.
$hive>LOAD DATA LOCAL INPATH '/home/hduser/user.txt' INTO TABLE user;

Note : Load Local command copies the file from local location to /user/hive/warehouse/userdb/user/user.txt. When you execute "DROP TABLE user" the file will also be dropped/removed from the hive location /user/hive/warehouse/userdb/user/user.txt.

17) Query How many people belong to each state?
$hive>select state, count(state) from user group by state;

18) You can use most of the relational db (SQL) queries in hive
hive>describe user;
hive>show create table user;

19) Drop hive table drops the table and data file from the warehouse (/user/hive/warehouse/db/table/user.txt). What if you have MapReduce Program reference to this data file ? Refer next External Table creation
hive>DROP TABLE user

20) Create external table in hive. So, multiple tables can refer to the same data file and if you drop the table the data file will be available in the same location.
hive>CREATE EXTERNAL TABLE user(id INT, name STRING, city STRING, state STRING, country STRING) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
 LINES TERMINATED BY '\n' STORED AS TEXTFILE;

hive>LOAD DATA INPATH '/samples/user.txt' INTO TABLE user;
 
Note : Load command moves the file from hdfs location to /user/hive/warehouse/userdb/user/user.txt. So, file won't be available in original hdfs location. Upon DROP TABLE user (External created table) the file will be available still in the same location /user/hive/warehouse/userdb/user/user.txt.Even after table in hive got dropped.

21) Exit from hive
$hive>quit;

22) Hive Log file location :
Note : hive.log path can be find in $HIVE_HOME/conf/hive-log4j.properties. ${java.io.tmpdir}/${user.name}/hive.log (/tmp/hduser/hive.log)

You can do all the DDL (DataDefinitionLanguage commands) you do in SQL in HiveQueryLanguage. Refer : Hive DDL Manual


References :  Developer.com About Pig and Hive,
Cloudera - Hive Install

Feel free to add your inputs / something wrong.

No comments :

// Below script tag for SyntaxHighLighter