Sqoop - SQL to Hadoop
Sqoop is an efficient tool for transferring bulk data from structured stores (relational databases) into Hadoop (HDFS / Hive / HBase). You can also export data from HDFS back into relational databases and other data warehouses.
There are multiple ways to install Sqoop.
To install Sqoop on Debian-based distributions (Ubuntu / Debian):
$ sudo apt-get install sqoop
Advantages of installing from Debian packages rather than a tarball (though if you want full control over the installation, use the tarball):
- Handle dependencies
- Provide for easy upgrades
- Automatically install resources to conventional locations
$ sudo su
$ cp ~/Downloads/sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz /usr/local
$ cd /usr/local/
$ tar -zxvf sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
$ mv sqoop-1.4.5.bin__hadoop-1.0.0 sqoop-1.4.5
$ chown -R hduser:hadoop sqoop-1.4.5
Configure sqoop wrapper with hadoop
$ cd /usr/local/sqoop-1.4.5/conf/
$ mv sqoop-env-template.sh sqoop-env.sh
Enable HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME by setting them to your Hadoop installation path. If you want to Sqoop data into HBase or Hive, enable those variables as well and provide their paths in sqoop-env.sh.
$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
Add the following to /home/hduser/.bashrc or /etc/profile:
export SQOOP_HOME=/usr/local/sqoop-1.4.5
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PIG_HOME/bin:$SQOOP_HOME/bin
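Reload the profile so the current shell picks up the new variables; a quick sanity check (assuming the paths above):
$ source ~/.bashrc
$ echo $SQOOP_HOME    # should print /usr/local/sqoop-1.4.5
$ which sqoop         # should resolve to /usr/local/sqoop-1.4.5/bin/sqoop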
Installing JDBC Driver for Sqoop
sudo apt-get install libmysql-java
This installs the MySQL Connector/J jar (run cd /usr/share/java/ to see mysql-connector-java.jar). The jar is usually symlinked as mysql.jar; if so, link it into Sqoop's lib directory as below:
sudo ln -s /usr/share/java/mysql.jar /usr/local/sqoop-1.4.5/lib/
Check your sqoop version
$ sqoop-version
Create the movies table in your MySQL database.
Refer: download sqoopSample.sql from GitHub for the MySQL schema.
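If you prefer not to download the sample schema, a minimal movies table along these lines is enough for the examples below (the columns here are only illustrative; the actual sqoopSample.sql schema may differ):
mysql> CREATE DATABASE IF NOT EXISTS sqoop_test;
mysql> USE sqoop_test;
mysql> CREATE TABLE movies (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, title VARCHAR(255), release_year INT);
mysql> INSERT INTO movies (title, release_year) VALUES ('The Matrix', 1999), ('Inception', 2010);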
Import data (table or all tables) from MySQL into Hadoop (HDFS)
Make sure you have started Hadoop (cd $HADOOP_HOME/bin; ./start-all.sh).
-m n: use n map tasks to import in parallel.
$ cd /home/hduser
$ sqoop import --connect jdbc:mysql://hostName/dbName --username userName --password password --table tableName --target-dir /samples/movies -m 1
$sqoop import-all-tables --connect jdbc:mysql://hostName/dbName --username userName --password password
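import-all-tables writes one directory per table under a common parent in HDFS; you can control that location with --warehouse-dir (a sketch, reusing the same connection details as above):
$ sqoop import-all-tables --connect jdbc:mysql://hostName/dbName --username userName --password password --warehouse-dir /samples/all_tables -m 1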
Important :
- If you see the error "Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected", the Hadoop version you installed does not match the Hadoop version your Sqoop build targets. Hadoop 1 and Hadoop 2 have major API changes, so make sure both Hadoop and the Sqoop-Hadoop build are on the same 1.x or 2.x line.
- Make sure the table has a primary key; otherwise you need to specify the number of map tasks or a column to split on, as shown below.
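For a table without a primary key, either force a single mapper (-m 1) or tell Sqoop which column to split the work on; for example (a sketch, assuming the table has a numeric id column):
$ sqoop import --connect jdbc:mysql://hostName/dbName --username userName --password password --table tableName --target-dir /samples/tableName --split-by id -m 4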
Export data from hadoop (hdfs) into MySQL
$sqoop export --connect jdbc:mysql://localhost/sqoop_test --table movies_export --username root --password root --export-dir '/samples/movies/' -m 1;
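Note that sqoop export does not create the target table; it must already exist in MySQL with columns matching the exported files. A minimal movies_export table might look like this (columns are illustrative, mirroring the movies sample assumed earlier):
mysql> USE sqoop_test;
mysql> CREATE TABLE movies_export (id INT NOT NULL PRIMARY KEY, title VARCHAR(255), release_year INT);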
Sqoop command to create a Hive table matching the database table. If the table already exists in Hive, it will throw an error.
hduser$ sqoop create-hive-table --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table movies
Note: Impala uses the same metastore as Hive, so after a create-hive-table and import you can query the data from Impala as well.
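For example, once the data has been imported (next command), you could refresh Impala's view of the metastore and query the table; a sketch, assuming impala-shell is installed and an impalad is running on localhost:
$ impala-shell -i localhost -q "INVALIDATE METADATA movies; SELECT COUNT(*) FROM movies;"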
Sqoop command to import the entire contents of the database table into Hive, using commas (,) to separate fields in the data files.
hduser$ sqoop import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table movies --fields-terminated-by ',' --hive-import
--hive-overwrite -> Overwrite the existing contents of the Hive table with the imported data.
--hive-import -> Required to import the table into Hive when the table does not yet exist there.
--hive-table -> If not specified, the Hive table is created with the same name as the database table.
The default field delimiter when Sqoop imports into Hive is ^A (Ctrl-A). Specify --fields-terminated-by ',' (or another character) to use a specific separator.
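Putting these options together, a re-runnable import that replaces the Hive table's contents each time might look like this (a sketch using the flags above; the Hive table name movies_hive is just an example):
$ sqoop import --connect jdbc:mysql://localhost/sqoop_test --username root --password root --table movies --fields-terminated-by ',' --hive-import --hive-overwrite --hive-table movies_hive -m 1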
Refer : Download sqoopSample.sql from github for mysql schema.
References: Tutorial Reference, GoodPlaceToStart