For Developers: HBase - Hadoop Database

What is HBase? HBase (the Hadoop database) is an abstraction over Hadoop for storing data.
HBase is an open-source, distributed (clustered), sparse (rows need not share the same columns; there is no strict schema), column-oriented store modeled after Google's Bigtable. Its data is organized hierarchically.

If you have a small amount of data, you can go for an RDBMS (relational database management system, such as MySQL or Oracle). These typically handle data up to the terabyte (TB) range.
If you have a large amount of data, from hundreds of gigabytes up to petabytes, you need HBase for better performance and throughput. It is designed to scale to petabytes (PB).

HBase is developed on top of Hadoop and HDFS.

Data is addressed as RowID -> Family -> Qualifier.
Within one column family of a table you can have any number of column qualifiers (column names). The RowID is the key used to fetch the data of a row.
Every write is stamped with a timestamp, and each timestamped value is stored as a separate version.
By default, get and scan return only the latest version of each cell.
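
To make the RowID -> Family -> Qualifier -> timestamp addressing concrete, here is a minimal sketch in plain Java that models it with nested sorted maps. It does not use the HBase client at all; the class name CellStoreSketch and its methods are made up for illustration, and unlike real HBase it keeps every version forever.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CellStoreSketch {
    // row -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; an existing timestamp is overwritten. */
    public void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier,
                    c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell does not exist. */
    public String getLatest(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = versionsOf(row, family, qualifier);
        return versions == null || versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** Returns how many versions of a cell are stored. */
    public int versionCount(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = versionsOf(row, family, qualifier);
        return versions == null ? 0 : versions.size();
    }

    private NavigableMap<Long, String> versionsOf(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(family + ":" + qualifier);
    }

    public static void main(String[] args) {
        CellStoreSketch store = new CellStoreSketch();
        store.put("row1", "cf", "a", 1325613705467L, "sai-ram");
        store.put("row1", "cf", "a", 1325613788305L, "sai sai");
        System.out.println(store.getLatest("row1", "cf", "a"));    // prints: sai sai
        System.out.println(store.versionCount("row1", "cf", "a")); // prints: 2
    }
}
```

Both writes are kept, but a default read sees only the newest one, which is the behaviour described above.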

Java example code for HBase operations: http://autofei.wordpress.com/2012/04/02/java-example-code-using-hbase-data-model-operations/

Installation based on Cloudera: https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-InstallingtheHBaseMasterforStandaloneOperation

Start the master: sudo /etc/init.d/hadoop-hbase-master start

Master status page: http://localhost:60010/master-status

HBase book: http://hbase.apache.org/book.html (follow it to install a standalone node on your box and start playing with HBase). Good link for shell commands: http://wiki.apache.org/hadoop/Hbase/Shell

COMMAND GROUPS:
Group name: general
Commands: status, version

Group name: ddl
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list

Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate

Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump

Group name: replication
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication

If you are using binary keys or values and need to enter them in the shell, use double-quoted hexadecimal representation. For example:

hbase> get 't1', "key\x03\x3f\xcd"

hbase> get 't1', "key\003\023\011"

hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
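
If the binary key lives in a Java program and you want to paste it into the shell, a small helper can produce the escaped form. This is a rough sketch, not HBase's own implementation; ShellEscapeSketch and toShellLiteral are hypothetical names. It keeps printable ASCII as-is and writes every other byte as \xNN. Backslash, double quote and '#' are also escaped, on the assumption that they would otherwise be interpreted inside a double-quoted string by the Ruby-based shell.

```java
public class ShellEscapeSketch {
    /** Renders bytes as printable ASCII, escaping everything else as \xNN. */
    public static String toShellLiteral(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xff; // treat the byte as unsigned
            boolean printable = v >= 0x20 && v <= 0x7e
                    && v != '\\' && v != '"' && v != '#';
            if (printable) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02x", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same bytes as the first example above: "key" followed by 0x03 0x3f 0xcd.
        byte[] key = {'k', 'e', 'y', 0x03, 0x3f, (byte) 0xcd};
        System.out.println("get 't1', \"" + toShellLiteral(key) + "\"");
        // prints: get 't1', "key\x03?\xcd"
    }
}
```

The byte 0x3f is the printable character '?', so the helper emits it literally; "\x3f" and "?" denote the same byte.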

For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html

A good tool, H-Rider, is available on GitHub.
What is H-Rider? The h-rider is a UI application created to provide an easier way to view or manipulate the data saved in the distributed database HBase™, which supports structured data storage for large tables.

Playing in the HBase shell:

root@glakshmanan-laptop:/usr/lib/hbase/bin# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011

hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.5080 seconds

hbase(main):002:0> list
TABLE
test
1 row(s) in 0.0200 seconds

hbase(main):003:0> scan 'test'
ROW COLUMN+CELL
0 row(s) in 0.1320 seconds

hbase(main):004:0> put 'test', 'row1', 'cf:a', 'sai-ram'
0 row(s) in 0.0280 seconds

hbase(main):005:0> put 'test', 'row2', 'cf:b', 'gubs'
0 row(s) in 0.0060 seconds

hbase(main):006:0> put 'test', 'row3', 'cf:c', 'kavitha'
0 row(s) in 0.0070 seconds

hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1325613705467, value=sai-ram
row2 column=cf:b, timestamp=1325613715431, value=gubs
row3 column=cf:c, timestamp=1325613724596, value=kavitha
3 row(s) in 0.0310 seconds

hbase(main):008:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1325613705467, value=sai-ram
1 row(s) in 0.0320 seconds

hbase(main):009:0> put 'test', 'row1', 'cf:a', 'sai sai'
0 row(s) in 0.0110 seconds

hbase(main):010:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1325613788305, value=sai sai
row2 column=cf:b, timestamp=1325613715431, value=gubs
row3 column=cf:c, timestamp=1325613724596, value=kavitha
3 row(s) in 0.0250 seconds

hbase(main):011:0> major_compact 'test'

hbase(main):011:0> drop 'test'

ERROR: Table test is enabled. Disable it first.'

Here is some help for this command:
Drop the named table. Table must first be disabled. If table has
more than one region, run a major compaction on .META.:

hbase> major_compact ".META."

hbase(main):012:0> disable 'test'
0 row(s) in 2.0440 seconds

hbase(main):013:0> drop 'test'
0 row(s) in 1.0670 seconds

hbase(main):014:0> exit

Note: Before a DDL operation that modifies or drops a table (alter, drop), you need to disable the table first. HBase names are case-sensitive.

Import data from Hadoop HDFS into HBase
Syntax: hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar import <tablename> <inputdir>

Example: hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar import 0005_AlertTemplate hdfs://localhost/export/gubs_export/0005_AlertTemplate

Run a file of shell commands against HBase
hbase shell < shell_script.sh

Export HBase data to Hadoop (run this command as the hdfs user; the last argument is the maximum number of cell versions to export)
Ex : hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar export 0005_SCMaster /export/gubs_export/0005_SCMaster 2000000

Copy from Hadoop to the local file system
Ex : hadoop fs -copyToLocal /export/gubs_export/0005_SCMaster /tmp/gubs_export/0005_SCMaster

SCP : use scp to copy the files from the source box to the destination box

Copy from the local file system to Hadoop
Ex : hadoop fs -copyFromLocal /tmp/gubs_export/0005_SCMaster /export/0005_SCMaster

Import Hadoop data into HBase (run this command as the hdfs user)
Ex : hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar import 0005_SCMaster /export/gubs_export/0005_SCMaster

Describe an HBase table:
describe '0001_SMSReportData'

Enable compression on a column family of a large table
alter '0001_SMSReportData', {NAME => 'TELEMETRY', COMPRESSION => 'SNAPPY'}

Define a TTL (time to live, in seconds) for the data of a column family
alter '0030_ConnectionSummary',{NAME=>'WSC',TTL=>'604800'}
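
The TTL is given in seconds, so the 604800 above is seven days. If you generate such alter statements from Java, a small sketch like the following avoids hard-coding the number; TtlSketch and ttlSeconds are names invented for this example.

```java
import java.time.Duration;

public class TtlSketch {
    /** Converts a retention period into the seconds value used for a TTL. */
    public static long ttlSeconds(Duration retention) {
        return retention.getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(ttlSeconds(Duration.ofDays(7))); // prints: 604800
    }
}
```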

Run a major compaction on a table
major_compact '0001_ReportEvent'

Keep a column family's data in memory when you read it frequently (IN_MEMORY gives its blocks priority in the block cache)
alter '0031_DeviceMaster', {NAME => 'D', IN_MEMORY => 'true'}

Java HBase code to scan a table and rename a column qualifier (Tables and DeviceTelemetryAveragesFamilyAndColumns are application-specific enums)

// Imports needed by the enclosing class
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

private static void updateTelemetryTempQualifier() throws IOException {
    Configuration configuration = HBaseConfiguration.create();
    HTablePool hTablePool = new HTablePool(configuration, 10);
    HTableInterface table = hTablePool.getTable(Tables.DEVICE_TELEMETRYAVERAGES.getTenantTableName("0001"));

    // As in the original code, the old qualifier bytes are the same as the family name bytes.
    byte[] family = DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName();
    byte[] oldQualifier = DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName();
    byte[] newQualifier = DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForQualifier();

    Scan scan = new Scan();
    scan.addColumn(family, oldQualifier);
    scan.setMaxVersions();      // fetch every stored version, not only the latest
    scan.setCacheBlocks(false); // a one-off full scan should not fill the block cache

    ResultScanner scanner = table.getScanner(scan);
    try {
        for (Result result : scanner) {
            List<KeyValue> cells = result.getColumn(family, oldQualifier);
            if (cells == null || cells.isEmpty()) {
                continue;
            }
            for (KeyValue kv : cells) {
                // Copy the cell to the new qualifier, keeping its timestamp and value.
                Put put = new Put(kv.getRow());
                put.add(family, newQualifier, kv.getTimestamp(), kv.getValue());
                table.put(put);

                // Then delete exactly that version from the old qualifier.
                Delete delete = new Delete(kv.getRow());
                delete.deleteColumn(family, oldQualifier, kv.getTimestamp());
                table.delete(delete);
            }
        }
    } finally {
        scanner.close();
        hTablePool.putTable(table); // hand the table back to the pool
    }
}

Sunday, December 18, 2011
