What is HBase? (Hadoop Database) A storage abstraction on top of Hadoop.
HBase is an open-source, distributed (clustered), sparse (any row, no strict schema), column-oriented store modeled after Google's BigTable. It is a hierarchical data structure.
Start : sudo /etc/init.d/hadoop-hbase-master start
Master UI : http://localhost:60010/master-status
HBase : http://hbase.apache.org/book.html (refer to this link to install a standalone node on your box and start playing with HBase). Good link for commands : http://wiki.apache.org/hadoop/Hbase/Shell
COMMAND GROUPS:
Group name: general
Commands: status, version
Group name: ddl
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Group name: replication
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
- If you have a small amount of data, go with an RDBMS (relational database management system such as MySQL or Oracle); these top out around terabytes (TB) of storage.
- If you have a large amount of data, say hundreds of gigabytes up to petabytes, you need HBase for faster performance and higher throughput; it scales to petabytes (PB).
HBase is developed on top of Hadoop and HDFS.
Data is stored by the coordinates RowID -> Family -> Qualifier.
Within the same column family of a table you can have any number of different qualifiers (column names). The RowID is the identifier used to fetch a row's data.
A timestamp is attached to every write, and the data is stored per version.
By default, get and scan fetch only the latest version of each cell (see the Java sketch below).
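A minimal Java sketch of this data model (an illustration only, assuming the 0.90-era client API; the table 'test' and family 'cf' match the shell walkthrough further below):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test");
        byte[] row = Bytes.toBytes("row1");
        byte[] family = Bytes.toBytes("cf");
        byte[] qualifier = Bytes.toBytes("a");

        // Two writes to the same RowID -> Family -> Qualifier coordinate;
        // each write gets its own timestamp, so both versions are kept.
        table.put(new Put(row).add(family, qualifier, Bytes.toBytes("v1")));
        table.put(new Put(row).add(family, qualifier, Bytes.toBytes("v2")));

        // A plain Get returns only the latest version ("v2") ...
        Result latest = table.get(new Get(row));
        System.out.println(Bytes.toString(latest.getValue(family, qualifier)));

        // ... ask for more versions explicitly to see the history.
        Get versioned = new Get(row);
        versioned.setMaxVersions();
        for (KeyValue kv : table.get(versioned).raw()) {
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}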
Java HBase code examples for HBase operations : http://autofei.wordpress.com/2012/04/02/java-example-code-using-hbase-data-model-operations/
Installation based on Cloudera : https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-InstallingtheHBaseMasterforStandaloneOperation
If you are using binary keys or values and need to enter them in the shell, use
their double-quoted hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
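The same binary keys can be built from Java with org.apache.hadoop.hbase.util.Bytes; a minimal sketch mirroring the shell examples above (assumes a table 't1' with family 'f1' exists):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BinaryKeyDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");

        // put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40" from the shell example:
        byte[] row = Bytes.add(Bytes.toBytes("test"), new byte[] { (byte) 0xef, (byte) 0xff });
        Put put = new Put(row);
        // 'f1:' in the shell means family 'f1' with an empty qualifier.
        put.add(Bytes.toBytes("f1"), Bytes.toBytes(""), new byte[] { 0x01, 0x33, 0x40 });
        table.put(put);

        // get 't1', "key\x03\x3f\xcd" from the shell example:
        byte[] binaryKey = Bytes.add(Bytes.toBytes("key"), new byte[] { 0x03, 0x3f, (byte) 0xcd });
        Result result = table.get(new Get(binaryKey));
        System.out.println(result.isEmpty() ? "no row" : result.toString());
        table.close();
    }
}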
Played in the HBase shell :
There is a good tool, h-rider, available on GitHub.
What is h-rider? h-rider is a UI application created to provide an easier way to view and manipulate the data saved in the distributed database HBase™, which supports structured data storage for large tables.
root@glakshmanan-laptop:/usr/lib/hbase/bin# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.5080 seconds
hbase(main):002:0> list
TABLE
test
1 row(s) in 0.0200 seconds
hbase(main):003:0> scan 'test'
ROW COLUMN+CELL
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'sai-ram'
0 row(s) in 0.0280 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'gubs'
0 row(s) in 0.0060 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'kavitha'
0 row(s) in 0.0070 seconds
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1325613705467, value=sai-ram
row2 column=cf:b, timestamp=1325613715431, value=gubs
row3 column=cf:c, timestamp=1325613724596, value=kavitha
3 row(s) in 0.0310 seconds
hbase(main):008:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1325613705467, value=sai-ram
1 row(s) in 0.0320 seconds
hbase(main):009:0> put 'test', 'row1', 'cf:a', 'sai sai'
0 row(s) in 0.0110 seconds
hbase(main):010:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1325613788305, value=sai sai
row2 column=cf:b, timestamp=1325613715431, value=gubs
row3 column=cf:c, timestamp=1325613724596, value=kavitha
3 row(s) in 0.0250 seconds
hbase(main):011:0> major_compact 'test'
hbase(main):011:0> drop 'test'
ERROR: Table test is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled. If table has
more than one region, run a major compaction on .META.:
hbase> major_compact ".META."
hbase(main):012:0> disable 'test'
0 row(s) in 2.0440 seconds
hbase(main):013:0> drop 'test'
0 row(s) in 1.0670 seconds
hbase(main):014:0> exit
Note : for every DDL operation (modify / delete / drop) you need to disable the table first. HBase is case-sensitive.
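The same disable-then-drop cycle from Java, as a minimal sketch (assuming the 0.90-era HBaseAdmin API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DropTableDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // DDL against an enabled table fails (as the shell error above shows),
        // so disable first, then drop.
        if (admin.tableExists("test")) {
            if (admin.isTableEnabled("test")) {
                admin.disableTable("test");
            }
            admin.deleteTable("test");
        }
    }
}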
Import data from Hadoop HDFS into HBase
Syntax : hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar import <table-name> <hdfs-input-dir>
Example : hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar import 0005_AlertTemplate hdfs://localhost/export/gubs_export/0005_AlertTemplate
Execute a shell script against HBase
hbase shell < shell_script.sh
Export HBase data to Hadoop (run this command as the hdfs user)
Ex : hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar export 0005_SCMaster /export/gubs_export/0005_SCMaster 2000000
Export from Hadoop to the local system
Ex : hadoop fs -copyToLocal /export/gubs_export/0005_SCMaster /tmp/gubs_export/0005_SCMaster
SCP : scp the file from the source box to the destination box
Copy from the local system to Hadoop
Ex : hadoop fs -copyFromLocal /tmp/gubs_export/0005_SCMaster /export/0005_SCMaster
Import Hadoop data into HBase (run this command as the hdfs user)
Ex : hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u3.jar import 0005_SCMaster /export/gubs_export/0005_SCMaster
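The export/import commands above launch MapReduce drivers packaged in the HBase jar. As a sketch, the same drivers can be invoked programmatically (an assumption here: org.apache.hadoop.hbase.mapreduce.Export and Import accept the same arguments as on the command line):

import org.apache.hadoop.hbase.mapreduce.Export;
import org.apache.hadoop.hbase.mapreduce.Import;

public class ExportImportDemo {
    public static void main(String[] args) throws Exception {
        // Export <tablename> <outputdir> [<versions>], as in the shell example above.
        Export.main(new String[] { "0005_SCMaster", "/export/gubs_export/0005_SCMaster", "2000000" });
        // Import <tablename> <inputdir>; the target table must already exist.
        Import.main(new String[] { "0005_SCMaster", "/export/gubs_export/0005_SCMaster" });
    }
}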
Describe an HBase table :
describe '0001_SMSReportData'
Compress a bigger table in HBase (Snappy compression on one family) :
alter '0001_SMSReportData', {NAME => 'TELEMETRY', COMPRESSION => 'SNAPPY'}
Define a TTL (time-to-live, in seconds) for table data :
alter '0030_ConnectionSummary', {NAME => 'WSC', TTL => '604800'}
Run a major compaction on a table :
major_compact '0001_ReportEvent'
Keep a table's data in memory when you retrieve it frequently :
alter '0031_DeviceMaster', {NAME => 'D', IN_MEMORY => 'true'}
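The same alters can be done from Java; a minimal sketch, assuming the 0.90-era HBaseAdmin/HColumnDescriptor API (table and family names taken from the shell examples above, combined on one family here purely for illustration):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class AlterTableDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Tables must be disabled before altering (see the note above).
        admin.disableTable("0001_SMSReportData");
        HColumnDescriptor telemetry = new HColumnDescriptor("TELEMETRY");
        telemetry.setCompressionType(Compression.Algorithm.SNAPPY); // COMPRESSION => 'SNAPPY'
        telemetry.setTimeToLive(604800);                            // TTL => one week, in seconds
        telemetry.setInMemory(true);                                // IN_MEMORY => 'true'
        admin.modifyColumn("0001_SMSReportData", "TELEMETRY", telemetry);
        admin.enableTable("0001_SMSReportData");
    }
}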
Java HBase code to scan a table and rename a given qualifier (copy every cell to the new qualifier, then delete the old one) :
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

private static void updateTelemetryTempQualifier() throws IOException {
    Configuration configuration = HBaseConfiguration.create();
    HTablePool hTablePool = new HTablePool(configuration, 10);
    HTableInterface connection = hTablePool.getTable(Tables.DEVICE_TELEMETRYAVERAGES
            .getTenantTableName("0001"));
    Scan scan = new Scan();
    // The old (wrong) qualifier is the family name itself; scan only those cells.
    scan.addColumn(DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName(),
            DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName());
    scan.setMaxVersions();      // rewrite every stored version, not just the latest
    scan.setCacheBlocks(false); // do not pollute the block cache with this one-off scan
    ResultScanner rsc = connection.getScanner(scan);
    for (Result result : rsc) {
        List<KeyValue> kvList1 = result.getColumn(
                DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName(),
                DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName());
        if (kvList1 != null && !kvList1.isEmpty()) {
            for (int i = 0; i < kvList1.size(); i++) {
                // Re-put the cell under the correct qualifier, keeping its original timestamp.
                Put put = new Put(kvList1.get(i).getRow());
                put.add(DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName(),
                        DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForQualifier(),
                        kvList1.get(i).getTimestamp(), kvList1.get(i).getValue());
                connection.put(put);
                // Then delete the cell stored under the old qualifier.
                Delete delete = new Delete(kvList1.get(i).getRow());
                delete.deleteColumn(DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName(),
                        DeviceTelemetryAveragesFamilyAndColumns.AVERAGE_DAILY_TEMP.getBytesForFamilyName());
                connection.delete(delete);
            }
        }
    }
    rsc.close();
}