Friday, September 27, 2013

HBase - RegionServer - HBase Master failed to reach RegionServer

The RegionServer failed to respond to the HBase Master. Basically, the ZooKeeper session on the RegionServer stopped responding because a GC ran and Java went into a stop-the-world pause.

Read the blog post below, which explains very cleverly how to get rid of GC failures.

http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/

Below are some configuration changes we made on our side to avoid the issue during a system load test.

(HBase configuration) We increased the HBase ZooKeeper session timeout from 40 seconds to 90 seconds. The default is 60 seconds and, as per the HBase guide, the maximum you can have is 3 minutes. On an average system, a GC takes 8 to 10 seconds to collect 1 GB. Since our heap is configured at 8 GB and a collection can cover around 7 GB, a single GC pause can outlast the session timeout, and we may end up with a connection timeout failure.
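The arithmetic behind that choice can be sketched as below. This is only a rough estimate that assumes pause time grows linearly with the amount of heap collected, which is a simplification; the function names are made up for illustration, and the figures are the ones quoted above.

```python
# Rough estimate of a stop-the-world pause versus the ZooKeeper session
# timeout. Assumption: pause time scales linearly with the heap collected.

def estimated_pause_seconds(collected_gb, seconds_per_gb):
    """Estimated pause for collecting `collected_gb` of heap."""
    return collected_gb * seconds_per_gb


def session_survives(pause_seconds, timeout_seconds):
    """The session survives only if the pause is shorter than the timeout."""
    return pause_seconds < timeout_seconds


collected_gb = 7  # heap collected in one GC, from the post
low = estimated_pause_seconds(collected_gb, 8)    # optimistic: 8 s per GB
high = estimated_pause_seconds(collected_gb, 10)  # pessimistic: 10 s per GB

for timeout in (40, 60, 90):
    print(timeout, session_survives(low, timeout), session_survives(high, timeout))
```

Under these assumptions the pause lands somewhere between 56 and 70 seconds, so a 40 second timeout is too short in both cases, 60 seconds is borderline, and 90 seconds leaves some headroom.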

Pass these Java arguments: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:PrintFLSStatistics=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/lib/hbase/logs/logs/gc-$(hostname)-hbase.log

Enable the MSLAB allocation scheme with the default flush values. In hbase-0.92 it is enabled by default.

The cluster should have an odd number of ZooKeeper servers, because that is what the ZooKeeper multi-server setup suggests.
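The reason for the odd number is that ZooKeeper needs a strict majority of its servers to be up in order to keep serving. A small sketch of that rule (the helper names are made up for illustration) shows why an extra even-numbered server buys nothing:

```python
# ZooKeeper stays available only while a strict majority of the ensemble
# is up. Helper names here are hypothetical, for illustration only.

def majority(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still up."""
    return servers - majority(servers)


for n in (3, 4, 5, 6):
    print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
```

Going from 3 to 4 servers still tolerates only one failure, and going from 5 to 6 still tolerates only two, so the even-sized ensembles add a machine without adding fault tolerance.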
