Friday, September 27, 2013

HBase - RegionServer - HBase Master failed to reach RegionServer

The RegionServer failed to respond to the HBase Master. Basically, ZooKeeper on the RegionServer failed to respond because a garbage collection (GC) ran and put the JVM into a stop-the-world pause.

Read the blog post below, which explains very cleverly how to get rid of GC failures.

Below are some configuration changes we made on our side to avoid the issue during a system load test.

(HBase configuration) We increased the HBase ZooKeeper session timeout to 90 seconds, up from 40 seconds (the default is 60 seconds; per the HBase guide, the maximum you can set is 3 minutes). On an average system, collecting 1 GB of heap takes 8 to 10 seconds. Since our heap is configured at 8 GB and a collection can start at around 7 GB, a single GC pause can last longer than the session timeout, and we end up with a connection timeout failure.
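The reasoning above can be checked with some quick arithmetic. This is only a rough sketch: it assumes the pause grows linearly with the amount of heap collected, using the 8 to 10 seconds per GB figure from above, and the function names are made up for illustration.

```python
def estimated_pause_seconds(collected_gb, seconds_per_gb):
    # Assumption: pause time scales linearly with the heap being collected.
    return collected_gb * seconds_per_gb


def session_survives(pause_seconds, session_timeout_seconds):
    # The ZooKeeper session only survives if the pause is shorter than the timeout.
    return pause_seconds < session_timeout_seconds


collected_gb = 7  # a collection starting at around 7 GB of the 8 GB heap
best_case = estimated_pause_seconds(collected_gb, 8)
worst_case = estimated_pause_seconds(collected_gb, 10)
print("estimated pause:", best_case, "to", worst_case, "seconds")

for timeout in (40, 60, 90):
    print("timeout", timeout, "s survives worst case:",
          session_survives(worst_case, timeout))
```

With these numbers the estimated pause is longer than both the 40 second and 60 second timeouts, and fits only under the 90 second one.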

Pass these Java arguments: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:PrintFLSStatistics=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/lib/hbase/logs/logs/gc-$(hostname)-hbase.log

Enable the MSLAB allocation scheme with the default flush values. In HBase 0.92 it is enabled by default.

The cluster should have an odd number of nodes, because the ZooKeeper multi-server setup recommends running an odd number of servers.
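The odd-number advice comes from ZooKeeper needing a strict majority of its servers to be up. A small sketch of that arithmetic (the function names are made up for illustration):

```python
def majority(ensemble_size):
    # ZooKeeper needs more than half of the servers to form a quorum.
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    # Servers that can go down while a majority is still alive.
    return ensemble_size - majority(ensemble_size)


for n in range(3, 7):
    print(n, "servers: quorum of", majority(n),
          ", tolerates", tolerated_failures(n), "failure(s)")
```

Going from 3 to 4 servers still tolerates only one failure, so the extra even-numbered server adds no fault tolerance.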

