The RegionServer failed to respond to the HBase Master. Basically, the ZooKeeper session in the RegionServer stopped responding because a garbage collection (GC) ran and caused a Java stop-the-world pause.
Read the blog post below, which explains very well how to get rid of GC failures.
http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
Below are some configuration changes we made on our side to avoid the issue during a system load test.
(HBase configuration) We increased the HBase ZooKeeper session timeout (zookeeper.session.timeout) from 40 seconds to 90 seconds. The default is 60 seconds and, as per the HBase guide, the maximum you can have is 3 minutes. On an average system, a garbage collection takes about 8 to 10 seconds per 1 GB of heap. Since our heap is configured at 8 GB and a collection can kick in at around 7 GB, a single pause can outlast the session timeout, and we may end up with a connection timeout failure.
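The arithmetic behind that choice can be sketched as follows. This is only a rough estimate that assumes the pause grows linearly with the amount of heap being collected, using the 8 to 10 seconds per GB figure quoted above; the function names are made up for illustration and real pause times depend on the collector and the workload.

```python
def estimated_pause_seconds(heap_gb, seconds_per_gb):
    """Rough pause estimate, assuming the pause scales linearly with heap size."""
    return heap_gb * seconds_per_gb


def pause_exceeds_timeout(heap_gb, seconds_per_gb, timeout_seconds):
    """True if the estimated pause would outlast the ZooKeeper session timeout."""
    return estimated_pause_seconds(heap_gb, seconds_per_gb) > timeout_seconds


if __name__ == "__main__":
    occupied_gb = 7  # heap occupancy at which a collection may kick in (from the text)
    best = estimated_pause_seconds(occupied_gb, 8)    # 8 s per GB
    worst = estimated_pause_seconds(occupied_gb, 10)  # 10 s per GB
    print(f"estimated pause: {best} to {worst} seconds")

    for timeout in (40, 60, 90):
        expired = pause_exceeds_timeout(occupied_gb, 10, timeout)
        status = "session may expire" if expired else "session survives"
        print(f"timeout {timeout}s: {status} (worst case)")
```

Under these assumptions, the estimated worst-case pause is longer than both the 40 second and 60 second timeouts but shorter than 90 seconds, which is why the larger value helps.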
Pass the following Java arguments: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:PrintFLSStatistics=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/lib/hbase/logs/logs/gc-$(hostname)-hbase.log
Enable the MSLAB allocation scheme with the default flush values. In hbase-0.92 it is enabled by default.
The ZooKeeper cluster should have an odd number of servers, because ZooKeeper recommends an odd number for a multi-server setup.
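The reason for the odd number is that ZooKeeper needs a strict majority of its servers to be up. A minimal sketch of that majority arithmetic (the helper names are hypothetical, not part of any ZooKeeper API):

```python
def majority(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still available."""
    return servers - majority(servers)


if __name__ == "__main__":
    for n in range(3, 7):
        print(f"{n} servers: majority {majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers (or from 5 to 6) does not increase the number of failures the ensemble can tolerate, so the extra even-numbered server adds cost without adding fault tolerance.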