In his presentation, Avoiding Full GCs with MemStore-Local Allocation
Buffers, Todd Lipcon describes two cases of stop-the-world garbage
collections common in HBase, especially during loading: CMS failure
modes and old generation heap fragmentation. To address the first,
start the CMS earlier than the default by adding
-XX:CMSInitiatingOccupancyFraction and setting it below its default.
Start at 60 or 70 percent (the lower you set the threshold, the more
GCing is done and the more CPU is used). To address the second issue,
fragmentation, Todd added an experimental facility, MemStore-Local
Allocation Buffers (MSLAB), that must be explicitly enabled in Apache
HBase 0.90.x (it is on by default in Apache HBase 0.92.x). Set
hbase.hregion.memstore.mslab.enabled to true in your Configuration.
See the cited slides for background and detail[1].
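As a rough sketch of the first remedy, the line below shows how the flag might be appended to the JVM options in conf/hbase-env.sh. The value 70 is simply the upper starting point suggested above, not a recommendation for every cluster, and the `${HBASE_OPTS:-}` form is an assumption made here so the line also works when the variable is not yet set.

```shell
# Sketch of a line for conf/hbase-env.sh.
# 70 is only the starting point suggested in the text; tune it against
# your own GC logs. Lower values start CMS earlier and cost more CPU.
export HBASE_OPTS="${HBASE_OPTS:-} -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```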
Be aware that when MSLAB is enabled, each MemStore instance occupies at
least one MSLAB instance of memory. If you have thousands of regions, or
many regions each with many column families, these MSLAB allocations
may account for a good portion of your heap and, in an extreme case,
cause an OutOfMemoryError (OOME). In that case disable MSLAB, lower
the amount of memory it uses, or host fewer regions per server.
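To get a feel for the numbers, the following back-of-the-envelope sketch estimates a lower bound on MSLAB heap usage. The region and column family counts are hypothetical. It assumes one MemStore per column family per region, that every MemStore has taken writes and holds exactly one chunk, and a 2 MB chunk (the default for hbase.hregion.memstore.mslab.chunksize, which you should verify for your release).

```shell
# Hypothetical cluster shape; substitute your own numbers.
regions=1000          # regions hosted on one RegionServer (assumed)
families=3            # column families per region (assumed)
chunk_bytes=2097152   # MSLAB chunk size in bytes (2 MB default, verify)

# One MemStore per column family per region, one chunk each at minimum.
memstores=$((regions * families))
mslab_min_mb=$((memstores * chunk_bytes / 1024 / 1024))

echo "MemStores: ${memstores}, minimum MSLAB footprint: ${mslab_min_mb} MB"
# prints: MemStores: 3000, minimum MSLAB footprint: 6000 MB
```

Comparing that figure with the RegionServer heap size gives a quick indication of whether MSLAB alone could exhaust it.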
If you have a write-heavy workload, check out HBASE-8163
MemStoreChunkPool: An improvement for JAVA GC when using MSLAB. It
describes configurations that lower the amount of young GC during
write-heavy loading. If you do not have HBASE-8163 installed and you
are trying to improve your young GC times, one trick to consider
(courtesy of our Liang Xie) is to set the GC option
-XX:PretenureSizeThreshold in hbase-env.sh to be just smaller than
hbase.hregion.memstore.mslab.chunksize, so that MSLAB allocations
happen directly in the tenured space rather than first in the young
gen. You would do this because these MSLAB allocations are likely to
make it to the old gen anyway: rather than pay the price of copies
between the s0 and s1 survivor spaces followed by the copy up from
young to old gen once the MSLABs have achieved sufficient tenure, you
save a bit of YGC churn by allocating in the old gen directly.
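The pretenuring trick could be sketched in conf/hbase-env.sh as follows. The 2 MB chunk size is the assumed default for hbase.hregion.memstore.mslab.chunksize, and the 1 KB margin is an arbitrary illustrative choice for "just smaller", not a value taken from HBASE-8163 or from Liang Xie.

```shell
# Sketch for conf/hbase-env.sh; values are illustrative assumptions.
chunk_bytes=2097152   # hbase.hregion.memstore.mslab.chunksize (assumed 2 MB)
margin_bytes=1024     # arbitrary margin so the threshold sits just below

# Objects larger than this threshold are allocated directly in the old gen.
pretenure_bytes=$((chunk_bytes - margin_bytes))

export HBASE_OPTS="${HBASE_OPTS:-} -XX:PretenureSizeThreshold=${pretenure_bytes}"
echo "$HBASE_OPTS"
```

If you change the chunk size in your configuration, recompute the threshold so that it stays just below it.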
For more information about GC logs, see ???.
[1] Recent JVMs handle fragmentation better, so make sure you are running a recent release. For details, read down in the message Identifying concurrent mode failures caused by fragmentation.