G1GC Notes
Notes
G1GC divides the heap into regions. During a GC it estimates the amount of live data in each region, stops all application threads, copies the live objects out of the regions with the least live data into other regions, then reclaims the emptied regions.
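The "garbage first" idea can be sketched as a toy model: rank regions by how little live data they hold and evacuate the cheapest ones first. This is only an illustration. The function name, the sample regions, and the fixed "budget" are all made up here, and real G1 chooses regions from predicted pause cost rather than a simple sum.

```shell
# Toy model only: input lines are "<region-name> <live-percent>".
# Regions are taken in order of least live data while they still fit
# a hypothetical copy budget (total live-percent we can afford to move).
pick_collection_set() {
    budget="$1"
    sort -k2,2n | awk -v budget="$budget" '
        { if (used + $2 <= budget) { used += $2; printf "%s%s", sep, $1; sep = " " } }
        END { print "" }'
}

# Hypothetical sample data: four regions with different amounts of live data.
printf '%s\n' "r1 90" "r2 5" "r3 40" "r4 15" | pick_collection_set 60
```

With this sample the mostly-live region r1 is left alone, since copying it would cost a lot and free very little.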
It copes better with larger heaps (> 6 GB), where CMS tends to cause unpredictable collection pauses.
NOTE: do NOT use Elasticsearch/Lucene with G1GC for production yet.
References
List of G1-related JVM crashes and issues
Really bad Lucene issue:
SIGSEGV (0xb) at pc=0x00007f8aa32ef13e, pid=24816, tid=140229027428096
Flags
All flags shown with their default values (if applicable)
Enable G1GC
-XX:+UseG1GC
Print details about garbage collection to a log, so it can be debugged
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
Write the garbage collection log to a file
-Xloggc:gc.log
Print how long application threads were stopped, and how long they ran between pauses
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
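Once these flags are on, the stopped-time lines can be totalled with a little awk. This is a minimal sketch: the function name is made up, the sample lines are illustrative rather than real output, and it assumes the JDK 7/8-style wording "Total time for which application threads were stopped: N seconds".

```shell
# Sum the pause times reported by -XX:+PrintGCApplicationStoppedTime.
# Reads the named log files, or stdin when no file is given.
total_stopped_seconds() {
    awk '/Total time for which application threads were stopped:/ {
             # the number of seconds is the field right after "stopped:"
             for (i = 1; i <= NF; i++) if ($i == "stopped:") total += $(i + 1)
         }
         END { printf "%.4f\n", total }' "$@"
}

# Illustrative sample lines (not captured from a real JVM).
printf '%s\n' \
    '1.234: Total time for which application threads were stopped: 0.0100000 seconds' \
    '2.234: Application time: 1.0000000 seconds' \
    '3.456: Total time for which application threads were stopped: 0.0250000 seconds' \
    | total_stopped_seconds
```

Against a real log this would be run as total_stopped_seconds gc.log.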
Set the target maximum pause time for G1, in milliseconds (a goal, not a guarantee)
-XX:MaxGCPauseMillis=200
Set the region size (usually determined automatically)
-XX:G1HeapRegionSize=16m
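For a feel of what "determined automatically" gives, here is a rough sketch of the sizing rule as I understand it: aim for about 2048 regions, round down to a power of two, and keep the result between 1 MB and 32 MB. The function name is made up, and this approximates the documented goal rather than the exact HotSpot code.

```shell
# Approximate automatic G1 region size (in MB) for a heap given in MB.
g1_region_size_mb() {
    heap_mb="$1"
    target=$((heap_mb / 2048))      # aim for roughly 2048 regions
    size=1                          # never below 1 MB
    # double while still within the target, and never above 32 MB
    while [ $((size * 2)) -le "$target" ] && [ "$size" -lt 32 ]; do
        size=$((size * 2))
    done
    echo "$size"
}

g1_region_size_mb 8192     # 8 GB heap
g1_region_size_mb 31744    # 31 GB heap
echo "-XX:G1HeapRegionSize=$(g1_region_size_mb 8192)m"
```

Setting the flag by hand mostly matters when very large objects (bigger than half a region) are common.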
Heap occupancy percentage at which G1 starts a concurrent marking cycle
-XX:InitiatingHeapOccupancyPercent=45
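To see what the percentage means in absolute terms, a small arithmetic sketch (the function name is made up; the percentage is taken against the whole heap):

```shell
# Heap occupancy, in MB, at which a concurrent marking cycle would start.
ihop_threshold_mb() {
    heap_mb="$1"
    percent="${2:-45}"    # falls back to the default of 45
    echo $((heap_mb * percent / 100))
}

ihop_threshold_mb 8192       # 8 GB heap, default 45%
ihop_threshold_mb 8192 75    # same heap, 75%
```

Raising the value delays marking and leaves less headroom before the heap fills up.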
Make GC logging verbose
-verbose:gc
Turn on support for NUMA where appropriate. Appropriate means "The NUMA-aware allocator is available on the Solaris™ operating system starting in Solaris 9 12/02 and on the Linux operating system starting in Linux kernel 2.6.19 and glibc 2.6.1."
-XX:+UseNUMA
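Put together, the flags from this section might be collected like this. It is only a sketch: the variable names G1_OPTS, G1_LOG_OPTS and G1_JAVA_OPTS and the app.jar in the comment are placeholders, and the region size and NUMA flags are left out on purpose.

```shell
# Collector selection and tuning (values are the defaults listed above).
G1_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45"

# GC logging to gc.log, including stopped/running times.
G1_LOG_OPTS="-verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
G1_LOG_OPTS="$G1_LOG_OPTS -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime"

# Hypothetical variable; pass it to the JVM, e.g. java $G1_JAVA_OPTS -jar app.jar
export G1_JAVA_OPTS="$G1_OPTS $G1_LOG_OPTS"
echo "$G1_JAVA_OPTS"
```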
Things to try for Elasticsearch
Some GC settings I'd like to test out with Elasticsearch:
First, ES' usual JVM arguments:
-Xss256k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC
And some additional options to test:
# ratio of eden to each survivor space; HotSpot's default is 8, and a
# higher ratio means smaller survivor spaces, so 512 shrinks them
-XX:SurvivorRatio=512
# allow 90% of the survivor space to be used, instead of 50%
-XX:TargetSurvivorRatio=90
# Enables a technique for improving the performance of uncontended
# synchronization. An object is "biased" toward the thread which first acquires
# its monitor
-XX:+UseBiasedLocking
# From the docs:
# Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
# gives short-lived objects a little longer period to die in the young gen
-XX:MaxTenuringThreshold=15
To use:
export EXTRA_OPTS="-XX:SurvivorRatio=512 -XX:TargetSurvivorRatio=90 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15"
# If you want GC output
export GC_LOG_OPTS="-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
export ES_JAVA_OPTS="$EXTRA_OPTS $GC_LOG_OPTS"
./bin/elasticsearch
Things Solr has tried
Most of these come from https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
I can't vouch for their accuracy, but I thought I would copy them here in case I wanted to try them myself.
-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages -XX:+AggressiveOpts
-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=250 -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts
While I don't think these would all be valid for Elasticsearch, it may help with determining what to try.
Things Cassandra has tried
I stole most of these from https://issues.apache.org/jira/browse/CASSANDRA-7486
Along with some numbers: https://gist.github.com/tobert/ea9328e4873441c7fc34
-XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:+AlwaysPreTouch -XX:+UseTLAB -XX:+ResizeTLAB -XX:InitiatingHeapOccupancyPercent=70
The AlwaysPreTouch setting is interesting, assuming the same conclusion from
https://tobert.github.io/tldr/cassandra-java-huge-pages.html holds for
Elasticsearch. Note that there is a bug where AlwaysPreTouch
does not take effect with G1 in Java 8u40 (1.8.0_40).