G1GC Notes

Notes
References
List of G1-related JVM crashes and issues
Flags
Things to try for Elasticsearch
Things SOLR has tried
Things Cassandra has tried

Notes

G1GC divides the heap into regions. During a GC it determines how much live data each region holds, stops all application threads, copies the live objects out of the regions with the least live data into other regions, then reclaims the now-empty regions.
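
As a toy illustration of why the collector is called "garbage first", the sketch below picks the regions with the least live data, since those are the cheapest to evacuate. This is not how HotSpot implements it: the region names, live percentages, and the `pick_collection_set` helper are all made up for illustration.

```shell
# Toy model only: hypothetical regions as "name live_percent" pairs.
regions="r0 85
r1 10
r2 0
r3 40
r4 95"

# Hypothetical helper: print the N regions with the least live data,
# i.e. the ones that are cheapest to evacuate and free the most space.
pick_collection_set() {
  printf '%s\n' "$regions" | sort -k2,2n | head -n "$1"
}

pick_collection_set 2
```

With the sample data this prints r2 and r1, the two regions that are almost entirely garbage.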

It copes better with larger heaps (> 6 GB), where CMS tends to have unpredictable collection pauses.

NOTE: do NOT use Elasticsearch/Lucene with G1GC for production yet.

References

And some general GC references:

List of G1-related JVM crashes and issues

Really bad Lucene issue:

https://issues.apache.org/jira/browse/LUCENE-6098

SIGSEGV (0xb) at pc=0x00007f8aa32ef13e, pid=24816, tid=140229027428096

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/11791/

Flags

All flags are shown with their default values (where applicable)

Enable G1GC

-XX:+UseG1GC

Print details about garbage collection to a log, so it can be debugged

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps

Write the garbage collection log to a file

-Xloggc:gc.log
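
With a fixed file name, a restarted JVM will typically overwrite the previous log. One way around that, sketched here, is to put a timestamp in the name. The `gc_log_opt` helper, the `GC_FILE_OPT` variable, and the `gc-<timestamp>.log` naming are assumptions for illustration, not anything the JVM requires.

```shell
# Hypothetical helper: build an -Xloggc option with a timestamped file name.
# An optional first argument overrides the timestamp (handy for testing).
gc_log_opt() {
  stamp=${1:-$(date +%Y%m%d-%H%M%S)}
  echo "-Xloggc:gc-${stamp}.log"
}

gc_log_opt 20150101-120000   # prints -Xloggc:gc-20150101-120000.log

# Use the current time for a real run
export GC_FILE_OPT="$(gc_log_opt)"
```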

Details about how much time was spent stopped/not-stopped

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime

Set the target maximum pause time for G1, in milliseconds (a goal, not a guarantee)

-XX:MaxGCPauseMillis=200

Set the region size (usually determined automatically)

-XX:G1HeapRegionSize=16m
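
When the region size is left unset, the JVM derives one from the heap size. The sketch below approximates that choice under stated assumptions: a target of roughly 2048 regions, a power-of-two size, and a 1 MB to 32 MB range (as in JDK 8). The `g1_region_size_mb` helper is hypothetical, and the real ergonomics also take the initial heap size into account.

```shell
# Hypothetical helper: approximate G1's automatic region size (in MB)
# for a heap size given in MB.
# Assumptions: ~2048 regions, power of two, clamped to 1..32 MB.
g1_region_size_mb() {
  size=$(( $1 / 2048 ))
  region=1
  # Double the region size while it still fits under heap/2048 and the 32 MB cap
  while [ $(( region * 2 )) -le "$size" ] && [ "$region" -lt 32 ]; do
    region=$(( region * 2 ))
  done
  echo "$region"
}

g1_region_size_mb 8192     # 8 GB heap   -> 4
g1_region_size_mb 31744    # 31 GB heap  -> 8
g1_region_size_mb 131072   # 128 GB heap -> 32 (capped)
```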

Heap occupancy percentage at which a concurrent marking cycle starts

-XX:InitiatingHeapOccupancyPercent=45
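
As a worked example, assuming the percentage is taken against the whole heap (as with G1 in JDK 8), the occupancy at which a concurrent marking cycle would start can be computed like this. The `ihop_threshold_mb` helper is made up for illustration.

```shell
# Hypothetical helper: heap size in MB and occupancy percent in,
# threshold in MB out (integer arithmetic, rounded down).
ihop_threshold_mb() {
  echo $(( $1 * $2 / 100 ))
}

ihop_threshold_mb 8192 45   # 8 GB heap at the default 45% -> 3686
ihop_threshold_mb 8192 75   # same heap at 75%             -> 6144
```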

Make GC logging verbose

-verbose:gc

Turn on support for NUMA where appropriate. Appropriate means "The NUMA-aware allocator is available on the Solaris™ operating system starting in Solaris 9 12/02 and on the Linux operating system starting in Linux kernel 2.6.19 and glibc 2.6.1."

-XX:+UseNUMA

Things to try for Elasticsearch

Some GC settings I'd like to test out with Elasticsearch:

First, ES' usual JVM arguments:

-Xss256k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError
-XX:+DisableExplicitGC

And some additional options to test:

# use a larger survivor space, default is 1024
-XX:SurvivorRatio=512

# allow 90% of the survivor space to be used, instead of 50%
-XX:TargetSurvivorRatio=90

# Enables a technique for improving the performance of uncontended
# synchronization. An object is "biased" toward the thread which first acquires
# its monitor
-XX:+UseBiasedLocking

# From the docs:
# Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
# gives short-lived objects a little longer period to die in the young gen
-XX:MaxTenuringThreshold=15
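
To put numbers on SurvivorRatio: it is the ratio of eden to one survivor space, so each survivor space is young / (SurvivorRatio + 2). A rough calculation, assuming a hypothetical 1 GB young generation (the `survivor_kb` helper and the young generation size are illustrative, not taken from any ES configuration):

```shell
# Hypothetical helper: young generation size in KB and SurvivorRatio in,
# size of one survivor space in KB out (rounded down).
survivor_kb() {
  echo $(( $1 / ($2 + 2) ))
}

young_kb=1048576                 # assumed 1 GB young generation

survivor_kb "$young_kb" 1024     # -> 1022
survivor_kb "$young_kb" 512      # -> 2040
survivor_kb "$young_kb" 8        # -> 104857
```

So 512 roughly doubles each survivor space relative to 1024, but it is still far smaller than what a ratio of 8 would give.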

To use:

export EXTRA_OPTS="-XX:SurvivorRatio=512 -XX:TargetSurvivorRatio=90 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15"

# If you want GC output
export GC_LOG_OPTS="-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"

export ES_JAVA_OPTS="$EXTRA_OPTS $GC_LOG_OPTS"

./bin/elasticsearch

Things SOLR has tried

Most of these come from https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

I can't vouch for their accuracy, but I thought I would copy them here in case I wanted to try them myself.

-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=200
-XX:+UseLargePages
-XX:+AggressiveOpts

-XX:+UseG1GC
-XX:+PerfDisableSharedMem
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=250
-XX:InitiatingHeapOccupancyPercent=75
-XX:+UseLargePages
-XX:+AggressiveOpts

While I don't think these would all be valid for Elasticsearch, they may help with determining what to try.
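
To see at a glance what differs between the two Solr flag sets, a small shell loop is enough. The variable names and the `flags_only_in_second` helper are made up for this sketch; the flag lists are copied from the two sets above.

```shell
solr_a="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages -XX:+AggressiveOpts"
solr_b="-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=250 -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts"

# Hypothetical helper: print each flag in the second set that does not
# appear verbatim in the first set.
flags_only_in_second() {
  for flag in $2; do
    case " $1 " in
      *" $flag "*) ;;          # present in both sets, skip
      *) echo "$flag" ;;       # new or changed in the second set
    esac
  done
}

flags_only_in_second "$solr_a" "$solr_b"
```

For these two sets it prints -XX:+PerfDisableSharedMem, -XX:MaxGCPauseMillis=250, and -XX:InitiatingHeapOccupancyPercent=75.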

Things Cassandra has tried

I stole most of these from https://issues.apache.org/jira/browse/CASSANDRA-7486

Along with some numbers: https://gist.github.com/tobert/ea9328e4873441c7fc34

-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:+AlwaysPreTouch
-XX:+UseTLAB -XX:+ResizeTLAB
-XX:InitiatingHeapOccupancyPercent=70

The AlwaysPreTouch setting is interesting, assuming the conclusion from https://tobert.github.io/tldr/cassandra-java-huge-pages.html also holds for Elasticsearch. Note that there is a bug where AlwaysPreTouch does not take effect with G1 in 1.8.0u40.
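
A small guard like the following could flag that build before testing AlwaysPreTouch. It is only a sketch: the `check_pretouch` helper is hypothetical, it assumes the version string has the form printed by `java -version` (for example 1.8.0_40), and it checks only the one build named above. Other builds may or may not be affected.

```shell
# Hypothetical helper: takes a Java version string and warns about the
# build mentioned in these notes. It does not consult any bug database.
check_pretouch() {
  case "$1" in
    1.8.0_40|1.8.0_40-*)
      echo "warning: AlwaysPreTouch may be ignored with G1 on $1" ;;
    *)
      echo "no note recorded here for $1" ;;
  esac
}

check_pretouch 1.8.0_40
check_pretouch 1.8.0_25
```

The version string itself could be pulled from the running JVM with something like `java -version 2>&1 | awk -F'"' '/version/ {print $2}'`.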