SPARK / HDFS – G1 GC

Posted: June 1, 2017 in GC, HDFS, SPARK, Uncategorized
Tags: , , ,

We have recently faced lot of issues with our spark based APP running in Docker SWARM due to heavy Minor/Major GC calls (STW) and following configuration helped to minimize it. Configuration is specific to application and it can’t be reused as is but it can be used as basis to start & try different values to arrive at good numbers for your app. We have tried atleast 10 to 15 different combinations before we arrive at below entries. In our case we use SPARK to run in standalone mode with 4 workers and 4 cores per worker, 16GB driver memory, 16 GB executor memory + max 16 cores & parallelism of 16.

–conf “spark.executor.extraJavaOptions= -XX:ParallelGCThreads=12 -XX:ConcGCThreads=12 -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=7 -XX:+UseG1GC -XX:MaxGCPauseMillis=15 -XX:InitiatingHeapOccupancyPercent=85 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution”

Use SparkUI for your application which runs on default port 4040 -> executor tab -> “Task Time (GC Time)” column / logs column – stdout link for GC usage / log entries. Above configuration got entries that enables detailed GC related log entries.

Btw, the above entries can also be given as part of HDFS configuration for HADOOP_JOBTRACKER_OPTS, SHARED_HADOOP_NAMENODE_OPTS & HADOOP_DATANODE_OPTS options so that even HDFS can start using G1 GC instead of default -XX:+UseConcMarkSweepGC.

HDFS level config entries that helped us are => -server -XX:ParallelGCThreads=8 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=50

Changes can be done using Ambari (in case if your cluster been setup using Ambari)  HDFS -> Configs -> Advanced -> Advanced hadoop-env section -> hadoop-env template value.

Reference Links – 

https://spark.apache.org/docs/2.1.1/spark-standalone.html

http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

https://blog.codecentric.de/en/2012/08/useful-jvm-flags-part-5-young-generation-garbage-collection/

http://stackoverflow.com/questions/16794783/how-to-read-a-verbosegc-output

Summary Notes:

  1. GC – Mark reachable references in object graph, Sweep/Delete, Compacting => Minor/Major GC (Stop the world = stops apps for few sec)
  2. Performance => responsiveness / latency => How quick the app is accessible, throughput => app’s output w.r.t amount of work it does
  3. Types => 
    1. Serial (single gc thread handles GC), 
    2. Concurrent Mark Sweep (CMS) ( ConcGCThreads=n, gc thread runs along with app side by side) => When the old generation reaches a certain occupancy rate, the CMS is kicked off
      1. 5 collection phases => 
        1. Initial mark for reachable objs(does STW – low pause time)
        2. Concurrent marking as live objs(while app runs) 
        3. Remark the updated objects by the running app (STW)
        4. Concurrent sweep (deallocate space for dead objects and not move live objects i.e. no compaction been done)
        5. Resetting (prepare for next concurrent collection) 
    3. Parallel (ParallelGCThreads=n, uses parallel cores of CPU once heap hits 90%) GC collectors
  4. When to use => CMS => when we have more memory/no.of CPUs + app demands only short pauses … called as low latency collector most of web/financial apps. 
    1. Parallel collector => when we have less memory, less no.of CPUs & app demands high throughput. 
  5. G1 GC => Predictable/tunable GC pauses, low pauses, parallelism & concurrency together & better heap utlization
    1. Heap region => size decides based on amount of the Heap size and JVM plans around 2000 regions ranging from 1 MB to 32 MB size
      1. 10% – default reserved value for safety to avoid promotion failures
    2. Tenuring Threshold is used by JVM to decide when an object can be promoted from young generations to Old generation (MaxTenuringThreshold=n, default 15, -XX:+PrintTenuringDistribution – prints age distribution)
      1. Live objects are evacuated (i.e., copied or moved) to one or more survivor regions. If the aging threshold is met, some of the objects are promoted to old generation region
    3. 6 collection phase for old generation => 
      1. Initial Mark survivor regions or root regions (STW)
      2. Root Region Scanning => Scan survivor regions for old gen references  – need to complete before young GC can occur
      3. Concurrent marking (find live objects across heap) – parallel to application but it can be interrupted by young GC
      4. Remark (STW) – complete the marking of live objects 
      5. Cleanup (STW) – empty regions are removed and region availability recalculated
      6. Copying (STW) – copy to new unused regions. This can be done with young generation regions which are logged as [GC pause (young)]. Or both young and old generation regions which are logged as [GC Pause (mixed)]
    4. do not explicitly set young generation size –Xmn  which will impact G1 GC collector’s pause time goal and with explicit value G1 GC can’t auto adjust the size as needed => i.e. 
      1. Evacuation Failure => when JVM runs out of heap regions during GC for either survivor or promoted objects 
    5. Tuning => 
      1. -XX:NewSize and -XX:MaxNewSize => set a lower and upper bound for the size of the young generation. Young generation can’t be bigger than old since at some point if all young gen obj may beed to be moved to old gen.
      2. -XX:NewRatio => allows us to specify the factor by which the old generation should be larger than the young generation. For example, with -XX:NewRatio=3 the old generation will be three times as large as the young generation
      3. -XX:MaxGCPauseMillis=200 => Sets a target for the maximum GC pause time. This is a soft goal, and the JVM will make its best effort to achieve it. Therefore, the pause time goal will sometimes not be met. The default value is 200 milliseconds
      4. -XX:InitiatingHeapOccupancyPercent=45 => Percentage of the (entire) heap occupancy to start a concurrent GC cycle.
      5. -XX:SurvivorRatio specifies how large “Eden” should be sized relative to one of the two survivor spaces. For example, with -XX:SurvivorRatio=10 we dimension “Eden” ten times as large as “To” (and at the same time ten times as large as “From”). As a result, “Eden” occupies 10/12 of the young generation while “To” and “From” each occupy 1/12. Note that the two survivor spaces are always equal in size.
      6. -XX:InitialTenuringThreshold and -XX:MaxTenuringThreshold, -XX:+PrintTenuringDistribution, -XX:TargetSurvivorRatio =>initial and maximum value of the tenuring threshold, respectively + specify the target utilization (in percent) of “To” at the end of a young generation GC
      7. -XX:+NeverTenure and -XX:+AlwaysTenure => objects are never promoted to the old generation i.e. old generation not needed / no survivor spaces are used so that all young objects are immediately promoted to the old generation on their first GC
      8. Xms / Xmx=> minimum / maximum heap allocated to the program (like spark)
    6. logging => -Xloggc:gc.log (Use eclipse plug-in for visual graphs, Visual VM (jvisualvm), Visual GC plug-in), jhat (java heap analyzer tool), teraquota big memory tool

Leave a comment