How to interpret the causes of Java G1 GC pause times


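We are using the G1 collector and are being hit by what appear to be long stop-the-world pauses. I have read the Oracle documentation, but I still have trouble determining what causes these long pauses and how to fix them. (The GC log is linked below.)

Our instances are monitored, and the charts below show what I have:

[Image: 12-second stop-the-world pause] [Image: heap allocation]

We also have another monitoring tool that pings the JVM, and it reported the JVM as unresponsive for 12 seconds at about the same time.

That raises the question of what to do about it. Load on the server is very low, so this does not happen often, but the heap seems to grow steadily over several hours and then a huge GC event occurs, which could cause serious problems. Here is our GC configuration: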

   wrapper.java.additional.40=-XX:+UseG1GC

   wrapper.java.additional.44=-XX:+ScavengeBeforeFullGC


   wrapper.java.additional.50=-XX:+PrintGCCause
   wrapper.java.additional.51=-XX:+PrintGCDetails
   wrapper.java.additional.52=-XX:+PrintGCTimeStamps
   wrapper.java.additional.53=-XX:+PrintGCApplicationStoppedTime
   wrapper.java.additional.54=-XX:+PrintGCApplicationConcurrentTime
   wrapper.java.additional.55=-verbose:gc
   wrapper.java.additional.56=-Xloggc:../../../logs/gc.log
   wrapper.java.additional.57=-XX:+UseGCLogFileRotation
   wrapper.java.additional.58=-XX:NumberOfGCLogFiles=10
   wrapper.java.additional.59=-XX:GCLogFileSize=100M
   wrapper.java.additional.60=-XX:+PrintHeapAtGC
   wrapper.java.additional.61=-XX:+PrintTenuringDistribution
   wrapper.java.additional.62=-XX:+UseCompressedClassPointers
   wrapper.java.additional.63=-XX:+UseCompressedOops

Can anyone point me in the right direction? Thanks!

GCEasy analysis: http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjcvLS1nYyAoMSkubG9nLnppcC0tMTgtNDEtNDA=

Update: Metaspace chart

[Image: Metaspace chart]

Update: GC log: https://dl.dropboxusercontent.com/u/3642047/gc.log.zip


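I ran the log through gceasy to get more detailed information, but it did not make things any clearer to me. Perhaps my young generation is too small? http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjcvLS1nYyAoMSkubG9nLnppcC0tMTgtNDEtNDA= - Casey Jordan
You already have the logs, so why not post them instead of charts that reduce the information content? - the8472
I will post them, but they are fairly large. Thanks. - Casey Jordan
Thanks @duffymo, but that seems to apply only to the CMS collector. Does it cover tuning the G1 collector? - Casey Jordan
No, it is not that specific. - duffymo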
1 Answer


I see at least two problems in your GC log:

  1. There were 4 full collections invoked explicitly by System.gc() or Runtime.getRuntime().gc(). Each was around 10 seconds long:

    277042.600: [Full GC (System.gc())  12G->1537M(5126M), 11.4203806 secs]
    

    You may want to add the -XX:+ExplicitGCInvokesConcurrent JVM flag so that System.gc() starts a concurrent cycle instead of a stop-the-world full collection.

    It would also be useful to find out who calls System.gc() and, if possible, avoid the call altogether. For this purpose you can modify the Runtime.gc method by adding code that prints a stack trace, then recompile Runtime.java and prepend the modified class to the bootstrap classpath (this option exists only on JDK 8 and earlier) with
    -Xbootclasspath/p:/path/to/yourpatch.jar

  2. Another problem is an extremely long safepoint synchronization pause not related to GC:

    5512447.686: Total time for which application threads were stopped: 16.4426008 seconds, Stopping threads took: 16.4414390 seconds
    

    This is typically caused by MappedByteBuffer I/O or by the Java process starting to swap to disk. See this and this answer regarding similar problems.
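
To find every occurrence of these two patterns in a large log, a small scanner can help. The following is only a minimal sketch: the class name `GcLogScan`, the `scan` method, and the result labels are made up for illustration, and the regular expressions assume each event is printed on one line in exactly the form quoted above (real logs written with -XX:+PrintGCDetails may split events across lines or add date stamps). It reports explicit full collections longer than a threshold, and stopped-time entries where more than half of the pause was spent stopping threads rather than doing GC work.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogScan {
    // Assumed line shape: "277042.600: [Full GC (System.gc())  12G->1537M(5126M), 11.4203806 secs]"
    private static final Pattern EXPLICIT_FULL_GC = Pattern.compile(
            "^\\s*([\\d.]+): \\[Full GC \\(System\\.gc\\(\\)\\).*?, ([\\d.]+) secs\\]");

    // Assumed line shape of the -XX:+PrintGCApplicationStoppedTime output quoted above
    private static final Pattern STOPPED = Pattern.compile(
            "^\\s*([\\d.]+): Total time for which application threads were stopped: "
            + "([\\d.]+) seconds, Stopping threads took: ([\\d.]+) seconds");

    /** Returns one label per suspicious line, in log order. */
    public static List<String> scan(List<String> lines, double thresholdSeconds) {
        List<String> findings = new ArrayList<>();
        for (String line : lines) {
            Matcher gc = EXPLICIT_FULL_GC.matcher(line);
            if (gc.find()) {
                // Explicit full collection: report it if it ran longer than the threshold
                if (Double.parseDouble(gc.group(2)) >= thresholdSeconds) {
                    findings.add("explicit-full-gc@" + gc.group(1));
                }
                continue;
            }
            Matcher st = STOPPED.matcher(line);
            if (st.find()) {
                double total = Double.parseDouble(st.group(2));
                double sync = Double.parseDouble(st.group(3));
                // Long pause dominated by the time needed to bring threads to a safepoint
                if (total >= thresholdSeconds && sync / total > 0.5) {
                    findings.add("safepoint-sync@" + st.group(1));
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java GcLogScan <gc.log>");
            return;
        }
        scan(Files.readAllLines(Paths.get(args[0])), 1.0).forEach(System.out::println);
    }
}
```

Running it against the log with a 1-second threshold should list the timestamps to correlate with the monitoring charts. The 0.5 ratio is an arbitrary cut-off chosen for this sketch, not a JVM-defined value.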


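Great observations! If we can track down which libraries are calling it, fixing System.gc should be easy. Also, we use a library that makes heavy use of memory-mapped files, and it sometimes has to handle 70G of data or more. Short of forcing everything to be cached in RAM, is there anything you would suggest tuning so the OS handles this data better? Thanks! - Casey Jordan
@CaseyJordan Unfortunately, there is not much you can do other than make sure the files are in RAM, replace mapped I/O with plain FileChannel I/O, or disable Unsafe intrinsics in the JVM (the latter may significantly reduce throughput). - apangin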
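
For illustration, the FileChannel option mentioned in the comment above could look roughly like the sketch below. The class name `ChannelRead` and its `read` helper are hypothetical and not taken from the library discussed in the thread. The idea, as I understand it, is that a positional `FileChannel.read` happens inside a native call, so a slow disk read does not keep the thread from being considered safepoint-safe, whereas a page fault on a `MappedByteBuffer` access stalls a thread that is running Java code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelRead {
    /**
     * Reads up to `length` bytes starting at `offset` with positional
     * FileChannel reads instead of mapping the file into memory.
     * Returns fewer bytes if the end of the file is reached first.
     */
    public static byte[] read(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // positional read; does not move the channel position
                if (n < 0) {
                    break; // end of file
                }
                pos += n;
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: java ChannelRead <file> <offset> <length>");
            return;
        }
        byte[] data = read(Paths.get(args[0]), Long.parseLong(args[1]), Integer.parseInt(args[2]));
        System.out.println("read " + data.length + " bytes");
    }
}
```

A real replacement would keep the channel open and reuse buffers rather than opening the file on every call; this version favors brevity.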
