Java 1.8 safepoint timeout

Question

java linux garbage-collection centos

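I seem to have a problem where, after a few hours of running, the JVM tries indefinitely to reach a safepoint. However, if I run jstack with the -F option, the wait appears to be released and execution continues.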

jdk1.8.0_45/bin/jstack -F 39924 > a.out

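I am using jdk1.8.0_45 on CentOS.

My questions are:

i) When an interrupt is sent from jstack, the JVM seems to come out of the indefinite safepoint wait. Why can't it come out without jstack? Is there a JVM option I can use to avoid the indefinite wait?

ii) Can I get a more explicit thread dump to find the thread that is causing the problem? The output of the safepoint log is not very precise.

The options I am using are: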

-server
-XX:+AggressiveOpts
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1MixedGCLiveThresholdPercent=85
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1HeapWastePercent=5 
-XX:MaxGCPauseMillis=1000
-XX:G1HeapRegionSize=4M
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UnlockExperimentalVMOptions
-XX:G1LogLevel=finest
-Xmx6000m
-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=999
-XX:+SafepointTimeout
-XX:+UnlockDiagnosticVMOptions
-XX:SafepointTimeoutDelay=20000
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
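
The list above contains -XX:+UnlockExperimentalVMOptions twice. A minimal sketch for spotting such repeats from inside a running process is shown below. It relies on the standard `RuntimeMXBean.getInputArguments()` call; the class name `JvmFlagAudit` and the helper `repeatedFlags` are made up for this illustration, and a repeated flag is not necessarily harmful.

```java
import java.lang.management.ManagementFactory;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original post.
class JvmFlagAudit {

    // Returns every argument that occurs more than once, in first-repeat order.
    static Set<String> repeatedFlags(List<String> args) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String arg : args) {
            if (!seen.add(arg)) {
                repeated.add(arg);
            }
        }
        return repeated;
    }

    public static void main(String[] ignored) {
        // Arguments the JVM was actually started with (excludes main-class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Repeated: " + repeatedFlags(jvmArgs));
    }
}
```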

Safepoint log

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.115: G1IncCollectionPause             [     170          0              0    ]      [     0     0     0     0     8    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.125: RevokeBias                       [     170          1              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.127: RevokeBias                       [     170          1              1    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.131: RevokeBias                       [     170          1              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17771.955: RevokeBias                       [     169          0              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17772.160: BulkRevokeBias                   [     171          0              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17772.352: RevokeBias                       [     170          1              3    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17773.596: RevokeBias                       [     169          0              1    ]      [     0     0     0     0     0    ]  0

 # SafepointSynchronize::begin: Timeout detected:
 # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
 # SafepointSynchronize::begin: Threads which did not reach the safepoint:
 # "Thread-14" #115 prio=5 os_prio=0 tid=0x00007f20c8029000 nid=0x9cd0 runnable [0x0000000000000000]    java.lang.Thread.State: RUNNABLE
 # SafepointSynchronize::begin: (End of list)

After that, I see the following in the safepoint log.

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
17779.826: G1IncCollectionPause             [     169          1              1    ]      [3315603     03315603     0     8    ]  1

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias                       [     169          2             13    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias                       [     169          1              2    ]      [     0     0     0     0     0    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.441: RevokeBias                       [     184          3              4    ]      [     0     0     3     0     1    ]  0

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
21095.447: RevokeBias                       [     190          0              2    ]      [     0     0     4     0     2    ]  0
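
In the rows above, the G1IncCollectionPause entry at 17779.826 shows 3315603 in the spin column, and the next entry is stamped 21095.439, about 3315.6 seconds later, which is consistent with the time columns being in milliseconds. Rows like that are easy to miss by eye, so here is a minimal sketch that scans a saved log for rows with a long spin time. The class name `SafepointLogScan`, the regular expression, and the 1000 ms default threshold are assumptions made for this illustration; only the first time column is read, because the columns run together when a value overflows its width.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for -XX:+PrintSafepointStatistics output (JDK 8 layout).
class SafepointLogScan {

    // timestamp: vmop-name [thread counts] [time columns] page_trap_count
    private static final Pattern ROW = Pattern.compile(
        "^\\s*(\\d+\\.\\d+):\\s+(.+?)\\s+\\[([\\d\\s]*)\\]\\s+\\[([\\d\\s]*)\\]\\s+(\\d+)\\s*$");

    // Returns the first time column ("spin"), or -1 if the line is not a data row.
    static long spinMillis(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return -1;
        }
        String times = m.group(4).trim();
        if (times.isEmpty()) {
            return -1;
        }
        try {
            return Long.parseLong(times.split("\\s+")[0]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Keeps only the data rows whose spin time is at least thresholdMs.
    static List<String> slowRows(List<String> lines, long thresholdMs) {
        List<String> slow = new ArrayList<>();
        for (String line : lines) {
            if (spinMillis(line) >= thresholdMs && spinMillis(line) >= 0) {
                slow.add(line.trim());
            }
        }
        return slow;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        long thresholdMs = args.length > 1 ? Long.parseLong(args[1]) : 1000L;
        for (String row : slowRows(lines, thresholdMs)) {
            System.out.println(row);
        }
    }
}
```

Run it as `java SafepointLogScan safepoint.log`, where `safepoint.log` stands for wherever the JVM's standard output was captured.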

- Dan John

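Do you have some sample code that reproduces the problem? Also, uh... what exactly is a safepoint? A link to a summary would help, or a short explanation. - fge

Also, why are you using so many JVM options? Is this just for playing around, or are you trying to solve a real problem? If so, what problem? - fge

Here is a brief overview of what a safepoint is: http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html. Some of the options are for the garbage collector and others are for debugging. The code is too complex to post, but there is evidently some mechanism inside the JVM by which an interrupt somehow gets the JVM running normally again. - Dan John

Which stack does Thread-14 show when you run jstack -F? That might help. My answer to this question: https://dev59.com/nF0a5IYBdhLWcg3wYnx2 and the answer I link to there may give you some additional pointers. - K Erlandsson

Your JVM options look like they were copy-pasted together without understanding what they mean. - the8472

jstack -F printed nothing for that thread (it was empty). It looks like jstack cannot get a stack dump for that particular thread. - Dan John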

1 Answer

- the8472 · Accepted Answer

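Since you can resolve the problem by interrupting the VM, and you are running CentOS, this reminds me of this kernel bug.

The thread lists the following affected versions (assuming stock kernels):

RHEL 6 (and CentOS 6, SL 6): 6.0-6.5 are fine, 6.6 is affected, 6.6.z is fine.

RHEL 7 (and CentOS 7, SL 7): 7.1 is affected. As of yesterday, there does not seem to be a fix for 7.x yet.

RHEL 5 (and CentOS 5, SL 5): all versions are fine (including 5.11).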
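
To compare a machine against the list above, it helps to know which release it is actually running. The following is a minimal sketch that prints the kernel release the JVM sees and, where present, the first line of `/etc/redhat-release`. The class name `KernelInfo` is made up for this illustration, and the sketch only reports versions; deciding whether a given kernel build contains the fix still requires checking the distributor's release notes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class KernelInfo {

    // On Linux, os.version holds the kernel release string.
    static String kernelRelease() {
        return System.getProperty("os.name") + " " + System.getProperty("os.version");
    }

    // First line of /etc/redhat-release, or "unknown" if the file is absent or unreadable.
    static String distroRelease() {
        Path releaseFile = Paths.get("/etc/redhat-release");
        if (!Files.isReadable(releaseFile)) {
            return "unknown";
        }
        try {
            List<String> lines = Files.readAllLines(releaseFile, StandardCharsets.UTF_8);
            if (lines.isEmpty() || lines.get(0).trim().isEmpty()) {
                return "unknown";
            }
            return lines.get(0).trim();
        } catch (IOException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("Kernel: " + kernelRelease());
        System.out.println("Distribution: " + distroRelease());
        System.out.println("Java: " + System.getProperty("java.version"));
    }
}
```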