我在一台服务器上检测到 jvm safepoint.log 中存在长时间的安全点(>10秒!):
6534.953: no vm operation [ 353 0 4 ] [ 0 0 14179 0 0 ] 0
7241.410: RevokeBias [ 357 0 1 ] [ 0 0 14621 0 0 ] 0
8501.278: BulkRevokeBias [ 356 0 6 ] [ 0 0 13440 0 2 ] 0
9667.681: no vm operation [ 349 0 8 ] [ 0 0 15236 0 0 ] 0
12094.170: G1IncCollectionPause [ 350 0 4 ] [ 0 0 15144 1 24 ] 0
13383.412: no vm operation [ 348 0 4 ] [ 0 0 15783 0 0 ] 0
13444.109: RevokeBias [ 349 0 2 ] [ 0 0 16084 0 0 ] 0
在我的笔记本电脑上,我使用了-XX:SafepointTimeoutDelay=2进行尝试,它运作良好,可以打印出坏线程:
# SafepointSynchronize::begin: Timeout detected:
...
# SafepointSynchronize::begin: (End of list)
<writer thread='11267'/>
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
567.766: BulkRevokeBias [ 78 1 2 ] [ 0 6 6 0 0 ] 0
所以我已经向服务器添加了两个选项:-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000,以查看是哪些线程导致了问题,但我没有看到任何输出,同时仍然看到长时间的安全点时间。
为什么它没有应用到服务器上?
这是实际的服务器配置(从safepoint.log中获取):
Java HotSpot(TM) 64-Bit Server VM (25.202-b08) for linux-amd64 JRE (1.8.0_202-b08), built on Dec 15 2018 12:40:22 by "java_re" with gcc 7.3.0
...
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=10
-XX:+UnlockDiagnosticVMOptions
-XX:+LogVMOutput
-XX:LogFile=/opt/pprb/card-pro/pci-pprb-eip57/logs/safepoint.log
-XX:+SafepointTimeout
-XX:SafepointTimeoutDelay=1000
...
13383.412: no vm operation [ 348 0 4 ] [ 0 0 15783 0 0 ] 0
,但没有任何带有“检测到超时”的行被打印。稍后我会分享完整的日志。 - Sergey Kosarev