I am trying to use ZGC in production, so I upgraded the JDK from JDK 8 to OpenJDK 15, upgraded Tomcat 8 to Tomcat 8.5, and changed the GC-related options. However, a few hours after the JVM starts, CPU usage climbs to 1000%+ (normal usage is 100-300%). Sometimes, while the CPU is high, the log file fills with many ICBufferFull safepoints:
[2020-11-12T19:00:43.669+0800] Safepoint "ICBufferFull", Time since last: 41374119 ns, Reaching safepoint: 2026134 ns, At safepoint: 85993 ns, Total: 2112127 ns
[2020-11-12T19:00:43.672+0800] Safepoint "ICBufferFull", Time since last: 2521598 ns, Reaching safepoint: 1109894 ns, At safepoint: 57235 ns, Total: 1167129 ns
[2020-11-12T19:00:43.676+0800] Safepoint "ICBufferFull", Time since last: 2892867 ns, Reaching safepoint: 1240834 ns, At safepoint: 59633 ns, Total: 1300467 ns
[2020-11-12T19:00:43.681+0800] Safepoint "ICBufferFull", Time since last: 2870110 ns, Reaching safepoint: 1425837 ns, At safepoint: 54052 ns, Total: 1479889 ns
If I take the node offline for about 30 seconds, the CPU drops; after bringing it back online it runs normally for several hours until the CPU spikes again.
I checked the source code: ICBufferFull means the inline-cache buffer is full, but I could not find an option to increase its size. Can anyone help? Thanks!
The GC options are as follows:
export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -Xss2m -XX:+UseZGC -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m -XX:+HeapDumpOnOutOfMemoryError'
Edit 1:
As a reference, I have another host still running normally on JDK 8 with CMS; the request traffic to both hosts is almost identical.
I profiled with async-profiler. The hottest method is java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss, accounting for 50%+ of samples, while the hottest native method is ZMark::try_mark_object(ZMarkCache*, unsigned long, bool) at only 0.41%. I compared the ThreadLocal-related source code in JDK 8 and OpenJDK 15, and it does not appear to have changed.
Edit 2:
I found a similar bug in JIRA, and my application also uses Lucene. From the GC logs, CPU usage is high whenever the weak-reference count reaches 1M+. So the question becomes: how can ZGC be made to collect weak references more aggressively?
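One thing I am considering (an assumption on my part, not a confirmed fix): ZGC only processes and enqueues weak references as part of a GC cycle, so if cycles run rarely the backlog can grow. The ZGC flag -XX:ZCollectionInterval forces a cycle at a fixed interval, which might replace the System.gc() workaround:

```shell
# Hedged sketch, not a confirmed fix: force a periodic ZGC cycle so
# reference processing runs even when allocation pressure alone would
# not trigger a collection. The value is in seconds; 0 (default) disables it.
export JAVA_OPTS="$JAVA_OPTS -XX:ZCollectionInterval=5"
```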
Edit 3:
The logs below, taken with System.gc() called every 3 seconds, show that my application creates too many weak references. Strangely, the counts keep growing after startup even though request traffic was nearly constant from 11:00-17:00. Note that after GC(9821), where ~250K references were enqueued, CPU dropped from 600% to 400% on its own. At GC(10265) the node was taken offline and ~770K references were enqueued. Why is the enqueued count so small for long stretches? Does that mean the objects are not being fully reclaimed?
[2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, 72334 discovered, 0 enqueued
[2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 encountered, 122216 discovered, 1380 enqueued
[2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 encountered, 107228 discovered, 677 enqueued
[2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 encountered, 82199 discovered, 1713 enqueued
[2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 encountered, 291651 discovered, 292 enqueued
[2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 encountered, 124351 discovered, 815 enqueued
[2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
[2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
[2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 encountered, 928748 discovered, 1691 enqueued
[2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 encountered, 841168 discovered, 247352 enqueued
[2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 encountered, 568564 discovered, 3938 enqueued
[2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 encountered, 788825 discovered, 767288 enqueued
[2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 encountered, 18186 discovered, 1 enqueued
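To clarify what the "enqueued" column means (a small self-contained illustration, not code from my app): a WeakReference is enqueued on its ReferenceQueue only after the collector has cleared it, i.e. after the referent became unreachable and a GC cycle actually processed references. So a low enqueued count means few referents were found dead in that cycle, not necessarily a leak. Since enqueueing after System.gc() is best-effort, only the deterministic part is asserted:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Minimal illustration of "encountered vs enqueued": a WeakReference lands
// on its ReferenceQueue only after the GC has cleared it, which requires the
// referent to be unreachable AND a cycle to have processed references.
public class WeakRefDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        // While a strong reference exists the weak ref stays intact:
        // it would be "encountered" by a GC, but never "enqueued".
        if (ref.get() == null || queue.poll() != null)
            throw new AssertionError("ref must be live before clearing");

        referent = null;                      // drop the strong reference
        System.gc();                          // request a cycle (best effort)
        Reference<?> r = queue.remove(2000);  // null if no cycle cleared it
        System.out.println("enqueued after gc: " + (r != null));
    }
}
```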
Comment: Where in the stack trace is ThreadLocalMap.getEntryAfterMiss being called from? When that method becomes hot, it usually means there are too many ThreadLocals and too many threads.