jdk11的G1垃圾收集器在准备TLABs方面花费了大量时间。

5

gc.log显示,准备TLABs阶段花费了约57秒的时间,这是不可接受的。而且,在五天中只发生过一次这种情况。我想找出究竟发生了什么以及如何避免。

[gc.log]

[2021-08-02T11:38:38.134+0800][322490.377s][161325][safepoint     ] Entering safepoint region: G1CollectForAllocation
[2021-08-02T11:38:38.134+0800][322490.378s][161325][gc,start      ] GC(238) Pause Young (Normal) (G1 Evacuation Pause)
[2021-08-02T11:38:38.134+0800][322490.378s][161325][gc,task       ] GC(238) Using 18 workers of 18 for evacuation
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)   Pre Evacuate Collection Set: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Prepare TLABs: 57039.1ms
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Choose Collection Set: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Humongous Register: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)   Evacuate Collection Set: 3.8ms
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Ext Root Scanning (ms):   Min:  0.0, Avg:  0.4, Max:  3.5, Diff:  3.5, Sum:  7.2, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Update RS (ms):           Min:  0.0, Avg:  0.5, Max:  0.9, Diff:  0.9, Sum:  8.7, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)       Processed Buffers:        Min: 0, Avg:  2.9, Max: 16, Diff: 16, Sum: 53, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)       Scanned Cards:            Min: 0, Avg: 265.7, Max: 623, Diff: 623, Sum: 4783, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)       Skipped Cards:            Min: 0, Avg: 20.4, Max: 32, Diff: 32, Sum: 368, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Scan RS (ms):             Min:  0.0, Avg:  0.2, Max:  0.3, Diff:  0.3, Sum:  3.2, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)       Scanned Cards:            Min: 0, Avg: 27.3, Max: 120, Diff: 120, Sum: 491, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)       Claimed Cards:            Min: 0, Avg: 29.9, Max: 129, Diff: 129, Sum: 538, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)       Skipped Cards:            Min: 0, Avg: 187.6, Max: 347, Diff: 347, Sum: 3377, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Code Root Scanning (ms):  Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.1, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     AOT Root Scanning (ms):   skipped
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Object Copy (ms):         Min:  0.0, Avg:  2.0, Max:  2.6, Diff:  2.6, Sum: 35.7, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.422s][161325][gc,phases     ] GC(238)     Termination (ms):         Min:  0.0, Avg:  0.4, Max:  0.6, Diff:  0.6, Sum:  7.4, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)       Termination Attempts:     Min: 1, Avg:  4.8, Max: 9, Diff: 8, Sum: 86, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     GC Worker Other (ms):     Min:  0.0, Avg:  0.1, Max:  0.3, Diff:  0.2, Sum:  2.2, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     GC Worker Total (ms):     Min:  3.4, Avg:  3.6, Max:  3.7, Diff:  0.3, Sum: 64.7, Workers: 18
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)   Post Evacuate Collection Set: 1.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Code Roots Fixup: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Clear Card Table: 0.2ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Reference Processing: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Weak Processing: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Merge Per-Thread State: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Code Roots Purge: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Redirty Cards: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     DerivedPointerTable Update: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Free Collection Set: 0.3ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Humongous Reclaim: 0.1ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Start New Collection Set: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Resize TLABs: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)     Expand Heap After Collection: 0.0ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,phases     ] GC(238)   Other: 57039.6ms
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,heap       ] GC(238) Eden regions: 865->0(867)
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,heap       ] GC(238) Survivor regions: 27->25(112)
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,heap       ] GC(238) Old regions: 20->20
[2021-08-02T11:39:35.179+0800][322547.423s][161325][gc,heap       ] GC(238) Humongous regions: 5->5
[2021-08-02T11:39:35.180+0800][322547.423s][161325][gc,metaspace  ] GC(238) Metaspace: 45664K->45664K(1091584K)
[2021-08-02T11:39:35.180+0800][322547.423s][161325][gc            ] GC(238) Pause Young (Normal) (G1 Evacuation Pause) 916M->48M(1488M) 57045.332ms
[2021-08-02T11:39:35.180+0800][322547.423s][161325][gc,cpu        ] GC(238) User=0.00s Sys=0.09s Real=57.05s
[2021-08-02T11:39:35.180+0800][322547.423s][161325][safepoint     ] Leaving safepoint region
[2021-08-02T11:39:35.180+0800][322547.423s][161325][safepoint     ] Total time for which application threads were stopped: 57.0457609 seconds, Stopping threads took: 0.0000407 seconds

1
很不幸,似乎存在一个未公开的相关错误。我找到了这个间接引用,提到了这个错误(ensure_parsability是在“准备TLABs”阶段执行的)。 - Holger
@Holger 感谢您的回复。根据参考资料,缓慢的GC暂停与G1UpdateBufferSize(默认为256)的小值高度相关。但是我已经使用了默认值。这种情况下,我是否应该设置更大的G1UpdateBufferSize值?是否有更好的方法来避免这种情况? - JasonYu
2
我非常怀疑你会在这里得到答案。我的建议是设置 Xlog:gc+tlab=trace 并前往 GC 开发邮件列表。 - Eugene
1
如果问题只发生了一次,我认为这与TLABs没有任何关系。很可能只是JVM进程被操作系统暂停的一个随机时刻。例如,冻结的常见原因是当操作系统耗尽可用物理内存时。 - apangin
@apangin,这很有道理,尽管彻底避免听起来很难。任何来自Linux的系统日志都可以帮助我们确认吗?另一方面,严格限制进程的内存使用会有所帮助吗? - JasonYu
1个回答

1
就我而言,仅凭gc日志无法找到根本原因。
一些建议:
1. 添加-Xlog:gc+tlab=debug以显示与TLAB相关的日志。 2. 注意以“TLAB totals: thrds:”开头的日志,参考TLAB源代码:threadLocalAllocBuffer.cpp 3. 与gc日志进行比较,分析可能存在的问题。 4. 如果日志显示“TLAB totals: thrds:”和“Pre Evacuate Collection Set”之间有很大的延迟,则应深入研究“TLAB totals: thrds:”日志并找出原因(线程数过多、新创建的线程并分配了一些新对象等)。 5. 如果日志显示“TLAB totals: thrds:”和“Pre Evacuate Collection Set”之间几乎没有延迟,则应将日志级别切换为-Xlog:gc*=debug以查找更多信息。
一些怀疑:
  1. 可能有很多新创建的线程,每个线程都至少创建一个对象。这可能会导致TLAB准备缓慢,因为会有很多TLAB来填充虚拟对象。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接