G1 GC处理引用速度太慢。

3

我们正在使用G1 GC和18GB堆。年轻代大小约为3.5G。堆的最大使用量约为12G。内存充满了短暂存在的对象。

此外,也许重要的是实例正在靠近JVM的同一节点上运行。当它在硬盘上持久化更改时,它经常占用所有IOPS,但有足够的空闲CPU时间和内存。

JVM选项已启用:

-Xmx18g -Xms18g -XX:MaxPermSize=512M -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled

我需要翻译的内容是关于IT技术方面的,具体来说是有关年轻一代GC暂停的问题。长时间暂停GC的日志通常如下所示:

2013-07-10T15:06:25.963+0400: 9122,066: [GC pause (young)
Desired survivor size 243269632 bytes, new threshold 5 (max 15)
        - age   1:   69789280 bytes,   69789280 total
- age   2:   58618240 bytes,  128407520 total
- age   3:   54519720 bytes,  182927240 total
- age   4:   51592728 bytes,  234519968 total
- age   5:   45186520 bytes,  279706488 total
9122,066: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 174,16 ms, remaining time: 25,84 ms, target pause time: 200,00 ms]
        9122,066: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 426 regions, survivors: 34 regions, predicted young region time: 164,97 ms]
        9122,066: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 426 regions, survivors: 34 regions, old: 0 regions, predicted pause time: 339,13 ms, target pause time: 200,00 ms]
        9122,259: [SoftReference, 0 refs, 0,0063780 secs]9124,575: [WeakReference, 4 refs, 0,0031600 secs]9124,578: [FinalReference, 1640 refs, 0,0033730 secs]9124,581: [PhantomReference, 145 refs, 0,0032080 secs]9124,5
        85: [JNI Weak Reference, 0,0000810 secs], 2,53669600 secs]
        [Parallel Time: 190,5 ms]
        [GC Worker Start (ms):  9122066,6  9122066,7  9122066,7  9122066,7  9122066,8  9122066,9  9122066,9  9122066,9  9122066,9  9122067,0  9122067,0  9122067,1  9122067,1  9122067,1  9122067,1  9122067,2  91220
        67,2  9122067,3
Avg: 9122067,0, Min: 9122066,6, Max: 9122067,3, Diff:   0,7]
        [Ext Root Scanning (ms):  4,7  6,0  4,8  4,5  4,2  4,3  4,2  4,3  4,6  3,4  13,5  5,2  4,2  5,6  4,2  4,1  4,3  4,0
Avg:   5,0, Min:   3,4, Max:  13,5, Diff:  10,1]
        [Update RS (ms):  20,9  19,6  21,1  21,3  21,2  21,2  21,3  21,2  21,7  21,5  12,1  20,2  21,1  19,4  21,0  21,1  20,7  21,2
Avg:  20,4, Min:  12,1, Max:  21,7, Diff:   9,6]
        [Processed Buffers : 27 23 25 29 31 22 25 34 28 14 36 23 24 22 28 24 25 24
Sum: 464, Avg: 25, Min: 14, Max: 36, Diff: 22]
        [Scan RS (ms):  9,0  9,2  8,7  8,8  9,1  9,1  8,9  9,1  8,3  9,2  9,0  9,1  9,2  9,2  9,1  9,0  9,0  9,1
Avg:   9,0, Min:   8,3, Max:   9,2, Diff:   1,0]
        [Object Copy (ms):  145,1  145,0  145,2  145,1  145,1  144,9  145,1  144,9  144,9  145,4  144,8  144,8  144,8  145,0  145,0  145,1  145,2  144,9
Avg: 145,0, Min: 144,8, Max: 145,4, Diff:   0,6]
        [Termination (ms):  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0
Avg:   0,0, Min:   0,0, Max:   0,0, Diff:   0,0]
        [Termination Attempts : 5 8 2 11 5 6 6 5 5 7 4 7 2 9 8 5 7 8
Sum: 110, Avg: 6, Min: 2, Max: 11, Diff: 9]
        [GC Worker End (ms):  9122246,4  9122246,6  9122246,7  9122246,6  9122246,7  9122246,7  9122246,5  9122246,7  9122246,5  9122246,5  9122246,6  9122246,7  9122246,8  9122246,4  9122246,6  9122246,5  9122246
        ,7  9122246,8
Avg: 9122246,6, Min: 9122246,4, Max: 9122246,8, Diff:   0,3]
        [GC Worker (ms):  179,8  179,9  180,0  179,8  179,9  179,9  179,6  179,8  179,6  179,5  179,6  179,6  179,7  179,3  179,5  179,4  179,4  179,5
Avg: 179,7, Min: 179,3, Max: 180,0, Diff:   0,7]
        [GC Worker Other (ms):  10,7  10,7  10,8  10,8  10,9  10,9  11,0  11,0  11,0  11,1  11,1  11,1  11,2  11,2  11,2  11,2  11,3  11,3
Avg:  11,0, Min:  10,7, Max:  11,3, Diff:   0,6]
        [Clear CT:   2,8 ms]
        [Other: 2343,4 ms]
        [Choose CSet:   0,1 ms]
        [Ref Proc: 2327,7 ms]
        [Ref Enq:   1,9 ms]
        [Free CSet:   8,2 ms]
        [Eden: 3408M(3408M)->0B(3400M) Survivors: 272M->280M Heap: 9998M(18432M)->6638M(18432M)]
        [Times: user=3,26 sys=0,02, real=2,54 secs]
Total time for which application threads were stopped: 2,5434370 seconds

唯一会出现问题的 GC 阶段是 '引用处理'。但是日志看起来很奇怪:软引用、弱引用、终结器引用和 JNI 引用处理所花费的时间很少。但总体花费的时间为 2.5 秒。在最坏的情况下,甚至可能高达 10 秒。
另一个(更加舒适的)暂停可以如下所示:
2013-07-10T16:26:11.862+0400: 13907,965: [GC pause (young)
    Desired survivor size 243269632 bytes, new threshold 4 (max 15)
    - age   1:   69125832 bytes,   69125832 total
    - age   2:   58756480 bytes,  127882312 total
    - age   3:   52397376 bytes,  180279688 total
    - age   4:   88850424 bytes,  269130112 total
    13907,965: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 77,38 ms, remaining time: 122,62 ms, target pause time: 200,00 ms]
    13907,965: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 427 regions, survivors: 33 regions, predicted young region time: 167,95 ms]
    13907,965: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 427 regions, survivors: 33 regions, old: 0 regions, predicted pause time: 245,33 ms, target pause time: 200,00 ms]
    13908,155: [SoftReference, 0 refs, 0,0041340 secs]13908,160: [WeakReference, 0 refs, 0,0023850 secs]13908,162: [FinalReference, 1393 refs, 0,0065970 secs]13908,169: [PhantomReference, 108 refs, 0,0018650 secs]13908,171: [JNI Weak Reference, 0,0000630 secs], 0,22008100 secs]
    [Parallel Time: 188,4 ms]
    [GC Worker Start (ms):  13907965,3  13907965,3  13907965,4  13907965,4  13907965,5  13907965,5  13907965,6  13907965,6  13907965,6  13907965,7  13907965,7  13907965,7  13907965,8  13907965,8  13907965,8  13907965,9  13907965,9  13907965,9
    Avg: 13907965,6, Min: 13907965,3, Max: 13907965,9, Diff:   0,6]
    [Ext Root Scanning (ms):  5,8  5,0  6,8  6,3  6,1  6,2  6,0  6,3  5,2  4,2  5,0  6,2  4,5  6,0  17,1  4,4  6,2  5,3
    Avg:   6,3, Min:   4,2, Max:  17,1, Diff:  12,9]
    [Update RS (ms):  24,8  26,0  23,9  24,1  24,1  24,1  24,2  23,9  25,0  25,2  25,1  24,1  26,0  24,3  13,7  25,7  24,2  24,7
    Avg:  24,1, Min:  13,7, Max:  26,0, Diff:  12,2]
    [Processed Buffers : 30 20 9 16 16 19 20 21 22 12 30 17 17 20 12 20 17 22
    Sum: 340, Avg: 18, Min: 9, Max: 30, Diff: 21]
    [Scan RS (ms):  7,5  7,1  7,2  7,5  7,6  7,5  7,5  7,6  7,1  7,4  7,6  7,2  7,2  7,4  7,2  7,5  7,0  7,7
    Avg:   7,4, Min:   7,0, Max:   7,7, Diff:   0,7]
    [Object Copy (ms):  133,1  133,1  133,2  133,1  133,2  133,1  133,2  133,1  133,5  134,0  133,0  133,2  133,0  132,9  132,6  133,1  133,2  132,9
    Avg: 133,1, Min: 132,6, Max: 134,0, Diff:   1,3]
    [Termination (ms):  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0
    Avg:   0,0, Min:   0,0, Max:   0,0, Diff:   0,0]
    [Termination Attempts : 1 3 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1
    Sum: 24, Avg: 1, Min: 1, Max: 3, Diff: 2]
    [GC Worker End (ms):  13908136,6  13908136,9  13908136,5  13908136,7  13908136,7  13908136,8  13908136,7  13908136,7  13908136,8  13908136,8  13908136,5  13908136,6  13908136,5  13908136,5  13908136,5  13908136,5  13908136,8  13908136,6
    Avg: 13908136,7, Min: 13908136,5, Max: 13908136,9, Diff:   0,4]
    [GC Worker (ms):  171,3  171,6  171,1  171,2  171,2  171,3  171,1  171,1  171,1  171,2  170,8  170,9  170,7  170,7  170,7  170,6  171,0  170,7
    Avg: 171,0, Min: 170,6, Max: 171,6, Diff:   0,9]
    [GC Worker Other (ms):  17,2  17,2  17,3  17,3  17,4  17,4  17,5  17,5  17,5  17,5  17,6  17,6  17,7  17,7  17,7  17,7  17,8  17,8
    Avg:  17,5, Min:  17,2, Max:  17,8, Diff:   0,6]
    [Clear CT:   1,6 ms]
    [Other:  30,1 ms]
    [Choose CSet:   0,1 ms]
    [Ref Proc:  17,1 ms]
    [Ref Enq:   0,9 ms]
    [Free CSet:   7,4 ms]
    [Eden: 3416M(3416M)->0B(3456M) Survivors: 264M->224M Heap: 7289M(18432M)->3912M(18432M)]
    [Times: user=3,16 sys=0,00, real=0,22 secs]    

参考处理仍然是最长的阶段,但时间要短得多。ParallelRefProcEnabled不能解决我的问题。我也尝试过更改年轻代的大小,但这也没有帮助。设置不同的-XX:MaxGCPauseMillis,比如更宽松的600毫秒或更严格的100毫秒,结果仍然是吞吐量不佳。

CMS的性能甚至比使用以下参数的G1还差:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

年轻代收集越来越频繁。

我对所描述的日志完全感到困惑。明天我将尝试将Couchbase实例移动到另一个节点,以检查是否冻结了GC线程。

但是,如果Couchbase不是问题所在,也许有人可以解释一下日志的含义。或者也许有一些神奇的CMS参数可以解决这个问题。

我会非常感谢任何帮助!


你是否有启用+XX:+PrintReferenceGC选项的CMS收集器的GC日志? - Alexey Ragozin
好的,这部分日志是关于它的:13908,155: [软引用,0个引用,0.0041340秒]13908,160: [弱引用,0个引用,0.0023850秒]13908,162: [终结器引用,1393个引用,0.0065970秒]13908,169: [虚引用,108个引用,0.0018650秒]13908,171: [JNI 弱引用,0.0000630秒],0.22008100秒] [并行时间:188.4毫秒] - Anton Maximov
1
实际上,我们通过将Couchbase移动到另一台机器上解决了问题。现在99%的暂停时间都少于0.25秒。但是,我们仍然有大约2000个最终链接在每0.1-0.2秒的GC暂停中。也许我应该做些什么来进一步改善它。 - Anton Maximov
听起来当你在同一台机器上安装JVM和CouchBase时,会推动内存的极限? - Nat
1个回答

2
问题是我们自己解决的。我们有一个新规则 - 永远不要在JVM附近安装Couchbase。过去,我们遇到了Postgresql实例与Couchbase冲突的问题,因为Couchbase喜欢抓取所有磁盘操作,而Postgresql无法提交任何内容。所以,隔离Couchbase,一切都会好起来。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接