JVM G1GC's mixed GC does not collect many old regions

Question

java garbage-collection weak-references g1gc

34

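My server runs on CentOS 6.7 with Java 1.8.0_92 and the GC options '-Xms16g -Xmx16g -XX:+UseG1GC'. So the defaults apply: InitiatingHeapOccupancyPercent is 45, G1HeapWastePercent is 5, and G1MixedGCLiveThresholdPercent is 85. My server's mixed GCs start at 7.2GB, but they clean up less and less each time. Eventually old gen stays above 7.2GB, so the JVM keeps trying to run concurrent marking. Finally the whole heap is exhausted and a full GC happens. After the full GC, old gen usage is below 500MB.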
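The 7.2GB figure is simply 45% of the 16GB heap. Below is a minimal sketch of that arithmetic. It also reads the effective flag values from the running JVM through `HotSpotDiagnosticMXBean`. The class and method names (`G1Thresholds`, `markingStartGb`, `flag`) are made up for illustration, and the flag lookup assumes a HotSpot JVM; it falls back to "unavailable" otherwise.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class G1Thresholds {
    /** Heap occupancy (in GB) at which concurrent marking is requested. */
    static double markingStartGb(double heapGb, int initiatingHeapOccupancyPercent) {
        return heapGb * initiatingHeapOccupancyPercent / 100.0;
    }

    /** Reads a VM flag from the running JVM; returns "unavailable" if the lookup fails. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException | LinkageError e) {
            // Non-HotSpot JVM or unknown flag: report instead of failing.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("InitiatingHeapOccupancyPercent = " + flag("InitiatingHeapOccupancyPercent"));
        System.out.println("G1HeapWastePercent = " + flag("G1HeapWastePercent"));
        System.out.printf("marking starts at %.1f GB (IHOP=45)%n", markingStartGb(16, 45));
        System.out.printf("marking starts at %.1f GB (IHOP=15)%n", markingStartGb(16, 15));
    }
}
```

With a 16GB heap this gives 7.2GB for the default of 45 and 2.4GB for a value of 15.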

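I'm curious why my mixed GCs can't collect more; it doesn't look like there is that much live data...

I tried printing the G1-related information and found many messages like the one below. It looks like my old gen contains a lot of live data, but then why can a full GC collect so much...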

G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 190 regions, reclaimable: 856223240 bytes (4.98 %),  threshold: 5.00 %
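The percentage in that message can be reproduced by dividing the reclaimable bytes by the heap capacity. Here is a small sketch of the check as the log message describes it. The names `ReclaimableCheck`, `reclaimablePercent` and `continueMixedGcs` are illustrative and not taken from the JVM source, and the 16GB capacity comes from the -Xmx setting above.

```java
import java.util.Locale;

final class ReclaimableCheck {
    /** Reclaimable space as a percentage of total heap capacity. */
    static double reclaimablePercent(long reclaimableBytes, long heapCapacityBytes) {
        return 100.0 * reclaimableBytes / heapCapacityBytes;
    }

    /** Mixed GCs continue only while the reclaimable percentage is above the waste threshold. */
    static boolean continueMixedGcs(long reclaimableBytes, long heapCapacityBytes,
                                    double heapWastePercent) {
        return reclaimablePercent(reclaimableBytes, heapCapacityBytes) > heapWastePercent;
    }

    public static void main(String[] args) {
        long heap = 16L << 30;            // 16 GB, from -Xmx16g
        long reclaimable = 856223240L;    // value taken from the log line
        System.out.println(String.format(Locale.ROOT, "reclaimable: %.2f %%",
                reclaimablePercent(reclaimable, heap)));
        System.out.println("continue mixed GCs: " + continueMixedGcs(reclaimable, heap, 5.0));
    }
}
```

The 190 candidate regions together fall just short of the 5% G1HeapWastePercent threshold, so the mixed phase ends and that space stays uncollected.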

The following log is from a run with InitiatingHeapOccupancyPercent changed to 15 (so concurrent marking starts at 2.4GB) to reproduce the problem faster.

### PHASE Post-Marking
......
### SUMMARY  capacity: 16384.00 MB  used: 2918.42 MB / 17.81 %  prev-live: 2407.92 MB / 14.70 %  next-live: 2395.00 MB / 14.62 %  remset: 56.66 MB  code-roots: 0.91 MB
### PHASE Post-Sorting
....
### SUMMARY  capacity: 1624.00 MB  used: 1624.00 MB / 100.00 %  prev-live: 1123.70 MB / 69.19 %  next-live: 0.00 MB / 0.00 %  remset: 35.90 MB  code-roots: 0.89 MB

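Edit:

I tried triggering a full GC after the mixed GCs, and it still brought the heap down to 4xx MB, so it looks like my old gen holds more data that could be collected.

Before the full GC, the mixed GC log was: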

 32654.979: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 457 regions, reclaimable: 2956666176 bytes (17.21 %), threshold: 5.00 %], 0.1106810 secs]
 ....
 [Eden: 6680.0M(6680.0M)->0.0B(536.0M) Survivors: 344.0M->280.0M Heap: 14.0G(16.0G)->7606.6M(16.0G)]
 [Times: user=2.31 sys=0.01, real=0.11 secs]
 ...
 [GC pause (G1 Evacuation Pause) (mixed)
 ...
 32656.876: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 205 regions, max: 205 regions]
 32656.876: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 67 regions, survivors: 35 regions, old: 205 regions, predicted pause time: 173.84 ms, target pause time: 200.00 ms]
 32656.992: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 252 regions, reclaimable: 1321193600 bytes (7.69 %), threshold: 5.00 %]
 [Eden: 536.0M(536.0M)->0.0B(720.0M) Survivors: 280.0M->96.0M Heap: 8142.6M(16.0G)->6029.9M(16.0G)]
 [Times: user=2.49 sys=0.01, real=0.12 secs]
 ...
 [GC pause (G1 Evacuation Pause) (mixed)
 ...
 32659.727: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 66 regions, max: 205 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %]
 32659.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 90 regions, survivors: 12 regions, old: 66 regions, predicted pause time: 120.51 ms, target pause time: 200.00 ms]
 32659.785: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 186 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %]
 [Eden: 720.0M(720.0M)->0.0B(9064.0M) Survivors: 96.0M->64.0M Heap: 6749.9M(16.0G)->5572.0M(16.0G)]
 [Times: user=1.20 sys=0.00, real=0.06 secs]
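To compare how much each pause frees, the `Heap:` entries in the log above can be parsed mechanically. The sketch below is one way to do that. The class name `HeapLineParser` and its regular expression are assumptions written against the log lines shown here, not a general G1 log parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeapLineParser {
    // Matches e.g. "Heap: 8142.6M(16.0G)->6029.9M(16.0G)" and captures before/after usage.
    private static final Pattern HEAP =
        Pattern.compile("Heap: ([\\d.]+)([BKMG])\\([^)]*\\)->([\\d.]+)([BKMG])");

    /** Converts a value with a B/K/M/G suffix to megabytes. */
    static double toMb(double value, char unit) {
        switch (unit) {
            case 'B': return value / (1024.0 * 1024.0);
            case 'K': return value / 1024.0;
            case 'G': return value * 1024.0;
            default:  return value; // 'M'
        }
    }

    /** Returns the MB freed by the pause in this line, or NaN if it has no Heap entry. */
    static double freedMb(String logLine) {
        Matcher m = HEAP.matcher(logLine);
        if (!m.find()) {
            return Double.NaN;
        }
        double before = toMb(Double.parseDouble(m.group(1)), m.group(2).charAt(0));
        double after = toMb(Double.parseDouble(m.group(3)), m.group(4).charAt(0));
        return before - after;
    }

    public static void main(String[] args) {
        String[] lines = {
            "[Eden: 6680.0M(6680.0M)->0.0B(536.0M) Survivors: 344.0M->280.0M Heap: 14.0G(16.0G)->7606.6M(16.0G)]",
            "[Eden: 536.0M(536.0M)->0.0B(720.0M) Survivors: 280.0M->96.0M Heap: 8142.6M(16.0G)->6029.9M(16.0G)]",
            "[Eden: 720.0M(720.0M)->0.0B(9064.0M) Survivors: 96.0M->64.0M Heap: 6749.9M(16.0G)->5572.0M(16.0G)]"
        };
        for (String line : lines) {
            System.out.printf("freed %.1f MB%n", freedMb(line));
        }
    }
}
```

Note that these differences include the eden space emptied by each pause, so they overstate what was reclaimed from old regions alone.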

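Edit: 2016/12/11

I dumped the heap from another machine running with -Xmx4G.

I use lettuce as my Redis client, and it has a tracking feature built on LatencyUtils. Every 10 minutes (reset latencies after publish is true by default, https://github.com/mp911de/lettuce/wiki/Command-Latency-Metrics), it makes the LatencyStats instances (each containing some long[] with around 3000 elements) weakly referenced. So after a long time there are lots of weak references to LatencyStats.

Before the full GC.

After the full GC.

I don't need lettuce's tracking feature for now, so I simply disabled it, and the full GCs no longer happen. But I'm still not sure why the mixed GCs can't clear those objects.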
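The allocation pattern described above can be imitated in isolation. This is a rough sketch only: `WeakStatsSketch` and `Stats` are hypothetical stand-ins and do not reproduce lettuce's or LatencyUtils' actual classes. Each simulated reset leaves the previous instance reachable only through a `WeakReference`.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

final class WeakStatsSketch {
    /** Hypothetical stand-in for a stats object backed by a ~3000-element long[]. */
    static final class Stats {
        final long[] buckets = new long[3000];
    }

    /** Simulates periodic resets: each reset leaves the previous Stats only weakly reachable. */
    static List<WeakReference<Stats>> simulateResets(int resets) {
        List<WeakReference<Stats>> retired = new ArrayList<>();
        Stats current = new Stats();
        for (int i = 0; i < resets; i++) {
            retired.add(new WeakReference<>(current));
            current = new Stats(); // the old instance now has no strong reference
        }
        return retired;
    }

    /** Counts referents that the collector has already cleared. */
    static long cleared(List<WeakReference<Stats>> refs) {
        long n = 0;
        for (WeakReference<Stats> ref : refs) {
            if (ref.get() == null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<WeakReference<Stats>> refs = simulateResets(1000);
        System.out.println("retired: " + refs.size() + ", cleared so far: " + cleared(refs));
        System.gc(); // only a hint; the JVM decides whether and how to collect
        System.out.println("cleared after System.gc(): " + cleared(refs));
    }
}
```

How many referents are reported as cleared depends on when the collector gets around to processing the references, so the counts vary between runs and collectors.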

- koji lin

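3

"-XX:+UnlockDiagnosticVMOptions -XX:+G1PrintHeapRegions -XX:+G1PrintRegionLivenessInfo" might provide some insight. My guess is that you have humongous allocations or some allocation pattern that leads to incorrect liveness estimates (soft references, maybe?). - the8472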

1

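Thanks, the ### PHASE Post-Marking output is from G1PrintRegionLivenessInfo, and it looks like there is still more than 1GB of live data. Humongous objects don't show up much either: [Humongous Total: 1] [Humongous Candidate: 1]. I'll check how to find out whether it comes from soft references.. (is there any documentation on this?) - koji lin

I'll try SoftRefLRUPolicyMSPerMB. - koji lin

Then you should upload the GC logs somewhere, or try lowering "G1MixedGCLiveThresholdPercent". - the8472

Have you considered just using the parallel collector? Your heap size is close to the limit at which the parallel collector can still meet response-time requirements during a major collection. 200 ms is a fairly long time. - Cogman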

1 Answer

- user4602302 · Accepted Answer

Well, you didn't mention all the parameters you set, but

you could try setting

-XX:+ScavengeBeforeFullGC

You should also think about the lifetime and size of your application's objects. Think it over and look at the following parameters.

-XX:NewRatio=n              old/new ratio (default 2)
-XX:SurvivorRatio=n         eden/survivor ratio (default 8)
-XX:MaxTenuringThreshold=n  number of times objects are copied back and forth between the two survivor spaces before they are moved to old gen (default 15)

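Using the default values with Xms and Xmx set to 32GB -> old gen = 16GB and new gen = 16GB -> eden = 14GB -> survivors = 2GB (there are two, each 1GB in size).

Eden contains all objects that are instantiated with new Object.

One survivor (the to-survivor) is always empty. The other survivor (the from-survivor) contains objects that survived a minor GC.

Objects from eden and from the from-survivor that survive a minor GC go into the to-survivor.

If the 1GB survivor size of this "default configuration" is exceeded, objects go into old gen.

If it is not exceeded, objects go into old gen after 15 minor GCs (the default value of -XX:MaxTenuringThreshold).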
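The promotion rule described in the last two paragraphs can be written down as a small predicate. This is a deliberately simplified sketch of the answer's description, not of the actual HotSpot tenuring logic; `TenuringSketch` and `promote` are illustrative names.

```java
final class TenuringSketch {
    /**
     * Decides whether a surviving object is moved to old gen during a minor GC,
     * following the simplified rule from the text above:
     * promote on survivor overflow, or once the object has survived enough minor GCs.
     */
    static boolean promote(int age, long survivorUsedBytes, long objectBytes,
                           long survivorCapacityBytes, int maxTenuringThreshold) {
        boolean survivorOverflow = survivorUsedBytes + objectBytes > survivorCapacityBytes;
        boolean oldEnough = age >= maxTenuringThreshold;
        return survivorOverflow || oldEnough;
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        // Young object, plenty of room in the to-survivor: stays in new gen.
        System.out.println(promote(3, 100L << 20, 1024, oneGb, 15));
        // Object has survived 15 minor GCs: moves to old gen.
        System.out.println(promote(15, 100L << 20, 1024, oneGb, 15));
        // To-survivor is full: moves to old gen regardless of age.
        System.out.println(promote(1, oneGb, 1024, oneGb, 15));
    }
}
```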

When tuning these values, always keep in mind that old gen has to be as large as or larger than new gen, because a GC may cause the whole new gen to go into old gen.

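Edit

A timeline for your first "old gen: used" picture would be helpful.

Keep in mind that a full GC only has to be done when old gen is exhausted; a full GC stops the whole "world" for a certain period of time.

In this particular case I would suggest that you

reduce -Xms and -Xmx to 8GB
set/decrease -XX:SurvivorRatio to 2
set/increase -XX:MaxTenuringThreshold to 50

Then you will get a new gen and an old gen of 4GB each,

an eden of 2GB,

two survivors of 1GB each,

and about 50 minor GCs before objects go into old gen.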
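The sizes listed above can be checked with the usual ratio arithmetic for a fixed generational layout. The sketch below rests on two assumptions. First, it takes old/new = NewRatio and eden/survivor = SurvivorRatio literally. Second, it uses NewRatio=1, which the 4GB/4GB split implies but the answer never sets explicitly. `GenLayout` is an illustrative name.

```java
final class GenLayout {
    final long newGen;
    final long oldGen;
    final long eden;
    final long survivor; // size of each of the two survivor spaces

    GenLayout(long heapBytes, int newRatio, int survivorRatio) {
        newGen = heapBytes / (newRatio + 1);      // old/new = newRatio
        oldGen = heapBytes - newGen;
        survivor = newGen / (survivorRatio + 2);  // eden/survivor = survivorRatio, two survivors
        eden = newGen - 2 * survivor;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        GenLayout layout = new GenLayout(8 * gb, 1, 2); // NewRatio=1 is an assumption
        System.out.println("new gen:  " + layout.newGen / gb + " GB");
        System.out.println("old gen:  " + layout.oldGen / gb + " GB");
        System.out.println("eden:     " + layout.eden / gb + " GB");
        System.out.println("survivor: " + layout.survivor / gb + " GB each");
    }
}
```

Under those assumptions an 8GB heap yields 4GB new gen, 4GB old gen, 2GB eden and two 1GB survivors, matching the figures in the answer.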