我在开发一个轻量级的异步消息库的过程中遇到了这种情况。为了了解创建大量寿命短暂的中等大小对象的成本,我编写了以下测试:
import java.nio.ByteBuffer;
import java.util.Random;
public class MemPressureTest {
static final int SIZE = 4096;
static final class Bigish {
final ByteBuffer b;
public Bigish() {
this(ByteBuffer.allocate(SIZE));
}
public Bigish(ByteBuffer b) {
this.b = b;
}
public void fill(byte bt) {
b.clear();
for (int i = 0; i < SIZE; ++i) {
b.put(bt);
}
}
}
public static void main(String[] args) {
Random random = new Random(1);
Bigish tmp = new Bigish();
for (int i = 0; i < 3e7; ++i) {
tmp.fill((byte)random.nextInt(255));
}
}
}
在我的笔记本电脑上,使用默认的GC设置,它大约需要运行95秒:
/tmp$ time java -Xlog:gc MemPressureTest
[0.006s][info][gc] Using G1
real 1m35.552s
user 1m33.658s
sys 0m0.428s
事情就在这里变得有些奇怪了。我调整了程序,在每次迭代中分配一个新对象:
...
Random random = new Random(1);
for (int i = 0; i < 3e7; ++i) {
Bigish tmp = new Bigish();
tmp.fill((byte)random.nextInt(255));
}
...
理论上,这应该会增加一些小的开销,但是没有任何对象应该被提升出Eden区。最好情况下,我期望运行时间非常接近。然而,这个测试只需要约17秒完成:
/tmp$ time java -Xlog:gc MemPressureTest
[0.007s][info][gc] Using G1
[0.090s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 23M->1M(130M) 1.304ms
[0.181s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 76M->1M(130M) 0.870ms
[0.247s][info][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 76M->0M(130M) 0.844ms
[0.317s][info][gc] GC(3) Pause Young (Normal) (G1 Evacuation Pause) 75M->0M(130M) 0.793ms
[0.381s][info][gc] GC(4) Pause Young (Normal) (G1 Evacuation Pause) 75M->0M(130M) 0.859ms
[lots of similar GC pauses, snipped for brevity]
[16.608s][info][gc] GC(482) Pause Young (Normal) (G1 Evacuation Pause) 254M->0M(425M) 0.765ms
[16.643s][info][gc] GC(483) Pause Young (Normal) (G1 Evacuation Pause) 254M->0M(425M) 0.580ms
[16.676s][info][gc] GC(484) Pause Young (Normal) (G1 Evacuation Pause) 254M->0M(425M) 0.841ms
real 0m16.766s
user 0m16.578s
sys 0m0.576s
我多次运行了这两个版本,结果与上面几乎完全相同。我感觉自己一定漏掉了什么很明显的东西。我是不是要疯了?有什么原因可以解释这种性能差异吗?
=== 编辑 ===
我按照apangin和dan1st的建议改写了测试代码,使用了JMH:
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import java.nio.ByteBuffer;
import java.util.Random;
public class MemPressureTest {
static final int SIZE = 4096;
@State(Scope.Benchmark)
public static class Bigish {
final ByteBuffer b;
private Blackhole blackhole;
@Setup(Level.Trial)
public void setup(Blackhole blackhole) {
this.blackhole = blackhole;
}
public Bigish() {
this.b = ByteBuffer.allocate(SIZE);
}
public void fill(byte bt) {
b.clear();
for (int i = 0; i < SIZE; ++i) {
b.put(bt);
}
blackhole.consume(b);
}
}
static Random random = new Random(1);
@Benchmark
public static void test1(Blackhole blackhole) {
Bigish tmp = new Bigish();
tmp.setup(blackhole);
tmp.fill((byte)random.nextInt(255));
blackhole.consume(tmp);
}
@Benchmark
public static void test2(Bigish perm) {
perm.fill((byte) random.nextInt(255));
}
}
然而,第二个测试速度要慢得多:
> Task :jmh
# JMH version: 1.35
# VM version: JDK 18.0.1.1, OpenJDK 64-Bit Server VM, 18.0.1.1+2-6
# VM invoker: /Users/xxx/Library/Java/JavaVirtualMachines/openjdk-18.0.1.1/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/xxx/Dev/MemTests/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.xxx.MemPressureTest.test1
# Run progress: 0.00% complete, ETA 00:16:40
# Fork: 1 of 5
# Warmup Iteration 1: 2183998.556 ops/s
# Warmup Iteration 2: 2281885.941 ops/s
# Warmup Iteration 3: 2239644.018 ops/s
# Warmup Iteration 4: 1608047.994 ops/s
# Warmup Iteration 5: 1992314.001 ops/s
Iteration 1: 2053657.571 ops/s3s]
Iteration 2: 2054957.773 ops/sm 3s]
Iteration 3: 2051595.233 ops/sm 13s]
Iteration 4: 2054878.239 ops/sm 23s]
Iteration 5: 2031111.214 ops/sm 33s]
# Run progress: 10.00% complete, ETA 00:15:04
# Fork: 2 of 5
# Warmup Iteration 1: 2228594.345 ops/s
# Warmup Iteration 2: 2257983.355 ops/s
# Warmup Iteration 3: 2063130.244 ops/s
# Warmup Iteration 4: 1629084.669 ops/s
# Warmup Iteration 5: 2063018.496 ops/s
Iteration 1: 1939260.937 ops/sm 33s]
Iteration 2: 1791414.018 ops/sm 43s]
Iteration 3: 1914987.221 ops/sm 53s]
Iteration 4: 1969484.898 ops/sm 3s]
Iteration 5: 1891440.624 ops/sm 13s]
# Run progress: 20.00% complete, ETA 00:13:23
# Fork: 3 of 5
# Warmup Iteration 1: 2228664.719 ops/s
# Warmup Iteration 2: 2263677.403 ops/s
# Warmup Iteration 3: 2237032.464 ops/s
# Warmup Iteration 4: 2040040.243 ops/s
# Warmup Iteration 5: 2038848.530 ops/s
Iteration 1: 2023934.952 ops/sm 14s]
Iteration 2: 2041874.437 ops/sm 24s]
Iteration 3: 2002858.770 ops/sm 34s]
Iteration 4: 2039727.607 ops/sm 44s]
Iteration 5: 2045827.670 ops/sm 54s]
# Run progress: 30.00% complete, ETA 00:11:43
# Fork: 4 of 5
# Warmup Iteration 1: 2105430.688 ops/s
# Warmup Iteration 2: 2279387.762 ops/s
# Warmup Iteration 3: 2228346.691 ops/s
# Warmup Iteration 4: 1438607.183 ops/s
# Warmup Iteration 5: 2059319.745 ops/s
Iteration 1: 1112543.932 ops/sm 54s]
Iteration 2: 1977077.976 ops/sm 4s]
Iteration 3: 2040147.355 ops/sm 14s]
Iteration 4: 1975766.032 ops/sm 24s]
Iteration 5: 2003532.092 ops/sm 34s]
# Run progress: 40.00% complete, ETA 00:10:02
# Fork: 5 of 5
# Warmup Iteration 1: 2240203.848 ops/s
# Warmup Iteration 2: 2245673.994 ops/s
# Warmup Iteration 3: 2096257.864 ops/s
# Warmup Iteration 4: 2046527.740 ops/s
# Warmup Iteration 5: 2050379.941 ops/s
Iteration 1: 2050691.989 ops/sm 35s]
Iteration 2: 2057803.100 ops/sm 45s]
Iteration 3: 2058634.766 ops/sm 55s]
Iteration 4: 2060596.595 ops/sm 5s]
Iteration 5: 2061282.107 ops/sm 15s]
Result "com.xxx.MemPressureTest.test1":
1972203.484 ±(99.9%) 142904.698 ops/s [Average]
(min, avg, max) = (1112543.932, 1972203.484, 2061282.107), stdev = 190773.683
CI (99.9%): [1829298.786, 2115108.182] (assumes normal distribution)
# JMH version: 1.35
# VM version: JDK 18.0.1.1, OpenJDK 64-Bit Server VM, 18.0.1.1+2-6
# VM invoker: /Users/xxx/Library/Java/JavaVirtualMachines/openjdk-18.0.1.1/Contents/Home/bin/java
# VM options: -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/xxx/Dev/MemTests/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.xxx.MemPressureTest.test2
# Run progress: 50.00% complete, ETA 00:08:22
# Fork: 1 of 5
# Warmup Iteration 1: 282751.407 ops/s
# Warmup Iteration 2: 283333.984 ops/s
# Warmup Iteration 3: 293785.079 ops/s
# Warmup Iteration 4: 268403.105 ops/s
# Warmup Iteration 5: 280054.277 ops/s
Iteration 1: 279093.118 ops/s9m 15s]
Iteration 2: 282782.996 ops/s9m 25s]
Iteration 3: 282688.921 ops/s9m 35s]
Iteration 4: 291578.963 ops/s9m 45s]
Iteration 5: 294835.777 ops/s9m 55s]
# Run progress: 60.00% complete, ETA 00:06:41
# Fork: 2 of 5
# Warmup Iteration 1: 283735.550 ops/s
# Warmup Iteration 2: 283536.547 ops/s
# Warmup Iteration 3: 294403.173 ops/s
# Warmup Iteration 4: 284161.042 ops/s
# Warmup Iteration 5: 281719.077 ops/s
Iteration 1: 276838.416 ops/s10m 56s]
Iteration 2: 284063.117 ops/s11m 6s]
Iteration 3: 282361.985 ops/s11m 16s]
Iteration 4: 289125.092 ops/s11m 26s]
Iteration 5: 294236.625 ops/s11m 36s]
# Run progress: 70.00% complete, ETA 00:05:01
# Fork: 3 of 5
# Warmup Iteration 1: 284567.336 ops/s
# Warmup Iteration 2: 283548.713 ops/s
# Warmup Iteration 3: 294317.511 ops/s
# Warmup Iteration 4: 283501.873 ops/s
# Warmup Iteration 5: 283691.306 ops/s
Iteration 1: 283462.749 ops/s12m 36s]
Iteration 2: 284120.587 ops/s12m 46s]
Iteration 3: 264878.952 ops/s12m 56s]
Iteration 4: 292681.168 ops/s13m 6s]
Iteration 5: 295279.759 ops/s13m 16s]
# Run progress: 80.00% complete, ETA 00:03:20
# Fork: 4 of 5
# Warmup Iteration 1: 284823.519 ops/s
# Warmup Iteration 2: 283913.207 ops/s
# Warmup Iteration 3: 294401.483 ops/s
# Warmup Iteration 4: 283998.027 ops/s
# Warmup Iteration 5: 283987.408 ops/s
Iteration 1: 278014.618 ops/s14m 17s]
Iteration 2: 283431.491 ops/s14m 27s]
Iteration 3: 284465.945 ops/s14m 37s]
Iteration 4: 293202.934 ops/s14m 47s]
Iteration 5: 290059.807 ops/s14m 57s]
# Run progress: 90.00% complete, ETA 00:01:40
# Fork: 5 of 5
# Warmup Iteration 1: 285598.809 ops/s
# Warmup Iteration 2: 284434.916 ops/s
# Warmup Iteration 3: 294355.547 ops/s
# Warmup Iteration 4: 284307.860 ops/s
# Warmup Iteration 5: 284297.362 ops/s
Iteration 1: 283676.043 ops/s15m 57s]
Iteration 2: 283609.750 ops/s16m 7s]
Iteration 3: 284575.124 ops/s16m 17s]
Iteration 4: 293564.269 ops/s16m 27s]
Iteration 5: 216267.883 ops/s16m 37s]
Result "com.xxx.MemPressureTest.test2":
282755.844 ±(99.9%) 11599.112 ops/s [Average]
(min, avg, max) = (216267.883, 282755.844, 295279.759), stdev = 15484.483
CI (99.9%): [271156.731, 294354.956] (assumes normal distribution)
JMH Blackhole可以防止代码被优化,现在JMH负责运行单独的迭代,这样可以防止并行化,对吗?Blackhole难道不应该还可以阻止对象被限制在堆栈中吗?此外,如果Hotspot仍在进行大量优化,预热迭代之间的差异不会更大吗?
ByteBuffer.put
几乎总是优于相对put
。 - apangin