public class Main {
public static void main(String... args) throws IOException {
for (int i = 0; i < 10; i++)
doTest();
}
public static void doTest() {
final ByteBuffer writeBuffer = ByteBuffer.allocateDirect(64 * 1024);
final ByteBuffer readBuffer = writeBuffer.slice();
final AtomicInteger readCount = new PaddedAtomicInteger();
final AtomicInteger writeCount = new PaddedAtomicInteger();
for(int i=0;i<3;i++)
performTiming(writeBuffer, readBuffer, readCount, writeCount);
System.out.println();
}
private static void performTiming(ByteBuffer writeBuffer, final ByteBuffer readBuffer, final AtomicInteger readCount, final AtomicInteger writeCount) {
writeBuffer.clear();
readBuffer.clear();
readCount.set(0);
writeCount.set(0);
Thread t = new Thread(new Runnable() {
@Override
public void run() {
byte[] bytes = new byte[128];
while (!Thread.interrupted()) {
int rc = readCount.get(), toRead;
while ((toRead = writeCount.get() - rc) <= 0) ;
for (int i = 0; i < toRead; i++) {
byte len = readBuffer.get();
if (len == -1) {
// rewind.
readBuffer.clear();
// rc++;
} else {
int num = readBuffer.getInt();
if (num != rc)
throw new AssertionError("Expected " + rc + " but got " + num) ;
rc++;
readBuffer.get(bytes, 0, len - 4);
}
}
readCount.lazySet(rc);
}
}
});
t.setDaemon(true);
t.start();
Thread.yield();
long start = System.nanoTime();
int runs = 30 * 1000 * 1000;
int len = 32;
byte[] bytes = new byte[len - 4];
int wc = writeCount.get();
for (int i = 0; i < runs; i++) {
if (writeBuffer.remaining() < len + 1) {
// reader has to catch up.
while (wc - readCount.get() > 0) ;
// rewind.
writeBuffer.put((byte) -1);
writeBuffer.clear();
}
writeBuffer.put((byte) len);
writeBuffer.putInt(i);
writeBuffer.put(bytes);
writeCount.lazySet(++wc);
}
// reader has to catch up.
while (wc - readCount.get() > 0) ;
t.interrupt();
t.stop();
long time = System.nanoTime() - start;
System.out.printf("Message rate was %.1f M/s offsets %d %d %d%n", runs * 1e3 / time
, addressOf(readBuffer) - addressOf(writeBuffer)
, addressOf(readCount) - addressOf(writeBuffer)
, addressOf(writeCount) - addressOf(writeBuffer)
);
}
// assumes -XX:+UseCompressedOops.
public static long addressOf(Object... o) {
long offset = UNSAFE.arrayBaseOffset(o.getClass());
return UNSAFE.getInt(o, offset) * 8L;
}
public static final Unsafe UNSAFE = getUnsafe();
public static Unsafe getUnsafe() {
try {
Field field = Unsafe.class.getDeclaredField("theUnsafe");
field.setAccessible(true);
return (Unsafe) field.get(null);
} catch (Exception e) {
throw new AssertionError(e);
}
}
private static class PaddedAtomicInteger extends AtomicInteger {
public long p2, p3, p4, p5, p6, p7;
public long sum() {
// return 0;
return p2 + p3 + p4 + p5 + p6 + p7;
}
}
}
打印相同数据块的时间。末尾的数字是对象的相对地址,显示它们每次都以相同的方式布局在缓存中。运行更长时间的10个测试表明,给定的组合反复产生相同的性能。
Message rate was 63.2 M/s offsets 136 200 264
Message rate was 80.4 M/s offsets 136 200 264
Message rate was 80.0 M/s offsets 136 200 264
Message rate was 81.9 M/s offsets 136 200 264
Message rate was 82.2 M/s offsets 136 200 264
Message rate was 82.5 M/s offsets 136 200 264
Message rate was 79.1 M/s offsets 136 200 264
Message rate was 82.4 M/s offsets 136 200 264
Message rate was 82.4 M/s offsets 136 200 264
Message rate was 34.7 M/s offsets 136 200 264
Message rate was 39.1 M/s offsets 136 200 264
Message rate was 39.0 M/s offsets 136 200 264
每组缓冲区和计数器都进行了三次测试,这些缓冲区似乎给出了类似的结果。因此,我相信有关于这些缓冲区在内存中布局的一些问题我没有发现。
是否有任何方法可以更经常地提高性能?看起来像是缓存冲突,但我看不到这里可能发生的地方。
顺便说一下:
M/s
表示每秒百万条消息,这比任何人可能需要的要多,但理解如何使其始终保持快速是很重要的。编辑:使用同步和等待以及通知会使结果更加一致。但并不会更快。
Message rate was 6.9 M/s
Message rate was 7.8 M/s
Message rate was 7.9 M/s
Message rate was 6.7 M/s
Message rate was 7.5 M/s
Message rate was 7.7 M/s
Message rate was 7.3 M/s
Message rate was 7.9 M/s
Message rate was 6.4 M/s
Message rate was 7.8 M/s
编辑:通过使用任务集,如果我将两个线程锁定到更改同一个核心,则可以使性能保持一致。
Message rate was 35.1 M/s offsets 136 200 216
Message rate was 34.0 M/s offsets 136 200 216
Message rate was 35.4 M/s offsets 136 200 216
Message rate was 35.6 M/s offsets 136 200 216
Message rate was 37.0 M/s offsets 136 200 216
Message rate was 37.2 M/s offsets 136 200 216
Message rate was 37.1 M/s offsets 136 200 216
Message rate was 35.0 M/s offsets 136 200 216
Message rate was 37.1 M/s offsets 136 200 216
If I use any two logical threads on different cores, I get the inconsistent behaviour
Message rate was 60.2 M/s offsets 136 200 216
Message rate was 68.7 M/s offsets 136 200 216
Message rate was 55.3 M/s offsets 136 200 216
Message rate was 39.2 M/s offsets 136 200 216
Message rate was 39.1 M/s offsets 136 200 216
Message rate was 37.5 M/s offsets 136 200 216
Message rate was 75.3 M/s offsets 136 200 216
Message rate was 73.8 M/s offsets 136 200 216
Message rate was 66.8 M/s offsets 136 200 216
编辑:似乎触发GC会改变行为。这些显示了在手动触发GC的一半时对相同缓冲区+计数器进行重复测试。
faster after GC
Message rate was 27.4 M/s offsets 136 200 216
Message rate was 27.8 M/s offsets 136 200 216
Message rate was 29.6 M/s offsets 136 200 216
Message rate was 27.7 M/s offsets 136 200 216
Message rate was 29.6 M/s offsets 136 200 216
[GC 14312K->1518K(244544K), 0.0003050 secs]
[Full GC 1518K->1328K(244544K), 0.0068270 secs]
Message rate was 34.7 M/s offsets 64 128 144
Message rate was 54.5 M/s offsets 64 128 144
Message rate was 54.1 M/s offsets 64 128 144
Message rate was 51.9 M/s offsets 64 128 144
Message rate was 57.2 M/s offsets 64 128 144
and slower
Message rate was 61.1 M/s offsets 136 200 216
Message rate was 61.8 M/s offsets 136 200 216
Message rate was 60.5 M/s offsets 136 200 216
Message rate was 61.1 M/s offsets 136 200 216
[GC 35740K->1440K(244544K), 0.0018170 secs]
[Full GC 1440K->1302K(244544K), 0.0071290 secs]
Message rate was 53.9 M/s offsets 64 128 144
Message rate was 54.3 M/s offsets 64 128 144
Message rate was 50.8 M/s offsets 64 128 144
Message rate was 56.6 M/s offsets 64 128 144
Message rate was 56.0 M/s offsets 64 128 144
Message rate was 53.6 M/s offsets 64 128 144
编辑:使用 @BegemoT 的库打印所使用的核心 ID,我在一台3.8 GHz的i7(家用PC)上得到了以下结果
注意:偏移量乘以8是不正确的。由于堆大小很小,JVM不会像对待大型堆(但小于32 GB)那样将引用乘以8。
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 54.4 M/s offsets 3392 3904 4416
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#6]
Message rate was 54.2 M/s offsets 3392 3904 4416
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 60.7 M/s offsets 3392 3904 4416
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 25.5 M/s offsets 1088 1600 2112
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 25.9 M/s offsets 1088 1600 2112
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 26.0 M/s offsets 1088 1600 2112
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 61.0 M/s offsets 1088 1600 2112
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 61.8 M/s offsets 1088 1600 2112
writer.currentCore() -> Core[#0]
reader.currentCore() -> Core[#5]
Message rate was 60.7 M/s offsets 1088 1600 2112
你可以看到相同的逻辑线程被使用,但性能在运行之间会有所变化,但在运行中不会变化(在运行中使用相同的对象)。
我已经找到了问题。这是一个内存布局问题,但我发现了一个简单的解决方法。ByteBuffer无法扩展,因此您无法添加填充,因此我创建了一个我会丢弃的对象。
final ByteBuffer writeBuffer = ByteBuffer.allocateDirect(64 * 1024);
final ByteBuffer readBuffer = writeBuffer.slice();
new PaddedAtomicInteger();
final AtomicInteger readCount = new PaddedAtomicInteger();
final AtomicInteger writeCount = new PaddedAtomicInteger();
没有这个额外的填充(未使用的对象),结果在3.8 GHz i7上看起来像这样。
Message rate was 38.5 M/s offsets 3392 3904 4416
Message rate was 54.7 M/s offsets 3392 3904 4416
Message rate was 59.4 M/s offsets 3392 3904 4416
Message rate was 54.3 M/s offsets 1088 1600 2112
Message rate was 56.3 M/s offsets 1088 1600 2112
Message rate was 56.6 M/s offsets 1088 1600 2112
Message rate was 28.0 M/s offsets 1088 1600 2112
Message rate was 28.1 M/s offsets 1088 1600 2112
Message rate was 28.0 M/s offsets 1088 1600 2112
Message rate was 17.4 M/s offsets 1088 1600 2112
Message rate was 17.4 M/s offsets 1088 1600 2112
Message rate was 17.4 M/s offsets 1088 1600 2112
Message rate was 54.5 M/s offsets 1088 1600 2112
Message rate was 54.2 M/s offsets 1088 1600 2112
Message rate was 55.1 M/s offsets 1088 1600 2112
Message rate was 25.5 M/s offsets 1088 1600 2112
Message rate was 25.6 M/s offsets 1088 1600 2112
Message rate was 25.6 M/s offsets 1088 1600 2112
Message rate was 56.6 M/s offsets 1088 1600 2112
Message rate was 54.7 M/s offsets 1088 1600 2112
Message rate was 54.4 M/s offsets 1088 1600 2112
Message rate was 57.0 M/s offsets 1088 1600 2112
Message rate was 55.9 M/s offsets 1088 1600 2112
Message rate was 56.3 M/s offsets 1088 1600 2112
Message rate was 51.4 M/s offsets 1088 1600 2112
Message rate was 56.6 M/s offsets 1088 1600 2112
Message rate was 56.1 M/s offsets 1088 1600 2112
Message rate was 46.4 M/s offsets 1088 1600 2112
Message rate was 46.4 M/s offsets 1088 1600 2112
Message rate was 47.4 M/s offsets 1088 1600 2112
使用被废弃的填充对象。
Message rate was 54.3 M/s offsets 3392 4416 4928
Message rate was 53.1 M/s offsets 3392 4416 4928
Message rate was 59.2 M/s offsets 3392 4416 4928
Message rate was 58.8 M/s offsets 1088 2112 2624
Message rate was 58.9 M/s offsets 1088 2112 2624
Message rate was 59.3 M/s offsets 1088 2112 2624
Message rate was 59.4 M/s offsets 1088 2112 2624
Message rate was 59.0 M/s offsets 1088 2112 2624
Message rate was 59.8 M/s offsets 1088 2112 2624
Message rate was 59.8 M/s offsets 1088 2112 2624
Message rate was 59.8 M/s offsets 1088 2112 2624
Message rate was 59.2 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.9 M/s offsets 1088 2112 2624
Message rate was 60.6 M/s offsets 1088 2112 2624
Message rate was 59.6 M/s offsets 1088 2112 2624
Message rate was 60.3 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.9 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.5 M/s offsets 1088 2112 2624
Message rate was 60.7 M/s offsets 1088 2112 2624
Message rate was 61.6 M/s offsets 1088 2112 2624
Message rate was 60.8 M/s offsets 1088 2112 2624
Message rate was 60.3 M/s offsets 1088 2112 2624
Message rate was 60.7 M/s offsets 1088 2112 2624
Message rate was 58.3 M/s offsets 1088 2112 2624
很遗憾,每次垃圾回收后,对象的布局可能无法优化。唯一解决这个问题的方法可能是在原始类中添加填充 :(