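I am trying to use Unsafe to iterate over memory instead of iterating over the values in a byte[] array. A memory block large enough to hold 65536 byte values is allocated using unsafe.
I am trying this: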
char aChar = ...; // some character
if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)) {
    // do something
}
instead of:
char aChar = ...; // some character
if ((byte) 0 == (lookup[aChar] & mask)) {
    // do something
}
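For anyone who wants to reproduce the two variants side by side, here is a minimal, self-contained sketch. It is not the code from my tests: the class name `LookupSketch`, the helper names, and the demo data are made up for illustration, and it assumes `sun.misc.Unsafe` can be reached reflectively through its `theUnsafe` field (an unsupported internal API whose memory-access methods are deprecated in recent JDKs).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class LookupSketch {
    static final int TABLE_SIZE = 65536; // one entry per possible char value

    // Assumption: the Unsafe singleton is reachable via the private "theUnsafe" field.
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Plain array lookup: the JVM bounds-checks lookup[aChar].
    static boolean isClearArray(byte[] lookup, char aChar, byte mask) {
        return (byte) 0 == (lookup[aChar] & mask);
    }

    // Off-heap lookup: no bounds check, so the block must hold at least TABLE_SIZE bytes.
    static boolean isClearUnsafe(Unsafe unsafe, long baseAddress, char aChar, byte mask) {
        return (byte) 0 == (unsafe.getByte(baseAddress + aChar) & mask);
    }

    // Fills both tables with the same demo data and checks that the two lookups agree.
    static boolean variantsAgree() {
        Unsafe unsafe = loadUnsafe();
        byte[] lookup = new byte[TABLE_SIZE];
        long baseAddress = unsafe.allocateMemory(TABLE_SIZE);
        try {
            for (int i = 0; i < TABLE_SIZE; i++) {
                byte flags = (byte) (i % 7 == 0 ? 0x01 : 0x00); // arbitrary demo data
                lookup[i] = flags;
                unsafe.putByte(baseAddress + i, flags);
            }
            byte mask = 0x01;
            for (int c = 0; c < TABLE_SIZE; c++) {
                boolean viaArray = isClearArray(lookup, (char) c, mask);
                boolean viaUnsafe = isClearUnsafe(unsafe, baseAddress, (char) c, mask);
                if (viaArray != viaUnsafe) {
                    return false;
                }
            }
            return true;
        } finally {
            unsafe.freeMemory(baseAddress); // off-heap memory is not garbage collected
        }
    }
}
```

This only shows that the two forms are functionally equivalent; it says nothing about which one is faster.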
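I had thought Unsafe would be faster than regular array access, given the index check that is done for each index... but it was only wishful thinking that the JVM would have a special operation (unsafe) that makes regular array access and iteration faster. It appears to me that the JVM works fine with normal byte[] iteration and does it very fast using normal, unadulterated, vanilla Java code.
@millimoose hit the nail on the head: 'Unsafe might be useful for lots of things, but this level of micro-optimization isn't one of them.'
Using Unsafe is faster only in a very strictly limited set of circumstances:
(64-bit JVM only) It is faster for a single 65535 byte[] lookup done only once per test. In this case UnsafeLookup_8B on a 64-bit JVM is 24% faster than the normal method. If the test is repeated so that each test is done twice, the normal method is now 30% faster than unsafe.
In pure interpreted mode with a cold JVM, Unsafe is faster by far, but only the first time and only for small array sizes.
On a 32-bit standard Oracle JVM 7.x, the normal method is three times faster than using unsafe.
In my tests, using Unsafe is slower: on both Oracle Java 64-bit and 32-bit virtual machines, and regardless of OS and machine architecture (32-bit and 64-bit). It is slower even when the server JVM option is invoked.
Unsafe is 9% or more slower than the normal method on a 32-bit JVM (1_GB array and UnsafeLookup_8B (the fastest one) in the code below), and even slower on a 64-bit JVM?? There, Unsafe is 234% or more slower than the normal method (1_MB array and UnsafeLookup_1B (the fastest one) in the code below).
Is there some reason for this?
When I run the code yellowB posted (which checks a 1GB byte[]), the normal method is still the fastest:
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"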
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1967737 us.
use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.
use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.
Flat profile of 26.35 secs (2018 total ticks): main
Interpreted + native Method
0.0% 1 + 0 test.StackOverflow.main
0.0% 1 + 0 Total interpreted
Compiled + native Method
67.8% 1369 + 0 test.StackOverflow.main
11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B
11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B
9.1% 184 + 0 test.StackOverflow.normalLookup
99.9% 2016 + 0 Total compiled
Stub + native Method
0.0% 0 + 1 sun.misc.Unsafe.getLong
0.0% 0 + 1 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 26.39 seconds:
100.0% 2023 Received ticks
C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)
The CPU is: Intel Core 2 Duo E4600 @ 2.4GHz with 4.00GB of RAM (3.25GB usable).
The OS is: Windows 7 (32-bit).
Running the test on a 4-core AMD64 machine with Windows 7_64 and 32-bit Java:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1631142 us.
use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.
use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.
Running the test on a 4-core AMD64 machine with Windows 7_64 and 64-bit Java:
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
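To turn the raw timings above into "x% slower" figures, the arithmetic can be sketched as follows. The class and method names are made up for illustration; the microsecond values are copied from the three runs above. For example, on the AMD64 machine with 32-bit Java, unsafeLookup_8B comes out roughly 9% slower than normalLookup.

```java
class SlowdownSketch {
    // Percentage by which candidateMicros exceeds baselineMicros (negative means faster).
    static double slowdownPercent(long baselineMicros, long candidateMicros) {
        return 100.0 * (candidateMicros - baselineMicros) / baselineMicros;
    }

    public static void main(String[] args) {
        String[] labels = {
            "Core 2 Duo, Windows 7 32-bit, 32-bit Java",
            "AMD64 4-core, Windows 7_64, 32-bit Java",
            "AMD64 4-core, Windows 7_64, 64-bit Java"
        };
        // Each row: {normalLookup, unsafeLookup_1B, unsafeLookup_8B} in microseconds.
        long[][] runs = {
            {1967737L, 2923367L, 2495663L},
            {1631142L, 2365214L, 1783320L},
            {655146L, 904783L, 764427L}
        };
        for (int i = 0; i < runs.length; i++) {
            System.out.printf("%s: 1B is %.1f%% slower, 8B is %.1f%% slower%n",
                    labels[i],
                    slowdownPercent(runs[i][0], runs[i][1]),
                    slowdownPercent(runs[i][0], runs[i][2]));
        }
    }
}
```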
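Unsafe methods are all declared as native. Calling native methods probably involves some overhead that the JVM can't optimise away. If you're willing to go native to optimise this sort of thing in the first place, it's probably best to avoid crossing the native-JVM boundary as much as possible. - millimoose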
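The profile shows unsafeLookup_8B going through sun.misc.Unsafe.getLong, so it presumably reads eight bytes per call instead of one. The actual method is not shown here, so the following is only a hypothetical sketch of that idea: scanning an off-heap block for a zero byte, once with one getByte call per byte and once with one getLong call per eight bytes. All names (`WideScanSketch`, `containsZero1B`, `containsZero8B`, `demo`) are invented, and what the real test searches for is an assumption.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class WideScanSketch {
    // Assumption: the Unsafe singleton is reachable via the private "theUnsafe" field.
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // True if at least one of the eight bytes packed into v is zero (well-known bit trick).
    static boolean hasZeroByte(long v) {
        return ((v - 0x0101010101010101L) & ~v & 0x8080808080808080L) != 0;
    }

    // One Unsafe call per byte.
    static boolean containsZero1B(Unsafe unsafe, long base, long size) {
        for (long i = 0; i < size; i++) {
            if (unsafe.getByte(base + i) == 0) {
                return true;
            }
        }
        return false;
    }

    // One Unsafe call per eight bytes; size must be a multiple of 8.
    static boolean containsZero8B(Unsafe unsafe, long base, long size) {
        for (long i = 0; i < size; i += 8) {
            if (hasZeroByte(unsafe.getLong(base + i))) {
                return true;
            }
        }
        return false;
    }

    // Fills 1024 bytes with 1, optionally plants a zero at zeroAt (use -1 for none),
    // checks that both scans agree, and returns the result of the scan.
    static boolean demo(int zeroAt) {
        Unsafe unsafe = loadUnsafe();
        long size = 1024;
        long base = unsafe.allocateMemory(size);
        try {
            unsafe.setMemory(base, size, (byte) 1);
            if (zeroAt >= 0) {
                unsafe.putByte(base + zeroAt, (byte) 0);
            }
            boolean narrow = containsZero1B(unsafe, base, size);
            boolean wide = containsZero8B(unsafe, base, size);
            if (narrow != wide) {
                throw new IllegalStateException("scans disagree");
            }
            return wide;
        } finally {
            unsafe.freeMemory(base);
        }
    }
}
```

Reading eight bytes at a time cuts the number of Unsafe calls by a factor of eight, which fits the timings above where the 8B variant beats the 1B variant but still trails the plain array loop.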