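From "Efficient R programming: the byte compiler" and the R documentation on the R byte compiler, I learned that cmpfun can be used to compile a pure R function into byte code to speed it up, and that enableJIT can speed things up by enabling just-in-time compilation.
So I decided to run the benchmark myself, as the first link does, using the following code: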
library("compiler")
library("rbenchmark")
enableJIT(3)
my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}
cmp_mean = cmpfun(my_mean, list(optimize = 3))
## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x),
          columns = c("test", "elapsed", "relative"),
          order = "relative", replications = 5000)
Unfortunately, the result is quite different from what the first link shows. my_mean even performs slightly better than cmp_mean:
         test elapsed relative
3     mean(x)   1.468    1.000
1  my_mean(x)  35.402   24.116
2 cmp_mean(x)  36.817   25.080
I can't figure out what is happening.
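One thing that can be checked directly is whether my_mean has already been byte-compiled by the JIT by the time it is timed. The following is a minimal sketch, not a definitive diagnosis. It assumes that a byte-compiled closure prints with a "<bytecode: ...>" line; is_compiled and plain_mean are hypothetical names introduced only for this illustration and are not part of the compiler package.

```r
library(compiler)

# Hypothetical helper (not part of the compiler package): assumes that a
# byte-compiled closure prints with a "<bytecode: ...>" line.
is_compiled <- function(f) {
  any(grepl("<bytecode", capture.output(print(f)), fixed = TRUE))
}

old_level <- enableJIT(0)  # switch the JIT off; the previous level is returned

plain_mean <- function(x) {
  total <- 0
  for (each in x) total <- total + each
  total / length(x)
}

before_jit <- is_compiled(plain_mean)          # defined while the JIT is off
explicit   <- is_compiled(cmpfun(plain_mean))  # compiled explicitly with cmpfun

enableJIT(3)
for (i in 1:5) plain_mean(rnorm(10))           # a few calls give the JIT a chance
after_jit <- is_compiled(plain_mean)

enableJIT(old_level)                           # restore the original JIT level

print(c(before_jit = before_jit, explicit = explicit, after_jit = after_jit))
```

If after_jit comes back TRUE, the plain function and the cmpfun version are both running as byte code in the benchmark, which would be one possible reason for the near-identical timings.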
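Edit:
The R version on my computer is 3.5.2.
The operating system is Debian 9.8. All software on my computer is kept up to date from the stable sources provided by Debian.
The Linux kernel version is 4.9.0-8-amd64.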
Edit 5:
I rewrote the script to test different combinations of optimize and JIT:
#!/usr/bin/env Rscript
library("compiler")
library("microbenchmark")
library("rlist")
my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}
do_cmpfun = function(f, f_name, optimization_level) {
    cmp_f = cmpfun(f, list(optimize = optimization_level))
    list(cmp_f, f_name, optimize = optimization_level)
}
do_benchmark = function(f, f_name, optimization_level, JIT_level, x) {
    result = summary(microbenchmark(f(x), times = 1000, unit = "us",
                                    control = list(warmup = 100)))
    data.frame(fun = f_name, optimize = optimization_level,
               JIT = JIT_level, mean = result$mean)
}
means = list(list(mean, "mean", optimize = -1),
             list(my_mean, "my_mean", optimize = -1))
for (optimization_level in 0:3)
    means = list.append(means, do_cmpfun(my_mean, "my_mean", optimization_level))
# Generate some data
x = rnorm(100000)
# Benchmark in different JIT levels
result = c()
for (JIT_level in 0:3) {
    enableJIT(JIT_level)
    for (f in means) {
        result = rbind(result, do_benchmark(f[[1]], f[[2]], f[[3]], JIT_level, x))
    }
}
# Sort result
sorted_result = result[order(result$mean), ]
rownames(sorted_result) = NULL
print("Unit = us, optimize = -1 means it is not processed by cmpfun")
print(sorted_result)
Before running the R script, I ran sudo cpupower frequency-set --governor performance, and got the following output:
[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
 1    mean       -1   2   229.1841
 2    mean       -1   1   229.3910
 3    mean       -1   3   236.3680
 4    mean       -1   0   252.9416
 5 my_mean       -1   2  5242.0413
 6 my_mean        3   0  5279.9710
 7 my_mean        2   2  5297.5323
 8 my_mean        2   1  5327.0324
 9 my_mean       -1   1  5333.6941
10 my_mean        3   1  5336.4559
11 my_mean        2   0  5362.6644
12 my_mean        3   3  5410.1963
13 my_mean        2   3  5414.4616
14 my_mean       -1   3  5418.3823
15 my_mean        3   2  5437.3233
16 my_mean        1   2  9947.7897
17 my_mean        1   1 10101.6464
18 my_mean        1   3 10204.3253
19 my_mean        1   0 10323.0782
20 my_mean        0   0 26557.3808
21 my_mean        0   2 26728.5222
22 my_mean       -1   0 26901.4200
23 my_mean        0   3 26984.5200
24 my_mean        0   1 27060.6188
However, after I used update-alternatives to switch to the libblas.so.3 and liblapack.so.3 provided by openblas 0.2.19-3, my_mean with optimize = 3 and JIT = 0 became the best performer (apart from mean):
[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
 1    mean       -1   0   228.9361
 2    mean       -1   1   229.1223
 3    mean       -1   2   233.9757
 4    mean       -1   3   241.7835
 5 my_mean        3   0  5246.8089
 6 my_mean       -1   1  5261.3951
 7 my_mean       -1   2  5330.6310
 8 my_mean        2   3  5362.2055
 9 my_mean        3   1  5400.9983
10 my_mean        2   0  5418.7674
11 my_mean        2   1  5460.8133
12 my_mean        3   3  5464.8280
13 my_mean       -1   3  5520.7021
14 my_mean        2   2  5591.7352
15 my_mean        3   2  5610.6446
16 my_mean        1   3 10244.2832
17 my_mean        1   0 10274.7504
18 my_mean        1   1 10311.6423
19 my_mean        1   2 10735.6449
20 my_mean        0   2 26904.1858
21 my_mean       -1   0 26961.0536
22 my_mean        0   0 27115.8191
23 my_mean        0   3 27538.7224
24 my_mean        0   1 28133.6159
The same with mkl 2019.02-057:
[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
 1    mean       -1   1   257.8620
 2    mean       -1   0   263.3743
 3    mean       -1   2   280.6906
 4    mean       -1   3   291.8409
 5 my_mean        2   0  5445.3252
 6 my_mean        2   2  5462.4575
 7 my_mean        3   3  5560.2931
 8 my_mean       -1   1  5591.0089
 9 my_mean        3   1  5645.3897
10 my_mean        3   0  5676.1714
11 my_mean        3   2  5707.7964
12 my_mean        2   3  5757.7887
13 my_mean       -1   3  5856.0215
14 my_mean       -1   2  5897.1735
15 my_mean        2   1  6363.1090
16 my_mean        1   2  9973.7666
17 my_mean        1   1 10557.8154
18 my_mean        1   0 10926.6103
19 my_mean        1   3 16030.0326
20 my_mean        0   0 27461.4078
21 my_mean        0   1 27939.7680
22 my_mean       -1   0 27985.4590
23 my_mean        0   3 30394.2772
24 my_mean        0   2 33768.5701
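In all three runs the slow group is the same: my_mean without cmpfun under JIT = 0, and my_mean compiled with optimize = 0 at every JIT level. A small sketch makes that comparison explicit by expressing a few timings as a ratio to the fastest one. The six rows are copied from the first output table above; the data frame name res and the relative column are introduced here only for illustration.

```r
# Six rows copied from the first output table (mean time in microseconds).
# optimize = -1 means the function was not processed by cmpfun.
res <- data.frame(
  fun      = rep("my_mean", 6),
  optimize = c(-1, -1, 0, 0, 3, 3),
  JIT      = c(0, 2, 0, 2, 0, 2),
  mean     = c(26901.4200, 5242.0413, 26557.3808, 26728.5222,
               5279.9710, 5437.3233)
)

# Slowdown relative to the fastest configuration in this subset.
res$relative <- round(res$mean / min(res$mean), 2)

print(res[order(res$mean), ], row.names = FALSE)
```

Read this way, the uncompiled function with the JIT enabled sits in the same band as the cmpfun versions with optimize = 2 or 3, while the gap of roughly 5x only shows up when neither the JIT nor a higher optimize level is in play.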
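my_mean() and cmp_mean() seem to be neck and neck at all problem sizes. I can't reproduce the graph in this link, which shows their cmp_mean doing significantly better. Not sure when that was written. Perhaps the R interpreter has improved so that the optimization done by cmpfun() now happens automatically? Also, the table of contents shows something about changing the BLAS that R uses. Perhaps their results were obtained with a non-default BLAS? This is a good question. - John Coleman
With microbenchmark rather than benchmark, cmpfun doesn't seem to help in any case, and if anything seems to hurt (negligibly). - John Coleman
cmp_mean is about 3 times faster than my_mean. - amatsuo_net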