编译过的R代码实际上比启用JIT的纯R代码运行更慢。

10
Efficient R programming the byte compilerR docment r byte compiler中,我了解到可以使用cmpfun将纯R函数编译成字节码以提高速度,而enableJIT可以通过启用即时编译来提高速度。

因此,我决定像第一个链接那样自己进行基准测试,使用以下代码:

library("compiler")
library("rbenchmark")

enableJIT(3)

my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}

cmp_mean = cmpfun(my_mean, list(optimize = 3))

## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x), columns = c("test", "elapsed", "relative"), order = "relative", replications = 5000)

不幸的是,结果与第一个链接所示的完全不同。 my_mean 的性能甚至比 cmp_mean 更好:

         test elapsed relative
3     mean(x)   1.468    1.000
1  my_mean(x)  35.402   24.116
2 cmp_mean(x)  36.817   25.080

我无法弄清发生了什么。

编辑:

我的电脑上的 R 版本是 3.5.2

操作系统是 debian 9.8。我的电脑上的所有软件都使用 debian 提供的稳定源进行更新。

linux 内核版本为 4.9.0-8-amd64

编辑5:

我重写了脚本来测试不同组合的 optimizeJIT

#!/usr/bin/env Rscript

library("compiler")
library("microbenchmark")
library("rlist")

my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
    total = total + each
    total / n
}

do_cmpfun = function(f, f_name, optimization_level) {
    cmp_f = cmpfun(f, list(optimize = optimization_level))
    list(cmp_f, f_name, optimize = optimization_level)
}

do_benchmark = function(f, f_name, optimization_level, JIT_level, x) {
    result = summary(microbenchmark(f(x), times = 1000, unit = "us", control = list(warmup = 100)))
    data.frame(fun = f_name, optimize = optimization_level, JIT = JIT_level, mean = result$mean)
}

means = list(list(mean, "mean", optimize = -1), list(my_mean, "my_mean", optimize = -1))

for (optimization_level in 0:3)
    means = list.append(means, do_cmpfun(my_mean, "my_mean", optimization_level))

# Generate some data
x = rnorm(100000)

# Benchmark in different JIT levels
result = c()
for (JIT_level in 0:3) {
    enableJIT(JIT_level)

    for (f in means) {
    result = rbind(result, do_benchmark(f[[1]], f[[2]], f[[3]], JIT_level, x))
    }
}


# Sort result
sorted_result = result[order(result$mean), ]
rownames(sorted_result) = NULL

print("Unit = us, optimize = -1 means it is not processed by cmpfun")
print(sorted_result)

在运行 R 脚本之前,我执行了 sudo cpupower frequency-set --governor performance 命令,并获得了以下输出:

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   2   229.1841
2     mean       -1   1   229.3910
3     mean       -1   3   236.3680
4     mean       -1   0   252.9416
5  my_mean       -1   2  5242.0413
6  my_mean        3   0  5279.9710
7  my_mean        2   2  5297.5323
8  my_mean        2   1  5327.0324
9  my_mean       -1   1  5333.6941
10 my_mean        3   1  5336.4559
11 my_mean        2   0  5362.6644
12 my_mean        3   3  5410.1963
13 my_mean        2   3  5414.4616
14 my_mean       -1   3  5418.3823
15 my_mean        3   2  5437.3233
16 my_mean        1   2  9947.7897
17 my_mean        1   1 10101.6464
18 my_mean        1   3 10204.3253
19 my_mean        1   0 10323.0782
20 my_mean        0   0 26557.3808
21 my_mean        0   2 26728.5222
22 my_mean       -1   0 26901.4200
23 my_mean        0   3 26984.5200
24 my_mean        0   1 27060.6188

然而,当我使用由openblas 0.2.19-3提供的libblas.so.3liblapack.so.3进行update-alternative之后,optimize = 3JIT = 0my_mean成为性能最佳的一个(除了mean):

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   0   228.9361
2     mean       -1   1   229.1223
3     mean       -1   2   233.9757
4     mean       -1   3   241.7835
5  my_mean        3   0  5246.8089
6  my_mean       -1   1  5261.3951
7  my_mean       -1   2  5330.6310
8  my_mean        2   3  5362.2055
9  my_mean        3   1  5400.9983
10 my_mean        2   0  5418.7674
11 my_mean        2   1  5460.8133
12 my_mean        3   3  5464.8280
13 my_mean       -1   3  5520.7021
14 my_mean        2   2  5591.7352
15 my_mean        3   2  5610.6446
16 my_mean        1   3 10244.2832
17 my_mean        1   0 10274.7504
18 my_mean        1   1 10311.6423
19 my_mean        1   2 10735.6449
20 my_mean        0   2 26904.1858
21 my_mean       -1   0 26961.0536
22 my_mean        0   0 27115.8191
23 my_mean        0   3 27538.7224
24 my_mean        0   1 28133.6159

与“mkl 2019.02-057”相同:

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   1   257.8620
2     mean       -1   0   263.3743
3     mean       -1   2   280.6906
4     mean       -1   3   291.8409
5  my_mean        2   0  5445.3252
6  my_mean        2   2  5462.4575
7  my_mean        3   3  5560.2931
8  my_mean       -1   1  5591.0089
9  my_mean        3   1  5645.3897
10 my_mean        3   0  5676.1714
11 my_mean        3   2  5707.7964
12 my_mean        2   3  5757.7887
13 my_mean       -1   3  5856.0215
14 my_mean       -1   2  5897.1735
15 my_mean        2   1  6363.1090
16 my_mean        1   2  9973.7666
17 my_mean        1   1 10557.8154
18 my_mean        1   0 10926.6103
19 my_mean        1   3 16030.0326
20 my_mean        0   0 27461.4078
21 my_mean        0   1 27939.7680
22 my_mean       -1   0 27985.4590
23 my_mean        0   3 30394.2772
24 my_mean        0   2 33768.5701

3
我明白my_mean()cmp_mean()在所有问题规模上似乎处于势均力敌的状态。我无法复制此链接中显示它们的cmp_mean显着更好的图表。不确定那是什么时候写的。也许 R 解释器已经有所改进,现在 cmpfun() 做的优化已经自动完成了?另外,目录一览表显示某些内容与改变 R 使用的 BLAS 有关。也许他们的结果是在使用非默认 BLAS 时得出的?这是一个很好的问题。 - John Coleman
出于好奇,我进行了实验,有启用JIT和没有启用JIT两种情况,同时也使用了microbenchmark而非benchmark,在任何情况下,cmpfun似乎都没有起到帮助的作用,反而似乎(微不足道地)产生了负面影响。 - John Coleman
7
自从 3.4 版本起,JIT 已默认启用。也许这与书的过时有关。你可以尝试在 3.3、3.4 和 3.5 版本上进行测试以确认。 - pogibas
你可以很容易地利用rstudio.cloud测试不同版本的R。我刚刚测试了R3.1,但是cmp_meanmy_mean快了约3倍。 - amatsuo_net
差异非常小,可能不太显著。 - James
显示剩余2条评论
1个回答

4

虽然我还没有弄清楚为什么JIT编译没有加速你的代码,但我们可以通过使用Rcpp包来加速相同的函数的编译。

这样做会得到以下结果(其中mean_cpp是使用Rcpp编写和编译的函数):

         test elapsed relative
4 mean_cpp(x)    0.67    1.000
3     mean(x)    1.00    1.493
1  my_mean(x)   14.00   20.896
2 cmp_mean(x)   14.50   21.642

生成此功能的代码如下。
library("compiler")
library("rbenchmark")
library("Rcpp")


enableJIT(3)

my_mean = function(x) {
  total = 0
  n = length(x)
  for (each in x)
    total = total + each
  total / n
}

cmp_mean = cmpfun(my_mean, list(optimize = 3))


#we can also write this same function using the Rcpp package
cppFunction('double mean_cpp(NumericVector x) {
  double total = 0;
  int n = x.size();
  for(int i = 0; i < n; i++) {
    total += x[i];
  }
  return total / n;
}')


#run once to compile
mean_cpp(c(1))


## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x), mean_cpp(x),
          columns = c("test", "elapsed", "relative"),
          order = "relative", replications = 5000)

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接