Rcpp函数比同样的R函数慢

Question

Rcpp函数比同样的R函数慢

4

我一直在编写一个R函数，用于计算某些分布的积分，如下所示。

EVofPsi = function(psi, probabilityMeasure, eps=0.01, ...){

distFun = function(u){
 probabilityMeasure(u, ...)
}
xx = yy = seq(0,1,length=1/eps+1)
summand=0

for(i in 1:(length(xx)-1)){
  for(j in 1:(length(yy)-1)){
    signPlus = distFun(c(xx[i+1],yy[j+1]))+distFun(c(xx[i],yy[j]))
    signMinus = distFun(c(xx[i+1],yy[j]))+distFun(c(xx[i],yy[j+1]))
    summand = c(summand, psi(c(xx[i],yy[j]))*(signPlus-signMinus))
  }
}
sum(summand)
}

它的功能很好，但速度相对较慢。经常听到重新使用编译语言如C++等重写函数会加速执行，特别是因为上述R代码涉及双重循环。所以我使用了Rcpp：

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double EVofPsiCPP(Function distFun, Function psi, int n, double eps) {

  NumericVector xx(n+1);
  NumericVector yy(n+1);
  xx[0] = 0;
  yy[0] = 0;

  // discretize [0,1]^2
  for(int i = 1; i < n+1; i++) {
      xx[i] = xx[i-1] + eps;
      yy[i] = yy[i-1] + eps;
  }

  Function psiCPP(psi);
  Function distFunCPP(distFun);
  double signPlus;
  double signMinus;
  double summand = 0;

  NumericVector topRight(2); 
  NumericVector bottomLeft(2);
  NumericVector bottomRight(2);
  NumericVector topLeft(2);

  // compute the integral
  for(int i=0; i<n; i++){
    //printf("i:%d \n",i);
    for(int j=0; j<n; j++){
      //printf("j:%d \n",j);
      topRight[0] = xx[i+1];
      topRight[1] = yy[j+1];
      bottomLeft[0] = xx[i];
      bottomLeft[1] = yy[j];
      bottomRight[0] = xx[i+1];
      bottomRight[1] = yy[j];
      topLeft[0] = xx[i];
      topLeft[1] = yy[j+1];
      signPlus = NumericVector(distFunCPP(topRight))[0] +  NumericVector(distFunCPP(bottomLeft))[0];
      signMinus = NumericVector(distFunCPP(bottomRight))[0] + NumericVector(distFunCPP(topLeft))[0];
      summand = summand + NumericVector(psiCPP(bottomLeft))[0]*(signPlus-signMinus);
      //printf("summand:%f \n",summand);
    }
  }
  return summand;
}

这个 C++ 函数的运行效果不错，让我很开心。然而，在测试两个函数时，C++ 函数的运行速度比较慢：

sourceCpp("EVofPsiCPP.cpp")
pFGM = function(u,theta){
  u[1]*u[2] + theta*u[1]*u[2]*(1-u[1])*(1-u[2])
}
psi = function(u){
  u[1]*u[2]
}
print(system.time(
for(i in 1:10){
  test = EVofPsi(psi, pFGM, 1/100, 0.2)  
}
))
test

print(system.time(
  for(i in 1:10){
    test = EVofPsiCPP(psi, function(u){pFGM(u,0.2)}, 100, 1/100)  
  }
))

所以，有没有一位愿意为我解释这个的专家？我的代码像猴子一样吗？有没有办法加速这个函数？此外，我还有第二个问题。事实上，我可以将输出类型double替换为SEXP，并将函数类型的参数也替换为SEXP，但似乎并没有什么区别。那么这有什么区别呢？

非常感谢您的帮助， Gildas

- user2635870

1

虽然NumericVector没有代码，但我猜测创建这么多此类的临时对象可能是瓶颈所在。 - lapk

2

将 R 函数传递给 C++ 函数速度较慢。如果您想要快速执行，需要在 C++ / Rcpp 代码中重新定义这些 R 函数。 - Kevin Ushey

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Romain Francois · Accepted Answer

已经有其他评论中的回答了，所以我只想强调一点：调用R函数是昂贵的，因为我们需要特别小心错误处理。仅在C++中循环并调用R函数并不是将您的代码重写为C++。尝试将psi和pFGM重写为C++函数，并在此处报告发生了什么。

您可能会争辩说，您失去了某些灵活性，不能再使用任何R函数。对于这种情况，我建议使用某种混合解决方案，其中您已实现了C++中最常见的情况，并在否则回退到R解决方案。

至于另一个问题，SEXP是R对象。这是R API的一部分。它可以是任何东西。当您从中创建一个Function（当创建一个带有Function参数的函数时，这是为您隐式完成的）时，您可以确保这确实是一个R函数。开销非常小，但在代码表达性方面的收益巨大。