OpenMP and (Rcpp)Eigen

Question



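I'm wondering how to write code that sometimes uses the OpenMP parallelization built into the Eigen library and at other times uses parallelization that I specify myself. Hopefully the snippet below gives some context for the question. I'm asking at the design stage of my library, so I'm sorry that I don't have a working (or broken) code example.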

#ifdef _OPENMP
  #include <omp.h>
#endif

#include <RcppEigen.h>

void fxn(..., int ncores=-1){
  #ifdef _OPENMP
  if (ncores > 0) omp_set_num_threads(ncores);
  #endif
  /*
  * Code with matrix products 
  * where I would like to use Eigen's 
  * OpenMP parallelization
  */ 

  #pragma omp parallel for
  for (int i=0; i < iter; i++){
  /* 
  * Code I would like to parallelize "myself"
  * even though it involves matrix products
  */
  }
}

What is the best practice for controlling the balance between Eigen's own parallelization and my own OpenMP parallelization?

Update:

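I wrote a simple example and tested ggael's suggestion. In short, I'm skeptical that it solves the problem I posed (or I'm doing something else wrong; if so, my apologies). Note that explicitly parallelizing the for loop makes no difference to the run time (it is not even slower).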

#ifdef _OPENMP
  #include <omp.h>
#endif 
#include <RcppEigen.h>

using namespace Rcpp;
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::plugins(openmp)]]

// [[Rcpp::export]]
Eigen::MatrixXd testing(Eigen::MatrixXd A, Eigen::MatrixXd B, int n_threads=1){
  Eigen::setNbThreads(n_threads);
  Eigen::MatrixXd C = A*B;
  Eigen::setNbThreads(1);
  for (int i=0; i < A.cols(); i++){
    A.col(i).array() = A.col(i).array()*B.col(i).array(); 
  }
  return A;
}

// [[Rcpp::export]]
Eigen::MatrixXd testing_omp(Eigen::MatrixXd A, Eigen::MatrixXd B, int n_threads=1){
  Eigen::setNbThreads(n_threads);
  Eigen::MatrixXd C = A*B;
  Eigen::setNbThreads(1);
  #pragma omp parallel for num_threads(n_threads)
  for (int i=0; i < A.cols(); i++){
    A.col(i).array() = A.col(i).array()*B.col(i).array(); 
  }
  return A;
}


/*** R
A <- matrix(rnorm(1000*1000), 1000, 1000)
B <- matrix(rnorm(1000*1000), 1000, 1000)
microbenchmark::microbenchmark(testing(A,B, n_threads=1),
                               testing_omp(A,B, n_threads=1),
                               testing(A,B, n_threads=8), 
                               testing_omp(A,B, n_threads=8), 
                               times=10)
*/

Unit: milliseconds
                             expr       min        lq      mean    median        uq       max neval cld
     testing(A, B, n_threads = 1) 169.74272 183.94500 212.83868 218.15756 236.97049 264.52183    10   b
 testing_omp(A, B, n_threads = 1) 166.53132 178.48162 210.54195 227.65258 234.16727 238.03961    10   b
     testing(A, B, n_threads = 8)  56.03258  61.16001  65.15763  62.67563  67.37089  83.43565    10  a 
 testing_omp(A, B, n_threads = 8)  54.18672  57.78558  73.70466  65.36586  67.24229 167.90310    10  a

- jds

This really isn't an Rcpp question. That's between you and Eigen. Maybe remove the rcpp tag? - Dirk Eddelbuettel


Strange, you should be able to edit your post, including the tags. Anyway, I've removed the tag for now... - Ralf Stubner

I can't seem to do that on mobile. I think it's a permissions issue. - jds

Your results look perfectly normal to me: the cost is completely dominated by the matrix product, which is O(n^3), compared with O(n^2) for the loop. - ggael
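For the 1000 x 1000 matrices used in the benchmark (n = 1000), a rough operation count makes this concrete. It counts arithmetic only and ignores memory traffic and the R-to-Eigen copies, so treat it as an order-of-magnitude sketch rather than a timing prediction:

```latex
\begin{aligned}
\text{product } C = AB:&\quad n^2 \text{ entries} \times n \text{ multiply-adds each} = n^3 = 10^{9},\\
\text{column loop}:&\quad n^2 \text{ multiplications} = 10^{6},\\
\text{ratio}:&\quad n^3 / n^2 = n = 1000.
\end{aligned}
```

Under this count, even a perfectly parallelized loop touches only a small share of the total work, which fits the observation that parallelizing it does not change the timings.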


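Moreover, your loop does not perform any matrix products, only coefficient-wise products, and Eigen does not parallelize those anyway. So disabling Eigen's multithreading around this loop or not makes no difference. By the way, your whole for loop can simply be written as A.array() *= B.array();. - ggael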


1 Answer


- ggael · Accepted Answer

The simplest approach is probably to disable and re-enable Eigen's multithreading at runtime:

Eigen::setNbThreads(1); // single thread mode
#pragma omp parallel for
for (int i=0; i < iter; i++){ 
  // Code I would like to parallelize "myself"
  // even though it involves matrix products
}
Eigen::setNbThreads(0); // restore default