在R中使用C和并行化实现快速相关性

Question

在R中使用C和并行化实现快速相关性

crcorrelationsnowfall

6

今天我的项目是使用我所掌握的基本技能，用R语言编写一个快速相关性计算程序。我需要找出将近400个变量之间的相关性，每个变量都有将近一百万个观测值(即大小为p=1MM行和n=400列的矩阵)。

R语言的原生相关性函数对于每个变量的200个观测值和1MM行数据大约需要2分钟的时间。虽然我没有试图计算400个变量每列的观测值数，但我的猜测是这将需要近8分钟时间。而我只有不到30秒的时间来完成它。

因此，我想要做以下几件事情：

1- 用C语言编写一个简单的相关性函数，并在块中并行应用它(如下所示)。

2- 块 - 将相关矩阵分为三块(大小为K*K的左上方正方形区域，大小为(p-K)*(p-K)的右下方正方形区域，以及大小为K*(p-K)的右上方矩形区域)。这样就可以覆盖相关矩阵corr的所有单元格，因为我只需要上三角部分。

3- 通过snowfall并行运行C语言函数的.C调用。

n = 100
p = 10
X = matrix(rnorm(n*p), nrow=n, ncol=p)
corr = matrix(0, nrow=p, ncol=p)

# calculation of column-wise mean and sd to pass to corr function
mu = colMeans(X)
sd = sapply(1:dim(X)[2], function(x) sd(X[,x]))

# setting up submatrix row and column ranges
K = as.integer(p/2)

RowRange = list()
ColRange = list()
RowRange[[1]] = c(0, K)
ColRange[[1]] = c(0, K)

RowRange[[2]] = c(0, K)
ColRange[[2]] = c(K, p+1)

RowRange[[3]] = c(K, p+1)
ColRange[[3]] = c(K, p+1)

# METHOD 1. NOT PARALLEL
########################
# function to calculate correlation on submatrices
BigCorr <- function(x){
  Rows = RowRange[[x]]
  Cols = ColRange[[x]]    
  return(.C("rCorrelationWrapper2", as.matrix(X), as.integer(dim(X)), 
            as.double(mu), as.double(sd), 
            as.integer(Rows), as.integer(Cols), 
            as.matrix(corr)))
}

res = list()
for(i in 1:3){
  res[[i]] = BigCorr(i)
}

# METHOD 2
########################
BigCorr <- function(x){
    Rows = RowRange[[x]]
    Cols = ColRange[[x]]    
    dyn.load("./rCorrelation.so")
    return(.C("rCorrelationWrapper2", as.matrix(X), as.integer(dim(X)), 
          as.double(mu), as.double(sd), 
          as.integer(Rows), as.integer(Cols), 
          as.matrix(corr)))
}

# parallelization setup
NUM_CPU = 4
library('snowfall')
sfSetMaxCPUs() # maximum cpu processing
sfInit(parallel=TRUE,cpus=NUM_CPU) # init parallel procs
sfExport("X", "RowRange", "ColRange", "sd", "mu", "corr")  
res = sfLapply(1:3, BigCorr)
sfStop()

这里是我的问题：

对于方法1，它可以工作，但不是我想要的方式。我相信当我传递相关矩阵时，我正在传递一个地址，C将在源代码中进行更改。

# Output of METHOD 1
> res[[1]][[7]]
      [,1]      [,2]        [,3]       [,4]         [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1 0.1040506 -0.01003125 0.23716384 -0.088246793    0    0    0    0     0
 [2,]    0 1.0000000 -0.09795989 0.11274508  0.025754150    0    0    0    0     0
 [3,]    0 0.0000000  1.00000000 0.09221441  0.052923520    0    0    0    0     0
 [4,]    0 0.0000000  0.00000000 1.00000000 -0.000449975    0    0    0    0     0
 [5,]    0 0.0000000  0.00000000 0.00000000  1.000000000    0    0    0    0     0
 [6,]    0 0.0000000  0.00000000 0.00000000  0.000000000    0    0    0    0     0
 [7,]    0 0.0000000  0.00000000 0.00000000  0.000000000    0    0    0    0     0
 [8,]    0 0.0000000  0.00000000 0.00000000  0.000000000    0    0    0    0     0
 [9,]    0 0.0000000  0.00000000 0.00000000  0.000000000    0    0    0    0     0
[10,]    0 0.0000000  0.00000000 0.00000000  0.000000000    0    0    0    0     0
> res[[2]][[7]]
      [,1] [,2] [,3] [,4] [,5]        [,6]        [,7]        [,8]       [,9]       [,10]
 [1,]    0    0    0    0    0 -0.02261175 -0.23398448 -0.02382690 -0.1447913 -0.09668318
 [2,]    0    0    0    0    0 -0.03439707  0.04580888  0.13229376  0.1354754 -0.03376527
 [3,]    0    0    0    0    0  0.10360907 -0.05490361 -0.01237932 -0.1657041  0.08123683
 [4,]    0    0    0    0    0  0.18259522 -0.23849323 -0.15928474  0.1648969 -0.05005328
 [5,]    0    0    0    0    0 -0.01012952 -0.03482429  0.14680301 -0.1112500  0.02801333
 [6,]    0    0    0    0    0  0.00000000  0.00000000  0.00000000  0.0000000  0.00000000
 [7,]    0    0    0    0    0  0.00000000  0.00000000  0.00000000  0.0000000  0.00000000
 [8,]    0    0    0    0    0  0.00000000  0.00000000  0.00000000  0.0000000  0.00000000
 [9,]    0    0    0    0    0  0.00000000  0.00000000  0.00000000  0.0000000  0.00000000
[10,]    0    0    0    0    0  0.00000000  0.00000000  0.00000000  0.0000000  0.00000000
> res[[3]][[7]]
      [,1] [,2] [,3] [,4] [,5] [,6]       [,7]        [,8]        [,9]       [,10]
 [1,]    0    0    0    0    0    0 0.00000000  0.00000000  0.00000000  0.00000000
 [2,]    0    0    0    0    0    0 0.00000000  0.00000000  0.00000000  0.00000000
 [3,]    0    0    0    0    0    0 0.00000000  0.00000000  0.00000000  0.00000000
 [4,]    0    0    0    0    0    0 0.00000000  0.00000000  0.00000000  0.00000000
 [5,]    0    0    0    0    0    0 0.00000000  0.00000000  0.00000000  0.00000000
 [6,]    0    0    0    0    0    1 0.03234195 -0.03488812 -0.18570151  0.14064640
 [7,]    0    0    0    0    0    0 1.00000000  0.03449697 -0.06765511 -0.15057244
 [8,]    0    0    0    0    0    0 0.00000000  1.00000000 -0.03426464  0.10030619
 [9,]    0    0    0    0    0    0 0.00000000  0.00000000  1.00000000 -0.08720512
[10,]    0    0    0    0    0    0 0.00000000  0.00000000  0.00000000  1.00000000

但是原始的corr矩阵仍然保持不变：

> corr
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

问题1：有没有办法确保C函数在源代码中更改corr的值？我仍然可以将它们合并以创建一个上三角形相关矩阵，但我想知道是否可能在源代码中进行更改。注意：这不能帮助我实现快速相关性，因为我只是在运行循环。

问题2：对于METHOD 2，如何在init步骤中将共享对象加载到每个核心以进行并行作业（而不是我所做的方式）？

问题3：这个错误是什么意思？我需要一些指针，并且我很乐意自己调试。

问题4：是否有一种快速的方法来计算1MM x 400矩阵的相关性，少于30秒？

当我运行METHOD 2时，我会得到以下错误：

R(6107) malloc: *** error for object 0x100664df8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Error in unserialize(node$con) : error reading from connection

以下是我用纯C编写的相关代码：

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <stddef.h>
#include <R.h> // to show errors in R


double calcMean (double *x, int n);
double calcStdev (double *x, double mu, int n);
double calcCov(double *x, double *y, int n, double xmu, double ymu);        

void rCorrelationWrapper2 ( double *X, int *dim, double *mu, double *sd, int *RowRange, int *ColRange, double *corr) {

    int i, j, n = dim[0], p = dim[1];
    int RowStart = RowRange[0], RowEnd = RowRange[1], ColStart = ColRange[0], ColEnd = ColRange[1];
    double xyCov;

    Rprintf("\n p: %d, %d <= row < %d, %d <= col < %d", p, RowStart, RowEnd, ColStart, ColEnd);

    if(RowStart==ColStart && RowEnd==ColEnd){
        for(i=RowStart; i<RowEnd; i++){
            for(j=i; j<ColEnd; j++){
                Rprintf("\n i: %d, j: %d, p: %d", i, j, p);
                xyCov = calcCov(X + i*n, X + j*n, n, mu[i], mu[j]);
                *(corr + j*p + i) = xyCov/(sd[i]*sd[j]);
            }
        }
    } else {
        for(i=RowStart; i<RowEnd; i++){
            for (j=ColStart; j<ColEnd; j++){
                xyCov = calcCov(X + i*n, X + j*n, n, mu[i], mu[j]);
                *(corr + j*p + i) = xyCov/(sd[i]*sd[j]);
            }
        }
    }
}


// function to calculate mean

double calcMean (double *x, int n){
    double s = 0;
    int i;
    for(i=0; i<n; i++){     
        s = s + *(x+i);
    }
    return(s/n);
}

// function to calculate standard devation

double calcStdev (double *x, double mu, int n){
    double t, sd = 0;
    int i;

    for (i=0; i<n; i++){
        t = *(x + i) - mu;
        sd = sd + t*t;
    }    
    return(sqrt(sd/(n-1)));
}


// function to calculate covariance

double calcCov(double *x, double *y, int n, double xmu, double ymu){
    double s = 0;
    int i;

    for(i=0; i<n; i++){
        s = s + (*(x+i)-xmu)*(*(y+i)-ymu);
    }
    return(s/(n-1));
}

- user1971988

@MartinMorgan - 根据我所使用的版本，R 的原生 cor 函数需要更多时间，就像我之前提到的那样。我正在使用 Andrey 下面的建议，对于 1MM x 400 变量，大约需要 2 分钟。会更新。 - user1971988

2个回答

4

有几件事情需要注意。

首先，如果您正在使用外部调用的 .C 接口，则默认情况下它会复制所有参数。这就是为什么对象 corr 没有被修改的原因。如果您想避免这种情况，则必须在 .C 调用中设置 DUP=false。然而，一般来说，使用 .C 来修改现有的 R 对象并不是首选的方法。相反，您可能希望创建一个新数组，并允许外部调用填充它，就像这样。

corr<-.C("rCorrelationWrapper2", as.double(X), as.integer(dim(X)), 
        as.double(mu), as.double(sd), 
        as.integer(Rows), as.integer(Cols), 
        result=double(p*q))$result
corr<-array(corr,c(p,q))

其次，如果想要编写快速相关函数，首先要尝试使用有效的BLAS实现编译R。这不仅会使你的相关函数更快，还会使所有的线性代数计算更快。好的免费选择是来自AMD的ACML或ATLAS。其中任一选择都能够非常快速地计算相关矩阵。其加速不仅仅是并行化--这些库还能够聪明地利用缓存以及在汇编级别进行优化，因此即使只有一个内核也会看到很大的改进。 http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/ http://math-atlas.sourceforge.net/ 最后，如果真的想要编写自己的C代码，建议使用openMP来自动将计算分配到不同的线程中，而不是手动完成。但对于像矩阵乘法这样基本的任务，使用已有的优化库可能更好。

- mrip

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Andrey Shabalin · Accepted Answer

使用快速的BLAS（通过Revolution R或Goto BLAS），您可以在R中快速计算所有这些相关性，而无需编写任何C代码。在我的第一代Intel i7个人电脑上，这需要16秒：

n = 400;
m = 1e6;

# Generate data
mat = matrix(runif(m*n),n,m);
# Start timer
tic = proc.time();
# Center each variable
mat = mat - rowMeans(mat);
# Standardize each variable
mat = mat / sqrt(rowSums(mat^2));   
# Calculate correlations
cr = tcrossprod(mat);
# Stop timer
toc = proc.time();

# Show the results and the time
show(cr[1:4,1:4]);
show(toc-tic)

上面的R代码报告了以下时间：

 user  system elapsed 
31.82    1.98   15.74

我在我的MatrixEQTL软件包中使用这种方法。
http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/

关于R的各种BLAS选项的更多信息，请参见：
http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/runit.html#large