大稀疏矩阵的Moore-Penrose广义逆

Question

大稀疏矩阵的Moore-Penrose广义逆

rsparse-matrixlinear-algebramatrix-inverselarge-data

16

我有一个正方形矩阵，有几万行和列，只有几个1和许多0，因此我使用Matrix包以高效的方式在R中存储。由于我用完了内存，所以base::matrix对象无法处理这么多的单元格。

我的问题是我需要这些矩阵的逆和Moore-Penrose广义逆，但我目前无法计算。

我尝试过以下方法:

solve yields an Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory) error
MASS::ginv is not compatible with the Matrix class
there is no direct way to convert a sparse Matrix to e.g. bigmemory::big.matrix, and that latter neither does work with MASS::ginv anyway

if I try to compute the Choleski factorization of the matrix to later call Matrix::chol2inv on that, I get the following error message:

Error in .local(x, ...) : 
  internal_chm_factor: Cholesky factorization failed
In addition: Warning message:
In .local(x, ...) :
  Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 431

based on a related question, I also gave a try to the pbdDMAT package on a single node, but pbdDMAT::chol yielded an Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147 error message

问题简述：除了使用内存充足的计算机上的matrix类，还有没有其他方法计算这种稀疏矩阵的逆和Moore-Penrose广义逆？

快速可重现示例：

library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
    set.seed(i)
    m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m

以下是一些描述性词语（感谢Gabor Grothendieck）：

> dim(m)
[1] 100000 100000
> sum(m) / prod(dim(m))
[1] 6.6667e-05
> table(rowSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 
> table(colSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701

还有一些错误信息:

> Matrix::solve(m)
Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)
> base::solve(m)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
> MASS::ginv(m)
Error in MASS::ginv(m) : 'X' must be a numeric or complex matrix

的目标是找到一种在不到8Gb的RAM情况下计算 m 的Moore-Penrose广义逆的方法（速度和性能并不重要）。

- daroczig

1

计算伪逆后，您需要对其进行什么操作？ - Ben Bolker

谢谢@BenBolker，我尝试在http://www.scribd.com/doc/226039409#page=14的第14页上使用大矩阵实现公式（22）。我也希望能得到任何有关如何保存该计算的帮助，因为使用`Matrix::sparseMatrix`计算`D_1`和`D^a_2`非常快，但我却卡在了（伪）逆上。 - daroczig

@BenBolker，非常抱歉这个问题很烦人，请再试一次，我刚刚更新了隐私设置。 - daroczig

由于某些原因，您的示例 m 没有创建正方形矩阵。 - James

谢谢@James，我刚刚修复了这个例子。问题在于行/列的数量默认为生成样本的“max”。 - daroczig

显示剩余2条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- G. Grothendieck · Accepted Answer

如果1的数量很少，那么您的矩阵可能在任何一列和任何一行中都没有超过一个1，这种情况下广义逆矩阵等于转置：

library(Matrix)
set.seed(123)
n <- 1e5
m <- sparseMatrix(sample(1:n, n/10), sample(1:n, n/10), x = 1, dims = c(n, n))

##############################################################################
# confirm that m has no more than one 1 in each row and column
##############################################################################
table(rowSums(m))  # here 90,000 rows are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

table(colSums(m))  # here 90,000 cols are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

##############################################################################
# calculate generalized inverse
##############################################################################
minv <- t(m)

##############################################################################
# check that when multiplied by minv it gives a diagonal matrix of 0's and 1's
##############################################################################
mm <- m %*% minv

table(diag(mm)) # diagonal has 90,000 zeros and 10,000 ones
##     0     1 
## 90000 10000 

diag(mm) <- 0
range(mm) # off diagonals are all zero
## [1] 0 0

修订版：展示更加出色。