在R中，如果选择矩阵的第一个元素，则是按行选择，如果选择向量的第一个元素，则直接选择第一个元素。

Question

在R中，如果选择矩阵的第一个元素，则是按行选择，如果选择向量的第一个元素，则直接选择第一个元素。

4

在 R 中，有没有一种优雅的语法可以根据对象类型选择每行中前 n 个元素或向量中前 n 个元素？

我当然可以使用条件语句来实现，但我想知道是否有一种简单的解决方案。我还想避免由于效率问题而对整个矩阵调用 t()。

M = matrix(1:12,3,4)
x = 1:12

slct = function(obj,n){
  if(is.matrix(obj)) res = c(t(obj))[1:n]
  if(is.vector(obj)) res = obj[1:n]
  res
}
slct(M,5); slct(x,5)

- user36302

关于条件语句，可以简单地使用 as.vector(object) 或 c(object)。然而，不幸的是我认为你无法避免转置操作。因此，我认为最好的答案是类似这样的：c(t(object))[1:n]。 - byouness

5个回答

1

这是什么意思？

slct = function(obj,n){
  if(is.matrix(obj)) res = as.vector(matrix(M, dim(M),
                                            byrow = TRUE))[1:n]
  if(is.vector(obj)) res = obj[1:n]
  res
}
> slct(M,5); slct(x,5)
[1] 1 5 9 2 6
[1] 1 2 3 4 5

根据基准测试，似乎速度快了一倍：

Unit: microseconds
   expr   min    lq     mean median    uq       max neval cld
    t() 7.654 8.420 9.077494  8.675 8.675 10440.259 1e+05   b
 matrix 3.316 3.827 4.411272  4.082 4.083  9502.881 1e+05  a

注意: 在第二行中应该使用is.vector而不是is.numeric，因为is.numeric(M)会返回TRUE。

- jay.sf

1

您可以利用数组索引中的[。

# new function
slct2 <- function(x, n){
  if(is.vector(x)) i <- 1:n
  if(is.matrix(x))
    i <- cbind(ceiling(1:n/ncol(mat)), rep_len(seq(ncol(mat)), n))
  x[i]
}
# old function
slct = function(obj,n){
  if(is.matrix(obj)) res = c(t(obj))[1:n]
  if(is.vector(obj)) res = obj[1:n]
  res
}

基准测试

m <- 1e4
mat <- matrix(runif(m^2), m)
n <- floor(m*2.3)
all.equal(slct(mat, n), slct2(mat, n))
# [1] TRUE
microbenchmark(slct(mat, n), slct2(mat, n), times = 10)
# Unit: milliseconds
#           expr         min          lq        mean      median         uq        max neval
#   slct(mat, n) 2471.438599 2606.071460 3466.046729 3137.255011 4420.69364 4985.20781    10
#  slct2(mat, n)    2.358151    4.748712    6.627644    4.973533   11.05927   13.73906    10

- IceCreamToucan

0

你不能只使用 head 吗？...

head(c(t(M)),5)
[1]  1  4  7 10  2

head(c(t(x)),5)
[1] 1 2 3 4 5

- Andrew Gustar

2

我正在寻找一个解决方案，至少在整个矩阵中不调用t()函数... - user36302

0

这里是基于R的解决方案：

funEmil <- function(obj, n) {
    myDim <- dim(obj)
    vec <- 1:n
    if (is.null(myDim))
        return(obj[vec])

    nr <- myDim[1]
    nc <- myDim[2]
    vec1 <- vec - 1L
    rem <- vec1 %% nc
    quot <- vec1 %/% nc
    obj[quot + (rem * nr + 1L)]
}

它依赖于基本的向量化模数算术运算%%和整数除法%/%。它也非常快速：

set.seed(42)
MBig <- matrix(sample(10^7, 10^6, replace = TRUE), nrow = 10^4)

funOP = function(obj,n){
    if(is.matrix(obj)) res = c(t(obj))[1:n]
    if(is.vector(obj)) res = obj[1:n]
    res
}

funRyan <- function(x, n){
    if(is.vector(x)) i <- 1:n
    if(is.matrix(x))
        i <- cbind(ceiling(1:n/ncol(x)), rep_len(seq(ncol(x)), n))
    x[i]
}


n <- 25000

## Returns same results
all.equal(funRyan(MBig, n), funEmil(MBig, n))
[1] TRUE

all.equal(funOP(MBig, n), funEmil(MBig, n))
[1] TRUE

library(microbenchmark)
microbenchmark(funOP(MBig, n), funRyan(MBig, n), funWoody(MBig, n), unit = "relative")
Unit: relative
             expr      min       lq     mean   median       uq       max neval
   funOP(MBig, n) 6.154284 5.915182 5.659250 5.880826 9.140565 1.0344393   100
 funRyan(MBig, n) 1.015332 1.030278 1.028644 1.018446 1.032610 0.8330967   100
 funEmil(MBig, n) 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100

这里是使用 @Ryan 的示例和 OP 修改后的解决方案进行基准测试的结果：

n <- 1e4
mat <- matrix(runif(n^2), n)
s <- floor(n*2.3)

microbenchmark(funOP(mat, s), funRyan(mat, s), 
               funWoody(mat, s), funOPmod(mat, s), unit = "relative", times = 10)
Unit: relative
            expr         min          lq        mean      median          uq         max neval
   funOP(mat, s) 6189.449838 5558.293891 3871.425974 5139.192594 2443.203331 2222.778805    10
 funRyan(mat, s)    2.633685    3.032467    2.155205    2.863710    1.445421    1.537473    10
 funEmil(mat, s)    2.654739    2.714287    1.969482    2.642673    1.277088    1.326510    10
funOPmod(mat, s)    1.000000    1.000000    1.000000    1.000000    1.000000    1.000000    10

新修改后的速度更快，仍然能够给出正确的结果...非常令人印象深刻！！

identical(funOPmod(mat, s), funRyan(mat, s))
[1] TRUE

- Emil

有任何原因给我点踩吗？能否解释一下，这样我就可以解决我的回答中的任何不足之处了。 - Emil

@user36302，非常感谢...我认为您修改后的解决方案是迄今为止最佳的答案，因为它非常简洁和快速。 - Emil

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- user36302 · Accepted Answer

所以避免在整个矩阵上调用t（）是关键。我认为其他解决方案更有趣和教育意义，但我看到的最快的解决方案是以下内容。

效率可能仅仅是因为它们依赖于C子例程来执行与其他建议相同的向量化。可能，如果您只需要元素1：n的特定子集，则修改其他方法可能会更快。

我仍然想知道是否有一些内置函数可以做到这一点？

以下是我的两个解决方案（感谢其他帖子的一些想法）：

funOPmod2 = function(obj,n){
  if(is.matrix(obj)){ 
    nc = ncol(obj)
    nr = (n %/% nc) + 1
    subM = obj[1:nr,]
    res = matrix(subM, ncol = nr,
                 byrow = TRUE)[1:n] }
  if(is.vector(obj)) res = obj[1:n]
  res
}

funOPmod = function(obj,n){
  if(is.matrix(obj)){ 
    nc = ncol(obj)
    nr = (n %/% nc) + 1
    res = t(obj[1:nr,])[1:n] }
  if(is.vector(obj)) res = obj[1:n]
  res
}

funOP = function(obj,n){
  if(is.matrix(obj)) res = c(t(obj))[1:n]
  if(is.vector(obj)) res = obj[1:n]
  res
}


funRyan <- function(x, n){
  if(is.vector(x)) i <- 1:n
  if(is.matrix(x))
    i <- cbind(ceiling(1:n/ncol(x)), rep_len(seq(ncol(x)), n))
  x[i]
}

funEmil <- function(obj, n) {
  myDim <- dim(obj)
  vec <- 1:n
  if (is.null(myDim))
    return(obj[vec])

  nr <- myDim[1]
  nc <- myDim[2]
  vec1 <- vec - 1L
  rem <- vec1 %% nc
  quot <- vec1 %/% nc
  obj[quot + (rem * nr + 1L)]
}

n <- 25000

set.seed(42)
MBig <- matrix(sample(10^7, 10^6, replace = TRUE), nrow = 10^4)

## Returns same results
all.equal(funOPmod2(MBig, n), funOP(MBig, n))
all.equal(funOPmod(MBig, n), funOP(MBig, n))
all.equal(funOP(MBig, n), funEmil(MBig, n))
all.equal(funRyan(MBig, n), funEmil(MBig, n))



library(microbenchmark)
microbenchmark(funOP(MBig, n), funOPmod(MBig, n), funOPmod2(MBig, n), funRyan(MBig, n), funEmil(MBig, n), unit = "relative")

Unit: relative
               expr       min        lq      mean    median        uq        max neval
     funOP(MBig, n) 13.788456 13.343185 15.776079 13.104634 15.064036 13.1959488   100
  funOPmod(MBig, n)  1.052210  1.089507  1.071219  1.118461  1.025714  0.4533697   100
 funOPmod2(MBig, n)  1.000000  1.000000  1.000000  1.000000  1.000000  1.0000000   100
   funRyan(MBig, n)  2.689417  2.694442  2.464471  2.637720  2.351565  0.9274931   100
   funEmil(MBig, n)  2.760368  2.681478  2.434167  2.591716  2.308087  0.8921837   100