Random sample of each column in a data frame

Question

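I want to draw a random sample from each row of a data.frame independently. Here is an example. This code selects the same columns for every row, but I need the column selection to be independent for each row.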

library(plyr)
set.seed(12345)
df1 <- mdply(data.frame(mean=c(10, 15)), rnorm, n = 5, sd = 1)
df1
  mean       V1       V2        V3        V4       V5
1   10 10.58553 10.70947  9.890697  9.546503 10.60589
2   15 13.18204 15.63010 14.723816 14.715840 14.08068
> df1[ , -1]
        V1       V2        V3        V4       V5
1 10.58553 10.70947  9.890697  9.546503 10.60589
2 13.18204 15.63010 14.723816 14.715840 14.08068
> sample(df1[, -1], replace = TRUE)
         V3       V2       V5        V4      V4.1
1  9.890697 10.70947 10.60589  9.546503  9.546503
2 14.723816 15.63010 14.08068 14.715840 14.715840
> t(apply(df1[, -1], 1, sample))
         [,1]      [,2]     [,3]     [,4]      [,5]
[1,] 10.70947  9.890697 10.60589 10.58553  9.546503
[2,] 14.71584 13.182044 14.08068 15.63010 14.723816

Edited.

df1[ , -1]
            V1       V2        V3        V4       V5
    1 10.58553 10.70947  9.890697  9.546503 10.60589
    2 13.18204 15.63010 14.723816 14.715840 14.08068

sample(df1[, -1], replace = TRUE)
             V3       V2       V5        V4      V4.1
    1  9.890697 10.70947 10.60589  9.546503  9.546503
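    2 14.723816 15.63010 14.08068 14.715840 14.715840

"sample(df1[, -1], replace = TRUE)" selects columns V3, V2, V5, V4, and V4 for both rows. What I need is for it to pick these columns for the first row and, independently, any other combination for the second row.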

- MYaseen208
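
For context, here is a minimal sketch of why `sample()` applied to a data frame resamples whole columns. A data frame is a list of columns, so `sample()` draws list elements (columns), and every row ends up with the same column choice. The small data frame `d` and the seed below are made up for illustration and are not the asker's `df1`.

```r
# Hypothetical 2 x 3 data frame, used only for illustration;
# row 2 is exactly 10 times row 1
d <- data.frame(V1 = c(1, 10), V2 = c(2, 20), V3 = c(3, 30))

# A data frame is a list of columns, so its length is the column count
length(d)

set.seed(1)  # arbitrary seed, only for reproducibility
s <- sample(d, replace = TRUE)

# Whole columns were drawn, so row 2 of the result is still
# 10 times row 1: both rows share the same column selection
all(unlist(s[2, ]) == 10 * unlist(s[1, ]))
```

The last check holds for any seed, which is exactly the behavior the question wants to avoid.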

Do you need df1[sample(seq_along(1:ncol(df1))[-1])]? - akrun

@akrun - df1[sample(names(df1[-1]))] - thelatemail

@akrun: Please see my edit. Thanks. - MYaseen208

Do you mean that the values for each row can be sampled independently? - akrun

@akrun: It works. Would you mind turning your comment into an answer? - MYaseen208

2 Answers

You can sample the column indices all at once and then use matrix subsetting to avoid apply:

## Determine how many indices are required (nrow x (ncol - 1))
nsamp <- prod(dim(df1[, -1]))

## Sample from the number of desired columns, here 5 = ncol(df1) - 1
mySamp <- sample.int(ncol(df1) - 1, nsamp, replace = TRUE)

## Create a matrix of row and column indices
## Have to add 1 to mySamp to ignore first column of df1
myIdx <- cbind(rep(seq_len(nrow(df1)), ncol(df1) - 1), mySamp + 1)

## Return the corresponding values
matrix(df1[myIdx], nrow = nrow(df1))

#           [,1]     [,2]      [,3]      [,4]     [,5]
# [1,]  9.890697 10.60589  9.546503  9.546503 10.70947
# [2,] 15.630099 14.71584 15.630099 14.723816 14.72382

- BenBarnes
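
The matrix-indexing idea above can be sketched as a small reusable function. The name `sample_rows`, its parameter `k`, the toy matrix `x`, and the seed are all hypothetical and introduced here only for illustration. The function assumes a numeric matrix or data frame in which every column is a candidate for sampling, so the `+ 1` offset used above to skip the `mean` column is not needed.

```r
# Hypothetical helper: for each row of a numeric matrix or data frame,
# draw k values from that row with replacement, independently per row
sample_rows <- function(x, k = ncol(x)) {
  m <- as.matrix(x)
  n <- nrow(m)
  # One column index per output cell, drawn in a single call
  cols <- sample.int(ncol(m), n * k, replace = TRUE)
  # Row indices cycle through 1..n so that they line up with the
  # column-major order in which matrix() fills the result
  idx <- cbind(rep(seq_len(n), times = k), cols)
  matrix(m[idx], nrow = n)
}

set.seed(42)  # arbitrary seed, only for reproducibility
x <- rbind(c(1, 2, 3), c(10, 20, 30))
res <- sample_rows(x)
res
```

With the asker's data, a call such as `sample_rows(df1[, -1])` should play the same role as the code above, though the drawn values depend on the seed.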

Content provided by Stack Overflow.

- akrun · Accepted Answer

You can use apply to run sample on each row, passing replace=TRUE through to sample.

 t(apply(df1[,-1], 1, sample, replace=TRUE))
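
A self-contained sketch of the same call on a made-up matrix (standing in for `df1[, -1]`, with an arbitrary seed) shows why the `t()` is there and checks that each row is sampled only from its own values:

```r
# Made-up matrix standing in for df1[, -1]
x <- rbind(c(1, 2, 3), c(10, 20, 30))

set.seed(7)  # arbitrary seed, only for reproducibility
# apply() over rows returns each row's result as a column, hence the t()
res <- t(apply(x, 1, sample, replace = TRUE))

# Each output row contains only values from the matching input row
all(res[1, ] %in% x[1, ]) && all(res[2, ] %in% x[2, ])
```

One caveat: if only a single data column remained, each row passed to `sample` would be a length-one numeric vector, and `sample` would then draw from `1:x` rather than return the value itself, so this form is best suited to two or more columns.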