Sum/merge the values at two positions of a vector in R

Question

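I have an integer vector in R. I would like to randomly select n positions from this vector and "merge" them (i.e. sum them) within the vector. This can happen several times: in a vector of length 100, for example, there might be 5 merge/sum events that combine 2, 3, 2, 4 and 2 vector positions respectively. For example: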

#An example original vector of length 10:
ex.have<-c(1,1,30,16,2,2,2,1,1,9)

#For simplicity assume some process randomly combines the 
#first two [1,1] and last three [1,1,9] positions in the vector. 

ex.want<-c(2,30,16,2,2,2,11)

#Here, there were two merging events of 2 and 3 vector positions, respectively

#EDIT: the merged positions do not need to be consecutive. 
#They could be randomly selected from any position.

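Beyond that, I also need to record how many vector positions were "merged" (with a value of 1 for any position that was not merged); call these the indices. Because the first and second positions were merged in the example above, as were the last three, the index data would look like: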

ex.indices<-c(2,1,1,1,1,1,3)

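Finally, I need to put everything into a matrix, so the final data for the example above would be a 2-column matrix with the integers in one column and the indices in the other: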

ex.final<-matrix(c(2,30,16,2,2,2,11,2,1,1,1,1,1,3),ncol=2,nrow=7)

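Right now I'm looking for help with the simplest step: merging positions in the vector. I have tried many variations on the sample and split functions but keep hitting a dead end. For example, sum(sample(ex.have, 2)) will sum two randomly selected positions (and sum(sample(ex.have, rpois(1, 2))) will add some randomness to the value of n), but I'm not sure how to use this to get the dataset I need. An exhaustive search turned up many posts about merging vectors but none about merging positions within a vector, so I apologize if this is a duplicate. Any advice on how to approach this would be greatly appreciated.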

- jpsmith

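This could be interesting. A few questions. (1) How do you determine the number of elements to sum? Is that a random number? In other words: what is the rule that merged the first 2 and the last 3 elements? (2) What is the rule for choosing the indices of the elements to be merged? Are they chosen (uniformly) at random? I can think of some edge cases that may or may not arise, for example what should happen if the starting position is the last element and you want to sum the next 4 elements (which don't exist). Some more detail from you would help clarify how to handle these cases. - Maurits Evers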

One more question: what determines the number of merges per vector? Is that also a (uniformly) random number? - Maurits Evers

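The summing and tracking seem straightforward: it's just a sum by group, and you can use your favorite method from the sum-by-group FAQ. As Maurits says, the interesting (and unclear) part is the random selection of indices. Need more info. - Gregor Thomas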

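Thanks! First, I realize that in my example the vector positions in both merges are consecutive. That doesn't have to be the case: for example, positions [1] and [4] could be merged in the first merge instead of positions [1] and [2]. For simplicity (and for the practical application), the number of elements to sum should simply be uniformly distributed between 2 and 4, so in each merge event any 2-4 elements of the vector can be randomly selected and merged. The number of merges per vector should be proportional, i.e. 20% of the positions in the vector will be merged. - jpsmith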

What would your output look like if positions 1 and 4 were merged? Specifically, what would ex.indices look like? - Cole

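Sorry for the delay (my toddler was sick!). The indices would reflect where the sum ends up in the new vector. For example, if positions 1 and 4 were merged at the original position 4 (along with the last 3), giving the result vector c(1,30,17,2,2,2,11), the indices would be c(1,1,2,1,1,1,3). But if the sum were placed at the original position 1, giving c(17,1,30,2,2,2,11), the indices would be c(2,1,1,1,1,1,3). It doesn't matter where the sum lands, as long as the indices map to the merged positions. - jpsmith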

2 Answers

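Here is a function I designed to perform the task you described.

The vec_merge function takes the following arguments:

x: an integer vector.

event_perc: the proportion of events. This is a number between 0 and 1 (although 1 is probably too large). The number of events is calculated as the length of x multiplied by event_perc, rounded to the nearest integer.

sample_n: the numbers of positions merged per event. This is an integer vector in which every number is at least 2.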

vec_merge <- function(x, event_perc = 0.2, sample_n = c(2, 3)){
  # Check if event_perc makes sense
  if (event_perc > 1 || event_perc <= 0){
    stop("event_perc should be greater than 0 and at most 1.")
  }
  # Check if sample_n makes sense
  if (any(sample_n < 2)){
    stop("sample_n should be at least 2")
  }
  # Determine the event numbers
  n <- round(length(x) * event_perc)
  # Determine the sample number of each event; indexing into sample_n avoids
  # sample() treating a single number k as 1:k
  sample_vec <- sample_n[sample.int(length(sample_n), size = n, replace = TRUE)]
  names(sample_vec) <- sprintf("S%d", seq_len(n))
  # Check if the sum of sample_vec is larger than the length of x
  # If yes, stop the function and print a message
  if (length(x) < sum(sample_vec)){
    stop("Too many samples. Decrease event_perc or sample_n")
  }
  # Determine the number that will not be merged
  n2 <- length(x) - sum(sample_vec)
  # Create a vector of n2 ones; seq_len() keeps this valid when n2 is 0
  non_merge_vec <- rep(1, n2)
  names(non_merge_vec) <- sprintf("N%d", seq_len(n2))
  # Combine sample_vec and non_merge_vec, then shuffle by index so that
  # the names are kept even when only one element remains
  combine_vec <- c(sample_vec, non_merge_vec)
  combine_vec2 <- combine_vec[sample.int(length(combine_vec))]
  # Expand the vector
  expand_list <- list(lengths = combine_vec2, values = names(combine_vec2))
  expand_vec <- inverse.rle(expand_list)
  # Create a data frame with x and expand_vec
  dat <- data.frame(number = x,
                    group = factor(expand_vec, levels = unique(expand_vec)))
  dat$index <- 1
  # Sum values and counts within each group; selecting the columns by name
  # keeps the "number" and "index" column names in the result
  dat2 <- aggregate(dat[c("number", "index")],
                    by = list(group = dat$group),
                    FUN = sum)
  # Drop the group column, then convert dat2 to a matrix
  dat2$group <- NULL
  mat <- as.matrix(dat2)
  return(mat)
}

Here is a test of the function. I applied it to the sequence from 1 to 10. As you can see, in this example 4 and 5 were merged, and so were 8 and 9.

set.seed(123)
vec_merge(1:10)
#      number index
# [1,]      1     1
# [2,]      2     1
# [3,]      3     1
# [4,]      9     2
# [5,]      6     1
# [6,]      7     1
# [7,]     17     2
# [8,]     10     1
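
A quick sanity check on the result above, offered as a sketch: merging only regroups values, so the number column should still add up to sum(1:10) and the index column should add up to the input length. The matrix below is typed in by hand from the printed output rather than recomputed, and the name res is only illustrative.

```r
# Result printed above, re-entered by hand (not recomputed)
res <- cbind(number = c(1, 2, 3, 9, 6, 7, 17, 10),
             index  = c(1, 1, 1, 2, 1, 1, 2, 1))

# Merging preserves the grand total of the values ...
sum(res[, "number"]) == sum(1:10)
# [1] TRUE

# ... and every original position is counted exactly once
sum(res[, "index"]) == length(1:10)
# [1] TRUE
```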

- www

- A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

I think you could write a function along the following lines:

fun <- function(vec = have, events = merge_events, include_orig = TRUE) {
  if (sum(events) > length(vec)) stop("Too many events to merge")

  # Create "groups" for the events
  merge_events_seq <- rep(seq_along(events), events) 

  # Create "groups" for the rest of the data
  remainder <- sequence((length(vec) - sum(events))) + length(events)

  # Combine both groups and shuffle them so that the 
  # positions being combined are not necessarily consecutive
  inds <- sample(c(merge_events_seq, remainder))

  # Aggregate using `data.table`
  temp <- data.table(values = vec, groups = inds)[
    , list(count = length(values), 
           total = sum(values),
           pos = toString(.I),
           original = toString(values)), groups][, groups := NULL]

  # Drop the other columns if required. Return the output.
  if (isTRUE(include_orig)) temp[] else temp[, c("original", "pos") := NULL][]
}

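The function returns four columns:

The count of values included in a given sum (i.e. your ex.indices).
The total after summing the relevant values (i.e. your ex.want).
The positions of the values in the original vector.
The original values themselves, so that you can verify the result later.

Setting include_orig = FALSE drops the last two columns from the result. The function also raises an error if the total number of elements you try to merge exceeds the length of the input vector (ex.have).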
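
The comments on the question ask for event sizes drawn uniformly from 2 to 4, with about 20% of the positions merged. One possible way to build an events vector that follows that rule and never triggers the error is sketched below. The helper name draw_events and its arguments are hypothetical, and reading "20%" as a cap on the total number of merged positions is an assumption.

```r
# Hypothetical helper: draw merge-event sizes uniformly from `sizes` and stop
# as soon as the next event would push the merged share above `prop`.
# Assumption: `prop` caps the total number of merged positions.
draw_events <- function(n, prop = 0.2, sizes = 2:4) {
  budget <- floor(n * prop)   # maximum number of positions allowed to merge
  events <- integer(0)
  repeat {
    # Index into `sizes` so a length-one `sizes` is not expanded by sample()
    size <- sizes[sample.int(length(sizes), 1)]
    if (sum(events) + size > budget) break
    events <- c(events, size)
  }
  events
}

set.seed(42)
events <- draw_events(100)
sum(events) <= 20   # TRUE: never more than 20% of 100 positions
```

Because the loop stops at the first size that does not fit, the merged share can end up slightly below prop; this keeps the sketch short, at the cost of a small bias in the last draw.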

Here is some sample data:

library(data.table)
set.seed(1) ## So you can recreate these examples with the same results
have <- sample(20, 10, TRUE)
have
##  [1]  4  7  1  2 11 14 18 19  1 10

merge_events <- c(2, 3)

fun(have, merge_events)
##    count total      pos   original
## 1:     1     4        1          4
## 2:     1     7        2          7
## 3:     2     2     3, 9       1, 1
## 4:     1     2        4          2
## 5:     3    40 5, 8, 10 11, 19, 10
## 6:     1    14        6         14
## 7:     1    18        7         18

fun(events = c(3, 4))
##    count total        pos     original
## 1:     4    39 1, 4, 6, 8 4, 2, 14, 19
## 2:     3    36    2, 5, 7    7, 11, 18
## 3:     1     1          3            1
## 4:     1     1          9            1
## 5:     1    10         10           10

fun(events = c(6, 4, 3))
## Error: Too many events to merge

input <- sample(30, 20, TRUE)
input
##  [1]  6 10 10  6 15 20 28 20 26 12 25 23  6 25  8 12 25 23 24  6

fun(input, events = c(4, 7, 2, 3))
##    count total                    pos                original
## 1:     7    92 1, 3, 4, 5, 11, 19, 20 6, 10, 6, 15, 25, 24, 6
## 2:     1    10                      2                      10
## 3:     3    71               6, 9, 14              20, 26, 25
## 4:     4    69          7, 12, 13, 16           28, 23, 6, 12
## 5:     2    45                  8, 17                  20, 25
## 6:     1    12                     10                      12
## 7:     1     8                     15                       8
## 8:     1    23                     18                      23

# Verification
input[c(1, 3, 4, 5, 11, 19, 20)]
## [1]  6 10  6 15 25 24  6

sum(.Last.value)
## [1] 92
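
If data.table is not available, the same group-label idea can probably be expressed in base R, returning the two-column matrix described in the question directly. This is only a sketch: merge_positions is a hypothetical name, and tapply and tabulate stand in for the data.table aggregation used above.

```r
# Hypothetical helper: base-R sketch of the group-label approach
merge_positions <- function(vec, events) {
  if (sum(events) > length(vec)) stop("Too many events to merge")
  # One label per merge event, repeated once per merged position
  merged <- rep(seq_along(events), events)
  # A distinct label for every position that is left alone
  alone <- seq_len(length(vec) - sum(events)) + length(events)
  labels <- c(merged, alone)
  # Shuffle by index so a single label is never expanded by sample()
  labels <- labels[sample.int(length(labels))]
  # Keep groups in order of first appearance
  grp <- factor(labels, levels = unique(labels))
  cbind(value = as.vector(tapply(vec, grp, sum)),   # merged sums
        index = as.vector(tabulate(grp)))           # positions per sum
}

ex.have <- c(1, 1, 30, 16, 2, 2, 2, 1, 1, 9)
set.seed(1)
out <- merge_positions(ex.have, events = c(2, 3))
dim(out)                              # 7 rows, 2 columns
sum(out[, "value"]) == sum(ex.have)   # TRUE: totals are preserved
```

Which positions end up merged depends on the random shuffle, but the index column always contains one 2, one 3 and five 1s for events = c(2, 3).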