将向量化循环应用于向量元素

Question

将向量化循环应用于向量元素

12

我很难想出以下问题的快速解决方案:

我有一组观测值的向量，它表示某些现象的观测时间。

example <- c(0,0,0,1,0,1,1,0,0,0,-1,0,0,-1,-1,0,0,1,0,0);

现在我想要消除特定观测之间的零值，假设某个现象会持续到出现矛盾的观测为止，例如，在第三次观测中观察到“1”，则在观测到第一个“-1”元素时，我只想保留“1”直到第11个元素。因此，我期望的输出如下：

desired.output <- c(0,0,0,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1);

> print(cbind(example, desired.output))
      example desired.output
 [1,]       0              0
 [2,]       0              0
 [3,]       0              0
 [4,]       1              1
 [5,]       0              1
 [6,]       1              1
 [7,]       1              1
 [8,]       0              1
 [9,]       0              1
[10,]       0              1
[11,]      -1             -1
[12,]       0             -1
[13,]       0             -1
[14,]      -1             -1
[15,]      -1             -1
[16,]       0             -1
[17,]       0             -1
[18,]       1              1
[19,]       0              1
[20,]       0              1

我的笨拙解决方案是

for (i in 1:length(example)){
     if (example[i] != 0){
       current <- example[i];
       while ((example[i] != -current) & (i <= length(example))){
         example[i] <- current;
         i <- i+1;
       }
    }
}

我将感激任何加速此过程的帮助。

- Banach

3个回答

7

我相信一定有人会提出更好的纯R语言解决方案，但我的第一次尝试是只使用一个循环，具体如下：

x <- c(0,0,0,1,0,1,1,0,0,0,-1,0,0,-1,-1,0,0,1,0,0)

last <- x[1]
for (i in seq_along(x)) {
   if (x[i] == 0) x[i] <- last
   else last <- x[i] 
}

x
## [1]  0  0  0  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1  1  1  1

上述内容很容易转化为有效的C++代码：

Rcpp::cppFunction('
NumericVector elimzeros(NumericVector x) {
   int n = x.size();
   NumericVector y(n);
   double last = x[0];
   for (int i=0; i<n; ++i) {
      if (x[i] == 0)
         y[i] = last;
      else
         y[i] = last = x[i];
   }
   return y;
}
')

elimzeros(x)
## [1]  0  0  0  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1  1  1  1

一些基准测试结果：

set.seed(123L)
x <- sample(c(-1,0,1), replace=TRUE, 100000)
# ...
microbenchmark::microbenchmark(
   gagolews(x),
   gagolews_Rcpp(x),
   Roland(x),
   AndreyShabalin_match(x),
   AndreyShabalin_findInterval(x),
   AndreyShabalin_cumsum(x),
   unit="relative"
)
## Unit: relative
##                            expr        min         lq     median         uq        max neval
##                     gagolews(x) 167.264538 163.172532 162.703810 171.186482 110.604258   100
##                gagolews_Rcpp(x)   1.000000   1.000000   1.000000   1.000000   1.000000   100
##                       Roland(x)  33.817744  34.374521  34.544877  35.633136  52.825091   100
##         AndreyShabalin_match(x)  45.217805  43.819050  44.105279  44.800612  58.375625   100
##  AndreyShabalin_findInterval(x)  45.191419  43.832256  44.283284  45.094304  23.819259   100
##        AndreyShabalin_cumsum(x)   8.701682   8.367212   8.413992   9.938748   5.676467   100

- gagolews

很棒的基准测试。虽然我不希望能够击败Rcpp，但你也可以测试一下我的最新代码吗？ - Andrey Shabalin

7

我怀疑你的0值实际上是NA值。在这里，我将它们更改为NA，然后使用来自zoo包的na.locf（向前填充法）：

example <- c(0,0,0,1,0,1,1,0,0,0,-1,0,0,-1,-1,0,0,1,0,0)
res <- example
#res[res==0] <- NA
#the same but faster
res <- res/res*res
library(zoo)
res <- na.locf(res,  na.rm = FALSE)
res[is.na(res)] <- 0
cbind(example, res)
#       example res
#  [1,]       0   0
#  [2,]       0   0
#  [3,]       0   0
#  [4,]       1   1
#  [5,]       0   1
#  [6,]       1   1
#  [7,]       1   1
#  [8,]       0   1
#  [9,]       0   1
# [10,]       0   1
# [11,]      -1  -1
# [12,]       0  -1
# [13,]       0  -1
# [14,]      -1  -1
# [15,]      -1  -1
# [16,]       0  -1
# [17,]       0  -1
# [18,]       1   1
# [19,]       0   1
# [20,]       0   1

- Roland

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Andrey Shabalin · Accepted Answer

我会尝试提供一个纯R解决方案：

example <- c(0,0,0,1,0,1,1,0,0,0,-1,0,0,-1,-1,0,0,1,0,0);

cs = cumsum(example!=0);
mch = match(cs, cs);
desired.output = example[mch];

print(cbind(example,desired.output))

更新: 使用以下代码可以更快地计算 mch

mch = findInterval(cs-1,cs)+1

UPD2: 我喜欢@Roland的回答。它可以缩短为两行：

NN = (example != 0);
desired.output = c(example[1], example[NN])[cumsum(NN) + 1L];