Compute an offset cumulative sum starting from 0 for each group

Question

4

My sample data looks like this:

>         gros id nr_oriz
>      1:   23  1       1
>      2:   16  1       2
>      3:   14  1       3
>      4:   15  1       4
>      5:   22  1       5
>      6:   30  1       6
>      7:   25  2       1
>      8:   10  2       2
>      9:   13  2       3
>     10:   17  2       4
>     11:   45  2       5
>     12:   25  4       1
>     13:   15  4       2
>     14:   20  4       3
>     15:   20  4       4
>     16:   20  4       5

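Here gros is the depth (thickness) of each soil horizon, id is the profile number, and nr_oriz is the horizon number. I need to create two columns, top and bottom, where top is the upper limit of the horizon and bottom is the lower limit. So far we have only managed to get the bottom values, using: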

topsoil$bottom<-ave(topsoil$gros,topsoil$id,FUN=cumsum)

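For the top values, however, we need to somehow offset the data within each id and compute the cumulative sum starting from 0, leaving out the last value, as in this example: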

    gros id nr_oriz top bottom
 1:   23  1       1   0     23
 2:   16  1       2  23     39
 3:   14  1       3  39     53
 4:   15  1       4  53     68
 5:   22  1       5  68     90
 6:   30  1       6  90    120
 7:   25  2       1   0     25
 8:   10  2       2  25     35
 9:   13  2       3  35     48
10:   17  2       4  48     65
11:   45  2       5  65    110
12:   25  4       1   0     25
13:   15  4       2  25     40
14:   20  4       3  40     60
15:   20  4       4  60     80
16:   20  4       5  80    100

Is there a simple solution for this? Bear in mind that the dataset is very large, so we cannot do it by hand (as was done for the top column in this example).

- Rosca Bogdan

1

You could try something like library(data.table); setDT(topsoil)[, top := c(0, head(cumsum(gros), -1)), by = id]. - grrgrrbla

1

It looks like you have a data.table object, so I would suggest learning the proper data.table syntax. You can start here: https://github.com/Rdatatable/data.table/wiki/Getting-started - David Arenburg

3 Answers

4

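You can do this with shift from the development version of data.table (v1.9.5+). Installation instructions for the development version are on the data.table project page.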

library(data.table)  # v1.9.5+
setDT(topsoil)[, c('top', 'bottom') := {
  tmp <- cumsum(gros)  # running total of gros within each id
  list(top = shift(tmp, fill = 0), bottom = tmp)
}, by = id]
topsoil
#    gros id nr_oriz top bottom
# 1:   23  1       1   0     23
# 2:   16  1       2  23     39
# 3:   14  1       3  39     53
# 4:   15  1       4  53     68
# 5:   22  1       5  68     90
# 6:   30  1       6  90    120
# 7:   25  2       1   0     25
# 8:   10  2       2  25     35
# 9:   13  2       3  35     48
#10:   17  2       4  48     65
#11:   45  2       5  65    110
#12:   25  4       1   0     25
#13:   15  4       2  25     40
#14:   20  4       3  40     60
#15:   20  4       4  60     80
#16:   20  4       5  80    100

- akrun

0

library(dplyr)
df %>%
  group_by(id) %>%
  # bottom: running total per id; top: previous bottom, 0 for the first row
  mutate(bottom = cumsum(gros), top = lag(bottom, default = 0)) %>%
  ungroup()

- Shenglin Chen

Content provided by Stack Overflow.

- Joshua Ulrich · Accepted Answer

You can use ave again, this time applied to the "bottom" column and with a custom function:

topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN=function(x) c(0,x[-length(x)]))
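The two ave calls can be tried end to end on a small data frame. The following is only a sketch: the five rows are a cut-down copy of the first layers of profiles 1 and 2 from the question, not the full data set.

```r
# Illustrative subset of the question's data (first layers of profiles 1 and 2)
topsoil <- data.frame(
  gros = c(23, 16, 14, 25, 10),
  id   = c(1, 1, 1, 2, 2)
)

# bottom: running total of gros within each profile
topsoil$bottom <- ave(topsoil$gros, topsoil$id, FUN = cumsum)

# top: the previous layer's bottom, with 0 for the first layer of each profile
topsoil$top <- ave(topsoil$bottom, topsoil$id,
                   FUN = function(x) c(0, x[-length(x)]))

print(topsoil)
```

A quick sanity check on the result is that top + gros equals bottom on every row.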

It looks like you are using the data.table package, so you could adapt the code to take advantage of data.table's syntax and performance. To compute bottom, you only need:

topsoil[, bottom := cumsum(gros), by = id]

Then compute top:

topsoil[, top := c(0L, bottom[-.N]), by = id]

You could also wrap both into a single step, in a similar way to @akrun's answer.
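One possible way to combine the two assignments into a single grouped call is sketched below. It assumes the data.table package is installed; the five-row table is an illustrative subset of the question's data, and the temporary name b is made up for this example.

```r
library(data.table)

# Illustrative subset of the question's data (first layers of profiles 1 and 2)
topsoil <- data.table(
  gros = c(23, 16, 14, 25, 10),
  id   = c(1, 1, 1, 2, 2)
)

# Compute both columns in one grouped assignment:
# b is the running total per id; top is b shifted down by one row, starting at 0
topsoil[, c("top", "bottom") := {
  b <- cumsum(gros)
  list(c(0, b[-.N]), b)
}, by = id]

print(topsoil)
```

Because each layer's top is its bottom minus its own thickness, bottom - gros gives the same top values without any shifting.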