Join two data.tables and override values by date range

5

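I want to correct one table using overrides stored in another. Wherever dt_override has a row for the same unit whose date range covers a row's date in dt_current, I want to change the value in dt_current.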

dt_current <- data.table( unit = c(rep("a",10), rep("b", 10)), 
    date = seq(as.Date("2015-1-1"), by = "day", length.out = 10), 
    num = 1:10, key = "unit")
dt_override <- data.table( unit = c("a", "a", "b", "zed" ), start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")), 
    end_date = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")), 
    value = NA, key = "unit")

It looks like I should join the two data.tables using some form of .EACHI, along the lines of the code below, although of course it doesn't work.

dt_current[dt_override, 
    num := if(i.start_date <= date & i.end_date >= date) i.value, 
    by = .EACHI]
3 Answers

6
This can be done with foverlaps:
dt_current[, date2 := date] # define end date
setkey(dt_current, unit, date, date2) # key by unit, start and end dates
setkey(dt_override, unit, start_date, end_date) # same

Option one: build an index and update by reference
indx <- foverlaps(dt_override, dt_current, which = TRUE) # run foverlaps and get indices
dt_current[indx$yid, num := dt_override[indx$xid, value]] # adjust by reference

Alternatively, you can run foverlaps the other way around and skip creating indx, at the cost of building an entirely new data set.

foverlaps(dt_current, dt_override)[!is.na(start_date), num := value
                                   ][, .SD, .SDcols = names(dt_current)]
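
As a rough end-to-end check of option one, the sketch below rebuilds the question's tables and runs the index-based update. It makes two assumptions that are not in the original: value is declared as NA_integer_ so that it matches the integer num column, and unmatched override rows are dropped from the index before updating. The name overridden is only for illustration.

```r
library(data.table)

# Rebuild the question's data with the recycling spelled out.
dt_current <- data.table(
  unit = rep(c("a", "b"), each = 10),
  date = rep(seq(as.Date("2015-01-01"), by = "day", length.out = 10), times = 2),
  num  = rep(1:10, times = 2)
)
dt_override <- data.table(
  unit       = c("a", "a", "b", "zed"),
  start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")),
  end_date   = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")),
  value      = NA_integer_  # assumption: integer NA instead of logical NA
)

# foverlaps needs an interval on both sides, so each date becomes [date, date2].
dt_current[, date2 := date]
setkey(dt_current, unit, date, date2)
setkey(dt_override, unit, start_date, end_date)

# xid indexes rows of dt_override, yid indexes rows of dt_current.
indx <- foverlaps(dt_override, dt_current, which = TRUE)
indx <- indx[!is.na(yid)]  # drop override rows that overlap nothing
dt_current[indx$yid, num := dt_override[indx$xid, value]]

# Rows that received the override.
overridden <- dt_current[is.na(num), .(unit, date)]
print(overridden)
```

With this data, only the rows inside an override range for a matching unit should end up with an NA in num; the 1492 range and the "zed" unit match nothing.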

4

Another option is a rolling join:

setkey(dt_current, unit, date)
setkey(dt_override, unit, start_date)

dt_current[, num := dt_override[dt_current, roll = T][end_date >= start_date,
                                                      num := value]$num]

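# another version of the above, but using ifelse (unclear to me which one is faster)
# the !is.na() guard keeps num for dates that precede every override start_date,
# where the roll finds no match and end_date is NA
dt_current[, num := dt_override[dt_current,
                                ifelse(!is.na(end_date) & end_date >= start_date, value, num),
                                roll = TRUE]]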
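
The rolling join can be easier to follow when split into steps. The sketch below is one way to do that, under three assumptions that are not in the original: the override value is set to a visible placeholder (99L) rather than NA, each (unit, start_date) pair appears only once in dt_override, and override ranges within a unit do not overlap each other. The names rolled and covered are hypothetical.

```r
library(data.table)

dt_current <- data.table(
  unit = rep(c("a", "b"), each = 10),
  date = rep(seq(as.Date("2015-01-01"), by = "day", length.out = 10), times = 2),
  num  = rep(1:10, times = 2)
)
dt_override <- data.table(
  unit       = c("a", "a", "b", "zed"),
  start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")),
  end_date   = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")),
  value      = 99L  # assumption: placeholder so overridden rows are easy to spot
)

setkey(dt_current, unit, date)
setkey(dt_override, unit, start_date)

# For every dt_current row, roll back to the latest override that started on or
# before its date. In the result, start_date holds dt_current's date.
rolled <- dt_override[dt_current, roll = TRUE]

# A row is covered only if a preceding override exists and has not ended yet.
covered <- !is.na(rolled$end_date) & rolled$end_date >= rolled$start_date

dt_current[covered, num := rolled$value[covered]]
print(dt_current[num == 99L])
```

Because the roll only looks at the most recent start_date, an earlier but longer override that is still active would be missed, which is why the non-overlap assumption matters here.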

2
Here is one approach that enumerates the date sequences:
dt_override[,value:=as.integer(value)]
# It's necessary to convert to integer because `NA` is logical unless otherwise specified.

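# Group by row number so that each override row expands into one row per day.
# (seq_along(dt_override) counts columns, which only equals the row count here by coincidence.)
dto = dt_override[,.(
    unit,
    date = seq.Date(start_date,end_date,by="day"),
    value
),by=.(row_id = seq_len(nrow(dt_override)))][,row_id:=NULL]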

setkey(dt_current,unit,date)
dt_current[dto,num:=i.value]

Now that foverlaps exists, there is probably a better way.
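
One further possibility, which is not among the answers above, is a non-equi update join. This is only a sketch: it assumes a data.table version that supports non-equi conditions in on (1.9.8 or later), and it declares value as NA_integer_ to match the integer num column.

```r
library(data.table)

dt_current <- data.table(
  unit = rep(c("a", "b"), each = 10),
  date = rep(seq(as.Date("2015-01-01"), by = "day", length.out = 10), times = 2),
  num  = rep(1:10, times = 2)
)
dt_override <- data.table(
  unit       = c("a", "a", "b", "zed"),
  start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")),
  end_date   = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")),
  value      = NA_integer_  # assumption: integer NA instead of logical NA
)

# Update dt_current rows whose unit matches and whose date lies inside
# [start_date, end_date]; override rows with no match are simply skipped.
dt_current[dt_override,
           on = .(unit, date >= start_date, date <= end_date),
           num := i.value]

print(dt_current[is.na(num)])
```

No keys or helper columns are needed in this form, and dt_current is modified by reference.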
