在R中基于多个范围连接表格

5

我有一个情况,需要将两个数据框连接起来。表格params描述了一个单位的参数,包括时间和角度范围。表格data更长,包含id、时间和角度参数。

当id匹配且时间在valid_from和valid_to之间,角度在data表中的angle_begin和angle_end之间时,我想从params中获取param值并进行连接。

以下是表格的示例:

params <- data.frame(id = 1:4
                    ,valid_from  = 1
                    ,valid_to    = c(10, 20, 30, 40)
                    ,angle_begin = c(120, 90, 0, 50)
                    ,angle_end   = c(180, 170, 160, 150)
                    ,param       = c("A", "B", "C", "D"))

data <- data.frame(id = rep(1:4, each=100)
                  ,time = rep(seq(from = 0.5, to = 50, by = 0.5), 4)
                  ,ang  = rep(runif(100, 0, 360), 4))
4个回答

11

使用data.table进行的是非等值连接

library(data.table)
# coerce to data.table
setDT(params)
setDT(data)

# keep only rows of data with matches in params
data[params, 
     on = .(id, time >= valid_from, time <= valid_to, ang >= angle_begin, ang <= angle_end),
     .(id, time = x.time, ang = x.ang, param)]
    id time        ang param
 1:  1  2.0 140.383052     A
 2:  1  3.5 152.772925     A
 3:  1  8.0 141.039548     A
 4:  2  1.0 104.434264     B
 5:  2  2.0 140.383052     B
 6:  2  3.5 152.772925     B
 7:  2  8.0 141.039548     B
 8:  2 16.0 150.424306     B
 9:  2 16.5  92.201187     B
10:  ...
41:  4 22.0  89.813795     D
42:  4 22.5 131.004229     D
43:  4 26.0  79.839443     D
44:  4 27.5 128.291356     D
45:  4 29.0 127.942287     D
46:  4 30.0 136.388594     D
47:  4 32.0 140.092817     D
48:  4 32.5 108.346831     D
49:  4 37.0 140.732844     D
    id time        ang param
如果需要保留所有data的行
params[data, 
       on = .(id, valid_from <= time, valid_to >= time, angle_begin <= ang, angle_end >= ang), 
       .(id, time = i.time, ang = i.ang, param)]
     id time       ang param
  1:  1  0.5 106.62639    NA
  2:  1  1.0 104.43426    NA
  3:  1  1.5  15.77429    NA
  4:  1  2.0 140.38305     A
  5:  1  2.5 322.31929    NA
 ---                        
396:  4 48.0 131.17405    NA
397:  4 48.5 335.47857    NA
398:  4 49.0 181.64450    NA
399:  4 49.5  90.96224    NA
400:  4 50.0  60.04268    NA

4
使用tidyverse,您可以尝试以下操作:
data %>%
  inner_join(params) %>%
  filter( time > valid_from & time < valid_to) %>%
  filter( ang > angle_begin & ang < angle_end)

1
鉴于您的措辞,我对您的问题有不同的解释。也就是说,我理解您想保留所有行,但只在valid_angle_的值在指定范围内时进行连接。 此外,请注意,根据您是否希望包括范围边界上的值,您可能需要使用>=<=代替><。 因此,从Aramis7d的答案开始:
data %>%
  inner_join(params, by = "id") %>%
  mutate(param = ifelse(
           time >= valid_from & time <= valid_to & 
             ang >= angle_begin & ang <= angle_end,
           param,
           NA))

这确实是我的意图。我使用了Aramis7d的答案,然后以right_join(data)结束了链。我相信这个结果是相同的。 - Kresten

1
另一种使用tidyverse进行操作的方法是使用包。
library(tidyverse)
library(fuzzyjoin)

params <- data.frame(id = 1:4
                     ,valid_from  = 1
                     ,valid_to    = c(10, 20, 30, 40)
                     ,angle_begin = c(120, 90, 0, 50)
                     ,angle_end   = c(180, 170, 160, 150)
                     ,param       = c("A", "B", "C", "D"))

data <- data.frame(id = rep(1:4, each=100)
                   ,time = rep(seq(from = 0.5, to = 50, by = 0.5), 4)
                   ,ang  = rep(runif(100, 0, 360), 4))


fuzzy_left_join(data, params,
                by = c("id" = "id",
                       "time" = "valid_from",
                       "time" = "valid_to",
                       "ang" = "angle_begin",
                       "ang" = "angle_end"),
                match_fun = list(`==`, `>`, `<`, `>`, `<`)) %>%
  slice(1:10)

感谢mfherman提供的贡献。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接