我想在data.table的连接操作中同时使用i表格的多列进行计算和分组,但是目前语法好像有些限制。你能否建议更简洁的方法?
require(data.table)
set.seed(1)
表格1
DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
表格2
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))
直接在表上进行连接是可以的:[这里没有问题,但要求在 on 子句中使用引号括起来的列名似乎对于 data.table 来说不太一致]
DT1[DT2, on = "product"]
# loc product qty family price
# 1: L1 P1 0.1297134 A 5
# 2: L2 P1 0.2423550 A 5
# 3: L1 P1 0.3421633 A 5
# 4: L2 P1 0.6537663 A 5
# 5: L2 P2 0.9822407 A 7
# 6: L1 P2 0.8568853 A 7
# 7: L2 P2 0.7062672 A 7
# 8: L1 P2 0.9224086 A 7
# 9: L1 P3 0.8267184 B 10
#10: L2 P3 0.8408788 B 10
#11: L1 P3 0.6212432 B 10
#12: L2 P3 0.5363538 B 10
可以使用两个表的列进行计算:
DT1[DT2, .(family, product, val = qty*price), on = "product"]
# family product val
# 1: A P1 0.6485671
# 2: A P1 1.2117750
# 3: A P1 1.7108164
# 4: A P1 3.2688313
# 5: A P2 6.8756851
# 6: A P2 5.9981971
# 7: A P2 4.9438704
# 8: A P2 6.4568599
# 9: B P3 8.2671841
#10: B P3 8.4087878
#11: B P3 6.2124323
#12: B P3 5.3635379
我可以在 .EACHI 上进行分组和聚合。
DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = .EACHI]
# product family product val
#1: P1 A P1 6.83999
#2: P2 A P1 24.27461
#3: P3 B P1 28.25194
但不使用产品
DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = product]
#Error in `[.data.table`(DT1, DT2, .(family, product, val = sum(qty * price)), :
#object 'price' not found
在这种情况下,它停止在i表上寻找价格。在此情况下,.EACHI是可用的,因为“by”元素是DT2的唯一键。
然而,如果我想按DT2的属性分组,似乎无法使用.EACHI引用。我已经通过以下方式实现了我的目标:
DT1[DT2, .(family, product, val = qty*price), on = "product"][, .(sum(val)), by = family]
# family V1
#1: A 31.11460
#2: B 28.25194
这个双重处理是必要的吗?还是有其他语法可以在此情况下使用?
DT1[DT2, on="product"][,.(family, product, val = sum(qty*price)), by = product]
。我认为它们应该将其扩展到其他by=
选项,但我不确定这方面的功能请求在哪里提出。哦,也许是这个:https://github.com/Rdatatable/data.table/issues/733 - undefined.
符号代替引号,例如:DT1[DT2, on = .(product)]
。 - undefined