How to use sumif between two tables?

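I have two tables that I need to do a "sumif" across. Table 1 contains time periods, i.e. year-end quarters (e.g. 4, 8, 12, etc.). Table 2 contains transactions that occurred in quarters 3, 6, 7, etc.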

I need Table 3 to sum all transactions up to each year-end, so that I get the cumulative position at the end of each year.

Here is some sample code showing what the data look like and what the output should be:

library(data.table)

x1 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(3, 6, 7, 9, 11, 14, 16, 20, 24), 
                 "Amount" = c(10000, 15000, -2500, 3500, -6500, 25000, 
                              11000, 9000, 7500))
# Desired result: cumulative Amount at each year-end quarter
x3 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36), 
                 "Amount" = c(10000, 22500, 19500, 55500, 64500, 72000, 
                              72000, 72000, 72000))
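As a quick arithmetic check of the expected x3: within a single Name/Year group, each year-end figure is the running total of Amount up to the last transaction at or before that quarter. A minimal base R sketch of that check, assuming one Name/Year group and transactions sorted by quarter (this is a cross-check of the numbers, not the answer's method; the vector names are illustrative):

```r
# Year-end quarters (from x1) and transactions (from x2), single Name/Year group
year_end    <- c(4, 8, 12, 16, 20, 24, 28, 32, 36)
txn_quarter <- c(3, 6, 7, 9, 11, 14, 16, 20, 24)   # assumed sorted ascending
txn_amount  <- c(10000, 15000, -2500, 3500, -6500, 25000, 11000, 9000, 7500)

# Running total after each transaction
running <- cumsum(txn_amount)

# For each year-end quarter, the index of the last transaction with
# txn_quarter <= that quarter (0 if no transaction has happened yet)
idx <- findInterval(year_end, txn_quarter)

# Prepend 0 so that idx == 0 maps to a zero balance
cumulative <- c(0, running)[idx + 1]
print(cumulative)
```

The printed vector matches the Amount column of x3.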

I have tried merge, summarise and foverlaps, but haven't quite figured it out.

1 Answer

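Nice question. Basically, what you want to do is join by Name, Year and Quarter <= Quarter, while summing all the matched Amount values. This can be done either with the new non-equi joins (available in data.table v1.10.0, the latest stable release at the time of writing) or with foverlaps (the latter is probably less efficient).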
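To pin down what that join computes, here is a plain base R loop that spells out the same definition row by row: for each row of x1, take the x2 rows with the same Name and Year and a Quarter no later than that row's Quarter, and sum their Amount. It is only a sketch of the semantics (it scans x2 once per x1 row, so it is far slower than a real join on large data); df1 and df2 are data.frame copies of the sample data, named for this sketch only.

```r
df1 <- data.frame(Name = "LOB1", Year = 2000,
                  Quarter = c(4, 8, 12, 16, 20, 24, 28, 32, 36),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = "LOB1", Year = 2000,
                  Quarter = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
                  Amount = c(10000, 15000, -2500, 3500, -6500, 25000,
                             11000, 9000, 7500),
                  stringsAsFactors = FALSE)

# For one row of df1: sum Amount over the matching df2 rows
# (equal Name, equal Year, df2$Quarter <= that row's Quarter)
cumulative_amount <- function(name, year, quarter) {
  hit <- df2$Name == name & df2$Year == year & df2$Quarter <= quarter
  sum(df2$Amount[hit])   # the sum of an empty selection is 0
}

# Apply the rule to every row of df1
df1$Amount <- mapply(cumulative_amount, df1$Name, df1$Year, df1$Quarter,
                     USE.NAMES = FALSE)
print(df1)
```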

Non-equi join:

x2[x1, # for each value in `x1` find all the matching values in `x2`
   .(Amount = sum(Amount)), # Sum all the matching values in `Amount`
   on = .(Name, Year, Quarter <= Quarter), # join conditions
   by = .EACHI] # Do the summing per each match in `i`
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000
As a side note, as @Frank suggested, you can easily add Amount to x1 in place (by reference):
# Compute the cumulative sums and add them to x1 as a new column, by reference
x1[, Amount := 
  x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
]

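This can be handy if x1 has other columns besides the three join columns, since the join result above only returns the join columns and Amount.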


foverlaps:

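You mentioned foverlaps, and in theory you could achieve the same thing with this function too. But I'm afraid you could easily run out of memory: with foverlaps you create a huge table in which each value of x2 is joined, possibly many times, to every matching value of x1, and all of it is held in memory before the aggregation.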

x1[, Start := 0] # Make sure that we always join starting from Q0
x2[, Start := Quarter] # In x2 each transaction is a point interval [Quarter, Quarter]
setkey(x2, Name, Year, Start, Quarter) # set keys
## Make a huge cartesian join by overlaps and then aggregate
foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000
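The memory concern can be made concrete with a base R sketch of the same expand-then-aggregate idea, using merge() on the equality columns in place of foverlaps: every x1 row is first paired with every x2 row in the same Name/Year, and only afterwards filtered and summed. With 9 rows on each side the intermediate table already has 81 rows, and it grows with the product of the group sizes. This is a swapped-in technique for illustration; df1 and df2 are data.frame copies of the sample data, named for this sketch only.

```r
df1 <- data.frame(Name = "LOB1", Year = 2000,
                  Quarter = c(4, 8, 12, 16, 20, 24, 28, 32, 36),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = "LOB1", Year = 2000,
                  Quarter = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
                  Amount = c(10000, 15000, -2500, 3500, -6500, 25000,
                             11000, 9000, 7500),
                  stringsAsFactors = FALSE)

# Step 1: pair every df1 row with every df2 row sharing Name and Year.
# The shared Quarter column becomes Quarter.x (df1) and Quarter.y (df2).
paired <- merge(df1, df2, by = c("Name", "Year"))

# Step 2: keep only transactions at or before each year-end quarter
kept <- paired[paired$Quarter.y <= paired$Quarter.x, ]

# Step 3: sum Amount per year-end quarter, then sort by quarter
result <- aggregate(Amount ~ Name + Year + Quarter.x, data = kept, FUN = sum)
result <- result[order(result$Quarter.x), ]

cat("paired rows:", nrow(paired), " kept rows:", nrow(kept), "\n")
print(result)
```

Here 81 rows are materialised to produce 9 sums; the non-equi join avoids building that intermediate table.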

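Thanks a lot - I just got it to work. Much appreciated! It looks like both of my tables need to have the same columns. If x2 has an extra column that I don't want in the result table x3, is the code the same? - kodfather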
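You can specify any column names from both tables in the on argument, e.g. on = .(column1 = column2, column3 = column4), etc. In x2[x1, ...], the left-hand side of each equation is a column from x2 and the right-hand side is a column from x1. - David Arenburg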
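A small sketch of both points from this exchange, under assumed data: a hypothetical variant of x2 (called txns here) whose quarter column is named TxnQuarter and which carries an extra Note column, joined with year_ends in the role of x1. The on= condition pairs differently named columns (x side on the left, i side on the right), and because j only returns Amount, the extra Note column does not end up in the result.

```r
library(data.table)

# Plays the role of x1
year_ends <- data.table(Name = "LOB1", Year = 2000,
                        Quarter = c(4, 8, 12, 16, 20, 24, 28, 32, 36))

# Hypothetical variant of x2: differently named quarter column plus an extra column
txns <- data.table(Name = "LOB1", Year = 2000,
                   TxnQuarter = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
                   Amount = c(10000, 15000, -2500, 3500, -6500, 25000,
                              11000, 9000, 7500),
                   Note = "some text")

# In txns[year_ends, ...] the left-hand side of each on= condition is a column
# of txns (x), the right-hand side a column of year_ends (i)
res <- txns[year_ends, .(Amount = sum(Amount)),
            on = .(Name, Year, TxnQuarter <= Quarter),
            by = .EACHI]
print(res)
```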

Page content provided by Stack Overflow.