Intro
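The operation I am after is the one this image describes, a left join that excludes the matches (an anti-join): based on the two columns lat, lon, remove the rows from one data.table that exactly match lat, lon in another data.table.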
Problem description
Suppose I have a data.table named "dt.master" with 1 million rows, where each row holds an id and the coordinates lat, lon of that location.
id lat lon
1 43.23 5.43
2 43.56 4.12
3 52.14 -9.85
4 43.56 4.12
5 43.83 9.43
... ... ...
What I want to do is remove the rows that match specific coordinate pairs. You can think of these pairs as blacklisted (i.e. a data.table named "dt.blacklist"):
lat lon
43.56 4.12
11.14 -5.85
In this case, when the blacklist is applied, the answer must be:
id lat lon
1 43.23 5.43
3 52.14 -9.85
5 43.83 9.43
... ... ...
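For reproducibility, the sample rows above can be built roughly as follows. This is a minimal sketch that covers only the five rows shown, not the full 1 million; dt.expected is a name introduced here purely to hold the desired output and is not part of my actual code.

```r
library(data.table)

# Sample rows copied from the dt.master table above
# (the real table has about 1 million rows)
dt.master <- data.table(
  id  = 1:5,
  lat = c(43.23, 43.56, 52.14, 43.56, 43.83),
  lon = c(5.43, 4.12, -9.85, 4.12, 9.43)
)

# Blacklisted coordinate pairs, copied from the table above
dt.blacklist <- data.table(
  lat = c(43.56, 11.14),
  lon = c(4.12, -5.85)
)

# Desired result written out by hand: ids 2 and 4 share the
# blacklisted pair (43.56, 4.12) and should both be dropped
dt.expected <- dt.master[id %in% c(1, 3, 5)]
```

Any approach that works should return a table equal to dt.expected for these five rows.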
Although it looks simple, I still cannot get it right.
What I have tried so far
Using merge, like such:
dt.result <- merge(dt.master, dt.blacklist[, c("lat", "lon")], by.x=c("lat", "lon"), by.y=c("lat", "lon"))
But that yields the rows that match and is thus an inner join. I thought about removing rows based on this result by using subset:
subset(dt.master, lat != dt.result$lat & lon != dt.result$lon)
But this only partially works: only 1 row in the above example is removed, not the 2 rows I would like. Somehow it only removes the first "hit".
Using a quick and dirty solution by concatenating lat, lon to a new column named "C" in both data tables and then removing the matches as such:
dt.master[C != dt.blacklist$C]
Yet, the same issue arises where only 1 of the two rows is removed.
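One thing that may be relevant to both attempts: in R, != compares two vectors position by position and recycles the shorter one, so it does not test whether a value appears anywhere in the other vector. The sketch below uses made-up vectors x and y (not taken from my tables) just to show the difference from a membership test with %in%.

```r
# Hypothetical vectors, only to illustrate the comparison semantics
x <- c("a", "b", "c", "b")
y <- c("b", "z")

# `!=` works element by element, recycling y to c("b", "z", "b", "z"),
# so each "b" in x is compared with "z" and is not flagged as a match
elementwise <- x != y

# `%in%` checks every element of x against all of y
membership <- !(x %in% y)

print(elementwise)
print(membership)
```

Here elementwise keeps all four elements while membership drops both "b" entries, which looks similar to the behaviour I see when not every blacklisted row gets removed.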