我在R中有下面两个数据框,尝试了各种方法但仍无法达到所需的输出。
数据框:
ID Date city code uid
I-1 2020-01-01 10:12:15 New York 123 K-1
I-1 2020-01-01 10:12:15 Utha 103 K-1
I-2 2020-01-02 10:12:15 Washington 122 K-1
I-3 2020-02-01 10:12:15 Tokyo 123 K-2
I-3 2020-02-01 10:12:15 Osaka 193 K-2
I-4 2020-02-02 10:12:15 London 144 K-3
I-5 2020-02-04 10:12:15 Dubai 101 K-4
I-6 2019-11-01 10:12:15 Dubai 101 K-4
I-7 2019-11-01 10:12:15 London 144 K-3
I-8 2018-12-13 10:12:15 Tokyo 143 K-5
I-9 2019-05-17 10:12:15 Dubai 101 K-4
I-19 2020-03-11 10:12:15 Dubai 150 K-7
Dput:
structure(list(ID = c("I-1", "I-1",
"I-2", "I-3", "I-3", "I-4",
"I-5", "I-6", "I-7", "I-8", "I-9","I-19"
), DATE = c("2020-01-01 11:49:40.842", "2020-01-01 09:35:33.607",
"2020-01-02 06:14:58.731", "2020-02-01 16:51:27.190", "2020-02-01 05:35:46.952",
"2020-02-02 05:48:49.443", "2020-02-04 10:00:41.616", "2019-11-01 09:10:46.536",
"2019-11-01 11:54:05.655", "2018-12-13 14:24:31.617", "2019-05-17 14:24:31.617", "2020-03-11 14:24:31.617"), CITY = c("New York",
"UTAH", "Washington", "Tokyo",
"Osaka", "London", "Dubai",
"Dubai", "London", "Tokyo", "Dubai",
"Dubai"), CODE = c("221010",
"411017", "638007", "583101", "560029", "643102", "363001", "452001",
"560024", "509208"), UID = c("K-1",
"K-1", "K-1", "K-2", "K-2",
"K-3", "K-4", "K-4", "K-3",
"K-5","K-4","K-7")), .Names = c("ID", "DATE",
"CITY", "CODE", "UID"), row.names = c(NA,
10L), class = "data.fram)
使用上述两个数据框,我想获取从2020年1月1日至2002年2月29日之间的记录,并将这些ID与整个数据库中的其他ID进行比较,以检查城市和代码是否匹配,并进一步分类以检查有多少相同uid和多少不同uid。
其中,
- 匹配 - 城市和代码组合与数据库中的其他ID匹配 - Same_uid - 分类Match ID以识别有多少个ID具有相似的uid - different_uid - 分类Match ID以识别有多少个ID没有相似的uid - uid_count - 整个数据库中具有特定ID的相似uid的计数
注意 - 数据框中有超过1000万条记录。
所需输出
ID Date city code uid Match Same_uid different_uid uid_count
I-1 2020-01-01 10:12:15 New York 123 K-1 No 0 0 2
I-2 2020-01-02 10:12:15 Washington 122 K-1 No 0 0 2
I-3 2020-02-01 10:12:15 Tokyo 123 K-2 No 0 0 1
I-4 2020-02-02 10:12:15 London 144 K-3 Yes 1 0 2
I-5 2020-02-04 10:12:15 Dubai 101 K-4 Yes 2 0 3
Date
也是如此。如果仅仅是最近的“ID”对您有趣,那我会更新我的答案。 - Bryan Wammackcode
是要匹配的代码。上面的示例中没有重复的ID
。请查看我在原始帖子中所做的编辑。 - Bryan Wammack