Removing duplicate rows from a data frame is relatively easy. Removing duplicate elements within a row, however, is a more challenging problem.
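For contrast, the row-level case needs only base R's duplicated() (or unique()). Below is a minimal sketch on a small data frame, toy_rows, which is hypothetical and made up for illustration. It keeps rows 1 and 3 and drops row 2 because row 2 repeats row 1.

```r
# Hypothetical example data: the first two rows are identical
toy_rows <- data.frame(
  a = c("x", "x", "y"),
  b = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

# Drop whole rows that repeat an earlier row; unique(toy_rows) does the same
deduped <- toy_rows[!duplicated(toy_rows), ]
print(deduped)
```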
Let's start with this df:
df <- structure(list(V1 = c("B1182", "B1182", "B1182", "B1182", "B1182",
"B1182", "B1182", "B1182", NA, NA, "B1182", "B1182", "B1182",
NA, NA, NA, NA, "P2000", "P2000", NA), V2 = c("B124D", "B124D",
"B124D", "B124D", "B124D", "B124D", "B124D", "B124D", NA, NA,
"B124D", "B124D", "B124D", NA, NA, NA, NA, "P2000", "P2000",
NA), V3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, "U3003", "U3003", NA), V4 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "P2000",
"P2000", NA), V5 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), V6 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), V7 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
V8 = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7",
"V8"), row.names = c(NA, 20L), class = "data.frame")
Here is the printed output of df:
V1 V2 V3 V4 V5 V6 V7 V8
1 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
2 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
3 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
4 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
5 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
6 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
7 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
8 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
11 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
12 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
13 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
14 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
15 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
18 P2000 P2000 U3003 P2000 <NA> <NA> <NA> <NA>
19 P2000 P2000 U3003 P2000 <NA> <NA> <NA> <NA>
20 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
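As you can see, rows 18 and 19 contain a duplicated code (P2000). I want to remove these duplicated elements and keep only the first occurrence of each code within the row. Note that this is an excerpt of my original df, so the solution must work for every row in general, not just these two.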
The desired output would look like this:
V1 V2 V3 V4 V5 V6 V7 V8
1 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
2 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
3 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
4 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
5 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
6 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
7 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
8 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
11 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
12 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
13 B1182 B124D <NA> <NA> <NA> <NA> <NA> <NA>
14 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
15 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
18 P2000 <NA> U3003 <NA> <NA> <NA> <NA> <NA>
19 P2000 <NA> U3003 <NA> <NA> <NA> <NA> <NA>
20 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
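One way to get this in base R, sketched under the assumption that every column is character (as in the dput above): treat the data frame as a character matrix, run duplicated() along each row, and set the flagged cells to NA. The function name dedupe_within_rows and the small data frame toy are hypothetical, for illustration only.

```r
# Hypothetical helper: replace repeated values within each row with NA,
# keeping the first (leftmost) occurrence. Assumes all columns are character.
dedupe_within_rows <- function(d) {
  m <- as.matrix(d)
  # apply() over rows returns one column per input row, so transpose back
  dup <- t(apply(m, 1, duplicated))
  # duplicated() also flags repeated NAs; overwriting them with NA is harmless
  m[dup] <- NA
  as.data.frame(m, stringsAsFactors = FALSE)
}

# Small hypothetical example mirroring rows 1, 18 and 20 of df
toy <- data.frame(
  V1 = c("B1182", "P2000", NA),
  V2 = c("B124D", "P2000", NA),
  V3 = c(NA, "U3003", NA),
  V4 = c(NA, "P2000", NA),
  stringsAsFactors = FALSE
)
res <- dedupe_within_rows(toy)
print(res)
```

Applied to the df from the question, dedupe_within_rows(df) should leave every row except 18 and 19 unchanged and turn those two into P2000, NA, U3003, NA, followed by the all-NA columns V5 to V8.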
I don't care about the variables (columns) themselves, because they will be rearranged and transformed later.
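Since column positions don't matter, one possible alternative is to move each row's unique, non-missing codes to the left and pad the remaining cells with NA. This is a hypothetical variant (compact_unique_rows is an illustrative name), not something the question requires, and it again assumes character columns:

```r
# Hypothetical variant: keep each row's unique non-NA codes in first-seen
# order, left-aligned, padded with NA back to the original number of columns.
compact_unique_rows <- function(d) {
  m <- as.matrix(d)
  out <- t(apply(m, 1, function(r) {
    u <- unique(r[!is.na(r)])                          # unique non-missing codes
    c(u, rep(NA_character_, length(r) - length(u)))    # pad to original width
  }))
  colnames(out) <- colnames(m)
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Same small hypothetical example as a stand-in for df
toy <- data.frame(
  V1 = c("B1182", "P2000", NA),
  V2 = c("B124D", "P2000", NA),
  V3 = c(NA, "U3003", NA),
  V4 = c(NA, "P2000", NA),
  stringsAsFactors = FALSE
)
res2 <- compact_unique_rows(toy)
print(res2)
```

Here the P2000/U3003 row becomes P2000, U3003, NA, NA, which drops the gaps that the NA-in-place approach leaves behind.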
So, how can I remove duplicated elements within each row of this df? Thanks.