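I have a data frame of size 58000*900 whose row values contain replicates. I want to go through each row and remove them. An example should make this clearer.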
df
IDs Name col1 col2 col3
123 AB.C 1.3,1.3,1.3,1.3,1.3 0,0,0,0,0 5,5,5,5,5
234 CD-E 2,2,2,2,2 0.3,0.3,0.3,0.3,0.3 1,1,1,1,1
568 GHJ 123456 123456 123456
345 FGH 9,9,9,9,9 54,54,54,54,54 0,0,0,0,0
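As you can see, each value is replicated 5 times, and in some cases there is a problem: there is no . or , separating the values.
What I would like is to remove the rows that contain no . or , and to remove the replicated values in the remaining rows. The output would therefore be: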
IDs Name col1 col2 col3
123 AB.C 1.3 0 5
234 CD-E 2 0.3 1
345 FGH 9 54 0
dput(df)
structure(list(IDs = c(123L, 234L, 568L, 345L), Name = structure(c(1L,
2L, 4L, 3L), .Label = c("ABC", "CDE", "FGH", "GHJ"), class = "factor"),
col1 = structure(c(2L, 3L, 1L, 4L), .Label = c("123456",
"1.3,1.3,1.3,1.3,1.3", "2,2,2,2,2", "9,9,9,9,9"), class = "factor"),
col2 = structure(1:4, .Label = c("0,0,0,0,0", "0.3,0.3,0.3,0.3,0.3",
"123456", "54,54,54,54,54"), class = "factor"), col3 = structure(c(4L,
2L, 3L, 1L), .Label = c("0,0,0,0,0", "1,1,1,1,1", "123456",
"5,5,5,5,5"), class = "factor")), .Names = c("IDs", "Name",
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA,
-4L))
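For reference, here is a minimal sketch of one possible way to get that output in base R. It is not a definitive solution and rests on several assumptions: the sample data is rebuilt here with character columns instead of factors, `value_cols` is an illustrative name for the columns holding the replicated values, a row is dropped when any of those columns lacks a comma, and each cell is reduced to its first comma-separated element without checking that the replicates are actually identical.

```r
# Sample data mirroring the dput() output above, rebuilt with character
# columns (stringsAsFactors = FALSE) so the sketch is self-contained.
df <- data.frame(
  IDs  = c(123L, 234L, 568L, 345L),
  Name = c("ABC", "CDE", "GHJ", "FGH"),
  col1 = c("1.3,1.3,1.3,1.3,1.3", "2,2,2,2,2", "123456", "9,9,9,9,9"),
  col2 = c("0,0,0,0,0", "0.3,0.3,0.3,0.3,0.3", "123456", "54,54,54,54,54"),
  col3 = c("5,5,5,5,5", "1,1,1,1,1", "123456", "0,0,0,0,0"),
  stringsAsFactors = FALSE
)

# Assumption: these are the columns that hold the replicated values.
value_cols <- c("col1", "col2", "col3")

# Keep only rows where every value column contains a comma separator.
has_sep <- Reduce(`&`, lapply(df[value_cols],
                              function(x) grepl(",", x, fixed = TRUE)))
res <- df[has_sep, , drop = FALSE]

# Reduce each cell to its first comma-separated element and convert to numeric.
res[value_cols] <- lapply(res[value_cols],
                          function(x) as.numeric(sub(",.*", "", x)))
rownames(res) <- NULL
res
```

On the full 58000*900 data frame, `value_cols` would have to list the real value columns, for example `setdiff(names(df), c("IDs", "Name"))`. Since `grepl()` and `sub()` coerce factors to character, the same steps should also work on the factor columns shown in the `dput()` output.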
Please post your data frame df using the dput() function. - mtoto