Identity of rows in an R data frame

Question


5

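I want to check whether two rows of a data frame are the same. I thought identical() would do the job, but it does not behave as I expected. Here is a minimal example: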

x=factor(c("x","x"),levels=c("x","y"))
y=factor(c("y","y"),levels=c("x","y"))
df=data.frame(x,y)
df
  x y
1 x y
2 x y

identical(df[1,],df[2,])
[1] FALSE
df[1,]==df[2,]
     x    y
1 TRUE TRUE

Can anyone explain why identical() returns FALSE?

Thanks, Thomas

- user2481662

3

identical checks for "exact equality". That means it also compares the row names (which differ here). - Arun
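The point about row names can be checked directly. The following is a minimal sketch that rebuilds the data frame from the question; the variable names r1, r2 and same are illustrative only and do not appear in the original thread.

```r
# Rebuild the example data frame from the question
x <- factor(c("x", "x"), levels = c("x", "y"))
y <- factor(c("y", "y"), levels = c("x", "y"))
df <- data.frame(x, y)

# Two one-row data frames (r1, r2 are illustrative names)
r1 <- df[1, ]
r2 <- df[2, ]

# The values match, but the row.names attribute differs
attr(r1, "row.names")
attr(r2, "row.names")

# Resetting the row names removes the only difference,
# so identical() should now report equality
rownames(r1) <- NULL
rownames(r2) <- NULL
same <- identical(r1, r2)
same
```

Resetting row names is only appropriate when they carry no meaning; if they are real identifiers, two rows with different names arguably are different objects.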

3

The function you are looking for here is duplicated, which compares the rows of a data.frame for equality. - Simon O'Hanlon
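As a rough illustration of that suggestion, duplicated() applied to a data frame returns one logical value per row, flagging rows that repeat an earlier row. The sketch below reuses the question's data; the name dup is illustrative.

```r
# Same data as in the question, built in one call
df <- data.frame(
  x = factor(c("x", "x"), levels = c("x", "y")),
  y = factor(c("y", "y"), levels = c("x", "y"))
)

# One logical per row: TRUE where the row repeats an earlier row.
# Row names are not part of the comparison.
dup <- duplicated(df)
dup
```

Here the first row is not flagged (nothing precedes it) and the second row is flagged as a repeat of the first.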

2 Answers

2

Try this function:

all.equal(df[1,],df[2,])
[1] "Attributes: < Component 2: Mean relative difference: 1 >"

Comparing factors can often give "unexpected" results... In this case, identical() tries to match everything and finds that the row.names differ, as the dput output shows:

> dput(df[1,])
structure(list(x = structure(1L, .Label = c("x", "y"), class = "factor"), 
    y = structure(2L, .Label = c("x", "y"), class = "factor")), .Names = c("x", 
"y"), row.names = 1L, class = "data.frame")
> dput(df[2,])
structure(list(x = structure(1L, .Label = c("x", "y"), class = "factor"), 
    y = structure(2L, .Label = c("x", "y"), class = "factor")), .Names = c("x", 
"y"), row.names = 2L, class = "data.frame")

In this example, a simple == works:

> df[1,]==df[2,]
     x    y
1 TRUE TRUE
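Since == returns one logical per column rather than a single answer, it usually needs to be collapsed with all(). A minimal sketch follows; the names row_eq, all_equal, na_df and na_eq are illustrative, and the NA case is an added caveat rather than something discussed in the thread.

```r
df <- data.frame(
  x = factor(c("x", "x"), levels = c("x", "y")),
  y = factor(c("y", "y"), levels = c("x", "y"))
)

# Element-wise comparison gives a logical matrix, one cell per column
row_eq <- df[1, ] == df[2, ]

# Collapse to a single TRUE/FALSE
all_equal <- all(row_eq)
all_equal

# Caveat: missing values propagate, so the result can be NA
# instead of TRUE or FALSE (na_df is a made-up example)
na_df <- data.frame(a = c(NA, NA))
na_eq <- all(na_df[1, , drop = FALSE] == na_df[2, , drop = FALSE])
na_eq
```

Wrapping the result in isTRUE() is one way to treat the NA case as "not equal".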

- Michele

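@Roland, I know, sorry, I mixed two topics in the same sentence. That was just general advice... The row names are the cause, and I posted the dput output to highlight that difference :-) - Michele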

@user2481662 Thanks, but the best answer is @Roland's, especially this: all.equal(df[1,],df[2,],check.attributes = FALSE) - Michele


- Roland · Accepted Answer

identical(df[1,],df[2,])
#[1] FALSE
all.equal(df[1,],df[2,])
#[1] "Attributes: < Component 2: Mean relative difference: 1 >"

all.equal(df[1,],df[2,],check.attributes = FALSE)
#[1] TRUE

anyDuplicated(df[1:2,])>0
#[1] TRUE
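
The all.equal() call above can be wrapped in a small helper. This is only a sketch: rows_equal is a hypothetical name, not a base R function, and d2 is made-up data used to show a non-matching case. all.equal() returns either TRUE or a character description of the differences, so the result is passed through isTRUE().

```r
# Hypothetical helper: compare rows i and j of data frame d by value,
# ignoring attributes such as row names
rows_equal <- function(d, i, j) {
  # drop = FALSE keeps a one-column data frame from collapsing to a vector
  a <- d[i, , drop = FALSE]
  b <- d[j, , drop = FALSE]
  isTRUE(all.equal(a, b, check.attributes = FALSE))
}

df <- data.frame(
  x = factor(c("x", "x"), levels = c("x", "y")),
  y = factor(c("y", "y"), levels = c("x", "y"))
)
res_same <- rows_equal(df, 1, 2)

# Made-up data where the rows differ in column a
d2 <- data.frame(a = c(1, 2), b = c("u", "u"))
res_diff <- rows_equal(d2, 1, 2)
```

Note that all.equal() compares numeric values up to a small tolerance, so this helper tests approximate rather than exact equality for numeric columns.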