How do I extract specific rows in R?

Question

5

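I would like to extract specific rows from a data frame into a new data frame using R. I have two columns: City and Household. To detect moves, I need a new data frame containing only the households whose rows do not all share the same city.

For example, if a household appears 3 times and at least one of its cities differs from the others, I keep it. Otherwise, I drop that household's 3 rows.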

    City      Household
   Paris              A
   Paris              A
    Nice              A
  Limoge              B
  Limoge              B
Toulouse              C
   Paris              C

Here, I only want to keep households A and C.

- Marie

2 Answers

2

A possible base R solution

df1[with(df1, ave(as.character(City), Household, FUN=function(x) length(unique(x))) > 1L),]

Or

df1[df1$Household %in% names(which(table(unique(df1)$Household) > 1)),]
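To make the first one-liner easier to follow, here is a minimal sketch that unpacks it into two steps. It assumes df1 is built from the question's table with character columns; the intermediate names n_city and res are only illustrative.

```r
# Rebuild the question's table (column names as in the question)
df1 <- data.frame(
  City = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  Household = c("A", "A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Step 1: for every row, the number of distinct cities in that row's household.
# ave() returns a character vector here because its input is character,
# so convert the counts back to integers before comparing.
n_city <- with(df1, ave(as.character(City), Household,
                        FUN = function(x) length(unique(x))))
n_city <- as.integer(n_city)

# Step 2: keep only the rows whose household has more than one distinct city
res <- df1[n_city > 1L, ]
print(res)
```

With the question's data this should keep the rows of households A and C and drop the two rows of B.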

Or a possible data.table solution, using the development version (v >= 1.9.5)

library(data.table) # v > 1.9.5, otherwise use length(unique(City))
setDT(df1)[, if(uniqueN(City) > 1L) .SD, by = Household]

Or

setDT(df1)[, .SD[uniqueN(City) > 1L], by = Household]

- David Arenburg

Thanks for your help! I used the code from the first answer: new_df <- df %>% group_by(household) %>% filter(n_distinct(city) > 1), and it worked :) - Marie

- scoa · Accepted Answer

A dplyr solution: count the number of unique cities per household and keep only the households where that count is greater than 1

library(dplyr)
df <- data.frame(city=c("Paris","Paris","Nice","Limoge","Limoge","Toulouse","Paris"),
                 household =c(rep("A",3),rep("B",2),rep("C",2)))

new_df <- df %>% group_by(household) %>%
  filter(n_distinct(city) > 1)

Source: local data frame [5 x 2]
Groups: household

      city household
1    Paris         A
2    Paris         A
3     Nice         A
4 Toulouse         C
5    Paris         C
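To see the per-household counts that the filter compares against 1, the following sketch tabulates them in base R with aggregate(). This is not part of the dplyr answer, only a way to inspect the intermediate value; the column name n_cities is made up for illustration.

```r
# Same example data as in the answer above
df <- data.frame(
  city = c("Paris", "Paris", "Nice", "Limoge", "Limoge", "Toulouse", "Paris"),
  household = c(rep("A", 3), rep("B", 2), rep("C", 2))
)

# One row per household with its number of distinct cities
counts <- aggregate(city ~ household, data = df,
                    FUN = function(x) length(unique(x)))
names(counts)[2] <- "n_cities"  # illustrative column name
print(counts)
```

Household B is the only one with a count of 1, which is why its rows are absent from new_df.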

Edit: added suggestions from @shadow and @davidarenburg from the comments