Selecting unique values with the 'select' function in the 'dplyr' library

Question

r select unique dplyr

64

Is it possible to use the select function in the dplyr library to pick all unique values from a data.frame column? Something like "SELECT DISTINCT field1 FROM table1" in SQL notation.

Thanks!

- nodm

3 Answers

25

To add to the other answers: if you want a vector returned rather than a data frame, you have the following options.

dplyr >= 0.7.0

Use the pull verb:

mtcars %>% distinct(cyl) %>% pull()

dplyr < 0.7.0

Wrap the dplyr call in parentheses and combine it with the $ syntax:

(mtcars %>% distinct(cyl))$cyl
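
As a minimal sketch of the difference between the data-frame and vector results, the snippet below uses the built-in mtcars data set. It assumes dplyr >= 0.7.0 so that pull() is available. Naming the column in pull() and the comparison with base R's unique() are additions for illustration, not part of the answer above.

```r
library(dplyr)

# Distinct values of cyl, returned as a one-column data frame
cyl_df <- mtcars %>% distinct(cyl)

# The same values as a plain vector; naming the column in pull() is explicit
cyl_vec <- mtcars %>% distinct(cyl) %>% pull(cyl)

# Base R equivalent, shown only for comparison
cyl_base <- unique(mtcars$cyl)

is.data.frame(cyl_df)  # the distinct() result is still a data frame
is.numeric(cyl_vec)    # the pull() result is an atomic vector
```

Calling pull() with no arguments takes the last column, which is why the shorter form in the answer also works when the data frame has a single column.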

- Josh Gilfillan

I like that you called "pull" a verb; it is more poetic (and descriptive) than "function"! - butterflyeffect

10

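dplyr's select function picks specific columns from a data frame. To return the unique values in a particular column, you can use the group_by function. For example: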

library(dplyr)

# Fake data
set.seed(5)
dat = data.frame(x=sample(1:10,100, replace=TRUE))

# Return the distinct values of x
dat %>%
  group_by(x) %>%
  summarise() 

    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10

If you want to change the column name, you can add the following:

dat %>%
  group_by(x) %>%
  summarise() %>%
  select(unique.x=x)

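This selects column x from the data frame returned by the dplyr chain (which, of course, has only one column in this case) and renames it to unique.x.

The same approach extends to several columns, returning each distinct combination of values: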

set.seed(5)
dat = data.frame(x=sample(1:10,100, replace=TRUE), 
                 y=sample(letters[1:5], 100, replace=TRUE))

dat %>% 
  group_by(x,y) %>%
  summarise() %>%
  select(unique.x=x, unique.y=y)
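
For comparison, here is a hedged sketch that checks the group_by/summarise approach against distinct() on the same two-column data. It assumes dplyr >= 1.0.0, since the .groups argument of summarise() is not available in older releases. The names via_group and via_distinct are hypothetical and used only for this illustration.

```r
library(dplyr)

set.seed(5)
dat <- data.frame(x = sample(1:10, 100, replace = TRUE),
                  y = sample(letters[1:5], 100, replace = TRUE))

# Distinct (x, y) combinations via grouping; .groups = "drop" ungroups the result
via_group <- dat %>%
  group_by(x, y) %>%
  summarise(.groups = "drop")

# The same combinations via distinct(), sorted to match the grouped output
via_distinct <- dat %>%
  distinct(x, y) %>%
  arrange(x, y)

# Both approaches should find the same number of combinations
nrow(via_group) == nrow(via_distinct)
```

The grouped version returns its rows sorted by the grouping columns, while distinct() keeps the order of first appearance, hence the arrange() call before comparing.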

- eipi10

7

Or use the new distinct() function in dplyr 0.3. - hadley


- Ron Gejman · Accepted Answer

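In dplyr 0.3 this is easy to do with the distinct() method.

Here is an example:

distinct_df = df %>% distinct(field1)

You can then get a vector of the distinct values with:

distinct_vector = distinct_df$field1

You can also select a subset of columns in the same chain as the distinct() call, which gives cleaner output when you inspect the data frame with head/tail/glimpse: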

distinct_df = df %>% distinct(field1) %>% select(field1)
distinct_vector = distinct_df$field1
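
The answer above targets dplyr 0.3. As a sketch of how this behaves in later releases (assuming dplyr >= 0.5.0, where the .keep_all argument was introduced), distinct() keeps only the columns you name unless asked otherwise. The data frame df and its column field2 below are made up for illustration.

```r
library(dplyr)

# Hypothetical data frame standing in for `df` in the answer
df <- data.frame(field1 = c("a", "b", "a", "c"),
                 field2 = 1:4)

# distinct(field1) returns only the named column
only_field1 <- df %>% distinct(field1)

# .keep_all = TRUE keeps the other columns, using the first row for each value
with_others <- df %>% distinct(field1, .keep_all = TRUE)

names(only_field1)
names(with_others)
```

With these versions the trailing select(field1) is no longer needed to drop the other columns, although it does no harm.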