How to rank within groups in R?

Question

How to rank within groups in R?

31

This is my data frame:

  customer_name order_dates order_values
1          John  2010-11-01           15
2           Bob  2008-03-25           12
3          Alex  2009-11-15            5
4          John  2012-08-06           15
5          John  2015-05-07           20

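Suppose I want to add a variable that ranks each customer's orders from the highest order value down, using the most recent order date as the tie breaker.

So, ultimately, the data should look like this: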

  customer_name order_dates order_values ranked_order_values_by_max_value_date
1          John  2010-11-01           15                               3
2           Bob  2008-03-25           12                               1
3          Alex  2009-11-15            5                               1
4          John  2012-08-06           15                               2
5          John  2015-05-07           20                               1

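A customer with a single order gets rank 1. When a customer has several orders, they are ranked by value, and ties are broken in favor of the most recent order date. In this example, John's 2012-08-06 order gets rank 2 because it was placed after the 2010-11-01 order of the same value. The 2015-05-07 order gets rank 1 because it is the largest. So even if that order had been placed 20 years ago, it would still be rank 1, because it is John's highest order value.

Does anyone know how I can do this in R? Can I rank within groups of a specified variable in a data frame?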

- Saul Feliz

@akrun What about ties in the values? - Señor O

1

Here is the code to create the data frame, for convenience: customer_name <- c("John","Bob","Alex","John","John"); order_dates <- as.Date(c('2010-11-1','2008-3-25','2009-11-15','2012-8-6','2015-5-7')); order_values <- c(15,12,5,15,20); test_data <- data.frame(customer_name,order_dates,order_values); - Saul Feliz

2

@SenorO, the OP's example would need to be a bit more complex to test that. Also, dense_rank in dplyr is one way to handle ties. - akrun

@akrun: ties in value are broken by order date. So John has two $15 orders, and the one placed later ranks higher. - Saul Feliz

Possibly with data.table: sort by order_values and order_dates in descending order, then assign a rnk value within each customer_name. - akrun
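
The sort-then-number idea in the comment above could look roughly like this. It is only a sketch: it assumes the data.table package is installed, rebuilds test_data from the earlier comment, and uses dt and rnk as illustrative names. Note that setorder reorders the rows in place.

```r
library(data.table)

# Rebuild the example data from the comment above.
test_data <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

dt <- as.data.table(test_data)

# Sort by customer, then highest value first, then latest date first.
setorder(dt, customer_name, -order_values, -order_dates)

# Within each customer, the row position is now the rank.
dt[, rnk := seq_len(.N), by = customer_name]
dt
```

If the original row order matters, an index column can be added before sorting and used to restore it afterwards.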

6 Answers

24

You can do this with dplyr.

library(dplyr)
df %>%
    group_by(customer_name) %>%
    mutate(my_ranks = order(order(order_values, order_dates, decreasing=TRUE)))

Source: local data frame [5 x 4]
Groups: customer_name

  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1

- cdeterman

3

This is incorrect. @T.Himmel provided the correct answer. - syre

7

This can be done with ave and rank. ave passes the appropriate groups to rank. The result of rank is reversed because of the requested ordering:

with(x, ave(as.numeric(order_dates), customer_name, FUN=function(x) rev(rank(x))))
## [1] 3 1 1 2 1
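
One caveat worth sketching: rev() reverses the positions of the rank vector, so the line above yields descending ranks only when each customer's rows are already in ascending date order, as they happen to be in the example. A variant that negates the input to rank does not rely on row order. The dates vector below is made up for illustration and is deliberately unsorted.

```r
# Three of John's order dates, deliberately out of order.
dates <- as.numeric(as.Date(c("2012-08-06", "2010-11-01", "2015-05-07")))

rev(rank(dates))   # 3 1 2: the middle date wrongly ends up last
rank(-dates)       # 2 3 1: the latest date gets rank 1
```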

- Matthew Lundberg

2

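library(dplyr)
df %>% 
  group_by(customer_name) %>% 
  # highest value first; latest date first among equal values
  arrange(customer_name, desc(order_values), desc(order_dates)) %>% 
  # after sorting, the row position within each customer is the rank
  mutate(rank2 = row_number())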

- Spandan Pan

1

In base R, you can do this with the slightly clunky approach below.

# order() is applied twice so that positions are turned into ranks
transform(df,rank=ave(seq_len(nrow(df)),customer_name,
  FUN=function(x) order(order(order_values[x],order_dates[x],decreasing=TRUE))))

  customer_name order_dates order_values rank
1          John  2010-11-01           15    3
2           Bob  2008-03-25           12    1
3          Alex  2009-11-15            5    1
4          John  2012-08-06           15    2
5          John  2015-05-07           20    1

Here order is given both the primary and the tie-breaker values for each group.

- A. Webb

0

Similar to @t-himmel's answer, you can get the ranks with data.table.

dt[ , rnk := order(order(order_values, decreasing = TRUE)), customer_name ]

- andschar

- T. Himmel · Accepted Answer

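The top-rated answer (by cdeterman) is actually incorrect. The order function returns the positions of the 1st, 2nd, 3rd, etc. ranked values, not the ranks of the values in their current order.

Let's start with a simple example where we want to rank within each customer name group, starting from the largest value. I have included a manual ranking so that we can check the values.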
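
A tiny vector makes the distinction concrete. This is an illustrative sketch with made-up values, not part of the original data.

```r
x <- c(10, 30, 20)

# Positions of the largest, second largest, ... values.
order(x, decreasing = TRUE)          # 2 3 1

# Rank of each value where it currently sits (1 = largest).
order(order(x, decreasing = TRUE))   # 3 1 2

# rank() on the negated values agrees when there are no ties.
rank(-x)                             # 3 1 2
```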

    > df
       customer_name order_values manual_rank
    1           John            2           5
    2           John            5           2
    3           John            9           1
    4           John            1           6
    5           John            4           3
    6           John            3           4
    7           Lucy            4           4
    8           Lucy            9           1
    9           Lucy            6           3
    10          Lucy            2           6
    11          Lucy            8           2
    12          Lucy            3           5

If I run the code suggested by cdeterman, I get the following incorrect ranks:

    > df %>%
    +   group_by(customer_name) %>%
    +   mutate(my_ranks = order(order_values, decreasing=TRUE))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank my_ranks
              <fctr>        <dbl>       <dbl>    <int>
    1           John            2           5        3
    2           John            5           2        2
    3           John            9           1        5
    4           John            1           6        6
    5           John            4           3        1
    6           John            3           4        4
    7           Lucy            4           4        2
    8           Lucy            9           1        5
    9           Lucy            6           3        3
    10          Lucy            2           6        1
    11          Lucy            8           2        6
    12          Lucy            3           5        4

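order is meant for re-sorting a data frame into ascending or descending order. What we actually want is to apply order twice: the second order call gives us the actual ranks we are after.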

    > df %>%
    +   group_by(customer_name) %>%
    +   mutate(good_ranks = order(order(order_values, decreasing=TRUE)))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank good_ranks
              <fctr>        <dbl>       <dbl>      <int>
    1           John            2           5          5
    2           John            5           2          2
    3           John            9           1          1
    4           John            1           6          6
    5           John            4           3          3
    6           John            3           4          4
    7           Lucy            4           4          4
    8           Lucy            9           1          1
    9           Lucy            6           3          3
    10          Lucy            2           6          6
    11          Lucy            8           2          2
    12          Lucy            3           5          5
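
Tying this back to the original question, the same double order idiom can take order_dates as a second key to break ties. The sketch below uses only base R. test_data mirrors the question's data, and rank_within and rnk are illustrative names, not part of any answer above.

```r
# Rebuild the data frame from the question.
test_data <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# For the row indices of one customer, rank by value (largest first),
# breaking ties by date (latest first). The outer order() turns
# positions into ranks.
rank_within <- function(idx) {
  order(order(test_data$order_values[idx],
              test_data$order_dates[idx],
              decreasing = TRUE))
}

# ave() splits the row indices by customer and applies rank_within to each.
test_data$rnk <- ave(seq_len(nrow(test_data)),
                     test_data$customer_name,
                     FUN = rank_within)
test_data$rnk   # 3 1 1 2 1
```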