Select the top n values in each group, where n depends on another value in the data frame

Question

I'm fairly new to R and to programming in general, so any help is greatly appreciated :)

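I'm trying to select the top n values by group from my data frame, where n depends on another value (called factor below). The selected values should then be summarised per group by calculating their mean (d100). My goal is to get exactly one d100 value per group.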
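The procedure described above (sort within each group, keep the n largest values, average them) can be sketched in base R. The `trees` data and all its numbers are hypothetical, chosen so that plot 1 has more trees than its n; rounding `100 * factor` to a whole number of trees is also an assumption, not something the question specifies.

```r
# Hypothetical toy data: plot 1 has radius 15 m, plot 2 radius 10 m
trees <- data.frame(
  group  = rep(c(1, 2), times = c(10, 5)),
  BHD    = c(42, 18, 35, 27, 50, 31, 22, 39, 45, 28,   # plot 1
             24, 33, 29, 40, 21),                      # plot 2
  factor = rep(c(pi * (15/100)^2, pi * (10/100)^2), times = c(10, 5))
)

# For every group: n = 100 * factor (rounded, an assumption), then the mean of the n largest BHD
d100_base <- sapply(split(trees, trees$group), function(g) {
  n <- round(100 * g$factor[1])                  # factor is constant within a group
  mean(head(sort(g$BHD, decreasing = TRUE), n))  # mean of the n thickest trees
})
d100_base
```

Here plot 1 keeps 7 of its 10 trees (d100 = 270/7, about 38.57) and plot 2 keeps 3 of its 5 trees (d100 = 34).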

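(Background: in forestry there is a metric called d100, the mean diameter of the 100 thickest trees per hectare. If the sampled area is smaller than 1 ha, correspondingly fewer trees have to be selected to calculate d100. That's what factor is for.)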
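Read this way, factor looks like the plot area in hectares. Assuming the sample radii 15, 20 and 25 are metres, π·(r/100)² equals π·r²/10 000, i.e. the area of a circular plot converted from m² to ha, so 100 × factor is the number of trees that count towards d100 on that plot. A small sketch of that arithmetic (the radius interpretation is an assumption):

```r
radius_m <- c(15, 20, 25)            # assumed: plot radii in metres
area_ha  <- pi * radius_m^2 / 10000  # circle area converted to hectares (1 ha = 10,000 m^2)
n_trees  <- 100 * area_ha            # 100 trees per ha, scaled down to the plot size
round(n_trees, 2)
```

This gives roughly 7.07, 12.57 and 19.63 trees. How a fractional n is turned into a whole number of trees (rounding, floor or ceiling) depends on the inventory convention and is not stated in the question.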

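First I tried putting factor into my data frame as a column of its own. Then I thought something like a "lookup table" might help, because R told me that n must be a single number. But I don't know how to build such a lookup. (See the last part of the sample code.) Alternatively, would summarising df$factor before using it solve the problem?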
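Summarising factor to one value per group does address the "single number" requirement, because factor is constant within each group and any one element represents it. A base-R sketch that first checks this assumption and then derives one n per group, using the group and factor columns of the sample data (rounding to whole trees is an assumption):

```r
# group and factor columns as in the question's sample data (BHD left out)
df <- data.frame(group  = c(rep(1, 5), rep(2, 8), rep(3, 10)),
                 factor = c(rep(pi * (15/100)^2, 5), rep(pi * (20/100)^2, 8),
                            rep(pi * (25/100)^2, 10)))

# Check that factor really is constant within each group
stopifnot(all(tapply(df$factor, df$group, function(x) length(unique(x))) == 1))

# One n per group: 100 * factor, rounded to whole trees
n_per_group <- tapply(df$factor, df$group, function(x) round(100 * x[1]))
n_per_group
```

This yields n = 7, 13 and 20 for groups 1, 2 and 3. Such a per-group n still cannot be passed to `slice_max()` as one value, but it can be compared against a within-group rank, for example in a grouped `filter()`.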

Sample data:

(Expressions I'm not sure how to write in R are marked like this: 'I dont know how')

# creating sample data
library(tidyverse)

df <- data.frame(group = c(rep(1, each = 5), rep(2, each = 8), rep(3, each = 10)),
                 BHD = c(rnorm(23, mean = 30, sd = 5)),
                 factor = c(rep(pi*(15/100)^2, each = 5), rep(pi*(20/100)^2, each = 8), rep(pi*(25/100)^2, each = 10))
                )

# group by ID, then select top_n values of df$BHD with n depending on value of df$factor
df %>% 
  group_by(group) %>% 
  slice_max(
    BHD, 
    n = 100*df$factor, 
    with_ties = F) %>% 
  summarise(d100 = mean('sliced values per group'))

# other thought: having a "lookup-table" for the factor like this:
lt <- data.frame(group = c(1, 2, 3),
                 factor = c(pi*(15/100)^2, pi*(20/100)^2, pi*(25/100)^2))

# then
df %>% 
  group_by(group) %>% 
  slice_max(
    BHD, 
    n = 100*lt$factor 'where lt$group == df$group', 
    with_ties = F) %>% 
  summarise(d100 = mean('sliced values per group'))
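The lookup-table idea can work if `lt` is joined onto `df` first, so that every row carries its group's factor; a grouped `filter()` can then compare a within-group rank against the per-group n. A sketch assuming dplyr 1.0 or later: `df` is built without its own factor column to avoid a name clash in the join, `set.seed(42)` is added only for reproducibility, rounding `100 * factor` is an assumption, and `row_number()` breaks ties by row order, mirroring `with_ties = F`.

```r
library(dplyr)

# Sample data as in the question, minus the factor column
set.seed(42)
df <- data.frame(group = c(rep(1, 5), rep(2, 8), rep(3, 10)),
                 BHD   = rnorm(23, mean = 30, sd = 5))

# Lookup table: one factor value per group
lt <- data.frame(group  = c(1, 2, 3),
                 factor = c(pi*(15/100)^2, pi*(20/100)^2, pi*(25/100)^2))

d100 <- df %>%
  left_join(lt, by = "group") %>%   # copy each group's factor onto its rows
  group_by(group) %>%
  # rank 1 = thickest tree; keep ranks up to the group's rounded 100 * factor
  filter(row_number(desc(BHD)) <= round(100 * first(factor))) %>%
  summarise(d100 = mean(BHD), n_used = n())   # one d100 per group
d100
```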

I have already found this answer, which seems similar to my problem, but it didn't quite solve it for me.

- kate99

Does this help? https://dev59.com/WWcs5IYBdhLWcg3wWCpB#50906379 - hannes101

1 Answer

Content provided by Stack Overflow.

- Ronak Shah · Answer 1

Since all factor values within a group are identical, you can simply use any one of them, e.g. the first.

library(dplyr)

df %>%
  group_by(group) %>%
  # keep the rows whose BHD rank within the group is <= 100 * factor (ties are kept)
  top_n(n = 100 * first(factor), wt = BHD) %>%
  ungroup()

#   group   BHD factor
#   <dbl> <dbl>  <dbl>
# 1     1  25.8 0.0707
# 2     1  24.6 0.0707
# 3     1  27.6 0.0707
# 4     1  28.3 0.0707
# 5     1  29.2 0.0707
# 6     2  28.8 0.126 
# 7     2  39.5 0.126 
# 8     2  23.1 0.126 
# 9     2  27.9 0.126 
#10     2  31.7 0.126 
# … with 13 more rows
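The answer stops before the averaging step. A sketch of the full d100 calculation built on the same `top_n()` call, assuming dplyr 1.0 or later (where `top_n()` evaluates `n` within each group, which the answer relies on) and hypothetical toy data in which plot 1 has more trees than its n. `top_n()` keeps ties, so a group can contribute more than n trees, and a fractional n such as 7.07 acts like 7 because ranks are compared with `<=`. With the question's own sample data, 100 × factor is about 7, 13 and 20 while the groups contain only 5, 8 and 10 trees, which is why the output above still has all 23 rows.

```r
library(dplyr)

# Hypothetical toy data: plot 1 (radius 15 m) has more trees than its n of about 7
trees <- data.frame(
  group  = rep(c(1, 2), times = c(10, 5)),
  BHD    = c(42, 18, 35, 27, 50, 31, 22, 39, 45, 28,
             24, 33, 29, 40, 21),
  factor = rep(c(pi * (15/100)^2, pi * (10/100)^2), times = c(10, 5))
)

d100 <- trees %>%
  group_by(group) %>%
  top_n(n = 100 * first(factor), wt = BHD) %>%  # keep trees ranked <= 100 * factor
  summarise(d100 = mean(BHD), n_used = n())     # one d100 per group
d100
```

Plot 1 keeps 7 trees (d100 = 270/7) and plot 2 keeps 3 (d100 = 34).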