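I am trying to compute grouped values over different columns, but I am not sure that the way I am using group_by is the best approach. Is there a way to do the grouping inline?
I know this can be done with the data.table package, whose syntax has the form DT[i, j, by].
But since this is only a small part of a larger codebase that works well with the tidyverse, I would rather not move away from it.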
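For reference, the data.table form mentioned above might look like the sketch below. It uses a small hypothetical table (`DT`, with made-up values) rather than the data from this question, and only illustrates how `by` attaches to each individual operation.

```r
library(data.table)

# Hypothetical toy data, not the data from the question
DT <- data.table(
  state  = c("OH", "OH", "IL", "IL"),
  county = c("A-OH", "B-OH", "A-IL", "A-IL"),
  sales  = c(100, 300, 50, 150)
)

# DT[i, j, by]: i is left empty (all rows), j adds a column by reference,
# and by names the grouping used for that one operation only
DT[, saleInState := sum(sales), by = state]
DT[, saleInCounty := sum(sales), by = county]
DT[, salePerCountyPercent := saleInCounty / saleInState]
```

Each line carries its own grouping, so there is nothing to ungroup between steps.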
## Creating Sample Data Frame
library(tidyverse)  # provides %>%, str_c() and bind_cols()
state <- rep(c("OH", "IL", "IN", "PA", "KY"), 10)
county <- sample(LETTERS[1:5], 50, replace = TRUE) %>% str_c(state, sep = "-")
# sample() draws from the given range; sample.int() expects a single count n
customers <- sample(50:100, 50)
sales <- sample(500:5000, 50)
df <- bind_cols(data.frame(state, county, customers, sales))
## workflow
df2 <- df %>%
  group_by(state) %>%
  mutate(customerInState = sum(customers),
         saleInState = sum(sales)) %>%
  ungroup() %>%
  group_by(county) %>%
  mutate(customerInCounty = sum(customers),
         saleInCounty = sum(sales)) %>%
  ungroup() %>%
  mutate(salePerCountyPercent = saleInCounty / saleInState,
         customerPerCountyPercent = customerInCounty / customerInState) %>%
  group_by(state) %>%
  mutate(minSale = min(salePerCountyPercent)) %>%
  ungroup()
I would like my code to look like this
df3 <- df %>%
  mutate(customerInState = sum(customers, by = state),
         saleInState = sum(sales, by = state),
         customerInCounty = sum(customers, by = county),
         saleInCounty = sum(sales, by = county),
         salePerCountyPercent = saleInCounty / saleInState,
         customerPerCountyPercent = customerInCounty / customerInState,
         minSale = min(salePerCountyPercent, by = state))
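The program does not throw an error, but I know the output is incorrect.
I understand that by rearranging the mutate calls I could probably get the result I need with fewer group_by calls. But the question is whether there is a way to do inline grouping (group by) in dplyr.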
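One candidate for this kind of inline grouping, assuming dplyr 1.1.0 or later is available, is the `.by` argument of `mutate()`. It applies to a whole `mutate()` call rather than to a single expression, so it is not exactly the syntax wished for above. The sketch below uses a small hypothetical data frame (`toy`, with made-up values) in place of `df`.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where mutate() accepts .by

# Hypothetical toy data, not the data from the question
toy <- tibble(
  state     = c("OH", "OH", "IL", "IL"),
  county    = c("A-OH", "B-OH", "A-IL", "A-IL"),
  customers = c(10, 30, 5, 15),
  sales     = c(100, 300, 50, 150)
)

res <- toy %>%
  # .by groups for this mutate() call only; the result comes back ungrouped
  mutate(customerInState = sum(customers),
         saleInState = sum(sales), .by = state) %>%
  mutate(customerInCounty = sum(customers),
         saleInCounty = sum(sales), .by = county) %>%
  mutate(salePerCountyPercent = saleInCounty / saleInState,
         customerPerCountyPercent = customerInCounty / customerInState) %>%
  mutate(minSale = min(salePerCountyPercent), .by = state)
```

Because the grouping never persists past the call that sets it, no `group_by()` / `ungroup()` pairs are needed, although each distinct grouping still requires its own `mutate()` call.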
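df <- data.frame(state, county, customers, sales)
is enough to put the state, county, customers, and sales data into one data frame. - Rui Barradas
ungroup - when you group, the previous grouping is removed automatically. - January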
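The point in the last comment can be checked with `group_vars()`. This is a minimal sketch on a hypothetical three-row data frame (`toy`); it only inspects which grouping variables are active after each step.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- tibble(
  state  = c("OH", "OH", "IL"),
  county = c("A-OH", "B-OH", "A-IL")
)

by_state  <- toy %>% group_by(state)
# Grouping again without ungroup() replaces the earlier grouping
regrouped <- by_state %>% group_by(county)

group_vars(by_state)   # "state"
group_vars(regrouped)  # "county"

# The earlier grouping is kept only when asked for explicitly
both <- by_state %>% group_by(county, .add = TRUE)
group_vars(both)       # "state" "county"
```

So the intermediate `ungroup()` calls between two `group_by()` steps in the workflow above do not change the result; only a final `ungroup()` matters if the output should be ungrouped.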