Split-apply-combine using a function or purrr's pmap?


This is a big problem for me to solve. If I had enough reputation to offer a bounty, I would!

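I'm looking to balance the territories of sales reps' accounts. I have broken the process down into steps, but I don't know how to run it over each region.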

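In this example, there are 1000 accounts spread across 4 regions, with 2 leagues per region and various account owners (some accounts have no owner). Each account has a random value between 1,000 and 100,000.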

Reproducible example:

Account list:

library(dplyr)

set.seed(1)
Accounts <- paste0("Acc", 1:1000)
Region <- c("NorthEast", "SouthEast", "MidWest", "West")
League <- sample(c("Majors", "Minors"), 1000, replace = TRUE)
AccValue <- sample(1000:100000, 1000, replace = TRUE)
Owner <- sample(c("Chad", NA, "Jimmy", "Adrian", NA, NA, "Steph", "Matt", "Jared", "Eric"), 1000, replace = TRUE)
AccDF <- data.frame(Accounts, Region, League, AccValue, Owner)
AccDF$Accounts <- as.character(AccDF$Accounts)
AccDF$Region <- as.character(AccDF$Region)
AccDF$League <- as.character(AccDF$League)
AccDF$Owner <- as.character(AccDF$Owner)

Territory ownership summary:

Summary <- AccDF %>%
  group_by(Region, League, Owner) %>%
  summarise(Count = n(),
            TotalValue = sum(AccValue))

Summary by region and league:

Summary2 <- AccDF %>%
  group_by(Region, League) %>%
  summarise(Count = n(),
            TotalValue = sum(AccValue),
            AccountsPerRep = round(Count / 7, 0),
            ValuePerRep = TotalValue / 7)

That is all of the starting data. I want to do the following to each group in the Summary2 table.
West Minors example:
Total West Minors accounts: 120
#break out into owned and unowned

WestMinorsOwned <- AccDF %>%
  filter(Region == "West",
         League == "Minors",
         !is.na(Owner))

WestMinorsUnowned <- AccDF %>%
  filter(Region == "West",
         League == "Minors",
         is.na(Owner))

#unassign accounts until threshold is hit

New.WestMinors <- WestMinorsOwned %>% 
  mutate(r = runif(n())) %>% 
  arrange(r) %>% 
  group_by(Owner) %>% 
  mutate(NewOwner = replace(Owner, cumsum(AccValue) > 600000 | row_number() > 14, NA)) %>% 
  ungroup(Owner) %>%
  mutate(Owner = NewOwner) %>%
  select(-r, -NewOwner)

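After the owners are updated, we bind the pieces back together: the accounts that never had an owner plus the ones just unassigned form the pool of West Minors accounts to redistribute, which will hopefully let us balance things out.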

AssignableWestMinors <- bind_rows(filter(AccDF, Region == "West" & League == "Minors" & is.na(Owner)), 
                                  filter(New.WestMinors, is.na(Owner))) %>%
  arrange(desc(AccValue))

#check work
OwnerSummary <- New.WestMinors %>%
  filter(!is.na(Owner)) %>%
  group_by(Region, League, Owner) %>%
  summarise(Count = n(), TotalValue = sum(AccValue))

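No one has more than 14 accounts or more than $600,000, so we can start reassigning the unowned accounts to try to balance everyone out. The for loop below checks the running total for each name in OwnerSummary, assigns the most valuable remaining account to the owner with the lowest total, and then moves on through each account in turn, trying to even out each owner's share.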
#Balance Unassigned

# seq_len() avoids iterating over c(1, 0) when there is nothing to assign
for (i in seq_len(nrow(AssignableWestMinors))) {
  idx <- which.min(OwnerSummary$TotalValue)
  OwnerSummary$TotalValue[idx] <- OwnerSummary$TotalValue[idx] + AssignableWestMinors$AccValue[i]
  OwnerSummary$Count[idx] <- OwnerSummary$Count[idx] + 1
  AssignableWestMinors$Owner[i] <- as.character(OwnerSummary$Owner[idx])
}

Now we just need to bind the previously owned and the newly assigned accounts together to get our finished, balanced West Minors territory.
WestMinors.Final <- bind_rows(filter(New.WestMinors, !is.na(Owner)), AssignableWestMinors)

WM.Summary <- WestMinors.Final %>%
  group_by(Region, League, Owner) %>%
  summarise(Count = n(),
            TotalValue = sum(AccValue))

Everyone has roughly the same number of accounts, and the total values are all within a reasonable range of each other.

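Now I want to do this for every group in the original 4 regions and 2 leagues. So the process needs to be repeated 8 times and all the results stitched back together. Each sub-group will have a different target $$ value and account count threshold. How do I split the original account base into 8 pieces, apply all of these steps, and then combine everything back together?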


Maybe try wrapping everything in split(AccDF, paste(AccDF$Region, AccDF$League, sep = ".")) %>% lapply({ # your code here }) %>% bind_rows()? - Aurèle
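
The pattern suggested in the comment above can be sketched as follows. This is only an illustration on toy data: `toy` and `process_group` are hypothetical stand-ins (not from the original post) for AccDF and the real per-group logic. Note that lapply() needs a function rather than a bare `{ }` block.

```r
library(dplyr)

# Toy data standing in for AccDF (values are made up for illustration)
toy <- data.frame(
  Region   = rep(c("West", "NorthEast"), each = 4),
  League   = rep(c("Majors", "Minors"), times = 4),
  AccValue = c(10, 20, 30, 40, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Hypothetical placeholder for the per-group logic:
# here it just computes each account's share of its group's total value
process_group <- function(dta) {
  dta %>% mutate(ShareOfGroup = AccValue / sum(AccValue))
}

# Split into one data frame per Region-League pair, apply, then recombine
result <- split(toy, paste(toy$Region, toy$League, sep = ".")) %>%
  lapply(process_group) %>%
  bind_rows()
```

Each element of the split list is an ordinary data frame, so any function that takes a data frame and returns one should slot in where `process_group` is.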
1 Answer

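You should take advantage of ?dplyr::do to run the split-apply-combine over the Region-League subsets. First, functionalize your logic so that it can run on a data frame dta, which represents a subsetted version of the main data frame AccDF.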
reAssign <- function(dta) {
  other_acct <- dta %>% 
    filter(!is.na(Owner)) %>% 
    mutate(r = runif(n())) %>% 
    arrange(r) %>% 
    group_by(Owner) %>% 
    mutate(NewOwner = replace(Owner, cumsum(AccValue) > 600000 | row_number() > 14, NA)) %>% 
    ungroup(Owner) %>%
    mutate(Owner = NewOwner) %>%
    select(-r, -NewOwner)

  assignable_acct <- other_acct %>% 
    filter(is.na(Owner)) %>% 
    bind_rows( filter(dta, is.na(Owner)) ) %>% 
    arrange(desc(AccValue))

  acct_summary <- other_acct %>%
    filter(!is.na(Owner)) %>%
    group_by(Owner) %>%
    summarise(Count = n(), TotalValue = sum(AccValue))

  # I have a feeling there's a much better way of doing this, but oh well...  
  for (i in seq_len(nrow(assignable_acct))) {  # seq_len() is safe when there are 0 rows
    idx <- which.min(acct_summary$TotalValue)
    acct_summary$TotalValue[idx] <- acct_summary$TotalValue[idx] + assignable_acct$AccValue[i]
    acct_summary$Count[idx] <- acct_summary$Count[idx] + 1
    assignable_acct$Owner[i] <- as.character(acct_summary$Owner[idx])
  }
  final <- other_acct %>% 
    filter(!is.na(Owner)) %>% 
    bind_rows(assignable_acct)

  return(final)
}

Then just apply it to AccDF grouped by Region and League.
new_master <- AccDF %>% 
  group_by(Region, League) %>% 
  do( reAssign(.) ) %>% 
  ungroup() 

Check to make sure it has done its job...

new_master %>% 
  group_by(Region, League, Owner) %>%
  summarise(Count = n(),
            TotalValue = sum(AccValue)) %>% 
  as.data.frame()
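
As a side note, do() is marked as superseded in dplyr 1.0.0 and later. One possible alternative is group_modify(), sketched below on toy data. `toy` and `keep_top` are hypothetical stand-ins (not from the original post) for AccDF and reAssign(). This assumes the per-group function does not need the Region and League columns, because group_modify() passes each subset without its grouping columns and adds them back afterwards.

```r
library(dplyr)

# Toy data standing in for AccDF (values are made up for illustration)
toy <- data.frame(
  Region   = rep(c("West", "NorthEast"), each = 4),
  League   = rep(c("Majors", "Minors"), times = 4),
  AccValue = c(10, 20, 30, 40, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for reAssign(): keeps the highest-value row of a subset
keep_top <- function(dta) {
  dta %>% arrange(desc(AccValue)) %>% slice(1)
}

# .x is the subset for one Region-League group, minus the grouping columns
toy_result <- toy %>%
  group_by(Region, League) %>%
  group_modify(~ keep_top(.x)) %>%
  ungroup()
```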

I'll give this a try. Thanks so much! - Matt W.
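So I guess my only question is how to update the individual $$ and count thresholds in the first part of the reAssign function. I will be able to get a summary table from the full dta that shows the count and $$ thresholds for each Region-League group. Can I reference that table on each application of the function? - Matt W.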
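
One way this might be approached is to pass the thresholds table as a second argument and look up the row for the current group inside the function. The sketch below uses toy data and covers only the "unassign" step. The names `toy`, `thresholds`, `MaxValue`, `MaxCount`, and `cap_owners` are hypothetical and do not come from the original post. It relies on do() passing each subset with its Region and League columns intact.

```r
library(dplyr)

# Toy data standing in for AccDF (values are made up for illustration)
toy <- data.frame(
  Region   = rep(c("West", "NorthEast"), each = 4),
  League   = "Minors",
  Owner    = c("Chad", "Chad", "Chad", "Jimmy", "Chad", "Chad", "Jimmy", "Jimmy"),
  AccValue = c(10, 20, 30, 40, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Hypothetical per-group thresholds: one row per Region-League pair
thresholds <- data.frame(
  Region   = c("West", "NorthEast"),
  League   = c("Minors", "Minors"),
  MaxValue = c(100, 25),
  MaxCount = c(2, 5),
  stringsAsFactors = FALSE
)

# Unassign accounts once an owner exceeds this group's value or count limit
cap_owners <- function(dta, thresholds) {
  # Look up the single threshold row matching this subset's group
  th <- thresholds %>%
    filter(Region == dta$Region[1], League == dta$League[1])
  dta %>%
    group_by(Owner) %>%
    mutate(NewOwner = replace(Owner,
                              cumsum(AccValue) > th$MaxValue |
                                row_number() > th$MaxCount,
                              NA)) %>%
    ungroup() %>%
    mutate(Owner = NewOwner) %>%
    select(-NewOwner)
}

capped <- toy %>%
  group_by(Region, League) %>%
  do(cap_owners(., thresholds)) %>%
  ungroup()
```

The same lookup should carry over to the full reAssign() function by replacing the hard-coded 600000 and 14 with the looked-up values.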
