Row-wise sum using a vector of column names in dplyr

Question

8

Once again I am confused about how to achieve this.

Given the following data frame:

df <- tibble(
  foo = c(1,0,1),
  bar = c(1,1,1),
  foobar = c(0,1,1)
)

and this vector:

to_sum <- c("foo", "bar")

I would like to get the row-wise sum of the values in the columns named in to_sum.

Desired output:

# A tibble: 3 x 4
# Rowwise: 
    foo   bar foobar   sum
  <dbl> <dbl>  <dbl> <dbl>
1     1     1      0     2
2     0     1      1     1
3     1     1      1     2

Typing the column names out works (obviously):

df %>% rowwise() %>% 
  mutate(
    sum = sum(foo, bar)
  )

This does not work:

df %>% rowwise() %>% 
  mutate(
    sum = sum(to_sum)
  )

I understand why, because it is the same as if I had tried:

df %>% rowwise() %>% 
  mutate(
    sum = sum("foo", "bar")
  )

How can I compute the row-wise sum from a vector of column names?

- MKR

5 Answers

6

I think you are looking for rlang::syms, which converts the strings to symbols that can then be spliced into the call with !!!:

library(dplyr)
library(rlang)
df %>% 
  rowwise() %>% 
  mutate(
    sum = sum(!!!syms(to_sum))
  )
#     foo   bar foobar   sum
#   <dbl> <dbl>  <dbl> <dbl>
# 1     1     1      0     2
# 2     0     1      1     1
# 3     1     1      1     2
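
To see why the splice operator is needed, it can help to look at what syms() returns. The following is a minimal sketch, assuming the df and to_sum from the question: syms() produces a list of symbols, and !!! unpacks that list into separate arguments, so the call should behave like sum(foo, bar) typed by hand. The intermediate names cols, spliced and typed are only for illustration.

```r
library(dplyr)
library(rlang)

df <- tibble(
  foo = c(1, 0, 1),
  bar = c(1, 1, 1),
  foobar = c(0, 1, 1)
)
to_sum <- c("foo", "bar")

# syms() turns each string into a symbol and returns them as a list
cols <- syms(to_sum)

# !!! splices the list into the call, so each row is evaluated as sum(foo, bar)
spliced <- df %>%
  rowwise() %>%
  mutate(sum = sum(!!!cols)) %>%
  ungroup()

# Typing the column names by hand should give the same column
typed <- df %>%
  rowwise() %>%
  mutate(sum = sum(foo, bar)) %>%
  ungroup()

identical(spliced$sum, typed$sum)
```

A single !! would try to inject the whole list as one argument, which sum() cannot add up; the triple form is what spreads the symbols out.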

- user63230

1

concise and elegant. - Anoushiravan R

This is exactly what I was looking for. I tried !!syms() but forgot that it takes three exclamation marks. - MKR

3

You need to use c_across and any_of. This is the usage the RStudio team intends: see vignette("rowwise", package = "dplyr").

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(sum = sum(c_across(any_of(to_sum))))

#> # A tibble: 3 x 4
#> # Rowwise: 
#>     foo   bar foobar   sum
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     1     1      0     2
#> 2     0     1      1     1
#> 3     1     1      1     2

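c_across is specific to row-wise operations.

any_of is used so that to_sum is interpreted as a character vector of column names. The code also works without it, but wrapping the vector is generally recommended.

You may want to call ungroup() at the end to remove the rowwise grouping.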
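
The points above can be sketched as follows, assuming the df and to_sum from the question. The vector with_typo and its "baz" entry are hypothetical and exist only to show how the two selection helpers differ: any_of() skips names that are not columns, whereas all_of() raises an error.

```r
library(dplyr)

df <- tibble(
  foo = c(1, 0, 1),
  bar = c(1, 1, 1),
  foobar = c(0, 1, 1)
)
to_sum <- c("foo", "bar")

# ungroup() drops the rowwise grouping so later verbs work on whole columns again
res <- df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(all_of(to_sum)))) %>%
  ungroup()

# "baz" is a hypothetical name that is not a column of df
with_typo <- c("foo", "baz")

# any_of() ignores the missing name, so only foo is summed
lenient <- df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(any_of(with_typo)))) %>%
  ungroup()

# all_of() stops with an error when a name is missing; catch it for inspection
strict <- tryCatch(
  df %>%
    rowwise() %>%
    mutate(sum = sum(c_across(all_of(with_typo)))),
  error = function(e) "error"
)
```

Which helper fits depends on whether a misspelled column name should be tolerated or reported.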

- Edo

3

This might help you:

library(dplyr)
library(purrr)
library(rlang)

df %>%
  bind_cols(parse_exprs(to_sum) %>%
              map_dfc(~ eval_tidy(.x, data = df)) %>%
              rowSums()) %>%
  rename(sum = ...4)

# A tibble: 3 x 4
    foo   bar foobar   sum
  <dbl> <dbl>  <dbl> <dbl>
1     1     1      0     2
2     0     1      1     1
3     1     1      1     2

- Anoushiravan R

2

You could also consider rowSums:

df %>% 
   mutate(sum = rowSums(across(all_of(to_sum))))

# A tibble: 3 x 4
    foo   bar foobar   sum
  <dbl> <dbl>  <dbl> <dbl>
1     1     1      0     2
2     0     1      1     1
3     1     1      1     2
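
Because rowSums() works on the whole selection at once, no rowwise() grouping is involved. Below is a small sketch of two variations, assuming the to_sum vector from the question; the NA in foo is introduced here purely for illustration and is not part of the original data.

```r
library(dplyr)

# Same shape as the question's data, with one NA added for illustration
df <- tibble(
  foo = c(1, 0, NA),
  bar = c(1, 1, 1),
  foobar = c(0, 1, 1)
)
to_sum <- c("foo", "bar")

# na.rm = TRUE is passed to rowSums(), so the NA is skipped instead of propagated
with_na <- df %>%
  mutate(sum = rowSums(across(all_of(to_sum)), na.rm = TRUE))

# Base R equivalent: index the columns by name and sum across rows
base_sum <- rowSums(df[to_sum], na.rm = TRUE)
```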

- Onyambu

A nice solution indeed. I accepted the !!!syms() solution, but I like that this one does not need rowwise(). Thanks. - MKR

- Sam Firke · Accepted Answer

library(janitor)
df %>%
  adorn_totals("col",,,"sum",to_sum)

 foo bar foobar sum
   1   1      0   2
   0   1      1   1
   1   1      1   2

Why the ,,,?

If you look at ?adorn_totals, you will see its arguments:

adorn_totals(dat, where = "row", fill = "-", na.rm = TRUE, name = "Total", ...)

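The final ... controls column selection. Unfortunately, there is no direct way to tell R that to_sum should be matched to that ... argument, so in this answer the empty positions between the commas tell it to use the default values for fill and na.rm. With where and name given explicitly, every argument other than ... then has a value, so to_sum is applied to ....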
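
The argument matching described above can be tried out without janitor. In this sketch, show_args() is a hypothetical stand-in that only copies the argument order of adorn_totals() and returns what each argument received; it is not part of any package.

```r
# Hypothetical helper with the same argument order as adorn_totals()
show_args <- function(dat, where = "row", fill = "-", na.rm = TRUE,
                      name = "Total", ...) {
  list(where = where, fill = fill, na.rm = na.rm, name = name,
       dots = list(...))
}

to_sum <- c("foo", "bar")

# Empty positions keep their defaults, so to_sum ends up in ...
positional <- show_args(NULL, "col", , , "sum", to_sum)

# Naming only some arguments does not help: the unnamed to_sum is matched
# to the next free argument (fill) instead of ...
misplaced <- show_args(NULL, where = "col", name = "sum", to_sum)

positional$dots
misplaced$fill
```

Spelling out every preceding argument by name, defaults included, is the other way to make sure an unnamed value reaches the dots.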

For further discussion of this topic, see: Specify the dots argument when calling a tidyselect-using function without needing to specify the preceding arguments