Formatting a tbl within a dplyr chain

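I'm trying to add a thousands separator to my data, e.g. 10,000, along with a dollar sign, e.g. $10,000.

I'm using several dplyr commands together with the tidyr gather and spread functions. Here is what I tried.

Copy and paste this code block to generate the random dataset "dataset" that I'm working with: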

library(dplyr)
library(tidyr)
library(lubridate)

## Generate some data
channels <- c("Facebook", "Youtube", "SEM", "Organic", "Direct", "Email")
last_month <- Sys.Date() %m+% months(-1) %>% floor_date("month")
mts <- seq(from = last_month %m+% months(-23), to = last_month, by = "1 month") %>% as.Date()
dimvars <- expand.grid(Month = mts, Channel = channels, stringsAsFactors = FALSE)

# metrics
rws <- nrow(dimvars)
set.seed(42)

# generates variability in the random data
randwalk <- function(initial_val, ...){
  initial_val + cumsum(rnorm(...))
}
Sessions <- ceiling(randwalk(3000, n = rws, mean = 8, sd = 1500)) %>% abs()
Revenue <- ceiling(randwalk(10000, n = rws, mean = 0, sd = 3500)) %>% abs()

# make primary df
dataset <- cbind(dimvars, Revenue)

It looks like this:

> tbl_df(dataset)
# A tibble: 144 × 3
        Month  Channel Revenue
       <date>    <chr>   <dbl>
1  2015-06-01 Facebook    8552
2  2015-07-01 Facebook   12449
3  2015-08-01 Facebook   10765
4  2015-09-01 Facebook    9249
5  2015-10-01 Facebook   11688
6  2015-11-01 Facebook    7991
7  2015-12-01 Facebook    7849
8  2016-01-01 Facebook    2418
9  2016-02-01 Facebook    6503
10 2016-03-01 Facebook    5545
# ... with 134 more rows

Now I want to spread the months into columns to show the month-by-month revenue trend by channel. That can be done like so:
revenueTable <- dataset %>% select(Month, Channel, Revenue) %>%
  group_by(Month, Channel) %>%
  summarise(Revenue = sum(Revenue)) %>%
  #mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%
  gather(Key, Value, -Channel, -Month) %>%
  spread(Month, Value) %>%
  select(-Key)

And it looks almost exactly how I want it:

> revenueTable
# A tibble: 6 × 25
   Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01` `2015-10-01` `2015-11-01` `2015-12-01` `2016-01-01` `2016-02-01` `2016-03-01` `2016-04-01`
*    <chr>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1   Direct        11910         8417         4012          359         4473         2702         6261         6167         8630         5230         1394
2    Email         7244         3517          671         1339        10788        10575         8567         8406         7856         6345         7733
3 Facebook         8552        12449        10765         9249        11688         7991         7849         2418         6503         5545         3908
4  Organic         4191          978          219         4274         2924         4155         5981         9719         8220         8829         7024
5      SEM         2344         6873        10230         6429         5016         2964         3390         3841         3163         1994         2105
6  Youtube          186         2949         2144         5073         1035         4878         7905         7377         2305         4556         6247
# ... with 13 more variables: `2016-05-01` <dbl>, `2016-06-01` <dbl>, `2016-07-01` <dbl>, `2016-08-01` <dbl>, `2016-09-01` <dbl>, `2016-10-01` <dbl>,
#   `2016-11-01` <dbl>, `2016-12-01` <dbl>, `2017-01-01` <dbl>, `2017-02-01` <dbl>, `2017-03-01` <dbl>, `2017-04-01` <dbl>, `2017-05-01` <dbl>

This is where I'm stuck. I want to format the data as currency. I tried adding the following to the chain, between summarise() and gather():

mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%

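This half works. The dollar sign is prepended, but the comma separators do not show up. I also tried dropping the paste0("$" part to see whether I could get the comma formatting to work on its own, with no success.

How can I format my tbl as currency, with dollar signs and commas, rounded to whole dollars (no $1.99, but $2)?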

2 Answers

I think you can do this at the very end with dplyr::mutate_at().

revenueTable %>% mutate_at(vars(-Channel), funs(. %>% round(0) %>% scales::dollar()))

#> # A tibble: 6 x 25
#>    Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01`
#>      <chr>        <chr>        <chr>        <chr>        <chr>
#> 1   Direct      $11,910       $8,417       $4,012         $359
#> 2    Email       $7,244       $3,517         $671       $1,339
#> 3 Facebook       $8,552      $12,449      $10,765       $9,249
#> 4  Organic       $4,191         $978         $219       $4,274
#> 5      SEM       $2,344       $6,873      $10,230       $6,429
#> 6  Youtube         $186       $2,949       $2,144       $5,073
#> # ... with 20 more variables: `2015-10-01` <chr>, `2015-11-01` <chr>,
#> #   `2015-12-01` <chr>, `2016-01-01` <chr>, `2016-02-01` <chr>,
#> #   `2016-03-01` <chr>, `2016-04-01` <chr>, `2016-05-01` <chr>,
#> #   `2016-06-01` <chr>, `2016-07-01` <chr>, `2016-08-01` <chr>,
#> #   `2016-09-01` <chr>, `2016-10-01` <chr>, `2016-11-01` <chr>,
#> #   `2016-12-01` <chr>, `2017-01-01` <chr>, `2017-02-01` <chr>,
#> #   `2017-03-01` <chr>, `2017-04-01` <chr>, `2017-05-01` <chr>
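
For readers on a newer dplyr: funs() was deprecated in dplyr 0.8.0, and across() was added in dplyr 1.0.0. The following is a minimal sketch of what the same step might look like with across(), assuming dplyr >= 1.0.0 and the scales package are installed. The small revenue_wide table is a stand-in built from the first rows shown above, not the full revenueTable.

```r
library(dplyr)

# Small stand-in for revenueTable (values copied from the output above)
revenue_wide <- tibble(
  Channel = c("Direct", "Email"),
  `2015-06-01` = c(11910, 7244),
  `2015-07-01` = c(8417, 3517)
)

# Round to whole dollars, then format every column except Channel
formatted <- revenue_wide %>%
  mutate(across(-Channel, ~ scales::dollar(round(.x))))

formatted
```

As in the output above, the formatted columns become character vectors, so it is probably best to keep this as the last step of the chain, after any arithmetic.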

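Hey, it works! Thanks. Any chance you could break this down? I had never come across mutate_at() and will look it up, but what does funs(.) mean? What is the dot? - Doug Fir
Yes, funs() holds a function that gets applied to all of the columns specified in vars(). The . in funs() stands in for the column that would otherwise appear in the mutate statement. For example, new_var = old_var %>% round(0) %>% scales::dollar() becomes funs(. %>% round(0) %>% scales::dollar()). And vars() takes the same select helpers you would use in dplyr::select(). - austensen
Thanks for the detailed explanation. - Doug Fir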
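
To illustrate the placeholder idea from the comment above, here is a base-R sketch that applies one formatting function to each non-Channel column in turn. The helper to_dollar and the table revenue_wide are hypothetical names made up for this example. The helper uses format() with big.mark, which is the argument that sets the thousands separator; big.interval, used in the question's attempt, only sets how many digits go in each group.

```r
# Hypothetical helper: base-R currency formatting without scales.
# big.mark is the separator; scientific = FALSE avoids output like "1e+05";
# trim = TRUE drops the padding that format() adds for common width.
to_dollar <- function(x) {
  paste0("$", format(round(x), big.mark = ",", scientific = FALSE, trim = TRUE))
}

# Small stand-in for revenueTable (values copied from the output above)
revenue_wide <- data.frame(
  Channel = c("Direct", "Email"),
  `2015-06-01` = c(11910, 7244),
  `2015-09-01` = c(359, 1339),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Apply the helper to every column except Channel. Inside the anonymous
# function, `col` plays the role that `.` plays in funs(. %>% ...).
value_cols <- setdiff(names(revenue_wide), "Channel")
formatted <- revenue_wide
formatted[value_cols] <- lapply(revenue_wide[value_cols], function(col) to_dollar(col))

formatted
```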

We can use data.table:
library(data.table)
nm1 <- setdiff(names(revenueTable), 'Channel')
setDT(revenueTable)[, (nm1) := lapply(.SD, function(x) 
         scales::dollar(round(x))), .SDcols =  nm1]

revenueTable[, 1:3, with = FALSE]
#     Channel `2015-06-01` `2015-07-01`
#1:   Direct      $11,910       $8,417
#2:    Email       $7,244       $3,517
#3: Facebook       $8,552      $12,449
#4:  Organic       $4,191         $978
#5:      SEM       $2,344       $6,873
#6:  Youtube         $186       $2,949
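
One thing to be aware of: setDT() converts the object in place and := updates columns by reference, so after the code above revenueTable itself holds character columns. If the numeric values are still needed afterwards, a possible variation is to format a copy instead. This is only a sketch, assuming data.table and scales are installed; revenue_wide and display are hypothetical names for a small stand-in table and its formatted copy.

```r
library(data.table)

# Small stand-in for revenueTable (values copied from the output above)
revenue_wide <- data.frame(
  Channel = c("Direct", "Email"),
  `2015-06-01` = c(11910, 7244),
  `2015-07-01` = c(8417, 3517),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# as.data.table() returns a new data.table, leaving revenue_wide untouched
display <- as.data.table(revenue_wide)
nm1 <- setdiff(names(display), "Channel")
display[, (nm1) := lapply(.SD, function(x) scales::dollar(round(x))), .SDcols = nm1]

display
```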

Thanks for the data.table approach. - Doug Fir
