Automatically generating new variable names with dplyr mutate

Question

4

I would like to create variable names dynamically while using dplyr, although a non-dplyr solution is fine too.

For example:

data(iris)
library(dplyr) 

iris <- iris %>%
  group_by(Species) %>%
  mutate(
    lag_Sepal.Length = lag(Sepal.Length),
    lag_Sepal.Width  = lag(Sepal.Width),
    lag_Petal.Length = lag(Petal.Length)
  ) %>%
  ungroup

head(iris)

    Sepal.Length Sepal.Width Petal.Length Petal.Width Species lag_Sepal.Length lag_Sepal.Width
             (dbl)       (dbl)        (dbl)       (dbl)  (fctr)            (dbl)           (dbl)
    1          5.1         3.5          1.4         0.2  setosa               NA              NA
    2          4.9         3.0          1.4         0.2  setosa              5.1             3.5
    3          4.7         3.2          1.3         0.2  setosa              4.9             3.0
    4          4.6         3.1          1.5         0.2  setosa              4.7             3.2
    5          5.0         3.6          1.4         0.2  setosa              4.6             3.1
    6          5.4         3.9          1.7         0.4  setosa              5.0             3.6
    Variables not shown: lag_Petal.Length (dbl)

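However, rather than doing the same thing three times, I want to create 100 of these "lag" variables, each named lag_<original variable name>. I am trying to figure out how to do this without typing the new variable name 100 times, but I keep coming up short.

I have looked at this example and this example on SO. They are similar, but I have not been able to piece together the specific solution I need. Any help is appreciated!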

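Edit
Thanks to @BenFasoli for the inspiration. I modified his answer slightly to get the solution I needed. I also used this RStudio blog post and this SO post. The "lag" in the variable names ends up at the end rather than the beginning, but I can live with that.

My final code is posted here in case it is useful to anyone else: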

lagged <- iris %>%
  group_by(Species) %>%
  mutate_at(
    vars(Sepal.Length:Petal.Length),
    funs("lag" = lag)) %>%
  ungroup

# A tibble: 6 x 8
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_lag Sepal.Width_lag
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>            <dbl>           <dbl>
1          5.1         3.5          1.4         0.2  setosa               NA              NA
2          4.9         3.0          1.4         0.2  setosa              5.1             3.5
3          4.7         3.2          1.3         0.2  setosa              4.9             3.0
4          4.6         3.1          1.5         0.2  setosa              4.7             3.2
5          5.0         3.6          1.4         0.2  setosa              4.6             3.1
6          5.4         3.9          1.7         0.4  setosa              5.0             3.6
# ... with 1 more variables: Petal.Length_lag <dbl>
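
For readers on a newer dplyr, here is a minimal sketch of the same idea with the "lag_" prefix at the start of each name. It assumes dplyr 1.0.0 or later, where across() and its .names argument are available and mutate_at() and funs() are superseded or deprecated. It is not part of the original solution.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0: across() applies dplyr::lag to each selected
# column, and .names builds each new name from the original ({.col}).
lagged <- iris %>%
  group_by(Species) %>%
  mutate(across(Sepal.Length:Petal.Length, dplyr::lag, .names = "lag_{.col}")) %>%
  ungroup()

# The original columns are kept; three lag_* columns are appended.
names(lagged)
```

The glue-style template in .names controls where the prefix goes, so "{.col}_lag" would reproduce the suffix naming shown above.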

- Brad Cannell

3 Answers

3

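Here is a data.table approach. In this example I selected the numeric columns. What you need to do is choose the column names in advance and create the new column names. Then apply shift(), which works like lag() and lead() in the dplyr package, to each of the selected columns.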

library(data.table)

# Create a df for this demo.
mydf <- iris

# Choose columns that you want to apply lag() and create new colnames.
cols = names(iris)[sapply(iris, is.numeric)]
anscols = paste("lag_", cols, sep = "")

# Apply shift() to each of the chosen columns.
setDT(mydf)[, (anscols) := shift(.SD, 1, type = "lag"),
            .SDcols = cols]

     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species lag_Sepal.Length lag_Sepal.Width
 1:          5.1         3.5          1.4         0.2    setosa               NA              NA
 2:          4.9         3.0          1.4         0.2    setosa              5.1             3.5
 3:          4.7         3.2          1.3         0.2    setosa              4.9             3.0
 4:          4.6         3.1          1.5         0.2    setosa              4.7             3.2
 5:          5.0         3.6          1.4         0.2    setosa              4.6             3.1
 ---                                                                                             
146:          6.7         3.0          5.2         2.3 virginica              6.7             3.3
147:          6.3         2.5          5.0         1.9 virginica              6.7             3.0
148:          6.5         3.0          5.2         2.0 virginica              6.3             2.5
149:          6.2         3.4          5.4         2.3 virginica              6.5             3.0
150:          5.9         3.0          5.1         1.8 virginica              6.2             3.4
     lag_Petal.Length lag_Petal.Width
  1:               NA              NA
  2:              1.4             0.2
  3:              1.4             0.2
  4:              1.3             0.2
  5:              1.5             0.2
 ---                                 
146:              5.7             2.5
147:              5.2             2.3
148:              5.0             1.9
149:              5.2             2.0
150:              5.4             2.3
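
The call above shifts over the whole table, so the first row of each species picks up the last value of the previous species. If the lag should restart within each species, as in the question's group_by(Species), one possible variant adds by = Species. This is a sketch, and the name dt is only illustrative.

```r
library(data.table)

# Work on a copy so the built-in iris data frame is left untouched.
dt <- as.data.table(iris)

# Pick the numeric columns and build the matching new column names.
cols <- names(dt)[sapply(dt, is.numeric)]
anscols <- paste0("lag_", cols)

# shift() runs once per Species group, so each group starts with NA.
dt[, (anscols) := shift(.SD, 1, type = "lag"), by = Species, .SDcols = cols]

# Rows 1, 51 and 101 are the first row of each species.
dt[c(1, 51, 101), .(Species, lag_Sepal.Length)]
```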

- jazzurro

1

Since you are also open to non-dplyr approaches, try the following:

lagger <- function(x, n) c(rep(NA,n), head(x,-n) )
iris[paste0("lag_", names(iris) )] <- lapply(iris, lagger, n=1)

head(iris,2)[-(1:5)]
#  lag_Sepal.Length lag_Sepal.Width lag_Petal.Length lag_Petal.Width lag_Species
#1               NA              NA               NA              NA          NA
#2              5.1             3.5              1.4             0.2           1
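
Note that the snippet above also lags the Species factor (yielding its integer code) and does not restart at species boundaries. A possible base R variant that limits the lag to numeric columns and applies it within each species could use ave(). This is a sketch that assumes every group has more than n rows, and the names df and num_cols are illustrative.

```r
lagger <- function(x, n) c(rep(NA, n), head(x, -n))

# Work on a copy so the built-in iris data frame is left untouched.
df <- iris
num_cols <- names(df)[sapply(df, is.numeric)]

# ave() splits each column by Species, applies lagger() to every piece,
# and puts the results back in the original row order.
df[paste0("lag_", num_cols)] <- lapply(
  df[num_cols],
  function(x) ave(x, df$Species, FUN = function(v) lagger(v, 1))
)

head(df, 2)[-(1:5)]
```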

- thelatemail

- Ben Fasoli · Accepted Answer

You can use mutate_all (or mutate_at for specific columns), then prepend lag_ to the column names.

data(iris)
library(dplyr) 

lag_iris <- iris %>%
  group_by(Species) %>%
  mutate_all(funs(lag(.))) %>%
  ungroup
colnames(lag_iris) <- paste0('lag_', colnames(lag_iris))

head(lag_iris)

  lag_Sepal.Length lag_Sepal.Width lag_Petal.Length lag_Petal.Width lag_Species
             <dbl>           <dbl>            <dbl>           <dbl>      <fctr>
1               NA              NA               NA              NA      setosa
2              5.1             3.5              1.4             0.2      setosa
3              4.9             3.0              1.4             0.2      setosa
4              4.7             3.2              1.3             0.2      setosa
5              4.6             3.1              1.5             0.2      setosa
6              5.0             3.6              1.4             0.2      setosa
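
In the output above the grouping column is renamed to lag_Species even though it is not lagged. A hedged sketch that leaves Species alone, assuming dplyr 1.0.0 or later for across() and rename_with():

```r
library(dplyr)

lag_iris <- iris %>%
  group_by(Species) %>%
  # everything() skips the grouping column inside a grouped mutate().
  mutate(across(everything(), dplyr::lag)) %>%
  ungroup() %>%
  # Prefix every column except Species.
  rename_with(~ paste0("lag_", .x), .cols = -Species)

names(lag_iris)
```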