Pass a variable as a column name to dplyr?

Question

I have a really ugly data set that is a flat-file export of a relational database. Here is a minimal reproducible example:

df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20)

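I need to be able to identify the ".p" value of whichever "col#" column holds the value "c". My previous question on SO covered the first part: In R, find the column that contains a string for each row. I include it here for context.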

tmp <- which(projectdata=='Transmission and Distribution of Electricity', arr.ind=TRUE)
cnt <- ave(tmp[,"row"], tmp[,"row"], FUN=seq_along)
maxnames <- paste0("max",sequence(max(cnt)))
projectdata[maxnames] <- NA
projectdata[maxnames][cbind(tmp[,"row"],cnt)] <- names(projectdata)[tmp[,"col"]]
rm(tmp, cnt, maxnames)

This produces a data frame like the following:

df
   col1 col1.p col2 col2.p col3 col3.p max1
1     a      1    a      6    c     11 col3
2     b      2    c      7    d     12 col2
3     c      3    l      8    e     13 col1
4     d      4    c      9    f     14 col2
5     c      5    l     10    g     15 col1
6     a      1    a      6    c     16 col3
7     b      2    c      7    d     17 col2
8     c      3    l      8    e     18 col1
9     d      4    c      9    f     19 col2
10    c      5    l     10    g     20 col1
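The context code above refers to projectdata and a search string from the earlier question. As a rough sketch, the same steps applied to the example df with the search value "c" might look like this. The stringsAsFactors = FALSE argument is an assumption added here so that the text columns stay character vectors on older R versions.

```r
# Rebuild the example data. col3.p has 10 values, so data.frame()
# recycles the shorter columns to 10 rows.
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20,
                 stringsAsFactors = FALSE)  # assumption: keep text as character

# Row and column positions of every cell equal to "c"
tmp <- which(df == "c", arr.ind = TRUE)
# Running count of matches within each row (1 for the first match, and so on)
cnt <- ave(tmp[, "row"], tmp[, "row"], FUN = seq_along)
# One new column per possible match: max1, max2, ...
maxnames <- paste0("max", sequence(max(cnt)))
df[maxnames] <- NA
# Fill each (row, match number) cell with the name of the matching column
df[maxnames][cbind(tmp[, "row"], cnt)] <- names(df)[tmp[, "col"]]
rm(tmp, cnt, maxnames)
```

Each row of the example contains exactly one "c", so only max1 is created.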

When I try to get the ".p" value that matches the value in "max1", I keep getting errors. I thought the approach would be:

df %>%
   mutate(my.p = eval(as.name(paste0(max1,'.p'))))
Error: object 'col3.p' not found

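That clearly does not work, so I thought this might be similar to passing a column name into a function, where you need to use "get". That did not work either.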

df %>%
   mutate(my.p = get(as.name(paste0(max1,'.p'))))
Error: invalid first argument
df %>%
   mutate(my.p = get(paste0(max1,'.p')))
Error: object 'col3.p' not found

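I found a way to get rid of the error by using data.table, taken from a different but related question, linked here: http://codereply.com/answer/7y2ra3/dplyr-error-object-found-using-rle-mutate.html. However, it gives me "col3.p" for every row. That is the max1 value of the first row, df$max1[1].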

library('dplyr')
library('data.table') # must have the data.table package
df %>%
  tbl_dt(df) %>% 
  mutate(my.p = get(paste0(max1,'.p')))

Source: local data table [10 x 8]

   col1 col1.p col2 col2.p col3 col3.p max1 my.p
1     a      1    a      6    c     11 col3   11
2     b      2    c      7    d     12 col2   12
3     c      3    l      8    e     13 col1   13
4     d      4    c      9    f     14 col2   14
5     c      5    l     10    g     15 col1   15
6     a      1    a      6    c     16 col3   16
7     b      2    c      7    d     17 col2   17
8     c      3    l      8    e     18 col1   18
9     d      4    c      9    f     19 col2   19
10    c      5    l     10    g     20 col1   20
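A likely explanation for this output, offered as a reading of the result rather than a confirmed diagnosis: get() looks up a single name and returns the whole column it refers to, so it cannot pick a different column for each row by itself. The small sketch below uses a made-up two-row data frame (the names toy, whole_column and per_row are hypothetical) to contrast the two behaviours in base R.

```r
# Hypothetical two-row data frame, for illustration only
toy <- data.frame(col1.p = 1:2,
                  col3.p = 11:12,
                  max1 = c("col3", "col1"),
                  stringsAsFactors = FALSE)

# get() with one name returns the entire column, not a per-row value
whole_column <- with(toy, get("col3.p"))

# A per-row lookup has to evaluate the name once for each row
per_row <- vapply(seq_len(nrow(toy)),
                  function(i) toy[[paste0(toy$max1[i], ".p")]][i],
                  integer(1))
```

Here whole_column holds both values of col3.p, while per_row takes col3.p for the first row and col1.p for the second.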

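The interp approach from lazyeval (from this SO question: How to pass dynamic column names in dplyr into custom function?) does not work for me. Maybe I am implementing it incorrectly?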

library(lazyeval)
library(dplyr)
df %>%
  mutate_(my.p = interp(~colp, colp = as.name(paste0(max1,'.p'))))

I get an error:

Error in paste0(max1, ".p") : object 'max1' not found

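Ideally, the new column my.p would equal the appropriate p value, based on the column identified in max1. I could do all of this with ifelse, but I am trying to do it with less code and in a way that carries over to the next ugly flat table.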

- jessi

If you are using data.table, setDT(df)[, my.p:= get(paste0(max1, '.p')), 1:nrow(df)] should give you the output you want. - akrun

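I am not sure why this was not posted as an answer, @akrun, but it works. If you turn it into an answer, I can accept it. Thanks for your help. Why does mutate(my.p = get(paste0(max1, '.p'))) not work in dplyr? I would really like to understand this. - jessi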

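I could not solve it either. I will use data.table. Still, this is odd dplyr behavior. Other SO threads suggest, as I showed above, that interp should work. If I were sure it was a bug, I would raise it on GitHub. - jessi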

1 Answer

- akrun · Answer 1

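We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by the sequence of rows, use get to fetch the value of the column named by the paste0 output, and assign (:=) it to a new column ('my.p'). Grouping by row matters because get resolves a single column name at a time, so each one-row group gets its own lookup.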

library(data.table)
setDT(df)[, my.p:= get(paste0(max1, '.p')), 1:nrow(df)]
df
#    col1 col1.p col2 col2.p col3 col3.p max1 my.p
# 1:    a      1    a      6    c     11 col3   11
# 2:    b      2    c      7    d     12 col2    7
# 3:    c      3    l      8    e     13 col1    3
# 4:    d      4    c      9    f     14 col2    9
# 5:    c      5    l     10    g     15 col1    5
# 6:    a      1    a      6    c     16 col3   16
# 7:    b      2    c      7    d     17 col2    7
# 8:    c      3    l      8    e     18 col1    3
# 9:    d      4    c      9    f     19 col2    9
#10:    c      5    l     10    g     20 col1    5
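For comparison, the same per-row lookup can also be sketched in base R without grouping, by indexing a matrix of the ".p" columns with (row, column) pairs. This is an alternative technique and not part of the answer above. The names p_cols and p_mat are made up for illustration, and max1 is typed in by hand to match the table shown in the question.

```r
# Example data from the question, with max1 entered by hand (assumption)
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20,
                 stringsAsFactors = FALSE)
df$max1 <- rep(c("col3", "col2", "col1", "col2", "col1"), 2)

# Name of the ".p" column wanted in each row
p_cols <- paste0(df$max1, ".p")
# Matrix holding only the ".p" columns that are actually referenced
p_mat <- as.matrix(df[unique(p_cols)])
# Pick one cell per row: (row i, column named p_cols[i])
df$my.p <- p_mat[cbind(seq_len(nrow(df)), match(p_cols, colnames(p_mat)))]
```

Because the lookup is a single vectorized indexing step, it avoids one get call per row, which may matter on larger tables.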