Rename multiple columns by name

119

Someone must have asked this already, but I couldn't find an answer. Say I have:

x = data.frame(q=1,w=2,e=3, ...and many many columns...)  

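What is the most elegant way to rename an arbitrary subset of columns, whose positions are not necessarily known, to other arbitrary names?

For example, if I want to rename "q" and "e" to "A" and "B", what is the most elegant code to do it?

Obviously, I can use a loop: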

oldnames = c("q","e")
newnames = c("A","B")
for(i in 1:2) names(x)[names(x) == oldnames[i]] = newnames[i]

But I wonder whether there is a better way. Maybe using one of the packages (plyr::rename, etc.)?

21 Answers

4
This changes every occurrence of those letters in all of the names:
 names(x) <- gsub("q", "A", gsub("e", "B", names(x) ) )
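
Because gsub() replaces the letters anywhere they appear, a column such as "we" would be altered too. A minimal sketch of one way around that, assuming whole-name matches are what is wanted: anchor each pattern with ^ and $. The data frame below is hypothetical and only illustrates the difference.

```r
# Hypothetical data: "we" contains the letter "e" but should keep its name
x <- data.frame(q = 1, we = 2, e = 3)

# Anchored patterns match only complete names, so "we" is left untouched
names(x) <- sub("^q$", "A", sub("^e$", "B", names(x)))

names(x)
```

This still needs one nested call per renamed column, so it does not scale any better than the original.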

2
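I don't think this is particularly elegant once you get past a couple of renames. - thelatemail
I'm not good enough to easily write a gsubfn answer. Perhaps G.Grothendieck will come by. He's the regex master. - IRTFM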

3
If the table contains two columns with the same name, the code looks like this:
rename(df,newname=oldname.x,newname=oldname.y)

3

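You can get the set of names, save it as a character vector, and then do the bulk renaming on the strings. A good example is when you perform a long-to-wide transformation on a dataset: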

names(labWide)
      Lab1    Lab10    Lab11    Lab12    Lab13    Lab14    Lab15    Lab16
1 35.75366 22.79493 30.32075 34.25637 30.66477 32.04059 24.46663 22.53063

nameVec <- names(labWide)
nameVec <- gsub("Lab","LabLat",nameVec)

names(labWide) <- nameVec
"LabLat1"  "LabLat10" "LabLat11" "LabLat12" "LabLat13" "LabLat14" "LabLat15" "LabLat16"

2

Side note: if you want to prepend a string to all of the column names, you can use this simple code.

colnames(df) <- paste("renamed_",colnames(df),sep="")

1

There are a lot of similar answers, so I wrote this function so you can copy/paste it.

rename <- function(x, old_names, new_names) {
    stopifnot(length(old_names) == length(new_names))
    # pull out the names that are actually in x
    old_nms <- old_names[old_names %in% names(x)]
    new_nms <- new_names[old_names %in% names(x)]

    # call out the column names that don't exist
    not_nms <- setdiff(old_names, old_nms)
    if(length(not_nms) > 0) {
        msg <- paste(paste(not_nms, collapse = ", "), 
            "are not columns in the dataframe, so won't be renamed.")
        warning(msg)
    }

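    # rename; match() pairs each old name with its own new name,
    # so the order of old_names does not have to follow the column order
    names(x)[match(old_nms, names(x))] <- new_nms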
    x
}

 x = data.frame(q = 1, w = 2, e = 3)
 rename(x, c("q", "e"), c("Q", "E"))

   Q w E
 1 1 2 3

In dplyr, rename(x, c("q", "e"), c("Q", "E")) no longer seems to work? - s_baldur
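
On the dplyr question in the comment: dplyr's own rename() expects new = old pairs rather than two separate vectors. A minimal sketch, assuming dplyr 1.0 or later is installed, builds a named lookup vector and passes it through all_of(). The lookup name is illustrative, and the calls are namespace-qualified so they are not masked by the rename() function defined in this answer.

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldnames <- c("q", "e")
newnames <- c("A", "B")

# rename() wants new = old, so the names are the new names and the values the old ones
lookup <- setNames(oldnames, newnames)

# all_of() raises an error if an old name is missing;
# any_of() could be used instead to skip missing names silently
x2 <- dplyr::rename(x, dplyr::all_of(lookup))

names(x2)
```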

0
If one row of the data contains the names you want to use for all of the columns, you can do this:
names(data) <- data[row,]

Here data is your data frame and row is the number of the row that contains the new names.

You can then remove the row that held the names with:

data <- data[-row,]
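
The two steps above, put together as a small sketch. The data frame is made up for illustration, and unlist() plus as.character() is my addition, used to turn the one-row data frame into a plain character vector before it is assigned as names.

```r
# Hypothetical data whose first row actually holds the column names
data <- data.frame(V1 = c("id", "1", "2"),
                   V2 = c("score", "10", "20"),
                   stringsAsFactors = FALSE)
row <- 1

# Convert the one-row data frame to a plain character vector of names
names(data) <- as.character(unlist(data[row, ]))

# Drop the row that held the names
data <- data[-row, ]

data
```

Note that the remaining columns are still character vectors here, since the header row forced every column to be read as text.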

0
A base R approach using setNames, relying on the fact that [] returns the first match.
names(x) <- setNames(c(newnames, names(x)), c(oldnames, names(x)))[names(x)]

names(x) <- (\(.) setNames(c(newnames, .), c(oldnames, .))[.])(names(x)) #Variant

x
#  A w B
#1 1 2 3

Using transform

names(x) <- do.call(transform, c(list(as.list(setNames(names(x), names(x)))),
                                 as.list(setNames(newnames, oldnames))))

Data

x = data.frame(q=1,w=2,e=3)
oldnames = c("q","e")
newnames = c("A","B")
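
For comparison, the same lookup idea can be sketched with match(), which returns the position of each old name directly. This is an alternative to the setNames version rather than a restatement of it; idx and found are illustrative names.

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldnames <- c("q", "e")
newnames <- c("A", "B")

# Position of each old name in names(x); NA when the column is absent
idx <- match(oldnames, names(x))
found <- !is.na(idx)

# Rename only the columns that were actually found
names(x)[idx[found]] <- newnames[found]

names(x)
```

Because each old name stays paired with its own new name, the result does not depend on the order in which the columns appear in x.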

0

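There are many good answers above that use specialized packages. Here is a simple way of doing it using only base R.

df.rename.cols <- function(df, col2.list) {
  # col2.list is a list of c(old, new) pairs; split it into two character vectors
  old <- vapply(col2.list, `[`, character(1), 1)
  new <- vapply(col2.list, `[`, character(1), 2)

  # match() keeps each old name paired with its own new name
  # (assumes every old name is present in df)
  names(df)[match(old, names(df))] <- new

  df
}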

Here is an example:

df1 <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6), D = c(7, 8))
col.list <- list(c("A", "NewA"), c("C", "NewC"))
df.rename.cols(df1, col.list)

  NewA B NewC D
1    1 3    5 7
2    2 4    6 8

0

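I recently built on @agile bean's answer (which uses rename_with, formerly rename_at) to write a function that renames columns only if they exist in the data frame, so that the column names of heterogeneous data frames can be made to match where applicable.

The loop can surely be improved, but I wanted to share it for posterity.

Create an example data frame: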
x= structure(list(observation_date = structure(c(18526L, 18784L, 
17601L), class = c("IDate", "Date")), year = c(2020L, 2021L, 
2018L)), sf_column = "geometry", agr = structure(c(id = NA_integer_, 
common_name = NA_integer_, scientific_name = NA_integer_, observation_count = NA_integer_, 
country = NA_integer_, country_code = NA_integer_, state = NA_integer_, 
state_code = NA_integer_, county = NA_integer_, county_code = NA_integer_, 
observation_date = NA_integer_, time_observations_started = NA_integer_, 
observer_id = NA_integer_, sampling_event_identifier = NA_integer_, 
protocol_type = NA_integer_, protocol_code = NA_integer_, duration_minutes = NA_integer_, 
effort_distance_km = NA_integer_, effort_area_ha = NA_integer_, 
number_observers = NA_integer_, all_species_reported = NA_integer_, 
group_identifier = NA_integer_, year = NA_integer_, checklist_id = NA_integer_, 
yday = NA_integer_), class = "factor", .Label = c("constant", 
"aggregate", "identity")), row.names = c("3", "3.1", "3.2"), class = "data.frame")
The function (it needs dplyr for %>% and rename_with):
library(dplyr)

match_col_names <- function(x){

  col_names <- list(date = c("observation_date", "date"),
                    C =    c("observation_count", "count","routetotal"),
                    yday  = c("dayofyear"),
                    latitude  = c("lat"),
                    longitude = c("lon","long")
                    )

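  for(i in seq_along(col_names)){
    newname <- names(col_names)[i]
    oldnames <- col_names[[i]]

    # columns of x that carry one of the old names
    toreplace <- names(x)[names(x) %in% oldnames]
    # skip when nothing matches, so rename_with() is never called on zero columns
    if (length(toreplace) > 0) {
      x <- x %>%
        rename_with(~newname, all_of(toreplace))
    }
  }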

return(x)

}

Apply the function:
x <- match_col_names(x)

0
If execution time matters, I suggest using a data.table:

> library(microbenchmark)
> library(data.table)
> library(dplyr)
> df = data.table(x = 1:10, y = 3:12, z = 4:13)
> oldnames = c("x","y","z")
> newnames = c("X","Y","Z")
> microbenchmark(dplyr_1 = df %>% rename_at(vars(oldnames), ~ newnames) ,
+                dplyr_2 = df %>% rename(X=x,Y=y,Z=z) ,
+                data_tabl1= setnames(copy(df), old = c("x","y","z") , new = c("X","Y","Z")),
+                times = 100) 
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
    dplyr_1 5760.3 6523.00 7092.538 6864.35 7210.45 17935.9   100
    dplyr_2 2536.4 2788.40 3078.609 3010.65 3282.05  4689.8   100
 data_tabl1  170.0  218.45  368.261  243.85  274.40 12351.7   100
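
Outside of a benchmark, setnames() can be applied directly, since it renames by reference and the copy() above is only there to keep the timing runs independent. A minimal sketch, assuming a data.table version that supports the skip_absent argument; the table dt is hypothetical.

```r
library(data.table)

dt <- data.table(x = 1:3, y = 4:6)

# setnames() modifies dt in place (no copy is made); skip_absent = TRUE
# ignores old names that are not present ("z" here) instead of raising an error
setnames(dt, old = c("x", "z"), new = c("X", "Z"), skip_absent = TRUE)

names(dt)
```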

