Rename multiple columns and gather with dplyr in R

Question

3

I am trying to find a convenient way to rename multiple columns with the tidyverse. Suppose I have a tibble:

df <- tibble(a = 1, b = 2, tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25, pre_2000 = 100, pre_2001 = 103, pre_2002 = 189)

# A tibble: 1 x 8
  a     b tmp_2000 tmp_2001 tmp_2002 pre_2000 pre_2001 pre_2002
<dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
  1     2       23     22.1       25      100      103      189

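tmp and pre stand for temperature and precipitation. I would like to reorganize this table into tidy form, i.e. one column for temperature and one for precipitation, with each row holding the values for the corresponding year.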

The only option I have found so far is to do something like this:

df <- df %>%
  select(-starts_with("pre"))

names(df)[3:5] <- substr(names(df)[3:5],5,8) 

df<-df %>%
  gather(`2000`:`2002`,key = "year",value="temp")  %>%
  mutate("year" = as.integer(year)) 

# A tibble: 3 x 4
  a     b  year  temp
<dbl> <dbl> <int> <dbl>
  1     2  2000  23  
  1     2  2001  22.1
  1     2  2002  25

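I would need to do the same for the precipitation data and then join the two tables. I will be getting more weather variables in the future, and this process will quickly become tedious.

Does anyone know how to do this more efficiently with the tidyverse?

Thanks,

Jo

PS: The only similar posts I found were about recoding variables (with mutate_at) or renaming columns with names, as shown above.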

- Jo_

reshape(df,3:ncol(df),sep="_",dir="long") - Onyambu

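Onyambu, this does not work. I get the following warning messages: 1: Setting row names on a tibble is deprecated. 2: Setting row names on a tibble is deprecated. 3: Setting row names on a tibble is deprecated. - Jo_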

2

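The warning appears only because you have a tibble, nothing else. That said, you can run reshape(data.frame(df),3:ncol(df),idvar = 1:2,sep="_",dir="long") and then set the rownames to NULL. - Onyambu

OK, thanks. More concise than the tidyverse, but less readable. - Jo_

What do you mean by "less readable"? I guess it may be because the reshape function is new to you? I can't be sure. Maybe you could try the data.table::melt function. - Onyambu

Yes, I do all my reshaping with dplyr::, so reshape looks a bit unfamiliar. But I'm sure it will become clear with practice. Thanks! - Jo_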

3 Answers

2

data.frame(df)%>%
   reshape(3:ncol(df),sep="_",dir="long")%>%
   `rownames<-`(NULL)
  a b time  tmp pre id
1 1 2 2000 23.0 100  1
2 1 2 2001 22.1 103  1
3 1 2 2002 25.0 189  1
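
For readers new to reshape(), the same idea can be spelled out with named arguments. This is only a sketch, not the code from the answer: it starts from a plain data.frame, and the choices idvar = c("a", "b") and timevar = "year" are assumptions made here for illustration (the call above relies on the defaults, which is why its output has time and id columns).

```r
# Same example data as in the question, built as a plain data.frame
df <- data.frame(
  a = 1, b = 2,
  tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25,
  pre_2000 = 100, pre_2001 = 103, pre_2002 = 189
)

long <- reshape(
  df,
  direction = "long",
  varying   = 3:ncol(df),   # the tmp_* and pre_* columns
  sep       = "_",          # names are split on "_" into variable and time
  idvar     = c("a", "b"),  # assumption: a and b identify an observation
  timevar   = "year"        # assumption: call the time column "year"
)

# reshape() builds composite row names; drop them for a clean result
rownames(long) <- NULL
long
```

With these arguments the result should have one row per year and the columns a, b, year, tmp and pre.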

- Onyambu

0

df <- tibble(
  a = 1,
  b = 2,
  tmp_2000 = 23,
  tmp_2001 = 22.1,
  tmp_2002 = 25,
  pre_2000=100,
  pre_2001=103,
  pre_2002=189
)


df %>% 
  gather(key, value, -a:-b) %>% 
  separate(key, c("type", "year")) %>% 
  spread(type, value= value )

#> # A tibble: 3 x 5
#>       a     b year    pre   tmp
#>   <dbl> <dbl> <chr> <dbl> <dbl>
#> 1     1     2 2000    100  23  
#> 2     1     2 2001    103  22.1
#> 3     1     2 2002    189  25


- Nettle

- AndS. · Accepted Answer

You could do it like this:

library(tidyverse)
df %>%
    gather(measure, value, -a, -b) %>% 
    separate(measure, into = c("type", "year"), sep = "_") %>% 
    mutate(type = case_when(type == "tmp" ~ "temp", type == "pre" ~ "precip")) %>% 
    spread(type, value)
#       a     b year  precip  temp
# 1     1     2 2000     100  23  
# 2     1     2 2001     103  22.1
# 3     1     2 2002     189  25

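We first gather all the data into long format, then separate the year from the measure type, then rename the measure types, and finally spread the data back into wide format.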
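
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The gather, separate and spread steps described above can be sketched as a single pivot_longer() call. This is an alternative sketch, not the code from the answer: it assumes tidyr >= 1.0.0 is installed, and the object name tidy is chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  a = 1, b = 2,
  tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25,
  pre_2000 = 100, pre_2001 = 103, pre_2002 = 189
)

tidy <- df %>%
  pivot_longer(
    cols      = -c(a, b),
    # ".value" keeps the part before "_" as column names (tmp, pre);
    # the part after "_" goes into a new "year" column
    names_to  = c(".value", "year"),
    names_sep = "_"
  ) %>%
  mutate(year = as.integer(year)) %>%   # year is extracted as character
  rename(temp = tmp, precip = pre)      # same renaming as in the answer

tidy
```

Because the split on "_" is driven by the column names, further weather variables that follow the same name_year pattern should be picked up without changing the pipeline, apart from the final rename().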