Reshaping multiple variables from wide to long format in R, with NA values

3

I'm currently trying to use the pivot_longer function in R to reshape multiple variables from wide to long.

Here is my data:

primary_id = c("A1", "A2")
id1 = c("B1", "B2")
score1 = c("good", "bad")
id2 = c("C1", "C2")
score2 = c(NA, "neutral")
id3 = c("D1", "D2")
score3 = c("bad", "neutral")
id4 = c("E1", "E2")
score4 = c(NA, "good")
have = data.frame(primary_id, id1, score1, id2, score2, id3, score3, id4, score4)

have
  primary_id id1 score1 id2  score2 id3  score3 id4 score4
1         A1  B1   good  C1    <NA>  D1     bad  E1   <NA>
2         A2  B2    bad  C2 neutral  D2 neutral  E2   good

Note that I have 20 id variables and 20 score variables (the example shows only 4).
Here is what I'm looking for:
  primary_id ids  scores
1         A1  B1    good
2         A1  C1    <NA>
3         A1  D1     bad
4         A1  E1    <NA>
5         A2  B2     bad
6         A2  C2 neutral
7         A2  D2 neutral
8         A2  E2    good

Any suggestions/help would be much appreciated! Note that the score variables have NA values here and there.
3 Answers

5

In base R:

reshape(have, varying = -1, direction = 'long', sep = '', idvar = 'primary_id')

     primary_id time id   score
A1.1         A1    1 B1    good
A2.1         A2    1 B2     bad
A1.2         A1    2 C1    <NA>
A2.2         A2    2 C2 neutral
A1.3         A1    3 D1     bad
A2.3         A2    3 D2 neutral
A1.4         A1    4 E1    <NA>
A2.4         A2    4 E2    good
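
With sep = '', reshape() guesses the variable stems and time values by splitting each column name where a letter is followed by a digit. As a minimal sketch, assuming the columns really are named id1..idN and score1..scoreN, the same result can be obtained by listing the column groups explicitly, which avoids relying on the guess. The variable n and the final sorting and column selection are additions for illustration, not part of the answer above.

```r
# Example data from the question
have <- data.frame(
  primary_id = c("A1", "A2"),
  id1 = c("B1", "B2"), score1 = c("good", "bad"),
  id2 = c("C1", "C2"), score2 = c(NA, "neutral"),
  id3 = c("D1", "D2"), score3 = c("bad", "neutral"),
  id4 = c("E1", "E2"), score4 = c(NA, "good")
)

n <- 4  # number of id/score pairs (assumed; 20 in the real data)

long <- reshape(
  have,
  direction = "long",
  # one character vector of column names per output column
  varying = list(paste0("id", 1:n), paste0("score", 1:n)),
  v.names = c("id", "score"),
  idvar = "primary_id"
)

# Sort by primary_id, then drop the helper 'time' column and the row names
long <- long[order(long$primary_id, long$time), c("primary_id", "id", "score")]
rownames(long) <- NULL
long
```

Rows with NA scores are kept, since reshape() does not drop missing values.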

In data.table:

library(data.table)
melt(setDT(have),1, patterns(id = '^id', score = '^score'))
   primary_id variable id   score
1:         A1        1 B1    good
2:         A2        1 B2     bad
3:         A1        2 C1    <NA>
4:         A2        2 C2 neutral
5:         A1        3 D1     bad
6:         A2        3 D2 neutral
7:         A1        4 E1    <NA>
8:         A2        4 E2    good
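
One thing to be aware of: setDT() converts have to a data.table by reference, so the original object changes class. The following is a sketch of a variant that works on a copy instead, spells out the argument names, and orders the rows as in the desired output. It uses unnamed patterns() together with value.name, which should be equivalent to the named form above; the object names dt and long are just illustrative.

```r
library(data.table)

# Example data from the question
have <- data.frame(
  primary_id = c("A1", "A2"),
  id1 = c("B1", "B2"), score1 = c("good", "bad"),
  id2 = c("C1", "C2"), score2 = c(NA, "neutral"),
  id3 = c("D1", "D2"), score3 = c("bad", "neutral"),
  id4 = c("E1", "E2"), score4 = c(NA, "good")
)

dt <- as.data.table(have)  # a copy; 'have' stays a plain data.frame

long <- melt(
  dt,
  id.vars = "primary_id",
  measure.vars = patterns("^id", "^score"),  # two groups of columns
  value.name = c("id", "score")              # one output column per group
)

setorder(long, primary_id, variable)  # group rows by primary_id
long[, variable := NULL]              # drop the pair index
long
```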

In the tidyverse:

library(tidyverse)
pivot_longer(have, -1, names_to = '.value', names_pattern = '(\\D+)')

# A tibble: 8 × 3
  primary_id id    score  
  <chr>      <chr> <chr>  
1 A1         B1    good   
2 A1         C1    NA     
3 A1         D1    bad    
4 A1         E1    NA     
5 A2         B2    bad    
6 A2         C2    neutral
7 A2         D2    neutral
8 A2         E2    good   
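
The pattern '(\\D+)' captures the leading run of non-digit characters in each column name, and '.value' tells pivot_longer to use that captured piece as the name of the output column. A small base R sketch of what the capture group picks out (the vector cols is only an illustration):

```r
cols <- c("id1", "score1", "id2", "score2")

# First run of non-digit characters in each name
stems <- regmatches(cols, regexpr("\\D+", cols))
stems
# [1] "id"    "score" "id"    "score"
```

Columns that share a stem are stacked into one output column, which is why the result has only id and score.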

3
You can set ".value" in names_to and supply either names_sep or names_pattern to specify how the column names should be split.
library(tidyr)

have %>%
  pivot_longer(
    -primary_id,
    names_to = c(".value", NA),
    names_sep = "(?=\\d)" # or names_pattern = "(.+)(.)"
  )

# A tibble: 8 × 3
  primary_id id    score
  <chr>      <chr> <chr>
1 A1         B1    good
2 A1         C1    NA
3 A1         D1    bad
4 A1         E1    NA     
5 A2         B2    bad
6 A2         C2    neutral
7 A2         D2    neutral
8 A2         E2    good
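
Since the real data has 20 id/score pairs, some suffixes have two digits. A pattern that captures the full run of digits, such as "(\\D+)(\\d+)", may be a safer choice there, because "(.+)(.)" would split id10 into id1 and 0. A minimal sketch, assuming column names like id9 and id10 (the data frame wide is made up for illustration):

```r
library(tidyr)

# Hypothetical data with one- and two-digit suffixes
wide <- data.frame(
  primary_id = c("A1", "A2"),
  id9 = c("B1", "B2"), score9 = c("good", "bad"),
  id10 = c("C1", "C2"), score10 = c(NA, "neutral")
)

long <- pivot_longer(
  wide,
  -primary_id,
  names_to = c(".value", NA),      # keep the stem, discard the number
  names_pattern = "(\\D+)(\\d+)"   # stem = non-digits, number = all trailing digits
)
long
```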

1
Here is a solution similar to @Darren Tsai's, with an intermediate step that uses rename_with to prepare the names for names_sep:
library(dplyr)
library(tidyr) # provides pivot_longer

have %>%
  # \\D+ keeps multi-digit suffixes intact: id10 becomes id_10, not id1_0
  rename_with(., ~sub("(\\D+)(\\d+)$", "\\1_\\2", .)) %>% 
  pivot_longer(
    -primary_id,
    names_to = c(".value", NA),
    names_sep = "_"
  )

  primary_id id    score  
  <chr>      <chr> <chr>  
1 A1         B1    good   
2 A1         C1    NA     
3 A1         D1    bad    
4 A1         E1    NA     
5 A2         B2    bad    
6 A2         C2    neutral
7 A2         D2    neutral
8 A2         E2    good  
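
To see what the renaming step does before the pivot, the substitution can be tried on a few column names in base R. This sketch compares a greedy "(.*)" prefix with a "(\\D+)" prefix; the name id10 is a hypothetical example of a two-digit suffix, as would occur with 20 pairs.

```r
cols <- c("primary_id", "id1", "score1", "id10")

# Greedy prefix: (.*) takes as much as it can, leaving one digit for (\\d+)
greedy <- sub("(.*)(\\d+)$", "\\1_\\2", cols)
greedy
# [1] "primary_id" "id_1"       "score_1"    "id1_0"

# Non-digit prefix: the whole number stays together
stems <- sub("(\\D+)(\\d+)$", "\\1_\\2", cols)
stems
# [1] "primary_id" "id_1"       "score_1"    "id_10"
```

Both patterns give the same names for single-digit suffixes, and primary_id is left unchanged because it does not end in a digit.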
