How do I convert this wide data frame to a long data frame?

Question

4

How can I convert this wide data frame:

# A tibble: 2 x 7
  name  question_1  question_1_response question_2    question_2_response question_3 question_3_response
  <chr> <chr>                     <dbl> <chr>                       <dbl> <chr>                    <dbl>
1 ken   PC1,PC2,PC4                 4.5 PC3,MK1,MK2                   3.5 SBP1,SBP5                    5
2 hello PC1,PC5                     4   MK1,SBP1,SBP2                 4   NA                          NA

into this?

# A tibble: 13 x 3
   name  subcomp value
   <chr> <chr>   <dbl>
 1 ken   PC1       4.5
 2 ken   PC2       4.5
 3 ken   PC4       4.5
 4 ken   PC3       3.5
 5 ken   MK1       3.5
 6 ken   MK2       3.5
 7 ken   SBP1      5  
 8 ken   SBP5      5  
 9 hello PC1       4  
10 hello PC5       4  
11 hello MK1       4  
12 hello SBP1      4  
13 hello SBP2      4

Sample data:

library(tidyverse)
test <- tribble(
  ~name, ~question_1, ~question_1_response, ~question_2, ~question_2_response, ~question_3, ~question_3_response,
  "ken", "PC1,PC2,PC4", 4.5, "PC3,MK1,MK2", 3.5, "SBP1,SBP5", 5,
  "hello", "PC1,PC5", 4, "MK1,SBP1,SBP2", 4, NA, NA
)

I tried using gather/separate/spread, but couldn't quite wrap my head around it.

Many thanks!

- KKW

2 Answers

0

One option, involving a bit more programming, is to use dplyr, tidyr and purrr:

map_dfr(.x = split.default(test[-1], ceiling(1:length(test[-1])/2)),
        ~ .x %>%
         rowid_to_column() %>%
         separate_rows(2) %>%
         setNames(c("rowid", "subcomb", "value"))) %>%
 left_join(test %>%
            rowid_to_column() %>%
            select(rowid, name), by = c("rowid" = "rowid")) %>%
 filter(!is.na(subcomb))

   rowid subcomb value name 
   <int> <chr>   <dbl> <chr>
 1     1 PC1       4.5 ken  
 2     1 PC2       4.5 ken  
 3     1 PC4       4.5 ken  
 4     2 PC1       4   hello
 5     2 PC5       4   hello
 6     1 PC3       3.5 ken  
 7     1 MK1       3.5 ken  
 8     1 MK2       3.5 ken  
 9     2 MK1       4   hello
10     2 SBP1      4   hello
11     2 SBP2      4   hello
12     1 SBP1      5   ken  
13     1 SBP5      5   ken
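
The column pairing in the code above relies on `ceiling(1:n / 2)` assigning every two adjacent columns the same group number, which `split.default` then uses to cut the data frame into (question, response) pairs. A minimal sketch of that grouping on its own, using a plain character vector `cols` as a stand-in for the columns of `test[-1]` (the names `cols`, `grp` and `pairs` are illustrative only):

```r
# Column names standing in for test[-1] (everything except `name`)
cols <- c("question_1", "question_1_response",
          "question_2", "question_2_response",
          "question_3", "question_3_response")

# Positions 1,2 -> group 1; positions 3,4 -> group 2; positions 5,6 -> group 3
grp <- ceiling(seq_along(cols) / 2)

# Each list element holds one question column and its response column
pairs <- split(cols, grp)
pairs
```

This only works because each question column is immediately followed by its response column; with a different column order the groups would pair the wrong columns.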

- tmfmnk

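@akrun, I've noticed this behavior of yours before, but now I'm really curious what it means. Why do you come back after a while to upvote and make random edits to other people's posts? - tmfmnk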

1

No problem, I thought it meant something different. Your post is definitely the better one here, though. - tmfmnk

- akrun · Accepted Answer

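We can use str_replace to rename the 'response' columns. Here we capture the digits (\\d+) and the word (\\w+) that follow each _ as groups, and specify the backreferences to those groups in reverse order (\\2, then \\1) in the replacement. Because the match includes the leading _ and the replacement does not put it back, question_1_response becomes questionresponse_1. We then use pivot_longer to reshape the data into 'long' format.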

library(dplyr)
library(tidyr)
library(stringr)
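test %>%
    # move "response" ahead of the number: question_1_response -> questionresponse_1
    rename_at(vars(ends_with('response')),
        ~ str_replace(., '_(\\d+)_(\\w+)', '\\2_\\1')) %>% 
    # the part before "_" names the output column (.value); the number goes to 'group'
    pivot_longer(cols = -name, names_to = c('.value', 'group'), 
         names_sep="_", values_drop_na = TRUE) %>% 
    # one row per comma-separated code
    separate_rows(question) %>% 
    select(name, subcomp = question, value = questionresponse)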
# A tibble: 13 x 3
#   name  subcomp value
#   <chr> <chr>   <dbl>
# 1 ken   PC1       4.5
# 2 ken   PC2       4.5
# 3 ken   PC4       4.5
# 4 ken   PC3       3.5
# 5 ken   MK1       3.5
# 6 ken   MK2       3.5
# 7 ken   SBP1      5  
# 8 ken   SBP5      5  
# 9 hello PC1       4  
#10 hello PC5       4  
#11 hello MK1       4  
#12 hello SBP1      4  
#13 hello SBP2      4
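
The renaming step in the pipeline above can be checked in isolation before running the whole thing. A minimal sketch, applying the same pattern and replacement to a plain character vector (`cols` and `renamed` are illustrative names, not part of the answer):

```r
library(stringr)

# One question column and its matching response column
cols <- c("question_1", "question_1_response")

# Swap the captured number and word; the leading "_" is part of the match
# and is not restored by the replacement
renamed <- str_replace(cols, "_(\\d+)_(\\w+)", "\\2_\\1")
renamed
# "question_1" has no second "_" after the digits, so it is left unchanged;
# "question_1_response" becomes "questionresponse_1"
```

After this step every column name has exactly one "_", so `names_sep = "_"` in `pivot_longer` can split it cleanly into the `.value` part and the group number.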

Or using data.table:

library(data.table)
library(splitstackshape)
melt(setDT(test), measure = patterns("\\d+$", "response$"), 
    value.name = c("subcomp", 'value'), na.rm = TRUE)[, 
       cSplit(.SD, "subcomp", ",", "long")][, variable := NULL][]
#    name subcomp value
# 1:   ken     PC1   4.5
# 2:   ken     PC2   4.5
# 3:   ken     PC4   4.5
# 4: hello     PC1   4.0
# 5: hello     PC5   4.0
# 6:   ken     PC3   3.5
# 7:   ken     MK1   3.5
# 8:   ken     MK2   3.5
# 9: hello     MK1   4.0
#10: hello    SBP1   4.0
#11: hello    SBP2   4.0
#12:   ken    SBP1   5.0
#13:   ken    SBP5   5.0
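
If splitstackshape is not available, the splitting step could probably be done with base `strsplit` instead of `cSplit`. The following is a sketch under that assumption rather than the answer's own method; it rebuilds the sample data as a data.table (`dt`, `long` and `res` are illustrative names) and keeps the same `melt` call:

```r
library(data.table)

# Same sample data as in the question, rebuilt as a data.table
dt <- data.table(
  name = c("ken", "hello"),
  question_1 = c("PC1,PC2,PC4", "PC1,PC5"),
  question_1_response = c(4.5, 4),
  question_2 = c("PC3,MK1,MK2", "MK1,SBP1,SBP2"),
  question_2_response = c(3.5, 4),
  question_3 = c("SBP1,SBP5", NA),
  question_3_response = c(5, NA)
)

# Melt the question columns and the response columns in parallel;
# na.rm = TRUE drops the row where both question_3 values are NA
long <- melt(dt, measure.vars = patterns("\\d+$", "response$"),
             value.name = c("subcomp", "value"), na.rm = TRUE)

# Split each comma-separated string into one row per code
# (base strsplit swapped in for cSplit)
res <- long[, .(subcomp = unlist(strsplit(subcomp, ","))),
            by = .(name, variable, value)][, .(name, subcomp, value)]
res
```

Grouping by `name`, `variable` and `value` keeps one melted row per group, so each `strsplit` call sees a single string.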