如何在R中交换列和行条目

17
library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
           class = c("sports", "", "music, sports", ""),
           hobby = c("knitting, music, sports", "", "", "music"))
> dat1
  id         class                   hobby
1  1        sports knitting, music, sports
2  2                                      
3 34 music, sports                        
4 99                                 music

我有一个数据集,dat1,每行对应一个唯一的id。对于每个id,多个classhobby的输入以逗号分隔。

我想要交换该数据集的行和列,以获得以下结果:

     input class hobby
1   sports 1, 34     1
2 knitting           1
3    music    34 1, 99
在这个数据集中,每行对应来自dat1的唯一输入。现在,class和hobby列存储了来自dat1的相应id,每个id用逗号隔开。在R中有一种快速交换行和列的方法吗?
5个回答

9

Here is a data.table solution

Input

library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))
dat1
#>    id         class                   hobby
#> 1:  1        sports knitting, music, sports
#> 2:  2                                      
#> 3: 34 music, sports                        
#> 4: 99                                 music

数据预处理

# in long format
dt_melted <- melt.data.table(dat1, id.vars = "id", variable.name = "type", value.name = "value")
dt_melted
#>    id  type                   value
#> 1:  1 class                  sports
#> 2:  2 class                        
#> 3: 34 class           music, sports
#> 4: 99 class                        
#> 5:  1 hobby knitting, music, sports
#> 6:  2 hobby                        
#> 7: 34 hobby                        
#> 8: 99 hobby                   music

# split values by comma
dt_splitted <- dt_melted[, .(input = unlist(data.table::tstrsplit(value, ","))), by = .(id, type)]
dt_splitted
#>    id  type    input
#> 1:  1 class   sports
#> 2: 34 class    music
#> 3: 34 class   sports
#> 4:  1 hobby knitting
#> 5:  1 hobby    music
#> 6:  1 hobby   sports
#> 7: 99 hobby    music

最后一步 1

# bring back to desired wide format
dt_casted <- dcast.data.table(dt_splitted, 
                              formula = "input ~ type",
                              value.var = "id",
                              fun.aggregate = paste, 
                              collapse = ", ")
dt_casted
#>       input class hobby
#> 1: knitting           1
#> 2:    music    34 1, 99
#> 3:   sports 1, 34     1

最后一步2 (更加详细)

# combine ids by class/hobby
dt_splitted[, .(class = paste(id[type == "class"], collapse = ", "),
                hobby = paste(id[type == "hobby"], collapse = ", ")),
            by = .(input = trimws(input))]
#>       input class hobby
#> 1:   sports 1, 34     1
#> 2:    music    34 1, 99
#> 3: knitting           1

7

使用 dcast + melt 的另一个 data.table 选项

dcast(
  melt(dat1[, lapply(.SD, strsplit, ", "), id], "id")[
    ,
    .(input = unlist(value)),
    .(id, variable)
  ], input ~ variable,
  value.var = "id",
  fun = toString
)

这提供了

      input class hobby
1: knitting           1
2:    music    34 1, 99
3:   sports 1, 34     1

5

以下是一个快捷的 tidyverse 的方式:

library(dplyr)
library(tidyr)
dat1 %>% 
  pivot_longer(-id, values_to = "input") %>%
  separate_rows(input) %>% 
  filter(input != "") %>% 
  pivot_wider(names_from = "name", values_from = "id", values_fn = toString)

  input    class hobby
1 sports   1, 34 1    
2 knitting NA    1    
3 music    34    1, 99

4

这里有一个(不是太长的)tidyverse选项。问题在于将未来“input”列的值分散到多个列中,并且可以将多个值包含在同一字符串中。

首先使用pivot_longer将所有未来的input放到同一列中:

dat2 = dat1 %>% pivot_longer(!id)
dat2
# A tibble: 8 x 3
     id name  value                    
  <dbl> <chr> <chr>                    
1     1 class "sports"                 
2     1 hobby "knitting, music, sports"
3     2 class ""                       
4     2 hobby ""                       
5    34 class "music, sports"          
6    34 hobby ""                       
7    99 class ""                       
8    99 hobby "music" 

然后的方法是使用stringr包来拆分所有可能的input值,然后再进行旋转以得到所有组合。 然后过滤行以找到具有inputnchar(input)>0)的行。 最后按inputname分组,将多个id粘在一起id = paste0(id, collapse = ",")。最后,用pivot_wider重新调整数据框的格式为所需格式。
cbind(dat2, str_split(str_remove_all(dat2$value, ","), " ", simplify = T)) %>%
  select(!value) %>% 
  pivot_longer(!c("id", "name"), names_to = "name_2", values_to = "input") %>% 
  select(!name_2) %>% 
  filter(nchar(input)>0) %>%
  group_by(input, name) %>% 
  summarise(id = paste0(id, collapse = ",")) %>% 
  pivot_wider(names_from = name, values_from = id)

# A tibble: 3 x 3
# Groups:   input [3]
  input    hobby class
  <chr>    <chr> <chr>
1 knitting 1     NA   
2 music    1,99  34   
3 sports   1     1,34 

3

使用dplyr的另一种方法是通过调用strsplit将行中的值列出,然后展开它,进行长透视,再返回到宽透视,以反转这些值。

library(dplyr)


dat1 |>
  rowwise() |> 
  mutate(across(-id, ~ ifelse(nchar(.x), strsplit(.x, ", "), list("")))) |> 
  unnest(cols = -id) |> 
  pivot_longer(cols = class:hobby) |> 
  filter(value != "") |> 
  pivot_wider(id_cols = value, names_from = name, values_from = id, 
              values_fn = \(x) paste(unique(x), collapse = " ,"))

  value    class hobby
  <chr>    <chr> <chr>
1 sports   1 ,34 1    
2 knitting NA    1    
3 music    34    1 ,99

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接