How do I spread the factor levels in a column into multiple columns, halving the length of the dataset?

3

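How can I split the "strike" column into "kick_type" and "punch_type", and the "damage" column into "kick_damage" and "punch_damage"?

I have spent 3 hours on this and cannot work out how to do the split. Note that I used pivot_longer to get from a messy format, in which every strike was its own column, to the current format. So I had already done other steps before this point, but I still cannot solve this part.

Reproducible code: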

trial <- data.frame(fighter=c("Saenchai","Saenchai","Saenchai","Saenchai","Buakaw","Buakaw","Buakaw","Buakaw"), 
# repeat the four strikes once per fighter (8 rows in total)
strike=rep(c("roundhouse_kick","side_kick","lefthook_punch","uppercut_punch"), times = 2),
damage=c(0.7,0.8,0.6,0.3,0.9,0.5,0.7,0.1))

It should look like this, but I don't know how to get there:
fighter   kick_type         kick_damage   punch_type      punch_damage
Saenchai  roundhouse_kick   0.7           lefthook_punch  0.6
Saenchai  side_kick         0.8           uppercut_punch  0.3

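If you want to split the strike column into kick_type and punch_type, the resulting punch_type has to be punch rather than lefthook_punch, because lefthook is the kick type rather than the punch type. - danlooo
3 Answers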

4

I'm sure there is a better way, but here is one approach that does not lean heavily on regular expressions:

library(tidyverse)

trial %>% 
  pivot_wider(names_from = "strike", values_from = "damage") %>% 
  pivot_longer(ends_with('kick'), names_pattern = '(.*)_kick', names_to = "kick_type", values_to = "kick_damage") %>% 
  pivot_longer(ends_with('punch'), names_pattern = '(.*)_punch', names_to = "punch_type", values_to = "punch_damage") %>% 
  group_by(fighter) %>% 
  filter(row_number() == 1 | row_number() == n())

# A tibble: 4 x 5
# Groups:   fighter [2]
  fighter  kick_type  kick_damage punch_type punch_damage
  <chr>    <chr>            <dbl> <chr>             <dbl>
1 Saenchai roundhouse         0.7 lefthook            0.6
2 Saenchai side               0.8 uppercut            0.3
3 Buakaw   roundhouse         0.9 lefthook            0.7
4 Buakaw   side               0.5 uppercut            0.1
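
To see why the final filter() is needed, it may help to look at the intermediate result. The sketch below stops before the filter (the name `crossed` is only illustrative, and the data is rebuilt so the block runs on its own). The two pivot_longer() calls pair every kick with every punch, so each fighter ends up with four rows. Keeping the first and last row per fighter recovers the intended pairing only because there are exactly two kicks and two punches per fighter.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained
trial <- data.frame(
  fighter = rep(c("Saenchai", "Buakaw"), each = 4),
  strike  = rep(c("roundhouse_kick", "side_kick",
                  "lefthook_punch", "uppercut_punch"), times = 2),
  damage  = c(0.7, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7, 0.1)
)

# `crossed` is a hypothetical name for the result before filter()
crossed <- trial %>%
  pivot_wider(names_from = "strike", values_from = "damage") %>%
  pivot_longer(ends_with("kick"), names_pattern = "(.*)_kick",
               names_to = "kick_type", values_to = "kick_damage") %>%
  pivot_longer(ends_with("punch"), names_pattern = "(.*)_punch",
               names_to = "punch_type", values_to = "punch_damage")

# Every kick is paired with every punch: 2 x 2 rows per fighter
count(crossed, fighter)
```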

Another, simpler approach is to use separate:
trial %>%
  separate(strike, into = c("type", "move")) %>% 
  group_by(fighter, move) %>% 
  mutate(n = row_number()) %>% 
  # name id_cols explicitly; recent tidyr versions deprecate passing it by position
  pivot_wider(id_cols = c(fighter, n), names_from = move, values_from = c(type, damage))

# A tibble: 4 x 6
# Groups:   fighter [2]
  fighter      n type_kick  type_punch damage_kick damage_punch
  <chr>    <int> <chr>      <chr>            <dbl>        <dbl>
1 Saenchai     1 roundhouse lefthook           0.7          0.6
2 Saenchai     2 side       uppercut           0.8          0.3
3 Buakaw       1 roundhouse lefthook           0.9          0.7
4 Buakaw       2 side       uppercut           0.5          0.1
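
The output above still carries the helper column n and uses type_kick style names. Below is a minimal sketch of how it could be brought to the layout asked for in the question. It assumes the names_glue argument of pivot_wider() (also used in another answer on this page) and a final select(); `result` is an illustrative name.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained
trial <- data.frame(
  fighter = rep(c("Saenchai", "Buakaw"), each = 4),
  strike  = rep(c("roundhouse_kick", "side_kick",
                  "lefthook_punch", "uppercut_punch"), times = 2),
  damage  = c(0.7, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7, 0.1)
)

# `result` is a hypothetical name for the final table
result <- trial %>%
  separate(strike, into = c("type", "move"), sep = "_") %>%
  group_by(fighter, move) %>%
  mutate(n = row_number()) %>%   # 1st, 2nd, ... strike of each kind
  ungroup() %>%
  pivot_wider(id_cols = c(fighter, n), names_from = move,
              values_from = c(type, damage),
              names_glue = "{move}_{.value}") %>%
  # drop the helper column and put the columns in the requested order
  select(fighter, kick_type, kick_damage, punch_type, punch_damage)

result
```

The helper column n is what pairs the first kick with the first punch, the second kick with the second punch, and so on. The pairing therefore follows row order within each fighter.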

1
Wow, man, this is great! - Pineapple

2
A data.table solution (inspired by @Maël's answer):
library(data.table)

# data.table
trial <- data.table::data.table(fighter=c("Saenchai","Saenchai","Saenchai","Saenchai","Buakaw","Buakaw","Buakaw","Buakaw"), 
                                # repeat the four strikes once per fighter (8 rows in total)
                                strike=rep(c("roundhouse_kick","side_kick","lefthook_punch","uppercut_punch"), times = 2),
                                damage=c(0.7,0.8,0.6,0.3,0.9,0.5,0.7,0.1))

# pivot_wider() equivalent
trial <-  dcast(trial, fighter~strike, value.var="damage")

# pivot_longer() equivalent, punch
trial <- data.table::melt(data          = trial,
                            id.vars       = c("fighter",
                                              "roundhouse_kick","side_kick"),
                            measure.vars  = c("lefthook_punch",
                                              "uppercut_punch"),
                            value.name    = "punch_damage",
                            variable.name = "punch_type")

# pivot_longer() equivalent, kick
trial <- data.table::melt(data          = trial,
                            id.vars       = c("fighter", "punch_damage","punch_type"),
                            measure.vars  = c("roundhouse_kick","side_kick"),
                            value.name    = "kick_damage",
                            variable.name = "kick_type")

# Select first and last row by fighter
trial <- trial[
  j = .SD[unique(c(1,.N))],
  by = c("fighter")
]
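
The data.table code above mirrors the pivot-then-filter answer. As a rough sketch, the separate-based idea could also be written in data.table. This assumes tstrsplit() to split the strike name, rowid() to number the strikes within each fighter and move, and dcast() with two value.var columns. The names `dt`, `wide`, `type`, `move` and `n` are illustrative and not part of the original answer.

```r
library(data.table)

# Same data as in the question, rebuilt so the sketch is self-contained
dt <- data.table(
  fighter = rep(c("Saenchai", "Buakaw"), each = 4),
  strike  = rep(c("roundhouse_kick", "side_kick",
                  "lefthook_punch", "uppercut_punch"), times = 2),
  damage  = c(0.7, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7, 0.1)
)

# Split "roundhouse_kick" into type = "roundhouse" and move = "kick"
dt[, c("type", "move") := tstrsplit(strike, "_", fixed = TRUE)]

# Number the strikes within each fighter/move combination
dt[, n := rowid(fighter, move)]

# One row per fighter and n, one column per (value, move) pair
wide <- dcast(dt, fighter + n ~ move, value.var = c("type", "damage"))
wide
```

Here the resulting columns are named type_kick, type_punch, damage_kick and damage_punch, and the rows come back ordered by fighter and n rather than in the original order.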

1

Here is another tidyverse approach.

library(tidyverse)

trial %>% 
  separate(strike, sep = "_", into = c("type", "attack")) %>% 
  # name id_cols explicitly; recent tidyr versions deprecate passing it by position
  pivot_wider(id_cols = fighter, names_from = attack, names_glue = "{attack}_{.value}", 
              values_from = c("type", "damage"), values_fn = list) %>% 
  unnest(cols = !fighter) %>% 
  select(fighter, kick_type, kick_damage, punch_type, punch_damage)

# A tibble: 4 × 5
  fighter  kick_type  kick_damage punch_type punch_damage
  <chr>    <chr>            <dbl> <chr>             <dbl>
1 Saenchai roundhouse         0.7 lefthook            0.6
2 Saenchai side               0.8 uppercut            0.3
3 Buakaw   roundhouse         0.9 lefthook            0.7
4 Buakaw   side               0.5 uppercut            0.1
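
The values_fn = list step is what lets several strikes share one fighter row. The small sketch below stops before unnest() to show that intermediate shape. It rebuilds the same trial data so that it runs on its own, and `nested` is an illustrative name.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained
trial <- data.frame(
  fighter = rep(c("Saenchai", "Buakaw"), each = 4),
  strike  = rep(c("roundhouse_kick", "side_kick",
                  "lefthook_punch", "uppercut_punch"), times = 2),
  damage  = c(0.7, 0.8, 0.6, 0.3, 0.9, 0.5, 0.7, 0.1)
)

# `nested` is a hypothetical name for the result before unnest()
nested <- trial %>%
  separate(strike, sep = "_", into = c("type", "attack")) %>%
  pivot_wider(id_cols = fighter, names_from = attack,
              names_glue = "{attack}_{.value}",
              values_from = c(type, damage), values_fn = list)

nrow(nested)               # one row per fighter
lengths(nested$kick_type)  # how many kicks are stored in each list cell
```

unnest(cols = !fighter) then expands the list columns side by side, which requires the list cells within a row to have matching lengths, that is, as many kicks as punches per fighter.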
