重塑宽格式为多列长格式,即将数据从横向排列转换为纵向排列。

17

我想对一个宽格式数据集进行重塑,该数据集包含在3个时间点上测量的多个测试:

   ID   Test Year   Fall Spring Winter
    1   1   2008    15      16      19
    1   1   2009    12      13      27
    1   2   2008    22      22      24
    1   2   2009    10      14      20
    2   1   2008    12      13      25
    2   1   2009    16      14      21
    2   2   2008    13      11      29
    2   2   2009    23      20      26
    3   1   2008    11      12      22
    3   1   2009    13      11      27
    3   2   2008    17      12      23
    3   2   2009    14      9       31

将测试按列分隔到数据集中,但将测量时间转换为长格式,在每个新列中都是这样的:

    ID  Year    Time        Test1 Test2
    1   2008    Fall        15      22
    1   2008    Spring      16      22
    1   2008    Winter      19      24
    1   2009    Fall        12      10
    1   2009    Spring      13      14
    1   2009    Winter      27      20
    2   2008    Fall        12      13
    2   2008    Spring      13      11
    2   2008    Winter      25      29
    2   2009    Fall        16      23
    2   2009    Spring      14      20
    2   2009    Winter      21      26
    3   2008    Fall        11      17
    3   2008    Spring      12      12
    3   2008    Winter      22      23
    3   2009    Fall        13      14
    3   2009    Spring      11      9
    3   2009    Winter      27      31

我曾尝试使用reshape和melt但没有成功。现有的帖子都是关于将数据转换为单列结果的。

4个回答

18

使用reshape2

# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")

更新:自1.9.0版起,使用data.table的melt/cast方法:

data.table自版本1.9.0开始导入reshape2包,并在C中为data.tables实现了快速的meltdcast方法。下面是在大数据上速度比较的结果。

有关新闻的更多信息,请点击这里

require(data.table) ## ver. >=1.9.0
require(reshape2)

dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")

目前,您必须显式地编写dcast.data.table,因为它在reshape2中还不是S3通用函数。


在更大的数据上进行基准测试:

# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE), 
        Test = sample(1e3, 1e7, TRUE), 
        Year = sample(2008:2014, 1e7,TRUE), 
        Fall = sample(50, 1e7, TRUE), 
        Spring = sample(50, 1e7,TRUE), 
        Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)

reshape2 重塑时间:

reshape2_melt <- function(df) {
    df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
#   user  system elapsed 
# 43.319   4.909  48.932 

df.m <- transform(df.m, Test = paste0("Test", Test))

reshape2_cast <- function(df) {
    dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
#   user  system elapsed 
# 57.728   9.712  69.573 

data.table时间:

DT_melt <- function(dt) {
    dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- reshape2_melt(DT))
#   user  system elapsed 
#  0.276   0.001   0.279 

dt.m[, Test := paste0("Test", Test)]

DT_cast <- function(dt) {
    dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
#   user  system elapsed 
# 12.732   0.825  14.006 

melt.data.tablereshape2:::melt快大约175倍dcast.data.tablereshape2:::dcast约5倍


4
哇,我不知道你可以用data.table做那个。 - IRTFM
1
@Arun,希望你不介意,我在你的答案中添加了基准测试。(我认为将其作为单独的答案发布是不正确的) - Ricardo Saporta
1
@Arun,使用melt(df, id.var = c("ID", "Test", "Year"), variable.name="Time")更优雅地执行reshape2解决方案,并消除names(df.m)[names(df.m)=="variable"] <- "Time"行。与Ricardo不同,我没有勇气自己进行此编辑。 - Ista
@RicardoSaporta - 你觉得有兴趣对我的基本 R 解决方案与另外两个进行基准测试吗? - thelatemail
@thelatemail,你可以做到!!我相信你;) - Ricardo Saporta

4

如果您使用基础R,这是另一个使用“stack + reshape”例程的好选择。假设我们的数据集称为“mydf”:

mydf.temp <- data.frame(mydf[1:3], stack(mydf[4:6]))
mydf2 <- reshape(mydf.temp, direction = "wide", 
                 idvar=c("ID", "Year", "ind"), 
                 timevar="Test")
names(mydf2) <- c("ID", "Year", "Time", "Test1", "Test2")
mydf2
#    ID Year   Time Test1 Test2
# 1   1 2008   Fall    15    22
# 2   1 2009   Fall    12    10
# 5   2 2008   Fall    12    13
# 6   2 2009   Fall    16    23
# 9   3 2008   Fall    11    17
# 10  3 2009   Fall    13    14
# 13  1 2008 Spring    16    22
# 14  1 2009 Spring    13    14
# 17  2 2008 Spring    13    11
# 18  2 2009 Spring    14    20
# 21  3 2008 Spring    12    12
# 22  3 2009 Spring    11     9
# 25  1 2008 Winter    19    24
# 26  1 2009 Winter    27    20
# 29  2 2008 Winter    25    29
# 30  2 2009 Winter    21    26
# 33  3 2008 Winter    22    23
# 34  3 2009 Winter    27    31

3

tidyverse/tidyr 解决方案:

library(dplyr)
library(tidyr)

df %>% 
  gather("Time", "Value", Fall, Spring, Winter) %>% 
  spread(Test, Value, sep = "")

这也是一个很好的解决方案。感谢您将其翻译到电视上。 - Sam

3

以下是基础的reshape函数替代方法。虽然这需要使用两次reshape,但可能存在更简单的方法。

假设您的数据集名为df1

tmp <- reshape(df1,idvar=c("ID","Year"),timevar="Test",direction="wide")
result <- reshape(
   tmp,
   idvar=c("ID","Year"),
   varying=list(3:5,6:8),
   v.names=c("Test1","Test2"),
   times=c("Fall","Spring","Winter"),
   direction="long"
)

这将会得到:

> result
              ID Year   time Test1 Test2
1.2008.Fall    1 2008   Fall    15    22
1.2009.Fall    1 2009   Fall    12    10
2.2008.Fall    2 2008   Fall    12    13
2.2009.Fall    2 2009   Fall    16    23
3.2008.Fall    3 2008   Fall    11    17
3.2009.Fall    3 2009   Fall    13    14
1.2008.Spring  1 2008 Spring    16    22
1.2009.Spring  1 2009 Spring    13    14
2.2008.Spring  2 2008 Spring    13    11
2.2009.Spring  2 2009 Spring    14    20
3.2008.Spring  3 2008 Spring    12    12
3.2009.Spring  3 2009 Spring    11     9
1.2008.Winter  1 2008 Winter    19    24
1.2009.Winter  1 2009 Winter    27    20
2.2008.Winter  2 2008 Winter    25    29
2.2009.Winter  2 2009 Winter    21    26
3.2008.Winter  3 2008 Winter    22    23
3.2009.Winter  3 2009 Winter    27    31

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接