Transforming data with R

This is my raw output:
BizDate     Description                             TotalAmount TotalQty
2013-09-01 NIN JIOM COUGH SYRUP 75ML                11.8        2
2013-09-01 BREACOL COUGH SYRUP ADULTS 120ML         15.8        2
2013-09-02 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    8.5         1
2013-09-03 AFRICAN SEA COCONUT COUGH SYRUP 177ML    8.2         1
2013-09-01 THREE LEGS COOLING WATER 200ML           21.0        14
2013-09-01 SEAHORSE BRAND COOLING WATER 200ML       4.4         4
2013-09-05 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    8.5         1
2013-09-06 AFRICAN SEA COCONUT COUGH SYRUP 177ML    8.2         1

How can I convert it into something like the following?

 Description                              2013-09-01   2013-09-02 .......
 NIN JIOM COUGH SYRUP 75ML                some number some number .......
 BREACOL COUGH SYRUP ADULTS 120ML         some number some number .......
 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    some number some number .......
 AFRICAN SEA COCONUT COUGH SYRUP 177ML    some number some number .......
 THREE LEGS COOLING WATER 200ML           some number some number .......
 SEAHORSE BRAND COOLING WATER 200ML       some number some number .......
 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    some number some number .......
 AFRICAN SEA COCONUT COUGH SYRUP 177ML    some number some number .......

Here is some dummy data:

set.seed(1)
dat <- data.frame(product = rep(letters[1:4], each = 4)
    , date = sample(seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1), 16, TRUE)
    , amount = sample(1:100, 16, TRUE)
    , qty = sample(1:4, 16, TRUE))

dat
   product       date amount qty
1        a 2013-01-03     72   2
2        a 2013-01-04    100   1
3        a 2013-01-06     39   4
4        a 2013-01-10     78   3
5        b 2013-01-03     94   4
6        b 2013-01-09     22   1
7        b 2013-01-10     66   3
8        b 2013-01-07     13   2
9        c 2013-01-07     27   4
10       c 2013-01-01     39   3
11       c 2013-01-03      2   4
12       c 2013-01-02     39   3
13       d 2013-01-07     87   3
14       d 2013-01-04     35   4
15       d 2013-01-08     49   1
16       d 2013-01-05     60   2

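My current approach is rather clumsy. Is there a smarter way? I was thinking of using plyr. Please just point me to the right package or method.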

# For each date: keep that date's rows, prefix amount/qty with the date, drop the date column
dat1<-subset(dat,date=="2013-01-01")
colnames(dat1)[c(3,4)]<-c("2013-01-01.amount","2013-01-01.qty")
dat1<-dat1[-2]
dat2<-subset(dat,date=="2013-01-02")
colnames(dat2)[c(3,4)]<-c("2013-01-02.amount","2013-01-02.qty")
dat2<-dat2[-2]
dat3<-subset(dat,date=="2013-01-03")
colnames(dat3)[c(3,4)]<-c("2013-01-03.amount","2013-01-03.qty")
dat3<-dat3[-2]

# Full outer join on product; products with no sales on a date get 0
new.dat<-merge(dat1,dat2,by="product",all.x=TRUE,all.y=TRUE)
new.dat<-merge(new.dat,dat3,by="product",all.x=TRUE,all.y=TRUE)
new.dat[is.na(new.dat)]<-0

new.dat
  product 2013-01-01.amount 2013-01-01.qty 2013-01-02.amount 2013-01-02.qty
1       a                 0              0                 0              0
2       b                 0              0                 0              0
3       c                39              3                39              3
  2013-01-03.amount 2013-01-03.qty
1                72              2
2                94              4
3                 2              4

The result will have n rows and 61 columns. Thanks. - chee.work.stuff
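What have you tried? Which packages have you looked at? Have you researched a solution yourself? Which functions do you think might be useful here? Please show a little effort. This is not an "I provide the data, you write the code" site. - Simon O'Hanlon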
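I'm using plyr, but haven't had any success yet. I've tried many things. I found a clumsy way of doing it: subsetting by date and renaming the columns by hand, then joining the pieces back together. I just need a smarter way. Sorry. - chee.work.stuff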
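Please add what you have tried to the question, explain why it didn't work, and describe the result you expect. If your real data is too large, use a toy example. People are more willing to help when there is something to work with. - Simon O'Hanlon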
+1 for updating the question, well done! - Simon O'Hanlon
2 Answers

Given the current shape of your data, just use base R's reshape function:
> reshape(dat, direction = "wide", idvar="product", timevar="date")
   product amount.2013-01-03 qty.2013-01-03 amount.2013-01-04 qty.2013-01-04 amount.2013-01-06
1        a                72              2               100              1                39
5        b                94              4                NA             NA                NA
9        c                 2              4                NA             NA                NA
13       d                NA             NA                35              4                NA
   qty.2013-01-06 amount.2013-01-10 qty.2013-01-10 amount.2013-01-09 qty.2013-01-09
1               4                78              3                NA             NA
5              NA                66              3                22              1
9              NA                NA             NA                NA             NA
13             NA                NA             NA                NA             NA
   amount.2013-01-07 qty.2013-01-07 amount.2013-01-01 qty.2013-01-01 amount.2013-01-02
1                 NA             NA                NA             NA                NA
5                 13              2                NA             NA                NA
9                 27              4                39              3                39
13                87              3                NA             NA                NA
   qty.2013-01-02 amount.2013-01-08 qty.2013-01-08 amount.2013-01-05 qty.2013-01-05
1              NA                NA             NA                NA             NA
5              NA                NA             NA                NA             NA
9               3                NA             NA                NA             NA
13             NA                49              1                60              2
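reshape() creates the wide columns in the order in which each date first appears in the data. That is why the dates above are out of order. It also leaves NA wherever a product has no sales on a date.

The following is a minimal sketch that assumes you want chronological columns and 0 instead of NA, matching what the question's merge() code does. The names dat_sorted and wide are only illustrative.

```r
# Dummy data from the question
set.seed(1)
dat <- data.frame(product = rep(letters[1:4], each = 4),
                  date = sample(seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1), 16, TRUE),
                  amount = sample(1:100, 16, TRUE),
                  qty = sample(1:4, 16, TRUE))

# reshape() orders the wide columns by first appearance of each date,
# so sort by date first to get chronological columns
dat_sorted <- dat[order(dat$date, dat$product), ]
wide <- reshape(dat_sorted, direction = "wide",
                idvar = "product", timevar = "date")

# Replace NA (no sales that day) with 0, as the question's merge() code does
wide[is.na(wide)] <- 0

# Put the products back in order and reset the row names
wide <- wide[order(wide$product), ]
rownames(wide) <- NULL
wide
```

A product can have more than one row on the same date. In that case reshape() keeps only the first row and issues a warning, so aggregate those rows beforehand.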

Alternatively, you could look at the "reshape2" package:

library(reshape2)
datL <- melt(dat, id.vars=c("product", "date"))
dcast(datL, product ~ date + variable, value.var="value")
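The same product can appear more than once on the same date. When that happens, dcast() needs an aggregation function; without one it falls back to length() and prints a message.

The following sketch assumes that summing duplicates and putting 0 in empty cells is what you want. It also assumes reshape2 is installed.

```r
library(reshape2)

# Dummy data from the question
set.seed(1)
dat <- data.frame(product = rep(letters[1:4], each = 4),
                  date = sample(seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1), 16, TRUE),
                  amount = sample(1:100, 16, TRUE),
                  qty = sample(1:4, 16, TRUE))

# Long format: one row per product / date / measure (amount or qty)
datL <- melt(dat, id.vars = c("product", "date"))

# fun.aggregate = sum adds up duplicate product/date rows;
# fill = 0 puts 0 in cells with no data instead of NA
wide <- dcast(datL, product ~ date + variable, value.var = "value",
              fun.aggregate = sum, fill = 0)
wide
```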


Try this:

#Dummy data
df <- read.table(text="
BizDate Description TotalAmount TotalQty
2013-09-01 NIN 11.8 2
2013-09-01 BREACOL 15.8 2
2013-09-02 AFRICAN 8.5 1
2013-09-03 AFRICAN 8.2 1
2013-09-01 THREE 21.0 14
2013-09-01 SEAHORSE 4.4 4
2013-09-05 AFRICAN 8.5 1
2013-09-06 AFRICAN 8.2 1",
                 header=TRUE)

library(reshape2)

dcast(df,Description~BizDate,value.var="TotalAmount")
#Description 2013-09-01 2013-09-02 2013-09-03 2013-09-05 2013-09-06
#1     AFRICAN         NA        8.5        8.2        8.5        8.2
#2     BREACOL       15.8         NA         NA         NA         NA
#3         NIN       11.8         NA         NA         NA         NA
#4    SEAHORSE        4.4         NA         NA         NA         NA
#5       THREE       21.0         NA         NA         NA         NA
dcast(df,Description~BizDate,value.var="TotalQty")
#Description 2013-09-01 2013-09-02 2013-09-03 2013-09-05 2013-09-06
#1     AFRICAN         NA          1          1          1          1
#2     BREACOL          2         NA         NA         NA         NA
#3         NIN          2         NA         NA         NA         NA
#4    SEAHORSE          4         NA         NA         NA         NA
#5       THREE         14         NA         NA         NA         NA
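If you would rather avoid a package dependency, base R's xtabs() can build the same Description-by-date grid. It sums duplicate rows within each cell and fills empty cells with 0 rather than NA.

This sketch reuses the dummy data above. The names amount_tab, qty_tab, amount_wide and qty_wide are only illustrative.

```r
# Same dummy data as above
df <- read.table(text = "
BizDate Description TotalAmount TotalQty
2013-09-01 NIN 11.8 2
2013-09-01 BREACOL 15.8 2
2013-09-02 AFRICAN 8.5 1
2013-09-03 AFRICAN 8.2 1
2013-09-01 THREE 21.0 14
2013-09-01 SEAHORSE 4.4 4
2013-09-05 AFRICAN 8.5 1
2013-09-06 AFRICAN 8.2 1",
                 header = TRUE)

# Sum each measure within every Description x BizDate cell (empty cells are 0)
amount_tab <- xtabs(TotalAmount ~ Description + BizDate, data = df)
qty_tab <- xtabs(TotalQty ~ Description + BizDate, data = df)

# Turn the 2-D tables into data frames: one row per Description, one column per date
amount_wide <- as.data.frame.matrix(amount_tab)
qty_wide <- as.data.frame.matrix(qty_tab)
amount_wide
```

as.data.frame.matrix() keeps the descriptions as row names and the dates as column names.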

@chee.work.stuff, I don't think "reshape" is the right package, since it is no longer actively developed and has a successor package. zx8754, I suggest you update to "reshape2". - A5C1D2H2I1M1N2O1R2T1
@AnandaMahto Updated. acast vs dcast: https://dev59.com/kmcs5IYBdhLWcg3w8oTK - zx8754
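+1 for the update. I know it's only a one-character difference, but there are some fairly important differences between the two packages. I'd recommend updating for better "future-proofing" of the code and for efficiency :-) - A5C1D2H2I1M1N2O1R2T1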

Page content provided by Stack Overflow (English original).