Magic for restructuring a data.frame

Question


r dataframe reshape

6

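I am currently learning how to work with data.frames and am confused about how to rearrange them.

At the moment, I have a data.frame that shows:

Column 1: shop name
Column 2: product
Column 3: number of items of this product purchased at that shop

Or, visually, something like this: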

+---+-----------+-------+----------+
|   | Shop.Name | Items | Product  |
+---+-----------+-------+----------+
| 1 | Shop1     |     2 | Product1 |
| 2 | Shop1     |     4 | Product2 |
| 3 | Shop2     |     3 | Product1 |
| 4 | Shop3     |     2 | Product1 |
| 5 | Shop3     |     1 | Product4 |
+---+-----------+-------+----------+

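What I would like to end up with is the following "shop-centric" structure:

Column 1: shop name
Column 2: items sold for product 1
Column 3: items sold for product 2
Column 4: items sold for product 3 ...

When a shop/product combination has no sales, I would like to show 0.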

+---+-------+-------+-------+-------+-------+-----+
|   | Shop  | Prod1 | Prod2 | Prod3 | Prod4 | ... |
+---+-------+-------+-------+-------+-------+-----+
| 1 | Shop1 |     2 |     4 |     0 |     0 | ... |
| 2 | Shop2 |     3 |     0 |     0 |     0 | ... |
| 3 | Shop3 |     2 |     0 |     0 |     1 | ... |
+---+-------+-------+-------+-------+-------+-----+

- xav

3

Take a look at the function reshape, or at dcast in the reshape2 package. Either can reshape your data into this layout. - joran

https://dev59.com/HWkw5IYBdhLWcg3w1eC7#9617424 There are many ways to do this. - Aaron left Stack Overflow

3 Answers

1

Using dcast from the reshape2 library:

library(reshape2)

> df <- data.frame(Shop.Name=rep(c("Shop1","Shop2","Shop3"),each=3),
+                  Items=rpois(9,5),
+                  Product=c(rep(c("Prod1","Prod2","Prod3","Prod4"),2),"Prod5")
+ )
> df
  Shop.Name Items Product
1     Shop1     6   Prod1
2     Shop1     5   Prod2
3     Shop1     6   Prod3
4     Shop2     5   Prod4
5     Shop2     6   Prod1
6     Shop2     6   Prod2
7     Shop3     4   Prod3
8     Shop3     7   Prod4
9     Shop3     5   Prod5
> dcast(df,Shop.Name ~ Product,value.var="Items",fill=0)
  Shop.Name Prod1 Prod2 Prod3 Prod4 Prod5
1     Shop1     6     5     6     0     0
2     Shop2     6     6     0     5     0
3     Shop3     0     0     4     7     5

- Jonathan Christensen

Not quite. See my answer. - A5C1D2H2I1M1N2O1R2T1

0

If, for whatever reason, you want to use the original reshape package:

library(reshape)

Shop.Name <- c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3")
Items <- c(2,4,3,2,1)
Product <- c("Product1", "Product2", "Product1", "Product1", "Product4")
(df <- data.frame(Shop.Name, Items, Product))

cast(df, formula = Shop.Name ~ Product, value="Items", fill=0)

- Redfoot


- A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

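The answers so far work to a certain extent, but they don't fully answer your question. In particular, they don't address the case where no shop sells a particular product. Judging from your example input and desired output, no shop sold "Product3". In fact, "Product3" doesn't even appear in your source data.frame. They also don't address the possibility of more than one row per Shop + Product combination.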

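Here is a modified version of your data, followed by the two solutions given so far. I've added another row for the combination of "Shop1" and "Product1". Note that I've converted your products to a factor variable that includes every level the variable can take, even if no case actually has that level.

mydf <- data.frame(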
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))
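
Both issues described above can be checked directly before reshaping. The following is a minimal base R sketch, assuming the same mydf as above (rebuilt here so the snippet runs on its own); dup_index and level_counts are illustrative names, not part of the original answer:

```r
# Same data as mydf above, rebuilt so this snippet is self-contained
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

# Index of the first row that repeats an earlier Shop.Name/Product pair;
# 0 would mean every combination occurs at most once
dup_index <- anyDuplicated(mydf[c("Shop.Name", "Product")])

# Number of rows per product level, including levels that have no rows
level_counts <- table(mydf$Product)
```

A non-zero dup_index suggests that an aggregation function such as sum will be needed, and a zero entry in level_counts marks a product that only shows up in the reshaped result if empty levels are kept.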

dcast from "reshape2"
library(reshape2)
dcast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Using Product as value column: use value.var to override.
# Aggregation function missing: defaulting to length
# Error in .fun(.value[i], ...) : 
#   2 arguments passed to 'length' which requires 1
What? All of a sudden it doesn't work. Do this instead:
dcast(mydf, formula = Shop.Name ~ Product, 
      fill = 0, value.var = "Items", 
      fun.aggregate = sum, drop = FALSE)
#   Shop.Name Product1 Product2 Product3 Product4
# 1     Shop1        4        4        0        0
# 2     Shop2        3        0        0        0
# 3     Shop3        2        0        0        1
Let's be a little old-school. cast from "reshape"
library(reshape)
cast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Aggregation requires fun.aggregate: length used as default
#   Shop.Name Product1 Product2 Product4
# 1     Shop1        2        1        0
# 2     Shop2        1        0        0
# 3     Shop3        1        0        1
Eh? Again, not what you wanted... Try this instead:
cast(mydf, formula = Shop.Name ~ Product, 
     value = "Items", fill = 0, 
     add.missing = TRUE, fun.aggregate = sum)
#   Shop.Name Product1 Product2 Product3 Product4
# 1     Shop1        4        4        0        0
# 2     Shop2        3        0        0        0
# 3     Shop3        2        0        0        1
Back to basics. xtabs from base R
xtabs(Items ~ Shop.Name + Product, mydf)
#          Product
# Shop.Name Product1 Product2 Product3 Product4
#     Shop1        4        4        0        0
#     Shop2        3        0        0        0
#     Shop3        2        0        0        1
Or, if you prefer a data.frame (note that your "Shop.Name" variable has been converted to the row.names of the data.frame):
as.data.frame.matrix(xtabs(Items ~ Shop.Name + Product, mydf))
#       Product1 Product2 Product3 Product4
# Shop1        4        4        0        0
# Shop2        3        0        0        0
# Shop3        2        0        0        1
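
If Shop.Name should remain an ordinary column rather than row names, one possible approach is to copy the row names back into a column. This is a sketch in base R, assuming the same mydf as above (rebuilt here so it runs on its own); the names tab and wide are illustrative and not from the original answer:

```r
# Same data as mydf above, rebuilt so this snippet is self-contained
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

# Sum Items per shop and product; unused factor levels stay as zero columns
tab <- xtabs(Items ~ Shop.Name + Product, mydf)
wide <- as.data.frame.matrix(tab)

# Copy the row names into a regular first column and reset the row names
wide <- data.frame(Shop.Name = rownames(wide), wide, row.names = NULL)
```

The result has one row per shop, with Shop.Name as the first column followed by one column per product level.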