R programming: how to use plyr's ddply to count values in a column

8
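I would like to summarize the pass/fail status of my data as shown below. In other words, I want to report the number of pass and fail cases for each product/type combination.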
library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)

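The following command returns the total number of cases (passes and fails combined), but I want the passes and the fails counted separately.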
dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))

The result is:

        product type N
 1      p1      t1   6
 2      p1      t2   6
 3      p2      t1   6
 4      p2      t2   6

The desired result would be:
         product type Pass Fail
 1       p1      t1   5    1
 2       p1      t2   3    3
 3       p2      t1   4    2
 4       p2      t2   3    3

I tried something like the following:
 dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )

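But this is obviously wrong, since each row just shows the total number of passes and fails across the whole data set.

Thanks in advance for your advice! Regards, Riad.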

2 Answers

12

Try:

dfSummary <- ddply(df, c("product", "type"), summarise, 
                   Pass=sum(result=="pass"), Fail=sum(result=="fail") )

This gives me the result:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3

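Explanation:

  1. You give the data set df to the ddply function.
  2. ddply splits it on the variables "product" and "type".
    • This yields up to length(unique(product)) * length(unique(type)) subsets, one for each product/type combination present in df (here 2 * 2 = 4).
  3. For each subset, ddply applies the function you supplied. In this case, you count how many rows have result=="pass" and how many have result=="fail".
  4. ddply now has a result for each subset: the variables you split on (product and type) plus the values you asked for (Pass and Fail).
  5. It combines the results from all subsets and returns them.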
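
To make the split/apply/combine steps concrete, here is a rough base-R sketch of the same idea. It is only an illustration of the pattern, not how ddply is implemented internally. The names pieces, rows and manual are made up for this example, and the data frame is rebuilt (without the skew and color columns) so the snippet runs on its own.

```r
# Rebuild the relevant columns of the question's data frame.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail")
)

# Split: one sub-data-frame per product/type combination that occurs.
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Apply: reduce each piece to a single summary row.
rows <- lapply(pieces, function(piece) {
  data.frame(
    product = piece$product[1],
    type    = piece$type[1],
    Pass    = sum(piece$result == "pass"),
    Fail    = sum(piece$result == "fail")
  )
})

# Combine: stack the one-row results and sort them by product, then type.
manual <- do.call(rbind, rows)
manual <- manual[order(manual$product, manual$type), ]
rownames(manual) <- NULL
print(manual)
```

Inside the function, piece$result refers only to the rows of the current subset, which is why the counts come out per group. The attempt in the question used df$result instead, which always refers to the full data frame.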

Great, this is exactly what I needed! Thanks for the quick answer! - Riad

4
You can also use reshape2::dcast
library(reshape2)
dcast(product + type~result,data=df, fun.aggregate= length,value.var = 'result')
##   product type fail pass
## 1      p1   t1    1    5
## 2      p1   t2    3    3
## 3      p2   t1    2    4
## 4      p2   t2    3    3
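
With fun.aggregate = length, dcast is in effect cross-tabulating: it counts the rows for each product/type/result combination. If you would rather not load a package, the same counts can probably be obtained in base R with xtabs. This is a sketch under that assumption, with the data frame rebuilt so it runs on its own; the names xt and flat are just for illustration.

```r
# Rebuild the relevant columns of the question's data frame.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail")
)

# Count rows for every product x type x result combination.
xt <- xtabs(~ product + type + result, data = df)

# Flatten the 3-way table: product/type as rows, result as columns.
flat <- ftable(xt, row.vars = c("product", "type"), col.vars = "result")
print(flat)
```

Note that the result is a table object rather than a data frame, so dcast or ddply is still more convenient if you need a data frame for further processing.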

Thank you very much! This also does the job. - Riad
Much faster than ddply. Thanks :) - AnksG
