How to compute the error rate from a decision tree?

Question

Tags: r, classification, decision-tree, rpart

Does anyone know how to calculate the error rate for a decision tree in R? I am using the rpart() function.

- teo6389

1 Answer

- chl · Accepted Answer

Assuming you mean the error rate computed on the sample used to fit the model, you can use printcp(). For example, using the on-line example,

> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81 

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018

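The "Root node error" is used to compute two measures of predictive performance. Each is obtained by multiplying it by a value from the "rel error" or "xerror" column, and each depends on the complexity parameter (first column):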

0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample); this is roughly equivalent to
```
class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)
1-sum(diag(class.pred))/sum(class.pred)
```
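As a minimal sketch, the same figure can be recovered from the fitted object itself. It assumes the usual `rpart` object layout, where `fit$frame` holds per-node counts (for a classification tree with the default loss, `dev` at the root is the number of misclassified cases) and `fit$cptable` is the table shown by printcp(). The variable names are illustrative.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Root node error: misclassified cases at the root divided by the sample size.
root_error <- fit$frame$dev[1] / fit$frame$n[1]

# "rel error" of the final tree (last row of the CP table).
rel_error <- fit$cptable[nrow(fit$cptable), "rel error"]

# Absolute training (resubstitution) error rate.
train_error <- unname(rel_error * root_error)

# Direct computation for comparison: share of training cases misclassified.
direct_error <- mean(predict(fit, type = "class") != kyphosis$Kyphosis)

print(c(from_cptable = train_error, direct = direct_error))
```

Both values should agree, since "rel error" is simply the tree's training error expressed relative to the root node error.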
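0.82353 x 0.20988 = 0.1728425 (17.3%) is the cross-validated error rate (using 10-fold CV, see xval in rpart.control(); but see also xpred.rpart() and plotcp(), which rely on this kind of measure). This measure is a more objective indicator of predictive accuracy.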
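The cross-validated figure can be sketched in the same way. The example below first rescales the "xerror" column by the root node error, then runs a fresh 10-fold cross-validation with xpred.rpart(). It assumes that xpred.rpart() returns, for a classification tree, one column of predicted class codes per complexity value. Because the folds are drawn at random, the numbers will generally differ from those in the printed table; the seed and variable names are illustrative.

```r
library(rpart)

set.seed(42)  # cross-validation folds are random, so fix the seed
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

root_error <- fit$frame$dev[1] / fit$frame$n[1]

# Absolute cross-validated error rate for each row of the CP table.
cv_error <- fit$cptable[, "xerror"] * root_error
print(cbind(fit$cptable[, c("CP", "nsplit")], cv_error))

# A fresh 10-fold run: compare predicted class codes with observed ones.
# The matrix has one row per observation, so the comparison against the
# observed codes recycles correctly down each column.
xp <- xpred.rpart(fit, xval = 10)
cv_error_fresh <- colMeans(xp != as.integer(kyphosis$Kyphosis))
print(cv_error_fresh)
```

Repeating the run with different seeds gives a sense of how much the cross-validated estimate varies on a sample of only 81 cases.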

Note that it is more or less in agreement with the classification accuracy from tree:

> library(tree)
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))

Classification tree:
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Number of terminal nodes:  10 
Residual mean deviance:  0.5809 = 41.24 / 71 
Misclassification error rate: 0.1235 = 10 / 81

The misclassification error rate is computed from the training sample.
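Since training-sample error tends to be optimistic, a hold-out estimate is one possible complement. The sketch below is not part of the original answer: the 70/30 split, the seed, and the variable names are all assumptions chosen for illustration, and with only 81 cases the resulting estimate is quite noisy.

```r
library(rpart)

set.seed(1)  # make the random split reproducible
n <- nrow(kyphosis)

# Hold out roughly 30% of the rows as a test set.
train_idx <- sample(n, size = round(0.7 * n))
train <- kyphosis[train_idx, ]
test <- kyphosis[-train_idx, ]

# Fit on the training part only.
fit_train <- rpart(Kyphosis ~ Age + Number + Start, data = train)

# Misclassification rate on the rows the tree has not seen.
pred <- predict(fit_train, newdata = test, type = "class")
test_error <- mean(pred != test$Kyphosis)
print(test_error)
```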