Error in confusionMatrix: the data and reference factors must have the same number of levels

Question

26

I trained a tree model with R's caret package. Now, whenever I try to generate a confusion matrix, I get the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

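The error occurs when generating the confusion matrix. The levels are the same on both objects, so I cannot figure out what the problem is. Their structure and levels are shown below.

They should be identical. Any help would be greatly appreciated, as this is driving me crazy!!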

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"

- user2987739

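In your error message, "category" is spelled "catgeory". If that is not the problem, what is the output of identical(levels(predictionsTree), levels(testdata$category))? - fxi

Hi, thanks for pointing out that silly typo... oops!!! I ran the identical() call and the output was [1] TRUE. But now when I run the confusionMatrix function I get the following error: table(data, reference, dnn = dnn, ...) : all arguments must have the same length. - user2987739

Check whether catgeory is still misspelled anywhere, compare length(testdata$category) with length(predictionsTree), and look at the summary of both vectors. If you only want a simple confusion matrix: table(predictionsTree, testdata$category). - fxi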
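The checks suggested in these comments can be sketched in base R. The vectors and the data frame below are made-up stand-ins for predictionsTree and testdata, not the asker's data. The sketch shows two things: a misspelled column name returns NULL rather than raising an error, and identical levels do not guarantee equal lengths.

```r
# Hypothetical stand-ins for predictionsTree and testdata$category
predicted <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
reference <- factor(c("a", "c", "b", "b"), levels = c("a", "b", "c"))

# A misspelled column name silently yields NULL instead of an error
df <- data.frame(category = reference)
typo_is_null <- is.null(df$catgeory)

# Levels can match while the lengths still differ
same_levels <- identical(levels(predicted), levels(reference))
same_length <- length(predicted) == length(reference)

print(c(typo_is_null = typo_is_null,
        same_levels  = same_levels,
        same_length  = same_length))
```

Since a NULL reference has no levels, the typo alone would plausibly explain the first error. The later "same length" error is a separate issue.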

11 Answers

5

Maybe your model never predicts one of the factor levels. Use table() instead of confusionMatrix() to see whether that is the problem.

- Red

4

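You could have added this as a comment. - Top Cat

I found this helpful, but now I wonder: there does not seem to be much difference between the two. Is the difference only in how the output is displayed? - Rich_Rich

If that is the case, how do we fix or work around it gracefully? - Nuclear03020704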
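One possible way to handle an unpredicted level, sketched in base R with made-up vectors: re-declare the predictions as a factor that carries the reference's full level set, so both factors have the same levels even though one level has zero predictions. The names predicted_aligned and cm are illustrative only.

```r
# Hypothetical reference with three classes
reference <- factor(c("a", "b", "c", "c"))
# Simulated predictions in which class "c" is never predicted
predicted <- factor(c("a", "b", "a", "b"))

# Give the predictions the same level set as the reference
predicted_aligned <- factor(as.character(predicted),
                            levels = levels(reference))

# The cross-tabulation is now square; the "c" row is all zeros
cm <- table(Prediction = predicted_aligned, Reference = reference)
print(cm)
```

With caret loaded, confusionMatrix(predicted_aligned, reference) should accept these two factors. As for the difference between the two functions, table() only cross-tabulates, whereas confusionMatrix() also reports overall statistics such as accuracy and kappa, plus per-class statistics such as sensitivity and specificity.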

4

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata,na.action = na.pass)

- aristotll
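A likely reason this helps, assuming caret's predict method for train objects defaults to na.action = na.omit: rows of testdata with missing predictors are dropped before prediction, so the prediction vector ends up shorter than testdata$category. The effect of the two na.action choices can be seen in base R with model.frame on a made-up data frame; no caret call is involved in this sketch.

```r
# Hypothetical new data with one missing predictor value
newdata <- data.frame(x = c(1, NA, 3), y = c("a", "b", "a"))

# na.omit drops the incomplete row; na.pass keeps every row
rows_omit <- nrow(model.frame(~ x, data = newdata, na.action = na.omit))
rows_pass <- nrow(model.frame(~ x, data = newdata, na.action = na.pass))

print(c(na.omit = rows_omit, na.pass = rows_pass))
```

Whether a model can produce a useful prediction for the rows kept by na.pass depends on the model type, so the row count should be checked after predicting.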

2

Put both vectors into data frames, bind them together, and then pass the columns to confusionMatrix:

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)

my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
# rbind merges the levels of the two factor columns into one shared set
my_data3 <- rbind(my_data1, my_data2)

# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]),
          levels(my_data3[my_data3$type == "real", 1]))

confusionMatrix(my_data3[my_data3$type == "prediction", 1],
                my_data3[my_data3$type == "real", 1],
                dnn = c("Prediction", "Reference"))

- S. Think

1

I had the same problem. After reading in the data file, I made the following change:

data = na.omit(data)

Thanks to everyone for the pointers!

- Alicia

0

There may be missing values in the test data. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error, and this fixed it for me.

testdata <- testdata[complete.cases(testdata),]

- EaswerC

0

Make sure you have installed the package together with all of its dependencies:

install.packages('caret', dependencies = TRUE)

confusionMatrix( table(prediction, true_value) )

- desval

0

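The length problem you are running into is probably caused by NA values in the training set. Either drop the incomplete cases or impute the missing values.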

- orange1

0

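Check the data types! My problem was that data was of type int while reference was of type num. They need to be the same type.

- Hannes

Your answer could be improved by sharing a workaround with a code example. In general, try to be specific. - Rico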
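A minimal sketch of one way to resolve such a type mismatch, using made-up 0/1 labels: convert both vectors to factors over a shared level set before comparing them. The names lv, predicted_f and reference_f are illustrative only.

```r
# Hypothetical labels: one vector is integer, the other is double
predicted <- c(1L, 0L, 1L, 1L)
reference <- c(1, 0, 0, 1)

print(c(class(predicted), class(reference)))

# Build one level set from both vectors, then convert each to a factor
lv <- sort(unique(c(predicted, reference)))
predicted_f <- factor(predicted, levels = lv)
reference_f <- factor(reference, levels = lv)

print(table(Prediction = predicted_f, Reference = reference_f))
```

With caret loaded, confusionMatrix(predicted_f, reference_f) should then receive two factors with identical levels.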

0

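If your data contains NAs, they are sometimes treated as a factor level, so it is best to drop them at the outset.

DF = na.omit(DF)

If your fitted model predicts levels that do not match the reference, it is better to pass a table.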

confusionMatrix(table(Arg1, Arg2))

- Sanjay Nandakumar
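How NA can end up as a factor level can be shown in base R with a made-up vector. By default factor() excludes NA from the levels, but exclude = NULL (or addNA()) turns it into a level of its own, which changes the level count. Dropping the NA rows first avoids this.

```r
# Hypothetical column with one missing value
x <- c("a", "b", NA, "a")

plain   <- factor(x)                  # NA is not a level by default
with_na <- factor(x, exclude = NULL)  # NA becomes an extra level

print(c(plain = nlevels(plain), with_na = nlevels(with_na)))

# Omitting incomplete rows first leaves only the real classes
DF <- data.frame(x = x)
DF <- na.omit(DF)
cleaned <- factor(DF$x)
print(levels(cleaned))
```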

- Mayk Tulio · Accepted Answer

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.