Interpreting the DecisionTreeRegressor score?

Question

python, machine-learning, scikit-learn, decision-tree, supervised-learning

5

I am trying to evaluate the relevance of features, and I am using DecisionTreeRegressor().

The relevant code is as follows:

# TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
new_data = data.drop(['Frozen'], axis = 1)

# TODO: Split the data into training and testing sets(0.25) using the given feature as the target
# TODO: Set a random state.

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(new_data, data['Frozen'], test_size = 0.25, random_state = 1)

# TODO: Create a decision tree regressor and fit it to the training set

from sklearn.tree import DecisionTreeRegressor


regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)

# TODO: Report the score of the prediction using the testing set

from sklearn.model_selection import cross_val_score


#score = cross_val_score(regressor, X_test, y_test)
score = regressor.score(X_test, y_test)

print(score)  # works in both Python 2 and Python 3

When I run the print statement, it returns the following score:

-0.649574327334

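The documentation of the score function gives the following explanation:

Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).

I have not fully grasped the concept yet, so this explanation is not very helpful to me. For example, I cannot understand why the score can be negative, or what exactly it indicates (if something is squared, I would expect it to be positive only).

What does this score indicate, and why can it be negative?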

Pointers to articles for further reading would also be helpful!

- clockworks

Please see the following links: https://en.wikipedia.org/wiki/Coefficient_of_determination and http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html. - cs95

1

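If you know the basics of the coefficient, you can read this article and that article. The first one is enough to understand the background, and the good news is that it is suitable for beginners too! - E.Z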

2 Answers

0

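By default, cross_val_score uses the estimator's own score method, in this case that of DecisionTreeRegressor. You can take a look at the scikit-learn documentation for DecisionTreeRegressor. Basically, the score you see is R^2, or (1 - u/v), where u is the residual sum of squares of your predictions and v is the total sum of squares (the sum of squared deviations of the samples from their mean).

u/v can be arbitrarily large when the predictions are very bad, but it can never fall below zero, because u and v are both sums of squares (>= 0). The score 1 - u/v is therefore at most 1 and has no lower bound.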
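As a rough illustration of the 1 - u/v formula, here is a minimal pure-Python sketch. The helper name r2 and the toy numbers are assumptions made up for this example; they are not taken from the question's data or from scikit-learn's source.

```python
def r2(y_true, y_pred):
    """Return 1 - u/v for two equal-length sequences of numbers."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares of the predictions
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares of the true values around their mean
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical targets, v = 5

perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])      # u = 0, so the score is 1
mean_only = r2(y_true, [2.5, 2.5, 2.5, 2.5])    # u = v, so the score is 0
reversed_ = r2(y_true, [4.0, 3.0, 2.0, 1.0])    # u = 20 > v, so the score is negative
far_off = r2(y_true, [40.0, 30.0, 20.0, 10.0])  # much larger u, much more negative

print(perfect, mean_only, reversed_, far_off)
```

Nothing in the formula squares the final result: only u and v are sums of squares, so the score itself goes negative as soon as u exceeds v.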

- chrisckwong821

- Longyu Zhao · Accepted Answer

By its definition, R^2 can be negative (see https://en.wikipedia.org/wiki/Coefficient_of_determination) if the model fits the data worse than a horizontal line. Basically, this means:

R^2 = 1 - SS_res/SS_tot

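SS_res and SS_tot are never negative. Whenever SS_res > SS_tot, R^2 is negative, and the more SS_res exceeds SS_tot (SS_res >> SS_tot), the more negative it becomes. See also this answer: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative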
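To make the "worse than a horizontal line" comparison concrete, the sketch below scores a fully grown tree against a constant predictor on the same test set. The synthetic data (a target that is pure noise) and the variable names are assumptions for illustration only, and DummyRegressor with strategy="mean" stands in for the horizontal line. The exact numbers depend on the seed, so no output is shown.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): the target is pure noise
# and has no relationship to the features.
rng = np.random.RandomState(1)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# "Horizontal line": always predict the mean of the training targets.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
# Fully grown tree, as in the question; it memorizes the training noise.
tree = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)

baseline_score = baseline.score(X_test, y_test)
tree_score = tree.score(X_test, y_test)

print("baseline R^2:", baseline_score)
print("tree R^2:", tree_score)
```

The constant baseline can score at most 0 on the test set, because no constant has a smaller squared error than the test mean. On unpredictable targets like these, an unpruned tree will usually score below that baseline. Read this way, a score such as -0.649574327334 suggests that the remaining features predict 'Frozen' worse than simply guessing its mean would.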