How to get prediction explanations using Shap the right way?

Question


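I've just started using shap, so I'm still trying to get my head around it. Basically, I have a simple sklearn.ensemble.RandomForestClassifier trained with model.fit(X_train, y_train), and so on. After training, I'd like to obtain the Shap values to explain predictions on unseen data. Based on the docs and other tutorials, this seems to be the way to go: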

explainer = shap.Explainer(model.predict, X_train)
shap_values = explainer.shap_values(X_test)

However, this takes a very long time to run (about 18 hours for my data). If I replace model.predict with just model in the first line, i.e.:

explainer = shap.Explainer(model, X_train)
shap_values = explainer.shap_values(X_test)

the run time drops significantly (to about 40 minutes). So that leaves me wondering what I'm actually getting in the second case.

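To reiterate, I just want to be able to explain new predictions, and it seems strange to me that this would be so expensive - so I'm sure I'm doing something wrong.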

- radishapollo

1 Answer


- Sergey Bushmanov · Accepted Answer

I think your question already contains a hint:

explainer = shap.Explainer(model.predict, X_train)
shap_values = explainer.shap_values(X_test)

This is most probably some kind of exact algorithm for computing Shapley values from a function, and it is very expensive.

explainer = shap.Explainer(model, X_train)
shap_values = explainer.shap_values(X_test)

This one averages predictions that are readily available from the trained model.

To prove the first claim (the second is a fact), let's study the source code of the Explainer class.

Class definition:

class Explainer(Serializable):
    """ Uses Shapley values to explain any machine learning model or python function.
    This is the primary explainer interface for the SHAP library. It takes any combination
    of a model and masker and returns a callable subclass object that implements
    the particular estimation algorithm that was chosen.
    """

    def __init__(self, model, masker=None, link=links.identity, algorithm="auto", output_names=None, feature_names=None, linearize_link=True,
                 seed=None, **kwargs):
        """ Build a new explainer for the passed model.
        Parameters
        ----------
        model : object or function
            User supplied function or model object that takes a dataset of samples and
            computes the output of the model for those samples.

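So now you know that either a model or a function can be supplied as the first argument.

If a pandas DataFrame (or a 2-D NumPy array or sparse matrix) is supplied as the masker: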

        if safe_isinstance(masker, "pandas.core.frame.DataFrame") or \
                ((safe_isinstance(masker, "numpy.ndarray") or sp.sparse.issparse(masker)) and len(masker.shape) == 2):
            if algorithm == "partition":
                self.masker = maskers.Partition(masker)
            else:
                self.masker = maskers.Independent(masker)

Finally, if a callable is supplied:

                elif callable(self.model):
                    if issubclass(type(self.masker), maskers.Independent):
                        if self.masker.shape[1] <= 10:
                            algorithm = "exact"
                        else:
                            algorithm = "permutation"
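
The branch above can be paraphrased in a few lines. This is only a simplified sketch of the excerpted logic, not the library's code: `pick_algorithm_for_callable` is a hypothetical helper, and it assumes a plain callable paired with an `Independent` masker built from a 2-D background dataset.

```python
def pick_algorithm_for_callable(n_features: int) -> str:
    """Mirror the excerpted branch: callable model + Independent masker.

    Hypothetical helper for illustration only; it is not part of shap.
    """
    # Up to 10 features: the "exact" algorithm is chosen.
    # More than 10 features: the "permutation" algorithm is chosen.
    return "exact" if n_features <= 10 else "permutation"


for n in (5, 10, 11, 50):
    print(n, pick_algorithm_for_callable(n))
```

Read this way, passing model.predict with more than 10 columns in X_train would select the permutation algorithm rather than the exact one. In both cases the function is treated as a black box and has to be called many times.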

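Hopefully you can now see why the first approach is an exact one (or, judging by the excerpt, a permutation-based one when there are more than 10 features), and why it therefore takes so long to compute.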
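
To make the cost concrete, here is a minimal brute-force sketch of exact Shapley values for a black-box function. It is not shap's implementation: `exact_shapley`, `toy_model`, and the tiny background set are hypothetical names and data chosen for illustration, and masked features are assumed to be filled in from background rows.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, background):
    """Brute-force Shapley values of f at x; returns (values, number of f calls)."""
    n = len(x)
    calls = 0
    cache = {}

    def value(coalition):
        # Features in the coalition take their value from x, the rest from
        # each background row; the coalition's value is the average output.
        nonlocal calls
        if coalition not in cache:
            total = 0.0
            for row in background:
                total += f([x[i] if i in coalition else row[i] for i in range(n)])
                calls += 1
            cache[coalition] = total / len(background)
        return cache[coalition]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi, calls


def toy_model(row):
    # Hypothetical black-box function with one interaction term.
    return row[0] + 2 * row[1] + row[0] * row[2]


background = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
phi, calls = exact_shapley(toy_model, [1.0, 1.0, 1.0], background)
print(phi)    # approximately [1.5, 2.0, 0.5]
print(calls)  # 2**3 coalitions * 2 background rows = 16 model calls
```

In this sketch the number of model calls is 2**n times the number of background rows for every explained sample, which is why handing a plain function to a model-agnostic explainer gets expensive so quickly.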

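Now, to your questions:

"How to get prediction explanations using Shap the right way?" and "So what am I actually getting in the second case?"

If you have a model that SHAP supports (tree-based, linear, or otherwise), use: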

explainer = shap.Explainer(model, X_train)
shap_values = explainer.shap_values(X_test)

These are SHAP values extracted from the model itself, and this is why SHAP came into existence.

If it is not supported, use the first approach.
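
For intuition about what a model-agnostic fallback does with more than a handful of features, here is a minimal sketch of Shapley estimation by sampling feature orderings. It is a simplification and not shap's permutation algorithm: `permutation_shapley` and `toy_model` are hypothetical names, and a single baseline row stands in for a background dataset.

```python
import random


def permutation_shapley(f, x, baseline, n_permutations=2000, seed=0):
    """Estimate Shapley values of f at x by sampling feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(order)
        current = list(baseline)          # start with every feature masked
        previous = f(current)
        for i in order:
            current[i] = x[i]             # reveal feature i
            output = f(current)
            phi[i] += output - previous   # marginal contribution in this order
            previous = output
    return [total / n_permutations for total in phi]


def toy_model(row):
    # Hypothetical black-box function with one interaction term.
    return row[0] + 2 * row[1] + row[0] * row[2]


estimate = permutation_shapley(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(estimate)  # close to the exact values 1.5, 2.0, 0.5 for this toy model
```

Each sampled ordering costs n + 1 model calls here, so the run time grows with the number of permutations instead of with 2**n, at the price of a sampling error.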

Both approaches should give similar results.
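
A small sanity check of that claim, using a linear model as a stand-in for any supported model type (the question's random forest is not reproduced here). With features masked independently from a background set, a linear model's SHAP values have the closed form w_i * (x_i - mean_i), which can be compared with a brute-force, black-box computation. All names and numbers below are hypothetical.

```python
from itertools import combinations
from math import factorial

weights, bias = [2.0, -1.0, 0.5], 3.0


def linear_model(row):
    return bias + sum(w * v for w, v in zip(weights, row))


background = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 2.0, 0.0]]
x = [1.0, 5.0, -2.0]
n = len(x)
means = [sum(col) / len(background) for col in zip(*background)]

# Route 1: model-specific shortcut, reading the answer off the coefficients.
closed_form = [w * (xi - m) for w, xi, m in zip(weights, x, means)]


# Route 2: model-agnostic brute force, calling the model as a black box.
def value(coalition):
    outputs = [
        linear_model([x[i] if i in coalition else row[i] for i in range(n)])
        for row in background
    ]
    return sum(outputs) / len(outputs)


brute_force = []
for i in range(n):
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for size in range(n):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            phi += weight * (value(set(subset) | {i}) - value(set(subset)))
    brute_force.append(phi)

print(closed_form)
print(brute_force)
```

The two lists agree up to floating-point error, but the first route never calls the model at all, which is the kind of saving a model-specific explainer is designed to give.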