Getting the corresponding classes for predict_proba (GridSearchCV sklearn)

Question

5

I'm using GridSearchCV with a pipeline to classify some text documents. Here is a code snippet:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# corpus: list of text documents; labels: the matching class labels
clf = Pipeline([('vect', TfidfVectorizer()), ('clf', SVC())])
parameters = {'vect__ngram_range': [(1, 2)], 'vect__min_df': [2], 'vect__stop_words': ['english'],
              'vect__lowercase': [True], 'vect__norm': ['l2'], 'vect__analyzer': ['word'], 'vect__binary': [True],
              'clf__kernel': ['rbf'], 'clf__C': [100], 'clf__gamma': [0.01], 'clf__probability': [True]}
grid_search = GridSearchCV(clf, parameters, n_jobs=-2, refit=True, cv=10)
grid_search.fit(corpus, labels)

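My problem is that when I call grid_search.predict_proba(new_doc) and then try to see which classes the probabilities correspond to using grid_search.classes_, I get the following error:

AttributeError: 'GridSearchCV' object has no attribute 'classes_'

What have I missed? I thought that if the last "step" in the pipeline was a classifier, then the return value of GridSearchCV would also be a classifier, so the attributes of that classifier, such as classes_, could be used.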

- Josefine

2 Answers

8

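Try grid_search.best_estimator_.classes_. What GridSearchCV returns is a GridSearchCV instance, which is not really an estimator itself. Rather, it instantiates a new estimator for each parameter combination it tries (see the docs).

You may think the return value is a classifier because you can use methods such as predict or predict_proba when refit=True, but GridSearchCV.predict_proba actually looks like this (spoiler from the source):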

def predict_proba(self, X):
    """Call predict_proba on the estimator with the best found parameters.
    Only available if ``refit=True`` and the underlying estimator supports
    ``predict_proba``.
    Parameters
    -----------
    X : indexable, length n_samples
        Must fulfill the input assumptions of the
        underlying estimator.
    """
    return self.best_estimator_.predict_proba(X)
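To make the delegation pattern concrete, here is a minimal sketch in plain Python. ToyClassifier and ToySearch are hypothetical stand-ins invented for illustration, not scikit-learn classes; the sketch only assumes a wrapper that forwards predict_proba and nothing else, which is the behaviour described in the question.

```python
class ToyClassifier:
    """Hypothetical stand-in for a fitted classifier (not scikit-learn)."""
    classes_ = ["neg", "pos"]

    def predict_proba(self, X):
        # Return the same fixed probabilities for every sample.
        return [[0.3, 0.7] for _ in X]


class ToySearch:
    """Hypothetical wrapper that forwards only predict_proba."""

    def __init__(self, estimator):
        self.best_estimator_ = estimator

    def predict_proba(self, X):
        # Delegate to the wrapped estimator, as in the snippet above.
        return self.best_estimator_.predict_proba(X)


search = ToySearch(ToyClassifier())
probs = search.predict_proba(["some doc"])

# The wrapper forwards the method but does not expose classes_ itself ...
has_classes = hasattr(search, "classes_")
# ... so the attribute has to be read from the wrapped estimator.
inner_classes = search.best_estimator_.classes_
```

In this toy setup, an attribute is only reachable through the wrapper if the wrapper explicitly forwards it, which would explain why predict_proba works while classes_ raises AttributeError.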

Hope this helps.

- ldirer

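grid_search.best_estimator_.classes_ did not work. I got an error saying that the pipeline has no attribute called classes_. However, I managed to find a solution (see the answer). - Josefine

OK. I expected that to happen, but I tried an example similar to yours and it worked for me. grid_search.best_estimator_ is a Pipeline object, yet I can still get grid_search.best_estimator_.classes_. I am using the development version, though. Alternatively, you can access each step of the pipeline through the steps attribute: dict(grid_search.best_estimator_.steps)["clf"].classes_ should work for you. - ldirer

OK, then that is probably the difference. The solution I found earlier is almost the same, except that I used named_steps directly instead of building a dictionary from the steps attribute (see the answer). Thanks for your help! - Josefine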


- Josefine · Accepted Answer

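As mentioned in the comments above, grid_search.best_estimator_.classes_ returned an error message, because best_estimator_ is a pipeline and that pipeline had no classes_ attribute. However, by first accessing the classifier step of the pipeline, I was able to use the classes_ attribute. Here is the solution: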

grid_search.best_estimator_.named_steps['clf'].classes_
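For completeness, a small end-to-end sketch of pairing each probability with its class label might look like the following. The toy corpus, the labels, and the one-parameter grid are made up for illustration, and LogisticRegression is swapped in for the SVC from the question only to keep the example small; the named_steps lookup is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus with two balanced classes.
corpus = [
    "the cat sat on the mat", "dogs chase cats", "a cat and a dog play",
    "my dog barks loudly", "the kitten sleeps", "puppies love to run",
    "the car drives fast", "trucks carry heavy loads", "a bus stops here",
    "my car needs new tires", "the train is late", "bikes share the road",
]
labels = ["animal"] * 6 + ["vehicle"] * 6

# LogisticRegression stands in for SVC(probability=True) here.
pipe = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
grid_search = GridSearchCV(pipe, {"clf__C": [1.0, 10.0]}, cv=2, refit=True)
grid_search.fit(corpus, labels)

best = grid_search.best_estimator_
# Read the class order from the classifier step of the refitted pipeline.
classes = best.named_steps["clf"].classes_
# The dict(steps) route from the comments reaches the same step object.
same_step = dict(best.steps)["clf"] is best.named_steps["clf"]

# Columns of predict_proba follow the order of classes_.
proba = grid_search.predict_proba(["the dog chased the car"])[0]
class_to_proba = dict(zip(classes, proba))
for name, p in class_to_proba.items():
    print(f"{name}: {p:.3f}")
```

Zipping classes_ with a row of predict_proba gives a label-to-probability mapping, so the column order never has to be guessed.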