Python scikit-learn - TypeError

4
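I am writing a small program that plots learning curves, with cross-validation, for the SVM and naive Bayes algorithms. Here is the code for the plotting function: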
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cross_validation
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.learning_curve import learning_curve

def plot_learning_curves(X, y, nb=GaussianNB, svc=SVC(kernel='linear'), ylim=None, cv=None, n_jobs=1,
                     train_sizes=np.linspace(.1, 1.0, 5)):
    plt.figure()
    plt.title('Learning Curves with NB and SVM')
    if ylim is not None:
        plt.ylim(*ylim)

    train_sizes_nb, _, test_scores_nb = learning_curve(
        nb, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    test_scores_mean_nb = np.mean(test_scores_nb, axis=1)

    train_sizes_svc, _, test_scores_svc = learning_curve(
        svc, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    test_scores_mean_svc = np.mean(test_scores_svc, axis=1)

    plt.grid()

    plt.plot(train_sizes_nb, test_scores_mean_nb, 'o-', color="g",
             label="NB")
    plt.plot(train_sizes_svc, test_scores_mean_svc,'o',color="r",label="SVM")    

    return plt

Here is the function call:

digits = load_digits()
X, y = digits.data, digits.target

cv = cross_validation.ShuffleSplit(digits.data.shape[0], n_iter=100,
                               test_size=0.2, random_state=0)
plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv,n_jobs=1)
plt.show()

I don't know where the problem is, but I get this error:

Traceback (most recent call last):
File "C:/Users/Gianmarco/PycharmProjects/Learning/plotLearningCurves.py", line 43, in <module>
plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv,n_jobs=1)
File "C:/Users/Gianmarco/PycharmProjects/Learning/plotLearningCurves.py", line 19, in plot_learning_curves
nb, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\learning_curve.py", line 136, in learning_curve
for train, test in cv for n_train_samples in train_sizes_abs)
File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 652, in __call__
for function, args, kwargs in iterable:
File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\learning_curve.py", line 136, in <genexpr>
for train, test in cv for n_train_samples in train_sizes_abs)
File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\base.py", line 45, in clone
new_object_params = estimator.get_params(deep=False)
TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)

Process finished with exit code 1

I don't understand what this line means: "TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)"

What would be a possible solution?


1
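It seems sklearn's error message isn't very good. I don't know this module, so I'm not sure whether this helps: it says you need a GaussianNB instance. Maybe you need to create one? Change nb=GaussianNB to nb=GaussianNB() - Håken Lid
2 Answers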

16

The solution is quite simple. Instead of

nb=GaussianNB

use

nb=GaussianNB()
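
To see why the parentheses matter, here is a minimal sketch that does not need scikit-learn. Estimator and read_params are hypothetical stand-ins (they are not sklearn names) for an estimator and for the get_params(deep=False) call that clone() makes in the traceback. On Python 3 the wording of the message differs from the Python 2 one in the question, but it is still a TypeError raised for the same reason: the method is looked up on the class, so no instance is supplied as self.

```python
class Estimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=False):
        return {"alpha": self.alpha}


def read_params(estimator):
    """Mimics the get_params(deep=False) call seen in the traceback."""
    return estimator.get_params(deep=False)


# Passing an instance works: self is bound automatically.
instance_result = read_params(Estimator())

# Passing the class fails: there is no instance to bind to self.
try:
    read_params(Estimator)
    class_error = None
except TypeError as exc:
    class_error = exc

print(instance_result)
print(type(class_error).__name__, class_error)
```

The only difference between the two calls is Estimator() versus Estimator, which is exactly the difference between nb=GaussianNB() and nb=GaussianNB.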

6
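This error means that get_params() was called without a GaussianNB instance as its first argument (self). The error occurs several steps inside the sklearn module, so it is hard to track down the exact cause without using a debugger and reading the sklearn source code. If you use IPython, the %debug magic command is very useful for investigating this kind of exception.

Looking at your code, the problem seems to be that you pass the class GaussianNB to sklearn.learning_curve.learning_curve() rather than an instance of that class. From the docs for learning_curve:

estimator: object type that implements the "fit" and "predict" methods. An object of that type which is cloned for each validation.

I find that a bit vague. But the example code uses a GaussianNB instance, not a type.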
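Besides that, using mutable default arguments is generally not recommended, and object instances are mutable. It also makes your code harder to read and debug.
With so many optional keyword arguments, it may be easier to read if written like this: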
def plot_learning_curves(x, y, ylim=None, **kwargs):
    """ Plots learning curves with NB and SVM """
    nb = kwargs.get('nb', GaussianNB())
    svc = kwargs.get('svc', SVC(kernel='linear'))
    train_sizes = kwargs.get('train_sizes', np.linspace(.1, 1.0, 5))     
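
On the mutable-default point, a small sketch in plain Python may help. The function names append_shared and append_fresh are made up for illustration. A default value in the def line is created once, when the function is defined, and then shared by every call. In the kwargs.get(...) version, by contrast, the GaussianNB() and SVC(...) expressions are evaluated on each call, so a fresh instance is built every time.

```python
def append_shared(item, bucket=[]):
    # The default list is created once and reused by every call.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is used as a sentinel; a new list is created per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared(1)
shared_second = append_shared(2)   # same list object as shared_first

fresh_first = append_fresh(1)
fresh_second = append_fresh(2)     # independent list

print(shared_first is shared_second)
print(fresh_first, fresh_second)
```

The None sentinel is a common alternative to **kwargs when you want to keep the parameter names visible in the signature.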

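You may not need those keyword arguments at all. It looks like you started by copying some example code and adding your own things. It is best to simplify the example code first and make sure you understand what is going on.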

def plot_learning_curves(x, y, ylim=None):
    nb = GaussianNB()
    svc = SVC(kernel='linear')
    train_sizes = np.linspace(.1, 1.0, 5)
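
If you are on a newer scikit-learn, note that the sklearn.cross_validation and sklearn.learning_curve modules used in the question were later removed in favour of sklearn.model_selection, and ShuffleSplit now takes n_splits instead of a sample count plus n_iter. The following is only a sketch under those assumptions. The helper names mean_test_scores and compare_nb_and_svm are mine, and plotting is left out so the example stays small.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def mean_test_scores(estimator, X, y, cv, train_sizes):
    """Return (absolute train sizes, mean CV test score per size)."""
    # learning_curve returns three arrays; training scores are unused here.
    sizes, _train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, train_sizes=train_sizes)
    return sizes, np.mean(test_scores, axis=1)


def compare_nb_and_svm(X, y, n_splits=5):
    """Compute learning-curve data for naive Bayes and a linear SVM."""
    cv = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=0)
    train_sizes = np.linspace(0.1, 1.0, 5)
    # Estimator instances, not classes.
    estimators = {"NB": GaussianNB(), "SVM": SVC(kernel="linear")}
    return {name: mean_test_scores(est, X, y, cv, train_sizes)
            for name, est in estimators.items()}


if __name__ == "__main__":
    digits = load_digits()
    results = compare_nb_and_svm(digits.data, digits.target)
    for name, (sizes, scores) in results.items():
        print(name, sizes, np.round(scores, 3))
```

Each (sizes, scores) pair can be passed straight to plt.plot(sizes, scores, 'o-', label=name), as in the original function.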
