Every class has more than three members, but I get this error in scikit-learn: "class cannot be less than k=3"

16

This is my target (y):

target = [7,1,2,2,3,5,4,
      1,3,1,4,4,6,6,
      7,5,7,8,8,8,5,
      3,3,6,2,7,7,1,
      10,3,7,10,4,10,
      2,2,2,7]

I don't know why I get an error when I run the following:

...
# Split the data set in two equal parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Set the parameters by cross-validation
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                 'C': [1, 10, 100, 1000]},
                {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(SVC(C=1), tuned_parameters)  # scoring does not exist
    # I get an error in the line below
    clf.fit(X_train, y_train, cv=5)
...

I get this error:

Traceback (most recent call last):
  File "C:\Python27\SVMpredictCROSSeGRID.py", line 232, in <module>
    clf.fit(X_train, y_train, cv=5)  #The minimum number of labels for any class cannot be less than k=3.
  File "C:\Python27\lib\site-packages\sklearn\grid_search.py", line 354, in fit
    return self._fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\grid_search.py", line 372, in _fit
    cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1148, in check_cv
    cv = StratifiedKFold(y, cv, indices=is_sparse)
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 358, in __init__
    " be less than k=%d." % (min_labels, k))
ValueError: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than k=3.

2 Answers

19

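The algorithm requires at least 3 instances of each label in your training set. Although your target array contains at least 3 instances of each label, once the data is split into training and test sets, not every label in the training set still has 3 instances. The error message says as much: the least populated class in the y passed to fit has only 1 member.

You need to either merge some class labels or add more training samples to resolve the issue.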
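
To see how a split can leave a class short, here is a minimal sketch using the target from the question. It assumes a plain first-half split as a stand-in for train_test_split (which shuffles first, so the real training subset will differ), and it takes k = 3 from the error message.

```python
from collections import Counter

# Target labels copied from the question.
target = [7, 1, 2, 2, 3, 5, 4,
          1, 3, 1, 4, 4, 6, 6,
          7, 5, 7, 8, 8, 8, 5,
          3, 3, 6, 2, 7, 7, 1,
          10, 3, 7, 10, 4, 10,
          2, 2, 2, 7]

k = 3  # number of folds named in the error message

# In the full target, every class has at least k members.
full_counts = Counter(target)
print(min(full_counts.values()))  # 3

# Assumption: taking the first half stands in for train_test_split,
# which shuffles before splitting and returns a different subset.
y_train = target[:len(target) // 2]
train_counts = Counter(y_train)

# Classes that no longer have k members once half the data is held out.
too_small = sorted(c for c in full_counts if train_counts[c] < k)
print(too_small)  # [2, 3, 5, 6, 8, 10]
```

Running the same count on the actual y_train shows which labels would need to be merged or given more samples.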


1
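You can also pass a "cv" argument, for example "KFold". By the way, which version are you using? I think the input validation of StratifiedKFold (the default cv) has become less strict in newer versions of sklearn. Be careful when interpreting the results, though. They may not be very meaningful. - Andreas Mueller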
1
@AndreasMueller, I haven't tried the input validation of StratifiedKFold yet. I will definitely check it. Thanks for the suggestion. - jitendra
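
A sketch of the suggestion in the comment above, written against a current scikit-learn release, where GridSearchCV and KFold live in sklearn.model_selection and cv is passed to the constructor rather than to fit. The feature matrix X is random placeholder data (an assumption, since the question does not show X), and the parameter grid is trimmed for brevity.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Labels from the question.
y = np.array([7, 1, 2, 2, 3, 5, 4,
              1, 3, 1, 4, 4, 6, 6,
              7, 5, 7, 8, 8, 8, 5,
              3, 3, 6, 2, 7, 7, 1,
              10, 3, 7, 10, 4, 10,
              2, 2, 2, 7])

# Assumption: random placeholder features; the question does not show X.
X = np.random.RandomState(0).rand(len(y), 4)

# Plain KFold ignores class membership when building folds, so it does
# not require every class to have at least n_splits members.
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# Trimmed-down grid for illustration only.
param_grid = {'kernel': ['linear'], 'C': [1, 10]}
clf = GridSearchCV(SVC(), param_grid, cv=cv)
clf.fit(X, y)

print(clf.best_params_)
```

As the comment warns, scores computed on folds that miss some classes may not mean much.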

0

If you cannot guarantee that every class has enough members in each fold, try updating scikit-learn.

pip install -U scikit-learn

You will then get a warning instead of an error, so the code will run.

