Sklearn: How do I feed data to sklearn's RandomForestClassifier?

Question

4

I have this data:

print(training_data)
print(labels)

# prints

[[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

I am trying to feed it to the RandomForestClassifier from the sklearn Python library.

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10)
classifier.fit(training_data, labels)

But I get this error:

Traceback (most recent call last):
  File "learn.py", line 52, in <module>
    main()
  File "learn.py", line 48, in main
    classifier = train_classifier()
  File "learn.py", line 33, in train_classifier
    classifier.fit(training_data, labels)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 348, in fit
    y = np.ascontiguousarray(y, dtype=DOUBLE)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 419, in ascontiguousarray
    return array(a, dtype, copy=False, order='C', ndmin=1)
ValueError: could not convert string to float: a

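My guess is that I am not formatting the data correctly for fit. But I can't see why from the documentation, and this seems like a pretty basic, simple thing. Does anyone know the answer?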

- David Williams
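For what it's worth, the last frames of the traceback show the labels being passed through np.ascontiguousarray(y, dtype=DOUBLE). Below is a minimal sketch that reproduces the same kind of failure with numpy alone. It assumes DOUBLE corresponds to np.float64, which the traceback itself does not show, and it uses a shortened labels list for illustration.

```python
import numpy as np

# Shortened version of the labels from the question.
labels = ['a', 'b', 'a', 'b']

# Assumption: DOUBLE in the traceback corresponds to np.float64.
try:
    np.ascontiguousarray(labels, dtype=np.float64)
    conversion_failed = False
except ValueError as exc:
    # The strings cannot be parsed as floats, so numpy raises ValueError.
    conversion_failed = True
    print(exc)
```

So the feature matrix is not the problem here; the string labels are what fail the float conversion in this sklearn build.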

At a guess, try using numeric values instead of character values: for example, 0/1 instead of 'a'/'b'. - Matt
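A rough sketch of what that suggestion could look like by hand, using the labels from the question. The names label_to_int and int_to_label are made up for illustration, and the 0/1 assignment is arbitrary; any consistent mapping would do.

```python
labels = ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

# Hypothetical mapping from string labels to integers (arbitrary but consistent).
label_to_int = {'a': 0, 'b': 1}
# Reverse mapping, to turn numeric predictions back into the original labels.
int_to_label = {number: label for label, number in label_to_int.items()}

numeric_labels = [label_to_int[label] for label in labels]
print(numeric_labels)
```

The numeric_labels list can then be passed to fit in place of labels, and int_to_label translates predictions back.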

3

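OK, I'll translate them, but this would be a huge disappointment, because for decision trees the labels do not need to be numeric. I can't imagine the sklearn authors would do it this way. - David Williams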

Possible duplicate of "Non-Integer Class Labels Scikit-Learn". - BrenBarn

2 Answers

0

You can use numpy arrays, and the classifier will recognize them automatically, like this:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
np_training = np.array(training_data)
np_labels = np.array(labels)

clf = RandomForestClassifier(n_estimators=20, max_depth=5)
clf.fit(np_training, np_labels)

That should work.

- user2750362

- Matt · Accepted Answer

Try using LabelEncoder to convert the labels beforehand.
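A minimal sketch of that approach, assuming the training_data and labels from the question. The random_state argument and the sample passed to predict are additions for illustration only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

training_data = [[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0],
                 [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1],
                 [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
labels = ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

# Encode the string labels as integers; classes are sorted, so 'a' -> 0, 'b' -> 1.
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)

# random_state is set only to make the illustration repeatable.
classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(training_data, encoded_labels)

# Predict for one sample (chosen arbitrarily), then map back to the string labels.
predicted = classifier.predict([[1, 0, 1, 1]])
predicted_labels = encoder.inverse_transform(predicted)
print(predicted_labels)
```

Keeping the fitted encoder around matters: inverse_transform is what turns the numeric predictions back into 'a'/'b'.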