Decision tree classification: IndexError: index 1 is out of bounds for axis 0 with size 1


I am running a classification on my dataset, which has more than 46 columns and about 500,000 rows.

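import numpy as np
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
%matplotlib inline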

Here I load the dataset:

df=pd.read_csv('Terror.csv', sep=',')
df.head()

Here I split the columns into the target and the training features:

column_target = ['success']
column_train = ['iyear','country','region','latitude','longitude','specificity','vicinity','doubtterr','alternative','attacktype1','multiple','targtype1','natlty1','gname_id']
# .copy() so the fillna assignments below modify an independent frame (avoids SettingWithCopyWarning)
x = df[column_train].copy()
y = df[column_target]
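Note that selecting with a list, as in `df[column_target]`, returns a one-column DataFrame, so `y.columns` is an Index holding only the column name, not the class labels. A small sketch on a hypothetical toy frame (made-up values, not the real Terror.csv data) shows the difference between the two selectors:

```python
import pandas as pd

# Hypothetical toy frame standing in for Terror.csv
df = pd.DataFrame({"iyear": [1970, 1971, 1972, 1973],
                   "success": [1, 0, 1, 1]})

y_frame = df[["success"]]  # list selector -> DataFrame with a single column
y_series = df["success"]   # label selector -> 1-D Series of class labels

# The frame's columns contain just the column name...
print(type(y_frame).__name__, list(y_frame.columns))
# ...while the Series holds the actual class values
print(type(y_series).__name__, sorted(y_series.unique().tolist()))
```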

I filled the missing values (NaN) in each column with that column's median:

x['latitude'] = x['latitude'].fillna(x['latitude'].median())
x['longitude'] = x['longitude'].fillna(x['longitude'].median())
x['doubtterr'] = x['doubtterr'].fillna(x['doubtterr'].median())
x['alternative'] = x['alternative'].fillna(x['alternative'].median())
x['natlty1'] = x['natlty1'].fillna(x['natlty1'].median())
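The `fillna` calls above all follow the same pattern, so they can be collapsed into one: `DataFrame.median()` returns a Series indexed by column name, and `DataFrame.fillna` accepts such a Series and fills each column with its own value. A minimal sketch on a hypothetical three-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column
x = pd.DataFrame({"latitude": [10.0, np.nan, 30.0],
                  "longitude": [np.nan, 5.0, 7.0],
                  "natlty1": [1.0, 2.0, np.nan]})

cols = ["latitude", "longitude", "natlty1"]
# median() skips NaN by default; fillna aligns the medians on column names
x[cols] = x[cols].fillna(x[cols].median())
print(x)
```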

Here I split X and y into training and test sets:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

Then I try to generate the decision tree plot:

from sklearn import tree
import pydotplus
from IPython.display import Image

Tree = tree.DecisionTreeClassifier()
Tree = Tree.fit(x_train, y_train)
dot_data = tree.export_graphviz(Tree, out_file=None,
                                feature_names=x_train.columns,
                                class_names=y_train.columns,
                                filled=True, rounded=True,
                                special_characters=True, max_depth=10)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

But it returns this error:

IndexError                                Traceback (most recent call last)
<ipython-input-42-1ac22988949f> in <module>()
1 import pydotplus
2 from IPython.display import Image
----> 3 dot_data= tree.export_graphviz(Tree, out_file=None,feature_names=x_train.columns,class_names=y_train.columns,filled=True,rounded=True,special_characters=True,max_depth=10)
4 graph= pydotplus.graph_from_dot_data(dot_data)
5 Image(graph.create_png())

C:\Users\dell\Anaconda2\lib\site-packages\sklearn\tree\export.pyc in export_graphviz(decision_tree, out_file, max_depth, feature_names, class_names, label, filled, leaves_parallel, impurity, node_ids, proportion, rotate, rounded, special_characters)
431             recurse(decision_tree, 0, criterion="impurity")
432         else:
--> 433             recurse(decision_tree.tree_, 0, criterion=decision_tree.criterion)
434 
435         # If required, draw leaf nodes at same depth as each other

C:\Users\dell\Anaconda2\lib\site-packages\sklearn\tree\export.pyc in recurse(tree, node_id, criterion, parent, depth)
319             out_file.write('%d [label=%s'
320                            % (node_id,
--> 321                               node_to_str(tree, node_id, criterion)))
322 
323             if filled:

C:\Users\dell\Anaconda2\lib\site-packages\sklearn\tree\export.pyc in node_to_str(tree, node_id, criterion)
284                 node_string += 'class = '
285             if class_names is not True:
--> 286                 class_name = class_names[np.argmax(value)]
287             else:
288                 class_name = "y%s%s%s" % (characters[1],

C:\Users\dell\Anaconda2\lib\site-packages\pandas\indexes\base.pyc in __getitem__(self, key)
1421 
1422         if is_scalar(key):
-> 1423             return getitem(key)
1424 
1425         if isinstance(key, slice):

IndexError: index 1 is out of bounds for axis 0 with size 1
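The last frames of the traceback point at the failing lookup: `export_graphviz` labels each node with `class_names[np.argmax(value)]`, where `value` holds one count per class. With a binary target, any node where the second class dominates produces index 1, but `y_train.columns` is a pandas Index with a single entry. A minimal reproduction of just that lookup, using a made-up `value` array:

```python
import numpy as np
import pandas as pd

class_names = pd.Index(["success"])  # what y_train.columns contains
value = np.array([[12.0, 30.0]])     # hypothetical per-class counts at one node

raised = None
try:
    class_names[np.argmax(value)]     # argmax picks class 1, but the Index has size 1
except IndexError as exc:
    raised = exc
    print("IndexError:", exc)
```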

I don't know what is wrong with my code that prevents it from producing the decision tree image.

2 Answers


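I ran into a similar problem when I tried to use `plot_tree` to visualize a decision tree from a `RandomForestClassifier`. In the end I found that the length of `class_names` must equal the number of classes in your final classification, not the length of `df.columns`: your target DataFrame's columns contain only `success`, so its length is 1.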

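For example, if the target has 5 classes in total, then `class_names` (a list or array of str) must have length 5, with the names given in ascending order of the class labels.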
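A sketch of that fix, assuming a binary 0/1 target like `success`: take the names from the fitted classifier's `classes_` attribute, which scikit-learn stores in sorted order, and convert them to strings. The data below is random toy data standing in for `x_train`/`y_train`, and the names are just `str()` of each class value; readable labels such as `["failure", "success"]` would work equally well as long as the order matches `classes_`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

rng = np.random.RandomState(0)
# Hypothetical toy data standing in for x_train / y_train
x_train = pd.DataFrame(rng.rand(100, 3), columns=["iyear", "latitude", "longitude"])
y_train = (x_train["latitude"] > 0.5).astype(int).rename("success")

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(x_train, y_train)

# One name per class, in the same order as clf.classes_ (here [0, 1])
class_names = [str(c) for c in clf.classes_]

dot_data = export_graphviz(
    clf,
    out_file=None,                        # return the DOT source as a string
    feature_names=list(x_train.columns),
    class_names=class_names,
    filled=True,
    rounded=True,
    special_characters=True,
)
print(len(class_names), "class names")
```

The same rule applies to a `RandomForestClassifier`: pass one fitted tree such as `rf.estimators_[0]` to the exporter and build the names from `rf.classes_`.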


You can replace `class_names=y_train.columns` with `class_names = df.columns.values[<index of the column that holds df['success']>]`. That should solve your problem.

Content provided by Stack Overflow.