实际上,这里的问题是如果dtype是结构化的(即具有多个类型),
np.genfromtxt
和
np.loadtxt
都会返回一个
结构化数组。您的数组报告其形状为
(3,)
,因为从技术上讲它是一个'记录'的一维数组。这些'记录'保存了所有列,但您可以像访问二维数据一样访问所有数据。
您正在正确加载它:
In [82]: d = np.genfromtxt('tmp',dtype=None)
正如您报告的那样,它具有1d形状:
In [83]: d.shape
Out[83]: (3,)
但是你的所有数据都在那里:
In [84]: d
Out[84]:
array([ (38, 'Private', 215646, 'HS-grad', 9, 'Divorced', 'Handlers-cleaners', 'Not-in-family', 'White', 'Male', 0, 0, 40, 'United-States', '<=50K'),
(53, 'Private', 234721, '11th', 7, 'Married-civ-spouse', 'Handlers-cleaners', 'Husband', 'Black', 'Male', 0, 0, 40, 'United-States', '<=50K'),
(30, 'State-gov', 141297, 'Bachelors', 13, 'Married-civ-spouse', 'Prof-specialty', 'Husband', 'Asian-Pac-Islander', 'Male', 0, 0, 40, 'India', '>50K')],
dtype=[('f0', '<i8'), ('f1', 'S9'), ('f2', '<i8'), ('f3', 'S9'), ('f4', '<i8'), ('f5', 'S18'), ('f6', 'S17'), ('f7', 'S13'), ('f8', 'S18'), ('f9', 'S4'), ('f10', '<i8'), ('f11', '<i8'), ('f12', '<i8'), ('f13', 'S13'), ('f14', 'S5')])
数组的
dtype
结构如下:
In [85]: d.dtype
Out[85]: dtype([('f0', '<i8'), ('f1', 'S9'), ('f2', '<i8'), ('f3', 'S9'), ('f4', '<i8'), ('f5', 'S18'), ('f6', 'S17'), ('f7', 'S13'), ('f8', 'S18'), ('f9', 'S4'), ('f10', '<i8'), ('f11', '<i8'), ('f12', '<i8'), ('f13', 'S13'), ('f14', 'S5')])
您仍然可以使用dtype中给定的名称访问“列”(也称为字段):
In [86]: d['f0']
Out[86]: array([38, 53, 30])
In [87]: d['f1']
Out[87]:
array(['Private', 'Private', 'State-gov'],
dtype='|S9')
更加方便的做法是为字段命名:
In [104]: names = "age,military,id,edu,a,marital,job,fam,ethnicity,gender,b,c,d,country,income"
In [105]: d = np.genfromtxt('tmp',dtype=None, names=names)
现在您可以访问'age'
字段等:
In [106]: d['age']
Out[106]: array([38, 53, 30])
In [107]: d['income']
Out[107]:
array(['<=50K', '<=50K', '>50K'],
dtype='|S5')
或者35岁以下人群的收入
In [108]: d[d['age'] < 35]['income']
Out[108]:
array(['>50K'],
dtype='|S5')
并超过35
In [109]: d[d['age'] > 35]['income']
Out[109]:
array(['<=50K', '<=50K'],
dtype='|S5')