Numpy genfromtxt: iterate over columns

Question

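I am using NumPy's genfromtxt to read columns from a CSV file.
Each column needs to be split and assigned to a separate SQLAlchemy SystemRecord, combined with some other columns and attributes, and added to the database.
What is the best practice for iterating over columns f1 to f9 and adding them to the session object?
So far I have used the code below, but I don't want to repeat the same thing for every f column: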

# 11 string columns of up to 20 characters each; skip_header replaces the removed skiprows argument
t = np.genfromtxt(FILE_NAME, dtype=[(np.str_, 20)] * 11, delimiter=',',
                  filling_values="None", skip_header=0, usecols=range(11))

for r in t:
    _acol = r['f1'].split('-')
    _bcol = r['f2'].split('-')
    ....
    arec = t_SystemRecords(first=_acol[0], second=_acol[1], third=_acol[2], ... )
    db.session.add(arec)
    db.session.commit()

- codervince

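Can't you iterate over the transpose of t, simply using: for col in t.T: ... ? - Saullo G. P. Castro

Interesting, I'll give it a try. - codervince

Usually (always?) genfromtxt produces a 1d structured array. Transposing it has no effect. - hpaulj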

1 Answer

- hpaulj · Accepted Answer

Look at t.dtype, or at r.dtype.

Make a sample structured array (which is what genfromtxt returns):

t = np.ones((5,), dtype='i4,i4,f8,S3')

It looks like this:

array([(1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1'),
       (1, 1, 1.0, b'1'), (1, 1, 1.0, b'1')], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

The dtype and dtype.names are:

In [135]: t.dtype
Out[135]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', 'S3')])

In [138]: t.dtype.names
Out[138]: ('f0', 'f1', 'f2', 'f3')

Iterate over the names to see the individual columns:

In [139]: for n in t.dtype.names:
   .....:     print(t[n])
   .....:     
[1 1 1 1 1]
[1 1 1 1 1]
[ 1.  1.  1.  1.  1.]
[b'1' b'1' b'1' b'1' b'1']

In your case you need to iterate over the "rows", and then over the names within each row:

In [140]: for i,r in enumerate(t):
   .....:     print(r)
   .....:     for n in r.dtype.names:
   .....:         print(r[n])
   .....:         
(1, 1, 1.0, b'1')
1
1
1.0
b'1'
(1, 1, 1.0, b'1')
...
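
If the name/value pairs of a row are easier to handle together, the nested loop above can be collapsed into a dictionary per row. This is only a small sketch on the same sample array; the variable name as_dict is made up for illustration.

```python
import numpy as np

# Same sample array as above: 5 records with fields f0..f3.
t = np.ones((5,), dtype='i4,i4,f8,S3')

for i, r in enumerate(t):
    # r.item() converts the record to a plain Python tuple,
    # which is then paired with the field names.
    as_dict = dict(zip(r.dtype.names, r.item()))
```

Each as_dict maps a field name such as 'f1' to that row's value, which may be convenient when passing keyword arguments to a model constructor.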

r is 0d (check r.shape), so you can select its items by number, or iterate over them.

r[1]  # == r[r.dtype.names[1]]
for i in r: print(i)

For t, which is 1d, this does not work: t[1] refers to an item (a whole record), not to a field.
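
The difference between indexing the array and indexing a single record can be sketched like this, again on the sample array (the variable names are illustrative only):

```python
import numpy as np

# Same sample array as above: 5 records with fields f0..f3.
t = np.ones((5,), dtype='i4,i4,f8,S3')

record = t[1]        # integer index on the 1d array: one whole record
field = t['f1']      # field name on the 1d array: that field for every record

r = t[0]             # a single record, which is 0d
by_position = r[1]   # integer index on a record: its second field
by_name = r['f1']    # the same value, selected by field name
values = list(r)     # iterating a record yields its field values in order
```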

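1d structured arrays behave somewhat like 2d arrays, but not quite. The usual talk of "rows" and "columns" has to be replaced with "rows" (or items) and "fields".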
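
A quick check of the shape shows why transposing does not help here: the fields are part of the dtype, not a second axis. A minimal sketch on the sample array, where the list comprehension is the closest equivalent to iterating over the columns of a 2d array:

```python
import numpy as np

# Same sample array as above: 5 records with fields f0..f3.
t = np.ones((5,), dtype='i4,i4,f8,S3')

shape = t.shape                 # only one axis: the records
n_fields = len(t.dtype.names)   # the fields live in the dtype instead
transposed = t.T                # transposing a 1d array changes nothing

# One 1d array per field, in field order.
columns = [t[name] for name in t.dtype.names]
```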

To make a t that is closer to your case:

In [175]: txt=[b'one-1, two-23, three-12',b'four-ab, five-ss, six-ss']

In [176]: t=np.genfromtxt(txt,dtype=[(np.str_,20),(np.str_,20),(np.str_,20)])

In [177]: t
Out[177]: 
array([('one-1,', 'two-23,', 'three-12'),
       ('four-ab,', 'five-ss,', 'six-ss')], 
      dtype=[('f0', '<U20'), ('f1', '<U20'), ('f2', '<U20')])

np.char has string functions that can be applied to a whole array:

In [178]: np.char.split(t['f0'],'-')
Out[178]: array([['one', '1,'], ['four', 'ab,']], dtype=object)

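It does not work on the structured array as a whole, but it does work on an individual field. The output can be indexed as a list of lists (it is not 2d).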
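
Putting the pieces together, a loop over rows and then over field names might look like the sketch below. Several things here are assumptions, not part of the original post: SystemRecord is a plain dataclass standing in for the SQLAlchemy model, its attribute names are invented, the sample text is comma-delimited so that delimiter=',' can be used, and f0 is skipped on the guess that only the later fields need splitting.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SystemRecord:
    # Hypothetical stand-in for the SQLAlchemy model in the question.
    source_field: str
    first: str
    second: str


# Two sample rows; with delimiter=',' the values carry no stray commas.
txt = ['one-1,two-23,three-12', 'four-ab,five-ss,six-ss']
t = np.genfromtxt(txt, dtype='U20,U20,U20', delimiter=',', encoding='utf-8')

records = []
for row in t:                        # each row is one record
    for name in t.dtype.names[1:]:   # skip f0, handle f1 onward
        first, second = row[name].split('-')
        records.append(SystemRecord(name, first, second))
```

With a real session, the collected objects could then be added in one go (for example with db.session.add_all(records)) and committed once after the loop, instead of committing on every iteration.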