Given a structured numpy array, I want to remove certain columns by name without copying the array. I know I can do this:
names = list(a.dtype.names)
if name_to_remove in names:
    names.remove(name_to_remove)
a = a[names]
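But this creates a temporary copy of the array, which I want to avoid because the arrays I am working with can be very large. Is there a good way to do this?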
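You can build a new dtype that describes only the fields you want to keep, at their original byte offsets and with the original record size, and then view the array with it. The dtype function accepts arguments in many formats; the relevant one is described in the section of the documentation titled "Specifying and constructing data types". Scroll down to the subsection that begins with {'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...}.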
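As a quick illustration of that dict form (a sketch; the field names and sizes are chosen to match the example session further on), the dtype below names only two 8-byte float fields, yet keeps each at its original offset and keeps the full 32-byte record width. The unnamed bytes 16 to 31 are simply padding as far as this dtype is concerned.

```python
import numpy as np

# Only 'x' and 'y' are named, but 'offsets' pins them to where they sit
# in the original record, and 'itemsize' preserves the 32-byte record size.
dt = np.dtype({'names': ['x', 'y'],
               'formats': ['<f8', '<f8'],
               'offsets': [0, 8],
               'itemsize': 32})

print(dt.names)           # ('x', 'y')
print(dt.itemsize)        # 32
print(dt.fields['y'][1])  # 8, the byte offset of 'y' within a record
```

Because the record size still matches the original array's, a.view(dt) can reinterpret the existing buffer without moving any data.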
import numpy as np

def view_fields(a, names):
    """
    `a` must be a numpy structured array.
    `names` is the collection of field names to keep.
    Returns a view of the array `a` (not a copy).
    """
    dt = a.dtype
    # Keep each field's original format and byte offset...
    formats = [dt.fields[name][0] for name in names]
    offsets = [dt.fields[name][1] for name in names]
    # ...and the original record size, so the new dtype describes the
    # same memory layout and a.view() can reuse the existing buffer.
    itemsize = a.dtype.itemsize
    newdt = np.dtype(dict(names=names,
                          formats=formats,
                          offsets=offsets,
                          itemsize=itemsize))
    b = a.view(newdt)
    return b

def remove_fields(a, names):
    """
    `a` must be a numpy structured array.
    `names` is the collection of field names to remove.
    Returns a view of the array `a` (not a copy).
    """
    dt = a.dtype
    # Keep every field that is not listed for removal, in original order.
    keep_names = [name for name in dt.names if name not in names]
    return view_fields(a, keep_names)
In [297]: a
Out[297]:
array([(10.0, 13.5, 1248, -2), (20.0, 0.0, 0, 0), (30.0, 0.0, 0, 0),
(40.0, 0.0, 0, 0), (50.0, 0.0, 0, 999)],
dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])
In [298]: b = remove_fields(a, ['i', 'j'])
In [299]: b
Out[299]:
array([(10.0, 13.5), (20.0, 0.0), (30.0, 0.0), (40.0, 0.0), (50.0, 0.0)],
dtype={'names':['x','y'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':32})
Verify that b is a view of a (not a copy) by changing b[0]['x']...
In [300]: b[0]['x'] = 3.14
...and see that a has changed too:
In [301]: a[0]
Out[301]: (3.14, 13.5, 1248, -2)
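Mutating the view works as a test, but np.shares_memory gives a more direct check. The sketch below rebuilds the same kind of view inline, using the same construction as view_fields (the array contents are illustrative, not the data from the session), and confirms that the two arrays overlap in memory and step through it with the same stride.

```python
import numpy as np

# Same record layout as in the session above; the values are illustrative.
a = np.zeros(5, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])
a['x'] = [10.0, 20.0, 30.0, 40.0, 50.0]

# Keep 'x' and 'y' at their original offsets with the full itemsize.
keep = ['x', 'y']
newdt = np.dtype({'names': keep,
                  'formats': [a.dtype.fields[n][0] for n in keep],
                  'offsets': [a.dtype.fields[n][1] for n in keep],
                  'itemsize': a.dtype.itemsize})
b = a.view(newdt)

print(np.shares_memory(a, b))  # True: b reuses a's buffer
print(b.strides == a.strides)  # True: both step 32 bytes per record

b['x'][0] = 3.14               # write through the view...
print(a['x'][0])               # ...and the original sees it: 3.14
```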
TypeError: Cannot change data-type for object array. - mapf
... same as view_fields. - hpaulj
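a[names] creates a copy of the original array, assigns it to a, and only then is the original array deleted. I want to avoid that copy. Maybe I should clarify my question somehow? - Konstantin Schubert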
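For readers on a newer NumPy: since version 1.16, indexing with a list of field names, as in a[names], returns a view that keeps the original offsets and itemsize, much like view_fields, instead of a copy. A minimal sketch, assuming NumPy 1.16 or later; numpy.lib.recfunctions.repack_fields is shown for the opposite case, when a compact copy is actually wanted. Note that a view keeps the whole original buffer alive, so no memory is freed by dropping fields this way.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.zeros(5, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

# Multi-field indexing: a view on NumPy >= 1.16 (older versions copied).
b = a[['x', 'y']]
print(np.shares_memory(a, b))  # True
print(b.dtype.itemsize)        # 32: the bytes of 'i' and 'j' remain as padding

# repack_fields drops the padding by making a packed copy.
c = rfn.repack_fields(b)
print(c.dtype.itemsize)        # 16
print(np.shares_memory(a, c))  # False
```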