How to add a column to a numpy array

Question

How to add a column to a numpy array

65

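I'm trying to add a column to an array created with recfromcsv. In this case it is an array of shape [210,8] (rows, columns).

I want to add a ninth column. It can be empty or filled with zeros.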

from numpy import genfromtxt
from numpy import recfromcsv
import numpy as np
import time

if __name__ == '__main__':
 print("testing")
 my_data = recfromcsv('LIAB.ST.csv', delimiter='\t')
 array_size = my_data.size
 #my_data = np.append(my_data[:array_size],my_data[9:],0)

 new_col = np.sum(x,1).reshape((x.shape[0],1))
 np.append(x,new_col,1)

- user2130951

1

What's wrong with this? - Fred Foo

The problem is that it doesn't work: whichever version I try, it doesn't give the right dimensions. - user2130951

8 Answers

19

If you have an array a, for example with 210 rows and 8 columns:

a = numpy.empty([210,8])

If you want to add a ninth column of all zeros, you can do this:

b = numpy.append(a,numpy.zeros([len(a),1]),1)

- Lee

2

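This raises an error at return concatenate((arr, values), axis=axis): ValueError: all the input arrays must have same number of dimensions. - user2130951

1

Hmm, I just double-checked. It works for me (using IDLE, Python version 2.7). - Lee

Perhaps it's because, as @askewchan suggests, you actually have a recarray? If you import with numpy.genfromtxt or numpy.loadtxt, I think this would work. - Lee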

2

If the column has shape (X,), you must call .reshape(X, 1) on it before applying append. This happens after extracting a column with data[:,1]. - mateuszb
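
A minimal sketch of the shape issue described in this comment, using a made-up 4x3 array named data (hypothetical, not from the question). Slicing out a single column gives a 1D array of shape (4,), which np.append with axis=1 rejects until it is reshaped to (4, 1):

```python
import numpy as np

data = np.arange(12).reshape(4, 3)   # hypothetical example array
col = data[:, 1]                     # 1D, shape (4,)

# Appending the 1D column along axis 1 fails: dimensions do not match.
try:
    np.append(data, col, 1)
    append_failed = False
except ValueError:
    append_failed = True

# Reshape to a column vector first; -1 lets numpy infer the row count.
col_2d = col.reshape(-1, 1)          # shape (4, 1)
result = np.append(data, col_2d, 1)  # shape (4, 4)
```

Slicing with data[:, 1:2] instead of data[:, 1] keeps the column two-dimensional and avoids the reshape.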

This worked for me with Python 3.5, but yes, you have to be careful with the shapes when using it. - Thom Ives

1

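np.append and np.hstack expect the column being added to have the right shape, i.e. N x 1. We can use np.zeros to create this column of zeros (or np.ones for a column of ones) and append it to the original matrix (2D array).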

import numpy as np

def append_zeros(x):
    zeros = np.zeros((len(x), 1))  # zeros column as 2D array
    return np.hstack((x, zeros))   # append column

- qwr

1

The simplest solution is numpy.insert(). The advantage of np.insert() over np.append is that it can insert the new column at an index of your choice.

import numpy as np

X = np.arange(20).reshape(10,2)

X = np.insert(X, [0,2], np.random.rand(X.shape[0]*2).reshape(-1,2)*10, axis=1)

- RyanAbnavi

What is the reshape part at the end doing? - Oortone
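
On the reshape question: np.random.rand(X.shape[0]*2) produces a flat array with two values per row, and reshape(-1, 2) turns it into a (rows, 2) array, one column for each of the two insertion indices. The sketch below checks that shape and also shows a simpler case, assuming all you want is a single zero column at a chosen position (the index 1 is illustrative):

```python
import numpy as np

X = np.arange(20).reshape(10, 2)

# Flat array of 20 random values reshaped to 10 rows x 2 columns.
vals = np.random.rand(X.shape[0] * 2).reshape(-1, 2) * 10

# Insert a single column of zeros before column index 1.
# The scalar 0 is broadcast down the whole new column.
X_new = np.insert(X, 1, 0, axis=1)   # shape (10, 3)
```

Note that np.insert converts the inserted values to the dtype of the target array, so inserting random floats into an integer array such as X truncates them to integers.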

0

Similar to the other answers that suggest numpy.hstack, but more readable:

import numpy as np

# declare 10 rows x 3 cols integer array of all 1s
arr = np.ones((10, 3), dtype=np.int64)

# get the number of rows in the original array (as if we didn't know it was 10 or it could be different in other cases)
numRows = arr.shape[0]
# declare the new array which will be the new column, integer array of all 0s so it's visually distinct from the original array
additionalColumn = np.zeros((numRows, 1), dtype=np.int64)

# use hstack to tack on the additional column
result = np.hstack((arr, additionalColumn))

print(result)

Result:

$ python3 scratchpad.py 
[[1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]
 [1 1 1 0]]

- cdahms

0

Here is a shorter one-liner:

import numpy as np

data = np.random.rand(210, 8)
data = np.c_[data, np.zeros(len(data))]

Something I often do is use np.ones instead, to convert points to homogeneous coordinates.
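
A sketch of the homogeneous-coordinates case, assuming a hypothetical (N, 3) array of points named points. np.c_ treats a 1D array as a column, so the ones do not need to be reshaped:

```python
import numpy as np

# Hypothetical 3D points, one point per row.
points = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Append a column of ones: (N, 3) -> (N, 4).
points_h = np.c_[points, np.ones(len(points))]
```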

- heethesh

0

It can be done like this:

import numpy as np

# create a random matrix:
A = np.random.normal(size=(5,2))

# add a column of zeros to it:
print(np.hstack((A,np.zeros((A.shape[0],1)))))

In general, if A is an m×n matrix and you need to add a column, you must create an m×1 matrix of zeros and then use hstack to attach it to the right of A.

- aderchox

0

I add a new column to a matrix array like this:

Z = append([[1 for _ in range(0,len(Z))]], Z.T,0).T

Maybe it's not that efficient?

- Tomas

4

Don't use a list comprehension; use np.ones or np.ones_like: append([np.ones_like(Z)], Z.T, 0).T. - askewchan
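
A sketch of the np.ones variant, assuming Z is a 2D array (the values here are made up). The first form keeps the transpose trick from the answer; the second uses np.hstack and needs no transposes:

```python
import numpy as np

Z = np.arange(6).reshape(3, 2)   # hypothetical 3x2 matrix

# Transpose trick with np.ones in place of the list comprehension:
# stack a row of ones on top of Z.T, then transpose back.
Z1 = np.append([np.ones(len(Z))], Z.T, 0).T

# Same result without transposing: prepend a (3, 1) column of ones.
Z2 = np.hstack((np.ones((len(Z), 1)), Z))
```

Both put the column of ones first. Because np.ones returns floats by default, the result is a float array even though Z holds integers.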

Content provided by Stack Overflow.

- askewchan · Accepted Answer

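I think your problem is that you expect np.append to add the column in place, but because of how numpy stores its data, it actually creates a copy of the joined arrays.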

Returns
-------
append : ndarray
    A copy of `arr` with `values` appended to `axis`.  Note that `append`
    does not occur in-place: a new array is allocated and filled.  If
    `axis` is None, `out` is a flattened array.

So you need to save the output, all_data = np.append(...):

my_data = np.random.random((210,8)) #recfromcsv('LIAB.ST.csv', delimiter='\t')
new_col = my_data.sum(1)[...,None] # None keeps (n, 1) shape
new_col.shape
#(210,1)
all_data = np.append(my_data, new_col, 1)
all_data.shape
#(210,9)

Other ways:

all_data = np.hstack((my_data, new_col))
#or
all_data = np.concatenate((my_data, new_col), 1)

I believe the only difference between these three functions (as well as np.vstack) is their default behavior when axis is unspecified:

concatenate assumes axis = 0
hstack assumes axis = 1, unless the inputs are 1d, in which case axis = 0
vstack assumes axis = 0, after adding an axis if the inputs are 1d
append flattens the array
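
The defaults listed above can be checked with a short sketch (the array names are illustrative):

```python
import numpy as np

a2, b2 = np.zeros((2, 3)), np.ones((2, 3))   # 2D inputs
a1, b1 = np.zeros(3), np.ones(3)             # 1D inputs

cat_shape = np.concatenate((a2, b2)).shape   # (4, 3): joined along axis 0
hs2_shape = np.hstack((a2, b2)).shape        # (2, 6): joined along axis 1
hs1_shape = np.hstack((a1, b1)).shape        # (6,):   1D inputs joined along axis 0
vs1_shape = np.vstack((a1, b1)).shape        # (2, 3): 1D inputs become rows
app_shape = np.append(a2, b2).shape          # (12,):  both inputs flattened
```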

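Based on your comment, and after looking more closely at your example code, I now think what you are probably trying to do is add a field to a record array. You imported both genfromtxt, which returns a structured array, and recfromcsv, which returns the subtly different record array (recarray). You used recfromcsv, so my_data is currently a recarray, which means that most likely my_data.shape = (210,), because recarrays are 1d arrays of records, where each record is a tuple with the given dtype.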

So you could try this:

import numpy as np
from numpy.lib.recfunctions import append_fields
x = np.random.random(10)
y = np.random.random(10)
z = np.random.random(10)
data = np.array( list(zip(x,y,z)), dtype=[('x',float),('y',float),('z',float)])
data = np.recarray(data.shape, data.dtype, buf=data)
data.shape
#(10,)
tot = data['x'] + data['y'] + data['z'] # sum(axis=1) won't work on recarray
tot.shape
#(10,)
all_data = append_fields(data, 'total', tot, usemask=False)
all_data
#array([(0.4374783740738456 , 0.04307289878861764, 0.021176067323686598, 0.5017273401861498),
#       (0.07622262416466963, 0.3962146058689695 , 0.27912715826653534 , 0.7515643883001745),
#       (0.30878532523061153, 0.8553768789387086 , 0.9577415585116588  , 2.121903762680979 ),
#       (0.5288343561208022 , 0.17048864443625933, 0.07915689716226904 , 0.7784798977193306),
#       (0.8804269791375121 , 0.45517504750917714, 0.1601389248542675  , 1.4957409515009568),
#       (0.9556552723429782 , 0.8884504475901043 , 0.6412854758843308  , 2.4853911958174133),
#       (0.0227638618687922 , 0.9295332854783015 , 0.3234597575660103  , 1.275756904913104 ),
#       (0.684075052174589  , 0.6654774682866273 , 0.5246593820025259  , 1.8742119024637423),
#       (0.9841793718333871 , 0.5813955915551511 , 0.39577520705133684 , 1.961350170439875 ),
#       (0.9889343795296571 , 0.22830104497714432, 0.20011292764078448 , 1.4173483521475858)], 
#      dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('total', '<f8')])
all_data.shape
#(10,)
all_data.dtype.names
#('x', 'y', 'z', 'total')
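
If the data is purely numeric and you would rather work with an ordinary 2D array, one option is to convert the structured array first. This is only a sketch and not part of the original answer: it assumes every field has the same dtype and uses structured_to_unstructured from numpy.lib.recfunctions (NumPy 1.16 or later). The in-memory tab-separated text and its field names a, b, c are made up and stand in for the CSV file.

```python
import io
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical tab-separated data standing in for the real file.
text = "a\tb\tc\n1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n"

# names=True reads the header row and returns a 1D structured array.
rec = np.genfromtxt(io.StringIO(text), delimiter="\t", names=True)
rec_shape = rec.shape                        # (2,), one record per row

# Convert to a plain 2D float array, then add a column as usual.
plain = structured_to_unstructured(rec)      # shape (2, 3)
new_col = plain.sum(axis=1)[:, None]         # shape (2, 1)
all_data = np.hstack((plain, new_col))       # shape (2, 4)
```

The trade-off is that the field names are lost, so append_fields remains the better choice when you need to keep addressing columns by name.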