Incrementally appending numpy arrays to a save file

Question

python arrays numpy

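I tried the approach outlined by hpaulj, but it doesn't seem to work:

How to append many numpy files into one numpy file in python

Basically, I'm iterating through a generator, making some changes to an array, and then trying to save the array from each iteration.

Here is my sample code: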

filename = 'testing.npy'

with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

Here I run 5 iterations, so I expect to save 5 different arrays.

For debugging purposes, I printed a portion of each array:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

But when I try to load the arrays one after another, as described here (How to append many numpy files into one numpy file in python), I get an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])

It only outputs the last array, followed by an EOFError:

[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input

print(arr[0,0,0,0:5])

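I expected all 5 arrays to be saved, but when I load the saved .npy file multiple times, I only get the last array.

So how should I save and append new arrays to the file?

Edit: testing with ".npz" only saves the last array.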

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)



        current_iteration += 1
        if current_iteration == 5:
            break


# loading

file = 'testing.npz'

with open(file, 'rb') as f:
    arr = np.load(f)
    print(arr.keys())

>>> ['arr_0']

- Moondra

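By the way, I don't know how large your data is, but have you tried HDF5? Or are you restricted to .npy for storage? - jpp

I haven't tried HDF5 yet. It seems like the better option (my data is about 100,000 images), but I need to go through the documentation carefully, since I'm not familiar with HDF5. - Moondra

Sorry I can't answer your question directly, but take a look at the h5py documentation. Its syntax is easy to pick up, it can store and append numerical data, and it is fast when used correctly. - jpp

@jp_data_analysis Thanks, I think I may switch to HDF5, since it is more widely used. - Moondra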

1 Answer

- YSelf · Accepted Answer

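All of your np.save calls use the filename rather than the file handle. Since you never reuse the file handle, each save overwrites the file instead of appending the array to it.

This should work: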

import numpy as np

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(f, prediction)  # write to the open handle, not the filename

        current_iteration += 1
        if current_iteration == 5:
            break
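
For a version that can be run without the model, here is a minimal sketch of the same idea that also reads the arrays back. The helpers `save_arrays` and `load_arrays` and the stand-in `batches` list are hypothetical names introduced for illustration; they are not part of the question. The reader stops when the file position reaches the file size, so it does not depend on `np.load` raising an EOFError at the end of the file.

```python
import os
import tempfile

import numpy as np


def save_arrays(path, arrays):
    """Write each array to the same open file handle, one after another."""
    with open(path, "wb") as f:
        for arr in arrays:
            np.save(f, arr)  # pass the handle, not the filename


def load_arrays(path):
    """Read the arrays back in the order they were written."""
    size = os.path.getsize(path)
    loaded = []
    with open(path, "rb") as f:
        # Keep loading until the handle reaches the end of the file.
        while f.tell() < size:
            loaded.append(np.load(f))
    return loaded


# Stand-in data: five small arrays in place of base_model.predict(x).
batches = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "testing.npy")
save_arrays(path, batches)
restored = load_arrays(path)
print(len(restored))  # 5
```

Each `np.load(f)` call consumes exactly one saved array and leaves the handle positioned at the start of the next one, which is why the arrays come back in write order.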

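While there may be advantages to storing multiple arrays in one .npy file (I imagine it helps when memory is limited), .npy files are technically meant to store a single array. To store multiple arrays, you can use .npz files (np.savez or np.savez_compressed):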

filename = 'testing.npz'
predictions = []
for (x, _), index in zip(train_generator, range(5)):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
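
As a rough illustration of the unpacking variant, the sketch below saves five arrays under separate keys and loads them back by name. The `predictions` list here is stand-in data rather than real model output, and the temporary path is only an assumption to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

# Stand-in data in place of the model predictions.
predictions = [np.arange(6, dtype=np.float32).reshape(2, 3) + i for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "testing.npz")

# Unpacking stores each array under its own key: arr_0, arr_1, and so on.
np.savez(path, *predictions)

# np.load returns a lazy, dict-like NpzFile; arrays are read on access.
with np.load(path) as data:
    keys = sorted(data.files)
    restored = [data[f"arr_{i}"] for i in range(len(keys))]

print(keys)
```

Note that this approach holds all arrays in memory before writing, unlike the file-handle approach, which writes one array per iteration.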