How to append many NumPy files into one NumPy file in Python

Question


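I am trying to combine many numpy files into one big numpy file. I tried to follow these two links: "Append multiple numpy files to one big numpy file in python" and "Python append multiple files in given order to one big file". This is what I did: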

import matplotlib.pyplot as plt 
import numpy as np
import glob
import os, sys
fpath ="/home/user/Desktop/OutFileTraces.npy"
npyfilespath ="/home/user/Desktop/test"   
os.chdir(npyfilespath)
with open(fpath,'wb') as f_handle:
    for npfile in glob.glob("*.npy"):
        # Find the path of the file
        filepath = os.path.join(npyfilespath, npfile)
        print filepath
        # Load file
        dataArray= np.load(filepath)
        print dataArray
        np.save(f_handle,dataArray)
        dataArray= np.load(fpath)
        print dataArray

Here is an example of the output:

/home/user/Desktop/Trace=96
[[ 0.01518007  0.01499514  0.01479736 ..., -0.00392216 -0.0039761
  -0.00402747]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=97
[[ 0.00614908  0.00581004  0.00549154 ..., -0.00814741 -0.00813457
  -0.00809347]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=98
[[-0.00291786 -0.00309509 -0.00329287 ..., -0.00809861 -0.00797789
  -0.00784175]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=99
[[-0.00379887 -0.00410453 -0.00438963 ..., -0.03497837 -0.0353842
  -0.03575151]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]

These lines represent the first trace:

[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
      -0.00762086]]

It is the one that is repeated every time.

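I asked the second question two days ago. At first I thought I had the best answer, but after trying to print and save the final file 'OutFileTraces.npy' in bulk, I found that my code:

1/ does not print the numpy files from the folder "test" in order (trace0, trace1, trace2, ...).

2/ saves only a single trace in the file: when I print or plot OutFileTraces.npy, I find just one trace, the first one.

So I need to correct my code, because I am really stuck. I would be very grateful if you could help me.

Thanks in advance.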

- nass9801

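@http://stackoverflow.com/users/6626530/shijo, this is my code. - nass9801

In a link referenced from the first link you gave, I explored reading a file written with multiple `save` calls: https://dev59.com/OZTfa4cB1Zd3GeqPTqjd#35752728. - hpaulj

@hpaulj, actually I am able to read all the data with my code. The problem is only in saving to the file, which keeps just the first file. Here is my updated code: https://dev59.com/Ip_ha4cB1Zd3GeqPtRmW - nass9801

Why is the last load indented? - hpaulj

@hpaulj, could you please have a look at the modified code? - nass9801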


2 Answers


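As discussed, we can save several times to an open file, and load several times from it as well. This is not officially documented, but it works. A savez archive is the preferred way of saving multiple arrays.

Here is a toy example: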

In [777]: with open('multisave.npy','wb') as f:
     ...:     arr = np.arange(10)
     ...:     np.save(f, arr)
     ...:     arr = np.arange(20)
     ...:     np.save(f, arr)
     ...:     arr = np.ones((3,4))
     ...:     np.save(f, arr)
     ...:     
In [778]: ll multisave.npy
-rw-rw-r-- 1 paul 456 Feb 13 08:38 multisave.npy
In [779]: with open('multisave.npy','rb') as f:
     ...:     arr = np.load(f)
     ...:     print(arr)
     ...:     print(np.load(f))
     ...:     print(np.load(f))
     ...:     
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
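
When the number of saved arrays is not known in advance, the same idea can be put in a loop. The sketch below is only an illustration: `load_all` and `demo_path` are names made up here, and it relies on the same undocumented behaviour as the session above. It also shows why loading the file by path, as in the question, keeps returning the first trace: each `np.load(path)` reopens the file and reads from the start.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Return every array written to `path` by successive np.save calls."""
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load consumes exactly one saved array and leaves the file
        # position at the start of the next one, so loop until the end.
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays


# Hypothetical demo file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "multisave.npy")
with open(demo_path, "wb") as f:
    for arr in (np.arange(10), np.arange(20), np.ones((3, 4))):
        np.save(f, arr)

loaded = load_all(demo_path)     # all three arrays, in the order they were saved
first_only = np.load(demo_path)  # loading by path reads only the first array
```

Comparing the position with the file size avoids depending on which exception `np.load` raises at end of file.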

Here is a simple example of saving a list of arrays that all have the same shape.

In [780]: traces = [np.arange(10),np.arange(10,20),np.arange(100,110)]
In [781]: traces
Out[781]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
In [782]: arr = np.array(traces)
In [783]: arr
Out[783]: 
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])

In [785]: np.save('mult1.npy', arr)

In [786]: data = np.load('mult1.npy')
In [787]: data
Out[787]: 
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])
In [788]: list(data)
Out[788]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
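
For the `savez` route mentioned at the top of this answer, a minimal sketch could look like the following. The key names `trace0`, `trace1`, ... and the path `archive_path` are illustrative choices, not something from the question.

```python
import os
import tempfile

import numpy as np

traces = [np.arange(10), np.arange(10, 20), np.arange(100, 110)]

# Hypothetical output path in a temporary directory.
archive_path = os.path.join(tempfile.mkdtemp(), "traces.npz")

# Keyword names become the keys of the archive; "trace0", "trace1", ... are
# illustrative names chosen here.
np.savez(archive_path, **{"trace%d" % i: t for i, t in enumerate(traces)})

# np.load on an .npz file returns a dict-like object; arrays are read on access.
with np.load(archive_path) as archive:
    names = sorted(archive.files)
    second = archive["trace1"]
```

Unlike the single stacked array, this also works when the arrays have different shapes.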

- hpaulj

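Thank you very much for your answer, but I have a million traces, so this solution is not practical. Do you agree? - nass9801

Why not? I could have written this example with a couple of loops, saving from a list of files or arrays and loading into a list. By the way, are these "traces" all the same size? If so, they could be joined into one large array, letting you save/load with a single call. - hpaulj

@hpaulj, they all have the same size, about 32.1 kB. - nass9801

I added an example of saving a list of arrays that are all the same size. - hpaulj

Thanks @hpaulj, I will modify my code and let you know the result. - nass9801

Dear @hpaulj, your idea works well: I managed to put all the traces in one array file, but I cannot plot that file. I posted this as a separate question, because I cannot ask two questions in the same post: https://dev59.com/L1gQ5IYBdhLWcg3wym4Y . Could you help me? - nass9801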


- Pierre de Buyl · Accepted Answer

Glob produces unordered lists. You need to sort explicitly with an extra line as the sorting procedure is in-place and does not return the list.
```
npfiles = glob.glob("*.npy")
npfiles.sort()
for npfile in npfiles:
    ...
```
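
One caveat: a plain `sort()` is lexicographic, so a name containing 10 sorts before one containing 2. If the files are numbered without zero padding, a numeric-aware key may be needed. This is a sketch only: `natural_key` is a helper made up here, and the file names are guesses modeled on the `Trace=96` pattern in the question.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]


# Hypothetical file names, modeled on the "Trace=96" pattern in the question.
names = ["Trace=10.npy", "Trace=2.npy", "Trace=1.npy"]

plain = sorted(names)                     # lexicographic order
natural = sorted(names, key=natural_key)  # numeric-aware order
```

With `glob`, the same key can be passed as `npfiles.sort(key=natural_key)`.
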
NumPy files contain a single array. If you want to store several arrays in a single file, have a look at .npz files with np.savez https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy.savez I have not seen this in wide use, so you may want to seriously consider the following alternatives.
1. If your arrays are all of the same shape and store related data, you can make a larger array. Say that the current shape is (N_1, N_2) and that you have N_0 such arrays. A loop with
```
all_arrays = []
for npfile in npfiles:
    all_arrays.append(np.load(os.path.join(npyfilespath, npfile)))
all_arrays = np.array(all_arrays)
np.save(f_handle, all_arrays)
```
  will produce a file with a single array of shape (N_0, N_1, N_2)
2. If you need per-name access to the arrays, HDF5 files are a good match. See http://www.h5py.org/ (a full intro is too much for a SO reply, see the quick start guide http://docs.h5py.org/en/latest/quick.html)
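
If option 1 applies but the traces are too numerous to hold in a Python list, one possible variant is to create the output `.npy` file at its final size and fill it one trace at a time with `numpy.lib.format.open_memmap`. The sketch below assumes that every file has the same shape and dtype. The folder, the file names and the tiny (1, 5) traces are made up so that the example runs on its own.

```python
import glob
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical setup: a few small (1, 5) traces written to a temporary folder.
src_dir = tempfile.mkdtemp()
for i in range(4):
    np.save(os.path.join(src_dir, "trace%d.npy" % i), np.full((1, 5), float(i)))

files = sorted(glob.glob(os.path.join(src_dir, "*.npy")))
first = np.load(files[0])

# Create the output .npy file with its final shape up front, then fill it one
# trace at a time so only a single trace is held in memory at once.
# The output lives in a separate folder so the glob above cannot pick it up.
out_path = os.path.join(tempfile.mkdtemp(), "OutFileTraces.npy")
out = open_memmap(out_path, mode="w+", dtype=first.dtype,
                  shape=(len(files),) + first.shape)
for i, path in enumerate(files):
    out[i] = np.load(path)
out.flush()
del out  # release the memory map

combined = np.load(out_path)  # a regular .npy file holding a single array
```

The result is an ordinary `.npy` file with shape (N_0, N_1, N_2), so it can be read back with a single `np.load`, or with `np.load(out_path, mmap_mode="r")` if it does not fit in memory.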