Simple reading of Fortran binary data not so simple in Python

Question

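I have a binary output file from a FORTRAN code that I want to read in Python. (Reading it with FORTRAN and writing out text for Python to read is not an option. Long story.) I can read the first record in a straightforward way: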

>>> import struct
>>> binfile=open('myfile','rb')
>>> pad1=struct.unpack('i',binfile.read(4))[0]
>>> ver=struct.unpack('d',binfile.read(8))[0]
>>> pad2=struct.unpack('i',binfile.read(4))[0]
>>> pad1,ver,pad2
(8,3.13,8)

Fine. But this is a big file and I need to read it more efficiently. So I tried:

>>> (pad1,ver,pad2)=struct.unpack('idi',binfile.read(16))

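This doesn't work. The error message tells me that unpack requires an argument of length 20, which makes no sense to me, because the last time I checked, 4+8+4=16. When I gave up and replaced 16 with 20, the code ran, but the three numbers were filled with numerical garbage. Does anyone see what I'm doing wrong? Thanks!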

- bob.sacamento

3 Answers

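The size you are getting is due to alignment. Try struct.calcsize('idi') to verify that the actual size after alignment is 20. For native byte order without alignment, use struct.calcsize('=idi') and adapt your example accordingly (i.e. unpack with '=idi').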

For more information on the struct module, see http://docs.python.org/2/library/struct.html.
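A quick way to see both sizes, and to check that the unaligned format decodes a record shaped like the one in the question. This is a sketch using only the standard struct module; the record is built in memory with struct.pack (values taken from the question) rather than read from the asker's file, and the native size printed first depends on the platform.

```python
import struct

# Native mode (the default, same as a leading '@') aligns the double,
# so padding bytes are inserted after the first int.
native = struct.calcsize('idi')
# '=' keeps native byte order but uses standard sizes and no alignment.
packed = struct.calcsize('=idi')
print(native, packed)

# Synthetic 16-byte record shaped like the question's first record:
# 4-byte length marker, one double, 4-byte length marker.
record = struct.pack('=idi', 8, 3.13, 8)
pad1, ver, pad2 = struct.unpack('=idi', record)
print(pad1, ver, pad2)   # 8 3.13 8
```

On a typical x86-64 machine the first line prints `20 16`.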

- mmgp

I recommend using array to read UNFORMATTED, SEQUENTIAL files written by FORTRAN. Your specific example, using arrays, would be:

import array
binfile=open('myfile','rb')
pad = array.array('i')
ver = array.array('d')
pad.fromfile(binfile,1)   # read the length of the record 
ver.fromfile(binfile,1)   # read the actual data written by FORTRAN
pad.fromfile(binfile,1)   # read the length of the record

If you have FORTRAN records that contain arrays of integers and doubles, which is very common, your Python code will look something like this:

import array
binfile=open('myfile','rb')
pad = array.array('i')
my_integers = array.array('i')
my_floats = array.array('d')
number_of_integers = 1000 # replace with how many you need to read
number_of_floats = 10000 # replace with how many you need to read
pad.fromfile(binfile,1)   # read the length of the record
my_integers.fromfile(binfile,number_of_integers) # read the integer data
my_floats.fromfile(binfile,number_of_floats)     # read the double data
pad.fromfile(binfile,1)   # read the length of the record
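Since every sequential unformatted record is framed by the same length marker before and after its data, one option is to read a whole record first, check both markers, and only then split the payload into arrays. The sketch below assumes 4-byte record markers in native byte order (a common default, e.g. for gfortran, but not guaranteed); `read_record`, `split_ints_doubles` and `MARKER` are hypothetical names, and the test data is built in memory with `io.BytesIO` rather than taken from a real Fortran file.

```python
import array
import io
import struct

# Hypothetical: 4-byte record markers in native byte order (common default, not guaranteed).
MARKER = struct.Struct('=i')

def read_record(f):
    """Return the payload bytes of one sequential unformatted record, or None at EOF."""
    head = f.read(MARKER.size)
    if not head:
        return None                                   # clean end of file
    if len(head) < MARKER.size:
        raise ValueError('truncated record marker')
    (length,) = MARKER.unpack(head)
    payload = f.read(length)
    tail = f.read(MARKER.size)
    if len(payload) != length or len(tail) != MARKER.size:
        raise ValueError('truncated record')
    if MARKER.unpack(tail)[0] != length:
        raise ValueError('leading and trailing record markers differ')
    return payload

def split_ints_doubles(payload, n_ints):
    """Split a payload holding n_ints C ints followed by C doubles."""
    ints = array.array('i')
    cut = n_ints * ints.itemsize                      # itemsize is usually 4
    ints.frombytes(payload[:cut])
    floats = array.array('d')
    floats.frombytes(payload[cut:])                   # ValueError if not a whole number of doubles
    return ints, floats

# Build one synthetic record in memory: marker, 3 ints, 2 doubles, marker.
body = array.array('i', [1, 2, 3]).tobytes() + array.array('d', [0.5, 1.5]).tobytes()
buf = io.BytesIO(MARKER.pack(len(body)) + body + MARKER.pack(len(body)))

ints, floats = split_ints_doubles(read_record(buf), n_ints=3)
print(list(ints), list(floats))   # [1, 2, 3] [0.5, 1.5]
print(read_record(buf))           # None: no more records
```

Checking the trailing marker catches a mismatched record length immediately, instead of silently shifting every later read.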

One last point: if the file contains characters, you can also read them into an array and then decode them into a string. Something like this:

import array
binfile=open('myfile','rb')
pad = array.array('i')
my_characters = array.array('B')
number_of_characters = 63 # replace with number of characters to read
pad.fromfile(binfile,1)   # read the length of the record 
my_characters.fromfile(binfile,number_of_characters ) # read the data
my_string = my_characters.tobytes().decode(encoding='utf_8') 
pad.fromfile(binfile,1)   # read the length of the record
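Because `f.read` already returns bytes, the `array('B')` step is optional: the payload can be decoded directly. Fortran CHARACTER variables are padded with trailing blanks to their declared length, so stripping them is usually wanted. A minimal sketch, assuming 4-byte native-order record markers and ASCII text (both assumptions; use whatever the writing program actually produced), with `read_text_record` as a hypothetical helper name:

```python
import io
import struct

def read_text_record(f, encoding='ascii'):
    """Read one record holding a single CHARACTER value and strip its blank padding."""
    (length,) = struct.unpack('=i', f.read(4))   # leading marker (assumed 4 bytes)
    raw = f.read(length)                         # already bytes, no array needed
    (tail,) = struct.unpack('=i', f.read(4))     # trailing marker
    if tail != length or len(raw) != length:
        raise ValueError('record markers do not match the data')
    return raw.decode(encoding).rstrip(' ')

# Synthetic record: a CHARACTER(len=16) value 'run 42', blank-padded as Fortran would write it.
text = b'run 42'.ljust(16)
buf = io.BytesIO(struct.pack('=i', len(text)) + text + struct.pack('=i', len(text)))
print(repr(read_text_record(buf)))   # 'run 42'
```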

- Arjaan Buijk

@Tunaki Interesting. I didn't know about the "array" module. I'll have to look into it. Thanks! - bob.sacamento

Content provided by Stack Overflow.

- Hristo Iliev · Accepted Answer

The struct module is mainly intended for interfacing with C structures, so it aligns the data members. idi corresponds to the following C structure:

struct
{
   int int1;
   double double1;
   int int2;
};
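The same layout can be inspected from Python with the standard ctypes module, which applies the platform's C alignment rules. This is an illustrative sketch (`Rec` is a hypothetical class name); the exact numbers are platform-dependent, so the comments only give typical x86-64 values. Note that `ctypes.sizeof` also counts the padding a C compiler adds after the last member, which `struct.calcsize` does not.

```python
import ctypes
import struct

class Rec(ctypes.Structure):
    # Same members as the C struct: int, double, int, with native C alignment.
    _fields_ = [('int1', ctypes.c_int),
                ('double1', ctypes.c_double),
                ('int2', ctypes.c_int)]

print(Rec.double1.offset)                              # 8 on typical x86-64 (4 padding bytes before it)
print(Rec.int2.offset + ctypes.sizeof(ctypes.c_int))   # end of last member, equals struct.calcsize('idi')
print(ctypes.sizeof(Rec))                              # includes trailing padding: 24 on typical x86-64
print(struct.calcsize('idi'))                          # no trailing padding: 20 on typical x86-64
```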

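A double entry needs 8-byte alignment so that load operations on most CPUs work efficiently (or even correctly). That is why 4 bytes of padding are added between int1 and double1, which increases the size to 20 bytes as struct computes it (struct adds no padding after the last member; a C compiler would typically also pad the end of this structure, to 24 bytes). The struct module performs the same padding unless you suppress it by putting < (on little-endian machines) or > (on big-endian machines) at the beginning of the format string, or simply =: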

>>> struct.unpack('idi', d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: unpack requires a string argument of length 20
>>> struct.unpack('<idi', d)
(-1345385859, 2038.0682530887993, 428226400)
>>> struct.unpack('=idi', d)
(-1345385859, 2038.0682530887993, 428226400)

(d is a string of 16 random characters.)
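Since the question is about a large file, it may help to compile the format once with `struct.Struct` and unpack many fixed-size records in one pass with `iter_unpack`. This is only a sketch under a strong assumption: that every record in the buffer has the same layout as the first one (marker, one double, marker), which a real Fortran output file may not. The helper name `doubles_from_records` is hypothetical, and the data is built in memory with `pack`; for a real file it would come from `open('myfile', 'rb').read()`.

```python
import struct

# '=' : native byte order, standard sizes, no alignment -> 16 bytes per record.
REC = struct.Struct('=idi')

def doubles_from_records(data):
    """Yield the double from each 16-byte record (marker, double, marker) in data."""
    # iter_unpack raises struct.error if len(data) is not a multiple of REC.size.
    for head, value, tail in REC.iter_unpack(data):
        if head != 8 or tail != 8:
            raise ValueError(f'unexpected record markers {head}, {tail}')
        yield value

data = b''.join(REC.pack(8, x, 8) for x in (3.13, 2.5, -1.0))
print(REC.size)                          # 16
print(list(doubles_from_records(data)))  # [3.13, 2.5, -1.0]
```

Swapping `'='` for `'<'` or `'>'` pins the byte order instead, which matters if the file was written on a machine with different endianness.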