Python: looping through a text file to read data

Question

4

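I'm new to Python, and although I'm sure this is a trivial question, I've spent a whole day trying to solve it in different ways. I have a file containing data like this: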

<string>
<integer>
<N1>
<N2>
data
data
...
<string>
<integer>
<N3>
<N4>
data
data
...

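The file holds several data sets, each consisting of a number of lines of X, Y and Z values. With a single data set, I can read everything, take the values of N1 and N2, slice the data into X, Y and Z, and reshape it. But when the file contains several data sets, how can I read only up to the string that starts the next data set, then do the same for that data set, and so on until the end of the file?

I tried defining a function: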

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if isinstance('line', str) or (not line):
                break
            for line in ifile:
                yield line

But it doesn't work: the arrays I get contain no data. Any comments would be appreciated. Thanks!

- jealopez

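Is this an XML file? If so, you could use Python's built-in XML parsing modules. - John

It is a plain text file, @johnthexiii. I want all of the data sets; some files contain two, others more. From each data set I need X, Y and Z to create some plots, which I have already managed to do when I manually create separate files holding a single data set ("<string> \n <integer>\n <N1>\n <N2> data ...."). I would like to read one data set until I reach the next <string> (and use its data), then read the next data set until the following <string>, and so on until the end of the file. Thanks! - jealopez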

3 Answers

3

With data structured like this, I'd suggest reading only what you need. For example:

with open("inpfile.txt", "r") as ifile:
    first_string = ifile.readline().strip() # Is this the name of the data set?
    first_integer = int(ifile.readline()) # You haven't told us what this is, either
    n_one = int(ifile.readline())
    n_two = int(ifile.readline())

    x_vals = []
    y_vals = []
    z_vals = []

    for index in range(n_one):
        x_vals.append(ifile.readline().strip())
    for index in range(n_two):
        y_vals.append(ifile.readline().strip())
    for index in range(n_one*n_two):
        z_vals.append(ifile.readline().strip())

You can turn this into a generator function that produces data sets by adding a loop and yielding the values:

def read_datasets():  # yield is only valid inside a function; the name is illustrative
    with open("inpfile.txt", "r") as ifile:
        while True:
            first_string = ifile.readline().strip() # Is this the name of the data set?
            if first_string == '':  # readline() returns '' at end of file
                break
            first_integer = int(ifile.readline()) # You haven't told us what this is, either
            n_one = int(ifile.readline())
            n_two = int(ifile.readline())

            x_vals = []
            y_vals = []
            z_vals = []

            for index in range(n_one):
                x_vals.append(ifile.readline().strip())
            for index in range(n_two):
                y_vals.append(ifile.readline().strip())
            for index in range(n_one*n_two):
                z_vals.append(ifile.readline().strip())
            yield (x_vals, y_vals, z_vals) # and the first string and integer if you need those
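
As a rough sketch of how a generator like this could fill one container per data set, the example below keys each data set by its name. It is not part of the original answer: the helper name `iter_datasets`, the in-memory sample input, and the conversion of every value to `float` are assumptions for illustration. It also assumes one value per line, in the order X, Y, Z.

```python
import io

def iter_datasets(ifile):
    """Yield (name, x_vals, y_vals, z_vals) for each data set in an open file.

    Hypothetical helper: assumes one value per line, ordered X, then Y, then Z.
    """
    while True:
        name = ifile.readline().strip()
        if not name:                   # readline() returns '' at end of file
            break
        ifile.readline()               # skip the integer that is not needed
        n_one = int(ifile.readline())
        n_two = int(ifile.readline())
        x_vals = [float(ifile.readline()) for _ in range(n_one)]
        y_vals = [float(ifile.readline()) for _ in range(n_two)]
        z_vals = [float(ifile.readline()) for _ in range(n_one * n_two)]
        yield name, x_vals, y_vals, z_vals

# Made-up sample input standing in for inpfile.txt: two small data sets.
sample = io.StringIO(
    "set_a\n7\n2\n1\n0.0\n1.0\n5.0\n10.0\n20.0\n"
    "set_b\n9\n1\n2\n0.5\n1.5\n2.5\n3.0\n4.0\n"
)

# One entry per data set, keyed by the name string that starts it.
datasets = {name: (x, y, z) for name, x, y, z in iter_datasets(sample)}
```

A dictionary avoids numbered variables such as data1 and data2, and `datasets["set_a"]` then gives the X, Y and Z lists for that data set.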

- Rob Watts

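Thank you very much! If I were only interested in the first data set, this would be the way to handle it. But I want to go through the whole file and put each data set into a separate array (say, the data between the first and second <string> into data1, the data between the second and third <string> into data2, and so on until the end of the file). "first_integer" is an integer related to the process that generated that particular data set, so I'm only interested in n_one and n_two... - jealopez

But I think that once I understand how to put the data between the strings into arrays, it will be easier for me to figure out how to read n_one, n_two, etc. as integers. Thanks. - jealopez

Yes, the first string (and every string) is the name of the corresponding data set. - jealopez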

1

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if isinstance('line', str) or (not line): # 'line' is always a str, and so is the line itself
                break 
            for line in ifile:
                yield line

Change this to:

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if not line:
                break
            yield line

- Rushy Panchal

"not line" will most likely never be True; every line except the last has a newline, and even the last line won't be empty. - Martijn Pieters

- Martijn Pieters · Accepted Answer

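Every line is a str instance, so you break out of the loop on the very first line. Remove that test, and check for an empty line by stripping whitespace first: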

def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if not line.strip():
                break
            yield line

I don't think you need to break at an empty line, though; the for loop ends by itself at the end of the file.
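
To illustrate both points, here is a small sketch that runs against an in-memory file rather than inpfile.txt. The function name `nonblank_lines` and the sample text are made up for the example, and skipping blank lines with `continue` (instead of stopping with `break`) is a variation on the code above, not something the question requires.

```python
import io

def nonblank_lines(ifile):
    """Yield each non-blank line of an open file, without its trailing newline."""
    for line in ifile:
        # Lines produced by iteration keep their newline, so a blank line is
        # "\n" rather than ""; strip it before testing for emptiness.
        if not line.strip():
            continue                  # skip the blank line and keep reading
        yield line.rstrip("\n")
    # No explicit end-of-file test: the for loop stops when the file is exhausted.

# Hypothetical content: a blank line in the middle, no newline at the very end.
sample = io.StringIO("first\n\nsecond\nthird")
lines = list(nonblank_lines(sample))
```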

If your lines contain other types of data, you need to convert them from strings yourself.
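
One possible way to do that conversion is sketched below. The helper `parse_value` and the sample lines are hypothetical; the sketch assumes each line holds a single value that is an integer, a float, or a name to be kept as text.

```python
def parse_value(text):
    """Convert one line to int if possible, else float, else return the stripped string."""
    text = text.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass                      # not valid for this type, try the next one
    return text

# Made-up lines as they would come out of the file, newline included.
raw_lines = ["dataset_1\n", "42\n", "3\n", "0.25\n", "1e-3\n"]
values = [parse_value(line) for line in raw_lines]
```

When the layout is known in advance, as it is in the question, calling `int()` or `float()` directly on the expected lines is simpler and fails loudly on malformed input.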