What is the Python alternative to R's scan('file', what=list(...)) for reading a file?

Question




I have a file in the following format:

I want to create a DataFrame from this file (skipping the first 5 lines) that looks like this:

x1   x2    y1  y2
0.00 0.00  0   1
0.00 0.01  0   1

So the lines become columns (and every third line is additionally split into two columns, y1 and y2).

In R, I do it like this:

df = as.data.frame(scan(".../test.txt", what=list(x1=0, x2=0, y1=0, y2=0), skip=5))

I'm looking for a Python alternative (e.g. in Pandas) to the scan(file, what=list(...)) function. Does one exist, or do I need to write a more involved script?
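For reference, R's `scan(file, what=list(...), skip=n)` skips `n` lines and then deals the whitespace-separated fields out to the components of `what` in order, so a single record may span several lines. Below is a minimal Python sketch of that behaviour. `scan` is a hypothetical helper, not a pandas function, and the sample input (placeholder header lines `h1` to `h5`, then "x1", "x2", "y1 y2" and a blank line per record) is an assumption consistent with the layout described in the answers.

```python
import io

import pandas as pd


def scan(source, what, skip=0):
    """Rough, hypothetical analogue of R's scan(file, what=list(...), skip=n).

    `source` is an open text file; `what` maps column names to converters
    (e.g. float, int). After skipping `skip` lines, whitespace-separated
    fields are dealt out to the columns in order, so one record may span
    several lines.
    """
    lines = source.read().splitlines()[skip:]
    tokens = [tok for line in lines for tok in line.split()]
    names = list(what)
    width = len(names)
    # This sketch refuses an incomplete final record rather than padding it.
    if len(tokens) % width:
        raise ValueError("field count is not a multiple of len(what)")
    # Column i receives tokens i, i + width, i + 2*width, ...
    columns = {name: [what[name](tok) for tok in tokens[i::width]]
               for i, name in enumerate(names)}
    return pd.DataFrame(columns)


# Hypothetical input: placeholder header lines h1..h5, then records laid out
# as "x1", "x2", "y1 y2" followed by a blank line.
sample = io.StringIO("h1\nh2\nh3\nh4\nh5\n0.00\n0.00\n0 1\n\n0.00\n0.01\n0 1\n")
df = scan(sample, {"x1": float, "x2": float, "y1": int, "y2": int}, skip=5)
print(df)
```

With a real file, pass the handle from `with open(path) as f:` instead of the `StringIO` sample.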

- 2xu

3 Answers


As far as I can tell, none of the options here http://pandas.pydata.org/pandas-docs/stable/io.html organize the DataFrame the way you want;

but you can easily implement it yourself:

import re
import pandas as pd

with open('YourDataFile.txt') as fin:
    lines = fin.read()                          # read the whole file
elems = re.split(r'\s+', lines.strip())[5:]     # split on any run of whitespace and exclude the first 5
grouped = list(zip(*[iter(elems)] * 4))         # group them 4 by 4
df = pd.DataFrame(grouped)                      # construct DataFrame
df.columns = ['x1', 'x2', 'y1', 'y2']           # column names
df = df.apply(pd.to_numeric)                    # the fields are strings; convert them to numbers

It's neither concise nor elegant, but it makes clear what it does...

- Giupo

Nice. I had to look up the *iter(elems)*4 part, but I figured it out. And I'm not after elegance; I just went with brute force :-) - 2xu

There was also a typo (elem instead of elems). Glad you figured it out ;) - Giupo
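The `zip(*[iter(elems)]*4)` idiom discussed in these comments works because all four arguments passed to `zip` are the same iterator, so each tuple consumes four consecutive items. A small standalone illustration, with made-up letters standing in for the file's fields:

```python
from itertools import zip_longest

elems = ["a", "b", "c", "d", "e", "f", "g", "h"]

it = iter(elems)           # a single iterator over the data
args = [it] * 4            # the SAME iterator object, repeated four times
# zip pulls one item from each argument in turn; since all four are the same
# iterator, each tuple consumes four consecutive items.
groups = list(zip(*args))
print(groups)  # [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h')]

# A leftover partial group is silently dropped by zip...
short = ["a", "b", "c", "d", "e"]
dropped = list(zip(*[iter(short)] * 4))
print(dropped)  # [('a', 'b', 'c', 'd')]
# ...whereas zip_longest keeps it, padding with None.
padded = list(zip_longest(*[iter(short)] * 4))
print(padded)  # [('a', 'b', 'c', 'd'), ('e', None, None, None)]
```

The silent truncation is worth keeping in mind: if the token count is off by one, plain `zip` loses data without any error.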


OK, here's how I did it (really a combination of Jon's and Giupo's answers, thanks to both of you!):

import pandas as pd

with open('myfile.txt') as fin:
    data = fin.read().split()[5:]
grouped = list(zip(*[iter(data)] * 4))
df = pd.DataFrame(grouped, columns=['x1', 'x2', 'y1', 'y2'])
df = df.apply(pd.to_numeric)  # the fields are read as strings; convert them to numbers

- 2xu
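A variant of the same approach, assuming NumPy is available: converting all tokens at once and reshaping the flat array into rows of four avoids the tuple grouping and yields numeric columns directly. Unlike `zip`, `reshape(-1, 4)` raises a `ValueError` when the token count is not a multiple of 4 instead of silently dropping the remainder. The sample text (placeholder header lines `h1` to `h5`) is hypothetical and stands in for `myfile.txt`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: 5 header lines, then records of
# "x1", "x2", "y1 y2" and a blank separator line.
text = "h1\nh2\nh3\nh4\nh5\n0.00\n0.00\n0 1\n\n0.00\n0.01\n0 1\n"

with io.StringIO(text) as fin:          # stand-in for open('myfile.txt')
    tokens = fin.read().split()[5:]     # drop the first 5 tokens, as above

# Convert every token to float at once and fold the flat array into rows of 4.
values = np.array(tokens, dtype=float).reshape(-1, 4)
df = pd.DataFrame(values, columns=["x1", "x2", "y1", "y2"])
# y1/y2 hold whole numbers; cast them back to integers.
df = df.astype({"y1": int, "y2": int})
print(df)
```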


- Jon Clements · Accepted Answer

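You can skip the first 5 lines, then build Python lists by taking 4 lines at a time, and use those as a starting point for pandas... although it wouldn't surprise me if pandas offered a better solution: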

# Python 3 version (izip_longest was renamed zip_longest)
from itertools import islice, zip_longest

with open('input') as fin:
    # Skip header(s) at start
    after5 = islice(fin, 5, None)
    # Take remaining data and group it into groups of 4 lines each... The
    # first 2 are float data, the 3rd is two integers together, and the 4th
    # is the blank line between groups... We use zip_longest to ensure we
    # always have 4 items (padded with None if needs be)...
    for lines in zip_longest(*[iter(after5)] * 4):
        # Convert first two lines to float, and take 3rd line, split it and
        # convert to integers
        print([float(v) for v in lines[:2]] + [int(v) for v in lines[2].split()])

#[0.0, 0.0, 0, 1]
#[0.0, 0.01, 0, 1]
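Following up on the pandas point: one way to get a DataFrame out of this loop is to collect each converted group as a tuple and pass the list to `pd.DataFrame` with explicit column names. The sketch below reads a hypothetical in-memory sample (placeholder header lines `h1` to `h5`) instead of the real `input` file.

```python
import io
from itertools import islice, zip_longest

import pandas as pd

# Hypothetical input in the layout described above: 5 header lines, then
# groups of "x1", "x2", "y1 y2" and a blank separator line.
text = "h1\nh2\nh3\nh4\nh5\n0.00\n0.00\n0 1\n\n0.00\n0.01\n0 1\n"

rows = []
with io.StringIO(text) as fin:               # stand-in for open('input')
    after5 = islice(fin, 5, None)            # skip the header lines
    for lines in zip_longest(*[iter(after5)] * 4):
        # lines[3] is the blank separator (or None padding for the last group)
        x1, x2 = float(lines[0]), float(lines[1])
        y1, y2 = (int(v) for v in lines[2].split())
        rows.append((x1, x2, y1, y2))

df = pd.DataFrame(rows, columns=["x1", "x2", "y1", "y2"])
print(df)
```

Building a list of rows first and constructing the DataFrame once is generally cheaper than appending to a DataFrame inside the loop.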