How do I convert this text file into a dictionary?

Question

I have a file f that looks like this:

#labelA
there
is
something
here
#label_Bbb
here
aswell
...

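It can contain many labels and any number of elements (strings only) on a line, and each label can span several lines. I want to store this data in a dictionary like this: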

d = {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell', ...}

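I have a few sub-questions:

How do I use the # character to detect that a new entry starts?
How do I remove the # and keep the rest of the line up to its end?
How do I append every string on the following lines until the # character appears again?
How do I stop at the end of the file?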

- nyw
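Each sub-question maps onto one small operation. The following is a minimal sketch, not taken from any answer below: `parse_labels` is a hypothetical helper name, `io.StringIO` stands in for the real file, and a label that appears again simply starts over.

```python
import io

def parse_labels(stream):
    """Hypothetical helper: group lines under the most recent '#label' line."""
    d = {}
    key = None
    for raw in stream:                  # 4. the for loop simply ends at end of file
        line = raw.rstrip('\n')
        if line.startswith('#'):        # 1. a leading '#' marks a new entry
            key = line[1:]              # 2. drop '#' and keep the rest of the line
            d[key] = ''
        elif key is not None:
            d[key] += line              # 3. append following lines until the next '#'
    return d

sample = io.StringIO("#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n")
print(parse_labels(sample))
# {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

Lines that appear before the first label are ignored in this sketch.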

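Loop over every line in the file and check whether it starts with the # character. Use str[1:] to get the rest of the string and use it as the key in the dictionary. Then append every following line to that key until you find another #. - gen_Eric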
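One detail when following this recipe: lines read from a file keep their trailing newline, so `str[1:]` alone gives a key like `'labelA\n'`. A small illustration, assuming the line was read exactly as it appears in the file:

```python
line = "#labelA\n"                   # a label line as returned when iterating over a file
print(repr(line[1:]))                # 'labelA\n'  (the newline is still attached)
print(repr(line[1:].rstrip("\n")))   # 'labelA'
```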

Can a label appear more than once? I mean, can its block be interrupted by other labels? What would the expected result be in that case? - Alfe
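The question leaves this open, so here is a hedged sketch of the two obvious policies: concatenating the text of a repeated label, or keeping only its last block. `parse_with_repeats` and its `merge` parameter are hypothetical names chosen for illustration.

```python
def parse_with_repeats(lines, merge=True):
    """Hypothetical helper. merge=True concatenates the text of a label
    that appears more than once; merge=False keeps only its last block."""
    d = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            key = line[1:]
            if not merge or key not in d:
                d[key] = ""             # start (or restart) this label's text
        elif key is not None:
            d[key] += line
    return d

text = ["#a\n", "x\n", "#b\n", "y\n", "#a\n", "z\n"]
print(parse_with_repeats(text))               # {'a': 'xz', 'b': 'y'}
print(parse_with_repeats(text, merge=False))  # {'a': 'z', 'b': 'y'}
```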

7 answers

The shortest solution, using the re.findall() function:

import re 

with open("lines.txt", 'r') as fh:
    d = {k:v.replace('\n', '') for k,v in re.findall(r'^#(\w+)\s([^#]+)', fh.read(), re.M)}

print(d)

Output:

{'label_Bbb': 'hereaswell', 'labelA': 'thereissomethinghere'}

re.findall returns a list of tuples; each tuple holds two items, one for each of the two capture groups.

- RomanPerekhrest
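To see the tuples directly, here is a small sketch that runs the same pattern on an in-memory copy of the sample (the `text` variable is an assumption standing in for `fh.read()`). Note that `[^#]+` needs at least one character, so a label with no lines under it is skipped, and a `#` inside the text would cut a value short.

```python
import re

text = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"
pairs = re.findall(r'^#(\w+)\s([^#]+)', text, re.M)
print(pairs)
# [('labelA', 'there\nis\nsomething\nhere\n'), ('label_Bbb', 'here\naswell\n')]
```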

Great answer! Very neat! +1 - Haifeng Zhang

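I love regular expressions, really. I've done some nasty things with them that would probably make you blush. But I would never call something like r'^#(\w+)\s([^#]+)' neat and clean. - Alfe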
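If readability is the concern, one option is to write the same pattern with `re.VERBOSE`, which lets each part carry a comment. A sketch, assuming the same sample data as in the answer; whitespace and comments are ignored in verbose mode, so this matches exactly what the compact pattern matches.

```python
import re

LABEL_BLOCK = re.compile(r"""
    ^\#          # a line starting with '#' (escaped, since '#' starts a comment here)
    (\w+)        # group 1: the label name
    \s           # the newline (or other whitespace) after the label
    ([^#]+)      # group 2: everything up to the next '#'
""", re.MULTILINE | re.VERBOSE)

text = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"
d = {k: v.replace("\n", "") for k, v in LABEL_BLOCK.findall(text)}
print(d)  # {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```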

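f = open('untitled.txt', 'r')

line = f.readline()
d = {}
last_key = None
last_element = ''
while line:                              # readline() returns '' at end of file
    line = line.rstrip('\n')
    if line.startswith('#'):
        if last_key is not None:
            d[last_key] = last_element   # store the previous entry
        last_key = line[1:]              # drop the leading '#'
        last_element = ''
    else:
        last_element += line             # concatenate without newlines
    line = f.readline()

if last_key is not None:
    d[last_key] = last_element           # store the final entry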

- llulai

It's better to use with so the file is closed automatically. Your solution leaves the file open. - Alfe
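A sketch of the same line-by-line idea inside a `with` block, assuming a temporary file created just for the demo. Iterating over the file object replaces the explicit `readline()` loop, and `f.closed` confirms the file was closed when the block ended.

```python
import os
import tempfile

# Write the sample data to a temporary file so the sketch is self-contained.
sample = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

d = {}
key = None
with open(path) as f:          # the with block closes f even if an exception occurs
    for line in f:             # iterating over f replaces the readline() loop
        line = line.rstrip("\n")
        if line.startswith("#"):
            key = line[1:]
            d[key] = ""
        elif key is not None:
            d[key] += line

print(f.closed)  # True: the file was closed when the with block ended
print(d)         # {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
os.remove(path)
```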

Using `collections.defaultdict`:

from collections import defaultdict

d = defaultdict(list)

with open('f.txt') as file:
    for line in file:
        if line.startswith('#'):
            key = line.lstrip('#').rstrip('\n')
        else:
            d[key].append(line.rstrip('\n'))
for key in d:
    d[key] = ''.join(d[key])

- Uriel
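One subtlety with this approach: after the join loop, `d` is still a `defaultdict`, so reading a label that never appeared silently inserts an empty list instead of raising `KeyError`. A minimal sketch of that behavior with hypothetical keys; wrapping the result in `dict(d)` restores normal lookups.

```python
from collections import defaultdict

d = defaultdict(list)
d["labelA"].append("there")   # first access to a missing key creates an empty list
d["labelA"].append("is")
for key in d:
    d[key] = "".join(d[key])  # same join step as in the answer

print(d["missing"])           # [] because reading an unknown key inserts a new list
print(sorted(d))              # ['labelA', 'missing']

plain = dict(d)               # a plain dict raises KeyError for unknown keys
try:
    plain["other"]
except KeyError:
    print("KeyError")
```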

In a single pass, without building an intermediate dictionary:

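res = {}
with open("sample") as lines:
    line = None
    entry = ""
    try:
        line = next(lines)              # lines.next() exists only in Python 2
        while True:
            entry = ""
            if line.startswith("#"):    # assumes the file starts with a '#' line
                nxt = next(lines)       # 'nxt' avoids shadowing the built-in next()
                while not nxt.startswith("#"):
                    entry += nxt.rstrip("\n")
                    nxt = next(lines)
            res[line[1:].rstrip("\n")] = entry
            line = nxt
    except StopIteration:
        if line is not None:            # an empty file has nothing to store
            res[line[1:].rstrip("\n")] = entry  # Catch the last entry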

- TemporalWolf

I would do it like this (this is pseudocode, so it won't compile!):

dict = dict()
key = read_line()[1:]
while not end_file():
    text = ""
    line = read_line()
    while(line[0] != "#" and not end_file()):
        text += line
        line = read_line()

    dict[key] = text
    key = line[1:]

- Joseph Chotard
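A runnable Python reading of the pseudocode above, assuming `read_line()` corresponds to `f.readline()` (which returns `''` at end of file) and `end_file()` to that empty string. Depending on how `end_file()` is defined, the pseudocode's inner loop can stop before appending the file's last line; this sketch tests for `''` right after each read, so the last line is kept. The function name `parse` is hypothetical.

```python
import io

def parse(f):
    """Hypothetical runnable version: read_line() -> f.readline()."""
    d = {}
    line = f.readline()
    while line:                                  # '' means end of file
        key = line[1:].rstrip("\n")              # assumes this line starts with '#'
        text = ""
        line = f.readline()
        while line and not line.startswith("#"):
            text += line.rstrip("\n")            # the file's last line is included too
            line = f.readline()
        d[key] = text
    return d

sample = io.StringIO("#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n")
print(parse(sample))  # {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```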

Here's my approach:

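def eachChunk(stream):
  key = None
  value = ''
  for line in stream:
    line = line.rstrip('\n')       # strip the newline from every line
    if line.startswith('#'):
      if key is not None:
        yield key, value           # emit the previous chunk
      key = line[1:]
      value = ''
    else:
      value += line
  if key is not None:              # an empty input yields nothing
    yield key, value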

You can then build the desired dictionary quickly like this:

with open('f') as data:
  d = dict(eachChunk(data))

- Alfe
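Because `eachChunk` only iterates over `stream`, it should accept any iterable of lines, not just an open file. A sketch with a newline-stripping variant of the generator (it also yields nothing for an empty input), exercised on an `io.StringIO` object and on a plain list:

```python
import io

def eachChunk(stream):
    """Yield (label, text) pairs, with newlines stripped from the text."""
    key = None
    value = ''
    for line in stream:
        line = line.rstrip('\n')
        if line.startswith('#'):
            if key is not None:
                yield key, value
            key = line[1:]
            value = ''
        else:
            value += line
    if key is not None:
        yield key, value

# An in-memory stream behaves like an open file...
print(dict(eachChunk(io.StringIO("#labelA\nthere\nis\n#label_Bbb\nhere\n"))))
# {'labelA': 'thereis', 'label_Bbb': 'here'}

# ...and so does a plain list of strings.
print(list(eachChunk(["#x\n", "a\n", "b\n"])))  # [('x', 'ab')]
```

Since the pairs are produced lazily, the caller can also loop over them without building a dictionary at all.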

Content provided by Stack Overflow.

- Haifeng Zhang · Accepted Answer

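First, mydict maps each label (taken from the lines that start with #) to a list; a list preserves the order in which lines are appended. We append lines to that list until the next line starting with # appears. Then all that remains is to join each list of lines into a single string.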

If you are on Python 3, use mydict.items() to iterate over key-value pairs. On Python 2, use mydict.iteritems() (items() also works there, but builds a full list first).

mydict = dict()
with open("sample.csv") as inputs:
    for line in inputs:
        if line.startswith("#"):
            key = line.strip()[1:]
            mydict.setdefault(key,list())
        else:
            mydict[key].append(line.strip())

result = dict()
for key, vlist in mydict.items():
    result[key] = "".join(vlist)

print(result)

Output:

{'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
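The second loop can also be written as a dict comprehension, which both Python 2.7 and Python 3 support. A sketch using hypothetical intermediate data shaped like `mydict` after the first loop. Because of `setdefault`, a label that appears twice in the file would have its lines merged into one list.

```python
# Hypothetical intermediate data, shaped like mydict after the first loop.
mydict = {"labelA": ["there", "is", "something", "here"],
          "label_Bbb": ["here", "aswell"]}

# Join each list of lines into one string in a single expression.
result = {key: "".join(vlist) for key, vlist in mydict.items()}
print(result)  # {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```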