Converting a text grid to a nested dictionary in Python


Imagine a grid in a text file, like this:

  A  B  C
A 0  1  2
B 3  0  5
C 6  7  0

What is the best way to turn this into a Python dictionary like the following?
{
  'A': {'A': 0, 'B':3, 'C':6},
  'B': {'A': 1, 'B':0, 'C':7},
  'C': {'A': 2, 'B':5, 'C':0}
}

So that I can access a cell with:

matrix['A']['B'] # 3

I currently have some pretty crude code (please don't judge me too harshly):
matrix = {}
f = open(filepath, 'r')
lines = f.readlines()
keys = lines[0].split()

for key in keys:
    matrix[key] = {}

for line in lines[1:]:
    chars = line.split()
    key_a = chars[0]
    for i, c in enumerate(chars[1:]):
        key_b = keys[i-1]
        matrix[key_a][key_b] = int(c)

print matrix

# Outputs {'A': {'A': 1, 'C': 0, 'B': 2}, 'C': {'A': 7, 'C': 6, 'B': 0}, 'B': {'A': 0, 'C': 3, 'B': 5}}

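While there is nothing wrong with this approach, I haven't touched Python in a long time. Is there a better way? Perhaps nested dictionaries aren't the best approach at all?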

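Update:

  1. Unfortunately, I need a pure-Python implementation, so I can't use external libraries (believe me, I'd love to).
  2. Updated the sample code from pseudocode to actual code. Hangs head in shame.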

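Could you post the code you already have? - Snakes and Coffee
Done, apologies for the eyesore. - Pete Hamilton
“While there is nothing wrong with this ... is there a better way?” - Your question might be a better fit for http://codereview.stackexchange.com. - Robᵩ
Thanks, I hadn't come across that before! I'll ask similar questions there in future. - Pete Hamilton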
1 Answer

Your code is reasonable, but here is an alternative:
import collections
with open('grid_file.txt', 'r') as f:
    columns = next(f).split()
    matrix = collections.defaultdict(dict)
    for line in f:
        items = line.split()
        row, vals = items[0], items[1:]
        for col, val in zip(columns, vals):
            matrix[col][row] = int(val)
print(matrix)

yields

defaultdict(<type 'dict'>, {'A': {'A': 0, 'C': 6, 'B': 3}, 'C': {'A': 2, 'C': 0, 'B': 5}, 'B': {'A': 1, 'C': 7, 'B': 0}})

Some tips:
  • Use

    with open(...) as f:
        ...
    

    instead of

    f = open(...)
    f.close()
    

    because the file handle is closed for you when Python leaves the with-block. By using with you'll never forget to close a filehandle, and even if an exception occurs, the filehandle will still be closed upon leaving the with-block.

  • Generally, it is better to avoid f.readlines() if you can. This slurps the entire file into a list. That can be onerous on memory, especially if the file is huge. Usually

    with open(...) as f:
        for line in f:
    

    can be used instead.

  • If you make matrix a collections.defaultdict(dict) then matrix[field] will be a dict by default. So you can skip the initialization:

    for key in keys:
        matrix[key] = {}
    
  • A defaultdict is a subclass of dict, so you can use it very much as you would a dict. If you don't like the way it prints or would like to stop matrix from automagically assigning an empty dict to matrix[key] for any key, you can convert the defaultdict back to a regular dict with:

    matrix = dict(matrix)
    
  • Avoid using numerical indices in for-loops if you can.

    for i, c in enumerate(chars[1:]):
    

    Although this is de rigueur for most C-like languages, Python has a better way: looping over the items themselves:

    for col, val in zip(columns, vals):
    

    This makes the code more readable, because it assigns a variable name to the object you are actually interested in, not just an index which you then have to compose into things like keys[i-1]. It also helps you avoid "off-by-one" errors which can occur when you have to adjust the index by one, as is done in keys[i-1].
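
To illustrate the two defaultdict tips above, here is a minimal sketch of how the automatic creation behaves and how dict(matrix) turns it off. The keys 'Z' and 'Q' are made up purely for the demonstration and are not part of the grid.

```python
import collections

matrix = collections.defaultdict(dict)

# No setup loop needed: matrix['A'] is created as {} on first access.
matrix['A']['B'] = 3

# Merely reading a missing key also creates it (hypothetical key 'Z').
_ = matrix['Z']
print('Z' in matrix)   # True

# A plain dict copy no longer creates keys automatically.
plain = dict(matrix)
try:
    plain['Q']         # hypothetical key, not present
except KeyError:
    print('KeyError for Q')
```

This is why converting back to a regular dict can be worthwhile once parsing is done: a mistyped key then raises KeyError instead of quietly adding an empty column.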


Another possibility is to skip the nested dictionaries and use 2-tuples (column, row) as keys:

with open('grid_file.txt', 'r') as f:
    columns = next(f).split()
    matrix = {}
    for line in f:
        items = line.split()
        row, vals = items[0], items[1:]
        for col, val in zip(columns, vals):
            matrix[col, row] = int(val)
print(matrix)

yields
{('B', 'C'): 7, ('A', 'A'): 0, ('B', 'B'): 0, ('B', 'A'): 1, ('C', 'A'): 2, ('C', 'B'): 5, ('C', 'C'): 0, ('A', 'B'): 3, ('A', 'C'): 6}

You can then access the (column, row) cell of the matrix like this:
print(matrix['A','C'])
# 6
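
If you already have the nested form and want to compare the two layouts, the following is a rough sketch of converting one into the other. The names nested, flat and column_a are only illustrative. It also shows one trade-off of tuple keys: pulling out a whole column means filtering over every key.

```python
nested = {
    'A': {'A': 0, 'B': 3, 'C': 6},
    'B': {'A': 1, 'B': 0, 'C': 7},
    'C': {'A': 2, 'B': 5, 'C': 0},
}

# Flatten {col: {row: val}} into {(col, row): val}.
flat = {(col, row): val
        for col, rows in nested.items()
        for row, val in rows.items()}
print(flat['A', 'C'])  # 6

# With tuple keys, extracting one column requires scanning all keys,
# whereas the nested form gives it directly as nested['A'].
column_a = {row: val for (col, row), val in flat.items() if col == 'A'}
print(column_a == nested['A'])  # True
```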

By the way, if you have pandas installed:
import pandas as pd
import io

text = '''\
A  B  C
A 0  1  2
B 3  0  5
C 6  7  0'''

df = pd.read_table(io.StringIO(text), sep=r'\s+')  # StringIO because text is a str on Python 3
print(df.to_dict())

yields
{'A': {'A': 0, 'B': 3, 'C': 6},
 'B': {'A': 1, 'B': 0, 'C': 7},
 'C': {'A': 2, 'B': 5, 'C': 0}}

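Thanks for the reply, this looks great, and I had a look at pandas too, very nice! Annoyingly, though, in this case I'm better off writing it in native Python. Any ideas? - Pete Hamilton
Excellent answer. I've managed to use several of these tips to tidy up other code too. Really appreciated; I'd upvote more than once if I could. By the way, do tuple keys have any impact on access speed or memory footprint compared with nested dictionaries? - Pete Hamilton
I would choose the data structure based on the expected usage. If you always want to retrieve a single value given a column and a row, then matrix[col, row] seems more readable to me than matrix[col][row]. There is basically no difference in speed. (I timed both with IPython's %timeit command.) - unutbu
There is a small difference in memory footprint (as measured by pympler): the dict of dicts is slightly smaller than the dict with tuple keys. But I wouldn't worry about that (or about performance) unless it becomes a bottleneck. - unutbu
Thanks, I was just interested; this won't be a bottleneck in the code, I was only curious. - Pete Hamilton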
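
For anyone who wants to repeat the speed comparison without IPython, here is a rough sketch using the standard-library timeit module. This is not the exact measurement from the comments above: the 3x3 grid of zeros and the repetition count are arbitrary assumptions, and the numbers will vary by machine.

```python
import timeit

keys = 'ABC'
# Same 3x3 shape as the example grid, filled with zeros for simplicity.
nested = {c: {r: 0 for r in keys} for c in keys}
flat = {(c, r): 0 for c in keys for r in keys}

# Time one lookup in each layout, repeated many times.
t_nested = timeit.timeit(lambda: nested['A']['C'], number=100000)
t_flat = timeit.timeit(lambda: flat['A', 'C'], number=100000)

print('nested: %.4fs  tuple keys: %.4fs' % (t_nested, t_flat))
```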
