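Your text file may contain duplicate lines, and each duplicate would overwrite the key already in the dictionary (Python's name for a hash table). You can first build a set of unique keys and then populate the dictionary with a dictionary comprehension.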
sample_file.txt
a
b
c
c
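Python code
with open("sample_file.txt") as f:
    # A set keeps a single copy of each stripped line.
    keys = set(line.strip() for line in f.readlines())
# "if key" skips the empty string that a blank line would leave behind.
my_dict = {key: 1 for key in keys if key}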
>>> my_dict
{'a': 1, 'b': 1, 'c': 1}
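As a minimal sketch of why duplicates matter, the snippet below uses an in-memory list as a stand-in for sample_file.txt (the list, the extra blank line, and the name by_line are assumptions made for illustration). It shows a later duplicate overwriting an earlier value, and the set-based approach collapsing the duplicate before the dictionary is built.

```python
# Hypothetical stand-in for the lines of sample_file.txt, plus one blank line.
lines = ["a\n", "b\n", "c\n", "c\n", "\n"]

# Without de-duplicating first, the second 'c' overwrites the first one:
# the value stored for 'c' is the line number of its last occurrence.
by_line = {line.strip(): n for n, line in enumerate(lines, 1) if line.strip()}
print(by_line)  # {'a': 1, 'b': 2, 'c': 4}

# The approach from the answer: unique keys first, then the comprehension.
keys = {line.strip() for line in lines}
my_dict = {key: 1 for key in keys if key}
print(sorted(my_dict))  # ['a', 'b', 'c']
```

Because a set is unordered, my_dict may print its keys in a different order from run to run; the contents are the same.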
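Here is a run with 1 million random 10-letter strings. Building the dictionary is fast: under half a second (379 ms per loop in the timing below).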
import string
import numpy as np
# Map 1..26 to 'a'..'z'.
letter_map = {n: letter for n, letter in enumerate(string.ascii_lowercase, 1)}
# randint's upper bound is exclusive; it replaces the deprecated random_integers(1, 26, ...).
long_alpha_list = ["".join([letter_map[number] for number in row]) + "\n"
                   for row in np.random.randint(1, 27, (1000000, 10))]
>>> long_alpha_list[:5]
['mfeeidurfc\n',
'njbfzpunzi\n',
'yrazcjnegf\n',
'wpuxpaqhhs\n',
'fpncybprrn\n']
>>> len(long_alpha_list)
1000000
# Text mode: writelines() receives str lines here, which 'wb' rejects on Python 3.
with open('sample_file.txt', 'w') as f:
    f.writelines(long_alpha_list)
with open("sample_file.txt") as f:
    keys = set(line.strip() for line in f.readlines())
>>> %%timeit -n 10
>>> my_dict = {key: 1 for key in keys if key}
10 loops, best of 3: 379 ms per loop
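The %%timeit cell above only runs inside IPython. A rough equivalent with the standard library's timeit module might look like the sketch below. The helper random_keys, its seed, and the smaller size of 100,000 are assumptions chosen so the example runs quickly; the measured time will differ by machine and is not comparable to the 379 ms figure.

```python
import random
import string
import timeit


def random_keys(n, length=10, seed=0):
    """Hypothetical helper: a set of up to n random lowercase strings."""
    rng = random.Random(seed)
    return {"".join(rng.choices(string.ascii_lowercase, k=length))
            for _ in range(n)}


keys = random_keys(100_000)  # the run above uses 1,000,000 lines

# Time only the dictionary construction, as the %%timeit cell does:
# 3 repeats of 10 loops each, reporting the best per-loop time.
best = min(timeit.repeat(lambda: {key: 1 for key in keys if key},
                         number=10, repeat=3)) / 10
print(f"{len(keys)} keys, best of 3: {best * 1000:.1f} ms per loop")
```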