Importing data from a file into a Python dictionary

Question

3

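I would like to import a file into a dictionary for further processing. The file contains embedding vectors for natural language processing and looks like this: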

the 0.011384 0.010512 -0.008450 -0.007628 0.000360 -0.010121 0.004674 -0.000076 
of 0.002954 0.004546 0.005513 -0.004026 0.002296 -0.016979 -0.011469 -0.009159 
and 0.004691 -0.012989 -0.003122 0.004786 -0.002907 0.000526 -0.006146 -0.003058
one 0.014722 -0.000810 0.003737 -0.001110 -0.011229 0.001577 -0.007403 -0.005355

The code I am using is:

embeddingTable = {}

with open("D:\\Embedding\\test.txt") as f:
    for line in f:
       (key, val) = line.split()
       d[key] = val
print(embeddingTable)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-3612e9012ffe> in <module>()
 24 with open("D:\\Embedding\\test.txt") as f:
 25     for line in f:
---> 26        (key, val) = line.split()
 27        d[key] = val
 28 print(embeddingTable)

ValueError: too many values to unpack (expected 2)

I understand that it expects 2 values rather than 9, but is it possible to insert the word as the key and the vector as the value?

- Masyaf

It would make sense to replace d with embeddingTable, wouldn't it? - Tom Karzes

Yes, of course, that was just a draft version. Thanks anyway. - Masyaf

@Masyaf, do you have duplicate keys? - Padraic Cunningham

What do you mean? I'm still fairly new to Python :D - Masyaf

3 Answers

4

Use the csv library to parse the data, then use a dictionary comprehension with map to convert each row's values to floats.

import csv

with open("D:/Embedding/test.txt") as f:
    d = {k:list(map(float, vals)) for k, *vals in csv.reader(f,delimiter=" ")}
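One caveat: with delimiter=" ", csv.reader produces an empty field for every trailing or doubled space, and float("") raises ValueError. As a minimal sketch, assuming the file may contain such spaces or blank lines, the empty fields can be filtered out before converting. The load_embeddings helper and the in-memory sample are hypothetical stand-ins for the real file.

```python
import csv
import io

# Hypothetical sample standing in for test.txt; the first line ends with a space.
sample = "the 0.011384 0.010512 \n\nof 0.002954 0.004546\n"

def load_embeddings(f):
    """Build {word: [floats]} from a space-separated file object."""
    table = {}
    for row in csv.reader(f, delimiter=" "):
        fields = [field for field in row if field]  # drop empty fields
        if not fields:                              # skip blank lines
            continue
        key, *vals = fields
        table[key] = list(map(float, vals))
    return table

embedding_table = load_embeddings(io.StringIO(sample))
print(embedding_table)
```

With a real file, the same function would be called as load_embeddings(f) inside the with open(...) block.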

- Padraic Cunningham

3

If you can't use the * operator because you are on Python 2, you can do it like this:

embeddingTable = {}
with open('test.txt') as f:
    for line in f:
       values = line.split()
       embeddingTable[values[0]] = values[1:]
print(embeddingTable)

If you are using Python 3, use the more elegant * operator.
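Note that the slicing loop stores the vector components as strings. A minimal sketch, assuming floats are wanted and avoiding starred unpacking so the same code runs on both Python 2 and Python 3; parse_line and the in-memory sample are hypothetical names introduced for illustration.

```python
import io

# Hypothetical sample standing in for test.txt.
sample = u"and 0.004691 -0.012989\none 0.014722 -0.000810\n"

def parse_line(line):
    """Return (word, [floats]) using plain slicing instead of * unpacking."""
    parts = line.split()
    return parts[0], [float(p) for p in parts[1:]]

# Skip blank lines so parts[0] always exists.
embedding_table = dict(
    parse_line(line) for line in io.StringIO(sample) if line.strip()
)
print(embedding_table)
```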

- Rein

1

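At first I thought this answer was pointless, since it was posted nearly 10 minutes after a more elegant one. On the other hand, the other answer only works in recent versions of Python 3, whereas yours works in Python 2, and the OP may need to move to Python 2 because of its better support for scientific libraries, so +1. - John Coleman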

@JohnColeman Thanks. I updated my answer to note that this is a good option if you are stuck on Python 2. - Rein

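I wouldn't say "stuck". The reality is that there is a huge and growing code base in Python 2, which makes it the preferred version of Python, especially if you plan to use libraries such as numpy and pandas (most of these libraries now support Python 3, but that doesn't help much in practice, since code that uses them is typically still Python 2). - John Coleman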

- styvane · Accepted Answer

You need to use the * operator:

embeddingTable = {}
with open("D:\\Embedding\\test.txt") as f:
    for line in f:
       key, *values = line.split() # fix here
       embeddingTable[key] = [float(value) for value in values]
print(embeddingTable)
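To see why the two-target unpacking fails and what the starred target changes, here is a small sketch on a single line. The sample line is a hypothetical one, shortened from the question's data; it assumes Python 3.

```python
# Hypothetical sample line, shortened from the question's data.
line = "the 0.011384 0.010512 -0.008450"
parts = line.split()          # one word plus three numbers: four items

try:
    key, val = parts          # two targets cannot absorb four items
    error_message = None
except ValueError as exc:
    error_message = str(exc)

key, *values = parts          # key takes the first item, values takes the rest
vector = [float(v) for v in values]

print(error_message)
print(key, vector)
```

The starred name always receives a list, so the same line of code handles vectors of any length.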