How do I convert a dictionary into a matrix in Python?

Question

7

I have a dictionary like this:

{device1: (news1, news2, ...), device2: (news2, news4, ...), ...}

How can I convert it into a 2-D 0-1 matrix in Python? It should look like this:

         news1 news2 news3 news4
device1    1     1     0      0
device2    0     1     0      1
device3    1     0     0      1

- Yue Deng

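Do you want to print the output directly in the given format, or put it into a list (or perhaps a list of lists)? What exactly do you mean by converting it to a 2-D matrix? - yeniv

@yeniv Well, I want to convert it into a binary matrix so that I can perform some matrix operations later, such as computing cosine similarity. - Spencer

3 Answers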

3

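I'm adding this answer because I think the previous ones assume a different data structure and don't address your question directly.

Assuming I've understood your data structure correctly and that the index names in the matrix don't matter: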

from sklearn.feature_extraction import DictVectorizer

# Avoid naming the variable `dict`, which would shadow the built-in type
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}

restructured = []

for key in device_news:
    data_dict = {}
    for news in device_news[key]:
        data_dict[news] = 1
    # news3 appears for no device, so add it explicitly to get its column
    data_dict['news3'] = 0
    restructured.append(data_dict)

# restructured should now look like
'''
[{'news1': 1, 'news2': 1, 'news3': 0},
 {'news2': 1, 'news4': 1, 'news3': 0},
 {'news1': 1, 'news4': 1, 'news3': 0}]
'''

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(restructured)

print(features)

# output (columns are sorted by feature name by default)
'''
[[1. 1. 0. 0.]
 [0. 1. 0. 1.]
 [1. 0. 0. 1.]]
'''
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(dictvectorizer.get_feature_names_out().tolist())
# output
'''
['news1', 'news2', 'news3', 'news4']
'''
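
The answer above hard-codes the news3 column. As a rough sketch of the same idea without scikit-learn, the matrix could also be built in plain Python from an explicit column list. The names `device_news` and `to_binary_matrix` are hypothetical and chosen only for illustration; the input is assumed to map each device to an iterable of news names, as in the question.

```python
# Sketch: build a 0-1 matrix from {device: iterable of news names}.
# `device_news` and `to_binary_matrix` are hypothetical names for illustration.

def to_binary_matrix(device_news, columns=None):
    """Return (row_names, column_names, matrix) as plain Python lists."""
    rows = list(device_news)  # dicts keep insertion order on Python 3.7+
    if columns is None:
        # Default: every news item that appears at least once, sorted by name.
        columns = sorted({n for items in device_news.values() for n in items})
    matrix = []
    for device in rows:
        seen = set(device_news[device])
        matrix.append([1 if c in seen else 0 for c in columns])
    return rows, list(columns), matrix


device_news = {
    'device1': ('news1', 'news2'),
    'device2': ('news2', 'news4'),
    'device3': ('news1', 'news4'),
}

# Pass the columns explicitly so that news3, which no device has, still gets a column.
rows, cols, matrix = to_binary_matrix(
    device_news, columns=['news1', 'news2', 'news3', 'news4'])

for name, row in zip(rows, matrix):
    print(name, row)
```

If `columns` is omitted, only news items that occur in the data get a column, so an item such as news3 would be dropped.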

- mgrogger

2

Here is another option for converting a dictionary into a matrix:

# Load library
from sklearn.feature_extraction import DictVectorizer

# Our data: a list of dictionaries, one per row
data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]
# Create DictVectorizer object
dictvectorizer = DictVectorizer(sparse=False)

# Convert dictionary into feature matrix
features = dictvectorizer.fit_transform(data_dict)
print(features)
#output
'''
[[4. 2. 0.]
 [3. 4. 0.]
 [0. 1. 2.]
 [0. 2. 2.]]
'''
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(dictvectorizer.get_feature_names_out().tolist())
#output
'''
['Blue', 'Red', 'Yellow']
'''

- tolgabuyuktanir

- Robbie · Accepted Answer

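The code below creates the matrix (or 2-D array) with the numpy package. Note that dictionaries did not guarantee key order before Python 3.7, so an explicit, ordered list of names is used to fix the row order.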

import numpy as np

dataDict = {'device1':(1,1,0,1), 'device2':(0,1,0,1), 'device3':(1,0,0,1)}
orderedNames = ['device1','device2','device3']

dataMatrix = np.array([dataDict[i] for i in orderedNames])

print(dataMatrix)

Output:

[[1 1 0 1]
 [0 1 0 1]
 [1 0 0 1]]
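
Since the asker mentions computing cosine similarity afterwards, here is a minimal sketch of how that might look on the matrix above. The helper name `cosine_similarity_rows` is hypothetical, and treating all-zero rows as having similarity 0 is an assumption made for this example, not something stated in the thread.

```python
import numpy as np

# Same values as the matrix in the answer above: one row per device.
data_matrix = np.array([[1, 1, 0, 1],
                        [0, 1, 0, 1],
                        [1, 0, 0, 1]], dtype=float)


def cosine_similarity_rows(m):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    # Assumption: replace zero norms by 1 so all-zero rows yield similarity 0
    # instead of a division by zero.
    norms[norms == 0] = 1.0
    unit = m / norms
    return unit @ unit.T


similarity = cosine_similarity_rows(data_matrix)
print(np.round(similarity, 3))
```

Entry `similarity[i, j]` compares device i with device j, so the result is symmetric with ones on the diagonal for any non-zero row.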