Representing a graph with NumPy arrays

Question

5

I'm receiving data in the following format:

tail head
P01106  Q09472
P01106  Q13309
P62136  Q13616
P11831  P18146
P13569  P20823
P20823  P01100
...

Is there a good way to turn this data into a graph using a NumPy array? I want to use the graph to compute PageRank.

So far I have:

import numpy as np
data = np.genfromtxt('wnt_edges.txt', skip_header=1, dtype=str)

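I considered using the graph data structure from "Representing graphs (data structure) in Python", but it doesn't seem to make sense here because I'll be doing matrix multiplication.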

- Simon

1

Have you considered using networkx? You can easily convert your array into an edge list. - DYZ

2 Answers

3

I strongly recommend using networkx:

import networkx as nx
# build the graph from the (tail, head) rows of `data`
# note: nx.Graph is undirected; use nx.DiGraph to keep the edge direction
G = nx.Graph([e for e in data])
# compute the PageRank of every node
nx.pagerank(G)
# output:
# {'P01100': 0.0770275315329843, 'P01106': 0.14594493693403143,
#  'P11831': 0.1, 'P13569': 0.0770275315329843, 'P18146': 0.1,
#  'P20823': 0.1459449369340315, 'P62136': 0.1,
#  'Q09472': 0.07702753153298428, 'Q13309': 0.07702753153298428,
#  'Q13616': 0.1}

That's all you need. The pagerank documentation is here.
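The columns in the question are labelled tail and head, which suggests the edges are directed, while nx.Graph treats them as undirected. Here is a minimal sketch, assuming directed edges are what is wanted, that builds an nx.DiGraph and also exports a NumPy adjacency matrix. The edge list is typed in by hand from the sample rows, and the names `edges`, `nodes` and `A` are only for illustration.

```python
import networkx as nx

# Sample (tail, head) pairs copied from the question.
edges = [('P01106', 'Q09472'), ('P01106', 'Q13309'), ('P62136', 'Q13616'),
         ('P11831', 'P18146'), ('P13569', 'P20823'), ('P20823', 'P01100')]

G = nx.DiGraph()
G.add_edges_from(edges)  # each pair is read as tail -> head

# Fix the node order explicitly so matrix rows/columns are reproducible.
nodes = sorted(G.nodes)
# A[i, j] == 1 means there is an edge nodes[i] -> nodes[j].
A = nx.to_numpy_array(G, nodelist=nodes)
```

Calling nx.pagerank(G) on this directed graph will generally give different values than the undirected version, since rank only flows along the edge direction.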

- Mahdi

- MB-F · Accepted Answer

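To avoid reinventing the wheel, use networkx, as suggested in the comments and the other answer.

If you want to reinvent the wheel for educational purposes, you can build an adjacency matrix. PageRank can be computed from that matrix:

The PageRank values are the entries of the dominant right eigenvector of the modified adjacency matrix.

Since each row/column of the adjacency matrix represents one node, you need to enumerate the nodes so that each one is identified by a unique number starting from 0.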

import numpy as np

data = np.array([['P01106', 'Q09472'],
                 ['P01106', 'Q13309'],
                 ['P62136', 'Q13616'],
                 ['P11831', 'P18146'],
                 ['P13569', 'P20823'],
                 ['P20823', 'P01100']])


nodes = np.unique(data)  # sorted node names: maps node index --> name
noidx = {n: i for i, n in enumerate(nodes)}  # maps node name --> index

n = nodes.size  # number of nodes

numdata = np.vectorize(noidx.get)(data)  # replace node id by node index

A = np.zeros((n, n))
for tail, head in numdata:
    A[tail, head] = 1
    #A[head, tail] = 1  # add this line for undirected graph

This results in the following graph representation A:

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
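The same matrix can probably be built without the dictionary and the Python loop. A minimal sketch, assuming the edge array fits in memory: np.unique with return_inverse=True gives the node index of every entry, and fancy indexing sets all edges at once. The names `inverse` and `edges` are only for illustration.

```python
import numpy as np

data = np.array([['P01106', 'Q09472'],
                 ['P01106', 'Q13309'],
                 ['P62136', 'Q13616'],
                 ['P11831', 'P18146'],
                 ['P13569', 'P20823'],
                 ['P20823', 'P01100']])

# nodes[i] is the name of node i; inverse holds the node index of each entry.
nodes, inverse = np.unique(data, return_inverse=True)
# Depending on the NumPy version the inverse is flat or shaped like data;
# reshaping makes both cases look the same: one (tail, head) row per edge.
edges = inverse.reshape(data.shape)

n = nodes.size
A = np.zeros((n, n))
A[edges[:, 0], edges[:, 1]] = 1  # set every tail -> head entry in one step
```

For an undirected graph, the symmetric entries could be set the same way with the two index columns swapped.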

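For example, the 1 in row 5, column 0 means that there is an edge from node 5 to node 0, which corresponds to 'P20823' --> 'P01100'. Use the nodes array to look up node names by index: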

print(nodes)
['P01100' 'P01106' 'P11831' 'P13569' 'P18146' 'P20823' 'P62136' 'Q09472'
 'Q13309' 'Q13616']
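To go from the adjacency matrix to actual PageRank values, one option is power iteration on the modified matrix. The following is a sketch under stated assumptions, not a reference implementation: the damping factor 0.85 is the commonly used default, nodes without outgoing edges are treated as linking to every node, and the function name `pagerank` and its parameters are my own. The edges are given directly as node indices to keep the block self-contained.

```python
import numpy as np

def pagerank(A, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank for a dense adjacency matrix A,
    where A[i, j] = 1 means an edge from node i to node j."""
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    # Row-normalize; nodes without outgoing edges (dangling nodes) are
    # assumed to link to every node with equal probability.
    P = np.where(out_degree[:, None] > 0,
                 A / np.maximum(out_degree, 1)[:, None],
                 1.0 / n)
    # Modified adjacency matrix, column-stochastic so that r = G @ r.
    G = damping * P.T + (1.0 - damping) / n
    r = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        r_next = G @ r
        if np.abs(r_next - r).sum() < tol:  # L1 change below tolerance
            return r_next
        r = r_next
    return r

# Edges of the example graph as (tail index, head index) pairs.
edges = [(1, 7), (1, 8), (6, 9), (2, 4), (3, 5), (5, 0)]
A = np.zeros((10, 10))
for tail, head in edges:
    A[tail, head] = 1

ranks = pagerank(A)  # ranks[i] belongs to the node with index i
```

The result sums to 1, and in this directed graph node 0 ('P01100') should rank above node 5 ('P20823'), which in turn ranks above node 3 ('P13569'), because rank is passed along the chain 3 --> 5 --> 0.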

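If there are many nodes but few connections, it is better to use a sparse matrix in place of A. But try the dense matrix first, and only consider switching to a sparse matrix if you run into memory or performance problems.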
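If you do end up needing the sparse route, a minimal sketch with scipy.sparse could look like this. It assumes SciPy is available and that the edges have already been converted to node indices; the names `tails`, `heads` and `A_sparse` are only for illustration.

```python
import numpy as np
from scipy import sparse

# Edges of the example graph as node indices (tail -> head).
tails = np.array([1, 1, 6, 2, 3, 5])
heads = np.array([7, 8, 9, 4, 5, 0])
n = 10

# Only the non-zero entries are stored. COO is convenient for construction,
# CSR is the usual choice for repeated matrix-vector products.
A_sparse = sparse.coo_matrix((np.ones(tails.size), (tails, heads)),
                             shape=(n, n)).tocsr()

# One propagation step: each node collects the values of the nodes
# that point at it.
r = np.full(n, 1.0 / n)
step = A_sparse.T @ r
```

The memory used grows with the number of edges rather than with the square of the number of nodes, which is the reason to switch for large, sparsely connected graphs.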