相邻连接顶点之间的最短路径长度（不是任意两个顶点！）

Question

相邻连接顶点之间的最短路径长度（不是任意两个顶点！）

pythonalgorithmnetworkxigraphshortest-path

3

我之前问过类似的问题，但是我认为表述不够清晰。我有一个无向且无权的图，包含10000个顶点和约5000000条边，我将它们作为边列表读入Python。

在我的工作中，我试图为每一条边构建一个函数，该函数依赖于每条边上相邻顶点之间的距离。假设我们有两个相连的顶点v1、v2表示一条边，对于v1，有n1个连接的邻居，对于v2，也有n2个连接的邻居。为了构建我的函数，我需要获取n1和n2邻居之间的距离。对于图中的所有边，该函数如下：

e_1*d_1 +e_1*d_2 +...+e_1*d_n+...e_m*d_1+...e_m*d_n

其中 n 是每个边缘上的两个顶点的邻居数，d_n 是这些顶点之间的距离，m 是图中边缘的数量，e_m 是该图中的最后一个边缘。

通常，如果我们想要获取最短路径长度，我们会考虑图遍历，例如 Dijkstra 算法或 BFS，特别是在我的图是无权图的情况下。我已经使用了许多包中已经编写好的函数（如 networkx 和 igraph），但是这些函数不够高效，对于我的图来说耗费时间太多。例如，函数 shortest_paths_dijkstra() 花费大约 6.9 小时才能得到距离，因为我需要多次调用它。另外，函数all_pairs_shortest_path_length（通过将路径长度固定为 3）需要大约 13 分钟，并且还需要另外 16 分钟来为图中每对邻居调用距离！

正如问题所述，我们需要获取 v1、v2 的邻居之间的距离，因此最大距离为 3，因为 v1、v2 是相连的。我感觉可以通过使用可能的路径距离（在我的情况下）为 0、1、2、3 的事实来减少时间复杂度，因此我不需要针对源和目标之间的每个路径遍历整个图！我只是在寻找一种聪明的方法来获取邻居之间的距离（而不是任意两个顶点之间的距离）！

我编写了这段代码，但它需要很长时间，大约 54 分钟，因此也不够高效！

neighbor1 = {}
neighbor2 = {}
distance = {}
for i in list(edges.values()):
  list_of_distances = []
  neighbor1 = tuple(graph.vs[graph.neighbors(i[0], mode="all")]["name"])
  neighbor2 = tuple(graph.vs[graph.neighbors(i[1], mode="all")]["name"])
  for n2 in neighbor2:
    for n1 in neighbor1:
       if n1 == n2:
            list_of_distances.append(0)
       elif (n1 != n2) and not graph.are_connected(n1,n2):
            if ( graph.are_connected(i[0],n2) ) or ( graph.are_connected(n1,i[1])  ): 
               list_of_distances.append(2)
            elif ( not graph.are_connected(i[0],n2)  ) or ( not graph.are_connected(n1,i[1]) ):
               list_of_distances.append(3)
       else:
            list_of_distances.append(1)
  distance[tuple(i)] = list_of_distances

我想知道是否有另一种方式来获取这些距离，而不需要占用大量的内存和时间。是否可以修改图遍历方法之一，如Bfs或Dijkstra，以便不必在每次迭代中搜索整个图形，仅需执行本地操作（如果可能的话）。感谢任何帮助。

- Noah16

如果Vx和Vy是（直接）相连的，它们之间的距离是多少？如何获得零距离？ - wwii

@wwii Vx==Vy。 - Joel

@NicoSchertler 距离代表了目标函数，据我所知，我应该包括它们。实际上，你说的消耗时间问题是由你描述的原因引起的。我无法处理第二个问题，但我正在考虑替换 are_connected 函数（尽管它是最好的），但我还没有找到替代方案。 - Noah16

尝试通过在 neighbor1 循环中仅使用 pass，不做任何操作来测量时间。 - juvian

你知道服务器中的处理器模块是什么吗？我在igraph中生成图像时遇到了与你类似的问题，所以无法检查我的代码是否比你的更快。 - vurmux

显示剩余12条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- vurmux · Answer 1

你有一个极其计算密集的任务，所以你的脚本运行几个小时也是正常的。你可以尝试使用CUDA或类似的东西来并行化处理，或者你可以尝试构建一个大的缓存（GB级别）。但如果你不能或不想这样做，我建议你不要使用networkx/igraph函数，因为它们对你来说非常慢。你可以通过使用Python集合来解决问题，而不需要运行1000000次DFS。这里是一种可能的解决方案（我认为它会比你的快，也许不是很快）。

import networkx as nx

# Create a graph like yours
G = nx.fast_gnp_random_graph(1000, 0.05)

# Create neighbours dict
G_adj = dict(G.adjacency())
nbrs_dict = {node: {n for n in G_adj[node]} for node in G_adj}

# Result dict
distances = {}

# For every edge:
for e in G.edges:

    # Start value
    dist_value = 0

    # Get N1 and N2 neighbours
    n1_nbrs = nbrs_dict[e[0]]
    n2_nbrs = nbrs_dict[e[1]]

    # Triangles - nodes that connected to both N1 and N2
    # Every triangle value = 0
    tri = n1_nbrs & n2_nbrs
    for t in tri:

        # Get neighbours to find nodes with distance length = 2
        t_nbrs = nbrs_dict[t]

        t_in_n1 = n1_nbrs & t_nbrs
        t_in_n2 = n2_nbrs & t_nbrs

        t_not_in_n1 = n1_nbrs - t_nbrs
        t_not_in_n2 = n2_nbrs - t_nbrs

        dist_value += len(t_in_n1) + len(t_in_n2)
        dist_value += (2 * len(t_not_in_n1)) + (2 * len(t_not_in_n2))

    # Exclude all triangle nodes because we processed them all
    n1nt_nbrs = n1_nbrs - tri
    n2nt_nbrs = n2_nbrs - tri

    # Select squares - nodes with length = 1
    direct = set([])
    for n1 in n1nt_nbrs:
        nbrs = nbrs_dict[n1]
        d = nbrs & n2nt_nbrs
        for node in d:
            direct.add((n1, node))
    dist_value += len(direct)

    # Exclude squares so we have only nodes with length = 3
    n1final = n1nt_nbrs - set(e[0] for e in direct)
    n2final = n2nt_nbrs - set(e[1] for e in direct)
    dist_value += 3 * len(n1final) * len(n2final)

    # Distance for an edge
    distances[e] = dist_value

无论如何，你的问题具有O(n^3)的复杂度，因此我强烈建议你尝试分割你的图形。也许你有桥梁或者只是有几个连通组件。如果你将它们分别处理，你将会极大地提高速度。