如何寻找原始数据的最短路径

3

我是一个有用的助手,可以翻译文本。

我有一个数据文件,其中包含以分类数据作为列。例如:

node_id,second_major,gender,major_index,year,dorm,high_school,student_fac
0,0,2,257,2007,111,2849,1
1,0,2,271,2005,0,51195,2
2,0,2,269,2007,0,21462,1
3,269,1,245,2008,111,2597,1
..........................

这些数据是按列组织的。我将其转换为边缘列表和节点列表。边缘列表如下:

0   4191
0   949
1   3002
1   4028
1   957
2   2494
2   959
2   3011
3   4243
4   965
5   1478
........
........

如何找到节点之间的最短路径,且边缘没有权重。我该如何用Python编写代码实现这一功能?


1
已经将 matplotlib 标签移除,因为我不认为它与绘图有关。如果您不同意,请随时恢复。 - tacaswell
1个回答

2
这是一个经典的广度优先搜索问题,其中你有一个无向、无权图,想要找到两个顶点之间的最短路径。
一些关于广度优先搜索的有用链接: 你需要注意一些边缘情况:
  • 源顶点和目标顶点之间没有路径
  • 源顶点和目标顶点相同
我假设你的边缘列表是一个字典列表或列表列表,例如:
[[4191, 949], [3002, 4028, 957], [2494, 959, 3011], [4243, 965], [1478], ...]

或者

{ 0: [4191, 949],
  1: [3002, 4028, 957],
  2: [2494, 959, 3011],
  3: [4243, 965],
  4: [1478], ...}

我写了一些代码来展示广度优先搜索的工作原理:
import sys
import sys
import Queue

def get_shortest_path(par, src, dest):
    '''
    Returns the shortest path as a list of integers
    par - parent information
    src - source vertex
    dest - destination vertex
    '''
    if dest == src:
        return [src]
    else:
        ret = get_shortest_path(par, src, par[dest])
        ret.append(dest)
        return ret

def bfs(edgeList, src, dest):
    '''
    Breadth first search routine. Returns (distance, shortestPath) pair from src to dest. Returns (-1, []) if there is no path from src to dest
    edgeList - adjacency list of graph. Either list of lists or dict of lists
    src - source vertex
    dest - destination vertex
    '''
    vis = set() # stores the vertices that have been visited
    par = {} # stores parent information. vertex -> parent vertex in BFS tree
    distDict = {} # stores distance of visited vertices from the source. This is the number of edges between the source vertex and the given vertex
    q = Queue.Queue()
    q.put((src, 0)) # enqueue (source, distance) pair
    par[src] = -1 # source has no parent
    vis.add(src) # minor technicality, will explain later
    while not q.empty():
        (v,dist) = q.get() # grab vertex in queue
        distDict[v] = dist # update the distance
        if v == dest:
            break # reached destination, done
        nextDist = dist+1
        for nextV in edgeList[v]:
            # visit vertices adjacent to the current vertex
            if nextV not in vis:
                # not yet visited
                par[nextV] = v # update parent of nextV to v
                q.put((nextV, nextDist)) # add into queeu
                vis.add(nextV) # mark as visited
    # obtained shortest path now
    if dest in distDict:
        return (distDict[dest], get_shortest_path(par, src, dest))
    else:
        return (-1, []) # no shortest path

# example run, feel free to remove this
if __name__ == '__main__':
    edgeList = {
        0: [6,],
        1: [2, 7],
        2: [1, 3, 6],
        3: [2, 4, 5],
        4: [3, 8],
        5: [3, 7],
        6: [0, 2],
        7: [1, 5],
        8: [4],
    }
    while True:
        src = int(sys.stdin.readline())
        dest = int(sys.stdin.readline())
        (dist, shortest_path) = bfs(edgeList, src, dest)
        print 'dist =', dist
        print 'shortest_path =', shortest_path

1
没问题,对我来说使用Python编写BFS也是很好的练习 =) - yanhan

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接