Python：对依赖列表进行排序

Question

Python：对依赖列表进行排序

16

我想确定能否使用内置的sorted()函数解决我的问题，如果不能，我需要自己编写代码来排序。我的数据集如下所示：

x = [ ('business', Set('fleet','address')), ('device', Set('business','model','status','pack')), ('txn', Set('device','business','operator')), .... ]

排序规则基本上是对于所有的N和Y值，其中Y > N，x[N][0]不在x[Y][1]中。

虽然我正在使用Python 2.6，其中仍然提供了cmp参数，但我正在尝试使其在Python 3中也可用。

那么，能否使用lambda函数和key参数完成此操作？

-== UPDATE ==-

谢谢Eli和Winston！我真的没有想到可以使用key参数解决问题，或者如果它可以，我怀疑它会是一个勉强解决方案，这并不理想。

由于我的问题是关于数据库表依赖关系的，因此我必须稍微修改Eli的代码以从其依赖列表中删除一个项目（在设计良好的数据库中，这不会发生，但是谁生活在那个神奇完美的世界中呢？）

我的解决方案：

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependancies))`` pairs
    :returns: list of names, with dependancies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]        
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted) # <-- pop self from dep, req Py2.6
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name) # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic dependancy detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted

- DisabledLeopard

http://en.wikipedia.org/wiki/Topological_sorting - interjay

@JonClements 我会认为他的意思是 sets.Set，不过这在他所说的使用 Python 2.6 甚至已经被弃用了。然而，如果他的意思是这个的话，那么他需要向构造函数提供单个的可迭代对象，而不是多个参数。 - Duncan

是的，sets.Set()。我支持Python 2.3 - 3.1环境，"Set('item1'，'item2')"是Python解释器为包含字符串和集合的元组列表打印的输出（复制的输出，而不是创建此输出的代码）。使用集合可以忽略重复项，这有助于处理输入数据集非常混乱的情况。 - DisabledLeopard

6个回答

8

我会进行类似如下的拓扑排序：

def topological_sort(items):
    provided = set()
    while items:
         remaining_items = []
         emitted = False

         for item, dependencies in items:
             if dependencies.issubset(provided):
                   yield item
                   provided.add(item)
                   emitted = True
             else:
                   remaining_items.append( (item, dependencies) )

         if not emitted:
             raise TopologicalSortFailure()

         items = remaining_items

我认为这比Eli的版本更加直接，关于效率我不太确定。

- Winston Ewert

这比我的版本*简单多了。我认为你的主要效率问题是issubset()调用，但这只会影响大型数据集 - 我的版本受到初始设置成本的限制，对于小型数据集来说速度较慢，但它可以修改集合以避免在有很多依赖关系时使用issubset。尽管如此，你的基本结构仍然更好，我希望你不介意我重新设计了我的实现，借鉴了你的一些解决方案 :) - Eli Collins

6

忽略掉格式不好的内容和奇怪的"Set"类型...(我已经把它们保留为元组并正确地分隔了列表项...)...并使用“networkx”库使事情变得更加方便...

x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        g.add_edge(key, val)

print nx.topological_sort(g)

- Jon Clements

这是唯一对我有效的解决方案... 我不喜欢依赖其他库，但这几行代码是值得的。 - devarni

2

关于这个解决方案有一个警告；它只适用于您的依赖项形成完全连接的图。如果存在没有任何依赖关系（因此没有与其他节点相连的边）的节点，则它们将不会包含在“topological_sort()”的输出中。 - larsks

2

这是温斯顿的建议，带有文档字符串和微小的调整，将 dependencies.issubset(provided) 反转为 provided.issuperset(dependencies)。这个改变允许你将每个输入对中的 dependencies 作为任意可迭代对象而不一定是一个 set。

我的用例涉及一个 dict，其键是项目字符串，每个键的值都是该键所依赖的项目名称的 list。一旦我确定了这个 dict 是非空的，我就可以将它的 iteritems() 传递给修改后的算法。

再次感谢温斯顿。

def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of the same type as 'items'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
         remaining_items = []
         emitted = False

         for item, dependencies in items:
             if provided.issuperset(dependencies):
                   yield item
                   provided.add(item)
                   emitted = True
             else:
                   remaining_items.append( (item, dependencies) )

         if not emitted:
             raise TopologicalSortFailure()

         items = remaining_items

- Nat Goodspeed

0

基于Jon的答案，提出另一个示例，使用有向图，但它在输入上使用dict结构，看起来更加直观，而且并非所有项都需要依赖关系。

import networkx as nx

dig = nx.DiGraph()

items= [
    {'name': 'first', 'depends': ['second', 'forth']},
    {'name': 'second', 'depends': []},
    {'name': 'third', 'depends': ['first']},
    {'name': 'forth', 'depends': []},
    {'name': 'fifth', 'depends': []}
]

for item in items:
    if not item['depends']:
        dig.add_node(item['name'])
    for dep in item['depends']:
        dig.add_edge(dep, item['name'])

print(list(nx.topological_sort(dig)))
# output: ['second', 'forth', 'fifth', 'first', 'third']

- Philipp

0

使用pypi包的一个示例，toposort：

>>> from toposort import toposort
>>> 
>>> x = {
...  'business': {'fleet','address'},
...  'device': {'business','model','status','pack'},
...  'txn': {'device','business','operator'},
...  }

>>> list(toposort(x))
[{'pack', 'model', 'operator', 'status', 'address', 'fleet'}, {'business'}, {'device'}, {'txn'}]

- monkut

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Eli Collins · Accepted Answer

你需要的是拓扑排序。虽然可以使用内置的sort()实现，但这样做非常笨拙，最好直接在Python中实现拓扑排序。

为什么会很笨拙呢？如果你研究一下维基页面上的两个算法，它们都依赖于一个“标记节点”的运行集合，这个概念很难转化为sort()可用的形式，因为key=xxx（甚至cmp=xxx）最适合使用无状态比较函数，特别是因为timsort不能保证元素将被检查的顺序。我（相当）确定任何使用sort()的解决方案都将在每次调用关键字/ cmp函数时冗余地计算一些信息，以解决无状态问题。

以下是我一直在使用的算法（用于对一些JavaScript库进行排序依赖）：

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependancies])`` pairs
    :returns: list of names, with dependancies listed first
    """
    pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place       
    emitted = []        
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted) # remove deps we emitted last pass
            if deps: # still has deps? recheck during next pass
                next_pending.append(entry) 
            else: # no more deps? time to emit
                yield name 
                emitted.append(name) # <-- not required, but helps preserve original ordering
                next_emitted.append(name) # remember what we emitted for difference_update() in next pass
        if not next_emitted: # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependancy detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

顺便提一下：可以将cmp()函数塞入key=xxx中，具体细节请参考这个Python错误跟踪器message。