Extracting trees from a directed acyclic graph

6

I have a DAG expressed as nodes and their successor edges. Reifying it as a nested data structure is possible with a simple recursive function.

#tree1.pl
#!/usr/bin/env perl
use 5.028; use strictures; use Moops; use Kavorka qw(fun); use List::AllUtils qw(first);
class Node :ro {
    has label => isa => Str;
    has children => isa => ArrayRef[Str];
}
fun N($label, $children) {
    return Node->new(label => $label, children => $children);
}

# list is really flat, but
# indentation outlines desired tree structure
our @dag = (
    N(N0 => ['N1']),
        N(N1 => ['N2']),
            N(N2 => ['N3']),
                N(N3 => ['N4', 'N5']),
                    N(N4 => []),
                    N(N5 => []),
);

fun tree(Node $n) {
    return bless [
        map {
            my $c = $_;
            tree(first {
                $_->label eq $c
            } @dag)
        } $n->children->@*
    ] => $n->label;
}

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N2
#             bless([ #N3
#                 bless([] => 'N4'),
#                 bless([] => 'N5'),
#             ] => 'N3')
#         ] => 'N2')
#     ] => 'N1')
# ] => 'N0')

That was just the trivial case.
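
For readers less familiar with Perl, the recursion in tree1.pl can be sketched in Python roughly as follows. This is only an illustrative transliteration: the names `Node`, `DAG`, `find` and `tree` are chosen here for the sketch, and the output is a nested tuple rather than a blessed array reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


# Flat node list mirroring @dag from tree1.pl.
DAG = [
    Node("N0", ("N1",)),
    Node("N1", ("N2",)),
    Node("N2", ("N3",)),
    Node("N3", ("N4", "N5")),
    Node("N4"),
    Node("N5"),
]


def find(label):
    """Return the first node carrying this label (like `first { ... } @dag`)."""
    return next(n for n in DAG if n.label == label)


def tree(node):
    """Recursively expand a node into a (label, [subtrees]) pair."""
    return (node.label, [tree(find(c)) for c in node.children])


print(tree(DAG[0]))
```

Because `find` always returns the first match, this sketch, like tree1.pl, only works while every label is unique.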


In my application, I run into the complication that the DAG contains multiple nodes with the same label.

our @dag = (
    N(N0 => ['N1']),
    N(N1 => ['N2']),
    ︙
    N(N1 => ['N6', 'N5']),
    ︙

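Note that this does not mean that there is a multi-edge in the proper sense.
Treating them as a single node is wrong, because N1 then appears to have three equal children.
The N1 nodes must not be collapsed into one node for the purpose of graph traversal, only for labelling the output tree; in other words, they must be nodes with different identities. Let's visualise this with colours.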
our @dag = (
    N(N0 => ['N1']),
    N([N1 => 'red'] => ['N2']),
    ︙
    N([N1 => 'blue'] => ['N6', 'N5']),
    ︙
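
As a language-neutral way to picture "same label, different identity", one could group the flat node list by label, so that each label maps to a list of distinct variants. The sketch below is an assumption-laden illustration, not the code from the question: `NODES` and `variants_by_label` are hypothetical names, and only the three nodes visible in the excerpt above are included.

```python
from collections import defaultdict

# (label, colour, children); colour is None when a label occurs only once.
NODES = [
    ("N0", None, ["N1"]),
    ("N1", "red", ["N2"]),
    ("N1", "blue", ["N6", "N5"]),
]


def variants_by_label(nodes):
    """Map each label to the list of (colour, children) variants sharing it."""
    by_label = defaultdict(list)
    for label, colour, children in nodes:
        by_label[label].append((colour, children))
    return dict(by_label)


variants = variants_by_label(NODES)
print(variants["N1"])
```

Here N0 still has a single child label, N1, but looking that label up yields two separate nodes, each with its own children.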

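The goal is to reify this DAG as two trees, following each of the dotted successor edges in a separate pass. I achieve this by remembering, on the node, the index number of the last colour used when I pass through it, and picking the next colour in order during the next tree construction.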

#tree2.pl
#!/usr/bin/env perl
use 5.028; use strictures; use Moops; use Kavorka qw(fun); use List::AllUtils qw(first);
class Node :ro {
    has label => isa => Str;
    has col => isa => Maybe[Str];
    has children => isa => ArrayRef[Str];
    has col_seen => is => 'rw', isa => Int, default => 0;
}
fun N($c_l, $children) {
    return ref $c_l
        ? Node->new(label => $c_l->[0], col => $c_l->[1], children => $children)
        : Node->new(label => $c_l, children => $children);
}

# indentation outlines desired tree structure
our @dag = (
    ### start 1st tree
    N(N0 => ['N1']),
        N([N1 => 'red'] => ['N2']),
            N(N2 => ['N3']),
                N(N3 => ['N4', 'N5']),
                    N(N4 => []),
                    N(N5 => []),
    ### end 1st tree

    ### start 2nd tree
    # N0
        N([N1 => 'blue'] => ['N6', 'N5']),
            N(N6 => ['N7']),
                N(N7 => ['N4']),
                    # N4
            # N5
    ### end 2nd tree
);

fun tree(Node $n) {
    return bless [
        map {
            my $c = $_;
            my @col = map { $_->col } grep { $_->label eq $c } @dag;
            if (@col > 1) {
                $n->col_seen($n->col_seen + 1);
                die 'exhausted' if $n->col_seen > @col;
                tree(first {
                    $_->label eq $c && $_->col eq $col[$n->col_seen - 1]
                } @dag);
            } else {
                tree(first { $_->label eq $c } @dag);
            }
        } $n->children->@*
    ] => $n->label;
}

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N2
#             bless([ #N3
#                 bless([] => 'N4'),
#                 bless([] => 'N5')
#             ] => 'N3')
#         ] => 'N2')
#     ] => 'N1')
# ] => 'N0')

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N6
#             bless([ #N7
#                 bless([] => 'N4')
#             ] => 'N7')
#         ] => 'N6'),
#         bless([] => 'N5')
#     ] => 'N1')
# ] => 'N0')

tree($dag[0]);
# exhausted

That code works; I get two trees.


However, my code has a problem when several nodes have coloured successors. The code is the same as above; only the input differs:

#tree3.pl

︙

our @dag = (
    N(N0 => ['N1']),
        N([N1 => 'red'] => ['N2']),
            N(N2 => ['N3']),
                N(N3 => ['N4', 'N5']),
                    N(N4 => []),
                    N(N5 => []),
    # N0
        N([N1 => 'blue'] => ['N6', 'N5']),
            N(N6 => ['N7']),
                N(N7 => ['N8', 'N4']),
                    N([N8 => 'purple'] => ['N5']),
                        # N5
                    N([N8 => 'orange'] => []),
                    N([N8 => 'cyan'] => ['N5', 'N5']),
                        # N5
                        # N5
                    # N4
            # N5
);

︙

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N2
#             bless([ #N3
#                 bless([] => 'N4'),
#                 bless([] => 'N5')
#             ] => 'N3')
#         ] => 'N2')
#     ] => 'N1')
# ] => 'N0')
tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N6
#             bless([ #N7
#                 bless([ #N8
#                     bless([] => 'N5')
#                 ] => 'N8'),
#                 bless([] => 'N4')
#             ] => 'N7')
#         ] => 'N6'),
#         bless([] => 'N5')
#     ] => 'N1')
# ] => 'N0')
tree($dag[0]);
# exhausted

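The problem is that the search is exhausted after only two trees, although I should get four:
  • the path through red
  • the path through blue, then purple
  • the path through blue, then orange
  • the path through blue, then cyan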
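
The expected count of four can be checked independently of any tree construction: the number of trees under a label is the sum over its same-label variants, and for each variant the product over its children. The following is a minimal sketch under that reading of the problem; `NODES`, `by_label` and `count_trees` are names made up for this illustration, with the data copied from tree3.pl.

```python
from collections import defaultdict
from math import prod

# (label, colour, children) triples copied from the @dag in tree3.pl.
NODES = [
    ("N0", None, ["N1"]),
    ("N1", "red", ["N2"]),
    ("N2", None, ["N3"]),
    ("N3", None, ["N4", "N5"]),
    ("N4", None, []),
    ("N5", None, []),
    ("N1", "blue", ["N6", "N5"]),
    ("N6", None, ["N7"]),
    ("N7", None, ["N8", "N4"]),
    ("N8", "purple", ["N5"]),
    ("N8", "orange", []),
    ("N8", "cyan", ["N5", "N5"]),
]

by_label = defaultdict(list)
for label, _colour, children in NODES:
    by_label[label].append(children)


def count_trees(label):
    """Sum over the variants of a label; multiply over each variant's children."""
    return sum(
        prod(count_trees(child) for child in children)
        for children in by_label[label]
    )


print(count_trees("N0"))  # 4
```

N8 contributes three variants, which propagate unchanged through N7, N6 and the blue N1; adding the single red N1 gives four.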

You may answer in any programming language.

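I like this question. I have a lot of experience manipulating graphs and trees, but very little experience with Perl. My advice: 1) The DAG should be uniform: choose either N([id => label] => child) or N(id => child), but don't support both. Obviously the second form doesn't support labels, but adding labels to the graph is only one way to solve the problem. 2) Write tree as a pure function that doesn't mutate $dag; it will be easier to work with, I promise, ... - Mulan
3) When constructing nodes, you could use a serial ID that is guaranteed to be unique. Alternatively, nodes could hold an array of edges instead of an array of children. These are just some ideas to think about. If you don't get help with this code, ping me and I'll try to provide an answer in Scheme or JavaScript. - Mulan
Why do you need to assign identities to objects that share the same label? Wouldn't referencing a label object work as well? (Nodes with the same label would point to the same label object.) - clamp
Why do you follow only the red path and the blue+purple path, but not the blue+red path? What makes blue/red so special? - Corion
user633183, please go ahead, JS would be nice. - daxim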
1 Answer

2

Is the following what you were aiming to accomplish? (Python 3)

from collections import defaultdict
from itertools import product

class bless:
    def __init__(self, label, children):
        self.label = label
        self.children = children

    def __repr__(self):
        return self.__str__()

    # Just pretty-print stuff
    def __str__(self):
        formatter = "\n{}\n" if self.children else "{}"
        formatted_children = formatter.format(",\n".join(map(str, self.children)))
        return "bless([{}] => '{}')".format(formatted_children, self.label)

class Node:
    def __init__(self, label, children):
        self.label = label
        self.children = children

class DAG:
    def __init__(self, nodes):
        self.nodes = nodes

        # Add the root nodes to a singular, generated root node (for simplicity)
        # This is not necessary to implement the color-separation logic,
        # it simply lessens the number of edge cases I must handle to demonstrate
        # the logic. Your existing code will work fine without this "hack"
        non_root = {child for node in self.nodes for child in node.children}
        root_nodes = [node.label for node in self.nodes if node.label not in non_root]
        self.root = Node("", root_nodes)

        # Make a list of all the trees
        self.tree_list = self.make_trees(self.root)

    def tree(self):
        if self.tree_list:
            return self.tree_list.pop(0)
        return list()

    # This is the meat of the program, and is really the logic you are after
    # It's a recursive function that parses the tree top-down from our "made-up"
    # root, and makes <bless>s from the nodes. It returns a list of all separately
    # colored trees, and if prior (recursive) calls already made multiple trees, it
    # will take the cartesian product of each tree per label
    def make_trees(self, parent):
        # A defaultdict is just a hashtable whose missing values
        # default to some data type (list here)
        trees = defaultdict(list)
        # This is some nasty, inefficient means of fetching the children
        # your code already does this more efficiently in perl, and since it
        # contributes nothing to the answer, I'm not wasting time refactoring it
        for node in (node for node in self.nodes if node.label in parent.children):
            # I append the tree(s) found in the child to the list of <label>s trees
            trees[node.label] += self.make_trees(node)
        # This line serves to re-order the trees since the dictionary doesn't preserve
        # ordering, and also restores any duplicates that would be lost
        values = [trees[label] for label in parent.children]
        # I take the cartesian product of all the lists of trees each label
        # is associated with in the dictionary. So if I have
        #    [N1-subtree] [red-N2-subtree, blue-N2-subtree] [N3-subtree]
        # as children of N0, then I'll return:
        # [bless(N0, [N1-st, red-N2-st, N3-st]), bless(N0, [N1-st, blue-N2-st, N3-st])]
        return [bless(parent.label, prod) for prod in product(*values)]

if __name__ == "__main__":
    N0  = Node('N0', ['N1'])
    N1a = Node('N1', ['N2'])
    N2  = Node('N2', ['N3'])
    N3  = Node('N3', ['N4', 'N5'])
    N4  = Node('N4', [])
    N5  = Node('N5', [])

    N1b = Node('N1', ['N6', 'N5'])
    N6  = Node('N6', ['N7'])
    N7  = Node('N7', ['N8', 'N4'])
    N8a = Node('N8', ['N5'])
    N8b = Node('N8', [])
    N8c = Node('N8', ['N5', 'N5'])

    dag = DAG([N0, N1a, N2, N3, N4, N5, N1b, N6, N7, N8a, N8b, N8c])

    print(dag.tree())
    print(dag.tree())
    print(dag.tree())
    print(dag.tree())
    print(dag.tree())
    print(dag.tree())

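I've explained the logic in fair detail in the comments, but to clarify: I generate all possible trees at once using a recursive DFS starting from the root. To ensure there is only one root, I create a "made-up" root node that contains all the other nodes that have no parent, and start the search from that node. This isn't necessary for the algorithm to work; I just wanted to simplify logic that isn't directly related to your question.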

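In this DFS, I create a hashtable/dictionary of lists keyed by label, and store in those lists all of the distinct subtrees that can be made from each child. For most nodes this list will have length 1, since most nodes generate a single tree unless their label, or the label of one of their (sub)children, is duplicated. Regardless, I take the Cartesian product of all of these lists and form a new bless object from each product. I return this list, and the process repeats up the call stack until we finally have the full list of trees.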
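
To make the Cartesian-product step concrete, here is a small standalone sketch using `itertools.product`. The subtree strings are hypothetical stand-ins for the bless objects: they represent the children of N7 in the third example, where N8 has three variants and N4 has one.

```python
from itertools import product

# Hypothetical stand-ins for the subtree lists of N7's children.
n8_subtrees = ["N8[N5]", "N8[]", "N8[N5,N5]"]
n4_subtrees = ["N4[]"]

# One N7 tree per combination of one N8 subtree and one N4 subtree.
n7_trees = [
    "N7[" + ",".join(combo) + "]"
    for combo in product(n8_subtrees, n4_subtrees)
]
for t in n7_trees:
    print(t)

# A leaf has no child lists; product() of nothing yields one empty tuple,
# which is why a leaf still produces exactly one tree.
leaf_combos = list(product())
print(leaf_combos)
```

The empty-product case matters: if it yielded nothing, every leaf would produce zero trees and the whole result would be empty.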

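All of the printing logic is unnecessary (obviously), but I wanted to make it easier for you to verify that this is the behaviour you want. I couldn't (easily) get it to indent the nested blesses, but that should be easy to adjust by hand. The only really interesting part is the make_trees() function; the rest either sets things up for verification or keeps the code as close to your Perl code as possible.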

Formatted output:

bless([
    bless([
        bless([
            bless([
                bless([
                    bless([] => 'N4'),
                    bless([] => 'N5')
                ] => 'N3')
            ] => 'N2')
        ] => 'N1')
    ] => 'N0')
] => '')
bless([
    bless([
        bless([
            bless([
                bless([
                    bless([
                        bless([] => 'N5')
                    ] => 'N8'),
                    bless([] => 'N4')
                ] => 'N7')
            ] => 'N6'),
            bless([] => 'N5')
        ] => 'N1')
    ] => 'N0')
] => '')
bless([
    bless([
        bless([
            bless([
                bless([
                    bless([] => 'N8'),
                    bless([] => 'N4')
                ] => 'N7')
            ] => 'N6'),
            bless([] => 'N5')
        ] => 'N1')
    ] => 'N0')
] => '')
bless([
    bless([
        bless([
            bless([
                bless([
                    bless([
                        bless([] => 'N5'),
                        bless([] => 'N5')
                    ] => 'N8'),
                    bless([] => 'N4')
                ] => 'N7')
            ] => 'N6'),
            bless([] => 'N5')
        ] => 'N1')
    ] => 'N0')
] => '')
[]
[]

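I would also like to know how you came up with the Cartesian product: was it inspiration from analysing the problem, or is this a well-known approach in the field of graph traversal/recursion? - daxim
@daxim I believe I have fixed both bugs; please check the new output. - Dillon Davis
To answer your question, I was actually inspired by analysing the problem. I initially tried to design a recursive solution using Python generators, but because I was traversing the graph with a recursive DFS, I couldn't make that work. So I knew I had to return a list of subtrees, and after looking at an example more closely I recognised it as a Cartesian product. - Dillon Davis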

Content provided by Stack Overflow.