如何在Haskell中表示两棵树之间的映射？

Question

如何在Haskell中表示两棵树之间的映射？

algorithmhaskelltreemappingabstract-syntax-tree

18

我正在尝试在Haskell中实现一种树处理算法，但由于这是我的第一个Haskell程序，因此在设计数据结构时遇到了困难。是否有任何FP大师可以给予帮助？

我将首先描述算法的重要特征，勾勒出我会如何使用命令式语言来解决问题，并最后介绍我在Haskell中迈出的蹒跚步伐。

问题

我不会详细描述整个算法，但其要点如下：

- 该算法操作两棵玫瑰树X和Y。 - 算法的第一阶段基于节点的标签、属性及其后代的标签和属性计算每个节点的派生属性。 - 这些派生属性用于计算两棵树之间的部分映射，使X中的节点可以与Y中的节点相对应，反之亦然。由于映射是部分的，因此X或Y中的任何节点都可能被映射（即具有另一棵树中的伙伴），也可能未被映射。 - 算法的最后阶段通过检查已映射节点的父/子/同级节点的序列来优化这些映射。

因此，数据结构必须具有以下特征：

- 给定节点的引用，提供访问该节点的父节点、同级节点和子节点的方法。 - 给定输入树中的节点，允许对该节点进行注释（派生属性和其他树中节点的可选引用）。

命令式解决方案草图

如果我要使用一种命令式语言实现这个算法，解决方案可能如下所示。

假设起点是以下输入树的定义：

struct node {
    // Identifier for this node, unique within the containing tree
    size_t id;

    // Label of this node
    enum label label;

    // Attributes of this node
    // An attribute can be assumed to be a key-value pair
    // Details of the attributes themselves aren't material to this
    // discussion, so the "attribute" type is left opaque
    struct attribute **attributes;
    size_t n_attributes;

    // Pointer to parent of this node
    // NULL iff this node is root
    struct node *parent;

    // Pointer to first child of this node
    // NULL iff this node is leaf
    struct node *child;

    // Double-linked list of siblings of this node
    struct node *prev;
    struct node *next;
};

每个节点中嵌入的指针明显支持算法所需的上/下/左/右遍历。

注释可以通过定义以下结构来实现：

struct algo_node {
    // Pointer to input node which has been wrapped
    struct node *node;

    // Derived properties computed by first phase of the algorithm
    // Details of the properties themselves aren't material to this
    // discussion, so the "derived" type is left opaque
    struct derived props;

    // Pointer to corresponding node in the other tree
    // NULL iff this node is unmatched
    struct node *match;
};

算法的第一阶段为每个输入树中的每个节点构建一个"algo_node"。从"algo_node"到"node"的映射是微不足道的:只需跟随嵌入的"*node"指针即可。从另一个方向进行的映射可以通过将"algo_node"存储在数组中，由输入节点的"id"索引来支持。这当然只是一种可能的实现方式。有许多变化都是可能的，包括：1、抽象出孩子链表背后的"list"或"queue"接口，而不是存储三个原始指针；2、不是通过索引将输入树与算法树关联起来，而是直接在"struct algo_node"中编码父/子/兄弟关系。

转向Haskell

让我们从以下定义输入树开始:

data Tree = Leaf Label Attributes
          | Node Label Attributes [Tree]

可以通过以下方式实现每个节点的id增强：

data AnnotatedTree = Tree Int

addIndex :: Int -> AnnotatedTree -> (AnnotatedTree, Int)

indexedTree = addIndex 0 tree

同样地，我们可以编写一个计算派生属性的函数：

data AnnotatedTree = Tree DerivedProperties

computeDerived :: DerivedProperties -> AnnotatedTree -> (AnnotatedTree, DerivedProperties)

derivedTree = computeDerived DefaultDerived tree

上述代码片段可以进行少量修改，使得AnnotatedTree包含索引和派生属性。

然而，我不知道如何表示两个树之间的映射关系。基于一些阅读，我有一些半成品的想法...

- 定义AnnotatedTree包含从另一棵树的根节点到映射节点的路径 - 编码为每个连续子列表的索引列表[Integer] - 使用zipper(我目前对其了解比较松散)通过路径访问映射节点(或其父节点/子节点/兄弟节点) - 或者使用lens(...我甚至更不清楚!)来做同样的事情 - 将AnnotatedTree定义为直接包含映射节点的引用，作为Maybe Tree - 但是，我没有看到一种方法可以遍历映射节点的父节点/兄弟节点

...但我确实需要一些关于哪些(如果有的话)值得追求的指导。

非常感谢您的帮助！

- Gareth Stockwell

如果X中的节点x有对应的节点Y中的节点y，那么与x对应的Y中的所有后代节点是否也是y的后代节点？ - danidiaz

@danidiaz 不一定。 - Gareth Stockwell

1

我认为拉链确实是你在这里想要的。 - leftaroundabout

将我的数据转换为Data.Tree，以便我可以使用Data.Tree.Zipper，这样做值得吗？还是我应该自己实现Zipper？在任何一条路上，我都应该注意到有什么陷阱吗？ - Gareth Stockwell

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- pat · Accepted Answer

你可以使用Int id对树节点进行标记，并使用拉链（使用Data.Tree和Data.Tree.Zipper是个好主意，因为无需重复发明轮子）在它们之间移动。然后，您可以使用Data.IntMap将辅助属性附加到节点，以将节点id映射到任何您想要的内容。特别地，您可以创建一个IntMap，将节点id映射到该节点的TreePos Full Int，以便您可以探索该节点的父节点、兄弟节点和子节点。