如何在安全的 Rust 中表示互递归数据结构？

Question

如何在安全的 Rust 中表示互递归数据结构？

46

我希望在 Rust 中实现一个类似场景图的数据结构。我想要一个等价于下面这段 C++ 代码的安全 Rust 实现：

struct Node
{
    Node* parent; // should be mutable, and nullable (no parent)
    std::vector<Node*> child;

    virtual ~Node() 
    { 
        for(auto it = child.begin(); it != child.end(); ++it)
        {
            delete *it;
        }
    }

    void addNode(Node* newNode)
    {
        if (newNode->parent)
        {
            newNode->parent.child.erase(newNode->parent.child.find(newNode));
        }
        newNode->parent = this;
        child.push_back(newNode);
    }
}

我想要的属性：

父节点需要拥有其子节点
可以通过某种引用从外部访问节点
影响一个 Node 的事件可能会改变整棵树

- nulleight

2

你有什么问题？ - Shepmaster

2

你当前的示例中并没有通过类型表达所有权，是不是应该写成 std::vector<std::unique_ptr<Node>> children;？此外，为什么析构函数要声明为虚函数？这是一个接口吗？ - Matthieu M.

2

节点拥有其子节点，并由父节点拥有。我不想使用比必要更多的模板。 - nulleight

4个回答

29

问题在于这种数据结构本身是不安全的；它没有直接等价的 Rust 实现，而不使用 unsafe。这是有意为之。

如果你想将其翻译成安全的 Rust 代码，你需要更加具体地说明你需要的是什么。我知道你上面列出了一些属性，但是经常有人来到 Rust，会说“我想要 C/C++ 代码中所有的东西”，那么直接的答案是“嗯，你做不到”。

此外，你也不可避免地需要改变你的处理方式。你举的例子有没有所有权语义的指针、可变别名和循环；所有这些 Rust 都不能像 C++ 那样简单地忽略。

最简单的解决方案就是摆脱parent指针，并在外部维护它（就像文件系统路径一样）。这也与借用玩得很好，因为没有任何循环：

pub struct Node1 {
    children: Vec<Node1>,
}

如果你需要父节点指针，你可以采用一种折中的方法，使用ID代替：

use std::collections::BTreeMap;

type Id = usize;

pub struct Tree {
    descendants: BTreeMap<Id, Node2>,
    root: Option<Id>,
}

pub struct Node2 {
    parent: Id,
    children: Vec<Id>,
}

BTreeMap 是您的“地址空间”，通过不直接使用内存地址，绕过了借用和别名问题。

当然，这引入了一个问题，即给定的 Id 没有与特定树相关联，这意味着它所属的节点可能会被销毁，现在您实际上拥有了一个悬挂指针。但是，这就是你为拥有别名和突变所付出的代价，这也更加不直接。

或者，你可以彻底去掉它，并使用引用计数和动态借用检查：

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);

pub type WeakNode3 = Weak<RefCell<Node3>>;

pub struct Node3_ {
    parent: Option<WeakNode3>,
    children: Vec<Node3>,
}

impl Node3 {
    pub fn add(&self, node: Node3) {
        // No need to remove from old parent; move semantics mean that must have
        // already been done.
        (node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
        self.children.push(node);
    }
}

在这里，您可以使用Node3将节点的所有权在树的不同部分之间转移，而对于外部引用，则可以使用WeakNode3。或者，您可以使Node3可克隆，并将add中的逻辑添加回来，以确保给定节点不会意外地保持在错误父级下。

这并不比第二个选项严格地更好，因为此设计绝对无法从静态借用检查中受益。第二个选项至少可以在编译时防止您从两个位置同时修改图表；在这里，如果发生这种情况，您只会崩溃。

关键是：您不能拥有所有东西。您必须决定您实际需要支持哪些操作。在那一点上，通常只要选择给您必要属性的类型即可。

- DK.

1

虽然我同意 OP 提供的示例是不安全的，但请注意 parent 指针本身并不是不安全的，因为它不是父内存区域的所有者。问题在于 vector<Node*> 应该至少是 unique_ptr 或 shared_ptr。 - Abhinav Gauniyal

第二个解决方案对我来说无法编译。 - GuiTaek

17

在某些情况下，您也可以使用一个“竞技场”（arena）。竞技场保证存储在其中的值将具有与竞技场本身相同的生命周期。这意味着添加更多的值不会使任何现有的生命周期失效，但移动竞技场会使其失效。因此，如果您需要“返回”树，则此类解决方案不可行。

通过从节点本身中移除所有权来解决了这个问题。

以下是一个示例，它还使用“内部可变性”来允许在创建后对节点进行修改。在其他情况下，如果树只构建一次并且仅导航，则可以删除此可变性。

use std::{
    cell::{Cell, RefCell},
    fmt,
};
use typed_arena::Arena; // 1.6.1

struct Tree<'a, T: 'a> {
    nodes: Arena<Node<'a, T>>,
}

impl<'a, T> Tree<'a, T> {
    fn new() -> Tree<'a, T> {
        Self {
            nodes: Arena::new(),
        }
    }

    fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
        self.nodes.alloc(Node {
            data,
            tree: self,
            parent: Cell::new(None),
            children: RefCell::new(Vec::new()),
        })
    }
}

struct Node<'a, T: 'a> {
    data: T,
    tree: &'a Tree<'a, T>,
    parent: Cell<Option<&'a Node<'a, T>>>,
    children: RefCell<Vec<&'a Node<'a, T>>>,
}

impl<'a, T> Node<'a, T> {
    fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
        let child = self.tree.new_node(data);
        child.parent.set(Some(self));
        self.children.borrow_mut().push(child);
        child
    }
}

impl<'a, T> fmt::Debug for Node<'a, T>
where
    T: fmt::Debug,
{
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{:?}", self.data)?;
        write!(f, " (")?;
        for c in self.children.borrow().iter() {
            write!(f, "{:?}, ", c)?;
        }
        write!(f, ")")
    }
}

fn main() {
    let tree = Tree::new();
    let head = tree.new_node(1);
    let _left = head.add_node(2);
    let _right = head.add_node(3);

    println!("{:?}", head); // 1 (2 (), 3 (), )
}

- Shepmaster

1

TL;DR：DK的第二个版本无法编译，因为父级类型与self.0不同，通过将其转换为WeakNode来修复它。此外，在紧接着的下一行中，“self”没有“children”属性，但self.0有。

我已经修正了DK的版本，使其能够编译和工作。以下是我的代码：

dk_tree.rs

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);


pub struct WeakNode(Weak<RefCell<Node_>>);

struct Node_ {
    parent: Option<WeakNode>,
    children: Vec<Node>,
}

impl Node {
    pub fn new() -> Self {
        Node(Rc::new(RefCell::new(Node_ {
            parent: None,
            children: Vec::new(),
        })))
    }
    pub fn add(&self, node: Node) {
        // No need to remove from old parent; move semantics mean that must have
        // already been done.
        node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
        self.0.borrow_mut().children.push(node);
    }
    // just to have something visually printed
    pub fn to_str(&self) -> String {
        let mut result_string = "[".to_string();
        for child in self.0.borrow().children.iter() {
            result_string.push_str(&format!("{},", child.to_str()));
        }
        result_string += "]";
        result_string
    }
}

接下来是 main.rs 中的主函数：

mod dk_tree;

use crate::dk_tree::{Node};


fn main() {
    let root = Node::new();
    root.add(Node::new());
    root.add(Node::new());
    let inner_child = Node::new();
    inner_child.add(Node::new());
    inner_child.add(Node::new());
    root.add(inner_child);
    let result = root.to_str();
    println!("{result:?}");
}

我将WeakNode设计得更像Node的原因是为了更容易地在两者之间进行转换。

- GuiTaek

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Francis Gagné · Accepted Answer

Rust试图通过禁止可能不安全的操作来确保内存安全。由于此分析在编译时执行，因此编译器只能推理已知为安全的一部分操作。

在Rust中，您可以轻松地存储父节点的引用（通过借用父节点，从而防止变异）或子节点列表（通过拥有它们，这给了您更多的自由），但不能同时拥有两者（不使用unsafe的情况下）。这对于您的addNode实现尤其有问题，因为它需要对给定节点的父节点进行可变访问。但是，如果将parent指针存储为可变引用，则由于特定对象只能使用单个可变引用，因此访问父节点的唯一方法是通过子节点，并且您只能拥有单个子节点，否则您将拥有两个可变引用到同一父节点。

如果您想避免不安全的代码，有许多替代方案，但它们都需要做出一些牺牲。

最简单的解决方案是直接删除parent字段。在遍历树时，我们可以定义一个单独的数据结构来记录节点的父节点，而不是将其存储在节点本身中。

首先，让我们定义Node:

#[derive(Debug)]
struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
}

impl<T> Node<T> {
    fn new(data: T) -> Node<T> {
        Node { data: data, children: vec![] }
    }

    fn add_child(&mut self, child: Node<T>) {
        self.children.push(child);
    }
}

我添加了一个data字段，因为树节点没有数据是不太有用的！

现在让我们定义另一个结构体来跟踪我们导航时的父节点：

#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
    node: &'a Node<T>,
    parent: Option<&'a NavigableNode<'a, T>>,
}

impl<'a, T> NavigableNode<'a, T> {
    fn child(&self, index: usize) -> NavigableNode<T> {
        NavigableNode {
            node: &self.node.children[index],
            parent: Some(self)
        }
    }
}

impl<T> Node<T> {
    fn navigate<'a>(&'a self) -> NavigableNode<T> {
        NavigableNode { node: self, parent: None }
    }
}

如果您在导航树时不需要改变树的结构，并且可以保留父NavigableNode对象（这对于递归算法很有效，但如果您想将NavigableNode存储在其他数据结构中并保留它，则效果不佳），则此解决方案可以很好地工作。使用除了借用指针之外的其他东西来存储父节点可以减轻第二个限制；如果您希望获得最大的通用性，则可以使用Borrow trait来允许直接值、借用指针、Box、Rc等。

现在，让我们谈谈拉链。在函数式编程中，拉链用于“聚焦”于数据结构（列表、树、映射等）的特定元素，以便访问该元素需要恒定时间，同时仍保留该数据结构的所有数据。如果您需要在导航期间对树进行变异，同时保留向上导航的能力，则可以将树转换为拉链，并通过拉链执行修改。

以下是如何为上面定义的Node实现拉链：

#[derive(Debug)]
struct NodeZipper<T> {
    node: Node<T>,
    parent: Option<Box<NodeZipper<T>>>,
    index_in_parent: usize,
}

impl<T> NodeZipper<T> {
    fn child(mut self, index: usize) -> NodeZipper<T> {
        // Remove the specified child from the node's children.
        // A NodeZipper shouldn't let its users inspect its parent,
        // since we mutate the parents
        // to move the focused nodes out of their list of children.
        // We use swap_remove() for efficiency.
        let child = self.node.children.swap_remove(index);

        // Return a new NodeZipper focused on the specified child.
        NodeZipper {
            node: child,
            parent: Some(Box::new(self)),
            index_in_parent: index,
        }
    }

    fn parent(self) -> NodeZipper<T> {
        // Destructure this NodeZipper
        let NodeZipper { node, parent, index_in_parent } = self;

        // Destructure the parent NodeZipper
        let NodeZipper {
            node: mut parent_node,
            parent: parent_parent,
            index_in_parent: parent_index_in_parent,
        } = *parent.unwrap();

        // Insert the node of this NodeZipper back in its parent.
        // Since we used swap_remove() to remove the child,
        // we need to do the opposite of that.
        parent_node.children.push(node);
        let len = parent_node.children.len();
        parent_node.children.swap(index_in_parent, len - 1);

        // Return a new NodeZipper focused on the parent.
        NodeZipper {
            node: parent_node,
            parent: parent_parent,
            index_in_parent: parent_index_in_parent,
        }
    }

    fn finish(mut self) -> Node<T> {
        while let Some(_) = self.parent {
            self = self.parent();
        }

        self.node
    }
}

impl<T> Node<T> {
    fn zipper(self) -> NodeZipper<T> {
        NodeZipper { node: self, parent: None, index_in_parent: 0 }
    }
}

要使用这个Zipper，您需要拥有树的根节点所有权。通过拥有节点的所有权，Zipper可以移动东西以避免复制或克隆节点。当我们移动Zipper时，实际上是放弃旧的Zipper并创建一个新的（尽管我们也可以通过突变self来完成，但我认为那样更清晰，而且它还可以让您链接方法调用）。

如果以上选项不令人满意，您绝对必须在节点中存储父节点，则下一个最佳选项是使用 Rc<RefCell<Node<T>>> 来引用父节点和 Weak<RefCell<Node<T>>> 来引用子节点。Rc 允许共享所有权，但增加了运行时引用计数的开销。RefCell 允许内部可变性，但增加了运行时活动借用的跟踪开销。Weak 类似于 Rc，但它不会增加引用计数；这是用来打破循环引用的，否则将防止引用计数降为零，导致内存泄漏。请参见 DK.'s 的答案，其中使用了 Rc、Weak 和 RefCell 实现。