Idiomatically moving/sorting values from one Vec into another

Question


rust functional-programming


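I recently picked up Rust, coming from a Python background. I'm still getting the hang of functional programming, so I'm looking for insights/feedback on writing idiomatic Rust.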

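In the example below, I have a list of Parent elements and a list of Child elements, and I want to sort the Child elements into their respective parents based on id.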

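In Python, I would nest two for loops, perform the test, and continue accordingly. But I'm not sure whether there's a better/more efficient/more idiomatic way to do this.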

I've marked the relevant section of the code. Of course, any feedback is welcome!

Here is a working playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=233cfa5b5798090fa969ba348a479b1c

#[derive(Debug)]
struct Parent {
    id: String,
    children: Vec<Child>,
}

impl Parent {
    pub fn from_id(id: String) -> Self {
        Self {
            id,
            children: Vec::new(),
        }
    }
}

#[derive(Debug)]
struct Child {
    parent_id: String,
}

impl Child {
    pub fn from_parent_id(parent_id: String) -> Self {
        Self { parent_id }
    }
}

fn main() {
    let mut parents: Vec<Parent> = vec!["a", "b", "c"]
        .iter()
        .map(|s| s.to_string())
        .map(Parent::from_id)
        .collect();

    let mut children: Vec<Child> = vec!["a", "a", "b", "c", "c", "c"]
        .iter()
        .map(|s| s.to_string())
        .map(Child::from_parent_id)
        .collect();

    // Is there a better way to do this?
    while let Some(child) = children.pop() {
        for parent in parents.iter_mut() {
            if child.parent_id == parent.id {
                parent.children.push(child);
                break;
            }
        }
    }

    dbg!(parents);
    dbg!(children);
}

- tnahs

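Do you have to use a Vec here? It would be more natural to handle the whole parent-child relationship as a HashMap<String, Vec<Child>> or something similar. That way you don't have to iterate over the parent vector every time to find the one with the right ID. - cadolphs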
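A minimal sketch of the HashMap<String, Vec<Child>> idea from this comment, assuming a simplified Child that only has a parent_id field (the real struct has more fields). group_by_parent is a hypothetical helper name, not part of the question's code:

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct Child {
    parent_id: String,
}

/// Hypothetical helper: buckets children by their parent_id.
/// Each child costs one hash lookup; no parent list is scanned.
fn group_by_parent(children: Vec<Child>) -> HashMap<String, Vec<Child>> {
    let mut groups: HashMap<String, Vec<Child>> = HashMap::new();
    for child in children {
        // entry() inserts an empty Vec the first time a parent id is seen,
        // then the child is moved into that bucket.
        groups
            .entry(child.parent_id.clone())
            .or_default()
            .push(child);
    }
    groups
}
```

The Parent records could then pick up their children by looking up the same key when the final output is assembled.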

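Hmm, I suppose so? I'm querying two different databases, which return two Vecs, and then I combine them based on id. Also, the actual Parent and Child structs are more complex types with other fields, and they will be serialized with serde later. - tnahs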

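I guess a solution with fewer side effects would be to simply consume children, like this, which allows children to not be mut. You can then use find() and map() to make it more functional. You could turn the outer for loop into children.into_iter().for_each(|child| ...), but that doesn't seem to improve readability. - user4815162342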


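But rather than trying to make your code fully functional, you should consider the time complexity of this operation, which is currently O(n*m); if those numbers are large, it can become prohibitively slow. Creating a temporary id->position map for the parent vector can make it O(n+m), as shown in this example. - user4815162342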

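Oh, that's a great point. I see it now. I don't expect more than about 1000 Parent instances, but there will definitely be more Child instances than that. I'm not sure how flexible the current implementation of the surrounding code is, but a temporary map is a good idea, especially if I can't find a way around this. Let me see what I can do with it! - tnahs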

2 Answers


Your code is fine. But here are some other implementation ideas.

Implementing From is an easy way to convert from one type to another, as an alternative to the from_id() and from_parent_id() methods.

impl From<&str> for Parent {
    fn from(id: &str) -> Self {
        Self { id: id.into(), children: vec![] }
    }
}

impl From<&str> for Child {
    fn from(id: &str) -> Self {
        Child { parent_id: id.into() }
    }
}

The examples that follow assume From has been implemented as above.

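If you implement From for these types, creating the objects from a vector of IDs becomes slightly simpler. The difference isn't dramatic, though; the code you already wrote to create the Child and Parent objects is fine too.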

    let mut parents = vec!["a", "b", "c"]
        .into_iter()
        .map(|id| id.into())
        .collect::<Vec<Parent>>();

    // children is consumed by into_iter() later, so it does not need to be mut.
    let children = vec!["a", "a", "b", "c", "c", "c"]
        .into_iter()
        .map(|id| id.into())
        .collect::<Vec<Child>>();

Below is a more efficient way to match Child objects to Parents, using .for_each(); an ordinary for loop works just as well.

    children.into_iter().for_each(|child| {
        let cmp = |p: &Parent| p.id.cmp(&child.parent_id);

        if let Ok(idx) = parents.binary_search_by(cmp) {
            parents[idx].children.push(child);
        }
    });
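binary_search_by only gives meaningful results on a slice that is already sorted by the same key, so the parents have to be sorted by id first. A minimal sketch, assuming the simplified Parent and Child structs from the question; attach_sorted is a hypothetical helper name. Like the snippet above, it silently drops children whose parent_id has no match:

```rust
#[derive(Debug)]
struct Parent {
    id: String,
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent_id: String,
}

/// Hypothetical helper: sorts parents by id (O(n log n)), then attaches
/// each child with a binary search (O(log n) per child).
fn attach_sorted(parents: &mut [Parent], children: Vec<Child>) {
    // binary_search_by requires the slice to be sorted by the same ordering.
    parents.sort_by(|a, b| a.id.cmp(&b.id));

    for child in children {
        let found = parents.binary_search_by(|p| p.id.cmp(&child.parent_id));
        if let Ok(idx) = found {
            parents[idx].children.push(child);
        }
        // On Err(_) no parent has this id, and the child is dropped here.
    }
}
```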

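In the example above, binary search makes matching children to parents more efficient (O(log n) per child instead of a linear scan), assuming the Parents are sorted by their IDs.
An even more efficient approach is to put the parents into a HashMap.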

    use std::collections::HashMap;

    let mut parents = vec!["a", "b", "c"]
        .into_iter()
        .map(|id| (id.into(), id.into()))
        .collect::<HashMap<String, Parent>>();

The following shows how to match Child objects to the Parents in the HashMap, in the same way as the binary search example.

    children.into_iter().for_each(|child| {
        if let Some(p) = parents.get_mut(&child.parent_id) {
            p.children.push(child);
        }
    });
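For reference, a self-contained version of the HashMap approach, including the HashMap import. Since the question mentions serializing the results with serde later, this sketch also turns the map back into a Vec. attach_via_map is a hypothetical name, and the structs are the simplified ones from the question:

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct Parent {
    id: String,
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent_id: String,
}

/// Hypothetical helper: indexes the parents by id, attaches each child with
/// a single hash lookup, then returns the parents as a Vec again.
fn attach_via_map(parents: Vec<Parent>, children: Vec<Child>) -> Vec<Parent> {
    // Key each parent by a clone of its id; the Parent itself moves into the map.
    let mut by_id: HashMap<String, Parent> = parents
        .into_iter()
        .map(|p| (p.id.clone(), p))
        .collect();

    for child in children {
        // Average O(1) lookup; children without a matching parent are dropped.
        if let Some(parent) = by_id.get_mut(&child.parent_id) {
            parent.children.push(child);
        }
    }

    // into_values() (Rust 1.54+) consumes the map; the order is unspecified.
    by_id.into_values().collect()
}
```

HashMap iteration order is unspecified, so the returned Vec comes back in arbitrary order; sort it by id afterwards if a stable output order matters.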

- Todd

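Isn't children.into_iter() better than children.drain(0..)? Draining is for cases where you need to keep the vector, which doesn't seem to apply to the OP; the OP's use of pop() was probably just a way to get a working solution. - user4815162342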

@user4815162342, yes, you're right. The drain iterator isn't necessary. I've removed it from the examples. Thanks. - Todd

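This is really helpful! Especially the From trait implementations. I ended up reworking things so that all Parent objects are stored in a HashMap. Also, clippy suggested that I use an if let block to push Child elements into Parent.children. Thanks a lot! - tnahs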

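You're welcome, @se432. Yes, I'd expect Clippy to suggest if let; it's probably the better style choice. I used match because it took up less horizontal space in the example. If you want a data structure for storing the parents that combines the features of a HashMap and an iterable sorted list, consider BTreeMap. - Todd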
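A minimal sketch of the BTreeMap alternative mentioned in this comment, assuming the simplified structs from the question (attach_in_id_order is a hypothetical name). Lookups are O(log n) rather than HashMap's average O(1), but iterating the map yields the parents in ascending id order:

```rust
use std::collections::BTreeMap;

#[derive(Debug)]
struct Parent {
    id: String,
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent_id: String,
}

/// Hypothetical helper: same shape as the HashMap version, but the
/// returned parents come out sorted by id.
fn attach_in_id_order(parents: Vec<Parent>, children: Vec<Child>) -> Vec<Parent> {
    let mut by_id: BTreeMap<String, Parent> = parents
        .into_iter()
        .map(|p| (p.id.clone(), p))
        .collect();

    for child in children {
        // O(log n) lookup; unmatched children are dropped.
        if let Some(parent) = by_id.get_mut(&child.parent_id) {
            parent.children.push(child);
        }
    }

    // BTreeMap iterates in ascending key order, so the Vec is sorted by id.
    by_id.into_values().collect()
}
```

When the ids carry no meaningful order, the ordering is wasted work and a HashMap is usually the simpler choice.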

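@Todd If I understand correctly, a HashMap should be enough in my case. In the real data, Parent.id is a uuid-like string that carries no ordering information, and I don't need to access the data in order. So I don't think a BTreeMap would have any advantage, right? - tnahs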

Agreed, HashMap is what you want. @se432 - Todd

Content provided by Stack Overflow.

- user4815162342 · Accepted Answer

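Popping items off the end of a vector is what you would typically do when you need to keep part or all of the vector. If you need to consume the entire vector, you can pass it directly to a for loop: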

for child in children {
    for parent in parents.iter_mut() {
        if child.parent_id == parent.id {
            parent.children.push(child);
            break;
        }
    }
}

You can use an iterator to find the parent, for example:

for child in children {
    parents
        .iter_mut()
        .find(|parent| parent.id == child.parent_id)
        .map(|parent| parent.children.push(child));
}
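Clippy's option_map_unit_fn lint flags Option::map called with a closure that only returns (), as in the loop above, and suggests if let instead. A minimal sketch of the same find-based loop written that way, assuming the simplified structs from the question (attach_with_find is a hypothetical name). It is still O(n*m) in the worst case:

```rust
#[derive(Debug)]
struct Parent {
    id: String,
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent_id: String,
}

/// Hypothetical helper: linear search per child, written with if let.
fn attach_with_find(parents: &mut [Parent], children: Vec<Child>) {
    for child in children {
        // find() stops at the first matching parent, like the break in the question.
        if let Some(parent) = parents.iter_mut().find(|p| p.id == child.parent_id) {
            parent.children.push(child);
        }
        // If no parent matches, the child is dropped at the end of this iteration.
    }
}
```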

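The most important performance issue is that this takes up to n*m iterations in total, where n and m are the numbers of parents and children, because each child may scan the whole parent list. If those numbers can reach tens of thousands, you end up with hundreds of millions of iterations, which will slow you down. You can build a temporary id->position map for the parent vector to make the operation O(n + m):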

use std::collections::HashMap;

let parent_pos_by_id: HashMap<_, _> = parents
    .iter()
    .enumerate()
    .map(|(idx, parent)| (parent.id.clone(), idx))
    .collect();

for child in children {
    if let Some(&parent_pos) = parent_pos_by_id.get(&child.parent_id) {
        parents[parent_pos].children.push(child);
    }
}
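Both the question's loop and the loop above silently drop children whose parent_id has no matching parent. If those should be kept (for example, to report inconsistent data from the two databases), here is a minimal sketch of the position-map approach that collects them, assuming the simplified structs from the question. attach_by_position and the returned orphan Vec are illustrative additions, not part of the answer. The ids are cloned into the map so the map does not hold a borrow of parents while parents is being mutated:

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct Parent {
    id: String,
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent_id: String,
}

/// Hypothetical helper: O(n + m) matching that returns unmatched children.
fn attach_by_position(parents: &mut [Parent], children: Vec<Child>) -> Vec<Child> {
    // One pass over the parents: id -> index in the slice (O(n)).
    let pos_by_id: HashMap<String, usize> = parents
        .iter()
        .enumerate()
        .map(|(idx, parent)| (parent.id.clone(), idx))
        .collect();

    let mut orphans = Vec::new();
    // One pass over the children, each an average O(1) lookup (O(m) total).
    for child in children {
        match pos_by_id.get(&child.parent_id) {
            Some(&pos) => parents[pos].children.push(child),
            None => orphans.push(child),
        }
    }
    orphans
}
```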