How to handle Results in flat_map

17

I know we can use collect to move a Result from the inside to the outside, for example:

fn produce_result(my_struct: &MyStruct) -> Result<MyStruct, Error>;

let my_results: Vec<MyStruct> = vec![];
let res = my_results.iter().map(|my_struct| produce_result(&my_struct)).collect::<Result<Vec<MyStruct>, Error>>();

This propagates an error from inside the closure to the outer scope.
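
As a minimal sketch of that behavior (the `parse_all` helper and the string inputs are made up for illustration and are not part of the question), collecting an iterator of `Result`s into a `Result<Vec<_>, _>` yields either every value or the first error:

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse every string, failing on the first bad one.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs
        .iter()
        .map(|s| s.parse::<i32>())
        // `Result` implements `FromIterator`, so `collect` returns the
        // first `Err` it meets, or `Ok` with all the parsed values.
        .collect()
}
```

Calling `parse_all(&["1", "2", "3"])` gives the three numbers wrapped in `Ok`, while any unparsable entry turns the whole result into an `Err`.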

However, this approach does not work in the flat_map case (Rust playground):

fn produce_result(my_struct: &MyStruct) -> Result<Vec<MyStruct>, Error>;

let my_results: Vec<MyStruct> = vec![];
let res = my_results.iter().flat_map(|my_struct| produce_result(&my_struct)).collect::<Result<Vec<MyStruct>, Error>>();

The compiler complains: "a collection of type std::result::Result<std::vec::Vec<MyStruct>, Error> cannot be built from an iterator over elements of type std::vec::Vec<MyStruct>"

How can I solve this?


1
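Does this answer your question? How do I perform iterator computations over iterators of Results without collecting to a temporary vector? https://dev59.com/Nqnka4cB1Zd3GeqPJCPR How to do simple math with a list of numbers from a file and print out the result in Rust? https://stackoverflow.com/questions/59243725/how-to-do-simple-math-with-a-list-of-numbers-from-a-file-and-print-out-the-resul - Stargateur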
1
Duplicate result - Stargateur
3 Answers

13
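flat_map "flattens" the top level of the value returned from the closure by calling its IntoIterator implementation. Importantly, it does not try to reach inside. If you had your own MyResult, it would fail on flat_map itself: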
enum Error {}

enum MyResult<T, U> {
    Ok(T),
    Err(U),
}

struct MyStruct;

fn produce_result(item: &MyStruct) -> MyResult<Vec<MyStruct>, Error> {
    MyResult::Ok(vec![])
}

fn main() {
    let my_structs: Vec<MyStruct> = vec![];
    let res = my_structs
        .iter()
        .flat_map(|my_struct| produce_result(&my_struct))
        .collect::<Result<Vec<MyStruct>, Error>>();
}

(Playground)

Error:

error[E0277]: `MyResult<std::vec::Vec<MyStruct>, Error>` is not an iterator
  --> src/main.rs:18:10
   |
18 |         .flat_map(|my_struct| produce_result(&my_struct))
   |          ^^^^^^^^ `MyResult<std::vec::Vec<MyStruct>, Error>` is not an iterator
   |
   = help: the trait `std::iter::Iterator` is not implemented for `MyResult<std::vec::Vec<MyStruct>, Error>`
   = note: required because of the requirements on the impl of `std::iter::IntoIterator` for `MyResult<std::vec::Vec<MyStruct>, Error>`

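In your case the behavior is different, because Result implements IntoIterator. That iterator yields the Ok value unchanged and skips Err, so when you flat_map over a Result you effectively ignore every error and use only the results of the successful calls.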
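
To see the silent dropping concretely, here is a small sketch with plain integers and a `String` error (both chosen only for illustration). Because a `Result<T, E>` iterates over zero or one `T`, `flat_map` keeps just the `Ok` payloads:

```rust
// Hypothetical inputs: two successes and one failure.
fn sample_results() -> Vec<Result<i32, String>> {
    vec![Ok(1), Err("boom".to_string()), Ok(3)]
}

// `flat_map` calls `IntoIterator` on each `Result`: `Ok(x)` yields `x`,
// `Err(_)` yields nothing, so the error disappears without a trace.
fn ok_values_only(results: Vec<Result<i32, String>>) -> Vec<i32> {
    results.into_iter().flat_map(|r| r).collect()
}
```

Here `ok_values_only(sample_results())` contains only the two successful values; nothing in the output hints that one call failed.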
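It is a bit cumbersome, but there is a way around it. Match on the Result explicitly, wrap the Err case in a Vec and "distribute" the Ok over the elements of the existing Vec, then let flat_map do its job: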
let res = my_structs
    .iter()
    .map(|my_struct| produce_result(&my_struct))
    .flat_map(|result| match result {
        Ok(vec) => vec.into_iter().map(|item| Ok(item)).collect(),
        Err(er) => vec![Err(er)],
    })
    .collect::<Result<Vec<MyStruct>, Error>>();

Playground

There is also another way, which may be more performant if errors do occur (even if only sometimes):

fn external_collect(my_structs: Vec<MyStruct>) -> Result<Vec<MyStruct>, Error> {
    Ok(my_structs
        .iter()
        .map(|my_struct| produce_result(&my_struct))
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flatten()
        .collect())
}

Playground

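I did some quick benchmarking. The code is on the playground too, but it cannot be run there because the required command is missing, so I ran it locally. Here are the results: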

test vec_result::external_collect_end_error   ... bench:   2,759,002 ns/iter (+/- 1,035,039)
test vec_result::internal_collect_end_error   ... bench:   3,502,342 ns/iter (+/- 438,603)

test vec_result::external_collect_start_error ... bench:          21 ns/iter (+/- 6)
test vec_result::internal_collect_start_error ... bench:          30 ns/iter (+/- 19)

test vec_result::external_collect_no_error    ... bench:   7,799,498 ns/iter (+/- 815,785)
test vec_result::internal_collect_no_error    ... bench:   3,489,530 ns/iter (+/- 170,124)

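When execution succeeds, the version with two chained collects takes about twice as long as the one with the nested collect, but it is roughly one third faster when execution short-circuits on an error. This held across multiple benchmark runs, so the large reported variance probably doesn't matter.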
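
The short-circuiting behind these numbers can be observed directly. The sketch below uses a hypothetical `produce` function and a call counter (neither is part of the benchmark) to show that collecting into a `Result` stops calling the closure once the first `Err` appears:

```rust
use std::cell::Cell;

// Hypothetical stand-in for `produce_result`: fails when `n == err_value`.
fn produce(n: u64, err_value: u64) -> Result<Vec<u64>, String> {
    if n == err_value {
        Err(format!("failed on {}", n))
    } else {
        Ok((0..n).collect())
    }
}

// Returns the flattened result together with the number of `produce` calls made.
fn collect_counting(inputs: &[u64], err_value: u64) -> (Result<Vec<u64>, String>, usize) {
    let calls = Cell::new(0);
    let result = inputs
        .iter()
        .map(|&n| {
            calls.set(calls.get() + 1);
            produce(n, err_value)
        })
        // First collect: stops at the first `Err`, so later inputs are never produced.
        .collect::<Result<Vec<_>, _>>()
        // Second collect: flatten the inner vectors only when everything succeeded.
        .map(|vecs| vecs.into_iter().flatten().collect::<Vec<u64>>());
    (result, calls.get())
}
```

With inputs `[1, 2, 3, 4]` and `err_value` set to `2`, only two calls are made before the error is returned. This is the "external collect" shape from above; the cost it pays on success is the intermediate `Vec<Vec<_>>`.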


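This works! Thanks. But I also wonder: my_structs.iter().map(...).collect::<Result<Vec<_>, _>>()?.iter().flatten().collect would give the expected result too. Since both approaches need more than one collect, which one is more efficient? - Evian
Got it. Thanks! - Evian
"test vec_result::external_collect_start_error ... bench: 21 ns/iter (+/- 6) test vec_result::internal_collect_start_error ... bench: 30 ns/iter (+/- 19)" Unfortunately this is not a realistic result; clearly the code was not actually run and was optimized away by the compiler. - Stargateur
@Stargateur I tried moving the error to the second entry instead of the first and got 61 ns and 62 ns respectively. Looks like the compiler does the short-circuiting at compile time? - Cerberus
Thanks for such a detailed answer! I think .map(|item| Ok(item)) can be shortened to .map(Ok) - n8henrie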

5
Use the flatten_ok method from itertools:
use itertools::Itertools;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // without error
    let res = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
        .into_iter()
        .map(Ok::<_, &str>)
        .flatten_ok()
        .collect::<Result<Vec<_>, _>>()?;
    println!("{:?}", res);

    // with error
    let res = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
        .into_iter()
        .map(|_| Err::<[i32; 2], _>("errored out"))
        .flatten_ok()
        .collect::<Result<Vec<_>, _>>()?;
    println!("{:?}", res); // not printed

    Ok(())
}

Example link


0
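The best-performing solution is to use Either. It is similar to Cerberus's Vec solution, except that the Vec is replaced with Left / Right, as described in another question, for better performance.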

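use either::Either::{Left, Right};

// Both arms have different iterator types; Either unifies them without boxing.
fn to_either(
    result: Result<Vec<MyStruct>, Error>,
) -> impl Iterator<Item = Result<MyStruct, Error>> {
    match result {
        Ok(vec) => Right(vec.into_iter().map(Ok)),
        Err(e) => Left(std::iter::once(Err(e))),
    }
}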


fn either_collect(my_structs: &[MyStruct], err_value: u64) -> Result<Vec<MyStruct>, Error> {
    my_structs
            .iter()
            .map(|my_struct| produce_result(&my_struct, err_value))
            .flat_map(to_either)
            .collect::<Result<Vec<MyStruct>, Error>>()
}
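
For readers who want to see what Either contributes here, the following is a simplified, std-only sketch of the same idea. The `OneOf` enum and the `spread` and `flatten_results` names are hypothetical stand-ins written for this illustration; they are not the either crate's API:

```rust
// Hypothetical two-variant iterator, a simplified stand-in for `either::Either`.
enum OneOf<A, B> {
    First(A),
    Second(B),
}

impl<A, B> Iterator for OneOf<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Dispatch to whichever iterator this value holds; no heap allocation.
        match self {
            OneOf::First(a) => a.next(),
            OneOf::Second(b) => b.next(),
        }
    }
}

// Turn one `Result<Vec<T>, E>` into an iterator of `Result<T, E>`.
fn spread<T, E>(result: Result<Vec<T>, E>) -> impl Iterator<Item = Result<T, E>> {
    match result {
        Ok(vec) => OneOf::First(vec.into_iter().map(Ok)),
        Err(e) => OneOf::Second(std::iter::once(Err(e))),
    }
}

// Flatten all the inner vectors, stopping at the first error.
fn flatten_results<T, E>(results: Vec<Result<Vec<T>, E>>) -> Result<Vec<T>, E> {
    results.into_iter().flat_map(spread).collect()
}
```

The two match arms produce different concrete iterator types, which is why some wrapper is needed. An enum keeps the dispatch static, whereas the Vec-based version allocates a fresh Vec for every element.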

Alternatively, a Box can be used, as mentioned in https://dev59.com/ll0a5IYBdhLWcg3w88vU#29760740.

Below is the updated benchmark, covering all of the solutions.

#![feature(test)]
extern crate test;
use test::{black_box, Bencher};
use either::*;

struct Error;

struct MyStruct(u64);

fn produce_result(item: &MyStruct, err_value: u64) -> Result<Vec<MyStruct>, Error> {
    if item.0 == err_value {
        Err(Error)
    } else {
        Ok((0..item.0).map(MyStruct).collect())
    }
}


fn to_box_iterator(
    result: Result<Vec<MyStruct>, Error>,
) -> Box<dyn Iterator<Item = Result<MyStruct, Error>>> {
    match result {
        Ok(vec) => Box::new(vec.into_iter().map(Ok)),
        Err(e) => Box::new(std::iter::once(Err(e))),
    }
}


fn to_either(
    result: Result<Vec<MyStruct>, Error>,
) -> impl Iterator<Item = Result<MyStruct, Error>> {
    match result {
        Ok(vec) => Left(vec.into_iter().map(Ok)),
        Err(e) => Right(std::iter::once(Err(e))),
    }
}


fn either_collect(my_structs: &[MyStruct], err_value: u64) -> Result<Vec<MyStruct>, Error> {
    my_structs
            .iter()
            .map(|my_struct| produce_result(&my_struct, err_value))
            .flat_map(to_either)
            .collect::<Result<Vec<MyStruct>, Error>>()
}

fn box_collect(my_structs: &[MyStruct], err_value: u64) -> Result<Vec<MyStruct>, Error> {
    my_structs
            .iter()
            .map(|my_struct| produce_result(&my_struct, err_value))
            .flat_map(to_box_iterator)
            .collect::<Result<Vec<MyStruct>, Error>>()
}

fn internal_collect(my_structs: &[MyStruct], err_value: u64) -> Result<Vec<MyStruct>, Error> {
    my_structs
            .iter()
            .map(|my_struct| produce_result(&my_struct, err_value))
            .flat_map(|result| match result {
                Ok(vec) => vec.into_iter().map(|item| Ok(item)).collect(),
                Err(er) => vec![Err(er)],
            })
            .collect::<Result<Vec<MyStruct>, Error>>()
}

fn external_collect(my_structs: &[MyStruct], err_value: u64) -> Result<Vec<MyStruct>, Error> {
    Ok(my_structs
            .iter()
            .map(|my_struct| produce_result(&my_struct, err_value))
            .collect::<Result<Vec<_>, _>>()?
            .into_iter()
            .flatten()
            .collect())
}

#[bench]
pub fn internal_collect_start_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| internal_collect(&my_structs, 0));
}

#[bench]
pub fn box_collect_start_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| box_collect(&my_structs, 0));
}

#[bench]
pub fn either_collect_start_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| either_collect(&my_structs, 0));
}

#[bench]
pub fn external_collect_start_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| external_collect(&my_structs, 0));
}

#[bench]
pub fn internal_collect_end_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| internal_collect(&my_structs, 999));
}

#[bench]
pub fn box_collect_end_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| box_collect(&my_structs, 999));
}

#[bench]
pub fn either_collect_end_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| either_collect(&my_structs, 999));
}

#[bench]
pub fn external_collect_end_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| external_collect(&my_structs, 999));
}

#[bench]
pub fn internal_collect_no_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| internal_collect(&my_structs, 1000));
}

#[bench]
pub fn box_collect_no_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| box_collect(&my_structs, 1000));
}

#[bench]
pub fn either_collect_no_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| either_collect(&my_structs, 1000));
}

#[bench]
pub fn external_collect_no_error(b: &mut Bencher) {
    let my_structs: Vec<_> = black_box((0..1000).map(MyStruct).collect());
    b.iter(|| external_collect(&my_structs, 1000));
}

Results:

cargo bench

  Downloaded either v1.8.1
  Downloaded 1 crate (16.0 KB) in 0.24s
   Compiling either v1.8.1
   Compiling my-project v0.1.0 (/home/runner/DarksalmonLiquidPi)
    Finished bench [optimized] target(s) in 4.63s
     Running unittests src/main.rs (target/release/deps/my_project-03c6287d7b53ee50)

running 12 tests
test box_collect_end_error        ... bench:   9,439,159 ns/iter (+/- 7,009,221)
test box_collect_no_error         ... bench:   9,538,551 ns/iter (+/- 7,923,652)
test box_collect_start_error      ... bench:          81 ns/iter (+/- 178)
test either_collect_end_error     ... bench:   4,266,292 ns/iter (+/- 6,008,125)
test either_collect_no_error      ... bench:   3,341,910 ns/iter (+/- 6,344,290)
test either_collect_start_error   ... bench:          41 ns/iter (+/- 53)
test external_collect_end_error   ... bench:     209,960 ns/iter (+/- 663,883)
test external_collect_no_error    ... bench:  10,074,473 ns/iter (+/- 4,737,417)
test external_collect_start_error ... bench:          17 ns/iter (+/- 55)
test internal_collect_end_error   ... bench:   8,860,670 ns/iter (+/- 6,148,916)
test internal_collect_no_error    ... bench:   8,564,756 ns/iter (+/- 6,842,558)
test internal_collect_start_error ... bench:          44 ns/iter (+/- 165)

test result: ok. 0 passed; 0 failed; 0 ignored; 12 measured; 0 filtered out; finished in 69.52s

The benchmark can be found at https://replit.com/@Atry/DarksalmonLiquidPi.

