我需要从一个生产环境中非常大的 Vec
中获取前 N 个元素。目前我使用了一种效率低下的方法:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();
在C++中,我会使用
std::partial_sort
,但我在Rust文档中找不到等效物。我是不是疏忽了它,还是它不存在(尚)?
我需要从一个生产环境中非常大的 Vec
中获取前 N 个元素。目前我使用了一种效率低下的方法:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();
std::partial_sort
,但我在Rust文档中找不到等效物。lazysort
crate正是你需要的:#![feature(test)]
extern crate lazysort;
extern crate rand;
extern crate test;
use std::cmp::Ordering;
trait SortLazy<T> {
fn sort_lazy<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering;
unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering;
}
impl<T> SortLazy<T> for [T] {
fn sort_lazy<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
if !data.is_empty() && *accu < n {
let mut pivot = 1;
let mut lower = 0;
let mut upper = data.len();
while pivot < upper {
match cmp(&data[pivot], &data[lower]) {
Ordering::Less => {
data.swap(pivot, lower);
lower += 1;
pivot += 1;
}
Ordering::Greater => {
upper -= 1;
data.swap(pivot, upper);
}
Ordering::Equal => pivot += 1,
}
}
sort_lazy(&mut data[..lower], accu, cmp, n);
sort_lazy(&mut data[upper..], accu, cmp, n);
} else {
*accu += 1;
}
}
sort_lazy(self, &mut 0, &cmp, n);
}
unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
if !data.is_empty() && *accu < n {
unsafe {
use std::mem::swap;
let mut pivot = 1;
let mut lower = 0;
let mut upper = data.len();
while pivot < upper {
match cmp(data.get_unchecked(pivot), data.get_unchecked(lower)) {
Ordering::Less => {
swap(
&mut *(data.get_unchecked_mut(pivot) as *mut T),
&mut *(data.get_unchecked_mut(lower) as *mut T),
);
lower += 1;
pivot += 1;
}
Ordering::Greater => {
upper -= 1;
swap(
&mut *(data.get_unchecked_mut(pivot) as *mut T),
&mut *(data.get_unchecked_mut(upper) as *mut T),
);
}
Ordering::Equal => pivot += 1,
}
}
sort_lazy(&mut data[..lower], accu, cmp, n);
sort_lazy(&mut data[upper..], accu, cmp, n);
}
} else {
*accu += 1;
}
}
sort_lazy(self, &mut 0, &cmp, n);
}
}
#[cfg(test)]
mod tests {
use test::Bencher;
use lazysort::Sorted;
use std::collections::BinaryHeap;
use SortLazy;
use rand::{thread_rng, Rng};
const SIZE_VEC: usize = 100_000;
const N: usize = 42;
#[bench]
fn sort(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
v.sort_unstable();
})
}
#[bench]
fn lazysort(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
let _: Vec<_> = v.iter().sorted().take(N).collect();
})
}
#[bench]
fn lazysort_in_place(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
v.sort_lazy(i32::cmp, N);
})
}
#[bench]
fn lazysort_in_place_fast(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
unsafe { v.sort_lazy_fast(i32::cmp, N) };
})
}
#[bench]
fn binaryheap(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
let mut iter = v.iter();
let mut heap: BinaryHeap<_> = iter.by_ref().take(N).collect();
for i in iter {
heap.push(i);
heap.pop();
}
let _ = heap.into_sorted_vec();
})
}
}
running 5 tests
test tests::binaryheap ... bench: 3,283,938 ns/iter (+/- 413,805)
test tests::lazysort ... bench: 1,669,229 ns/iter (+/- 505,528)
test tests::lazysort_in_place ... bench: 1,781,007 ns/iter (+/- 443,472)
test tests::lazysort_in_place_fast ... bench: 1,652,103 ns/iter (+/- 691,847)
test tests::sort ... bench: 5,600,513 ns/iter (+/- 711,927)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out
lazysort
比使用 BinaryHeap
的方案更快。我们也可以看到,当 N
不断增加时,BinaryHeap
的解决方案变得越来越差。
lazysort
的问题在于它创建了一个第二个 Vec<_>
。一个“更好”的解决方案是在原地实现部分排序。我提供了这样一种实现的示例。N
大约为 SIZE_VEC / 3
时,经典的 sort
获胜。有一个select_nth_unstable
函数,它相当于std::nth_element
。然后可以对其结果进行排序,以达到您想要的效果。
示例:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
let top_three = v.select_nth_unstable(3).0;
top_three.sort();
3
是“第n个”元素的索引,所以我们实际上选择的是第4个元素,这是因为select_nth_unstable
返回一个元组,其中包括:
select_nth_unstable()
已经是不稳定的,因此使用稳定的 sort()
没有意义。请使用 sort_unstable()
。 - Chayim Friedman