Is there an owned version of String::chars?

10
The following code does not compile:
use std::str::Chars;

struct Chunks {
    remaining: Chars,
}

impl Chunks {
    fn new(s: String) -> Self {
        Chunks {
            remaining: s.chars(),
        }
    }
}

The error message is:
error[E0106]: missing lifetime specifier
 --> src/main.rs:4:16
  |
4 |     remaining: Chars,
  |                ^^^^^ expected lifetime parameter

Chars doesn't own the characters it iterates over, and it can't outlive the &str or String it was created from.
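For contrast, here is a minimal sketch of the borrowed form. It compiles once Chunks declares the lifetime of the &str it borrows. The next_char method is a hypothetical helper added only for illustration. The catch is the one described above: a Chunks<'a> can never outlive the string it was created from, so it cannot take ownership of a String.

```rust
use std::str::Chars;

// Borrowed form: `Chunks` carries the lifetime of the `&str` it iterates,
// so it cannot outlive the string it was created from.
struct Chunks<'a> {
    remaining: Chars<'a>,
}

impl<'a> Chunks<'a> {
    fn new(s: &'a str) -> Self {
        Chunks { remaining: s.chars() }
    }

    // Hypothetical helper for illustration: pull the next char.
    fn next_char(&mut self) -> Option<char> {
        self.remaining.next()
    }
}
```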

Is there an owned version of Chars that doesn't need a lifetime parameter, or do I have to keep a Vec<char> and an index myself?


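By the way: it would be better to make Chunks generic over the iterator type: struct Chunks<Iter> {...}, impl<Iter: Iterator<Item = char>> Chunks<Iter> {..}, fn new<IntoIter: IntoIterator<Item = char>>(iter: IntoIter) -> Self {...}. That way it doesn't care where the chars come from; they could be owned, borrowed, or anything else. - chbaker0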
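Here is a minimal sketch of that suggestion. For new to return Chunks<Iter>, the bound also needs IntoIter = Iter, so that the argument's iterator type matches the struct parameter. The next_chunk method is a hypothetical helper and not part of the question.

```rust
// `Chunks` is generic over any iterator of chars, so it doesn't care
// whether the chars are owned or borrowed.
struct Chunks<Iter> {
    remaining: Iter,
}

impl<Iter: Iterator<Item = char>> Chunks<Iter> {
    // `IntoIter = Iter` ties the argument's iterator type to the struct parameter.
    fn new<I: IntoIterator<Item = char, IntoIter = Iter>>(iter: I) -> Self {
        Chunks { remaining: iter.into_iter() }
    }

    // Hypothetical helper for illustration: take up to `n` chars as a String.
    fn next_chunk(&mut self, n: usize) -> String {
        self.remaining.by_ref().take(n).collect()
    }
}
```

Both Chunks::new("abcde".chars()) (borrowed) and Chunks::new(vec!['x', 'y']) (owned) type-check with this bound.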
6 answers

7

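There is also the owned-chars crate, which provides an extension trait with two methods: into_chars and into_char_indices. These mirror String::chars and String::char_indices, but the iterators they create consume the String instead of borrowing it.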


6

std::vec::IntoIter is, in a sense, an owned version of any iterator.

use std::vec::IntoIter;

struct Chunks {
    remaining: IntoIter<char>,
}

impl Chunks {
    fn new(s: String) -> Self {
        Chunks {
            remaining: s.chars().collect::<Vec<_>>().into_iter(),
        }
    }
}

The downside is an extra allocation and some space overhead, but I'm not aware of an owned iterator for your specific case.
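To put a rough number on the space overhead: a char is always 4 bytes, while a String stores UTF-8. ASCII text therefore takes four times as much room in a Vec<char>. The sketch below counts only element storage and ignores any spare capacity. The helper names are hypothetical.

```rust
use std::mem::size_of;

// Bytes used by the text itself when stored as UTF-8 in a String.
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

// Bytes used by the element storage of a Vec<char> holding the same text:
// every char is a 4-byte Unicode scalar value, whatever the text is.
fn vec_char_bytes(s: &str) -> usize {
    let v: Vec<char> = s.chars().collect();
    v.len() * size_of::<char>()
}
```

For "hello" that is 5 bytes versus 20. For "héllo" it is 6 versus 20.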


5

Ouroboros

You can use the ouroboros crate to create a self-referential struct containing the String and a Chars iterator:

use ouroboros::self_referencing; // 0.4.1
use std::str::Chars;

#[self_referencing]
pub struct IntoChars {
    string: String,
    #[borrows(string)]
    chars: Chars<'this>,
}

// All these implementations are based on what `Chars` implements itself

impl Iterator for IntoChars {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.next())
    }

    #[inline]
    fn count(mut self) -> usize {
        self.with_mut(|me| me.chars.count())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.with(|me| me.chars.size_hint())
    }

    #[inline]
    fn last(mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.last())
    }
}

impl DoubleEndedIterator for IntoChars {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.next_back())
    }
}

impl std::iter::FusedIterator for IntoChars {}

// And an extension trait for convenience

trait IntoCharsExt {
    fn into_chars(self) -> IntoChars;
}

impl IntoCharsExt for String {
    fn into_chars(self) -> IntoChars {
        IntoCharsBuilder {
            string: self,
            chars_builder: |s| s.chars(),
        }
        .build()
    }
}

Rental

You can use the rental crate to create a self-referential struct containing the String and a Chars iterator:

#[macro_use]
extern crate rental;

rental! {
    mod into_chars {
        pub use std::str::Chars;

        #[rental]
        pub struct IntoChars {
            string: String,
            chars: Chars<'string>,
        }
    }
}

use into_chars::IntoChars;

// All these implementations are based on what `Chars` implements itself

impl Iterator for IntoChars {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.next())
    }

    #[inline]
    fn count(mut self) -> usize {
        self.rent_mut(|chars| chars.count())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.rent(|chars| chars.size_hint())
    }

    #[inline]
    fn last(mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.last())
    }
}

impl DoubleEndedIterator for IntoChars {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.next_back())
    }
}

impl std::iter::FusedIterator for IntoChars {}

// And an extension trait for convenience 

trait IntoCharsExt {
    fn into_chars(self) -> IntoChars;
}

impl IntoCharsExt for String {
    fn into_chars(self) -> IntoChars {
        IntoChars::new(self, |s| s.chars())
    }
}


3

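Here is a solution that doesn't use unsafe.

It has the same effect as s.chars().collect::<Vec<_>>().into_iter(), but without the allocation overhead.

Furthermore, it is probably about as fast as it gets. It never reallocates and never iterates over the same data twice. It simply steps from one character to the next, each step in O(1), for O(n) in total. That is also the lower bound for iterating over anything.

Most importantly, it is not self-referential. So this approach is probably what you want: it combines the advantages of the other answers without any of their drawbacks.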

struct OwnedChars {
    s: String,
    index: usize,
}

impl OwnedChars {
    pub fn new(s: String) -> Self {
        Self { s, index: 0 }
    }
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        // Slice of leftover characters
        let slice = &self.s[self.index..];

        // Iterator over leftover characters
        let mut chars = slice.chars();

        // Query the next char
        let next_char = chars.next()?;

        // Compute the new index by looking at how many bytes are left
        // after querying the next char
        self.index = self.s.len() - chars.as_str().len();

        // Return next char
        Some(next_char)
    }
}
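The index update above works by measuring how many bytes remain after taking the next char. An equivalent approach is to advance the index by char::len_utf8 of the char just returned. It is sketched here as a variant, not as the answer's own code. The sketch also reports a size_hint based on UTF-8 using 1 to 4 bytes per char.

```rust
// Variant of OwnedChars: advance the byte index by the UTF-8 length of the
// char just returned, and report a size hint like `Chars` does.
struct OwnedChars {
    s: String,
    index: usize, // byte offset of the next char; always on a char boundary
}

impl OwnedChars {
    fn new(s: String) -> Self {
        Self { s, index: 0 }
    }
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let c = self.s[self.index..].chars().next()?;
        // Move past the bytes of `c`, landing on the next char boundary.
        self.index += c.len_utf8();
        Some(c)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Each remaining char takes between 1 and 4 bytes.
        let rest = self.s.len() - self.index;
        ((rest + 3) / 4, Some(rest))
    }
}
```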

Add a little trait magic:

trait StringExt {
    fn into_chars(self) -> OwnedChars;
}
impl StringExt for String {
    fn into_chars(self) -> OwnedChars {
        OwnedChars::new(self)
    }
}

and you can do:

struct Chunks {
    remaining: OwnedChars,
}

impl Chunks {
    fn new(s: String) -> Self {
        Chunks {
            remaining: s.into_chars(),
        }
    }
}

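I'd guess s.chars().collect::<Vec<_>>().into_iter() doesn't allocate, since that's a special case handled by the standard library.
@StephenChung I'm not sure. I was under the impression that it does allocate. Why do you think it doesn't?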
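On the question in these comments: chars() only borrows the String, and the String is still alive while the Vec<char> is being built. The chars therefore have to go into a separate buffer. The Vec documentation guarantees that a Vec has allocated exactly when size_of::<T>() * capacity() > 0, and the sketch below checks this. The collect_allocates name is hypothetical.

```rust
use std::mem::size_of;

// Hypothetical helper: does collecting the chars of `s` produce a Vec<char>
// with its own heap buffer, separate from the string's bytes?
fn collect_allocates(s: &str) -> bool {
    // `chars()` only borrows `s`, so the chars must go into a new buffer.
    let v: Vec<char> = s.chars().collect();
    // Per the Vec docs, a Vec has allocated iff size_of::<T>() * capacity() > 0.
    let allocated = size_of::<char>() * v.capacity() > 0;
    // Both buffers are alive here, so they cannot be the same allocation.
    let separate = s.as_ptr() as usize != v.as_ptr() as usize;
    allocated && separate
}
```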

1

Copied from "How can I store a Chars iterator in the same struct as the String it is iterating on?":

use std::mem;
use std::str::Chars;

/// I believe this struct to be safe because the String is
/// heap-allocated (stable address) and will never be modified
/// (stable address). `chars` will not outlive the struct, so
/// lying about the lifetime should be fine.
///
/// Struct fields are dropped in declaration order, so `chars` is
/// declared first and is dropped before the String it points into.
/// (`Chars` has no destructor, so this is just for good measure.)
struct OwningChars {
    chars: Chars<'static>,
    _s: String,
}

impl OwningChars {
    fn new(s: String) -> Self {
        let chars = unsafe { mem::transmute(s.chars()) };
        OwningChars { _s: s, chars }
    }
}

impl Iterator for OwningChars {
    type Item = char;
    fn next(&mut self) -> Option<Self::Item> {
        self.chars.next()
    }
}
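The "stable address" argument in the doc comment depends on one fact: moving a String copies only its pointer, length and capacity, and the heap buffer itself stays where it is. The check below uses a hypothetical helper function.

```rust
// Moving a String copies only its (pointer, length, capacity) header;
// the heap buffer holding the bytes stays where it is.
fn heap_ptr_survives_move() -> bool {
    let s = String::from("hello");
    let before = s.as_ptr();
    // Move the String into a Vec, i.e. to a different memory location.
    let holder = vec![s];
    let after = holder[0].as_ptr();
    before == after
}
```

Operations that can reallocate the buffer (push_str, reserve, shrink_to_fit) break this guarantee. That is why the comment also requires that the String never be modified.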

0
You could implement your own iterator, or wrap Chars like this (it needs only one small unsafe block):
// deriving Clone would be buggy. With Rc<>/Arc<> instead of Box<> it would work though.
struct OwnedChars {
    // struct fields are dropped in order they are declared,
    // see https://dev59.com/NlgR5IYBdhLWcg3wp-u5#41056727
    // with `Chars` it probably doesn't matter, but for good style `inner`
    // should be dropped before `storage`.

    // 'static lifetime must not "escape" lifetime of the struct
    inner: ::std::str::Chars<'static>,
    // we need to box anyway to be sure the inner reference doesn't move when
    // moving the storage, so we can erase the type as well.
    // struct OwnedChars<S: AsRef<str>> { ..., storage: Box<S> } should work too
    storage: Box<dyn AsRef<str>>,
}

impl OwnedChars {
    pub fn new<S: AsRef<str>+'static>(s: S) -> Self {
        let storage = Box::new(s) as Box<dyn AsRef<str>>;
        let raw_ptr : *const str = storage.as_ref().as_ref();
        let ptr : &'static str = unsafe { &*raw_ptr };
        OwnedChars{
            storage: storage,
            inner: ptr.chars(),
        }
    }

    pub fn as_str(&self) -> &str {
        self.inner.as_str()
    }
}

impl Iterator for OwnedChars {
    // just `char` of course
    type Item = <::std::str::Chars<'static> as Iterator>::Item;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

impl DoubleEndedIterator for OwnedChars {
    fn next_back(&mut self) -> Option<Self::Item> {
        self.inner.next_back()
    }
}

impl Clone for OwnedChars {
    fn clone(&self) -> Self {
        // need a new allocation anyway, so simply go for String, and just
        // clone the remaining string
        OwnedChars::new(String::from(self.inner.as_str()))
    }
}

impl ::std::fmt::Debug for OwnedChars {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        let storage : &str = self.storage.as_ref().as_ref();
        f.debug_struct("OwnedChars")
            .field("storage", &storage)
            .field("inner", &self.inner)
            .finish()
    }
}

// easy access
trait StringExt {
    fn owned_chars(self) -> OwnedChars;
}
impl<S: AsRef<str>+'static> StringExt for S {
    fn owned_chars(self) -> OwnedChars {
        OwnedChars::new(self)
    }
}


2
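Same thing with the rental crate, but unfortunately it doesn't work in the playground. - red75prime
Why the extra box? S can only be String, Box<str>, or some other kind of owning str reference, right? So the storage must be heap-allocated (if it isn't 'static) and therefore won't move until S is dropped. (As long as OwnedChars doesn't push things onto it or otherwise trigger a move.) - trent
I could create a string storage type with small-string optimization (see smallvec). - Stefan
@Stefan Ah, good point. But it looks like the normal use of this struct is when you already have a String, and in that case it's double-boxed. Do you think it would be safe to store a Box<str> instead and have new<S: Into<Box<str>>>? That would work for any reference as well as for owned Strings, would copy the contents only when necessary, and wouldn't double-box. - trent
I'm not sure about the allocation overhead of converting a String to a Box<str>. If it reuses the Vec memory, it should be faster, right? If you know you only want to do this with Strings, you can of course use an unboxed String; as far as I know, a String is guaranteed to be heap-allocated. - Stefan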
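Here is a sketch of the storage bound trent suggests. Any S: Into<Box<str>> accepts both a &str, which is copied into a new allocation, and a String, which is converted. On Stefan's question: String::into_boxed_str is documented to discard excess capacity first, and that step may reallocate and copy the bytes. The into_storage name is hypothetical.

```rust
// Hypothetical helper: accept owned or borrowed text and store it as Box<str>.
fn into_storage<S: Into<Box<str>>>(s: S) -> Box<str> {
    // Uses the std impls `From<&str> for Box<str>` and `From<String> for Box<str>`.
    s.into()
}
```

A Box<str> still derefs to str, so calling .chars() on the stored value works as before.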

Content provided by Stack Overflow (translated from the original English post).