使用`std::search`而不是`string::find`

Question

使用`std::search`而不是`string::find`

13

我有一个关于使用 std::search 和 string::find 处理字符串的问题。我知道通常最好使用类特定成员函数算法而不是标准库算法，因为它可以基于类进行优化，但我想知道是否合理，比如为了保持一致性，使用迭代器而不是索引来使用 std::search 而非 string::find。

如果我这样做会犯什么错误吗？还是应该坚持使用 string::find？从性能或风格方面来看，这两者之间有巨大的优势吗？

- gowrath

std::string::find 可以使用 Boyer-Moore 算法等方法来加速查找。我不确定 std::find 是否允许使用这样的方法。在处理字符串时，我建议仍然使用字符串版本。 - NathanOliver

@NathanOliver 实际上，std::string::find 与 Boyer-Moore 搜索器不兼容，而 std::search 可以。或者你是指可以使用 Boyer-Moore 实现 std::string::find？ - Corristo

我和@Corristo有类似的想法，但有点困惑。为什么std::search提供了执行Boyer-Moore算法的功能，而string::find没有（至少在文档中没有正式说明）？ - gowrath

在使用std::string时，使用索引是相当愚蠢的，因为这应该被认为是接口中的明显错误。如果符合您的审美选择并且不会极大地降低性能，我建议使用std::search。 - Passer By

如果性能是您关注的问题，您可能会对这个SO问题感兴趣：https://dev59.com/FJLea4cB1Zd3GeqP0UzV - kreuzerkrieg

2个回答

8

string::find 使用线性搜索，但对于某些情况（最新修补程序）比 Boyer-Moore 快数倍。我向 libstdc++ 和 libc++ 提交了一个修补程序（首元素再 memcomp），大大提高了 string::find 的性能。你可以尝试使用最新的 gcc（7.1），你会得到更好的性能表现。你也可以使用我编写的简单基准测试套件来测量性能：https://github.com/hiraditya/std-benchmark

特别是对于较小的字符串，在 Boyer-Moore 构造内部数据结构的时间内，(子)线性 string::find 就已经完成了。另外，在解析 HTML 等需要执行大量不匹配搜索的情况下，string::find 应该更快。

commit fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
Author: redi <redi@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 9 13:05:58 2017 +0000

    PR66414 optimize std::string::find

    2017-01-09  Jonathan Wakely  <jwakely@redhat.com>
            Aditya Kumar  <hiraditya@msn.com>

        PR libstdc++/66414
        * include/bits/basic_string.tcc
        (basic_string::find(const CharT*, size_type, size_type)): Optimize.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244225 138bc75d-0d04-0410-961f-82ee72b054a4

提示：使用std :: find总是比当前实现的std :: string :: find慢。

- A. K.

有没有可能比“某些情况”更具体？我正在使用 std::search / std::boyer_moore_searcher 和 std::string::find 进行测试，发现后者总是快了约10-15倍。在我的情况下，干草堆最多只有几千个字符。GCC 9.2.1. - Lieuwe

1

@Lieuwe 这是一份演示文稿，其中详细介绍了为什么 string::find 已经有了很大的改进。第 6-7 页提出了相关想法。https://github.com/hiraditya/std-benchmark/blob/master/docs/slides/slide-cppnow.pdf，我希望您也会发现其他幻灯片有用。 - A. K.

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Corristo · Accepted Answer

当前（2017年4月27日），至少GCC的libstdc++（也是clang默认使用的）采用线性搜索实现std::string::find，因此比使用Boyer-Moore算法或Knuth-Morris-Pratt算法等更慢。

std::string_view substr{"whatever"};
auto it = std::search(s.cbegin(), s.cend(),
                      std::boyer_moore_searcher(substr.begin(), substr.end()));

问题在于Boyer-Moore搜索器会为内部数据结构分配内存，因此可能会出现std::bad_alloc异常。然而，std::string::find被标记为noexcept，因此在std::string::find中使用已经实现的Boyer-Moore搜索器并不直接。