How to use boost split to split a string and ignore empty values?

Question


23

I am using boost::split to parse a data file. The data file contains lines like the following.

data.txt

1:1~15  ASTKGPSVFPLAPSS SVFPLAPSS   -12.6   98.3

The whitespace between the items is tabs. The code I use to split the above line is as follows.

std::string buf;
/*Assign the line from the file to buf*/
std::vector<std::string> dataLine;
boost::split( dataLine, buf , boost::is_any_of("\t "), boost::token_compress_on);       //Split data line
cout << dataLine.size() << endl;

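For the line above I should get an output of 5, but I get 6. I have tried reading through the documentation, and this solution seems like it should do what I want, so clearly I am missing something. Thanks!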

Edit: Running the following for loop on dataLine gives the output shown below.

cout << "****" << endl;
for(int i = 0 ; i < dataLine.size() ; i ++) cout << dataLine[i] << endl;
cout << "****" << endl;


****
1:1~15
ASTKGPSVFPLAPSS
SVFPLAPSS
-12.6
98.3

****

- PhiloEpisteme

What values are stored in dataLine? - Anon Mail

I get 5; your "buf" must contain something else. - Jesse Good

Maybe it did not copy over to this page correctly, or you copied it into your test code incorrectly. Let me make sure it copies properly. - PhiloEpisteme

1

Wouldn't using just one of the boost::algorithm::trim variants be enough? - Dan Lecocq

I don't think so. That would only trim the string's leading and trailing whitespace (and not just space characters), right? - PhiloEpisteme


3 Answers

7

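I would suggest using the C++ String Toolkit Library (strtk). In my opinion this library is much faster than Boost. I used to use Boost to split (that is, tokenize) a line of text, but found that this library fits my needs better. One of the great things about strtk::parse is that it converts the tokens to their final values and checks the number of elements.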

You can use it like this:

std::vector<std::string> tokens;

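// here dataLine is the std::string holding one line of the file,
// not the std::vector from the question
// multiple delimiters should be treated as one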
if( !strtk::parse( dataLine, "\t", tokens ) )
{
    std::cout << "failed" << std::endl;
}

--- another version

std::string token1;
std::string token2;
std::string token3;
float value1;
float value2;

if( !strtk::parse( dataLine, "\t", token1, token2, token3, value1, value2) )
{
     std::cout << "failed" << std::endl;
     // fails if the number of elements is not what you want
}

Here is the library's online documentation: String Tokenizer Documentation. Link to the source code: C++ String Toolkit Library

- DannyK

I may consider moving to the STL for my needs in the future, but right now I have a lot of code that uses boost. - PhiloEpisteme

14

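I have a lot of code that uses the Boost libraries, and I used the Boost tokenizer as well. But because it was slow, I converted this particular functionality to strtk. strtk can also convert tokens to numbers, so switching was a no-brainer for me. - DannyK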

@DannyK This library seems to have been quite popular around 2013. Do you think it is still useful in 2022? - Typewar

1

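boost::split intentionally keeps leading and trailing whitespace, because it does not know whether it is significant. The solution is to call boost::trim before calling boost::split.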

#include <boost/algorithm/string/trim.hpp>

....

boost::trim(buf);

- Jesse Good

1

Before the call? Usually you would split first and then trim the tokens, right? - Nick

@Nick: It depends. In the original question, the user is splitting a tab-delimited file, so trimming first is correct. - Jesse Good


- Oberon · Accepted Answer

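Even though "adjacent delimiters are merged together", it seems the trailing delimiters are the problem: even when they are treated as one, that is still one delimiter, and it is followed by an empty token.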

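So your problem cannot be solved with split() alone. Luckily, Boost String Algo has trim() and trim_if(), which strip whitespace or delimiters from the beginning and end of a string. So just call trim() on buf, like this: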

std::string buf = "1:1~15  ASTKGPSVFPLAPSS SVFPLAPSS   -12.6   98.3    ";
std::vector<std::string> dataLine;
boost::trim_if(buf, boost::is_any_of("\t ")); // could also use plain boost::trim
boost::split(dataLine, buf, boost::is_any_of("\t "), boost::token_compress_on);
std::cout << dataLine.size() << std::endl; // prints 5
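
If trimming first is not convenient, another option is to never emit empty tokens at all. This is a different technique from the one in this answer. The sketch below uses only the standard library, and the helper name `split_nonempty` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects maximal runs of non-delimiter characters,
// so leading, trailing, and repeated delimiters never produce empty tokens.
std::vector<std::string> split_nonempty(const std::string& line,
                                        const std::string& delims = "\t ") {
    std::vector<std::string> tokens;
    std::string::size_type start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        // end is npos when the token runs to the end of the line;
        // substr then simply takes the rest of the string.
        const std::string::size_type end = line.find_first_of(delims, start);
        tokens.push_back(line.substr(start, end - start));
        start = line.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Calling `split_nonempty(buf)` on the string above should give the same five fields, without modifying `buf` and without a separate trim step.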

This has been asked before: boost::split leaves empty tokens at the beginning and end of string - is this desired behaviour?