Splitting a std::string and inserting it into a std::set

Question

c++

5

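At the request of the fantastic fellows over in the C++ chat room: what is a good way to break up a file (in my case, one containing about 100 lines of text with roughly 10 words per line) and insert all of those words into a std::set?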

- Nico Bellic

I'm not sure what you mean by the verb "to index". Perhaps you mean "and insert all of those words into a std::set"? - Robᵩ

3 Answers

3

Assuming you have already read the file into a string, boost::split will do the job:

#include <iostream>
#include <set>
#include <string>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>

int main()
{
    std::string astring = "abc 123 abc 123\ndef 456 def 456";  // your string
    std::set<std::string> tokens;                              // this will receive the words
    boost::split(tokens, astring, boost::is_any_of("\n "));    // split on space & newline

    // Print the individual words
    BOOST_FOREACH(const std::string& token, tokens){
        std::cout << "\n" << token << std::endl;
    }
    return 0;
}

If needed, you can use a list or a vector instead of the set.

Also, note that this is nearly a duplicate of: How do I split a string in C++?
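If Boost is not available, a similar result can be sketched with the standard library alone. This is an alternative under stated assumptions, not the method used above: `split_words` is a hypothetical helper name, and it relies on `operator>>` treating any run of whitespace as a separator, so unlike the `boost::split` call it never produces empty tokens.

```cpp
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or of the answer above):
// collects the whitespace-separated words of `text` into any container
// that can be constructed from a pair of iterators.
template <typename Container>
Container split_words(const std::string& text)
{
    std::istringstream stream(text);
    // operator>> skips spaces, tabs and newlines between words
    std::istream_iterator<std::string> first(stream), last;
    return Container(first, last);
}
```

With the sample string from the answer, `split_words<std::set<std::string> >(astring)` should hold the four distinct words, while `split_words<std::vector<std::string> >(astring)` keeps all eight words in their original order.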

- Josh

2

#include <set>
#include <iostream>
#include <string>

int main()
{
  std::string temp, mystring;
  std::set<std::string> myset;

  while(std::getline(std::cin, temp))
      mystring += temp + ' ';
  temp = "";      

  for (size_t i = 0; i < mystring.length(); i++)
  {
    if (mystring.at(i) == ' ' || mystring.at(i) == '\n' || mystring.at(i) == '\t')
    {
      // Consecutive delimiters leave temp empty; only insert real words
      if (!temp.empty())
        myset.insert(temp);
      temp = "";
    }
    else
    {
      temp.push_back(mystring.at(i));
    }
  }
  // Insert the final word, if there is one
  if (!temp.empty())
    myset.insert(temp);

  for (std::set<std::string>::iterator i = myset.begin(); i != myset.end(); i++)
  {
    std::cout << *i << std::endl;
  }
  return 0;
}

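Let's start from the top. First, you need a few variables to work with. `temp` is just a placeholder that holds each word while it is built up, character by character, from the string being parsed. `mystring` is the string to split, and `myset` is where the split-out strings will go.

Then we read the file (redirected into the program with `<`) and append its contents to `mystring`.

Now we iterate along the length of the string, looking for spaces, newlines, or tabs to split on. If we find one of those characters, we insert the string into the set and empty the placeholder string; otherwise, we append the character to the placeholder, which builds up the string. Once we're done, we need to add the last string to the set.

Finally, we iterate over the set and print each string. This is only for verification, but it could be useful for other purposes too.

Edit: Loki Astari provided a significant improvement to my code in a comment, which I think should be integrated into the answer: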
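The scan described above can also be written with `std::string::find_first_of` and `find_first_not_of`, which locate word boundaries without building each word one character at a time. This is a sketch of a variation, not the code from the answer: `split_on_whitespace` is a hypothetical name, and skipping runs of delimiters (so that no empty string ends up in the set) is an assumption about the desired behaviour.

```cpp
#include <set>
#include <string>

// Hypothetical helper: splits `text` on spaces, newlines and tabs and
// returns the distinct words. Empty tokens are never inserted.
std::set<std::string> split_on_whitespace(const std::string& text)
{
    const std::string delims = " \n\t";
    std::set<std::string> words;

    // Find the start of the first word
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // Find the delimiter that ends this word (npos if it runs to the end)
        std::string::size_type end = text.find_first_of(delims, start);
        words.insert(text.substr(start, end - start));
        // Move to the start of the next word, skipping any run of delimiters
        start = text.find_first_not_of(delims, end);
    }
    return words;
}
```

Returning the set by value keeps the helper easy to call, for example on the `mystring` buffer assembled from `std::cin`.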

#include <set>
#include <iostream>
#include <string>
#include <utility>  // std::move

int main()
{
  std::set<std::string> myset;
  std::string word;

  while(std::cin >> word)
  {
      myset.insert(std::move(word));
  }

  for(std::set<std::string>::const_iterator it=myset.begin(); it!=myset.end(); ++it)
    std::cout << *it << '\n';
}

- Drise

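Although Drise's code is more verbose, it seems to run faster than Mooning Duck's. - Lukas Schmelzeisen

So there's less overhead in handling it by hand than in letting <algorithm> handle it? - Drise

@LukasSchmelzeisen: If that is true, then you are doing something else wrong. - Martin York

@LokiAstari: My name only has one 'n' :P And I agree: from what I can see, my code should be considerably faster than Drise's. Here is a version based on Drise's that should be faster than both of our answers, which I coded up for no real reason: http://ideone.com/VzQi5 - Mooing Duck

@LokiAstari Unfortunately, that is the way my university professors taught it, so I'm still stuck in that mindset. I am working on breaking the habit, though. - Drise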

- Mooing Duck · Accepted Answer

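The easiest way to construct any container is to use the constructor that takes a pair of iterators delimiting the range of elements to copy from. Use istream_iterator to iterate over a stream.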

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

using namespace std;

int main()
{
  //I create an iterator that retrieves `string` objects from `cin`
  auto begin = istream_iterator<string>(cin);
  //I create an iterator that represents the end of a stream
  auto end = istream_iterator<string>();
  //and iterate over the file, and copy those elements into my `set`
  set<string> myset(begin, end);

  //this line copies the elements in the set to `cout`
  //I have this to verify that I did it all right
  copy(myset.begin(), myset.end(), ostream_iterator<string>(cout, "\n"));
  return 0;
}

http://ideone.com/iz1q0
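The same iterator-pair constructor works with a file stream when the input is a named file instead of redirected standard input. The following is a minimal sketch under assumptions: `words_in_file` and its `path` parameter are hypothetical names introduced here, and treating an unreadable file as an empty set is a choice made for brevity.

```cpp
#include <fstream>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper: returns the distinct whitespace-separated words in
// the file at `path`. A file that cannot be opened yields an empty set.
std::set<std::string> words_in_file(const std::string& path)
{
    std::ifstream file(path.c_str());
    // The first read fails on an unopened stream, so `first` equals `last`
    std::istream_iterator<std::string> first(file), last;
    return std::set<std::string>(first, last);
}
```

A caller that needs to tell a missing file apart from an empty one would have to check the stream state itself before constructing the set.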