将map<string, int>的映射反转为vector<vector<string>>

4

在我之前的问题(链接)的基础上,我有一个单词及其计数存储在map<string, int>中。我想将其反转,使所有具有相同计数的单词分组在一起。我的解决方案是使用vector<vector<string> >。第一个向量的索引是计数,第二个向量是单词的集合。

在阅读之前问题的答案后,这是我一直在尝试实现的:

  vector<vector<string> > sorted_words;
    for (map<string, int>::const_iterator it = counters.begin();
       it != counters.end(); ++it) {
    cout << "word:" << it->first
         << " count:" << it-> second
         << " vector size: " << sorted_words.size()
         << endl;

    if (sorted_words.size() - 1 > it->second && 
        sorted_words[ it->second ].size() > 0) {
      cout << "Adding " << it->first << endl;
      sorted_words[ it->second ].push_back(it->first);
    } else {
      cout << "Creating " << it->first << endl;
      vector<string> v;
      v.push_back(it->first);
      sorted_words.resize( it->second + 1 );
      sorted_words[it->second] = v;
    }
  }

在循环的第一次迭代中,if语句会导致段错误。

我的目标是检查外部向量的大小是否足够容纳当前值,如果是,我是否已经创建了嵌套向量。(我需要这样做,因为映射可以以任何顺序返回。例如,第一个元素可能是<"foo", 3>。)

如果我用的方法基本上不符合C++的方式,请随时指出。


1
map<int, vector<string> > 是一种实现方式。 - nhahtdh
那么基本上我使用 vector<vector<string>> 是走了错误的路吗? - They Call Me Bruce
1
取决于大部分索引是否已满(即存在大小为1、2、3等的单词),则vector<vector<string>>是一个不错的选择;否则,使用map更好。 - Eli Algranti
vector-of-vectors 的问题在于你的外部向量很可能是稀疏的——如果你有几个单词出现了数千次,但没有单词出现在100到1000次之间怎么办?你将会有900个无用的向量! - John Zwinck
你可以使用 std::multimap <int,std :: string>,不过我个人更喜欢 std::map<int,std :: vector <std :: string>> - WhozCraig
一般情况下,无法确定索引是否已满,因为它取决于数据。对于 1000 或 10000 个字符串的较小案例,vectormap 都可以使用,但超出此范围,我预计在内存中存储字符串的问题会超过使用哪种数据结构的问题。 - nhahtdh
4个回答

4
快速猜测: sorted_words.size() 是一些无符号类型(即 size_t),因此 sorted_words.size() - 1 在应该是-1(最初)时仍然是无符号的,因此您始终通过第一个条件,if条件的第二个部分被评估并崩溃。

3

如果涉及到空间问题,你最好使用 std::map<int, std::vector<string>>。下面的代码很简单(但可以通过将所有单词转为小写并去除标点来改进):

#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <map>
using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2)
        return EXIT_FAILURE;

    // map of strings to counts.
    std::map<string, int> strs;
    ifstream inf(argv[1]);
    string str;
    while (inf >> str)
        ++strs[str];

    // map of counts to strings, smallest to largest.
    std::map<int, std::vector<string>> vals;
    for (auto it : strs)
        vals[ it.second ].push_back(it.first);

    // report counts for each
    for (auto it : vals)
    {
        cout << "Count: " << it.first << ": ";
        std::copy(it.second.begin(), it.second.end(),
                  ostream_iterator<string>(cout, " "));
        cout << endl;
    }
}

样例输入

我选择了《皆大欢喜》中威廉·莎士比亚的独白,这段独白有一些有趣的特点,稍后您会看到:

All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts,
His acts being seven ages. At first, the infant,
Mewling and puking in the nurse's arms.
And then the whining school-boy, with his satchel
And shining morning face, creeping like snail
Unwillingly to school. And then the lover,
Sighing like furnace, with a woeful ballad
Made to his mistress' eyebrow. Then a soldier,
Full of strange oaths and bearded like the pard,
Jealous in honour, sudden and quick in quarrel,
Seeking the bubble reputation
Even in the cannon's mouth. And then the justice,
In fair round belly with good capon lined,
With eyes severe and beard of formal cut,
Full of wise saws and modern instances;
And so he plays his part. The sixth age shifts
Into the lean and slipper'd pantaloon,
With spectacles on nose and pouch on side,
His youthful hose, well saved, a world too wide
For his shrunk shank; and his big manly voice,
Turning again toward childish treble, pipes
And whistles in his sound. Last scene of all,
That ends this strange eventful history,
Is second childishness and mere oblivion,
Sans teeth, sans eyes, sans taste, sans everything.

样例输出

 Count: 1: All At Even For In Into Is Jealous Last Made Mewling Sans Seeking Sighing That The Then They Turning Unwillingly acts again age ages. all all, arms. ballad beard bearded being belly big bubble cannon's capon childish childishness creeping cut, ends entrances; eventful everything. exits eyebrow. eyes eyes, face, fair first, formal furnace, good have he history, honour, hose, infant, instances; justice, lean lined, lover, man manly many men mere merely mistress' modern morning mouth. nose nurse's oaths oblivion, one pantaloon, pard, part. parts, pipes players: pouch puking quarrel, quick reputation round satchel saved, saws scene school-boy, school. second seven severe shank; shifts shining shrunk side, sixth slipper'd snail so soldier, sound. spectacles stage, sudden taste, teeth, this time too toward treble, voice, well whining whistles wide wise woeful women world world's youthful 
 Count: 2: Full His With on plays strange their to 
 Count: 3: like sans then with 
 Count: 4: a of 
 Count: 6: in 
 Count: 7: his 
 Count: 8: And 
 Count: 11: and the 

有趣的是这篇独白中独特词汇串的数量如此之多。几乎像他事先计划好了一样。然而,如果考虑大小写和标点符号的情况,数字显然是不同的。幸运的是,这也很容易做到,只需更改第一个while循环:

while (inf >> str)
{
    string alpha;
    for_each(str.begin(), str.end(),
            [](char& c){c=tolower(static_cast<unsigned char>(c));});
    copy_if(str.begin(), str.end(), back_inserter(alpha),
            [](const char& c){return isalpha(static_cast<unsigned char>(c));});
    ++strs[alpha];
}

这给我们带来了以下结果:
 Count: 1: acts again age ages arms at ballad beard bearded being belly big bubble cannons capon childish childishness creeping cut ends entrances even eventful everything exits eyebrow face fair first for formal furnace good have he history honour hose infant instances into is jealous justice last lean lined lover made man manly many men mere merely mewling mistress modern morning mouth nose nurses oaths oblivion one pantaloon pard part parts pipes players pouch puking quarrel quick reputation round satchel saved saws scene school schoolboy second seeking seven severe shank shifts shining shrunk side sighing sixth slipperd snail so soldier sound spectacles stage sudden taste teeth that they this time too toward treble turning unwillingly voice well whining whistles wide wise woeful women world worlds youthful 
 Count: 2: eyes full on plays strange their to 
 Count: 3: all like 
 Count: 4: a of sans then 
 Count: 5: with 
 Count: 7: in 
 Count: 9: his 
 Count: 12: the 
 Count: 19: and 

还是相当不错的,Billy。

由于第一张地图排序的性质,您可以按计数字母顺序获取结果单词列表。额外功能太棒了。


0

如果你想要反转一个 map<K, V>,可以使用一个 map<V, vector<K>>。如果你不关心此时实际计数是多少,你可以通过从中间的 map 移动来高效地构建 vector<vector<V>>。例如:

vector<vector<string>> invert(const map<string, int>& input) {

  map<int, vector<string>> inverse;
  for (const auto& pair : input)
    inverse[pair.second].push_back(pair.first);

  vector<vector<string>> result;
  for (auto& pair : inverse)
    result.push_back(move(pair.second));

  return result;

}

0

不要使用vector<vector<string>>或倒置映射map<int, vector>,而是可以利用已有的映射(map<string, int>),创建一个vector<vector<int>>来收集所有具有相同计数的单词的索引。这将节省大量空间。此外,当修改之前的map<string, int>时,您只需要更新vector<vector<int>>中的索引,这比其他两种方法更快。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接