How to display the top 5 most frequent words from a "bad words" array in a user's input string?

3

I have an array of bad words and a function that looks for them:

let words = textWords.split(' ');
console.log('words', words)
let badWords = listOfBadWords.join('|');
let regex = new RegExp(badWords, 'gi');

let matches = words.reduce((acc, word) => {
  if (word.match(regex)) {
    let match = word.toLowerCase();
    acc[match] = (acc[match] || 0) + 1;
  }
  return acc;
}, {});

let sortedMatches = Object.entries(matches).sort((a, b) => b[1] - a[1]);
let top5 = sortedMatches.slice(0, 5).map(m => m[0]);

console.log(sortedMatches, top5); // in format [['word', numberOfUsing], ....]

How can I format the output as the top five most-used words, in the form: word - number of occurrences?


1
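Do you mean you want to display the bad words the user used most? That requires data. - m3ow
Yes, I have an array of bad words and an array of the words the user entered. - Parcurcik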
2 Answers

3

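Personally, I wouldn't use a regular expression here. I would create an object that holds a counter for each word in the badWords array and simply iterate over the user's words to increment the counters. Something like: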

const badWords = ['cunt', 'shit', 'idiot', 'motherfucker', 'asshole', 'dickhead', 'slut', 'prick', 'whore', 'wanker'];
const userText = "I met this idiot today, he did behave like an asshole. This idiot treated me like a slut. Can't believe this idiot made me feeling like a cunt. Such an asshole, really. I tried to speak with this asshole, but he was just a dickhead.";

const counterObject = badWords.reduce((acc, curr) => {
  acc[curr] = 0;
  return acc;
}, {});

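// Lowercase the text so that capitalized occurrences are counted too.
userText.toLowerCase().split(/[\s,.]+/).forEach(
  token => {
    // Object.hasOwn ignores inherited keys such as "constructor", which `in` would match.
    if (Object.hasOwn(counterObject, token)) counterObject[token]++;
  }
);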

const topWords = Object.entries(counterObject).sort((a, b) => b[1] - a[1]).slice(0, 5);

console.log(topWords);


Also, how can I get the value (the number of times a bad word was used)? - Parcurcik
From your example, I need to pull out the number: from ["idiot", 3], the 3. - Parcurcik
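
To illustrate the follow-up question: each entry of topWords is a two-element array, so the count sits at index 1 and can be read directly or by destructuring. This is a minimal sketch, assuming data in the same [word, count] shape as topWords above; sampleTopWords, firstCount and lines are illustrative names, not part of the original answer.

```javascript
// Sample data in the same [word, count] shape that topWords has above (hypothetical values).
const sampleTopWords = [['idiot', 3], ['asshole', 3], ['slut', 1]];

// Each entry is a pair: index 0 is the word, index 1 is the count.
const firstCount = sampleTopWords[0][1];

// Destructure each pair to build one "word - count" line per entry.
const lines = sampleTopWords.map(([word, count]) => `${word} - ${count}`);

console.log(firstCount);
console.log(lines.join('\n'));
```

The same map call should work unchanged on the real topWords array, since slice(0, 5) has already limited it to five entries.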

2
I would do it like this.
const listOfBadWords = ['is', 'or'];
const badWordsFound = [];
const userString = 'This is a test, works or not, is to be seen';

for (let i = 0; i < listOfBadWords.length; i++) {
    const matches = (userString.match(new RegExp(listOfBadWords[i], 'g')) || []).length;
    if(matches)
        badWordsFound.push([listOfBadWords[i], matches]);
}

const sorted = badWordsFound.sort((a, b) => b[1] - a[1]);
console.log(sorted);
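
One caveat with the snippet above: the pattern also matches inside longer words, so 'is' is counted in 'This' and 'or' in 'works'. If only whole words should count, a possible variant is sketched below. It assumes \b word boundaries and case-insensitive matching are acceptable; escapeRegExp and countWholeWords are hypothetical helper names introduced here for illustration.

```javascript
// Hypothetical helper: escape regex metacharacters so list entries are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: count whole-word, case-insensitive occurrences of each listed word
// and return [word, count] pairs sorted by count, highest first.
function countWholeWords(text, wordList) {
  return wordList
    .map((word) => {
      const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, 'gi');
      return [word, (text.match(pattern) || []).length];
    })
    .filter(([, count]) => count > 0)
    .sort((a, b) => b[1] - a[1]);
}

const counts = countWholeWords('This is a test, works or not, is to be seen', ['is', 'or']);
console.log(counts);
```

Appending .slice(0, 5) to the result would limit it to the five most frequent words, as the question asks.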
