How can I use a regular expression in Java to count how many times each word occurs in a string?
import java.util.HashMap;
import java.util.Map;

String text = "I like good mules. Mules are good :)";
// Split on runs of non-word characters (punctuation and whitespace).
String[] words = text.split("([\\W\\s]+)");
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String word : words) {
    if (counts.containsKey(word)) {
        counts.put(word, counts.get(word) + 1);
    } else {
        counts.put(word, 1);
    }
}
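Result: {I=1, like=1, good=2, mules=1, Mules=1, are=1} (HashMap iteration order is not guaranteed, and the count is case-sensitive)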
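As an alternative to splitting, the same tally can be sketched by matching the words themselves. This is only an illustration: the class name `WordTally` and the method `tally` are made up for this example, and `\w+` is assumed to be an acceptable definition of a word (ASCII letters, digits and underscore by default). One practical difference is that `split` returns a leading empty string when the text begins with a separator, which the loop above would then count as a word; matching avoids that.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class WordTally {
    // \w+ matches runs of ASCII letters, digits and underscores by default.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns word -> count, keeping the order in which words first appear.
    static Map<String, Integer> tally(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // merge() stores 1 for a new word, otherwise adds 1 to the old count.
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {I=1, like=1, good=2, mules=1, Mules=1, are=1}
        System.out.println(tally("I like good mules. Mules are good :)"));
    }
}
```

To treat "mules" and "Mules" as the same word, one option is to lower-case `m.group()` before calling `merge`.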
In the split pattern, \W also matches everything \s matches, so there is no need to include \s in the character class. - Bart Kiers

Pattern p = Pattern.compile("\\babba\\b");
Matcher m = p.matcher("abba is abba with abbabba and abba doing abba");
int count = 0;
while (m.find()) {
    count++;
}
System.out.println(count); // 4
With Guava, this is a one-liner:
Multiset<String> countOfEachWord =
HashMultiset.create(Splitter.on(" ").omitEmptyStrings().split(myString));
Then, to get the count of "dog", for example, you would write:
countOfEachWord.count("dog")
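If adding Guava is not an option, a rough JDK-only equivalent can be sketched with streams. This is a substitute technique, not the Guava API: `StreamWordCount` and `countWords` are names invented for this example, and the result is a plain `Map<String, Long>` rather than a `Multiset`.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper showing a JDK-only approximation of the Guava one-liner.
class StreamWordCount {
    static Map<String, Long> countWords(String myString) {
        return Arrays.stream(myString.split(" "))
                // Drop empty tokens produced by consecutive spaces,
                // similar in spirit to omitEmptyStrings().
                .filter(w -> !w.isEmpty())
                // Group identical words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Unlike `Multiset.count`, a plain map returns `null` for a missing key, so a lookup would look like `StreamWordCount.countWords(myString).getOrDefault("dog", 0L)`.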
Do you have to use a regular expression? If not, this might help:
public static int count(final String string, final String substring)
{
    int count = 0;
    int idx = 0;
    while ((idx = string.indexOf(substring, idx)) != -1)
    {
        idx++;
        count++;
    }
    return count;
}
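Note that because `idx` advances by one character after each hit, the method above counts overlapping occurrences: "aa" is found three times in "aaaa". If non-overlapping matches are wanted instead, a possible variant is sketched below. The class and method names are made up for this example, and rejecting an empty substring is a precaution added here, not something the original answer does.

```java
// Hypothetical variant of the indexOf loop that skips past each whole match.
class SubstringCounter {
    static int countNonOverlapping(String string, String substring) {
        if (substring.isEmpty()) {
            // An empty needle matches at every index, so refuse it up front.
            throw new IllegalArgumentException("substring must not be empty");
        }
        int count = 0;
        int idx = 0;
        while ((idx = string.indexOf(substring, idx)) != -1) {
            idx += substring.length(); // continue after the end of this match
            count++;
        }
        return count;
    }
}
```

With this variant, `SubstringCounter.countNonOverlapping("aaaa", "aa")` returns 2.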
int CountWords(String t) {
    return t.split("([[a-z][A-Z][0-9][\\Q-\\E]]+)", -1).length
            + (t.replaceAll("([[a-z][A-Z][0-9][\\W]]*)", "")).length() - 1;
}
English words (chemical names) + Chinese words
If the word is abba and the string is "ABBA members didn't wear abbadabbas", what should the count of abba be? And why use a regular expression? - Bart Kiers
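The answer to that question depends on what counts as a match. The sketch below makes the choices explicit; `MatchModes` and its `count` method are hypothetical names for this illustration, and the word is assumed to start and end with a word character so that `\b` behaves as expected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper contrasting substring vs. whole-word matching.
class MatchModes {
    static int count(String text, String word, boolean wholeWord, boolean ignoreCase) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        String regex = Pattern.quote(word);
        if (wholeWord) {
            regex = "\\b" + regex + "\\b"; // require word boundaries on both sides
        }
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

For the string in the comment, a case-sensitive substring search finds abba twice (both inside "abbadabbas"), a case-sensitive whole-word search finds it zero times, and a case-insensitive whole-word search finds it once ("ABBA").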