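Hello, I want to count the number of sentences in a string. So far I am using this:
int count = str.split("[!?.:]+").length;
But my string contains "." inside names and between words, for example:
"His name is Walton D.C. and he just completed his B.Tech last year."
With the line above as input, the count comes back as 4 sentences, but there is actually only one.
So how do I handle these cases?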
You can use java.text.BreakIterator, which detects sentence boundaries for a given locale:

import java.text.BreakIterator;
import java.util.Locale;

public class SentenceCount {

    // Prints the text, a line with '^' under every boundary, and the resulting counts.
    private static void markBoundaries(String target, BreakIterator iterator) {
        // One marker slot per character, plus one for the boundary at the end of the text.
        StringBuilder markers = new StringBuilder();
        markers.setLength(target.length() + 1);
        for (int k = 0; k < markers.length(); k++) {
            markers.setCharAt(k, ' ');
        }
        int count = 0;
        iterator.setText(target);
        int boundary = iterator.first();
        while (boundary != BreakIterator.DONE) {
            markers.setCharAt(boundary, '^');
            ++count;
            boundary = iterator.next();
        }
        System.out.println(target);
        System.out.println(markers);
        System.out.println("Number of Boundaries: " + count);
        // N boundaries delimit N - 1 sentences.
        System.out.println("Number of Sentences: " + (count - 1));
    }

    public static void main(String[] args) {
        Locale currentLocale = Locale.US;
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(currentLocale);
        String someText = "He name is Walton D.C. and he just completed his B.Tech last year.";
        markBoundaries(someText, sentenceIterator);
        someText = "This order was placed for QT3000! MK?";
        markBoundaries(someText, sentenceIterator);
    }
}
Output:

He name is Walton D.C. and he just completed his B.Tech last year.
^                                                                 ^
Number of Boundaries: 2
Number of Sentences: 1
This order was placed for QT3000! MK?
^                                 ^  ^
Number of Boundaries: 3
Number of Sentences: 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSentenceCount {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! MK? \n That's amazing. \n But I am not sure.";
        // Sentence-ending punctuation, then whitespace (\s already covers \n),
        // then the uppercase letter that starts the next sentence.
        String pattern = "[.!?]\\s+[A-Z]";
        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);
        // Now create the matcher object.
        Matcher m = r.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        count++; // for the last sentence, which has no following sentence to match against
        System.out.println("COUNT==" + count);
    }
}
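public class SpaceCount {
    public static void main(String[] args) {
        String s = "Find the number Sentence";
        // Counts the spaces and adds one, which gives the number of
        // space-separated words (4 here). Punctuation is never inspected,
        // so this does not detect sentence boundaries.
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        count = count + 1;
        System.out.println(count);
    }
}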
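One solution is to skip a dot when it is directly preceded by one or more uppercase letters. That also covers names, provided they are capitalized. Once this is implemented, your example counts as a single sentence.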
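A rough sketch of that rule, in case it helps. The class and method names are made up for illustration, and I am assuming that "!" and "?" always end a sentence while a run of dots only counts when the character before it is not an uppercase ASCII letter:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class UppercaseDotHeuristic {
    // A run of dots counts only if it is NOT directly preceded by an uppercase letter
    // (assumed to be an abbreviation such as "D.C." or "B.Tech");
    // a run of '!' or '?' always counts.
    private static final Pattern SENTENCE_END =
            Pattern.compile("(?<![A-Z])\\.+|[!?]+");

    static int countSentences(String text) {
        Matcher m = SENTENCE_END.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // prints 1
        System.out.println(countSentences(
                "His name is Walton D.C. and he just completed his B.Tech last year."));
        // prints 2
        System.out.println(countSentences("This order was placed for QT3000! MK?"));
    }
}
```

The obvious weakness is a sentence that really does end in an uppercase letter ("I live in the USA."), which this sketch would not count.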
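Another solution, improving on one of the answers here, could be to match: [lowercase]([dot] or [?] or [!])[space][uppercase]
But as I said, without exact rules this is almost impossible.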
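For what it's worth, the [lowercase][. or ? or !][space][uppercase] pattern could be sketched like this. The names are hypothetical, and I am assuming ASCII letters only and that the last sentence is counted by starting the counter at 1, since nothing follows it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CaseBoundaryHeuristic {
    // lowercase letter, one of . ? !, whitespace, then (via lookahead, so it is
    // not consumed) the uppercase letter that starts the next sentence.
    private static final Pattern BOUNDARY =
            Pattern.compile("[a-z][.?!]\\s+(?=[A-Z])");

    static int countSentences(String text) {
        if (text.trim().isEmpty()) {
            return 0;
        }
        Matcher m = BOUNDARY.matcher(text);
        int count = 1; // the final sentence has no boundary after it
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // prints 1
        System.out.println(countSentences(
                "His name is Walton D.C. and he just completed his B.Tech last year."));
        // prints 2
        System.out.println(countSentences("That's amazing. But I am not sure."));
    }
}
```

It shows the same limits mentioned above: "This order was placed for QT3000! MK?" comes out as 1, because the "!" follows a digit rather than a lowercase letter.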