Lucene has a default stop word filter (http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html). Does anyone know which words are in the list?
The default stop word set in StandardAnalyzer and EnglishAnalyzer comes from StopAnalyzer.ENGLISH_STOP_WORDS_SET, which you can find in the source file:
"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"
StopFilter itself does not define a default stop word set.
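To make the effect of that list concrete, here is a minimal plain-Java sketch of stop word removal using the 33 words quoted above. It does not use Lucene at all: the class name StopWordSketch, the method removeStopWords, and the simple split-on-non-alphanumerics tokenization are assumptions made for illustration, not how StopFilter or StandardAnalyzer are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
class StopWordSketch {

    // The words quoted above from StopAnalyzer.ENGLISH_STOP_WORDS_SET.
    static final Set<String> ENGLISH_STOP_WORDS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "if", "in", "into", "is", "it",
            "no", "not", "of", "on", "or", "such",
            "that", "the", "their", "then", "there", "these",
            "they", "this", "to", "was", "will", "with")));

    // Lowercases the text, splits it on runs of non-alphanumeric characters
    // (an assumed, simplified tokenizer), and drops tokens in the stop set.
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !ENGLISH_STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, garden]
        System.out.println(removeStopWords("The quick brown fox is in the garden"));
    }
}
```

Note that the lookup is case-sensitive, so the sketch lowercases the input before comparing against the all-lowercase set.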
I am using Lucene 5.5.0 for keyword extraction. I specify the stop word filter with tokenStream = new StopFilter(new ClassicFilter(new LowerCaseFilter(stdToken)), StopAnalyzer.ENGLISH_STOP_WORDS_SET); but Lucene does not filter out the stop words. Am I missing something? - Mike