How to get all terms of a Lucene field in Lucene 4

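I'm trying to update my code from Lucene 3.4 to 4.1. I've worked out every change except one. One piece of code has to iterate over all the term values of a field. In Lucene 3.x, IndexReader#terms() returned a TermEnum that I could iterate over. That seems to have changed in Lucene 4.1, and even after searching the documentation for hours I can't figure out how to do it. Could someone point me in the right direction?
Thanks.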

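I moved the answer part out of the question and into the answer you accepted, because finding the answer inside the question text is confusing and unintuitive. - MahNas92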
1 Answer


Follow the Lucene 4 migration guide:

How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single segment reader, do this:

Fields fields = reader.fields();
if (fields != null) {
  ...
}

If the reader might be multi-segment, you must do this:

Fields fields = MultiFields.getFields(reader);
if (fields != null) {
  ...
}

The fields may be null (e.g. if the reader has no fields).

Note that the MultiFields approach entails a performance hit on MultiReaders, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential readers (use oal.util.ReaderUtil) and then step through those readers yourself, if you can (this is how Lucene drives searches).

If you pass a SegmentReader to MultiFields.getFields, it simply returns reader.fields(), so there is no performance hit in that case.
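To make the per-segment advice concrete, here is a minimal sketch, assuming Lucene 4.1 (lucene-core) on the classpath; the class name SegmentTerms is hypothetical. It walks reader.leaves() (one AtomicReaderContext per segment) and asks each segment's AtomicReader for the field's Terms directly, so nothing is merged on the fly. Because the same term can occur in several segments, the results go into a TreeSet to deduplicate them.

```java
import java.io.IOException;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public final class SegmentTerms {
    private SegmentTerms() {}

    /** Collects the distinct terms of a field by visiting each segment reader directly. */
    public static SortedSet<String> allTerms(IndexReader reader, String field) throws IOException {
        // TreeSet removes duplicates that appear in more than one segment.
        SortedSet<String> result = new TreeSet<String>();
        for (AtomicReaderContext leaf : reader.leaves()) {
            AtomicReader segment = leaf.reader();
            // Null if this particular segment has no terms for the field.
            Terms terms = segment.terms(field);
            if (terms == null) {
                continue;
            }
            // Lucene 4.x: the argument is a TermsEnum to reuse; null asks for a new one.
            TermsEnum termsEnum = terms.iterator(null);
            BytesRef term;
            while ((term = termsEnum.next()) != null) {
                // Decode immediately: next() may reuse the same BytesRef instance.
                result.add(term.utf8ToString());
            }
        }
        return result;
    }
}
```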

Once you have a non-null Fields you can do this:

Terms terms = fields.terms("field");
if (terms != null) {
  ...
}

The terms may be null (e.g. if the field does not exist).

Once you have a non-null terms you can get an enum like this:

TermsEnum termsEnum = terms.iterator(null); // Lucene 4.x: pass a TermsEnum to reuse, or null

The returned TermsEnum will not be null.

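You can then call .next() on the TermsEnum repeatedly; each call returns the next term as a BytesRef, or null once every term has been visited.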
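Putting those steps together, here is a minimal sketch of the whole loop, assuming Lucene 4.1 (lucene-core) on the classpath; FieldTerms and allTerms are hypothetical names. It follows the MultiFields route from the guide, so it works on a multi-segment DirectoryReader, and the merged enum yields each distinct term once, in Lucene's term (byte) order.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public final class FieldTerms {
    private FieldTerms() {}

    /** Returns every distinct term of a field, in the order the merged TermsEnum yields them. */
    public static List<String> allTerms(IndexReader reader, String field) throws IOException {
        List<String> result = new ArrayList<String>();
        // Merged view over all segments; null if the reader has no fields.
        Fields fields = MultiFields.getFields(reader);
        if (fields == null) {
            return result;
        }
        // Null if the field does not exist.
        Terms terms = fields.terms(field);
        if (terms == null) {
            return result;
        }
        // Lucene 4.x: the argument is a TermsEnum to reuse; null asks for a new one.
        TermsEnum termsEnum = terms.iterator(null);
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // Decode immediately: next() may reuse the same BytesRef instance.
            result.add(term.utf8ToString());
        }
        return result;
    }
}
```

Compared with walking the segments yourself, this version is shorter and needs no deduplication, at the cost of the on-the-fly merge the guide warns about.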


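Pointing to the migration guide is good, but I think your answer would be much more useful if you posted the parts relevant to the question here, together with what you suggest doing. - javanna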
I've just added the relevant part to my question above. - ali

Content provided by Stack Overflow.