Hibernate Search indexes incomplete documents

Question

Tags: hibernate, search, batch-file, indexing, hibernate-search

3

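I have a problem with batch indexing my data. I want to index a list of Article objects, using @IndexedEmbedded on the members I need information from. Article gets additional information from two other beans: Page and Articlefulltext.

The batch correctly updates the database and, through the Hibernate Search annotations, adds new Documents to my Lucene index. But the added documents have incomplete fields. It seems that Hibernate Search does not see all of the annotations.

So when I look at the Lucene index, I can see some fields for the Article and Page objects, but none for Articlefulltext, even though the data in my database is correct, which means the persist() operation was executed properly...

I really need some help, because I cannot see what the difference is between Page and Articlefulltext...

The strange thing is that if I use a MassIndexer, it correctly adds the Article + Page + Articlefulltext data to the Lucene index. But I don't want to rebuild an index of millions of documents every time I make a large update...

I set the log4j level to debug for Hibernate Search and Lucene. The logs did not give me much information.

Below are my bean code and the batch code.

Thanks for your help,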

Article.java:

@Entity
@Table(name = "article", catalog = "test")
@Indexed(index="articleText")
@Analyzer(impl = FrenchAnalyzer.class)
public class Article implements java.io.Serializable {

    @Id
    @GeneratedValue(strategy = IDENTITY)
    @Column(name = "id", unique = true, nullable = false)
    @DocumentId        
    private Integer id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "firstpageid", nullable = false)
    @IndexedEmbedded
    private Page page;

    @Column(name = "heading", length = 300)
    @Field(name= "title", index = Index.YES, store = Store.YES)
    @Boost(2.5f)
    private String heading;

    @Column(name = "subheading", length = 300)
    private String subheading;

    @OneToOne(fetch = FetchType.LAZY, mappedBy = "article") 
    @IndexedEmbedded
    private Articlefulltext articlefulltext;
    [... bean methods etc ...]

Page.java

@Entity
@Table(name = "page", catalog = "test")
public class Page implements java.io.Serializable {

    private Integer id;
    @IndexedEmbedded
    private Issue issue;
    @ContainedIn
    private Set<Article> articles = new HashSet<Article>(0);
    [... bean method ...]

Articlefulltext.java

@Entity
@Table(name = "articlefulltext", catalog = "test")
@Analyzer(impl = FrenchAnalyzer.class)
public class Articlefulltext implements java.io.Serializable {

    @GenericGenerator(name = "generator", strategy = "foreign", parameters = @Parameter(name = "property", value = "article"))
    @Id
    @GeneratedValue(generator = "generator")
    @Column(name = "aid", unique = true, nullable = false)
    private int aid;

    @OneToOne(fetch = FetchType.LAZY)
    @PrimaryKeyJoinColumn
    @ContainedIn
    private Article article;

    @Column(name = "fulltextcontents", nullable = false)
    @Field(store=Store.YES, index=Index.YES, analyzer = @Analyzer(impl = FrenchAnalyzer.class), bridge= @FieldBridge(impl = FulltextSplitBridge.class))
    // This field is not added to the resulting Document! I put a log statement into FulltextSplitBridge, and it is never called during a batch run. But if I use a MassIndexer, I see that FulltextSplitBridge is called for each Articlefulltext...
    private String fulltextcontents;
    [... bean method ...]

Here is the code used to update the database and the Lucene index:

Batch source code:

FullTextEntityManager em = null;

@Override
protected void executeInternal(JobExecutionContext arg0) throws JobExecutionException {
    ApplicationContext ap = null;
    EntityManagerFactory emf = null;
    EntityTransaction tx = null;


    try {
        ap = (ApplicationContext) arg0.getScheduler().getContext().get("applicationContext");
        emf = (EntityManagerFactory) ap.getBean("entityManagerFactory", EntityManagerFactory.class);
        em = Search.getFullTextEntityManager(emf.createEntityManager());
        tx = em.getTransaction();


        tx.begin();
        // [... em.persist() some things that aren't Lucene-related, so I skip them ...]
        for(File xmlFile : xmlList){
            Reel reel = new Reel(title, reelpath);
            em.persist(reel);
            Article article = new Article();
            // [... set Article fields, skipped here ...]
            Articlefulltext ft = new Articlefulltext();
            // [... set Articlefulltext fields, skipped here ...]
            ft.setArticle(article);
            ft.setFulltextcontents(bufferBlock.toString());
            em.persist(ft); // I persist ft before article because of FK issues
            em.persist(article); // here the annotations update the Lucene index, but fulltextcontents is not indexed (see the description above)
            if ( nbFileDone % 50 == 0 ) {
                // flush a batch of inserts and release memory:
                em.flush();
                em.clear();
            }
        }
        tx.commit();
    }
    catch(Exception e){
        tx.rollback();
    }
    em.close();
}

- user1882300

1 Answer

- Hardy · Accepted Answer

2

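Well, it looks as if you are not setting both sides of the relationship. I can see ft.setArticle(article), but no article.setFtArticle(ft). Both sides of the relationship need to be set. In your case Articlefulltext is the owner of the relationship, but that does not mean you don't have to set both sides.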
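A minimal sketch of what setting both sides could look like, using plain-Java stand-ins for the two entities so that it runs without Hibernate. The setter name setArticlefulltext is an assumption derived from the articlefulltext field shown in the question (the real setter may be named differently), and the JPA and Hibernate Search annotations are left out on purpose.

```java
// Plain-Java stand-ins for the entities in the question. The JPA and
// Hibernate Search annotations are omitted so the sketch runs on its own.
public class BothSidesSketch {

    public static class Article {
        // Inverse side of the one-to-one (mappedBy = "article" in the question).
        private Articlefulltext articlefulltext;

        public Articlefulltext getArticlefulltext() { return articlefulltext; }

        // Assumed setter name, following the field name in the question.
        public void setArticlefulltext(Articlefulltext ft) { this.articlefulltext = ft; }
    }

    public static class Articlefulltext {
        // Owning side of the one-to-one.
        private Article article;
        private String fulltextcontents;

        public Article getArticle() { return article; }
        public void setArticle(Article article) { this.article = article; }

        public String getFulltextcontents() { return fulltextcontents; }
        public void setFulltextcontents(String text) { this.fulltextcontents = text; }
    }

    public static void main(String[] args) {
        Article article = new Article();
        Articlefulltext ft = new Articlefulltext();
        ft.setFulltextcontents("full text of the article");

        ft.setArticle(article);          // owning side, already set in the question
        article.setArticlefulltext(ft);  // inverse side, the call that was missing

        // The in-memory graph can now be walked from Article to its full text.
        System.out.println(article.getArticlefulltext().getFulltextcontents());
    }
}
```

In the batch loop from the question, the equivalent extra call would go next to ft.setArticle(article), before the two em.persist() calls.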

- Hardy

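OK, you're right, thank you very much... it was that simple. The strange thing is that for _Page_ I only set the relationship in one direction, and yet it works. Is that because it is a many-to-one relationship? - user1882300

It depends on which direction we need to traverse the chain of relationships in order to update the index. Quite likely the one for Page is populated in the main direction (only), so it works just by chance. - Sanne

"By chance", so the "best" practice is to define every relationship on both sides? OK, I will do that everywhere. Thank you both! - user1882300

You don't have to define all relationships as bidirectional, but if you do define one as bidirectional you must update both directions, otherwise it becomes unclear which instance holds the correct information. Search might require you to define the relationship as bidirectional so that there is a place to put both IndexedEmbedded and ContainedIn. - Sanne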
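One way to make sure both directions are always updated together is to funnel the change through a single method. The sketch below is only an illustration with plain-Java stand-ins for the entities; linkFulltext is a hypothetical helper name, not something from the question or from Hibernate.

```java
// Plain-Java stand-ins again; annotations are omitted so the sketch is runnable.
public class LinkHelperSketch {

    public static class Article {
        private Articlefulltext articlefulltext;

        public Articlefulltext getArticlefulltext() { return articlefulltext; }

        // Hypothetical convenience method: updates both directions in one call.
        public void linkFulltext(Articlefulltext ft) {
            // Detach the text that was linked to this article before, if any.
            if (this.articlefulltext != null) {
                this.articlefulltext.article = null;
            }
            this.articlefulltext = ft;
            if (ft != null) {
                // If ft still belongs to another article, clear that stale reference.
                if (ft.article != null && ft.article != this) {
                    ft.article.articlefulltext = null;
                }
                ft.article = this;
            }
        }
    }

    public static class Articlefulltext {
        private Article article;

        public Article getArticle() { return article; }
    }

    public static void main(String[] args) {
        Article article = new Article();
        Articlefulltext ft = new Articlefulltext();

        article.linkFulltext(ft); // sets Article -> text and text -> Article

        System.out.println(article.getArticlefulltext() == ft); // both checks
        System.out.println(ft.getArticle() == article);         // should agree
    }
}
```

With a helper like this, calling code cannot update one side and forget the other, which is the situation described in the comments above.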