A question about Lucene scoring

4

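I have a question about Lucene scoring. I have two documents in the index: one contains "my name" and the other contains "my last name". When I search for the keywords "my name", the second document is listed above the first. What I want is for a document that contains the exact keywords I typed to be listed first, followed by the others. Can anyone help me with how to do this? Thanks.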

4 Answers

3
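Second attempt: Lucene's default behavior should already be what you are asking for. The key factor is the lengthNorm() part of the score, which generally scores longer documents lower than shorter ones. See Lucene's Similarity API for context. If the two hits end up with the same lengthNorm value, their scores tie and their relative order is effectively arbitrary. The explain() function will show you how each document is actually scored, so you can check whether the defaults behave as you expect.
I assume you are using a BooleanQuery. If you post exactly how you build the query, I may be able to say more. See the query parser syntax. I hope this is closer to what you need.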
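
To make the lengthNorm() point concrete, here is a minimal sketch. It assumes the classic TF-IDF similarity, where the length norm of a field is 1 / sqrt(number of terms) before any boost is applied. The class and method names are made up for illustration and are not part of Lucene.

```java
// Illustrative only: mirrors the classic 1/sqrt(numTerms) length norm,
// ignoring boosts and the lossy encoding Lucene applies when storing norms.
class LengthNormSketch {
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // "my name" has 2 terms, "my last name" has 3 terms.
        System.out.println("my name      -> " + lengthNorm(2));
        System.out.println("my last name -> " + lengthNorm(3));
    }
}
```

Under that assumption the shorter field gets the larger norm, so "my name" should outrank "my last name" when everything else is equal. Stored norms are encoded with limited precision, so fields of similar length can end up with identical norms, which is one way the tie described above can arise.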

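That would make the second document the only match. The poster is only asking for it to score higher than the others. - Avi
Thanks for your reply, but what I want to do is something different. I want to find all documents that contain both words "my" and "name". The point is that the keywords I typed are "my name", so I want results containing the whole phrase "my name" at the top of the list and results containing "my last name" at the bottom. - Truong Do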

0
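If you use lucli from the command line (download the latest Lucene source; it is in the contrib directory), you can use the "explain" command to have Lucene explain why a document scored the way it did.
It outputs something like this: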
---------------- 2 score:0.6089077---------------------
(blah blah blah, your document)
Explanation:4.260467 = (MATCH) sum of:                                                                                                                                                                                                       
  0.59024054 = (MATCH) weight(description:warwick in 276780), product of:                                                                                                                                                                    
    0.05595057 = queryWeight(description:warwick), product of:                                                                                                                                                                               
      5.2746606 = idf(docFreq=13531, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    10.549321 = (MATCH) fieldWeight(description:warwick in 276780), product of:                                                                                                                                                              
      1.0 = tf(termFreq(description:warwick)=1)                                                                                                                                                                                              
      5.2746606 = idf(docFreq=13531, numDocs=843621)                                                                                                                                                                                         
      2.0 = fieldNorm(field=description, doc=276780)                                                                                                                                                                                         
  0.832554 = (MATCH) weight(keywords:warwick in 276780), product of:                                                                                                                                                                         
    0.066450186 = queryWeight(keywords:warwick), product of:                                                                                                                                                                                 
      6.264497 = idf(docFreq=5028, numDocs=843621)                                                                                                                                                                                           
      0.010607426 = queryNorm                                                                                                                                                                                                                
    12.528994 = (MATCH) fieldWeight(keywords:warwick in 276780), product of:                                                                                                                                                                 
      1.0 = tf(termFreq(keywords:warwick)=1)                                                                                                                                                                                                 
      6.264497 = idf(docFreq=5028, numDocs=843621)                                                                                                                                                                                           
      2.0 = fieldNorm(field=keywords, doc=276780)                                                                                                                                                                                            
  0.19180772 = (MATCH) weight(url:warwick in 276780), product of:                                                                                                                                                                            
    0.048220757 = queryWeight(url:warwick), product of:                                                                                                                                                                                      
      4.5459433 = idf(docFreq=28043, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    3.9777002 = (MATCH) fieldWeight(url:warwick in 276780), product of:                                                                                                                                                                      
      1.0 = tf(termFreq(url:warwick)=1)                                                                                                                                                                                                      
      4.5459433 = idf(docFreq=28043, numDocs=843621)                                                                                                                                                                                         
      0.875 = fieldNorm(field=url, doc=276780)                                                                                                                                                                                               
  0.023709858 = (MATCH) weight(content:warwick in 276780), product of:                                                                                                                                                                       
    0.03373665 = queryWeight(content:warwick), product of:                                                                                                                                                                                   
      3.1804748 = idf(docFreq=109863, numDocs=843621)                                                                                                                                                                                        
      0.010607426 = queryNorm                                                                                                                                                                                                                
    0.7027923 = (MATCH) fieldWeight(content:warwick in 276780), product of:                                                                                                                                                                  
      1.4142135 = tf(termFreq(content:warwick)=2)                                                                                                                                                                                            
      3.1804748 = idf(docFreq=109863, numDocs=843621)                                                                                                                                                                                        
      0.15625 = fieldNorm(field=content, doc=276780)                                                                                                                                                                                         
  0.46163678 = (MATCH) weight(siteDescription:warwick in 276780), product of:                                                                                                                                                                
    0.0494812 = queryWeight(siteDescription:warwick), product of:                                                                                                                                                                            
      4.6647696 = idf(docFreq=24901, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    9.329539 = (MATCH) fieldWeight(siteDescription:warwick in 276780), product of:                                                                                                                                                           
      1.0 = tf(termFreq(siteDescription:warwick)=1)                                                                                                                                                                                          
      4.6647696 = idf(docFreq=24901, numDocs=843621)                                                                                                                                                                                         
      2.0 = fieldNorm(field=siteDescription, doc=276780)                                                                                                                                                                                     
  0.96127754 = (MATCH) weight(siteUrl:warwick in 276780), product of:                                                                                                                                                                        
    0.10097861 = queryWeight(siteUrl:warwick), product of:                                                                                                                                                                                   
      9.519615 = idf(docFreq=193, numDocs=843621)                                                                                                                                                                                            
      0.010607426 = queryNorm                                                                                                                                                                                                                
    9.519615 = (MATCH) fieldWeight(siteUrl:warwick in 276780), product of:                                                                                                                                                                   
      1.0 = tf(termFreq(siteUrl:warwick)=1)                                                                                                                                                                                                  
      9.519615 = idf(docFreq=193, numDocs=843621)                                                                                                                                                                                            
      1.0 = fieldNorm(field=siteUrl, doc=276780)                                                                                                                                                                                             
  0.62917286 = (MATCH) weight(title:warwick in 276780), product of:                                                                                                                                                                          
    0.05776636 = queryWeight(title:warwick), product of:                                                                                                                                                                                     
      5.4458413 = idf(docFreq=11402, numDocs=843621)                                                                                                                                                                                         
      0.010607426 = queryNorm                                                                                                                                                                                                                
    10.891683 = (MATCH) fieldWeight(title:warwick in 276780), product of:                                                                                                                                                                    
      1.0 = tf(termFreq(title:warwick)=1)                                                                                                                                                                                                    
      5.4458413 = idf(docFreq=11402, numDocs=843621)                                                                                                                                                                                         
      2.0 = fieldNorm(field=title, doc=276780)                                                                                                                                                                                               
  0.57006776 = (MATCH) weight(second_title:warwick in 276780), product of:                                                                                                                                                                   
    0.05498614 = queryWeight(second_title:warwick), product of:                                                                                                                                                                              
      5.18374 = idf(docFreq=14819, numDocs=843621)                                                                                                                                                                                           
      0.010607426 = queryNorm                                                                                                                                                                                                                
    10.36748 = (MATCH) fieldWeight(second_title:warwick in 276780), product of:                                                                                                                                                              
      1.0 = tf(termFreq(second_title:warwick)=1)                                                                                                                                                                                             
      5.18374 = idf(docFreq=14819, numDocs=843621)                                                                                                                                                                                           
      2.0 = fieldNorm(field=second_title, doc=276780)    

(Sorry, I only had a large index to pull an example from, nothing simpler!)
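
The arithmetic in that output can be re-derived by hand, which helps when reading your own explain results. The sketch below uses only the "product of" and "sum of" relationships printed in the output, with the numbers copied from it. The class and method names are hypothetical, and the idf values are taken as given instead of being recomputed.

```java
// Re-derives per-clause weights from the explain output above.
class ExplainArithmetic {
    // queryWeight = idf * queryNorm
    static double queryWeight(double idf, double queryNorm) {
        return idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // weight of one clause = queryWeight * fieldWeight
    static double clauseWeight(double tf, double idf, double queryNorm, double fieldNorm) {
        return queryWeight(idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    // The top-level explanation value is the sum of all clause weights.
    static double sum(double... weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        return total;
    }

    public static void main(String[] args) {
        // description:warwick clause, values copied from the output.
        System.out.println(clauseWeight(1.0, 5.2746606, 0.010607426, 2.0));
        // Sum of the eight clause weights listed in the output.
        System.out.println(sum(0.59024054, 0.832554, 0.19180772, 0.023709858,
                0.46163678, 0.96127754, 0.62917286, 0.57006776));
    }
}
```

Note how much the fieldNorm values differ between fields (2.0 for title, 0.15625 for content): that is the length normalization and boosting at work, and it is usually the first thing to check when a ranking looks surprising.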


0

I would change the query as follows.

(my AND name) OR "my name"

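Here the additional phrase query raises the score whenever the phrase matches. If a document's content is "my last name", the phrase query contributes no extra score. A document whose content is "my name", however, gets the extra score and appears at the top.

Here I am assuming that length normalization is ignored.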
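
If the keywords come from user input, the query string above can be assembled mechanically. The following is a minimal sketch in plain Java; the class and method names are hypothetical, and it assumes the input contains no query-syntax special characters (in a real application you would probably escape them first, for example with QueryParser.escape).

```java
// Builds "(t1 AND t2 ...) OR "t1 t2 ..."" from whitespace-separated keywords.
class PhraseBoostQuery {
    static String build(String userInput) {
        String trimmed = userInput.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("no keywords given");
        }
        String[] terms = trimmed.split("\\s+");
        // Every term is required ...
        String allTerms = String.join(" AND ", terms);
        // ... and an exact phrase match adds score on top.
        String phrase = "\"" + String.join(" ", terms) + "\"";
        return "(" + allTerms + ") OR " + phrase;
    }

    public static void main(String[] args) {
        System.out.println(build("my name"));
    }
}
```

The resulting string can then be handed to the query parser as usual. Documents that contain all the terms still match through the first clause, while the phrase clause only adds to the score of documents containing the exact phrase.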

