Solr is giving wrong field length

Posted on 2020-05-02 08:37:02

My feature list is as follows:

[
  {
    "store": "myfeature_store",
    "name": "titleLength",
    "class": "org.apache.solr.ltr.feature.FieldLengthFeature",
    "params": {
      "field": "title"
    }
  }
]

When I run the following query:

curl -g 'http://localhost:8983/solr/nutch/select?indent=on&q=python&wt=json&fl=title,score,[features%20efi.query=python%20store=myfeature_store]'

I get the following result:

{
  "responseHeader":{
    "status":0,
    "QTime":8,
    "params":{
      "q":"python",
      "indent":"on",
      "fl":"title,score,[features efi.query=python store=myfeature_store]",
      "wt":"json"}},
  "response":{"numFound":793,"start":0,"maxScore":0.33828905,"docs":[
      {
        "title":"Newest 'python' Questions - Stack Overflow",
        "score":0.33828905,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Newest 'python-3.x' Questions - Stack Overflow",
        "score":0.14434122,
        "[features]":"titleLength=5349.8774"},
      {
        "title":"Geographic Information Systems Stack Exchange",
        "score":0.08331977,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Stack Overflow em Português",
        "score":0.08331977,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Stack Overflow en español",
        "score":0.07460209,
        "[features]":"titleLength=2621.44"},
      {
        "title":"Hot Questions - Stack Exchange",
        "score":0.06534503,
        "[features]":"titleLength=655.36"},
      {
        "title":"Code Review Stack Exchange",
        "score":0.05356382,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Software Recommendations Stack Exchange",
        "score":0.05356382,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Raspberry Pi Stack Exchange",
        "score":0.042962566,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Welcome to The Apache Software Foundation!",
        "score":0.042862184,
        "[features]":"titleLength=455.1111"}]
  }}

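As you can see, the titleLength values are completely wrong. For example, for the last result, whose title is "Welcome to The Apache Software Foundation!", titleLength should be 5, but it comes out as 455.1111. What could be the problem?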

Asked by Vedanshu
Answer by MatsLindh, 2020-02-10 05:20

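The titleLength feature (a FieldLengthFeature) uses the norm stored for the field. Norms are mapped through a lookup table of floats with 256 possible values. These values are not exact (since the length of a field can be larger than 256); instead, the whole space of integer values, up to 2^31, is mapped into a single byte.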

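This also includes any index-time boosts, so if a boost was applied to the field when it was indexed (for example, through a Nutch plugin), it will be reflected in the norm stored for that field. You cannot rely on titleLength being the exact number of terms stored in the field for that document; rather, it represents the "boost" of that field.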
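
To make the effect concrete, here is a minimal sketch of how a one-byte norm could turn a short title into a large "length". It rests on assumptions the answer does not confirm: that the norm is computed as boost / sqrt(numTerms), that it is squeezed into one byte in the style of Lucene's SmallFloat.floatToByte315, and that the feature reports 1 / norm². The Python function names below are stand-ins written for this illustration, not Solr APIs, and the boost of 0.1 is hypothetical.

```python
import math
import struct

ZERO_EXP = (63 - 15)  # exponent offset used by the assumed byte encoding


def float_to_byte315(f: float) -> int:
    """Lossy encode: keep only the top bits of the float32 exponent/mantissa."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21
    if small <= ZERO_EXP << 3:
        return 0 if bits <= 0 else 1   # underflow: zero or smallest value
    if small >= (ZERO_EXP << 3) + 0x100:
        return 255                     # overflow: largest value
    return small - (ZERO_EXP << 3)


def byte315_to_float(b: int) -> float:
    """Decode one byte back to a float; only 256 results are possible."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + (ZERO_EXP << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def reported_length(num_terms: int, boost: float = 1.0) -> float:
    """Assumed pipeline: norm -> one byte -> decoded norm -> 1 / norm^2."""
    norm = boost / math.sqrt(num_terms)          # assumption: classic length norm
    decoded = byte315_to_float(float_to_byte315(norm))
    return 1.0 / (decoded * decoded)


if __name__ == "__main__":
    print(reported_length(5))             # about 5.22, not exactly 5
    print(reported_length(5, boost=0.1))  # about 655.36 with a hypothetical boost
```

Under these assumptions a 5-term field with no boost already reports about 5.22 instead of 5, and a boost below 1 inflates the value by a large factor. In this sketch, byte 106 decodes to about 455.11, the same number shown for the last document in the question, which is consistent with the value coming from a quantized norm rather than a term count.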