Note: This article is reproduced from stackoverflow.com.
solr

Solr is giving the wrong field length

Published on 2020-04-23 12:30:12

I've got my feature list as follows:

[
  {
    "store": "myfeature_store",
    "name": "titleLength",
    "class": "org.apache.solr.ltr.feature.FieldLengthFeature",
    "params": {
      "field": "title"
    }
  }
]

When I search for the following query:

curl -g 'http://localhost:8983/solr/nutch/select?indent=on&q=python&wt=json&fl=title,score,[features%20efi.query=python%20store=myfeature_store]'

I get the following results:

{
  "responseHeader":{
    "status":0,
    "QTime":8,
    "params":{
      "q":"python",
      "indent":"on",
      "fl":"title,score,[features efi.query=python store=myfeature_store]",
      "wt":"json"}},
  "response":{"numFound":793,"start":0,"maxScore":0.33828905,"docs":[
      {
        "title":"Newest 'python' Questions - Stack Overflow",
        "score":0.33828905,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Newest 'python-3.x' Questions - Stack Overflow",
        "score":0.14434122,
        "[features]":"titleLength=5349.8774"},
      {
        "title":"Geographic Information Systems Stack Exchange",
        "score":0.08331977,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Stack Overflow em Português",
        "score":0.08331977,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Stack Overflow en español",
        "score":0.07460209,
        "[features]":"titleLength=2621.44"},
      {
        "title":"Hot Questions - Stack Exchange",
        "score":0.06534503,
        "[features]":"titleLength=655.36"},
      {
        "title":"Code Review Stack Exchange",
        "score":0.05356382,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Software Recommendations Stack Exchange",
        "score":0.05356382,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Raspberry Pi Stack Exchange",
        "score":0.042962566,
        "[features]":"titleLength=1820.4445"},
      {
        "title":"Welcome to The Apache Software Foundation!",
        "score":0.042862184,
        "[features]":"titleLength=455.1111"}]
  }}

As one can see, the titleLength values are completely wrong. For example, the title of the last result is "Welcome to The Apache Software Foundation!", so titleLength should be 5, but it comes out as 455.1111. Where might the problem be?

Questioner
Vedanshu
MatsLindh 2020-02-10 05:20

The titleLength feature (FieldLengthFeature) is computed from the norms stored for the field. Each norm is a single byte, which is decoded through a lookup table of 256 floats. The decoded values are not expected to be exact: a field's length can be larger than 256, and the encoding has to map the whole space of 2^31 integer values into a single byte, so the length can only be approximated.
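To make the lossy one-byte norm concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the actual Solr code: it assumes the norm is stored as 1/sqrt(numTerms) using Lucene's "byte315" small-float format (ported here from memory), and that the feature reports 1/norm². The function names are hypothetical.

```python
import math
import struct

# Assumption: exponent offset of Lucene's "byte315" small-float format.
ZERO_EXP_OFFSET = 63 - 15


def float_to_byte315(f: float) -> int:
    """Squeeze a positive float into one byte (0..255) by truncating its bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21
    floor = ZERO_EXP_OFFSET << 3
    if small <= floor:
        return 0 if bits <= 0 else 1  # zero/negative, or too small to represent
    if small >= floor + 0x100:
        return 255  # too large to represent
    return small - floor


def byte315_to_float(b: int) -> float:
    """Expand a stored byte back into the (coarse) float it stands for."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + (ZERO_EXP_OFFSET << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def decoded_length(norm_byte: int) -> float:
    """Assumed feature value: 1 / norm^2 for the decoded norm."""
    norm = byte315_to_float(norm_byte)
    return 1.0 / (norm * norm) if norm else 0.0


def round_trip(num_terms: int) -> float:
    """Encode 1/sqrt(num_terms) into one byte, then decode it as a length."""
    return decoded_length(float_to_byte315(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in (1, 5, 6, 7, 100):
        print(n, round(round_trip(n), 4))
    print(round(decoded_length(106), 4))
```

Under these assumptions, fields with six and seven terms decode to the same value (about 7.11), and the stored byte 106 decodes to about 455.11, which matches the value reported for the last document in the question. That suggests the reported numbers are decoded norms rather than term counts.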

This also includes any index-time boosts, so if a field is boosted when you index it (for example by a Nutch plugin), the boost is reflected in the norm stored for the field. You can't rely on titleLength being the exact number of terms stored in the field for that document; it represents the "boost" for the field.
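The effect of an index-time boost on the decoded value can be sketched with a little arithmetic. This assumes the norm is computed as boost / sqrt(numTerms), as in Lucene's classic length normalization, and that the feature reports 1/norm²; one-byte rounding is ignored here, and the function name is hypothetical.

```python
import math


def effective_length(num_terms: int, boost: float = 1.0) -> float:
    """Length implied by a norm of boost / sqrt(num_terms), ignoring byte rounding."""
    norm = boost / math.sqrt(num_terms)  # assumed norm formula
    return 1.0 / (norm * norm)


if __name__ == "__main__":
    # A five-term field with no boost, then with hypothetical boosts below 1.
    for boost in (1.0, 0.5, 0.1):
        print(boost, round(effective_length(5, boost), 2))
```

Under these assumptions a boost below 1 inflates the apparent length by a factor of 1/boost², so a short title can report a value in the hundreds without containing that many terms.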