Stronger boosting by date in Solr

Boosting by a date field in Solr is usually defined as:

{!boost b=recip(ms(NOW,datefield),3.16e-11,1,1)}

Everywhere I look (for example "Solr Dismax Config for Boost Scoring" and "Solr boost for multivalued date field", both of which reference the SolrRelevancyFAQ), the same definition is used. But I found that it does not boost my results enough. How can I make this date boosting stronger?

User is searching for two keywords. Both items contain both keywords (in same order) in both title and description. Neither of the keywords is repeated.

And the Solr debug output is way too confusing for me to understand the problem.

Now, this is not a huge problem. 99% of queries work fine and produce the expected results, so it's not like Solr is not working at all. I just found this one situation very confusing and don't know how to proceed.

Answers


User is searching for two keywords. Both items contain both keywords (in same order) in both title and description. Neither of the keywords is repeated.

Well, from your example it is clear that your results have ended up in a tie. To make sense of the confusing debug output and to devise a tie-breaker policy, it is important to understand DisMax.

With DisMax queries, each term of the user input is executed against several fields. If a term hits in more than one field of the same document, the highest-scoring hit is used. What happens to the other sub-queries that hit in that document for the same term is defined by the tie parameter. DisMax calculates the score for a term query as:

score= [score of the top scoring subquery] + tie * (sum of other hitting subqueries)

Consequently, the tie parameter is a value between 0 and 1 that defines whether DisMax considers only the highest-scoring hit for a term (tie=0), all the hits for a term (tie=1), or something between those two extremes.
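As a rough illustration of the formula above, here is a minimal Python sketch. The function name `dismax_score` and the per-field scores are made up for the example; this only mirrors the arithmetic and is not how Solr computes scores internally.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Combine the per-field scores of one term: top score + tie * (sum of the others)."""
    if not subquery_scores:
        return 0.0
    top = max(subquery_scores)
    others = sum(subquery_scores) - top
    return top + tie * others


# Hypothetical scores of one term in two fields (title, description).
scores = [2.0, 0.5]

print(dismax_score(scores, tie=0.0))  # 2.0  (only the best field counts)
print(dismax_score(scores, tie=0.5))  # 2.25
print(dismax_score(scores, tie=1.0))  # 2.5  (all fields are summed)
```

When two documents match the same terms in the same fields, as in the question, these per-term scores come out nearly identical, which is why the date boost has to do the tie-breaking.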

The boost parameter is very similar to the bf parameter, but instead of adding its result to the final score, it will multiply it. This is only available in the Extended Dismax Query Parser or the Lucid Query Parser.

There is an interesting article Comparing Boost Methods of SOLR which may be useful to you.

Shishir


recip(x, m, a, b) implements f(x) = a/(xm+b) with:

  • x: the document age in ms, defined as ms(NOW,<datefield>).

  • m: a constant that defines the time scale over which the boost is applied. It should be chosen relative to what you consider an old document age (a reference_time) in milliseconds. For example, choosing a reference_time of 1 year (3.16e10 ms) means using its inverse: 3.16e-11 (1/3.16e10, rounded).

  • a and b: constants (chosen arbitrarily).

  • xm = 1 when the document is 1 reference_time old (multiplier = a/(1+b)). xm ≈ 0 when the document is new, resulting in a value close to a/b.

  • Using the same value for a and b ensures the multiplier doesn't exceed 1 with recent documents.

  • With a = b = 1, a 1 reference_time old document has a multiplier of about 1/2, a 2 reference_time old document has a multiplier of about 1/3, and so on.
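To make these numbers concrete, the following Python sketch evaluates the same formula outside Solr. The helper `recip` here is a plain re-implementation of f(x) = a/(xm+b) for illustration, and `MS_PER_YEAR` uses the rounded 3.16e10 ms value from above.

```python
MS_PER_YEAR = 3.16e10  # one year in milliseconds, rounded as in the text


def recip(x, m, a, b):
    """Plain re-implementation of recip: a / (m * x + b)."""
    return a / (m * x + b)


# Multiplier for documents of various ages with the usual m = 3.16e-11, a = b = 1.
for years in (0, 1, 2, 5):
    age_ms = years * MS_PER_YEAR
    print(years, "year(s):", round(recip(age_ms, 3.16e-11, 1, 1), 3))
```

A brand-new document gets a multiplier of 1, a one-year-old document about 1/2, and a two-year-old document about 1/3.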

How to make date boosting stronger?

  • Increase m: choose a shorter reference_time, for example 6 months, which gives m = 6.33e-11. Compared to a 1 year reference, the term xm grows twice as fast, so the multiplier drops faster as the document age increases.

  • Decrease a and b: this expands the response curve of the function and can be very aggressive. Example here (page 8)

  • Apply a boost to the boost function itself with the bf parameter, using the dismax or edismax query parser: bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
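The effect of the first two options can be compared numerically. This is a small Python sketch under the same assumptions as the formula above; `recip`, `multiplier_at` and the three labelled settings are illustrative names, and a = b = 0.1 is just one arbitrary "smaller" choice.

```python
MS_PER_YEAR = 3.16e10  # one year in milliseconds, rounded


def recip(x, m, a, b):
    """Plain re-implementation of recip: a / (m * x + b)."""
    return a / (m * x + b)


# (m, a, b) for the default setting and two "stronger" variants.
SETTINGS = {
    "1 year, a=b=1": (3.16e-11, 1, 1),
    "6 months, a=b=1": (6.33e-11, 1, 1),
    "1 year, a=b=0.1": (3.16e-11, 0.1, 0.1),
}


def multiplier_at(years, setting):
    """Multiplier applied to a document that is `years` old under a named setting."""
    m, a, b = SETTINGS[setting]
    return recip(years * MS_PER_YEAR, m, a, b)


for name in SETTINGS:
    row = [round(multiplier_at(y, name), 3) for y in (0, 0.5, 1, 2)]
    print(f"{name:18} {row}")
```

For a one-year-old document the multiplier is about 0.5 with the default setting, about 0.33 with a 6 month reference, and about 0.09 with a = b = 0.1, which shows how much more aggressive lowering a and b is.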

Note that bf behaves like an additive boost: it acts as a bonus added to the score of newer documents, while {!boost b} acts more like a penalty applied to the score of older documents. Using an additive boost can still be a good way to promote newer docs. Just remember that the bf value is independent of the relevancy score, meaning that a highly relevant result set (with higher scores) may not be affected as much as a less relevant result set (with lower scores). Depending on your needs, that may or may not be what you want.
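The difference between the two styles can be seen with a simplified model in Python. The scores below are invented, and real Solr scoring involves more factors than this, so treat it only as a sketch of "multiply" versus "add"; `multiplicative`, `additive` and `newer_wins` are hypothetical helper names.

```python
MS_PER_YEAR = 3.16e10  # one year in milliseconds, rounded
M = 3.16e-11           # 1 year reference_time


def recip(x, m, a, b):
    """Plain re-implementation of recip: a / (m * x + b)."""
    return a / (m * x + b)


def multiplicative(score, age_ms):
    """Model of {!boost b=...}: relevancy score times the date function."""
    return score * recip(age_ms, M, 1, 1)


def additive(score, age_ms, weight=1.0):
    """Model of bf=...^weight: relevancy score plus the weighted date function."""
    return score + weight * recip(age_ms, M, 1, 1)


def newer_wins(combine, old_score, new_score):
    """True if a brand-new doc outranks a 3-year-old doc under `combine`."""
    return combine(new_score, 0) > combine(old_score, 3 * MS_PER_YEAR)


# High-scoring result set: old doc scores 10.0, new doc 8.0.
print(newer_wins(multiplicative, 10.0, 8.0))  # True
print(newer_wins(additive, 10.0, 8.0))        # False

# Low-scoring result set with the same ratio: old doc 1.0, new doc 0.8.
print(newer_wins(multiplicative, 1.0, 0.8))   # True
print(newer_wins(additive, 1.0, 0.8))         # True
```

In this model the multiplicative boost reorders both result sets the same way, while the additive bonus only changes the order when the relevancy scores are small compared to the bonus.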

