Closed Bug 616569 Opened 14 years ago Closed 14 years ago

Evaluate different search rankers

Categories

(support.mozilla.org :: Search, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jsocol, Assigned: jsocol)

Details

(Whiteboard: [qa-])

Once we've switched to using the EXTENDED2 matching mode, we have options for search ranking modes.[1]

My guess is that PROXIMITY_BM25 (the default with EXTENDED2 matching) is going to be the best, but it can't hurt to experiment.

Sooner or later I'd really like to upgrade to Sphinx 1.10 beta (we need to evaluate its stability with IT), and then we can try SPH04[2], which is PROXIMITY_BM25 plus extra weight for matches at the very beginning or end of a field. That means a search for "Cookies" would rank the article "Cookies" above "How to enable or disable cookies" (assuming the title has enough weight to outweigh anything in the content).
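
A toy model may make the difference concrete. The sketch below assumes the ranking expressions that later Sphinx documentation lists for the two rankers: `sum(lcs*user_weight)*1000+bm25` for PROXIMITY_BM25 and `sum((4*lcs+2*(min_hit_pos==1)+exact_hit)*user_weight)*1000+bm25` for SPH04. Treat both formulas as assumptions to check against the manual. The tokenizer, the simplified `lcs`, the field weights, the sample content and the bm25 values are all hypothetical, and bm25 is passed in as a constant rather than computed.

```python
def tokenize(text):
    """Lowercase, whitespace-split tokenizer (a stand-in for Sphinx's tokenizer)."""
    return text.lower().split()


def field_factors(query, text):
    """Return simplified (lcs, min_hit_pos, exact_hit) factors for one field."""
    q, f = tokenize(query), tokenize(text)
    # lcs: longest run of consecutive query words that also appear consecutively, in order, in the field
    lcs = 0
    for i in range(len(f)):
        for j in range(len(q)):
            k = 0
            while i + k < len(f) and j + k < len(q) and f[i + k] == q[j + k]:
                k += 1
            lcs = max(lcs, k)
    # min_hit_pos: 1-based position of the first field word that is a query word (0 if none)
    min_hit_pos = next((pos for pos, word in enumerate(f, start=1) if word in q), 0)
    # exact_hit: 1 if the whole field equals the query
    exact_hit = int(f == q)
    return lcs, min_hit_pos, exact_hit


def proximity_bm25(query, fields, weights, bm25):
    """Assumed PROXIMITY_BM25: sum(lcs * user_weight) * 1000 + bm25."""
    total = sum(field_factors(query, text)[0] * weights[name] for name, text in fields.items())
    return total * 1000 + bm25


def sph04(query, fields, weights, bm25):
    """Assumed SPH04: adds bonuses for a first-position hit and an exact-field match."""
    total = 0
    for name, text in fields.items():
        lcs, min_hit_pos, exact_hit = field_factors(query, text)
        total += (4 * lcs + 2 * (min_hit_pos == 1) + exact_hit) * weights[name]
    return total * 1000 + bm25


WEIGHTS = {"title": 4, "content": 1}  # hypothetical field weights
DOCS = {  # title -> (fields, hypothetical bm25)
    "Cookies": (
        {"title": "Cookies", "content": "Websites store cookies on your computer"},
        500,
    ),
    "How to enable or disable cookies": (
        {"title": "How to enable or disable cookies", "content": "Cookies let websites remember you"},
        700,
    ),
}

for ranker in (proximity_bm25, sph04):
    order = sorted(
        DOCS,
        key=lambda name: ranker("cookies", DOCS[name][0], WEIGHTS, DOCS[name][1]),
        reverse=True,
    )
    print(ranker.__name__, order)
```

With these made-up numbers, PROXIMITY_BM25 puts "How to enable or disable cookies" first (5700 vs 5500) because only the bm25 term separates them, while the assumed SPH04 bonuses put "Cookies" first (32500 vs 22700).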

[1] http://www.sphinxsearch.com/docs/manual-0.9.9.html#api-func-setrankingmode
[2] http://www.sphinxsearch.com/docs/manual-1.10.html#api-func-setrankingmode

So I've been playing around a lot. My goal was to make "Cookies" the first result for "cookies." So far, no combination of ranker and weights gets me there. I think the other three articles ("Deleting cookies", "Blocking cookies", and "Disabling third party cookies") really are just better matches, based on their content.

(I've been raising the title weight relative to the other fields, up to 50x, and even ignoring the other fields entirely, with no luck.)
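
A simplified model suggests why raising the title weight cannot help a one-word query. Under the assumed PROXIMITY_BM25 expression `sum(lcs*user_weight)*1000+bm25`, a one-word query can only produce lcs=1 in any field that contains it, so a higher title weight adds the same amount to every article whose title contains "cookies", and bm25 still decides the order. The per-field lcs values and bm25 scores below are hypothetical.

```python
def proximity_bm25(field_lcs, weights, bm25):
    """Toy PROXIMITY_BM25 score: sum(lcs * user_weight) * 1000 + bm25 (assumed formula)."""
    return sum(lcs * weights[field] for field, lcs in field_lcs.items()) * 1000 + bm25


# Hypothetical inputs for the query "cookies": both titles and bodies contain the
# word, so every field has lcs == 1; only the (made-up) bm25 values differ.
DOCS = {
    "Cookies": ({"title": 1, "content": 1}, 500),
    "Deleting cookies": ({"title": 1, "content": 1}, 700),
}


def ranking(title_weight, content_weight=1):
    """Return article titles ordered best first for the given field weights."""
    weights = {"title": title_weight, "content": content_weight}
    return sorted(
        DOCS,
        key=lambda name: proximity_bm25(DOCS[name][0], weights, DOCS[name][1]),
        reverse=True,
    )


for title_weight in (1, 5, 50):
    print(title_weight, ranking(title_weight))
print("content ignored:", ranking(50, content_weight=0))
```

Every line prints "Deleting cookies" first. In this model only a term that treats "Cookies" differently from its competitors, such as the extra weight for boundary or exact matches that SPH04 is described as adding, can change the order.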

Anyway, my general feeling from the other results (variations on "flash" and "sync" again) is that the order of "best" ranking modes is:

1. SPH04 (not available until we upgrade Sphinx)
2. PROXIMITY_BM25 (the default, and what we use now)
3. BM25
4. PROXIMITY

Basically, we should switch to SPH04 when we can; in the meantime, I think PROXIMITY_BM25 is the best we can do.

Again, I'll implement a change making this explicit and possible to set on a per-index basis. It may make more sense to order questions, for example, with an age factor in addition to weight (something like weight*(1+exp(-age)) would work).
https://github.com/jsocol/kitsune/commit/aefc252d

Nothing to verify as we didn't change the mode, but it's easier to change in the future if we want.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [qa-]
Closed as [qa-]
Status: RESOLVED → VERIFIED