Evaluate different search sort modes

RESOLVED WONTFIX

Status

Priority: P2
Severity: normal
Status: RESOLVED WONTFIX
Reported: 8 years ago
Modified: 7 years ago

People

Reporter: jsocol; Assignee: Unassigned

Tracking

Version: unspecified
Target Milestone: Future

Firefox Tracking Flags

(Not tracked)

Details

Description (Reporter), 8 years ago
Building a search result set actually involves three steps. In order, they are*:

1) MATCH - collect the documents that match the query in set S
2) RANK - assign a weight to each document in S
3) SORT - order the results in S
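As a rough illustration of this conceptual pipeline (not how Sphinx implements it), the three steps can be modeled as separate, pluggable functions. The function names, the toy corpus, and the term-count ranking below are all hypothetical; the point is only that a SORT mode operates on the already-weighted set S, independently of MATCH and RANK.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]
Weighted = Tuple[float, Doc]


def search(
    docs: List[Doc],
    match: Callable[[Doc], bool],
    rank: Callable[[Doc], float],
    sort: Callable[[List[Weighted]], List[Weighted]],
) -> List[Weighted]:
    # 1) MATCH: collect the documents that match the query into S.
    s = [d for d in docs if match(d)]
    # 2) RANK: pair every document in S with a weight.
    weighted = [(rank(d), d) for d in s]
    # 3) SORT: hand the weighted set to a pluggable sort mode.
    return sort(weighted)


def by_weight_desc(results: List[Weighted]) -> List[Weighted]:
    # Plain relevance order: highest weight first.
    return sorted(results, key=lambda r: r[0], reverse=True)


# Hypothetical toy corpus: match = every term present, rank = term count.
terms = ["firefox", "crash"]
docs = [
    {"id": 1, "text": "firefox crash on startup crash"},
    {"id": 2, "text": "firefox sync setup"},
    {"id": 3, "text": "crash reporter firefox"},
]
results = search(
    docs,
    match=lambda d: all(t in str(d["text"]).split() for t in terms),
    rank=lambda d: sum(str(d["text"]).split().count(t) for t in terms),
    sort=by_weight_desc,
)
print([d["id"] for _, d in results])  # [1, 3]
```

Swapping `by_weight_desc` for a different function changes only the SORT step, which is the kind of experiment this bug proposes.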

(* This probably has nothing to do with how it's really implemented, but conceptually this is right.)

So far we've only been changing the MATCH (bug 607306) and RANK (bug 616569) modes, and have seen minimal effect on result quality.

There are a couple of interesting SORT modes, but I don't think either is likely to help with KB results.


SPH_SORT_TIME_SEGMENTS

This seems appropriate for Questions, and possibly the discussion forums. It groups results into time segments (last hour, last day, last week, last month, last 3 months, older) and then sorts by weight within each segment, which gives preference to newer results.

If we segment by the date a question was asked, rather than the date of its last answer, old questions would stay lower in the results, which would hopefully discourage them from accumulating dozens of answers over time.
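A minimal Python sketch of time-segment sorting, keyed on the date a question was asked as suggested above. It assumes "last month" and "last 3 months" mean 30 and 90 days; Sphinx's exact boundaries may differ. The field names `asked` and `weight` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Segment upper bounds, newest first. "Last month" and "last 3 months"
# are approximated as 30 and 90 days (an assumption for this sketch).
SEGMENT_LIMITS = [
    timedelta(hours=1),
    timedelta(days=1),
    timedelta(weeks=1),
    timedelta(days=30),
    timedelta(days=90),
]


def segment(asked: datetime, now: datetime) -> int:
    """Return 0 for the newest segment, len(SEGMENT_LIMITS) for 'older'."""
    age = now - asked
    for i, limit in enumerate(SEGMENT_LIMITS):
        if age <= limit:
            return i
    return len(SEGMENT_LIMITS)


def sort_time_segments(results: List[Dict], now: datetime) -> List[Dict]:
    # Newer segment first; within a segment, higher weight first.
    return sorted(results, key=lambda r: (segment(r["asked"], now), -r["weight"]))


now = datetime(2020, 1, 1, 12, 0)  # hypothetical "current" time
questions = [
    {"id": 1, "weight": 9.0, "asked": now - timedelta(days=200)},
    {"id": 2, "weight": 2.0, "asked": now - timedelta(hours=3)},
    {"id": 3, "weight": 5.0, "asked": now - timedelta(hours=5)},
    {"id": 4, "weight": 1.0, "asked": now - timedelta(minutes=10)},
]
print([q["id"] for q in sort_time_segments(questions, now)])  # [4, 3, 2, 1]
```

Note that question 1 has the highest weight but still sorts last, because segment always takes precedence over weight. This is also why the mode loses its appeal when no data falls in the newest segments.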


SPH_SORT_EXPR

This lets us sort by a mathematical expression. I'd hoped to use something like exp(-len(title)) to give shorter titles preference in the KB, but the expression language has no "len" function and only seems to work on numeric types. (If necessary, we could select the title's length as a numeric attribute.)
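To make the exp(-len(title)) idea concrete, here is a Python sketch of expression-based sorting. It assumes the title length has been precomputed into a hypothetical numeric attribute `title_len` (the workaround mentioned above), and it assumes the expression multiplies the relevance weight rather than replacing it. On the Sphinx side this would roughly correspond to an expression such as `@weight*exp(-title_len)`, again assuming such an attribute exists.

```python
import math
from typing import Callable, Dict, List

Doc = Dict[str, float]


def sort_by_expr(results: List[Doc], expr: Callable[[Doc], float]) -> List[Doc]:
    # Evaluate the expression for each match; highest value first.
    return sorted(results, key=expr, reverse=True)


def short_title_boost(d: Doc) -> float:
    # "title_len" is a hypothetical numeric attribute precomputed at index
    # time, standing in for the missing len(title).
    return d["weight"] * math.exp(-d["title_len"])


docs = [
    {"id": 1, "weight": 3.0, "title_len": 4},
    {"id": 2, "weight": 2.0, "title_len": 2},
    {"id": 3, "weight": 3.0, "title_len": 2},
]
print([d["id"] for d in sort_by_expr(docs, short_title_boost)])  # [3, 2, 1]
```

Each extra character divides the score by e (about 2.7), so a title only a few characters longer needs a far higher weight to outrank a shorter one. That steepness is one reason this may not be a great metric without some damping.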

I'm not sure that's a great metric to use, though. I'll play with it some tomorrow.

Comment 1 (Reporter), 8 years ago
SPH_SORT_TIME_SEGMENTS is really hard to test without very up-to-the-minute data. Without data in the "last hour" and "last day" segments, it loses most of its appeal.

I definitely think it's worth experimenting with further once the database is smaller and up-to-date data is easier to get, though.

I think further testing of the SORT mode should wait. Changing too much at a time is going to make it difficult to decide whether each change is an improvement.
Target Milestone: 2.4.2 → Future

Comment 2 (Reporter), 7 years ago
Moving off Sphinx to Elasticsearch (ES). Will continue to iterate there.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WONTFIX