Closed Bug 347864 Opened 18 years ago Closed 18 years ago

Generally improve fulltext search

Categories

(Bugzilla :: Query/Bug List, enhancement)

2.23
enhancement
Not set
normal


RESOLVED FIXED
Bugzilla 3.0

People

(Reporter: mkanat, Assigned: mkanat)

Attachments

(1 file, 1 obsolete file)

As I was working on bug 287170, I noticed that fulltext search could be improved a lot.

For example, let's say you type two words, "crash freeze", into the box. If a bug's summary contains the word "crash" but not the word "freeze", that bug won't show up in the results at all, even if one of its comments contains the word "freeze".

I think there are also some other changes that we could make to improve fulltext search even more. For example, we should base relevance on the number of comment matches, not on the SUM()/COUNT() average we currently compute, because that doesn't make sense. (You have to read the code and look at the output of the "relevance" column in SQL to understand why it doesn't make sense, but basically: right now a bug where a word appears in two comments might actually be ranked *lower* than a bug where it appears in only one comment.)
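To make the ranking problem concrete, here is a minimal sketch of the arithmetic in Python. It is a toy model, not Bugzilla's actual query: the per-comment scores are made-up numbers, the function names are hypothetical, and it assumes COUNT() only counts the comments that matched.

```python
# Toy model of the two relevance formulas. Scores are invented for
# illustration; they are not real fulltext relevance values.

def relevance_avg(comment_scores, summary_score=0.0):
    """Mimics SUM(score)/COUNT(comments) + summary score."""
    return sum(comment_scores) / len(comment_scores) + summary_score

def relevance_sum(comment_scores, summary_score=0.0):
    """Mimics SUM(score) + summary score."""
    return sum(comment_scores) + summary_score

# Hypothetical bug A: the word appears in one comment.
bug_a = [2.0]
# Hypothetical bug B: the word appears in two comments, one a weaker match.
bug_b = [2.0, 1.0]

# Averaging ranks the bug with more matching comments lower.
print(relevance_avg(bug_a), relevance_avg(bug_b))
# Summing ranks it higher.
print(relevance_sum(bug_a), relevance_sum(bug_b))
```

Under these assumptions, the second matching comment drags the average down even though it is additional evidence that the bug is relevant, while a plain sum can only go up as more comments match.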
My code for this is going to depend on the code for bug 287170.
Depends on: 287170
Attached patch v1 (obsolete)
Okay, here we go. On my test installation (I have a large, real dataset--not just landfill's dataset), this gives a somewhat different set of results than our original code.

In order to really know if it works better than our original code, I'd have to ask some people to apply the patch, and then do some typical searches for things they'd actually want to look for, and see if the right things come up in the right order. I have my testing dataset, but it's not on a system I actually *use*, so I don't know if this version of the code is actually *better*.
Assignee: query-and-buglist → mkanat
Status: NEW → ASSIGNED
Attachment #232714 - Flags: review?(myk)
Comment on attachment 232714 (v1)

>-               "(SUM($term1)/COUNT(${table}.comment_id) + $term2) AS relevance";
>+               "(SUM($term1) + $term2) AS relevance";

I'm not sure this is better either, but it's worth a shot.


>-        # XXX Perhaps this should be OR, but to duplicate the behavior of 
>-        # bug 251567 I'm making it AND for now.
>-        $clause = join(' AND ', @words);
>+        $clause = join(' OR ', @words);

This seems more obviously like the right thing to do, if we want summary searches to integrate as well as possible into our fuzzy fulltext search system.
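As a rough illustration of what the AND-to-OR change means for the summary, here is a small Python sketch. It is a simplification under stated assumptions: it uses plain case-insensitive substring matching instead of the SQL conditions that Search.pm actually builds, and `summary_matches` is a hypothetical helper, not a Bugzilla function.

```python
# Toy model of joining per-word summary conditions with AND versus OR.
# Substring matching stands in for the real per-word SQL conditions.

def summary_matches(summary, words, operator):
    """Return True if the summary satisfies the joined per-word conditions."""
    lowered = summary.lower()
    hits = [word.lower() in lowered for word in words]
    if operator == "AND":
        return all(hits)   # every word must appear in the summary
    if operator == "OR":
        return any(hits)   # one matching word is enough
    raise ValueError(f"unknown operator: {operator}")

words = ["crash", "freeze"]
summary = "Crash on startup"  # made-up summary containing only "crash"

print(summary_matches(summary, words, "AND"))  # the old join
print(summary_matches(summary, words, "OR"))   # the new join
```

With AND, a summary that contains only one of the search words contributes nothing; with OR, it still counts as a match, so the summary can add to the result alongside whatever the comments match.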
Attachment #232714 - Flags: review?(myk) → review+
Flags: approval?
Flags: approval? → approval+
I had to fix bitrot due to checkin changes on bug 287170. I'll post what actually got checked in.

Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.138; previous revision: 1.137
done
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Attachment #232714 - Attachment is obsolete: true