Closed Bug 251567 Opened 20 years ago Closed 20 years ago

Find a specific bug misses too many obvious hits

Categories

(Bugzilla :: Query/Bug List, defect)

2.18
defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 2.18

People

(Reporter: bugreport, Assigned: bugreport)

Details

Attachments

(1 file, 1 obsolete file)

Particularly on small sites, "find a specific bug" misses far too much. I started with 10 bugs; one had "description" in its summary. I searched for "description" and got zero results. I then added "description bananna" to a comment on that bug, and the same search returned 2 bugs. The second one (less relevant) had no mention of "description" anywhere. "Find a specific bug" should probably become ("content" "matches" whatever) OR ("summary" "contains" whatever).
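A minimal Python sketch of the proposed OR, run over an in-memory list of bugs. The helpers `words_of`, `content_matches`, `summary_contains_all` and `find_specific_bug` are hypothetical. `content_matches` is a crude word-overlap stand-in for MySQL's full-text MATCH ... AGAINST and does not reproduce its scoring. The summary test requires every query word, as the later "allwords" patch does, rather than a plain substring.

```python
import re
from typing import Dict, List

def words_of(text: str) -> set:
    # Lowercased alphanumeric words; a toy tokenizer, not MySQL's
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def content_matches(bug: Dict, query: str) -> bool:
    # Stand-in for full-text "content matches": any query word in any comment
    q = words_of(query)
    return any(q & words_of(c) for c in bug["comments"])

def summary_contains_all(bug: Dict, query: str) -> bool:
    # Every query word appears as a whole word in the summary
    return words_of(query) <= words_of(bug["summary"])

def find_specific_bug(bugs: List[Dict], query: str) -> List[int]:
    # Proposed behaviour: (content matches) OR (summary contains)
    return [b["id"] for b in bugs
            if content_matches(b, query) or summary_contains_all(b, query)]

BUGS = [
    {"id": 1, "summary": "Fix description field", "comments": ["Typo in the form"]},
    {"id": 2, "summary": "Crash on startup", "comments": ["Stack trace attached"]},
]

content_only = [b["id"] for b in BUGS if content_matches(b, "description")]
print(content_only)                            # [] : the summary hit is missed
print(find_specific_bug(BUGS, "description"))  # [1]
```

In this toy data, bug 1 mentions "description" only in its summary. A comment-only search therefore finds nothing, which is the failure described above.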
Flags: blocking2.18?
Target Milestone: --- → Bugzilla 2.18
The problem appears to be that the specific-bug search only searches the long description and comments, and uses the summary only as a measure of relevance. As pointed out in the newsgroup over the last day or two, this seriously breaks down on sites that frequently have small, narrowly focused bugs where the summary is self-explanatory and doesn't get duplicated or discussed in the description.
Flags: blocking2.18? → blocking2.18+
Assignee: justdave → bugreport
Status: NEW → ASSIGNED
Attachment #153359 - Flags: review?(myk)
And then my next question is relevance: a bug whose summary matches should be 100% relevant. If we're still only comparing the full-text match to the summary when calculating relevance, then one of the bugs that originally triggered this report would end up with near-zero relevance even though it's a direct hit on the summary.
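For reference, the SQL quoted later in this bug computes relevance as SUM(MATCH(thetext))/COUNT(MATCH(thetext)) + MATCH(short_desc). That is an average full-text score over the comment rows that survive the WHERE clause, plus the summary's score.

Below is a toy Python version under stated assumptions. `toy_score` is a hypothetical word-overlap score standing in for MATCH ... AGAINST, whose real values are not bounded to 0..1. The sketch also simply averages over all comments passed in. It shows that a summary-only hit still contributes through the second term once such a bug is allowed into the result set.

```python
import re
from typing import List

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_score(text: str, query: str) -> float:
    # Hypothetical stand-in for MATCH(...) AGAINST(...): fraction of query words present
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0

def relevance(summary: str, comments: List[str], query: str) -> float:
    # Average comment score (the SUM/COUNT term) plus the summary score
    avg = sum(toy_score(c, query) for c in comments) / len(comments) if comments else 0.0
    return avg + toy_score(summary, query)

print(relevance("a silly bug to search on", ["Nothing relevant here"], "silly bug"))  # 1.0
print(relevance("Crash on startup", ["silly bug again", "unrelated"], "silly bug"))    # 0.5
```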
Comment on attachment 153359: Patch - ORs in allwords in short_desc

This fixes the "find a specific bug" page but will break as soon as someone creates a search page with both a "content" field and some boolean charts. Instead of hardcoding the search terms into the first boolean chart, use the first available chart by checking $params->param('fieldN-0-0') for the first undefined N. This is also going to need performance testing to make sure it doesn't severely reduce search performance on large databases.
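The reviewer's suggestion can be sketched in Python, treating the CGI parameters as a plain dict. The helper names are hypothetical. Two points are assumptions about Bugzilla's boolean-chart parameters rather than code taken from the patch:

- The layout is fieldN-R-C, with columns C ORed within a row.
- The type names are "matches" and "allwords".

```python
from typing import Dict

def first_unused_chart(params: Dict[str, str]) -> int:
    # Return the first chart number N whose "fieldN-0-0" is undefined
    n = 0
    while params.get(f"field{n}-0-0") is not None:
        n += 1
    return n

def add_specific_bug_terms(params: Dict[str, str], words: str) -> int:
    # Hypothetical helper: put both terms in one row of the first free chart
    n = first_unused_chart(params)
    # Column 0: full-text match against the comments
    params[f"field{n}-0-0"] = "content"
    params[f"type{n}-0-0"] = "matches"
    params[f"value{n}-0-0"] = words
    # Column 1 (assumed ORed with column 0): all words in the summary
    params[f"field{n}-0-1"] = "short_desc"
    params[f"type{n}-0-1"] = "allwords"
    params[f"value{n}-0-1"] = words
    return n

# A search page that already uses chart 0 gets its terms in chart 1
params = {"field0-0-0": "product", "type0-0-0": "equals", "value0-0-0": "Bugzilla"}
print(add_specific_bug_terms(params, "silly bug"))  # 1
```

This does not touch an existing chart, so a page that combines a "content" field with its own boolean charts keeps its criteria intact.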
Attachment #153359 - Flags: review?(myk) → review-
OK, the patch is revised to locate the first available chart. I tried a few tests on landfill-tip: I created the criteria using a boolean chart, got the SQL, and jammed it into the mysql client. The time taken to run the query went from 0.25 sec to 0.5 sec. You would have to repeat this on BMO. I don't think that should stop us from fixing this. The current implementation misses bugs where the string being searched for is right in front of us in the summary, and missing those is very serious. There is one other alternative, but I don't really like it myself: we could keep a copy of the summary in the longdescs table, in a row marked for that purpose and used only for this search. I'd hate to resort to that, though.
Attachment #153359 - Attachment is obsolete: true
Attachment #153401 - Flags: review?(myk)
Comment on attachment 153401: Patch - finds first unused chart and does the same as prior patch

Not only does this provide the expected results, it also significantly speeds up the query, believe it or not. A bug whose summary shares no words with any comment body now shows up near the top of the list when searching on part of the summary. I'll post the stats in a moment.
Attachment #153401 - Flags: review?(myk) → review+
OK, here's what I did to test this. We currently have a test install (henceforth called "bztest") running on bugzilla.mozilla.org with a separate copy of the code, but against the main database. I created bug 251805 with the summary "a silly bug to search on" and made sure that none of those 6 words occurred in the description. I then searched for the phrase "silly bug" both with and without the patch applied.

WITHOUT the patch applied, the following SQL is generated:

    SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, bugs.bug_severity, bugs.priority, bugs.rep_platform, map_assigned_to.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(MATCH(longdescs_.thetext) AGAINST('silly bug' ))/COUNT(MATCH(longdescs_.thetext) AGAINST('silly bug' )) + MATCH(bugs.short_desc) AGAINST('silly bug' )) AS relevance
    FROM bugs, profiles AS map_assigned_to
    INNER JOIN longdescs longdescs_ ON bugs.bug_id = longdescs_.bug_id
    LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id AND bug_group_map.group_id NOT IN (9,5,10,3,2,1,11,4,14,16,13,12,6,8,7)
    LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = 10297
    WHERE bugs.assigned_to = map_assigned_to.userid AND longdescs_.bug_id = bugs.bug_id AND (bugs.bug_status IN ('UNCONFIRMED','NEW','ASSIGNED','REOPENED')) AND (bugs.product_id IN (19)) AND (MATCH(longdescs_.thetext) AGAINST('silly bug' )) AND ((bug_group_map.group_id IS NULL) OR (bugs.reporter_accessible = 1 AND bugs.reporter = 10297) OR (bugs.cclist_accessible = 1 AND cc.who IS NOT NULL) OR (bugs.assigned_to = 10297) OR (bugs.qa_contact = 10297) )
    GROUP BY bugs.bug_id
    ORDER BY relevance DESC
    LIMIT 200;

Timing this with a stopwatch while hitting the reload button in the browser yielded:

    1: 23.96 sec
    2: 21.96 sec
    3: 31.02 sec
    4: 1:08.34 sec

Copying the SQL and pasting it into the mysql command line client to get its timings yielded:

    200 rows in set (1 min 16.75 sec)
    200 rows in set (51.54 sec)
    200 rows in set (20.83 sec)
    200 rows in set (27.48 sec)

WITH the patch applied, the following SQL is generated:

    SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, bugs.bug_severity, bugs.priority, bugs.rep_platform, map_assigned_to.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(MATCH(longdescs_0.thetext) AGAINST('silly bug' ))/COUNT(MATCH(longdescs_0.thetext) AGAINST('silly bug' )) + MATCH(bugs.short_desc) AGAINST('silly bug' )) AS relevance
    FROM bugs, profiles AS map_assigned_to
    INNER JOIN longdescs longdescs_0 ON bugs.bug_id = longdescs_0.bug_id
    LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id AND bug_group_map.group_id NOT IN (9,5,10,3,2,1,11,4,14,16,13,12,6,8,7)
    LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = 10297
    WHERE bugs.assigned_to = map_assigned_to.userid AND longdescs_0.bug_id = bugs.bug_id AND (bugs.bug_status IN ('UNCONFIRMED','NEW','ASSIGNED','REOPENED')) AND (bugs.product_id IN (19)) AND ((MATCH(longdescs_0.thetext) AGAINST('silly bug' )) OR (lower(bugs.short_desc) regexp '(^|[^a-z0-9])silly($|[^a-z0-9])' AND lower(bugs.short_desc) regexp '(^|[^a-z0-9])bug($|[^a-z0-9])')) AND ((bug_group_map.group_id IS NULL) OR (bugs.reporter_accessible = 1 AND bugs.reporter = 10297) OR (bugs.cclist_accessible = 1 AND cc.who IS NOT NULL) OR (bugs.assigned_to = 10297) OR (bugs.qa_contact = 10297) )
    GROUP BY bugs.bug_id
    ORDER BY relevance DESC
    LIMIT 200;

Timing this with a stopwatch while hitting the reload button in the browser yielded:

    1: 6.34 sec
    2: 15.56 sec
    3: 7.65 sec
    4: 5.98 sec

Copying the SQL and pasting it into the mysql command line client to get its timings yielded:

    200 rows in set (43.75 sec)
    200 rows in set (4.95 sec)
    200 rows in set (13.47 sec)
    200 rows in set (7.49 sec)
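The timings above vary a lot between runs (note the 1:08.34 and 43.75 sec outliers), so a median is a fairer single number than any one run. Here is a short Python summary of the figures as posted, with minute values converted to seconds by hand:

```python
from statistics import median

# Timings in seconds as posted; 1:08.34 -> 68.34 and "1 min 16.75 sec" -> 76.75
TIMINGS = {
    ("browser", "without patch"): [23.96, 21.96, 31.02, 68.34],
    ("mysql client", "without patch"): [76.75, 51.54, 20.83, 27.48],
    ("browser", "with patch"): [6.34, 15.56, 7.65, 5.98],
    ("mysql client", "with patch"): [43.75, 4.95, 13.47, 7.49],
}

for (where, variant), secs in TIMINGS.items():
    print(f"{where:<13} {variant:<14} median {median(secs):6.2f} s "
          f"(min {min(secs):.2f}, max {max(secs):.2f})")
```

By this measure the patched query takes roughly a quarter of the unpatched time:

- Browser: about 27.5 s vs 7.0 s.
- mysql client: about 39.5 s vs 10.5 s.

Four runs each against a live, shared database is a small sample, though.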
Flags: approval2.18+
Flags: approval+
I forgot to mention: Without the patch, the test bug does not even show up in any of the 200 results. With the patch, it was the second hit.
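The summary clause that the patch adds appears in the WITH-patch SQL above. It requires each query word to appear in lower(bugs.short_desc), bounded on each side by a non-alphanumeric character or by the start/end of the string. That clause is why the test bug is found at all.

Below is the same test in Python, as a sketch. `word_pattern` and `summary_has_all_words` are hypothetical names. The `re.escape` call is added here so punctuation in a query word is taken literally; the logged SQL shows no escaping for these plain words.

```python
import re

def word_pattern(word: str) -> str:
    # Same shape as the logged clause: (^|[^a-z0-9])word($|[^a-z0-9])
    return r"(^|[^a-z0-9])" + re.escape(word.lower()) + r"($|[^a-z0-9])"

def summary_has_all_words(summary: str, query: str) -> bool:
    # Every word must match in the lowercased summary (the clauses are ANDed)
    text = summary.lower()
    return all(re.search(word_pattern(w), text) for w in query.split())

print(word_pattern("silly"))                                            # (^|[^a-z0-9])silly($|[^a-z0-9])
print(summary_has_all_words("a silly bug to search on", "silly bug"))  # True
print(summary_has_all_words("bugs are silly", "silly bug"))            # False: "bugs" is not "bug"
```

The boundary groups make this a whole-word test. "bug" does not match inside "bugs", while "Silly-Bug" would still match both words.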
checked in on both branches
and fixed
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
QA Contact: matty_is_a_geek → default-qa