Closed Bug 251567 Opened 20 years ago Closed 20 years ago

Find a specific bug misses too many obvious hits

Categories

(Bugzilla :: Query/Bug List, defect)

2.18
defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 2.18

People

(Reporter: bugreport, Assigned: bugreport)

Details

Attachments

(1 file, 1 obsolete file)

Particularly on small sites, find a specific bug misses far too much.

I started with 10 bugs, one of which had "description" in its summary.

I searched for "description" and got zero results.

I then added "description bananna" to a comment on that bug.

After that, the same search returned 2 bugs.  The second (less relevant) one had
no mention of "description" anywhere.

"Find a specific bug" should probably become ("content" "matches" whatever) OR
("summary" "contains" whatever).
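As a rough illustration of the proposed rule, the Python sketch below applies it to in-memory records. The `Bug` class and the helpers `content_matches`, `summary_contains`, and `find_specific_bug` are hypothetical names, and plain word overlap stands in for the database's full-text match. None of this is Bugzilla code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Bug:
    # Hypothetical record: a summary plus a list of comment bodies.
    summary: str
    comments: list = field(default_factory=list)


def _words(text):
    # Lowercased alphanumeric tokens found in the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def content_matches(bug, query):
    # Stand-in for full-text search: any query word appears in any comment.
    wanted = _words(query)
    return any(wanted & _words(comment) for comment in bug.comments)


def summary_contains(bug, query):
    # Every query word appears in the summary (an empty query matches nothing).
    wanted = _words(query)
    return bool(wanted) and wanted <= _words(bug.summary)


def find_specific_bug(bugs, query):
    # The proposed rule: a content hit OR a summary hit.
    return [b for b in bugs
            if content_matches(b, query) or summary_contains(b, query)]
```

Under these assumptions, a bug whose only mention of the search word is in its summary is returned, which is the case the report says is currently missed.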
Flags: blocking2.18?
Target Milestone: --- → Bugzilla 2.18
The problem appears to be that specific bug search is only searching long
description and comments, and then only using the summary as a measure of relevance.

As pointed out in the newsgroup over the last day or two, this seriously breaks
down on sites that frequently have small narrowly-focused bugs where the summary
is self-explanatory and doesn't get duplicated or discussed in the description.
Flags: blocking2.18? → blocking2.18+
Assignee: justdave → bugreport
Status: NEW → ASSIGNED
Attachment #153359 - Flags: review?(myk)
My next question is relevance: a bug which has a matching summary should be
100% relevant.  If we're still only comparing the fulltext match to the summary
when calculating relevance, then one of the bugs which originally triggered
this report would wind up with a near-zero relevance even though it's a direct
hit on the summary.
Comment on attachment 153359 [details] [diff] [review]
Patch - ORs in allwords in short_desc

This fixes the "find a specific bug" page but will break as soon as someone
creates a search page with both a "content" field and some boolean charts. 
Instead of hardcoding the search terms into the first boolean chart, use the
first available chart by checking $params->param('fieldN-0-0') for the first
undefined N.
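
A minimal sketch of that lookup, written in Python rather than Bugzilla's Perl. Here `params` is modeled as a plain dict of CGI parameter names, and `first_free_chart` is a hypothetical helper, not a function from the patch.

```python
def first_free_chart(params):
    """Return the lowest N for which 'fieldN-0-0' is not set in params."""
    n = 0
    # Walk chart numbers upward until one has no first field defined.
    while params.get(f"field{n}-0-0") is not None:
        n += 1
    return n
```

With charts 0 and 1 already in use, this returns 2, so the search terms would go into a fresh chart instead of overwriting chart 0.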

This is also going to need performance testing to make sure it doesn't severely
reduce search performance on large databases.
Attachment #153359 - Flags: review?(myk) → review-
OK, the patch is revised to locate the first available chart.

I tried a few tests on landfill-tip creating the criteria using a boolean
chart, getting the sql, and jamming it into the mysql client, and the time
taken to run the query went from 0.25sec to 0.5sec. You would have to repeat
this on BMO.

I don't think that should stop us from fixing this.  The current implementation
misses bugs where the string being searched for is right in front of us in the
summary. Missing those is very serious.

There is one other alternative, but I don't really like it myself...

We could keep a copy of the summary in the longdescs table in a row marked for
the purpose and use it only for this purpose.  I'd hate to resort to that,
though.
Attachment #153359 - Attachment is obsolete: true
Attachment #153401 - Flags: review?(myk)
Comment on attachment 153401 [details] [diff] [review]
Patch - finds first unused chart and does the same as prior patch

Not only does this provide the expected results (a bug with a summary which has
no part of it occurring in a comment body shows up near the top of the list
when searching on part of the summary), but it also significantly speeds up the
query, believe it or not...  I'll post the stats in a moment.
Attachment #153401 - Flags: review?(myk) → review+
OK, here's what I did to test this.  We currently have a test install
(henceforth called "bztest"), which is running on bugzilla.mozilla.org with a
separate copy of the code, but against the main database.

I created bug 251805 with the summary "a silly bug to search on" and made sure
that none of those 6 words occurred in the description.

I then performed searches on the phrase "silly bug" both with and without the
patch applied.

WITHOUT the patch applied, the following SQL is generated:

SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status,
bugs.resolution, bugs.bug_severity, bugs.priority, bugs.rep_platform,
map_assigned_to.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc,
(SUM(MATCH(longdescs_.thetext) AGAINST('silly bug'
))/COUNT(MATCH(longdescs_.thetext) AGAINST('silly bug' )) +
MATCH(bugs.short_desc) AGAINST('silly bug' )) AS relevance FROM bugs, profiles
AS map_assigned_to INNER JOIN longdescs longdescs_ ON bugs.bug_id =
longdescs_.bug_id LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id
AND bug_group_map.group_id NOT IN (9,5,10,3,2,1,11,4,14,16,13,12,6,8,7) LEFT
JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = 10297 WHERE bugs.assigned_to =
map_assigned_to.userid AND longdescs_.bug_id = bugs.bug_id AND (bugs.bug_status
IN ('UNCONFIRMED','NEW','ASSIGNED','REOPENED')) AND (bugs.product_id IN (19))
AND (MATCH(longdescs_.thetext) AGAINST('silly bug' )) AND
((bug_group_map.group_id IS NULL) OR (bugs.reporter_accessible = 1 AND
bugs.reporter = 10297) OR (bugs.cclist_accessible = 1 AND cc.who IS NOT NULL) OR
(bugs.assigned_to = 10297) OR (bugs.qa_contact = 10297) ) GROUP BY bugs.bug_id
ORDER BY relevance DESC LIMIT 200;

Testing this by hitting the reload button in the browser and timing it with a
stopwatch, yielded the following timings:

1: 23.96 sec
2: 21.96 sec
3: 31.02 sec
4: 1 min 08.34 sec

Copying the SQL and pasting it into the mysql command line client to get its
timings yielded:

200 rows in set (1 min 16.75 sec)
200 rows in set (51.54 sec)
200 rows in set (20.83 sec)
200 rows in set (27.48 sec)


WITH the patch applied, the following SQL is generated:

SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status,
bugs.resolution, bugs.bug_severity, bugs.priority, bugs.rep_platform,
map_assigned_to.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc,
(SUM(MATCH(longdescs_0.thetext) AGAINST('silly bug'
))/COUNT(MATCH(longdescs_0.thetext) AGAINST('silly bug' )) +
MATCH(bugs.short_desc) AGAINST('silly bug' )) AS relevance FROM bugs, profiles
AS map_assigned_to INNER JOIN longdescs longdescs_0 ON bugs.bug_id =
longdescs_0.bug_id LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id
AND bug_group_map.group_id NOT IN (9,5,10,3,2,1,11,4,14,16,13,12,6,8,7) LEFT
JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = 10297 WHERE bugs.assigned_to =
map_assigned_to.userid AND longdescs_0.bug_id = bugs.bug_id AND (bugs.bug_status
IN ('UNCONFIRMED','NEW','ASSIGNED','REOPENED')) AND (bugs.product_id IN (19))
AND ((MATCH(longdescs_0.thetext) AGAINST('silly bug' )) OR
(lower(bugs.short_desc) regexp '(^|[^a-z0-9])silly($|[^a-z0-9])' AND
lower(bugs.short_desc) regexp '(^|[^a-z0-9])bug($|[^a-z0-9])')) AND
((bug_group_map.group_id IS NULL) OR (bugs.reporter_accessible = 1 AND
bugs.reporter = 10297) OR (bugs.cclist_accessible = 1 AND cc.who IS NOT NULL) OR
(bugs.assigned_to = 10297) OR (bugs.qa_contact = 10297) ) GROUP BY bugs.bug_id
ORDER BY relevance DESC LIMIT 200;
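
The summary clause added by the patch can be mirrored in Python to see what it accepts. This is only a sketch: `word_pattern` and `summary_has_all_words` are hypothetical helpers, and Python's `re` module stands in for MySQL's `regexp`, which may differ in details such as character-set handling.

```python
import re


def word_pattern(word):
    # Same shape as the generated SQL: the word must be bounded by the
    # start/end of the string or by a non-alphanumeric character.
    return re.compile(r"(^|[^a-z0-9])" + re.escape(word.lower()) + r"($|[^a-z0-9])")


def summary_has_all_words(summary, query):
    # Lowercase the summary once, then require every query word to match,
    # like the AND-ed regexp tests in the WHERE clause.
    lowered = summary.lower()
    return all(word_pattern(w).search(lowered) for w in query.split())
```

For the test bug, `summary_has_all_words("a silly bug to search on", "silly bug")` is true, while a summary such as "debugging is silly" does not match because "bug" only occurs inside a longer word.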

Testing this by hitting the reload button in the browser and timing it with a
stopwatch, yielded the following timings:

1: 6.34 sec
2: 15.56 sec
3: 7.65 sec
4: 5.98 sec

Copying the SQL and pasting it into the mysql command line client to get its
timings yielded:

200 rows in set (43.75 sec)
200 rows in set (4.95 sec)
200 rows in set (13.47 sec)
200 rows in set (7.49 sec)
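
To make the two sets of timings easier to compare, the sketch below computes the median of each group. The numbers are copied from the runs above. Reading the fourth unpatched browser run as 1 minute 8.34 seconds (68.34 s) is an assumption, and `TIMINGS` and `summarize` are names invented for this illustration.

```python
from statistics import median

# Timings in seconds, copied from the runs above.
# Assumption: the fourth unpatched browser run is 1 min 8.34 s = 68.34 s.
TIMINGS = {
    ("without patch", "browser"): [23.96, 21.96, 31.02, 68.34],
    ("without patch", "mysql client"): [76.75, 51.54, 20.83, 27.48],
    ("with patch", "browser"): [6.34, 15.56, 7.65, 5.98],
    ("with patch", "mysql client"): [43.75, 4.95, 13.47, 7.49],
}


def summarize(timings):
    # Median is used because each group has one obvious outlier.
    return {key: median(values) for key, values in timings.items()}


if __name__ == "__main__":
    for (variant, method), mid in summarize(TIMINGS).items():
        print(f"{variant:14s} {method:13s} median {mid:6.2f} s")
```

With only four runs per case against a shared production database, these medians are indicative at best, but they point the same way as the raw numbers: the patched query is several times faster.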
Flags: approval2.18+
Flags: approval+
I forgot to mention:  Without the patch, the test bug does not even show up in
any of the 200 results.  With the patch, it was the second hit.
Checked in on both branches and fixed.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
QA Contact: matty_is_a_geek → default-qa