Closed
Bug 127200
Opened 23 years ago
Closed 23 years ago
Query for CC/longdesc/OR takes long time
Categories
(Bugzilla :: Query/Bug List, defect, P2)
Tracking
()
RESOLVED
FIXED
Bugzilla 2.18
People
(Reporter: tessarakt, Assigned: bugreport)
References
()
Details
(Keywords: perf)
Attachments
(1 file, 4 obsolete files)
|
4.21 KB,
patch
|
gerv
:
review+
|
Details | Diff | Splinter Review |
Reproduction:
1. Go to Bugzilla page
2. Go to query page
3. Uncheck entries under "Status"
4. Enter "jens@unfaehig.de" under e-mail adress, exact match, matching field CC
5. Run query
The query takes a long time (such a long time that I haven't waited for it to
finish, aborted it after some minutes).
Maybe no index in the database for that field?
Comment 1•23 years ago
|
||
SELECT bugs.bug_id, bugs.groupset, substring(bugs.bug_severity, 1, 3),
substring(bugs.priority, 1, 3), substring(bugs.rep_platform, 1, 3),
map_assigned_to.login_name, substring(bugs.bug_status,1,4),
substring(bugs.resolution,1,4), substring(bugs.short_desc, 1, 60)
FROM bugs, profiles map_assigned_to, profiles map_reporter
LEFT JOIN profiles map_qa_contact ON bugs.qa_contact = map_qa_contact.userid
LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id
LEFT JOIN profiles map_cc_ ON cc_.who = map_cc_.userid
WHERE ((bugs.groupset & 0) = bugs.groupset )
AND bugs.assigned_to = map_assigned_to.userid
AND bugs.reporter = map_reporter.userid
AND (map_cc_.login_name = 'jens@unfaehig.de')
...
+-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
| table | type | possible_keys | key | key_len |
ref | rows | Extra |
+-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
| bugs | ALL | assigned_to,reporter | NULL | NULL |
NULL | 5 | where used; Using temporary; Using filesort |
| map_assigned_to | eq_ref | PRIMARY | PRIMARY | 3 |
bugs.assigned_to | 1 | |
| map_reporter | index | PRIMARY | PRIMARY | 3 |
NULL | 4 | where used; Using index |
| map_qa_contact | eq_ref | PRIMARY | PRIMARY | 3 |
bugs.qa_contact | 1 | Using index |
| cc_ | index | bug_id | bug_id | 6 |
NULL | 4 | Using index |
| map_cc_ | eq_ref | PRIMARY | PRIMARY | 3 |
cc_.who | 1 | where used |
+-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
Comment 2•23 years ago
|
||
Thats because mysql is examining every table in the bugs row. On a db with 442
bugs, and a simplified query
mysql> EXPLAIN SELECT bugs.bug_id FROM profiles map_assigned_to INNER JOIN bugs
ON bugs.assigned_to=map_assigned_to.userid LEFT JOIN cc ON bugs.bug_id =
cc.bug_id WHERE cc.who='1'\G
*************************** 1. row ***************************
table: bugs
type: ALL
possible_keys: assigned_to
key: NULL
key_len: NULL
ref: NULL
rows: 442
Extra:
*************************** 2. row ***************************
table: map_assigned_to
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: bugs.assigned_to
rows: 1
Extra: Using index
*************************** 3. row ***************************
table: cc
type: ref
possible_keys: bug_id
key: bug_id
key_len: 3
ref: bugs.bug_id
rows: 1
Extra: where used; Using index
3 rows in set (0.00 sec)
If I remove the join to the assignee table (which isn't being used):
mysql> EXPLAIN SELECT bugs.bug_id FROM bugs LEFT JOIN cc ON bugs.bug_id =
cc.bug_id WHERE cc.who='1'\G
*************************** 1. row ***************************
table: bugs
type: index
possible_keys: NULL
key: PRIMARY
key_len: 3
ref: NULL
rows: 442
Extra: Using index
*************************** 2. row ***************************
table: cc
type: ref
possible_keys: bug_id
key: bug_id
key_len: 3
ref: bugs.bug_id
rows: 1
Extra: where used; Using index
2 rows in set (0.00 sec)
Also see bug 96101.
Comment 3•23 years ago
|
||
Wow, shouldn't it look up the CCs table first since it actually has only one key
to look up for that?
Comment 4•23 years ago
|
||
No, it can't, because its a left join, so it has to find the non-matching bits
too. This means that it has to sequencially scan the table
On a test db with 10000 bugs, 30000 cc, and 25000 profiles:
SELECT bugs.bug_id,map_assigned_to.login_name FROM profiles map_assigned_to,
bugs LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id LEFT JOIN profiles map_cc_ ON
cc_.who = map_cc_.userid WHERE bugs.assigned_to=map_assigned_to.userid AND
map_cc_.login_name='QI';
takes 3.62 seconds.
Changing the second LEFT JOIN to an INNER JOIN takes this to 1.23 seconds. Why
isn't this done? The only difference will be if there is someone on the cc list
who is not in the profiles table (actually, they'd have to be the only person in
this case). Thats invalid, and I don't think we should slow down queries because
of people who may have data corruption, so that should be a plain JOIN. (I note
that we do the same thing with the qa contact, but thats a trickier issue
because there it is optional. We probably need to rethink that)
Any objections to making that change for 2.16?
Changing the first LEFT JOIN to an INNER JOIN as well makes this take 0.00
seconds. For cc equals, then we could use an inner join. For cc not equals, do
you want to match bugs with no ccs? Tihs gets us back into the problems of bug
109528 of contains vs equals. We'll probably revisit that for 2.18, I guess.
Comment 5•23 years ago
|
||
select count(cc.bug_id) from cc left join profiles ON cc.who=profiles.userid
where profiles.userid=NULL;
returns 0 on bmo, and if bmo doesn't have corruption, then noone does, right?.
This is detected by sanity check anyway.
Taking for 2.16 to test with. I'm reasonably certain that the first left join
means that the bug row will still get through anyway, but I have to check,
hopefully later this week.
The line in question in the join at about line 368 of buglist.cgi
Assignee: endico → bbaetz
Severity: normal → major
Keywords: perf
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.16
Comment 6•23 years ago
|
||
OK, try this. This removes both inner joins, on the basis that "not contains"
doesn't pick up bugs with no ccs at the moment anyway.
This is, I think, provably identical - the condition in $f restricted the
column in the last table to be non null, so the left join is identical to the
inner join, since the only difference between a left join and an inner join is
that non matching rows in the join get NULL as entries.
The reason that "does not contain" fails is because INSTR(NULL) == NULL != 0,
and NULL fails all the = and != conditionals anyway.
We so need to clean this up, but not for 2.16
Updated•23 years ago
|
Comment 7•23 years ago
|
||
Comment on attachment 72194 [details] [diff] [review]
patch
This looks good. Works and passes tests. I haven't measured performance but
that it works should be a no-brainer.
I was a little suspicious, so I looked back at the history of this code. This
appears to have been inserted as is (rather than changed from inner to left
outer), and there was a change to this code that was backed out, but this
doesn't seem to have the same problem.
Attachment #72194 -
Flags: review+
I have a saved query called "Open bugs I care about", which searches for (among
other things) my email address in the CC: field. The query doesn't work. My
browser (Mozilla 0.99) waits for Bugzilla to return some result, but it never
gets it! After about a minute of waiting, it appears that Mozilla just gives
up, and I'm left with a web page that just says, "Please stand by ...".
So the problem is not just that it takes a long time. The problem is that it
takes too long.
Comment 9•23 years ago
|
||
Comment on attachment 72194 [details] [diff] [review]
patch
r= justdave
Attachment #72194 -
Flags: review+
Comment 10•23 years ago
|
||
Checking in buglist.cgi;
/cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v <-- buglist.cgi
new revision: 1.162; previous revision: 1.161
done
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 11•23 years ago
|
||
This patch is invalid, as has been backed out. See bug 139759.
In particular see bug 139759 comment #6, which gives some possible solutions to
this problem.
reopening, and -> 2.18
Comment 12•23 years ago
|
||
Comment on attachment 72194 [details] [diff] [review]
patch
patch was vetoed by bbeatz and backed out. Marking needs-work on his behalf
Attachment #72194 -
Flags: review-
Comment 13•23 years ago
|
||
*** Bug 144966 has been marked as a duplicate of this bug. ***
Comment 14•23 years ago
|
||
Bug 144966 is not a duplicate of this one, rather this is a duplicate/subset of
that one. (the problem is not just with CC but with most of the email search fields)
What about adding a box "Fast search (may fail on corrupt profiles)"?
Blocks 133173 ("Change "My Bugs" into a preset query") - the user can't query
his own bugs if it takes so long. (personally, with all the boxes by email
matched, I waited over 30 mins for results before giving up.)
Blocks: 133173
Comment 15•23 years ago
|
||
The actual problem is 'queries with OR in them are slow'. queries involving CC
tend to be the easiest way to run into this, but the comments (longdescs in the
schema) have the same problem.
The actual problem is a hard problem to solve, in general. Since bugzilla only
works on mysql, we don't know if changing databases would speed this up.
My fix for this, which possibly could have been extended to longdecs didn't
work. Its possible to fix this with subselects, except mysql doesn't support
those. We could simulate subselects with a separate query, and thats somethign
to look into.
Summary: Query for CC takes long time → Query for CC/longdesc/OR takes long time
Comment 16•23 years ago
|
||
I don't think this is related to OR, I don't see any ORs or sets in the
description of this problem.
Comment 17•23 years ago
|
||
if Bug 144966 is not a dup of this one
shouldn't it be reopened ?
| Assignee | ||
Comment 18•23 years ago
|
||
The actual problem is more likely this.....
When a user chooses to search for an email address in CC list or comments,
Search.pm takes the single matching email and turns it into (paraphrased) ....
SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id LEFT
JOIN profiles AS map_cc ON cc.who = map_cc.id LEFT JOIN longdescs ON bugs.id =
longdescs.bug_id LEFT JOIN profiles AS map_comment ON longdescs.who =
map_comment.id LEFT_JOIN profiles AS map_reporter ON bugs.reporter =
map_reporter.id ....othestuff... WHERE (map_comment.login_name =
'sameuser@mozilla.org' OR map_cc.login_name = 'sameuser@mozilla.org' OR
map_reporter - 'sameuser@mozilla.org') ..otherstuff..
This means that, just for this function, every bug expands to count(cc) *
count(comments) rows each of which is joined to a profile 3 times and then
matched to the form field 3 times.
In the usual case where a moderate list of users match, this can really be done
in a manner requiring only 1 row per bug for this function...
first look up userids of matched user..
SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id AND
cc.who IN(useridlist) LEFT JOIN longdescs ON bugs.id = longdescs.bug_id AND
logdescs.who IN(useridlist) ...otherstuff... WHERE ((cc.who IS NOT NULL) OR
(longdescs.who IS NOT NULL) OR (bugs.reporter IN (useridlist))) .. otherstuff..
Implementation note...
In Search.pm.
Check the $email1 and $email2 fields for matching email addresses directly from
profiles. During the initial processing of hard-coded stuff, load an array with
a list of matching userids ONLY if there are a reasonable number of matches (< 50?).
In the funcdefs, prior to any "LEFT JOIN profiles" operation, check what is
about to be included in the query to see if it can be done with the list already
stored.
| Assignee | ||
Comment 19•23 years ago
|
||
*** Bug 175425 has been marked as a duplicate of this bug. ***
| Assignee | ||
Comment 20•23 years ago
|
||
This patch causes exact matches to be handled with far fewer joins. I believe
that this is really the crux of the matter. If this causes the email "is"
queries to become much faster than they used to be, then the patch should be
updated to handle email "contains" queries where the matching email returns a
moderate number of matches (like fewer than 50).
Attachment #72194 -
Attachment is obsolete: true
| Assignee | ||
Comment 21•23 years ago
|
||
This will speed up queries for users making comments or on CC lists
dramatically.
| Assignee | ||
Updated•23 years ago
|
Attachment #103452 -
Attachment is obsolete: true
Comment 23•23 years ago
|
||
This seems OK to me (although you probably mean either (@list < 51) or LIMIT
50), although I'd prefer not to review it as I'm not familiar with this code.
What's the relationship between this bug and 133173? That seems wrong to me...
Gerv
| Assignee | ||
Comment 24•23 years ago
|
||
If we hit the LIMIT, then we don't know if there were really more than we were
willing to check for this way, so we only use the mechanism if the number
returned is fewer than the LIMIT.
I'll ping around for a reviewer. I think that everyone is unhappy with reviews
in this part of the code.
| Assignee | ||
Comment 25•23 years ago
|
||
There is no reason for bug 133173 to depend on this.
This still does need a review, though.
No longer blocks: 133173
Status: NEW → ASSIGNED
| Assignee | ||
Comment 26•23 years ago
|
||
Prior to settling on the patch in attachment 103459 [details] [diff] [review], MattyT suggested trying the
approach of creating a temporary table with the matching userids and joining
with that. I will try that in the next few days, but it will need a performance
test from myk.
myk - If you are up for trying out a few key queries, I'll do this. If you
would rather stick with the current patch, It's your call. Please comment here.
| Assignee | ||
Comment 27•23 years ago
|
||
OK... here's the 2 SQL statements that need to b timed on BMO. Please post
result here.
create temporary table garbage select userid from profiles where login_name
like '%@%' ;
alter table garbage add index userid;
| Assignee | ||
Comment 28•23 years ago
|
||
It looks like there is no common syntax for temporary tables populated from a
select. mysql cannot handle pg's way and vice-versa.
What say we go with the patch we have then?
Opinions from myk, justdave, & dkl especially requested. (myk for BMO impact,
justdave for sybase impact, dkl for pg impact)
Comment 29•23 years ago
|
||
Comment on attachment 103459 [details] [diff] [review]
The patch
>+ &::SendSQL("SELECT userid FROM profiles WHERE INSTR(login_name, " .
>+ &::SqlQuote($email) . ") LIMIT 51");
Changing the
INSTR(login_name, " . &::SqlQuote($email) . ")
to
POSITION(" . &::SqlQuote($email) . " IN login_name) != 1
is better for cross db compatibility since this works in both MySQL and
PostgreSQL for me.
I have applied this patch to the PostgreSQL version with the above change and
was able to notice a
significant increase in lookup when searching with cclist and/or qa_contact.
Comment 30•23 years ago
|
||
Sybase doesn't support either of those (not that it matters for this bug, since
the Sybase stuff is still off on the horizon). Sybase's equivalent is:
CHARINDEX(" . &::SqlQuote($email) . ",login_name)
Updated•23 years ago
|
Attachment #103459 -
Flags: review-
Comment 31•23 years ago
|
||
Comment on attachment 103459 [details] [diff] [review]
The patch
This patch is now obsolete in that it is broken with the new CGI pm checkin of
recent. Things like $M{$field} has becoming $params->param($field) etc.
Per Dave's comment about Sybase, I think separating out some of the string
search stuff from the SQL into smaller utility functions would be better for
this. Whereas POSITION() works for Pg and MySQL it does not work for Sybase. We
could have a SqlString() function in the SQL that will place the proper syntax
string in it's place. This way DB specific conditionals could be isolated to
one function.
Comment 32•23 years ago
|
||
For 'starts with', LIKE 'Foo%' is better, because then the db can use a index.
You may need to escape the input string, somehoe. I presume theres a way to do
that, hopefully even a portable one.
Comment 33•23 years ago
|
||
Per comment 32:
how much of a performance benefit would it be then to just replace all
email specific searching with the proper LIKE '%foo%' and for the case
with multiple words just split on the spaces and string along multple LIKE's
in an (field1 LIKE '%foo%' OR field2 LIKE '%foo%')? I think MySQL supports RLIKE
for case insensitive search but the others do not. For those cases you can do
the usual (LOWER(field1) LIKE '%" . lc($foo) . "%' OR ...). I have read/heard
though when LOWERing the field before comparison then you basically throwing the
index out the window so it should be discouraged.
| Assignee | ||
Comment 34•23 years ago
|
||
Actually, we're getting way off the original point of this patch.
Even the most horrible way of looking up the email addresses ONCE and then
qualifying the CC and londescs with it is orders of magnitude faster than the
status quo.
Let's just keep this simple and land it. If the pattern match is too
contoversial, we could narrow its effect to handle the exact match only and get
it in.
| Assignee | ||
Comment 35•23 years ago
|
||
OK... Here it is back to INSTR and up-to-date
Attachment #103459 -
Attachment is obsolete: true
Comment 36•23 years ago
|
||
Comment on attachment 104448 [details] [diff] [review]
Patch updated
Decided per earlier discussion to leave the INSTR() change for another later
bug. This patch has passed my functionality testing on both Mysql and
PostgreSQL and has shown to speed up specific searches on cc/commenter
especially in the area of PostgreSQL.
r=dkl
Attachment #104448 -
Flags: review+
Comment 37•23 years ago
|
||
OK, I had a bit of difficulty understanding this patch, and so these comments
reflect changes I think should be made to make the code more understandable.
- ListEmails should do just that; the "IN (" and ")" should be added by the caller.
+ push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id
AND $table.who IN($list)");
is more understandable than the current
+ push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id
AND $table.who $list");
- ListEmails should be called ListIDsForEmail or similar - it doesn't list email
addresses, it lists IDs. The comment above it could also be updated to make this
more clear.
- Why do we do:
+ push(@clist, "$table.who",'greaterthan','0');
? longdescs.who should never be 0, because we don't allow anonymous comment
submission.
- What is $ok flagging as OK? I can't work out what it does. Please give it a
more descriptive name like $foocheckok, and test for not ok with if (!$foocheckok).
Thanks :-)
Gerv
| Assignee | ||
Comment 38•23 years ago
|
||
OK, clarified, fixed naming, and added isnotnull for when I mean isnotnull.
Attachment #104448 -
Attachment is obsolete: true
Comment 39•23 years ago
|
||
Comment on attachment 104825 [details] [diff] [review]
Revised with readability improvements
r=gerv.
Gerv
Attachment #104825 -
Flags: review+
| Assignee | ||
Comment 40•23 years ago
|
||
Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v <-- Search.pm
new revision: 1.23; previous revision: 1.22
done
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
QA Contact: matty_is_a_geek → default-qa
You need to log in
before you can comment on or make changes to this bug.
Description
•