Closed
Bug 127200
Opened 23 years ago
Closed 22 years ago
Query for CC/longdesc/OR takes long time
Categories
(Bugzilla :: Query/Bug List, defect, P2)
Tracking
()
RESOLVED
FIXED
Bugzilla 2.18
People
(Reporter: tessarakt, Assigned: bugreport)
References
()
Details
(Keywords: perf)
Attachments
(1 file, 4 obsolete files)
4.21 KB,
patch
|
gerv
:
review+
|
Details | Diff | Splinter Review |
Reproduction: 1. Go to Bugzilla page 2. Go to query page 3. Uncheck entries under "Status" 4. Enter "jens@unfaehig.de" under e-mail adress, exact match, matching field CC 5. Run query The query takes a long time (such a long time that I haven't waited for it to finish, aborted it after some minutes). Maybe no index in the database for that field?
Comment 1•23 years ago
|
||
SELECT bugs.bug_id, bugs.groupset, substring(bugs.bug_severity, 1, 3), substring(bugs.priority, 1, 3), substring(bugs.rep_platform, 1, 3), map_assigned_to.login_name, substring(bugs.bug_status,1,4), substring(bugs.resolution,1,4), substring(bugs.short_desc, 1, 60) FROM bugs, profiles map_assigned_to, profiles map_reporter LEFT JOIN profiles map_qa_contact ON bugs.qa_contact = map_qa_contact.userid LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id LEFT JOIN profiles map_cc_ ON cc_.who = map_cc_.userid WHERE ((bugs.groupset & 0) = bugs.groupset ) AND bugs.assigned_to = map_assigned_to.userid AND bugs.reporter = map_reporter.userid AND (map_cc_.login_name = 'jens@unfaehig.de') ... +-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+ | bugs | ALL | assigned_to,reporter | NULL | NULL | NULL | 5 | where used; Using temporary; Using filesort | | map_assigned_to | eq_ref | PRIMARY | PRIMARY | 3 | bugs.assigned_to | 1 | | | map_reporter | index | PRIMARY | PRIMARY | 3 | NULL | 4 | where used; Using index | | map_qa_contact | eq_ref | PRIMARY | PRIMARY | 3 | bugs.qa_contact | 1 | Using index | | cc_ | index | bug_id | bug_id | 6 | NULL | 4 | Using index | | map_cc_ | eq_ref | PRIMARY | PRIMARY | 3 | cc_.who | 1 | where used | +-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
Comment 2•23 years ago
|
||
Thats because mysql is examining every table in the bugs row. On a db with 442 bugs, and a simplified query mysql> EXPLAIN SELECT bugs.bug_id FROM profiles map_assigned_to INNER JOIN bugs ON bugs.assigned_to=map_assigned_to.userid LEFT JOIN cc ON bugs.bug_id = cc.bug_id WHERE cc.who='1'\G *************************** 1. row *************************** table: bugs type: ALL possible_keys: assigned_to key: NULL key_len: NULL ref: NULL rows: 442 Extra: *************************** 2. row *************************** table: map_assigned_to type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.assigned_to rows: 1 Extra: Using index *************************** 3. row *************************** table: cc type: ref possible_keys: bug_id key: bug_id key_len: 3 ref: bugs.bug_id rows: 1 Extra: where used; Using index 3 rows in set (0.00 sec) If I remove the join to the assignee table (which isn't being used): mysql> EXPLAIN SELECT bugs.bug_id FROM bugs LEFT JOIN cc ON bugs.bug_id = cc.bug_id WHERE cc.who='1'\G *************************** 1. row *************************** table: bugs type: index possible_keys: NULL key: PRIMARY key_len: 3 ref: NULL rows: 442 Extra: Using index *************************** 2. row *************************** table: cc type: ref possible_keys: bug_id key: bug_id key_len: 3 ref: bugs.bug_id rows: 1 Extra: where used; Using index 2 rows in set (0.00 sec) Also see bug 96101.
Comment 3•23 years ago
|
||
Wow, shouldn't it look up the CCs table first since it actually has only one key to look up for that?
Comment 4•23 years ago
|
||
No, it can't, because its a left join, so it has to find the non-matching bits too. This means that it has to sequencially scan the table On a test db with 10000 bugs, 30000 cc, and 25000 profiles: SELECT bugs.bug_id,map_assigned_to.login_name FROM profiles map_assigned_to, bugs LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id LEFT JOIN profiles map_cc_ ON cc_.who = map_cc_.userid WHERE bugs.assigned_to=map_assigned_to.userid AND map_cc_.login_name='QI'; takes 3.62 seconds. Changing the second LEFT JOIN to an INNER JOIN takes this to 1.23 seconds. Why isn't this done? The only difference will be if there is someone on the cc list who is not in the profiles table (actually, they'd have to be the only person in this case). Thats invalid, and I don't think we should slow down queries because of people who may have data corruption, so that should be a plain JOIN. (I note that we do the same thing with the qa contact, but thats a trickier issue because there it is optional. We probably need to rethink that) Any objections to making that change for 2.16? Changing the first LEFT JOIN to an INNER JOIN as well makes this take 0.00 seconds. For cc equals, then we could use an inner join. For cc not equals, do you want to match bugs with no ccs? Tihs gets us back into the problems of bug 109528 of contains vs equals. We'll probably revisit that for 2.18, I guess.
Comment 5•23 years ago
|
||
select count(cc.bug_id) from cc left join profiles ON cc.who=profiles.userid where profiles.userid=NULL; returns 0 on bmo, and if bmo doesn't have corruption, then noone does, right?. This is detected by sanity check anyway. Taking for 2.16 to test with. I'm reasonably certain that the first left join means that the bug row will still get through anyway, but I have to check, hopefully later this week. The line in question in the join at about line 368 of buglist.cgi
Assignee: endico → bbaetz
Severity: normal → major
Keywords: perf
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.16
Comment 6•23 years ago
|
||
OK, try this. This removes both inner joins, on the basis that "not contains" doesn't pick up bugs with no ccs at the moment anyway. This is, I think, provably identical - the condition in $f restricted the column in the last table to be non null, so the left join is identical to the inner join, since the only difference between a left join and an inner join is that non matching rows in the join get NULL as entries. The reason that "does not contain" fails is because INSTR(NULL) == NULL != 0, and NULL fails all the = and != conditionals anyway. We so need to clean this up, but not for 2.16
Updated•23 years ago
|
Comment 7•22 years ago
|
||
Comment on attachment 72194 [details] [diff] [review] patch This looks good. Works and passes tests. I haven't measured performance but that it works should be a no-brainer. I was a little suspicious, so I looked back at the history of this code. This appears to have been inserted as is (rather than changed from inner to left outer), and there was a change to this code that was backed out, but this doesn't seem to have the same problem.
Attachment #72194 -
Flags: review+
I have a saved query called "Open bugs I care about", which searches for (among other things) my email address in the CC: field. The query doesn't work. My browser (Mozilla 0.99) waits for Bugzilla to return some result, but it never gets it! After about a minute of waiting, it appears that Mozilla just gives up, and I'm left with a web page that just says, "Please stand by ...". So the problem is not just that it takes a long time. The problem is that it takes too long.
Comment 9•22 years ago
|
||
Comment on attachment 72194 [details] [diff] [review] patch r= justdave
Attachment #72194 -
Flags: review+
Comment 10•22 years ago
|
||
Checking in buglist.cgi; /cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v <-- buglist.cgi new revision: 1.162; previous revision: 1.161 done
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 11•22 years ago
|
||
This patch is invalid, as has been backed out. See bug 139759. In particular see bug 139759 comment #6, which gives some possible solutions to this problem. reopening, and -> 2.18
Comment 12•22 years ago
|
||
Comment on attachment 72194 [details] [diff] [review] patch patch was vetoed by bbeatz and backed out. Marking needs-work on his behalf
Attachment #72194 -
Flags: review-
Comment 13•22 years ago
|
||
*** Bug 144966 has been marked as a duplicate of this bug. ***
Comment 14•22 years ago
|
||
Bug 144966 is not a duplicate of this one, rather this is a duplicate/subset of that one. (the problem is not just with CC but with most of the email search fields) What about adding a box "Fast search (may fail on corrupt profiles)"? Blocks 133173 ("Change "My Bugs" into a preset query") - the user can't query his own bugs if it takes so long. (personally, with all the boxes by email matched, I waited over 30 mins for results before giving up.)
Blocks: 133173
Comment 15•22 years ago
|
||
The actual problem is 'queries with OR in them are slow'. queries involving CC tend to be the easiest way to run into this, but the comments (longdescs in the schema) have the same problem. The actual problem is a hard problem to solve, in general. Since bugzilla only works on mysql, we don't know if changing databases would speed this up. My fix for this, which possibly could have been extended to longdecs didn't work. Its possible to fix this with subselects, except mysql doesn't support those. We could simulate subselects with a separate query, and thats somethign to look into.
Summary: Query for CC takes long time → Query for CC/longdesc/OR takes long time
Comment 16•22 years ago
|
||
I don't think this is related to OR, I don't see any ORs or sets in the description of this problem.
Comment 17•22 years ago
|
||
if Bug 144966 is not a dup of this one shouldn't it be reopened ?
Assignee | ||
Comment 18•22 years ago
|
||
The actual problem is more likely this..... When a user chooses to search for an email address in CC list or comments, Search.pm takes the single matching email and turns it into (paraphrased) .... SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id LEFT JOIN profiles AS map_cc ON cc.who = map_cc.id LEFT JOIN longdescs ON bugs.id = longdescs.bug_id LEFT JOIN profiles AS map_comment ON longdescs.who = map_comment.id LEFT_JOIN profiles AS map_reporter ON bugs.reporter = map_reporter.id ....othestuff... WHERE (map_comment.login_name = 'sameuser@mozilla.org' OR map_cc.login_name = 'sameuser@mozilla.org' OR map_reporter - 'sameuser@mozilla.org') ..otherstuff.. This means that, just for this function, every bug expands to count(cc) * count(comments) rows each of which is joined to a profile 3 times and then matched to the form field 3 times. In the usual case where a moderate list of users match, this can really be done in a manner requiring only 1 row per bug for this function... first look up userids of matched user.. SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id AND cc.who IN(useridlist) LEFT JOIN longdescs ON bugs.id = longdescs.bug_id AND logdescs.who IN(useridlist) ...otherstuff... WHERE ((cc.who IS NOT NULL) OR (longdescs.who IS NOT NULL) OR (bugs.reporter IN (useridlist))) .. otherstuff.. Implementation note... In Search.pm. Check the $email1 and $email2 fields for matching email addresses directly from profiles. During the initial processing of hard-coded stuff, load an array with a list of matching userids ONLY if there are a reasonable number of matches (< 50?). In the funcdefs, prior to any "LEFT JOIN profiles" operation, check what is about to be included in the query to see if it can be done with the list already stored.
Assignee | ||
Comment 19•22 years ago
|
||
*** Bug 175425 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 20•22 years ago
|
||
This patch causes exact matches to be handled with far fewer joins. I believe that this is really the crux of the matter. If this causes the email "is" queries to become much faster than they used to be, then the patch should be updated to handle email "contains" queries where the matching email returns a moderate number of matches (like fewer than 50).
Attachment #72194 -
Attachment is obsolete: true
Assignee | ||
Comment 21•22 years ago
|
||
This will speed up queries for users making comments or on CC lists dramatically.
Assignee | ||
Updated•22 years ago
|
Attachment #103452 -
Attachment is obsolete: true
Comment 23•22 years ago
|
||
This seems OK to me (although you probably mean either (@list < 51) or LIMIT 50), although I'd prefer not to review it as I'm not familiar with this code. What's the relationship between this bug and 133173? That seems wrong to me... Gerv
Assignee | ||
Comment 24•22 years ago
|
||
If we hit the LIMIT, then we don't know if there were really more than we were willing to check for this way, so we only use the mechanism if the number returned is fewer than the LIMIT. I'll ping around for a reviewer. I think that everyone is unhappy with reviews in this part of the code.
Assignee | ||
Comment 25•22 years ago
|
||
There is no reason for bug 133173 to depend on this. This still does need a review, though.
No longer blocks: 133173
Status: NEW → ASSIGNED
Assignee | ||
Comment 26•22 years ago
|
||
Prior to settling on the patch in attachment 103459 [details] [diff] [review], MattyT suggested trying the approach of creating a temporary table with the matching userids and joining with that. I will try that in the next few days, but it will need a performance test from myk. myk - If you are up for trying out a few key queries, I'll do this. If you would rather stick with the current patch, It's your call. Please comment here.
Assignee | ||
Comment 27•22 years ago
|
||
OK... here's the 2 SQL statements that need to b timed on BMO. Please post result here. create temporary table garbage select userid from profiles where login_name like '%@%' ; alter table garbage add index userid;
Assignee | ||
Comment 28•22 years ago
|
||
It looks like there is no common syntax for temporary tables populated from a select. mysql cannot handle pg's way and vice-versa. What say we go with the patch we have then? Opinions from myk, justdave, & dkl especially requested. (myk for BMO impact, justdave for sybase impact, dkl for pg impact)
Comment 29•22 years ago
|
||
Comment on attachment 103459 [details] [diff] [review] The patch >+ &::SendSQL("SELECT userid FROM profiles WHERE INSTR(login_name, " . >+ &::SqlQuote($email) . ") LIMIT 51"); Changing the INSTR(login_name, " . &::SqlQuote($email) . ") to POSITION(" . &::SqlQuote($email) . " IN login_name) != 1 is better for cross db compatibility since this works in both MySQL and PostgreSQL for me. I have applied this patch to the PostgreSQL version with the above change and was able to notice a significant increase in lookup when searching with cclist and/or qa_contact.
Comment 30•22 years ago
|
||
Sybase doesn't support either of those (not that it matters for this bug, since the Sybase stuff is still off on the horizon). Sybase's equivalent is: CHARINDEX(" . &::SqlQuote($email) . ",login_name)
Updated•22 years ago
|
Attachment #103459 -
Flags: review-
Comment 31•22 years ago
|
||
Comment on attachment 103459 [details] [diff] [review] The patch This patch is now obsolete in that it is broken with the new CGI pm checkin of recent. Things like $M{$field} has becoming $params->param($field) etc. Per Dave's comment about Sybase, I think separating out some of the string search stuff from the SQL into smaller utility functions would be better for this. Whereas POSITION() works for Pg and MySQL it does not work for Sybase. We could have a SqlString() function in the SQL that will place the proper syntax string in it's place. This way DB specific conditionals could be isolated to one function.
Comment 32•22 years ago
|
||
For 'starts with', LIKE 'Foo%' is better, because then the db can use a index. You may need to escape the input string, somehoe. I presume theres a way to do that, hopefully even a portable one.
Comment 33•22 years ago
|
||
Per comment 32: how much of a performance benefit would it be then to just replace all email specific searching with the proper LIKE '%foo%' and for the case with multiple words just split on the spaces and string along multple LIKE's in an (field1 LIKE '%foo%' OR field2 LIKE '%foo%')? I think MySQL supports RLIKE for case insensitive search but the others do not. For those cases you can do the usual (LOWER(field1) LIKE '%" . lc($foo) . "%' OR ...). I have read/heard though when LOWERing the field before comparison then you basically throwing the index out the window so it should be discouraged.
Assignee | ||
Comment 34•22 years ago
|
||
Actually, we're getting way off the original point of this patch. Even the most horrible way of looking up the email addresses ONCE and then qualifying the CC and londescs with it is orders of magnitude faster than the status quo. Let's just keep this simple and land it. If the pattern match is too contoversial, we could narrow its effect to handle the exact match only and get it in.
Assignee | ||
Comment 35•22 years ago
|
||
OK... Here it is back to INSTR and up-to-date
Attachment #103459 -
Attachment is obsolete: true
Comment 36•22 years ago
|
||
Comment on attachment 104448 [details] [diff] [review] Patch updated Decided per earlier discussion to leave the INSTR() change for another later bug. This patch has passed my functionality testing on both Mysql and PostgreSQL and has shown to speed up specific searches on cc/commenter especially in the area of PostgreSQL. r=dkl
Attachment #104448 -
Flags: review+
Comment 37•22 years ago
|
||
OK, I had a bit of difficulty understanding this patch, and so these comments reflect changes I think should be made to make the code more understandable. - ListEmails should do just that; the "IN (" and ")" should be added by the caller. + push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id AND $table.who IN($list)"); is more understandable than the current + push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id AND $table.who $list"); - ListEmails should be called ListIDsForEmail or similar - it doesn't list email addresses, it lists IDs. The comment above it could also be updated to make this more clear. - Why do we do: + push(@clist, "$table.who",'greaterthan','0'); ? longdescs.who should never be 0, because we don't allow anonymous comment submission. - What is $ok flagging as OK? I can't work out what it does. Please give it a more descriptive name like $foocheckok, and test for not ok with if (!$foocheckok). Thanks :-) Gerv
Assignee | ||
Comment 38•22 years ago
|
||
OK, clarified, fixed naming, and added isnotnull for when I mean isnotnull.
Attachment #104448 -
Attachment is obsolete: true
Comment 39•22 years ago
|
||
Comment on attachment 104825 [details] [diff] [review] Revised with readability improvements r=gerv. Gerv
Attachment #104825 -
Flags: review+
Assignee | ||
Comment 40•22 years ago
|
||
Checking in Bugzilla/Search.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v <-- Search.pm new revision: 1.23; previous revision: 1.22 done
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
QA Contact: matty_is_a_geek → default-qa
You need to log in
before you can comment on or make changes to this bug.
Description
•