Closed Bug 127200 Opened 23 years ago Closed 23 years ago

Query for CC/longdesc/OR takes long time

Categories

(Bugzilla :: Query/Bug List, defect, P2)

2.15
Other
Other

Tracking

()

RESOLVED FIXED
Bugzilla 2.18

People

(Reporter: tessarakt, Assigned: bugreport)

References

()

Details

(Keywords: perf)

Attachments

(1 file, 4 obsolete files)

Reproduction: 1. Go to Bugzilla page 2. Go to query page 3. Uncheck entries under "Status" 4. Enter "jens@unfaehig.de" under e-mail adress, exact match, matching field CC 5. Run query The query takes a long time (such a long time that I haven't waited for it to finish, aborted it after some minutes). Maybe no index in the database for that field?
SELECT bugs.bug_id, bugs.groupset, substring(bugs.bug_severity, 1, 3), substring(bugs.priority, 1, 3), substring(bugs.rep_platform, 1, 3), map_assigned_to.login_name, substring(bugs.bug_status,1,4), substring(bugs.resolution,1,4), substring(bugs.short_desc, 1, 60) FROM bugs, profiles map_assigned_to, profiles map_reporter LEFT JOIN profiles map_qa_contact ON bugs.qa_contact = map_qa_contact.userid LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id LEFT JOIN profiles map_cc_ ON cc_.who = map_cc_.userid WHERE ((bugs.groupset & 0) = bugs.groupset ) AND bugs.assigned_to = map_assigned_to.userid AND bugs.reporter = map_reporter.userid AND (map_cc_.login_name = 'jens@unfaehig.de') ... +-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+ | bugs | ALL | assigned_to,reporter | NULL | NULL | NULL | 5 | where used; Using temporary; Using filesort | | map_assigned_to | eq_ref | PRIMARY | PRIMARY | 3 | bugs.assigned_to | 1 | | | map_reporter | index | PRIMARY | PRIMARY | 3 | NULL | 4 | where used; Using index | | map_qa_contact | eq_ref | PRIMARY | PRIMARY | 3 | bugs.qa_contact | 1 | Using index | | cc_ | index | bug_id | bug_id | 6 | NULL | 4 | Using index | | map_cc_ | eq_ref | PRIMARY | PRIMARY | 3 | cc_.who | 1 | where used | +-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
Thats because mysql is examining every table in the bugs row. On a db with 442 bugs, and a simplified query mysql> EXPLAIN SELECT bugs.bug_id FROM profiles map_assigned_to INNER JOIN bugs ON bugs.assigned_to=map_assigned_to.userid LEFT JOIN cc ON bugs.bug_id = cc.bug_id WHERE cc.who='1'\G *************************** 1. row *************************** table: bugs type: ALL possible_keys: assigned_to key: NULL key_len: NULL ref: NULL rows: 442 Extra: *************************** 2. row *************************** table: map_assigned_to type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.assigned_to rows: 1 Extra: Using index *************************** 3. row *************************** table: cc type: ref possible_keys: bug_id key: bug_id key_len: 3 ref: bugs.bug_id rows: 1 Extra: where used; Using index 3 rows in set (0.00 sec) If I remove the join to the assignee table (which isn't being used): mysql> EXPLAIN SELECT bugs.bug_id FROM bugs LEFT JOIN cc ON bugs.bug_id = cc.bug_id WHERE cc.who='1'\G *************************** 1. row *************************** table: bugs type: index possible_keys: NULL key: PRIMARY key_len: 3 ref: NULL rows: 442 Extra: Using index *************************** 2. row *************************** table: cc type: ref possible_keys: bug_id key: bug_id key_len: 3 ref: bugs.bug_id rows: 1 Extra: where used; Using index 2 rows in set (0.00 sec) Also see bug 96101.
Wow, shouldn't it look up the CCs table first since it actually has only one key to look up for that?
No, it can't, because its a left join, so it has to find the non-matching bits too. This means that it has to sequencially scan the table On a test db with 10000 bugs, 30000 cc, and 25000 profiles: SELECT bugs.bug_id,map_assigned_to.login_name FROM profiles map_assigned_to, bugs LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id LEFT JOIN profiles map_cc_ ON cc_.who = map_cc_.userid WHERE bugs.assigned_to=map_assigned_to.userid AND map_cc_.login_name='QI'; takes 3.62 seconds. Changing the second LEFT JOIN to an INNER JOIN takes this to 1.23 seconds. Why isn't this done? The only difference will be if there is someone on the cc list who is not in the profiles table (actually, they'd have to be the only person in this case). Thats invalid, and I don't think we should slow down queries because of people who may have data corruption, so that should be a plain JOIN. (I note that we do the same thing with the qa contact, but thats a trickier issue because there it is optional. We probably need to rethink that) Any objections to making that change for 2.16? Changing the first LEFT JOIN to an INNER JOIN as well makes this take 0.00 seconds. For cc equals, then we could use an inner join. For cc not equals, do you want to match bugs with no ccs? Tihs gets us back into the problems of bug 109528 of contains vs equals. We'll probably revisit that for 2.18, I guess.
select count(cc.bug_id) from cc left join profiles ON cc.who=profiles.userid where profiles.userid=NULL; returns 0 on bmo, and if bmo doesn't have corruption, then noone does, right?. This is detected by sanity check anyway. Taking for 2.16 to test with. I'm reasonably certain that the first left join means that the bug row will still get through anyway, but I have to check, hopefully later this week. The line in question in the join at about line 368 of buglist.cgi
Assignee: endico → bbaetz
Severity: normal → major
Keywords: perf
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.16
Attached patch patch (obsolete) — Splinter Review
OK, try this. This removes both inner joins, on the basis that "not contains" doesn't pick up bugs with no ccs at the moment anyway. This is, I think, provably identical - the condition in $f restricted the column in the last table to be non null, so the left join is identical to the inner join, since the only difference between a left join and an inner join is that non matching rows in the join get NULL as entries. The reason that "does not contain" fails is because INSTR(NULL) == NULL != 0, and NULL fails all the = and != conditionals anyway. We so need to clean this up, but not for 2.16
Keywords: patch, review
Comment on attachment 72194 [details] [diff] [review] patch This looks good. Works and passes tests. I haven't measured performance but that it works should be a no-brainer. I was a little suspicious, so I looked back at the history of this code. This appears to have been inserted as is (rather than changed from inner to left outer), and there was a change to this code that was backed out, but this doesn't seem to have the same problem.
Attachment #72194 - Flags: review+
I have a saved query called "Open bugs I care about", which searches for (among other things) my email address in the CC: field. The query doesn't work. My browser (Mozilla 0.99) waits for Bugzilla to return some result, but it never gets it! After about a minute of waiting, it appears that Mozilla just gives up, and I'm left with a web page that just says, "Please stand by ...". So the problem is not just that it takes a long time. The problem is that it takes too long.
Comment on attachment 72194 [details] [diff] [review] patch r= justdave
Attachment #72194 - Flags: review+
Checking in buglist.cgi; /cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v <-- buglist.cgi new revision: 1.162; previous revision: 1.161 done
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
This patch is invalid, as has been backed out. See bug 139759. In particular see bug 139759 comment #6, which gives some possible solutions to this problem. reopening, and -> 2.18
Status: RESOLVED → REOPENED
Keywords: patch, review
Resolution: FIXED → ---
Target Milestone: Bugzilla 2.16 → Bugzilla 2.18
Comment on attachment 72194 [details] [diff] [review] patch patch was vetoed by bbeatz and backed out. Marking needs-work on his behalf
Attachment #72194 - Flags: review-
*** Bug 144966 has been marked as a duplicate of this bug. ***
Bug 144966 is not a duplicate of this one, rather this is a duplicate/subset of that one. (the problem is not just with CC but with most of the email search fields) What about adding a box "Fast search (may fail on corrupt profiles)"? Blocks 133173 ("Change "My Bugs" into a preset query") - the user can't query his own bugs if it takes so long. (personally, with all the boxes by email matched, I waited over 30 mins for results before giving up.)
Blocks: 133173
The actual problem is 'queries with OR in them are slow'. queries involving CC tend to be the easiest way to run into this, but the comments (longdescs in the schema) have the same problem. The actual problem is a hard problem to solve, in general. Since bugzilla only works on mysql, we don't know if changing databases would speed this up. My fix for this, which possibly could have been extended to longdecs didn't work. Its possible to fix this with subselects, except mysql doesn't support those. We could simulate subselects with a separate query, and thats somethign to look into.
Summary: Query for CC takes long time → Query for CC/longdesc/OR takes long time
I don't think this is related to OR, I don't see any ORs or sets in the description of this problem.
Blocks: bz-perf
if Bug 144966 is not a dup of this one shouldn't it be reopened ?
The actual problem is more likely this..... When a user chooses to search for an email address in CC list or comments, Search.pm takes the single matching email and turns it into (paraphrased) .... SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id LEFT JOIN profiles AS map_cc ON cc.who = map_cc.id LEFT JOIN longdescs ON bugs.id = longdescs.bug_id LEFT JOIN profiles AS map_comment ON longdescs.who = map_comment.id LEFT_JOIN profiles AS map_reporter ON bugs.reporter = map_reporter.id ....othestuff... WHERE (map_comment.login_name = 'sameuser@mozilla.org' OR map_cc.login_name = 'sameuser@mozilla.org' OR map_reporter - 'sameuser@mozilla.org') ..otherstuff.. This means that, just for this function, every bug expands to count(cc) * count(comments) rows each of which is joined to a profile 3 times and then matched to the form field 3 times. In the usual case where a moderate list of users match, this can really be done in a manner requiring only 1 row per bug for this function... first look up userids of matched user.. SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id AND cc.who IN(useridlist) LEFT JOIN longdescs ON bugs.id = longdescs.bug_id AND logdescs.who IN(useridlist) ...otherstuff... WHERE ((cc.who IS NOT NULL) OR (longdescs.who IS NOT NULL) OR (bugs.reporter IN (useridlist))) .. otherstuff.. Implementation note... In Search.pm. Check the $email1 and $email2 fields for matching email addresses directly from profiles. During the initial processing of hard-coded stuff, load an array with a list of matching userids ONLY if there are a reasonable number of matches (< 50?). In the funcdefs, prior to any "LEFT JOIN profiles" operation, check what is about to be included in the query to see if it can be done with the list already stored.
*** Bug 175425 has been marked as a duplicate of this bug. ***
Attached patch Rough patch (obsolete) — Splinter Review
This patch causes exact matches to be handled with far fewer joins. I believe that this is really the crux of the matter. If this causes the email "is" queries to become much faster than they used to be, then the patch should be updated to handle email "contains" queries where the matching email returns a moderate number of matches (like fewer than 50).
Attachment #72194 - Attachment is obsolete: true
Attached patch The patch (obsolete) — Splinter Review
This will speed up queries for users making comments or on CC lists dramatically.
Attachment #103452 - Attachment is obsolete: true
taking on
Assignee: bbaetz → bugreport
Status: REOPENED → NEW
This seems OK to me (although you probably mean either (@list < 51) or LIMIT 50), although I'd prefer not to review it as I'm not familiar with this code. What's the relationship between this bug and 133173? That seems wrong to me... Gerv
If we hit the LIMIT, then we don't know if there were really more than we were willing to check for this way, so we only use the mechanism if the number returned is fewer than the LIMIT. I'll ping around for a reviewer. I think that everyone is unhappy with reviews in this part of the code.
Blocks: 176570
There is no reason for bug 133173 to depend on this. This still does need a review, though.
No longer blocks: 133173
Status: NEW → ASSIGNED
Prior to settling on the patch in attachment 103459 [details] [diff] [review], MattyT suggested trying the approach of creating a temporary table with the matching userids and joining with that. I will try that in the next few days, but it will need a performance test from myk. myk - If you are up for trying out a few key queries, I'll do this. If you would rather stick with the current patch, It's your call. Please comment here.
OK... here's the 2 SQL statements that need to b timed on BMO. Please post result here. create temporary table garbage select userid from profiles where login_name like '%@%' ; alter table garbage add index userid;
It looks like there is no common syntax for temporary tables populated from a select. mysql cannot handle pg's way and vice-versa. What say we go with the patch we have then? Opinions from myk, justdave, & dkl especially requested. (myk for BMO impact, justdave for sybase impact, dkl for pg impact)
Comment on attachment 103459 [details] [diff] [review] The patch >+ &::SendSQL("SELECT userid FROM profiles WHERE INSTR(login_name, " . >+ &::SqlQuote($email) . ") LIMIT 51"); Changing the INSTR(login_name, " . &::SqlQuote($email) . ") to POSITION(" . &::SqlQuote($email) . " IN login_name) != 1 is better for cross db compatibility since this works in both MySQL and PostgreSQL for me. I have applied this patch to the PostgreSQL version with the above change and was able to notice a significant increase in lookup when searching with cclist and/or qa_contact.
Sybase doesn't support either of those (not that it matters for this bug, since the Sybase stuff is still off on the horizon). Sybase's equivalent is: CHARINDEX(" . &::SqlQuote($email) . ",login_name)
Attachment #103459 - Flags: review-
Comment on attachment 103459 [details] [diff] [review] The patch This patch is now obsolete in that it is broken with the new CGI pm checkin of recent. Things like $M{$field} has becoming $params->param($field) etc. Per Dave's comment about Sybase, I think separating out some of the string search stuff from the SQL into smaller utility functions would be better for this. Whereas POSITION() works for Pg and MySQL it does not work for Sybase. We could have a SqlString() function in the SQL that will place the proper syntax string in it's place. This way DB specific conditionals could be isolated to one function.
For 'starts with', LIKE 'Foo%' is better, because then the db can use a index. You may need to escape the input string, somehoe. I presume theres a way to do that, hopefully even a portable one.
Per comment 32: how much of a performance benefit would it be then to just replace all email specific searching with the proper LIKE '%foo%' and for the case with multiple words just split on the spaces and string along multple LIKE's in an (field1 LIKE '%foo%' OR field2 LIKE '%foo%')? I think MySQL supports RLIKE for case insensitive search but the others do not. For those cases you can do the usual (LOWER(field1) LIKE '%" . lc($foo) . "%' OR ...). I have read/heard though when LOWERing the field before comparison then you basically throwing the index out the window so it should be discouraged.
Actually, we're getting way off the original point of this patch. Even the most horrible way of looking up the email addresses ONCE and then qualifying the CC and londescs with it is orders of magnitude faster than the status quo. Let's just keep this simple and land it. If the pattern match is too contoversial, we could narrow its effect to handle the exact match only and get it in.
Attached patch Patch updated (obsolete) — Splinter Review
OK... Here it is back to INSTR and up-to-date
Attachment #103459 - Attachment is obsolete: true
Comment on attachment 104448 [details] [diff] [review] Patch updated Decided per earlier discussion to leave the INSTR() change for another later bug. This patch has passed my functionality testing on both Mysql and PostgreSQL and has shown to speed up specific searches on cc/commenter especially in the area of PostgreSQL. r=dkl
Attachment #104448 - Flags: review+
OK, I had a bit of difficulty understanding this patch, and so these comments reflect changes I think should be made to make the code more understandable. - ListEmails should do just that; the "IN (" and ")" should be added by the caller. + push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id AND $table.who IN($list)"); is more understandable than the current + push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id AND $table.who $list"); - ListEmails should be called ListIDsForEmail or similar - it doesn't list email addresses, it lists IDs. The comment above it could also be updated to make this more clear. - Why do we do: + push(@clist, "$table.who",'greaterthan','0'); ? longdescs.who should never be 0, because we don't allow anonymous comment submission. - What is $ok flagging as OK? I can't work out what it does. Please give it a more descriptive name like $foocheckok, and test for not ok with if (!$foocheckok). Thanks :-) Gerv
OK, clarified, fixed naming, and added isnotnull for when I mean isnotnull.
Attachment #104448 - Attachment is obsolete: true
Comment on attachment 104825 [details] [diff] [review] Revised with readability improvements r=gerv. Gerv
Attachment #104825 - Flags: review+
Checking in Bugzilla/Search.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v <-- Search.pm new revision: 1.23; previous revision: 1.22 done
Status: ASSIGNED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: