127200 - Query for CC/longdesc/OR takes long time

Reporter

Description

•

23 years ago

Reproduction:

1. Go to Bugzilla page
2. Go to query page
3. Uncheck entries under "Status"
4. Enter "jens@unfaehig.de" under e-mail adress, exact match, matching field CC
5. Run query

The query takes a long time (such a long time that I haven't waited for it to
finish, aborted it after some minutes).

Maybe no index in the database for that field?

Matthew Tuck [:CodeMachine]

Comment 1

•

23 years ago

SELECT bugs.bug_id, bugs.groupset, substring(bugs.bug_severity, 1, 3),
  substring(bugs.priority, 1, 3), substring(bugs.rep_platform, 1, 3),
  map_assigned_to.login_name, substring(bugs.bug_status,1,4),
  substring(bugs.resolution,1,4), substring(bugs.short_desc, 1, 60)
FROM bugs, profiles map_assigned_to, profiles map_reporter
  LEFT JOIN profiles map_qa_contact ON bugs.qa_contact = map_qa_contact.userid
  LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id
  LEFT JOIN profiles map_cc_ ON cc_.who = map_cc_.userid
WHERE ((bugs.groupset & 0) = bugs.groupset )
  AND bugs.assigned_to = map_assigned_to.userid
  AND bugs.reporter = map_reporter.userid
  AND (map_cc_.login_name = 'jens@unfaehig.de')
...

+-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
| table           | type   | possible_keys        | key     | key_len |
ref              | rows | Extra                                       |
+-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+
| bugs            | ALL    | assigned_to,reporter | NULL    |    NULL |
NULL             |    5 | where used; Using temporary; Using filesort |
| map_assigned_to | eq_ref | PRIMARY              | PRIMARY |       3 |
bugs.assigned_to |    1 |                                             |
| map_reporter    | index  | PRIMARY              | PRIMARY |       3 |
NULL             |    4 | where used; Using index                     |
| map_qa_contact  | eq_ref | PRIMARY              | PRIMARY |       3 |
bugs.qa_contact  |    1 | Using index                                 |
| cc_             | index  | bug_id               | bug_id  |       6 |
NULL             |    4 | Using index                                 |
| map_cc_         | eq_ref | PRIMARY              | PRIMARY |       3 |
cc_.who          |    1 | where used                                  |
+-----------------+--------+----------------------+---------+---------+------------------+------+---------------------------------------------+

Bradley Baetz (:bbaetz)

Comment 2

•

23 years ago

Thats because mysql is examining every table in the bugs row. On a db with 442
bugs, and a simplified query

mysql> EXPLAIN SELECT bugs.bug_id FROM profiles map_assigned_to INNER JOIN bugs
ON bugs.assigned_to=map_assigned_to.userid LEFT JOIN cc ON bugs.bug_id =
cc.bug_id WHERE cc.who='1'\G
*************************** 1. row ***************************
        table: bugs
         type: ALL
possible_keys: assigned_to
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 442
        Extra: 
*************************** 2. row ***************************
        table: map_assigned_to
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.assigned_to
         rows: 1
        Extra: Using index
*************************** 3. row ***************************
        table: cc
         type: ref
possible_keys: bug_id
          key: bug_id
      key_len: 3
          ref: bugs.bug_id
         rows: 1
        Extra: where used; Using index
3 rows in set (0.00 sec)

If I remove the join to the assignee table (which isn't being used):

mysql> EXPLAIN SELECT bugs.bug_id FROM bugs LEFT JOIN cc ON bugs.bug_id =
cc.bug_id WHERE cc.who='1'\G
*************************** 1. row ***************************
        table: bugs
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 3
          ref: NULL
         rows: 442
        Extra: Using index
*************************** 2. row ***************************
        table: cc
         type: ref
possible_keys: bug_id
          key: bug_id
      key_len: 3
          ref: bugs.bug_id
         rows: 1
        Extra: where used; Using index
2 rows in set (0.00 sec)

Also see bug 96101.

Matthew Tuck [:CodeMachine]

Comment 3

•

23 years ago

Wow, shouldn't it look up the CCs table first since it actually has only one key
to look up for that?

Bradley Baetz (:bbaetz)

Comment 4

•

23 years ago

No, it can't, because its a left join, so it has to find the non-matching bits
too. This means that it has to sequencially scan the table

On a test db with 10000 bugs, 30000 cc, and 25000 profiles:

SELECT bugs.bug_id,map_assigned_to.login_name FROM profiles map_assigned_to,
bugs LEFT JOIN cc cc_ ON bugs.bug_id = cc_.bug_id LEFT JOIN profiles map_cc_ ON
cc_.who = map_cc_.userid WHERE bugs.assigned_to=map_assigned_to.userid AND
map_cc_.login_name='QI';

takes 3.62 seconds.

Changing the second LEFT JOIN to an INNER JOIN takes this to 1.23 seconds. Why
isn't this done? The only difference will be if there is someone on the cc list
who is not in the profiles table (actually, they'd have to be the only person in
this case). Thats invalid, and I don't think we should slow down queries because
of people who may have data corruption, so that should be a plain JOIN. (I note
that we do the same thing with the qa contact, but thats a trickier issue
because there it is optional. We probably need to rethink that)

Any objections to making that change for 2.16?

Changing the first LEFT JOIN to an INNER JOIN as well makes this take 0.00
seconds. For cc equals, then we could use an inner join. For cc not equals, do
you want to match bugs with no ccs? Tihs gets us back into the problems of bug
109528 of contains vs equals. We'll probably revisit that for 2.18, I guess.

Bradley Baetz (:bbaetz)

Comment 5

•

23 years ago

select count(cc.bug_id) from cc left join profiles ON cc.who=profiles.userid
where profiles.userid=NULL;

returns 0 on bmo, and if bmo doesn't have corruption, then noone does, right?.
This is detected by sanity check anyway.

Taking for 2.16 to test with. I'm reasonably certain that the first left join
means that the bug row will still get through anyway, but I have to check,
hopefully later this week.

The line in question in the join at about line 368 of buglist.cgi

Assignee: endico → bbaetz

Severity: normal → major

Keywords: perf

Priority: -- → P2

Target Milestone: --- → Bugzilla 2.16

Bradley Baetz (:bbaetz)

Comment 6

•

23 years ago

Attached patch patch (obsolete) — Details — Splinter Review

OK, try this. This removes both inner joins, on the basis that "not contains"
doesn't pick up bugs with no ccs at the moment anyway.

This is, I think, provably identical - the condition in $f restricted the
column in the last table to be non null, so the left join is identical to the
inner join, since the only difference between a left join and an inner join is
that non matching rows in the join get NULL as entries.

The reason that "does not contain" fails is because INSTR(NULL) == NULL != 0,
and NULL fails all the = and != conditionals anyway.

We so need to clean this up, but not for 2.16

Bradley Baetz (:bbaetz)

Updated

•

23 years ago

Keywords: patch, review

Matthew Tuck [:CodeMachine]

Comment 7

•

22 years ago

Comment on attachment 72194 [details] [diff] [review]
patch

This looks good.  Works and passes tests.  I haven't measured performance but
that it works should be a no-brainer.

I was a little suspicious, so I looked back at the history of this code.  This
appears to have been inserted as is (rather than changed from inner to left
outer), and there was a change to this code that was backed out, but this
doesn't seem to have the same problem.

Attachment #72194 - Flags: review+

TT

Comment 8

•

22 years ago

I have a saved query called "Open bugs I care about", which searches for (among
other things) my email address in the CC: field.  The query doesn't work.  My
browser (Mozilla 0.99) waits for Bugzilla to return some result, but it never
gets it!  After about a minute of waiting, it appears that Mozilla just gives
up, and I'm left with a web page that just says, "Please stand by ...".

So the problem is not just that it takes a long time.  The problem is that it
takes too long.

Dave Miller [:justdave]

Comment 9

•

22 years ago

Comment on attachment 72194 [details] [diff] [review]
patch

r= justdave

Attachment #72194 - Flags: review+

Bradley Baetz (:bbaetz)

Comment 10

•

22 years ago

Checking in buglist.cgi;
/cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v  <--  buglist.cgi
new revision: 1.162; previous revision: 1.161
done

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Bradley Baetz (:bbaetz)

Comment 11

•

22 years ago

This patch is invalid, as has been backed out. See bug 139759.

In particular see bug 139759 comment #6, which gives some possible solutions to
this problem.

reopening, and -> 2.18

Status: RESOLVED → REOPENED

Keywords: patch, review

Resolution: FIXED → ---

Target Milestone: Bugzilla 2.16 → Bugzilla 2.18

Dave Miller [:justdave]

Comment 12

•

22 years ago

Comment on attachment 72194 [details] [diff] [review]
patch

patch was vetoed by bbeatz and backed out.  Marking needs-work on his behalf

Attachment #72194 - Flags: review-

Bradley Baetz (:bbaetz)

Comment 13

•

22 years ago

*** Bug 144966 has been marked as a duplicate of this bug. ***

Bartosz Wucke

Comment 14

•

22 years ago

Bug 144966 is not a duplicate of this one, rather this is a duplicate/subset of
that one. (the problem is not just with CC but with most of the email search fields)

What about adding a box "Fast search (may fail on corrupt profiles)"?

Blocks 133173 ("Change "My Bugs" into a preset query") - the user can't query
his own bugs if it takes so long. (personally, with all the boxes by email
matched, I waited over 30 mins for results before giving up.)

Blocks: 133173

Bradley Baetz (:bbaetz)

Comment 15

•

22 years ago

The actual problem is 'queries with OR in them are slow'. queries involving CC
tend to be the easiest way to run into this, but the comments (longdescs in the
schema) have the same problem.

The actual problem is a hard problem to solve, in general. Since bugzilla only
works on mysql, we don't know if changing databases would speed this up.

My fix for this, which possibly could have been extended to longdecs didn't
work. Its possible to fix this with subselects, except mysql doesn't support
those. We could simulate subselects with a separate query, and thats somethign
to look into.

Summary: Query for CC takes long time → Query for CC/longdesc/OR takes long time

Matthew Tuck [:CodeMachine]

Comment 16

•

22 years ago

I don't think this is related to OR, I don't see any ORs or sets in the
description of this problem.

Gervase Markham [:gerv]

Updated

•

22 years ago

Blocks: bz-perf

harm

Comment 17

•

22 years ago

if Bug 144966 is not a dup of this one
shouldn't it be reopened ?

Joel Peshkin

Assignee

Comment 18

•

22 years ago


The actual problem is more likely this.....


When a user chooses to search for an email address in CC list or comments,
Search.pm takes the single matching email and turns it into (paraphrased) ....

SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id LEFT
JOIN profiles AS map_cc ON cc.who = map_cc.id LEFT JOIN longdescs ON bugs.id =
longdescs.bug_id LEFT JOIN profiles AS map_comment ON longdescs.who =
map_comment.id LEFT_JOIN profiles AS map_reporter ON bugs.reporter =
map_reporter.id ....othestuff...  WHERE (map_comment.login_name =
'sameuser@mozilla.org' OR map_cc.login_name = 'sameuser@mozilla.org' OR
map_reporter - 'sameuser@mozilla.org') ..otherstuff..

This means that, just for this function, every bug expands to count(cc) *
count(comments) rows each of which is joined to a profile 3 times and then
matched to the form field 3 times.
 
In the usual case where a moderate list of users match, this can really be done
in a manner requiring only 1 row per bug for this function...

first look up userids of matched user..

SELECT ...otherstuff..... FROM bugs LEFT JOIN cc ON bugs.id = cc.bug_id AND
cc.who IN(useridlist) LEFT JOIN longdescs ON bugs.id = longdescs.bug_id AND
logdescs.who IN(useridlist) ...otherstuff... WHERE ((cc.who IS NOT NULL) OR
(longdescs.who IS NOT NULL) OR (bugs.reporter IN (useridlist))) .. otherstuff..


Implementation note...

In Search.pm.

Check the $email1 and $email2 fields for matching email addresses directly from
profiles.  During the initial processing of hard-coded stuff, load an array with
a list of matching userids ONLY if there are a reasonable number of matches (< 50?).

In the funcdefs, prior to  any "LEFT JOIN profiles" operation, check what is
about to be included in the query to see if it can be done with the list already
stored.

Joel Peshkin

Assignee

Comment 19

•

22 years ago

*** Bug 175425 has been marked as a duplicate of this bug. ***

Joel Peshkin

Assignee

Comment 20

•

22 years ago

Attached patch Rough patch (obsolete) — Details — Splinter Review

This patch causes exact matches to be handled with far fewer joins.  I believe
that this is really the crux of the matter.  If this causes the email "is"
queries to become much faster than they used to be, then the patch should be
updated to handle email "contains" queries where the matching email returns a
moderate number of matches (like fewer than 50).

Attachment #72194 - Attachment is obsolete: true

Joel Peshkin

Assignee

Comment 21

•

22 years ago

Attached patch The patch (obsolete) — Details — Splinter Review

This will speed up queries for users making comments or on CC lists
dramatically.

Joel Peshkin

Assignee

Updated

•

22 years ago

Attachment #103452 - Attachment is obsolete: true

Joel Peshkin

Assignee

Comment 22

•

22 years ago

taking on

Assignee: bbaetz → bugreport

Status: REOPENED → NEW

Gervase Markham [:gerv]

Comment 23

•

22 years ago

This seems OK to me (although you probably mean either (@list < 51) or LIMIT
50), although I'd prefer not to review it as I'm not familiar with this code.

What's the relationship between this bug and 133173? That seems wrong to me...

Gerv

Joel Peshkin

Assignee

Comment 24

•

22 years ago

If we hit the LIMIT, then we don't know if there were really more than we were
willing to check for this way, so we only use the mechanism if the number
returned is fewer than the LIMIT.

I'll ping around for a reviewer.  I think that everyone is unhappy with reviews
in this part of the code.

Gervase Markham [:gerv]

Updated

•

22 years ago

Blocks: 176570

Joel Peshkin

Assignee

Comment 25

•

22 years ago

There is no reason for bug 133173 to depend on this.  

This still does need a review, though.

No longer blocks: 133173

Status: NEW → ASSIGNED

Joel Peshkin

Assignee

Comment 26

•

22 years ago

Prior to settling on the patch in attachment 103459 [details] [diff] [review], MattyT suggested trying the
approach of creating a temporary table with the matching userids and joining
with that.  I will try that in the next few days, but it will need a performance
test from myk.

myk - If you are up for trying out a few key queries, I'll do this.  If you
would rather stick with the current patch, It's your call.  Please comment here.

Joel Peshkin

Assignee

Comment 27

•

22 years ago

OK... here's the 2 SQL statements that need to b timed on BMO.  Please post
result here.

create temporary table garbage  select userid from profiles where login_name 
like '%@%' ;
alter table garbage add index userid;

Joel Peshkin

Assignee

Comment 28

•

22 years ago

It looks like there is no common syntax for temporary tables populated from a
select.  mysql cannot handle pg's way and vice-versa.  

What say we go with the patch we have then?

Opinions from myk, justdave, & dkl especially requested.  (myk for BMO impact,
justdave for sybase impact, dkl for pg impact)

David Lawrence [:dkl]

Comment 29

•

22 years ago

Comment on attachment 103459 [details] [diff] [review]
The patch

>+        &::SendSQL("SELECT userid FROM profiles WHERE INSTR(login_name, " .
>+            &::SqlQuote($email) . ") LIMIT 51");

Changing the 

INSTR(login_name, " . &::SqlQuote($email) . ") 

to

POSITION(" . &::SqlQuote($email) . " IN login_name) != 1

is better for cross db compatibility since this works in both MySQL and
PostgreSQL for me.

I have applied this patch to the PostgreSQL version with the above change and
was able to notice a 
significant increase in lookup when searching with cclist and/or qa_contact.

Dave Miller [:justdave]

Comment 30

•

22 years ago

Sybase doesn't support either of those (not that it matters for this bug, since
the Sybase stuff is still off on the horizon).  Sybase's equivalent is:

CHARINDEX(" . &::SqlQuote($email) . ",login_name)

David Lawrence [:dkl]

Updated

•

22 years ago

Attachment #103459 - Flags: review-

David Lawrence [:dkl]

Comment 31

•

22 years ago

Comment on attachment 103459 [details] [diff] [review]
The patch

This patch is now obsolete in that it is broken with the new CGI pm checkin of
recent. Things like $M{$field} has becoming $params->param($field) etc. 

Per Dave's comment about Sybase, I think separating out some of the string
search stuff from the SQL into smaller utility functions would be better for
this. Whereas POSITION() works for Pg and MySQL it does not work for Sybase. We
could have a SqlString() function in the SQL that will place the proper syntax
string in it's place. This way DB specific conditionals could be isolated to
one function.

Bradley Baetz (:bbaetz)

Comment 32

•

22 years ago

For 'starts with', LIKE 'Foo%' is better, because then the db can use a index.
You may need to escape the input string, somehoe. I presume theres a way to do
that, hopefully even a portable one.

David Lawrence [:dkl]

Comment 33

•

22 years ago

Per comment 32:

how much of a performance benefit would it be then to just replace all 
email specific searching with the proper LIKE '%foo%' and for the case
with multiple words just split on the spaces and string along multple LIKE's
in an (field1 LIKE '%foo%' OR field2 LIKE '%foo%')? I think MySQL supports RLIKE
for case insensitive search but the others do not. For those cases you can do
the usual (LOWER(field1) LIKE '%" . lc($foo) . "%' OR ...). I have read/heard
though when LOWERing the field before comparison then you basically throwing the
index out the window so it should be discouraged.

Joel Peshkin

Assignee

Comment 34

•

22 years ago

Actually, we're getting way off the original point of this patch.
Even the most horrible way of looking up the email addresses ONCE and then
qualifying the CC and londescs with it is orders of magnitude faster than the
status quo.

Let's just keep this simple and land it.  If the pattern match is too
contoversial, we could narrow its effect to handle the exact match only and get
it in.

Joel Peshkin

Assignee

Comment 35

•

22 years ago

Attached patch Patch updated (obsolete) — Details — Splinter Review

OK... Here it is back to INSTR and up-to-date

Attachment #103459 - Attachment is obsolete: true

David Lawrence [:dkl]

Comment 36

•

22 years ago

Comment on attachment 104448 [details] [diff] [review]
Patch updated

Decided per earlier discussion to leave the INSTR() change for another later
bug. This patch has passed my functionality testing on both Mysql and
PostgreSQL and has shown to speed up specific searches on cc/commenter
especially in the area of PostgreSQL.

r=dkl

Attachment #104448 - Flags: review+

Gervase Markham [:gerv]

Comment 37

•

22 years ago

OK, I had a bit of difficulty understanding this patch, and so these comments
reflect changes I think should be made to make the code more understandable.

- ListEmails should do just that; the "IN (" and ")" should be added by the caller.
+ push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id
AND $table.who IN($list)");
is more understandable than the current
+ push(@supptables, "LEFT JOIN longdescs $table ON bugs.bug_id = $table.bug_id
AND $table.who $list");

- ListEmails should be called ListIDsForEmail or similar - it doesn't list email
addresses, it lists IDs. The comment above it could also be updated to make this
more clear.

- Why do we do:
+                push(@clist, "$table.who",'greaterthan','0');
? longdescs.who should never be 0, because we don't allow anonymous comment
submission.

- What is $ok flagging as OK? I can't work out what it does. Please give it a
more descriptive name like $foocheckok, and test for not ok with if (!$foocheckok).

Thanks :-)

Gerv

Joel Peshkin

Assignee

Comment 38

•

22 years ago

Attached patch Revised with readability improvements — Details — Splinter Review

OK, clarified, fixed naming, and added isnotnull for when I mean isnotnull.

Attachment #104448 - Attachment is obsolete: true

Gervase Markham [:gerv]

Comment 39

•

22 years ago

Comment on attachment 104825 [details] [diff] [review]
Revised with readability improvements

r=gerv.

Gerv

Attachment #104825 - Flags: review+

Joel Peshkin

Assignee

Comment 40

•

22 years ago

Checking in Bugzilla/Search.pm;                               
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.23; previous revision: 1.22
done

Status: ASSIGNED → RESOLVED

Closed: 22 years ago → 22 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

12 years ago

QA Contact: matty_is_a_geek → default-qa

patch 23 years ago Bradley Baetz (:bbaetz) 1005 bytes, patch	CodeMachine : review+ justdave : review+ justdave : review-	Details \| Diff \| Splinter Review
Rough patch 22 years ago Joel Peshkin 3.08 KB, patch		Details \| Diff \| Splinter Review
The patch 22 years ago Joel Peshkin 4.15 KB, patch	dkl : review-	Details \| Diff \| Splinter Review
Patch updated 22 years ago Joel Peshkin 4.17 KB, patch	dkl : review+	Details \| Diff \| Splinter Review
Revised with readability improvements 22 years ago Joel Peshkin 4.21 KB, patch	gerv : review+	Details \| Diff \| Splinter Review