Last Comment Bug 372795 - Implement Bugzilla::Product::preload() to speed up query.cgi when there are many products
: Implement Bugzilla::Product::preload() to speed up query.cgi when there are m...
Status: RESOLVED FIXED
: perf
Product: Bugzilla
Classification: Server Software
Component: Bugzilla-General (show other bugs)
: 3.1
: All All
: -- enhancement (vote)
: Bugzilla 3.2
Assigned To: Max Kanat-Alexander
: default-qa
:
Mentors:
Depends on:
Blocks: 65052 364155 364156 364157
  Show dependency treegraph
 
Reported: 2007-03-06 00:31 PST by Max Kanat-Alexander
Modified: 2009-12-15 10:30 PST (History)
8 users (show)
mkanat: approval+
mkanat: blocking3.2+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
v1 (8.93 KB, patch)
2007-03-06 03:15 PST, Max Kanat-Alexander
bugzilla-mozilla: review-
Details | Diff | Splinter Review
v2 (5.57 KB, patch)
2007-11-28 09:20 PST, Max Kanat-Alexander
no flags Details | Diff | Splinter Review
v3 (5.59 KB, patch)
2007-11-28 10:00 PST, Max Kanat-Alexander
bugzilla-mozilla: review-
Details | Diff | Splinter Review
v4 (6.94 KB, patch)
2008-01-23 17:22 PST, Max Kanat-Alexander
no flags Details | Diff | Splinter Review
v4.1 (7.01 KB, patch)
2008-01-23 17:27 PST, Max Kanat-Alexander
no flags Details | Diff | Splinter Review
v5 (7.01 KB, patch)
2008-01-23 18:20 PST, Max Kanat-Alexander
LpSolit: review-
Details | Diff | Splinter Review
Detailed timings for latest patch (2.35 KB, text/csv)
2008-03-02 13:33 PST, Olav Vitters
no flags Details
Chart for Query Time (124.65 KB, image/png)
2008-03-03 13:34 PST, Olav Vitters
no flags Details
v6 (7.16 KB, patch)
2008-03-26 17:20 PDT, Max Kanat-Alexander
LpSolit: review+
Details | Diff | Splinter Review
v7 (7.42 KB, patch)
2008-03-26 22:06 PDT, Max Kanat-Alexander
mkanat: review+
Details | Diff | Splinter Review
Query Time Chart (CVS HEAD -- with latest patch) (59.42 KB, image/png)
2008-04-01 15:33 PDT, Olav Vitters
no flags Details

Description Max Kanat-Alexander 2007-03-06 00:31:12 PST
I'm doing some testing of Bugzilla with 2000 products that have 0-25 components, versions, and target milestones each.

We spend a lot of time doing a SELECT id from components, milestones, and versions, and then generating the objects for each product.

Instead, I think at least Bugzilla::Product's new_from_list should have a { preload => 1 } option, that would call a certain function to preload data by calling a single SQL query for the whole list, or a small number of SQL queries for the whole list.

This is probably generally useful in Bugzilla::Object for other situations like this (say, for lists of Bug or User objects).
Comment 1 Max Kanat-Alexander 2007-03-06 03:15:43 PST
Created attachment 257489 [details] [diff] [review]
v1

Okay, in my highly unscientific testing, this seems to offer a pretty good speed-up. bkor, is there any way that you could test this out on your 3.0 test installation and see if it makes any difference? (Particularly if it makes a difference when you remove your local preload hack.)

Note that this actually has some of bug 372531 in it, and I didn't really separate it out. I'm not saying that this is the exact patch I want to put in, just that I want to see how it improves things on a real machine (not on landfill, which is pretty heavily loaded and not a good perf testing system).
Comment 2 Olav Vitters 2007-04-08 06:46:42 PDT
Performance stats:

pristine non-cached:

  2162 queries, 2.754979 seconds
  4.13user 0.33system 0:10.17elapsed 43%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (29major+7885minor)pagefaults 0swaps


pristine cached (5th run):

  2162 queries, 1.780918 seconds
  4.04user 0.24system 0:05.26elapsed 81%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+7914minor)pagefaults 0swaps


after patch non-cached:

  144 queries, 0.996881 seconds
  2.83user 0.22system 0:07.89elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (29major+7774minor)pagefaults 0swaps

after patch cached (5th run):

  144 queries, 0.280068 seconds
  2.92user 0.13system 0:03.16elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+7802minor)pagefaults 0swaps
Comment 3 Olav Vitters 2007-04-08 07:26:00 PDT
Comment on attachment 257489 [details] [diff] [review]
v1

PS: I tested this against latest Bugzilla HEAD, with InnoDB, etc.

>Index: Bugzilla/Field.pm

>-=item C<match>
>-
>-=over
>-
>-=item B<Description>
>-
>-Returns a list of fields that match the specified criteria.
>-
>-You should be using L<Bugzilla/get_fields> and
>-L<Bugzilla/get_custom_field_names> instead of this function.

There is still some references to match in this POD. Suggest to leave above warning in.

I did not see a POD for match in Bugzilla/Object.pm (r-)
Comment 4 Max Kanat-Alexander 2007-11-28 09:20:15 PST
Created attachment 290547 [details] [diff] [review]
v2

Okay, here's a newer version. This is much simpler, now that Bugzilla::Object already has match(). This also adds a small optimization to get_all that I figure is harmless to keep.
Comment 5 Max Kanat-Alexander 2007-11-28 10:00:37 PST
Created attachment 290556 [details] [diff] [review]
v3

There was an extremely strange taint error that is solved by this seemingly unimportant code change.
Comment 6 Frédéric Buclin 2007-12-05 15:14:02 PST
Is this bug critical enough from a performance point of view to block 3.2? Think about installations with thousands of products, e.g. Mandriva and their old Bugzilla installation (32,000 products where everything was slow).

To be sure: with this patch, the memory consumption won't be worse than what we currently have, right (again with thousands of products in mind)?
Comment 7 Max Kanat-Alexander 2007-12-05 23:20:43 PST
(In reply to comment #6)
> Is this bug critical enough from a performance point of view to block 3.2?

  Yes, it should have even gone into 3.0, probably.

> To be sure: with this patch, the memory consumption won't be worse than what we
> currently have, right (again with thousands of products in mind)?

  Right, it will be the same.
Comment 8 Olav Vitters 2007-12-09 04:23:43 PST
Performance data:
pristine non-cached:
2114 queries, 1.534358 sec runtime
1.24user 0.05system 0:06.46elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (36major+10403minor)pagefaults 0swaps

pristine cached:
2114 queries, 0.319684 sec runtime
1.23user 0.03system 0:01.41elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10439minor)pagefaults 0swaps

patched, non-cached:
95 queries, 1.346977 sec runtime
0.97user 0.05system 0:05.96elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (36major+10262minor)pagefaults 0swaps

patches, cached:
95 queries, 0.076401 sec runtime
0.94user 0.02system 0:01.00elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10299minor)pagefaults 0swaps
Comment 9 Olav Vitters 2007-12-09 04:36:04 PST
btw, 'only' 337 products. Should probably make a script to generate a much larger installation based upon the statistical data in my installation.

dbiprof also shows:
0.026800 seconds for: SELECT id,value,product_id FROM versions WHERE product_id IN (?,?, [etc])  ORDER BY id

but also:
0.005405 seconds for: SELECT products.id,products.name,products.classification_id,products.description,products.milestoneurl,products.disallownew,products.votesperuser,products.maxvotesperbug,products.votestoconfirm,products.defaultmilestone FROM products WHERE id IN (?,?, [etc])  ORDER BY name

this for the patched cached case. Since 2007-04-08, the cached case now only produces 95 queries (24 unique ones -- some are called multiple times). Dbiprof doesn't show the SQL for 27 queries.
Comment 10 Olav Vitters 2007-12-09 04:55:41 PST
Note: I always tested using the command line, without being logged in. When logged in, I see the following:

non-patched cached for show_bug.cgi (I tested query.cgi above):
347 queries, 0.081518 seconds
0.46user 0.02system 0:00.54elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8243minor)pagefaults 0swaps

patched cached for show_bug.cgi:
332 queries, 0.141313 seconds (less queries, but they take more time)
0.52user 0.02system 0:00.62elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9136minor)pagefaults 0swaps

I believe the cause is get_selectable_products from Bugzilla/User.pm. This now retrieves all components (etc), while before it would only (inefficiently) retrieve when needed.
Comment 11 Frédéric Buclin 2007-12-09 05:02:11 PST
(In reply to comment #10)
> I believe the cause is get_selectable_products from Bugzilla/User.pm. This now
> retrieves all components (etc), while before it would only (inefficiently)
> retrieve when needed.

This is annoying. When do we use components, versions and milestones of a list of products versus when don't we care about them? query.cgi uses them, but show_bug.cgi doesn't. What about other pages?
Comment 12 Olav Vitters 2007-12-09 05:34:30 PST
I'll note:
 * CGI script
 * number of queries
 * query time change (pristine -> patched).
 * 'time'

request.cgi (using default params -- do_union etc):
  Speedup (already used components)
  751 queries -> 82 queries
  0.128219 secs -> 0.077958 secs
  0.56user 0.02system 0:00.66elapsed -> 0.54user 0.02system 0:00.60elapsed

page.cgi?id=release-notes.html
  No change (same # of queries)

editproducts.cgi?action=edit&product=nautilus
  Slowdown
  143 queries -> 145 queries
  0.041898 secs -> 0.097888 secs
  0.30user 0.01system 0:00.34elapsed -> 0.35user 0.02system 0:00.42elapsed

editproducts.cgi (shows Classification selection)
  Slowdown
  140 -> 142
  0.021653 secs - > 0.079241 secs
  0.29user 0.01system 0:00.32elapsed -> 0.33user 0.03system 0:00.39elapsed

buglist.cgi (My Bugs query)
  Similar
  165 -> 163
  0.452663 -> 0.457684 (same unchanged queries took 0.002 longer.. too close to call)

I'll check some more.
Comment 13 Olav Vitters 2007-12-09 06:30:53 PST
Data from GNOME Bugzilla (showing 1k+ hits, hiding bgo specific CGI scripts) 
  94727 show_bug.cgi
  27285 attachment.cgi
  23299 buglist.cgi
   5829 showdependencytree.cgi
   3159 query.cgi
   2214 enter_bug.cgi
   1837 show_activity.cgi
   1047 page.cgi

attachment.cgi?id=36779
  No change

attachment.cgi?id=36779&action=diff
  No change

show_activity.cgi?id=163163
  Slight speedup
  142 -> 141
  0.014744 -> 0.013627
  0.26user 0.01system 0:00.29elapsed -> 0.27user 0.01system 0:00.29elapsed

editgroups.cgi
  Slight slowdown, not caused by preload (more the other changes)
  116->114
  0.010542 -> 0.012989

editclassifications.cgi?action=edit&classification=Desktop
  Speedup
  128 -> 130
  0.024016 -> 0.012237
Comment 14 Olav Vitters 2007-12-09 12:45:54 PST
Comment on attachment 290556 [details] [diff] [review]
v3

Works well, but I'd like to have keep the current show_bug.cgi performance. Any way to avoid preload from slowing down show_bug.cgi?
Comment 15 Max Kanat-Alexander 2007-12-09 19:12:45 PST
(In reply to comment #13)
What really matters is query.cgi, for this patch, though.

(In reply to comment #14)
> (From update of attachment 290556 [details] [diff] [review])
> Works well, but I'd like to have keep the current show_bug.cgi performance. Any
> way to avoid preload from slowing down show_bug.cgi?

  We could add a parameter to Product::new_from_list and Bug::get_selectable_products, { preload => 1 }, and only do the preload if it's specified.
Comment 16 Albert Ting 2007-12-27 17:39:32 PST
(In reply to comment #15)
> (In reply to comment #14)
> > (From update of attachment 290556 [details] [diff] [review] [details])
> > Works well, but I'd like to have keep the current show_bug.cgi performance. Any
> > way to avoid preload from slowing down show_bug.cgi?
> 
>   We could add a parameter to Product::new_from_list and
> Bug::get_selectable_products, { preload => 1 }, and only do the preload if it's
> specified.

For show_bugs.cgi, could we still preload the product list (and not the components)?  Reason is that I'd like to have the "New" link in the header section use YUI to hierarchically display all the products and would not want to reload the product list for both the header and the show_bugs' "product" pulldown menu.
Comment 17 Max Kanat-Alexander 2007-12-27 17:42:41 PST
(In reply to comment #16)
> For show_bugs.cgi, could we still preload the product list (and not the
> components)? 

  There's no preloading required, there. Perhaps you misunderstood the patch?
Comment 18 Albert Ting 2007-12-29 16:06:59 PST
(In reply to comment #17)
> (In reply to comment #16)
> > For show_bugs.cgi, could we still preload the product list (and not the
> > components)? 
> 
>   There's no preloading required, there. Perhaps you misunderstood the patch?
> 

Yup, I misunderstood the patch.  Had to do a special caching for my particular need.
Comment 19 Max Kanat-Alexander 2008-01-23 17:22:06 PST
Created attachment 298823 [details] [diff] [review]
v4

Okay, here's a version that only preloads for get_selectable_products. Because get_selectable_products already takes an optional parameter, it was too complex, for this patch, to add a parameter to get_selectable_products.

This should generally improve the state of things with query.cgi and other things that use get_selectable_products.

Also, it should generally improve things in other parts of Bugzilla, since I've changed match() from doing two queries to doing one query.
Comment 20 Max Kanat-Alexander 2008-01-23 17:27:06 PST
Created attachment 298824 [details] [diff] [review]
v4.1

Oops, that patch didn't work, but this one does. :-)
Comment 21 Max Kanat-Alexander 2008-01-23 18:20:34 PST
Created attachment 298838 [details] [diff] [review]
v5

I realized I'd forgotten to use the new sql_in method.
Comment 22 Olav Vitters 2008-02-05 04:57:03 PST
Latest patch speeds up query.cgi. No real change for show_bug.cgi. However, editproducts.cgi is still affected.

editproducts.cgi?action=edit&product=nautilus
  Slowdown
  143 queries -> 145 queries
  0.041476 secs -> 0.127691 secs
  time shows: 0.33 user -> 0.41 user

Should we care?
Comment 23 Frédéric Buclin 2008-02-05 09:43:13 PST
What are the stats for query.cgi, (buglist.cgi) and show_bug.cgi / process_bug.cgi?
Comment 24 Max Kanat-Alexander 2008-02-05 13:41:39 PST
Right now I only care about query.cgi. Let's file a separate bug for editproducts.cgi, since that's also going to have the problem of "lots of HTML output".
Comment 25 Max Kanat-Alexander 2008-02-26 12:33:22 PST
Comment on attachment 298838 [details] [diff] [review]
v5

Now that we have timing data from bkor, would you do the code review on this, LpSolit?
Comment 26 Frédéric Buclin 2008-02-29 15:05:48 PST
Comment on attachment 298838 [details] [diff] [review]
v5

>Index: Bugzilla/Object.pm

> sub match {

my $id    = $class->ID_FIELD;
my $table = $class->DB_TABLE;

Both variables are no longer needed.


>+            push(@terms, "$field IN ($qmarks)");

You have to use $dbh->sql_in().



>Index: Bugzilla/Product.pm

>+sub new_from_list {
>+    my $self = shift;
>+    my (undef, $params) = @_;
>+    $params ||= {};
>+    my $products = $self->SUPER::new_from_list(@_);

This is confusing. I spent some time to understand the goal of undef here. I would rather see:

my ($self, $prod_ids, $params) = @_;
my $products = $self->SUPER::new_from_list($prod_ids);

So that we have a chance to understand which is which.


>+sub preload {

>+        foreach my $obj (@$objects) {
>+            my $product_id = $obj->{product_id};

Milestone, version and component objects all have a public ->product_id() method. Use it.


>+            next if !exists $prods{$product_id};

This can never happen as you call match() with a defined list of product IDs. So this line should go away.


>+            $prods{$product_id}->{"${field}s"} ||= [];
>+            push(@{$prods{$product_id}->{"${field}s"}}, $obj);

This annoys me a bit. If you can Bugzilla::Product->new_from_list($prod_ids, {preload => 1}) twice in the same script (or in a subroutine called by the script), then all fields are populated twice. We should detect that they are already populated, and preload() should return immediately, IMO.
Comment 27 Olav Vitters 2008-03-02 13:33:00 PST
Created attachment 306920 [details]
Detailed timings for latest patch

Using a script I checked the timings for the latest patch.
Note:
 * 'auth' means the cgi script ran under a valid login (important for e.g. process_bug.cgi)
 * Each cgi script was ran 5 times, the timings are for the 5th run
 * Did not flush MySQL buffers / file/vm cache

Suggest to use copy/pasting to ease comparison (e.g. compare auth-patched to auth).
Comment 28 Frédéric Buclin 2008-03-02 15:08:51 PST
Comment on attachment 298838 [details] [diff] [review]
v5

>Index: Bugzilla/User.pm

>+        $self->{selectable_products} = 
>+            Bugzilla::Product->new_from_list($prod_ids, { preload => 1 });

Note that the reason some pages are slower to load is because $user->can_see_product() calls get_selectable_products() which in turn unconditionally calls new_from_list() with preload = 1. I think the caller should decide if it wants to preload the data or not.
Comment 29 Max Kanat-Alexander 2008-03-02 19:11:00 PST
No, I'm against putting it too far up in the call chain, because that adds complexity. If there's a *significant* performance impact, I'd consider it, but for minor performance differences I don't think the complexity would be justified.
Comment 30 Olav Vitters 2008-03-03 13:34:18 PST
Created attachment 307097 [details]
Chart for Query Time

The timings from the .csv as a chart. I had to redo the perf test to generate these charts. However, figures should be more or less the same.

Unfortunately I had to customize/hack pycha for this. I want to have some standard chart thing for Python that I don't need to hack. Colors need a bit of work too, plus the font sizes (pycha limitation).

Best is to compare the 1st tick on the x axis with the 2nd, then the 3rd with the 4th. this way you compare anon with anon-patched and auth with auth-patched.
Comment 31 Frédéric Buclin 2008-03-03 13:37:26 PST
(In reply to comment #29)
> complexity. If there's a *significant* performance impact, I'd consider it

As shown in bkor's chart, editproducts.cgi is 4-5 times slower. That's a significant perf regression.
Comment 32 Max Kanat-Alexander 2008-03-03 15:36:06 PST
Hey bkor. What does the Y axis mean? Also, what's select_classification?

LpSolit points out that, seemingly, editproducts.cgi is slower, but are we talking a difference of 0.1 seconds to 0.5 seconds? I'm not so concerned about the performance of admin pages with lots of products--they already perform badly just because of how much HTML is generated by them, which is a totally separate issue.
Comment 33 Olav Vitters 2008-03-03 15:41:24 PST
Y axis is in seconds. Note that my machine is pretty fast (Duo CPU E6750  @ 2.66GHz, 4GB memory).

select_classification is editproducts.cgi without any arguments. This brings up a page to select the classification.
Comment 34 Frédéric Buclin 2008-03-03 15:49:33 PST
I think allowing calls of the form

  $user->get_selectable_products($class_id, {preload => 0})

is not a big deal. If the 2nd argument is undefined, then use preload => 1. This brings no complexity at all and can save us from useless SQL queries.
Comment 35 Max Kanat-Alexander 2008-03-26 17:20:39 PDT
Created attachment 311925 [details] [diff] [review]
v6

I bypassed the whole architecture question by forcing people to manually call preload() when they need it.
Comment 36 Frédéric Buclin 2008-03-26 18:43:04 PDT
Comment on attachment 311925 [details] [diff] [review]
v6

Seems to work fine, and the performance benefit should be the same as in the previous patch. My previous review comments still stand (except for new_from_list, which is no longer present in Product.pm). r=LpSolit with my previous review comments addressed.
Comment 37 Max Kanat-Alexander 2008-03-26 22:06:45 PDT
Created attachment 311967 [details] [diff] [review]
v7

Okay, here's the version with your other review comments addressed, which were very good. Carrying forward r+.
Comment 38 Max Kanat-Alexander 2008-03-26 22:08:19 PDT
Checking in query.cgi;
/cvsroot/mozilla/webtools/bugzilla/query.cgi,v  <--  query.cgi
new revision: 1.181; previous revision: 1.180
done
Checking in Bugzilla/Object.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Object.pm,v  <--  Object.pm
new revision: 1.23; previous revision: 1.22
done
Checking in Bugzilla/Product.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Product.pm,v  <--  Product.pm
new revision: 1.26; previous revision: 1.25
done
Comment 39 Olav Vitters 2008-04-01 15:33:32 PDT
Created attachment 312978 [details]
Query Time Chart (CVS HEAD -- with latest patch)
Comment 40 Frédéric Buclin 2009-12-15 10:30:24 PST
*** Bug 534844 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.