Closed Bug 372795 Opened 17 years ago Closed 16 years ago

Implement Bugzilla::Product::preload() to speed up query.cgi when there are many products

Categories

(Bugzilla :: Bugzilla-General, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 3.2

People

(Reporter: mkanat, Assigned: mkanat)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(4 files, 7 obsolete files)

I'm doing some testing of Bugzilla with 2000 products that have 0-25 components, versions, and target milestones each.

We spend a lot of time doing a SELECT id from components, milestones, and versions, and then generating the objects for each product.

Instead, I think at least Bugzilla::Product's new_from_list should have a { preload => 1 } option, that would call a certain function to preload data by calling a single SQL query for the whole list, or a small number of SQL queries for the whole list.

This is probably generally useful in Bugzilla::Object for other situations like this (say, for lists of Bug or User objects).
Status: NEW → ASSIGNED
Summary: Bugzilla::Object should be able to preload certain data to speed things up → Bugzilla::Product::new_from_list should be able to preload certain data to speed things up
Attached patch v1 (obsolete) — Splinter Review
Okay, in my highly unscientific testing, this seems to offer a pretty good speed-up. bkor, is there any way that you could test this out on your 3.0 test installation and see if it makes any difference? (Particularly if it makes a difference when you remove your local preload hack.)

Note that this actually has some of bug 372531 in it, and I didn't really separate it out. I'm not saying that this is the exact patch I want to put in, just that I want to see how it improves things on a real machine (not on landfill, which is pretty heavily loaded and not a good perf testing system).
Attachment #257489 - Flags: review?(bugzilla-mozilla)
Summary: Bugzilla::Product::new_from_list should be able to preload certain data to speed things up → Bugzilla::Product::new_from_list should preload certain data to speed things up
Performance stats:

pristine non-cached:

  2162 queries, 2.754979 seconds
  4.13user 0.33system 0:10.17elapsed 43%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (29major+7885minor)pagefaults 0swaps


pristine cached (5th run):

  2162 queries, 1.780918 seconds
  4.04user 0.24system 0:05.26elapsed 81%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+7914minor)pagefaults 0swaps


after patch non-cached:

  144 queries, 0.996881 seconds
  2.83user 0.22system 0:07.89elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (29major+7774minor)pagefaults 0swaps

after patch cached (5th run):

  144 queries, 0.280068 seconds
  2.92user 0.13system 0:03.16elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+7802minor)pagefaults 0swaps
Comment on attachment 257489 [details] [diff] [review]
v1

PS: I tested this against latest Bugzilla HEAD, with InnoDB, etc.

>Index: Bugzilla/Field.pm

>-=item C<match>
>-
>-=over
>-
>-=item B<Description>
>-
>-Returns a list of fields that match the specified criteria.
>-
>-You should be using L<Bugzilla/get_fields> and
>-L<Bugzilla/get_custom_field_names> instead of this function.

There is still some references to match in this POD. Suggest to leave above warning in.

I did not see a POD for match in Bugzilla/Object.pm (r-)
Attachment #257489 - Flags: review?(bugzilla-mozilla) → review-
Attached patch v2 (obsolete) — Splinter Review
Okay, here's a newer version. This is much simpler, now that Bugzilla::Object already has match(). This also adds a small optimization to get_all that I figure is harmless to keep.
Attachment #257489 - Attachment is obsolete: true
Attachment #290547 - Flags: review?(bugzilla-mozilla)
Attached patch v3 (obsolete) — Splinter Review
There was an extremely strange taint error that is solved by this seemingly unimportant code change.
Attachment #290547 - Attachment is obsolete: true
Attachment #290556 - Flags: review?(bugzilla-mozilla)
Attachment #290547 - Flags: review?(bugzilla-mozilla)
Is this bug critical enough from a performance point of view to block 3.2? Think about installations with thousands of products, e.g. Mandriva and their old Bugzilla installation (32,000 products where everything was slow).

To be sure: with this patch, the memory consumption won't be worse than what we currently have, right (again with thousands of products in mind)?
Flags: blocking3.2?
Keywords: perf
(In reply to comment #6)
> Is this bug critical enough from a performance point of view to block 3.2?

  Yes, it should have even gone into 3.0, probably.

> To be sure: with this patch, the memory consumption won't be worse than what we
> currently have, right (again with thousands of products in mind)?

  Right, it will be the same.
Flags: blocking3.2? → blocking3.2+
Performance data:
pristine non-cached:
2114 queries, 1.534358 sec runtime
1.24user 0.05system 0:06.46elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (36major+10403minor)pagefaults 0swaps

pristine cached:
2114 queries, 0.319684 sec runtime
1.23user 0.03system 0:01.41elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10439minor)pagefaults 0swaps

patched, non-cached:
95 queries, 1.346977 sec runtime
0.97user 0.05system 0:05.96elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (36major+10262minor)pagefaults 0swaps

patches, cached:
95 queries, 0.076401 sec runtime
0.94user 0.02system 0:01.00elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10299minor)pagefaults 0swaps
btw, 'only' 337 products. Should probably make a script to generate a much larger installation based upon the statistical data in my installation.

dbiprof also shows:
0.026800 seconds for: SELECT id,value,product_id FROM versions WHERE product_id IN (?,?, [etc])  ORDER BY id

but also:
0.005405 seconds for: SELECT products.id,products.name,products.classification_id,products.description,products.milestoneurl,products.disallownew,products.votesperuser,products.maxvotesperbug,products.votestoconfirm,products.defaultmilestone FROM products WHERE id IN (?,?, [etc])  ORDER BY name

this for the patched cached case. Since 2007-04-08, the cached case now only produces 95 queries (24 unique ones -- some are called multiple times). Dbiprof doesn't show the SQL for 27 queries.
Note: I always tested using the command line, without being logged in. When logged in, I see the following:

non-patched cached for show_bug.cgi (I tested query.cgi above):
347 queries, 0.081518 seconds
0.46user 0.02system 0:00.54elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8243minor)pagefaults 0swaps

patched cached for show_bug.cgi:
332 queries, 0.141313 seconds (less queries, but they take more time)
0.52user 0.02system 0:00.62elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9136minor)pagefaults 0swaps

I believe the cause is get_selectable_products from Bugzilla/User.pm. This now retrieves all components (etc), while before it would only (inefficiently) retrieve when needed.
(In reply to comment #10)
> I believe the cause is get_selectable_products from Bugzilla/User.pm. This now
> retrieves all components (etc), while before it would only (inefficiently)
> retrieve when needed.

This is annoying. When do we use components, versions and milestones of a list of products versus when don't we care about them? query.cgi uses them, but show_bug.cgi doesn't. What about other pages?
I'll note:
 * CGI script
 * number of queries
 * query time change (pristine -> patched).
 * 'time'

request.cgi (using default params -- do_union etc):
  Speedup (already used components)
  751 queries -> 82 queries
  0.128219 secs -> 0.077958 secs
  0.56user 0.02system 0:00.66elapsed -> 0.54user 0.02system 0:00.60elapsed

page.cgi?id=release-notes.html
  No change (same # of queries)

editproducts.cgi?action=edit&product=nautilus
  Slowdown
  143 queries -> 145 queries
  0.041898 secs -> 0.097888 secs
  0.30user 0.01system 0:00.34elapsed -> 0.35user 0.02system 0:00.42elapsed

editproducts.cgi (shows Classification selection)
  Slowdown
  140 -> 142
  0.021653 secs - > 0.079241 secs
  0.29user 0.01system 0:00.32elapsed -> 0.33user 0.03system 0:00.39elapsed

buglist.cgi (My Bugs query)
  Similar
  165 -> 163
  0.452663 -> 0.457684 (same unchanged queries took 0.002 longer.. too close to call)

I'll check some more.
Data from GNOME Bugzilla (showing 1k+ hits, hiding bgo specific CGI scripts) 
  94727 show_bug.cgi
  27285 attachment.cgi
  23299 buglist.cgi
   5829 showdependencytree.cgi
   3159 query.cgi
   2214 enter_bug.cgi
   1837 show_activity.cgi
   1047 page.cgi

attachment.cgi?id=36779
  No change

attachment.cgi?id=36779&action=diff
  No change

show_activity.cgi?id=163163
  Slight speedup
  142 -> 141
  0.014744 -> 0.013627
  0.26user 0.01system 0:00.29elapsed -> 0.27user 0.01system 0:00.29elapsed

editgroups.cgi
  Slight slowdown, not caused by preload (more the other changes)
  116->114
  0.010542 -> 0.012989

editclassifications.cgi?action=edit&classification=Desktop
  Speedup
  128 -> 130
  0.024016 -> 0.012237
Comment on attachment 290556 [details] [diff] [review]
v3

Works well, but I'd like to have keep the current show_bug.cgi performance. Any way to avoid preload from slowing down show_bug.cgi?
Attachment #290556 - Flags: review?(bugzilla-mozilla) → review-
(In reply to comment #13)
What really matters is query.cgi, for this patch, though.

(In reply to comment #14)
> (From update of attachment 290556 [details] [diff] [review])
> Works well, but I'd like to have keep the current show_bug.cgi performance. Any
> way to avoid preload from slowing down show_bug.cgi?

  We could add a parameter to Product::new_from_list and Bug::get_selectable_products, { preload => 1 }, and only do the preload if it's specified.
(In reply to comment #15)
> (In reply to comment #14)
> > (From update of attachment 290556 [details] [diff] [review] [details])
> > Works well, but I'd like to have keep the current show_bug.cgi performance. Any
> > way to avoid preload from slowing down show_bug.cgi?
> 
>   We could add a parameter to Product::new_from_list and
> Bug::get_selectable_products, { preload => 1 }, and only do the preload if it's
> specified.

For show_bugs.cgi, could we still preload the product list (and not the components)?  Reason is that I'd like to have the "New" link in the header section use YUI to hierarchically display all the products and would not want to reload the product list for both the header and the show_bugs' "product" pulldown menu.
(In reply to comment #16)
> For show_bugs.cgi, could we still preload the product list (and not the
> components)? 

  There's no preloading required, there. Perhaps you misunderstood the patch?
(In reply to comment #17)
> (In reply to comment #16)
> > For show_bugs.cgi, could we still preload the product list (and not the
> > components)? 
> 
>   There's no preloading required, there. Perhaps you misunderstood the patch?
> 

Yup, I misunderstood the patch.  Had to do a special caching for my particular need.
Attached patch v4 (obsolete) — Splinter Review
Okay, here's a version that only preloads for get_selectable_products. Because get_selectable_products already takes an optional parameter, it was too complex, for this patch, to add a parameter to get_selectable_products.

This should generally improve the state of things with query.cgi and other things that use get_selectable_products.

Also, it should generally improve things in other parts of Bugzilla, since I've changed match() from doing two queries to doing one query.
Attachment #290556 - Attachment is obsolete: true
Attachment #298823 - Flags: review?(bugzilla-mozilla)
Attached patch v4.1 (obsolete) — Splinter Review
Oops, that patch didn't work, but this one does. :-)
Attachment #298823 - Attachment is obsolete: true
Attachment #298824 - Flags: review?(bugzilla-mozilla)
Attachment #298823 - Flags: review?(bugzilla-mozilla)
Attached patch v5 (obsolete) — Splinter Review
I realized I'd forgotten to use the new sql_in method.
Attachment #298824 - Attachment is obsolete: true
Attachment #298838 - Flags: review?(bugzilla-mozilla)
Attachment #298824 - Flags: review?(bugzilla-mozilla)
Latest patch speeds up query.cgi. No real change for show_bug.cgi. However, editproducts.cgi is still affected.

editproducts.cgi?action=edit&product=nautilus
  Slowdown
  143 queries -> 145 queries
  0.041476 secs -> 0.127691 secs
  time shows: 0.33 user -> 0.41 user

Should we care?
What are the stats for query.cgi, (buglist.cgi) and show_bug.cgi / process_bug.cgi?
Right now I only care about query.cgi. Let's file a separate bug for editproducts.cgi, since that's also going to have the problem of "lots of HTML output".
Comment on attachment 298838 [details] [diff] [review]
v5

Now that we have timing data from bkor, would you do the code review on this, LpSolit?
Attachment #298838 - Flags: review?(bugzilla-mozilla) → review?(LpSolit)
Comment on attachment 298838 [details] [diff] [review]
v5

>Index: Bugzilla/Object.pm

> sub match {

my $id    = $class->ID_FIELD;
my $table = $class->DB_TABLE;

Both variables are no longer needed.


>+            push(@terms, "$field IN ($qmarks)");

You have to use $dbh->sql_in().



>Index: Bugzilla/Product.pm

>+sub new_from_list {
>+    my $self = shift;
>+    my (undef, $params) = @_;
>+    $params ||= {};
>+    my $products = $self->SUPER::new_from_list(@_);

This is confusing. I spent some time to understand the goal of undef here. I would rather see:

my ($self, $prod_ids, $params) = @_;
my $products = $self->SUPER::new_from_list($prod_ids);

So that we have a chance to understand which is which.


>+sub preload {

>+        foreach my $obj (@$objects) {
>+            my $product_id = $obj->{product_id};

Milestone, version and component objects all have a public ->product_id() method. Use it.


>+            next if !exists $prods{$product_id};

This can never happen as you call match() with a defined list of product IDs. So this line should go away.


>+            $prods{$product_id}->{"${field}s"} ||= [];
>+            push(@{$prods{$product_id}->{"${field}s"}}, $obj);

This annoys me a bit. If you can Bugzilla::Product->new_from_list($prod_ids, {preload => 1}) twice in the same script (or in a subroutine called by the script), then all fields are populated twice. We should detect that they are already populated, and preload() should return immediately, IMO.
Attachment #298838 - Flags: review?(LpSolit) → review-
Using a script I checked the timings for the latest patch.
Note:
 * 'auth' means the cgi script ran under a valid login (important for e.g. process_bug.cgi)
 * Each cgi script was ran 5 times, the timings are for the 5th run
 * Did not flush MySQL buffers / file/vm cache

Suggest to use copy/pasting to ease comparison (e.g. compare auth-patched to auth).
Attachment #306920 - Attachment mime type: text/plain → text/csv
Comment on attachment 298838 [details] [diff] [review]
v5

>Index: Bugzilla/User.pm

>+        $self->{selectable_products} = 
>+            Bugzilla::Product->new_from_list($prod_ids, { preload => 1 });

Note that the reason some pages are slower to load is because $user->can_see_product() calls get_selectable_products() which in turn unconditionally calls new_from_list() with preload = 1. I think the caller should decide if it wants to preload the data or not.
No, I'm against putting it too far up in the call chain, because that adds complexity. If there's a *significant* performance impact, I'd consider it, but for minor performance differences I don't think the complexity would be justified.
Attached image Chart for Query Time
The timings from the .csv as a chart. I had to redo the perf test to generate these charts. However, figures should be more or less the same.

Unfortunately I had to customize/hack pycha for this. I want to have some standard chart thing for Python that I don't need to hack. Colors need a bit of work too, plus the font sizes (pycha limitation).

Best is to compare the 1st tick on the x axis with the 2nd, then the 3rd with the 4th. this way you compare anon with anon-patched and auth with auth-patched.
(In reply to comment #29)
> complexity. If there's a *significant* performance impact, I'd consider it

As shown in bkor's chart, editproducts.cgi is 4-5 times slower. That's a significant perf regression.
Hey bkor. What does the Y axis mean? Also, what's select_classification?

LpSolit points out that, seemingly, editproducts.cgi is slower, but are we talking a difference of 0.1 seconds to 0.5 seconds? I'm not so concerned about the performance of admin pages with lots of products--they already perform badly just because of how much HTML is generated by them, which is a totally separate issue.
Y axis is in seconds. Note that my machine is pretty fast (Duo CPU E6750  @ 2.66GHz, 4GB memory).

select_classification is editproducts.cgi without any arguments. This brings up a page to select the classification.
I think allowing calls of the form

  $user->get_selectable_products($class_id, {preload => 0})

is not a big deal. If the 2nd argument is undefined, then use preload => 1. This brings no complexity at all and can save us from useless SQL queries.
Blocks: 65052
Attached patch v6 (obsolete) — Splinter Review
I bypassed the whole architecture question by forcing people to manually call preload() when they need it.
Attachment #298838 - Attachment is obsolete: true
Attachment #311925 - Flags: review?(LpSolit)
Comment on attachment 311925 [details] [diff] [review]
v6

Seems to work fine, and the performance benefit should be the same as in the previous patch. My previous review comments still stand (except for new_from_list, which is no longer present in Product.pm). r=LpSolit with my previous review comments addressed.
Attachment #311925 - Flags: review?(LpSolit) → review+
Summary: Bugzilla::Product::new_from_list should preload certain data to speed things up → Implement Bugzilla::Product::preload() to speed things up with many products
Attached patch v7Splinter Review
Okay, here's the version with your other review comments addressed, which were very good. Carrying forward r+.
Attachment #311925 - Attachment is obsolete: true
Attachment #311967 - Flags: review+
Checking in query.cgi;
/cvsroot/mozilla/webtools/bugzilla/query.cgi,v  <--  query.cgi
new revision: 1.181; previous revision: 1.180
done
Checking in Bugzilla/Object.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Object.pm,v  <--  Object.pm
new revision: 1.23; previous revision: 1.22
done
Checking in Bugzilla/Product.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Product.pm,v  <--  Product.pm
new revision: 1.26; previous revision: 1.25
done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: approval+
Resolution: --- → FIXED
Summary: Implement Bugzilla::Product::preload() to speed things up with many products → Implement Bugzilla::Product::preload() to speed up query.cgi when there are many products
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: