Closed Bug 833518 Opened 11 years ago Closed 11 years ago

New HTMLElement bindings caused ~20MB regression in peak memory consumption on areweslimyet

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox19 --- unaffected
firefox20 + fixed
firefox21 + fixed

People

(Reporter: johns, Assigned: peterv)

References

Details

(Keywords: regression, Whiteboard: [MemShrink:P2])

Attachments

(3 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #820602 +++

https://areweslimyet.com/

When bug 818219 landed AWSY shows a 20MiB regression in the "RSS: After TP5 [+30s, forced GC]" (Dark green) line, which measures memory usage with 30 TP5 sites open after waiting 30 seconds and forcing a GC.

More specifically, bug 818219 landed as 79fc9d732bf7, which regressed by 18MiB, then was backed out as a675dff7f8c4 which returned to previous levels, before regressing again in the e42b1de74b4d .. 8a0c82cf76ab range, which included this landing.
Hmm.  Do we have any good data on where the memory is used?

I would have expected some extra memory usage here because of all the getters/setters on HTMLElement now getting JSFunctions instead of being JSPropertyOps.

I guess we're setting this up on every single HTML element prototype, so we're looking at something like (100+ functions) * (however many prototypes) * (however many pages)...  If so, then finishing the conversion of HTML element classes to WebIDL will naturally help here.  Have we observed drops when such conversions have landed, by any chance?

(Note: it's a lot of functions because of all the on* properties.)
There haven't been any significant decreases in the past months :(

The last significant drop in usage was in the range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58c8080a1a7c&tochange=8586bd350875

If you zoom in to the Dec 7th range on AWSY and click on the points just before/after the jump you can view the full about:memory information of each test run. The explicit data is much less noisy than resident.

Looking at f23cff8ec6c9 vs 79fc9d732bf7:
- explicit/window-objects regressed by 14MiB, and explicit/js-non-window by ~4MiB
- The myspace.com compartment went from ~12MiB to ~14MiB
  - gc-heap/objects/function up ~700KiB,
  - unused-gc-things up ~500KiB
  - shapes/base by ~300KiB
  - shapes/dict by ~150KiB
It looks like bug 812333 also caused a smaller ~7MiB regression in explicit.

Prior to this 20MiB spike, we climbed from 262MiB to 282MiB from Oct 27th to Dec 7th, largely in small increments.
> If so, then finishing the conversion of HTML
> element classes to WebIDL will naturally help here.

Why is that?

This is a bad regression, and it's on Aurora now.  khuey says we could turn this off for the moment.  Do we have other, longer-term options?
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to Nicholas Nethercote [:njn] from comment #4)
> > If so, then finishing the conversion of HTML
> > element classes to WebIDL will naturally help here.
> 
> Why is that?

Because we install the JSFunctions on the leaf prototypes, but only for those classes that aren't converted yet. The new bindings follow WebIDL and inherit them from HTMLElement.prototype. That is, for XPConnect HTML elements, we have

Element.prototype -> Node.prototype -> EventTarget.prototype (each with its own members)
^
|
HTMLElement.prototype (also with its own members)
^
|
(Leaf class).prototype (with all members from the leaf class, *and* the members from all classes along the prototype chain)

whereas for WebIDL binding HTML elements, we have


Element.prototype -> Node.prototype -> EventTarget.prototype (each with its own members)
^
|
HTMLElement.prototype (also with its own members)
^
|
(Leaf class).prototype (with *just* the members from the leaf class)
(In reply to Nicholas Nethercote [:njn] from comment #4)
> This is a bad regression, and it's on Aurora now.  khuey says we could turn
> this off for the moment.  Do we have other, longer-term options?

I don't think we should turn this off, I think we should try filtering out all the on* properties first. As said the longer-term option is converting to the new bindings.
I'm trying out a patch that reduces the number of properties by filtering the on* properties. Though I have to say it's a bit baffling how hard it is to be able to reproduce these test results, given the importance we seem to give to this test.
> There haven't been any significant decreases in the past months :(

Well, it depends on how you define "significant".

There is a definite drop of 6+MB on this pushlog, afaict: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6172df451c1c&tochange=6503c6bee6f8 which is what I'd expect from the changes there.  It's just not clear whether that drop is noise or due to the changes....

> Do we have other, longer-term options?

The primary long-term option is finishing the conversion of HTML elements to WebIDL.  This is a highly parallelizable and mechanical task with two exceptions (embed/object and form), and I welcome people helping out.  ;)  Seriously, though, long-term this should give us what we want.  To the extent that it doesn't, we'll need to fix the JS engine to use less memory for these functions...

Peter, for comment 6 the idea is to try not installing quickstubs for on*?  We could certainly try that; it should be a pretty small behavior change, I think...
> Though I have to say it's a bit baffling how hard it is to be able to reproduce these test results, 
> given the importance we seem to give to this test.

I imagine that if you send johns a patch, he can run it through AWSY on his machine, which is probably much easier than trying to set it up locally.
You can also look at the individual reports on AWSY and probably dig around in the about:memory report for each to figure out which pages are hit the worst, then get the TP5 set.  For this particular problem that does not involve anything transient, I imagine "load a page, click minimize memory usage, wait a few seconds" should show the problem.
(The TP5 pageset is available by asking on #releng.)
(In reply to Peter Van der Beken [:peterv] from comment #7)
> Though I have to say it's a bit baffling how hard it is
> to be able to reproduce these test results, given the importance we seem to
> give to this test.

Adding the ability to the AWSY control panel to automatically test try builds is on my todo list, in the meantime I can run the test on any patches pretty easily.

For running the test locally, there's notes at bug 820602 comment 22. It requires setting up the TP5 pageset (which cannot be distributed) and issuing one command, so I'm not sure it's unreasonably difficult compared to our other test suites.
> I'm trying out a patch that reduces the number of properties by filtering
> the on* properties.

Great!  I've assigned this bug to you accordingly.  If the filtering doesn't work, I guess we can turn this off on Aurora, and then hopefully do enough WebIDL conversion during the rest of this cycle to fix Nightly?

> Though I have to say it's a bit baffling how hard it is
> to be able to reproduce these test results, given the importance we seem to
> give to this test.

Good memory consumption tests are really hard to write;  that's life.  John's done a great job getting AWSY as useful and usable as it is.  Just post the patch when you're done and he'll be able to do a test run.  Thanks!
Assignee: nobody → peterv
In the long term -- nowhere close to short term -- one possibility for dealing with the webidl function-size memory increase is to institutionalize (in the disputably good way) the quickstubs __lookupGetter__/__lookupSetter__ hack, so that the relevant properties include a JSNative initially, then on Object.getOwnProperty inspection they get mutated to contain an actual function.

This is a substantial amount of work of considerable complexity, in terms of implementation, in terms of how it affects other changes we want to make (specifically, making our MOP be like the ES6 MOP -- it makes using Property Descriptor in the MOP harder, because you have to be able to rewrite the Shape sometimes, but not always), and in terms of not affecting perf.  I did some (but not all) of this work in bug 560072, back in the day, so I *know* it's complex and tricky and difficult and error-prone.  But it's possible it's what we'll have to do.  :-\
We can track this for release if the proposal is to backout bug 818219 from branches and work on a proper fix as per comment 14, but if a backout isn't an option this doesn't seem fit to be a release-blocking tracked bug since it looks like a much more involved effort that will span several versions.
> I guess we can turn this off on Aurora

I would be strongly opposed to that unless this is a critical stop-ship problem.  Doing that will make methods with the same name have different behavior on different HTML elements, which is ... highly suboptimal.  In my opinion.
We were giving gps a hard time for a 2.5mb memory regression with FHR; it's only fair to give you 8x more of a hard time.  :)  (Actually, this is probably worse than 8x, because it scales with additional memory usage; it's not a fixed cost.)

Do we have any options here to reduce the size of this memory regression other than backing the patch out?
Um... read the bug?  ;)  Seriously, see comment 7.  Peter is working on it.
(In reply to Boris Zbarsky (:bz) from comment #18)
> Um... read the bug?  ;)  Seriously, see comment 7.  Peter is working on it.

Sorry, I thought that wasn't targeted at Aurora.  Is it?
It would be Aurora-only, actually.  On m-c we're just going to convert more elements and see how things look.

Note that since we're converting them a few at a time you won't see a big drop, but rather a bunch of small decreases.  Assuming our analysis of the situation is correct.

I suppose at the end we can test things by turning off the quickstub bits in a test build and seeing how much memory that wins at that point, if any.
With this MaxMemoryResidentForceGCV2 goes from 405344256 to 393580544.
OK, so 12MB?  And we won another 6MB or so, afaict, when we converted the dromaeo-used elements.  So that sounds like most of the difference right there.
Comment on attachment 706405 [details] [diff] [review]
Don't install new bindings for on* properties

I think we should do this on aurora, and continue converting to new bindings on trunk.
Attachment #706405 - Flags: review?(bzbarsky)
Comment on attachment 706405 [details] [diff] [review]
Don't install new bindings for on* properties

>+        JSBool ok;

This is never really used.  Just get rid of it, please.  So you can make the for loop look like this:

  for (JSPropertySpec* ps = props->specs; px->name; ++ps)

r=me with that.
Attachment #706405 - Flags: review?(bzbarsky) → review+
Attachment #706405 - Attachment is obsolete: true
Attachment #706533 - Flags: review+
Comment on attachment 706533 [details] [diff] [review]
Don't install new bindings for on* properties

We'll need a JS peer to sign off on the new friend api. We only want to add this to aurora though.
Attachment #706533 - Flags: review?(n.nethercote)
Comment on attachment 706533 [details] [diff] [review]
Don't install new bindings for on* properties

Review of attachment 706533 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing this, peterv!

::: js/src/jsapi.cpp
@@ +4021,5 @@
>  }
>  
> +namespace js {
> +
> +JS_FRIEND_API(bool)

Can you add a comment about the temporariness of this?  E.g. "This function was added on the Aurora branch as a short-term fix for bug 833518.  It's not present on trunk."
Attachment #706533 - Flags: review?(n.nethercote) → review+
(In reply to Boris Zbarsky (:bz) from comment #22)
> OK, so 12MB?  And we won another 6MB or so, afaict, when we converted the
> dromaeo-used elements.

When/where did that happen?  Is it on Aurora?
> When/where did that happen?

See comment 8.  The relevant bugs are bug 824823 and bug 824907

>  Is it on Aurora?

Yes.
johns, once peterv's patch has landed, is it possible to do an AWSY run of Aurora?  It'd be good to confirm the improvement on the actual branch.
Comment on attachment 706533 [details] [diff] [review]
Don't install new bindings for on* properties

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 818219
User impact if declined: higher memory usage
Testing completed (on m-c, etc.): pushed to try, was green. We're not planning on landing this on trunk because we want to fix it in a different way there.
Risk to taking this patch (and alternatives if risky): there's a small risk for web compat issues because we're switching some properties on some elements back to using XPConnect, which has small differences in behaviour. Because we've limited it to on* properties the risk should be pretty low.
String or UUID changes made by this patch: none, but adds a JS friend API that should only exist during one cycle
Attachment #706533 - Flags: approval-mozilla-aurora?
Comment on attachment 706533 [details] [diff] [review]
Don't install new bindings for on* properties

Approving for Aurora, understood that it will only land there and a different fix will be forthcoming for trunk.  Let's land this now so we can get some bake time before merge to Beta.
Attachment #706533 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attached image AWSY on Feb 6
The attached AWSY screenshot shows that we're at about the same place we were between Dec 7--17, after this regression happened.  I.e. no net improvement on Nightly.  (Peter landed an improvement on Aurora, of course).

Are there many more conversions to happen between now and the end of this dev cycle (Feb 18)?  I wonder if we'll need to land Peter's fix on the next Aurora as well.
> Are there many more conversions to happen between now and the end of this dev cycle

Yes.
A bunch more elements were converted. It does seem like we've slowly improved, I don't think we're completely back where we were but we're getting close. Personally I don't think we need to take this on the next aurora, but let us know if you disagree.
Whiteboard: [MemShrink:P1] → [MemShrink]
Attached image AWSY on Feb 19
It is improving.  E.g. in the attachment, the blue line was about 250 MiB before the regression (the first jump on the left) and it's been creeping on the right side, and is now at about 260 MiB.
Whiteboard: [MemShrink] → [MemShrink:P2]
njn said he'd do a AWSY run to figure out how much the Aurora patch buys us now.

Peter, is there any particular reason not to apply this patch?
Well, apart from the reason in comment 16?  Though for the on* properties the behavior differences are pretty subtle, I'd guess.
(In reply to Peter Van der Beken [:peterv] from comment #31)
> Comment on attachment 706533 [details] [diff] [review]
> Don't install new bindings for on* properties
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): bug 818219
> User impact if declined: higher memory usage
> Testing completed (on m-c, etc.): pushed to try, was green. We're not
> planning on landing this on trunk because we want to fix it in a different
> way there.

My understanding here is we were going to land something different on aurora.What's the status of the patch Or are we planning on landing the same patch we landed for Fx20 ? We are less than a week away from the merge and will be helpful to have the next steps here.
I think we're close enough to where we were originally -- within about 4 MiB on AWSY -- that we don't need to do anything on Aurora.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I'm going to declare this fixed on 21, then.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: