Open Bug 1515799 Opened 5 years ago Updated 9 months ago

[meta] reduce impact/overhead of custom element use on startup

Categories

(Firefox :: General, enhancement, P3)

enhancement

Tracking

()

Performance Impact medium

People

(Reporter: Gijs, Unassigned)

References

(Depends on 5 open bugs)

Details

(Keywords: meta, perf:frontend, perf:startup)

Attachments

(1 obsolete file)

I just filed bug 1515794, bug 1515795, bug 1515796. They're all basically about UA widgets whose connectedCallbacks show up in startup profiles. This is because onDOMContentLoaded, any DOM that's in the document has their UA widgets connected.

For XBL, we avoid attaching bindings until the node gets a frame, so hidden elements (or elements with hidden ancestors) do not get bindings attached.

The change to UA widgets makes connection/construction of UA widgets more understandable/predictable and eliminates subtle bugs. However, it also introduces a performance penalty where such elements are currently hidden.

(As an aside, we're also paying costs for XBL that go away with the move to UA widgets, because it seems getting any DOM element in JS trips figuring out if it has a binding attached and instantiating it if necessary, which means checking its style, which is also showing up in profiles...)

While we can obviously fix the ones that I'm seeing in startup profiles and reported as bugs, I think we need to come up with a structural way to avoid slowly regressing startup times due to the shift to UA widgets. The 1-2ms that it takes to run some of this stuff is likely not going to impact ts_paint/tpaint numbers (and similar) more than the noise levels or the 1-2% tolerance, so we're unlikely to get performance alerts for this.

I wonder if, somewhat like the static constructor counter, we could have some way of counting how many connectedCallbacks fire, and ensuring on a per-migration basis that these numbers match our expectations. In particular, for all of the bugs I just filed, that number should probably just be 0.

I'm also wondering if this means that for more ubiquitous bindings like <label> or <button>, we should more strongly consider moving to either plain HTML or native implementations than we might otherwise have done. The perf impact seems non-negligible when multiplied by the number of affected nodes.
Summary: Evaluate how we can avoid slowly regressing startup by doing more construction work for UA widgets than we used to do for XBL → Evaluate how we can avoid slowly regressing startup time/performance by doing more construction work for UA widgets than we used to do for XBL
<tree> is no an UA Widget. I don't think we construct any during startup. There is/was an <audio> somewhere (notification?) but without |controls| it won't trigger anything in JS.

I looked at the profile and search for UAWidgetsChild in content precesses, and I couldn't find any...
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #1)
> <tree> is no an UA Widget.

Ugh, I got my terminology confused. I mean custom elements.

> I don't think we construct any during startup.

We do, the one in the 'add bookmark' panel, from the folder tree.

> I looked at the profile and search for UAWidgetsChild in content precesses,
> and I couldn't find any...

Search for connectedCallback.
Summary: Evaluate how we can avoid slowly regressing startup time/performance by doing more construction work for UA widgets than we used to do for XBL → Evaluate how we can avoid slowly regressing startup time/performance by doing more construction work for custom elements than we used to do for XBL
(In reply to :Gijs (he/him) from comment #2)
> Ugh, I got my terminology confused. I mean custom elements.
> 
> > I don't think we construct any during startup.
> 
> We do, the one in the 'add bookmark' panel, from the folder tree.
> 
> > I looked at the profile and search for UAWidgetsChild in content precesses,
> > and I couldn't find any...
> 
> Search for connectedCallback.

I was asking about UA Widgets. Let me know if you find any UA Widgets construction on the start-up patch. UAWidgetsChild (name of the class) should be the right keyword.
Brian, any ideas about what we can do here, or (not quite the same thing) what kind of structural solution we could use for bug 1515794, bug 1515795, bug 1515796 ? I can write impromptu code for them all but that seems somewhat suboptimal.
Flags: needinfo?(bgrinstead)
(In reply to :Gijs (he/him) from comment #4)
> Brian, any ideas about what we can do here, or (not quite the same thing)
> what kind of structural solution we could use for bug 1515794, bug 1515795,
> bug 1515796 ? I can write impromptu code for them all but that seems
> somewhat suboptimal.

This would be quite helpful, thanks for bringing it up. Lots of decisions to make here, so hopefully whatever we do can be expanded on to support other types of perf data that may not get surfaced in talos.

I could definitely imagine wrapping all MozElement connectedCallbacks (either opt-in or automatically), and then keeping a count (and even a running timer for each element class) in a JSM-scoped Map with something like <tagName, [numCalls, totalMs]>. Or in the Performance object on the hidden window. Or maybe keep it scoped per global and save each URL separately? If we wanted to do this, we could have a shared function that takes in the class and replaces the connectedCallback function with a wrapped version of it and then instead of `customElements.define("my-element", MyElement)` you could do `customElements.define("my-element", MozElements.instrumentClass(MyElement))`.

Then we could report that data out... somewhere. I imagine the primary workflow we want here is running a local build and getting a report that can be discussed in the bug (i.e. run with and without the patch and do a comparison). That would help both for new bindings and also for making changes to existing ones to confirm it's helping.

Some ideas on when this could report: after startup is fully complete, during shutdown, after a mochitest, or in talos (via perf-html profiles?). I don't think it'd be practical to make a test where you can't land anything that adds to that total time, since sometimes the corresponding XBL binding would be eagerly created as well. Something that makes the data easy to visualize and discuss in bug comments and reviews is I think what we should be aiming form.

I remember Zibi mentioning that the raptor tool for Firefox OS (which would generate output like https://bugzilla.mozilla.org/show_bug.cgi?id=1203108#c4) as being very helpful for this type of thing. He suggesting bringing something like it to the startup path for desktop as well. That's a bigger change since it would require refactoring and instrumenting the startup path, so maybe this Custom Element initialization dataset could be a candidate to make some of the decisions above and get tooling in place for future measurements.
Flags: needinfo?(bgrinstead)
Priority: -- → P3
Blocks: 1520350

From IRC:

22:47:41 <florian> looking at Mike's profile... How do we feel about all these custom element JS files being loaded eagerly? https://perfht.ml/2UoBppG
22:48:53 <
florian> that's taking longer to load than the big browser.js file

I'd been hoping to be able to look at this stuff myself soon, but I keep being distracted by other stuff. It'd be useful if someone could start looking at this.

The template solution from bug 1520350 has caused quite a few regressions so far, but we probably need to consider something vaguely like that (though for some of the bindings that I'm seeing showing up when opening browser.xul:

0:11.30 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/customElements.js
0:11.30 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/general.js
0:11.30 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/checkbox.js
0:11.30 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/menu.js
0:11.30 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/notificationbox.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/popupnotification.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/radio.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/richlistbox.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/autocomplete-popup.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/autocomplete-richlistitem.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/textbox.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/tabbox.js
0:11.31 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/tree.js
0:11.32 GECKO(92563) cache miss: jssubloader/global/resource/gre/chrome/toolkit/content/global/elements/wizard.js

we should probably instead be wondering why we're even loading those files... )

Blocks: 1522877

(In reply to :Gijs (he/him) from comment #6)

we should probably instead be wondering why we're even loading those files... )

We added the chrome-only setElementCreationCallback onto CustomElementRegistry as a way to avoid eager loading when it's not necessary. In general I've tried to use that instead of eager subscript loading if the defined CE aren't in browser.xul at startup. But since we have no automated test to prevent it, I'm guessing some have slipped through the cracks here (wizard, tree, and maybe some of the autocomplete stuff seem like they could be candidates here). There are a couple of downsides to lazifying, primarily that extended classes (like places-tree, which extends tree) or callers (like we do for findbar) need to handle the "non registered" case instead of being able to assume they are there. If we will see real startup time improvements though, it's probably worth that trade-off.

A test that iterated defined CEs after a browser window opens and checks to make sure at least one has been constructed may be a good start here. I'm digging out from being away at the moment so likely won't have time to work on it this week, but I'll think about it some more and bring it up in our weekly meeting on Wednesday.

Whiteboard: [fxperf]

Giving this fxperf:p2 since it's startup-related, and the earlier we can get proactive solutions in place the less reactive work we'll have to do later.

Whiteboard: [fxperf] → [fxperf:p2]

This isn't intended to land, just a demonstration of a way we could get more insight into how much
time we are spending inside of individual classes.

Depends on: 1541516

Comment on attachment 9053702 [details]
Bug 1515799 - POC - Instrument base custom element class and print call information in a table at startup

Revision D24953 was moved to bug 1541516. Setting attachment 9053702 [details] to obsolete.

Attachment #9053702 - Attachment is obsolete: true
No longer blocks: 1522877

Hm, the instrumentation in bug 1541516 seems to suggest there's a lot more than I thought - though perhaps some of it is new since I filed this bug. Besides the things already on file, I'm also seeing more text, dropdown, richlistbox and menulist widgets created than I would have expected... I'm also seeing stats printed for the hidden window's XUL, which I really wasn't expecting because in principle I'd expect there to be basically no actual XUL in there at all...

Brian, does the browser architecture group have cycles to look at this some more? ("Unfortunately no" is understandable, but I figured I'd ask.)

Flags: needinfo?(bgrinstead)

(In reply to :Gijs (he/him) from comment #11)

Hm, the instrumentation in bug 1541516 seems to suggest there's a lot more than I thought - though perhaps some of it is new since I filed this bug. Besides the things already on file, I'm also seeing more text, dropdown, richlistbox and menulist widgets created than I would have expected...

By the way, the instrumentation counts aren't very smart right now. It doesn't show self time and will double count calls like if connectedCallback() -> render() then the time spent in render also gets double counted in the totals (as part of both functions).

I'm also seeing stats printed for the hidden window's XUL, which I really wasn't expecting because in principle I'd expect there to be basically no actual XUL in there at all...

The hidden window does have XUL on OSX: https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/browser/base/content/hiddenWindow.xul#18. I have a condition in the upcoming <menu> CE conversion to explicitly not fire connectedCallback logic for that - perhaps we should make that more general.

Brian, does the browser architecture group have cycles to look at this some more? ("Unfortunately no" is understandable, but I figured I'd ask.)

I've spent some time looking at optimizing parseXULToFragment (like, try using <html:template>, automatically cache the fragment and clone for reuse, etc). Nothing was really showing much win there unfortunately. I've talked with Brendan about investigating it more generally (probably using the <label> patch in https://bugzilla.mozilla.org/show_bug.cgi?id=1448213 for reference since there are a ton of them in tree).

Flags: needinfo?(bgrinstead)

(In reply to Brian Grinstead [:bgrins] from comment #12)

(In reply to :Gijs (he/him) from comment #11)

Hm, the instrumentation in bug 1541516 seems to suggest there's a lot more than I thought - though perhaps some of it is new since I filed this bug. Besides the things already on file, I'm also seeing more text, dropdown, richlistbox and menulist widgets created than I would have expected...

By the way, the instrumentation counts aren't very smart right now. It doesn't show self time and will double count calls like if connectedCallback() -> render() then the time spent in render also gets double counted in the totals (as part of both functions).

Also I wanted to note that one concern I had with landing the probes in bug 1541516 was that it's the only thing that we are instrumenting in this way, so it could lead to a focus on Custom Element performance that doesn't exist for other code (and specifically the XBL code that we are migrating away from).

So if we have a <foo> element that's currently XBL I was concerned the instrumentation would lead to pushback against landing the MozFoo Custom Element, because we now see time showing up in the instrumentation that wasn't there before. And there are obviously cases where this is new time that we want to avoid as much as possible (like if there are lots of hidden <foo> elements with expensive <constructor>s that didn't used to run), but that isn't always the case. Sometimes we are just moving time away from XBL <foo> to Custom Element <foo>.

In the end I wanted it to land it because I think it'll be valuable to see this information and it's a tool that will help us do a better job with performance, but I don't think it makes sense as a metric that's tracked in isolation. If we do want to start tracking it as a performance metric then I think it needs a thorough audit and some improvements (to account for stuff like self-time I mention above, potentially integration with the profiler, etc).

I don't think we're at a point where it's usable to track as a perf metric where we check for regressions, no.

I was more surprised by some of the data it returned, like seeing 2 MozAutocompleteRichlistboxPopup, 3 RichListBox, 16 MozPopupNotification (though I think we're avoiding doing much there ie I'm not seeing connectedCallback, though I'm seeing attribute inheritance, where I wonder if we should be doing that before doing any other work) and 34 MozDropmarker (when on a clean profile, I don't think any dropmarkers are visible in the browser window at launch), 5 MozMenuList (again, I think 0 visible), and a bunch of MozXULMenu elements despite being on mac (so nothing's visible in the browser window. There's a few other things that look odd, too. It just seems worth poking at...

(In reply to :Gijs (he/him) from comment #14)

I don't think we're at a point where it's usable to track as a perf metric where we check for regressions, no.

I was more surprised by some of the data it returned, like seeing 2 MozAutocompleteRichlistboxPopup, 3 RichListBox, 16 MozPopupNotification (though I think we're avoiding doing much there ie I'm not seeing connectedCallback, though I'm seeing attribute inheritance, where I wonder if we should be doing that before doing any other work) and 34 MozDropmarker (when on a clean profile, I don't think any dropmarkers are visible in the browser window at launch), 5 MozMenuList (again, I think 0 visible), and a bunch of MozXULMenu elements despite being on mac (so nothing's visible in the browser window. There's a few other things that look odd, too. It just seems worth poking at...

Yeah worth looking into anything that looks out of the ordinary, for sure.

  • One of the menulists is https://searchfox.org/mozilla-central/rev/1b2636e8517aa48422ed516affe4d28cb7fa220a/browser/base/content/browser.xul#248-260. Perhaps we could <template> that or build it from JS and only initialize it when a <select> is actually opened. Also, menulist inherits from MozXULMenu so it's likely the data for that is really coming from a menulist.
  • For dropmarkers, in general just attaching a pretty empty CE should be very fast. I did some micro benchmark on this a while ago but it might be worth double checking and asking DOM folks if we are seeing a slowdown here vs a plain element. I do see it has a connectedCallback but it does very little. If there are any in hidden popups etc we could at least delay rendering until popupshowing.
  • MozAutocompleteRichlistboxPopup, RichListBox, MozPopupNotification may all be candidates for lazification. I guess we'll need some kind of guidelines and best practices to decide if/how to do it (https://bugzilla.mozilla.org/show_bug.cgi?id=1544027 might be the place to start figuring that out).
Keywords: meta
Summary: Evaluate how we can avoid slowly regressing startup time/performance by doing more construction work for custom elements than we used to do for XBL → [meta] reduce impact/overhead of custom element use on startup
Depends on: 1624657
Depends on: 1534371
Depends on: 1624683
Depends on: 1639925
No longer blocks: 1515795
Depends on: 1515795

There's still some room for further improvement here, e.g. https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=5d0c71e9afe0d6684a311731f7978b051350c94a&newProject=try&newRevision=f75302ba5da18f6b0c1943039f0c9d44a3bdfe4b&framework=1

This trypush:

  1. removes the download panel overlay (which has button elements which have non-trivial overhead, as well as a richlistbox and a number of popups that come with arrowscrollboxes which all also have overhead)
  2. removes the 4 menulist instances that we create for each window (1 for the ContentSelectDropdown, 3 in webrtc popupnotifications)
  3. removes the checkbox instance we create in each window (in the password popup notification)

I'm seeing 10-20ms drops in timings for window opening tests, which amounts to anywhere between 1.2 and 9% improvements on sessionstore, ts_paint, and twinopen.

The simplest way to reduce this overhead for this naive trypush was to disable the bindings and/or use templates / XML comments to avoid these nodes being in the DOM. To fix this "for real", we could either create/insert some of this markup dynamically, or we could try and investigate why these particular custom elements have such significant overhead, and try to address that. The former is easier, but also makes it easier to regress some of this again in future. The latter is harder, and it may not be possible to fix some of the overhead from the bindings without breaking legitimate uses of the binding.

Besides what's in the trypush, there's also more that could be done around the overhead created by toolbarbuttons - there are some 40-odd ones left in a vanilla browser window that take some 10ms to be built/rendered on my local (fast, developer) machine. These 40 items include 7 toolbarbuttons in the sidebar (hidden by default), and 11 minimize/maximize/restore/close buttons (of which only 3 are visible to start with, with most never becoming visible - we have button items in each toolbar, for use with fullscreen and/or various forms of tabsintitlebar with the menubar visible / invisible). Then there are some other hidden items, like the ion and whatsnew buttons, which are not usually displayed on most startups (including first startup).

We also create labels for each of those 40-odd toolbar buttons, when only the bookmarks and import button have visible labels, and despite most of the toolbarbutton's binding time being spent on the labeling, inheriting of attributes, and dealing with potential other child nodes - so it may be worth splitting the binding so that we use a separate one for buttons that just have an icon, and for which we never show the label.

Depends on: 1686254
Depends on: 1686264
Depends on: 1686266
Performance Impact: --- → P2
Whiteboard: [fxperf:p2]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: