Closed Bug 902857 Opened 11 years ago Closed 11 years ago

Figure out why Australis tpaint reflows take longer than m-c on XP

Categories

(Firefox :: General, defect)

All
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: MattN, Unassigned)

References

()

Details

(Keywords: perf, Whiteboard: [Australis:P5][Australis:M?])

Attachments

(3 files, 3 obsolete files)

According to the data from reflow observers (which include the start and end time of a reflow), Australis is spending ~33% more time in reflows on XP (despite the number of reflows being roughly the same). See the Reflows tab of the spreadsheet at https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Asj8iLTl0K0UdDcxemJEWVVXWTlKek1wR09iS2FsOXc&pli=1#gid=2

Jet/Markus, could we get help figuring out where the additional time is coming from (perhaps via xperf or some other profiler). The SPS profiler doesn't have enough precision for this with the minimum 1ms sampling interval. Your team would have a much better understanding of how to profile this.

We're blocked on landing Australis on m-c until we get our tpaint numbers closer in line. See the graph if you're interested: http://bit.ly/195a3rz

Removing most of our new CSS only seems to account for a portion of the difference: http://compare-talos.mattn.ca/index.html?oldRevs=9707657c07a4&newRev=4a6d87844bcf&submit=true

Thanks in advance
Flags: needinfo?(mstange)
Flags: needinfo?(bugs)
Just talked to :BenWa about this, and he thinks this ~33% regression is a pretty significant / surprising finding, but yeah - the Gecko Profiler's not going to be much help to us here.

He suggested we might want to pull :roc into this too.
And I should probably actually Cc :roc too while I'm at it.
Or, even better, needinfo.
Flags: needinfo?(roc)
Why isn't SPS helpful? It should work fine as long as you do enough reflows in the sampling interval.
Flags: needinfo?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> Why isn't SPS helpful? It should work fine as long as you do enough reflows
> in the sampling interval.

1) The sample interval has a minimum of 1ms on Windows while we're looking for regressions around that size. Doing enough reflows in the time period can make the profiles quite large.
2) Bug 900524 makes stacks incorrect on XP
3) The profiler extension itself adds quite a bit of noise to the profiles because of its UI. On the other hand, if I profile with env. variables on the command-line then I have to fight with symbolification until we have https://github.com/bgirard/Gecko-Profiler-Addon/issues/27
The main issue is that the Firefox front-end team doesn't have the knowledge to analyze layout code in the profiles and that's what we'd like help with.
Hey Neil,

Something like bug 783064 comment 0 but for tpaint (window open) would probably help here. We could do the same for m-c and UX and compare the two. Is this something you could help us with?

Thanks
Flags: needinfo?(enndeakin)
That log was generated by outputting stack traces at PresShell::FrameNeedsReflow and where reflows occur, the latter of which can now be checked via a reflow observer.
Flags: needinfo?(enndeakin)
Attached patch instrumentation patch (obsolete) — Splinter Review
Here's the beginning of a reflow profiler. This adds an environment variable called MOZ_REFLOW_PROFILE_FILE which you can set to a file path where a profile will be created on shutdown. This profile won't contain callstacks but HTML/XUL trees for every reflown document, and the weight of each HTML/XUL element node will be the time that this node's frames spent in nsBox::Layout. Here's an example profile: http://tests.themasta.com/cleopatra/?report=e67ac706f10f1c717933f2cd35e7682d77ba6d5c

Please test the resulting profiles with the cleopatra instance at http://tests.themasta.com/cleopatra/ which has some changes to make it work.

Timings are taken using TimeStamp::Now() which is as precise as it gets.

Tomorrow I'll look into whether nsBox::Layout is actually the method we want to profile, how to add more granularity for the different parts of reflow, how to hook this up to Talos, how to create comparison profiles, and what the actual differences caused by Australis are.
Flags: needinfo?(mstange)
This is awesome. Thank you so much, Markus!
Attached patch instrumentation patch v2 (obsolete) — Splinter Review
I've expanded it to cover more reflow-related methods. I'm also including restyle and some painting timings now, so it's not really a "reflow profiler" any more. Those who are only interested in Reflow timings can just use Cleopatra's filter box.

I've started try builds for mozilla-central and UX.
mozilla-central: https://tbpl.mozilla.org/?tree=Try&rev=108b29fe21c2
UX: https://tbpl.mozilla.org/?tree=Try&rev=d4dabda4a1ad
Attachment #792357 - Attachment is obsolete: true
Flags: needinfo?(bugs)
Two new try runs (I failed to include the Talos change in the previous ones):
m-c: https://tbpl.mozilla.org/?tree=Try&rev=6b44b37512ac
UX: https://tbpl.mozilla.org/?tree=Try&rev=6496798ada52
Thanks for this Markus.

(In reply to Markus Stange [:mstange] from comment #12)
> Two new try runs (I failed to include the Talos change in the previous ones):

These didn't work because of bug 899064. You will need to disable page thumbs like I did in my repo: http://hg.mozilla.org/users/mozilla_noorenberghe.ca/talos-sps-profiling/diff/4d91a753c7c4/talos/test.py
Oh, but they did work, because I'm not doing SPS profiles this time.

Here are the results - warning, they're big:
http://tests.themasta.com/reflow-profile-tpaint-winxp-nightly-6b44b37512ac.txt.zip
http://tests.themasta.com/reflow-profile-tpaint-winxp-ux-6496798ada52.txt.zip
http://tests.themasta.com/reflow-comparison-profile-nightly-6b44b37512ac-ux-6496798ada52.txt.zip

Look at them with http://tests.themasta.com/cleopatra/ and filter by "reflow" if you want to know where reflow spends its time.

Looking at the profiles, I don't really get much more than "yes, reflow takes longer, almost everywhere" out of it, but maybe you can find something. The toolbar part is a little hard to analyze because the customization stuff changed the tree structure.
(In reply to Markus Stange [:mstange] from comment #14)
> Oh, but they did work, because I'm not doing SPS profiles this time.
> 
> Here are the results - warning, they're big:
> http://tests.themasta.com/reflow-profile-tpaint-winxp-nightly-6b44b37512ac.
> txt.zip
> http://tests.themasta.com/reflow-profile-tpaint-winxp-ux-6496798ada52.txt.zip
> http://tests.themasta.com/reflow-comparison-profile-nightly-6b44b37512ac-ux-
> 6496798ada52.txt.zip
> 
> Look at them with http://tests.themasta.com/cleopatra/ and filter by
> "reflow" if you want to know where reflow spends its time.
> 
> Looking at the profiles, I don't really get much more than "yes, reflow
> takes longer, almost everywhere" out of it, but maybe you can find
> something. The toolbar part is a little hard to analyze because the
> customization stuff changed the tree structure.

First off, Markus, am I correct in thinking these profiles are from tpaint runs (i.e. 20 window opens, plus the initial open)? So an extra ~21ms seen translates to a ~1ms tpaint difference? And are these PGO or non-PGO builds?

So, the two main places I've noticed a strong difference already are the navbar and the tabstoolbar. As the navbar indeed changed structure quite a bit, I started with trying to figure out the tabstoolbar (from 115 to 135). This is more or less completely because of the single tab layout time, which went from 63.332 to 82.473. This is interesting because it probably also affects TART, and not just tpaint.

I noticed two things:
- we spend almost twice as much time calculating the tab label's text's width (GetStringWidth went from 12.883 to 20.784). For reference, this is for the label.tab-text.tab-label. I completely fail to understand why this would be the case. I realize the absolute difference will translate to something like 0.4ms on tpaint, but also where TART is considered at least understanding this would be quite good, I think. This increase completely offsets the perf gain from no longer having an icon (for this tab).
- overall, we spend 23 rather than 4ms on the tab background, which is much more complicated than it was before. I wonder if it'd pay off to somehow cache the result of what the background + foreground look like in an image and use that rather than the complicated combination of :before, :after, and the element itself, plus svg masks, for both the start and end of the tab (the middle still takes up a negligible amount of time).
More notes:

On m-c, the hbox#browser takes 36.4ms total. On ux, the deck#content-deck that we added takes 44.7ms. So we lose about 8ms in total by having that, which translates to 0.4ms in tpaint numbers, going on the assumption that these were indeed 20(+1) runs. Half of that is the extra work we have to do to deal with the <deck>, the other half is everything below it getting more difficult. The customization mode XUL doesn't seem to be involved at all, even though it's there - so that means inserting it dynamically likely doesn't get us anything (so I'll be wontfixing bug 900408 in a second). What will help getting us those 0.4ms back is removing the deck and doing customize mode "differently" - either in-content, or by some other way that doesn't involve changing the browser's main (initially used) XUL as much.

However, the cost/benefit ratio of doing that work now is high, and I think there are bigger wins elsewhere. In particular, on the entire window we're about 220ms out (so 10ms per run?) which means there ought to be lower-hanging fruit than this. Most of it (210ms) is in the toolbox. Concrete numbers:

navbar: 146.5ms
tabstoolbar: 29.5ms (see comment #15)
menubar: 21.2ms
add-on bar: 8.2ms

which leaves 6ms for all the bits and pieces of the node itself. When we tried removing a bunch of our CSS (lwt-related stuff, customization related stuff, panel related stuff), we seemed to gain about ~0.3ms on tpaint (I seem to have lost the bug number... Matt?), which may be partially responsible for the "small bits" of difference. I suspect that might be part of what's causing "everything to be a little bit slower". However, I hope that attacking more specific bits of the aforementioned toolbars (which regressed more heavily) will get us better cost/benefit.
(In reply to :Gijs Kruitbosch from comment #15)
> First off, Markus, am I correct in thinking these profiles are from tpaint
> runs (i.e. 20 window opens, plus the initial open)? So an extra ~21ms seen
> translates to a ~1ms tpaint difference? And are these PGO or non-PGO builds?

For the record, these profiles contain what happens during those parts of the tpaint run that contribute to the measured times, so only the 20 child windows, and not the initial parent window open. And they are from a non-PGO build.
I'll start PGO try builds now.
So from the profile, I'm fairly sure that on m-c, the contents of the nav-bar get lay-out once, and on UX, this happens twice. I even have a slight suspicion why: this may be caused by the overflow button, which shows up in the UX profile for 4.8ms overall. Its CSS only sets display: none in these circumstances:

toolbar[customizing][overflowable] > .overflow-button,
toolbar[overflowable]:not([overflowing]) > .overflow-button

Unfortunately, the overflowable attribute isn't set on the navbar initially (this happens in the XBL constructor), so the button gets built (including loading its icon and such) and then we hide it. This makes more space available in the navbar, thus triggering a reflow. If we change the CSS to always hide the overflow-button and show it when [overflowable][overflowing], hopefully we can fix this? I need to run out for a short while due to family obligations, but when I get back, if nobody beats me to it I'll do a try run to test this hypothesis. Meanwhile, needinfo'ing Jared to see if he thinks this is plausible and if I understood the overflowable toolbar stuff correctly  (it may be helpful to look at the profile yourself and/or see discussion on IRC).

(of course, there may also be other reasons that the navbar is dirty and needs to be layout again... but it's hard to know for sure)
Flags: needinfo?(jaws)
Yes, I think we can make the change suggested in comment #18. I'll make this change and file it as blocking this bug.
Flags: needinfo?(jaws)
(In reply to Jared Wein [:jaws] from comment #19)
> Yes, I think we can make the change suggested in comment #18. I'll make this
> change and file it as blocking this bug.

We dealt with this in bug 907787, which has now landed.
Depends on: 907787
Markus/Matt: one thing that confuses me is that these profiles suggest a 220ms total time discrepancy, which equates to an 11ms per tpaint run reflow time discrepancy. That doesn't match Matt's measurements, which suggested we do a grand total of ~3ms worse (see the spreadsheet at https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Asj8iLTl0K0UdDcxemJEWVVXWTlKek1wR09iS2FsOXc#gid=2 ).

Do we know where the difference comes from? Does the profile method measure more than just the times between the start and end of a reflow?
(In reply to :Gijs Kruitbosch from comment #21)
> That doesn't match Matt's measurements, which suggested we
> do a grand total of ~3ms worse (see the spreadsheet at

This should say *average of ~3ms worse* (per individual window open / tpaint run / thingummywhatsit). So grand total of 3 * 20 = 60, compared to 220, or 3 compared to 11. Regardless, that difference doesn't make sense to me.

Separately, I'd be interested to see a try run of current UX tip with the reflow timings from tpaint as Matt did before, and/or new profiles as Markus did. That'd tell us how much bug 907787 moved the needle, and/or how much room for improvement that leaves us...

Markus, if I wanted to do this tomorrow, is this just a question of pushing your patch to try on top of UX tip? How does one take the results out of the try rev?
(In reply to :Gijs Kruitbosch from comment #21)
> Do we know where the difference comes from?

Not really, this discrepancy confuses me, too. The instrumentation itself does have perf impact, but the work it does only happens at the end of each reflow, after the times have been recorded. But it still might throw other things off. 

(In reply to :Gijs Kruitbosch from comment #22)
> Markus, if I wanted to do this tomorrow, is this just a question of pushing
> your patch to try on top of UX tip?

Yes, plus https://hg.mozilla.org/try/rev/17dd53d4cf43 so that the reflow profile is collected and dumped to the log.

> How does one take the results out of the
> try rev?

Using changes that I'll push to https://github.com/mstange/analyze-tryserver-profiles tomorrow.
(In reply to Markus Stange [:mstange] from comment #23)
> (In reply to :Gijs Kruitbosch from comment #22)
> > Markus, if I wanted to do this tomorrow, is this just a question of pushing
> > your patch to try on top of UX tip?
> 
> Yes, plus https://hg.mozilla.org/try/rev/17dd53d4cf43 so that the reflow
> profile is collected and dumped to the log.

Pushed: https://tbpl.mozilla.org/?tree=Try&rev=773f18cfe9c9 on top of current UX tip ( https://hg.mozilla.org/projects/ux/rev/542ac829409f ).
(In reply to :Gijs Kruitbosch from comment #24)
> (In reply to Markus Stange [:mstange] from comment #23)
> > (In reply to :Gijs Kruitbosch from comment #22)
> > > Markus, if I wanted to do this tomorrow, is this just a question of pushing
> > > your patch to try on top of UX tip?
> > 
> > Yes, plus https://hg.mozilla.org/try/rev/17dd53d4cf43 so that the reflow
> > profile is collected and dumped to the log.
> 
> Pushed: https://tbpl.mozilla.org/?tree=Try&rev=773f18cfe9c9 on top of
> current UX tip ( https://hg.mozilla.org/projects/ux/rev/542ac829409f ).

This broke because everyone is removing #includes all over the place. (which is a good thing, don't get me wrong! :-))

I added an #include for nsAttrValue.h in nsBox.cpp, and that seems to fix it. I've attached the updated patch.
Attachment #792847 - Attachment is obsolete: true
Here's a profile from a PGO mozilla-central run.
This was the result of my UX tip run. It's still significantly worse than m-c, although better than the previous profile (not 100% sure how much of that is PGO and how much is the recent patch).

Jared had a good point on IRC that we should just set overflowable="true" on the nav-bar by default (or equivalent) in order to address potential double work caused by: http://hg.mozilla.org/projects/ux/file/006d1a392367/browser/base/content/browser.css#l154

Local testing says that defaulting the navbar to overflowable="true" helps quite a bit, so I'll file a bug on that in a second.
Depends on: 908326
Pre-emptive new try push w/ PGO on top of the merge after bug 908326 landed to see how reflows are doing: https://tbpl.mozilla.org/?tree=Try&rev=96358de2a46f
The updated profile shows a marked improvement compared to the previous one - but we're still doing significantly worse than m-c in reflow-land (the difference is now down to about 118ms in these profiles). 

Rough divvy-ing up those 118ms:
- 6ms before we get to the toolbox, 112ms in the toolbox
- nav-bar: 43ms worse (18% increase of time compared to m-c)
- tabstoolbar: 34ms worse (29% increase of time compared to m-c - see comment #15 for my guesses as to why)
- menubar: 20ms worse (30% increase of time compared to m-c)
- add-on bar: 8ms worse (very large increase of time compared to m-c, where it was negligible, because turned off by default)
- layout of self: 2ms worse (25% increase of time compared to m-c)
- min/max/prefsize of self: 2.5ms worse (25% increase of time compared to m-c)

I'm not sure we can really get back an actual ~5ms (which these absolute ms numbers imply we're still losing on tpaint) because the test does seem to have overhead (see previous comments). However, the relative increase in workload here (ie the percentages) makes me think we could make another dent here if we looked closely.
Attachment #794157 - Attachment is obsolete: true
(In reply to :Gijs Kruitbosch from comment #30)
> Created attachment 794572 [details]
> UX-merged-profile-tpaint-winxp-96358de2a46f.txt.zip

So the url-bar costs go something like this:

- 17.6ms: bookmarks-menu-button (you add new chrome, it costs something. Surprising: the dropmarker is more expensive than the icon itself, and within the dropmarker, the :before on it is almost as expensive as the icon itself... what's that :before doing there?)

- 13.4ms: difference between the "self" stuff for the navbar on m-c compared to the "self" stuff for both the navbar and the customization target on UX. I expect the overflow styling is responsible for some of this (MinSize call costs went up considerably, as did layout) and the extra nesting for another.
- 7.2ms: PanelUI-button
- 7.8ms: urlbar-container + unified back/fwd button compared to our URL bar container. Not 100% sure where exactly that went, but one thing I did notice is that the identity box by default doesn't have an identity class, and then switches to unknownIdentity for the talos test. I wonder if it should be set to unknownIdentity by default (but this is unlikely to gain us anything of significance on the actual talos test).

- -1ms: losing the reload and stop buttons in the URL bar
- -2ms: search container seems faster (!?)

The tabs stuff is explained in comment #15, so I looked at the menubar next. 

While the placeholder itself (for rendering in the titlebar) takes a slight bit of blame there, we do just generally spend a lot longer in every single bit of it. Again, I suspect we're probably invalidating this bit of the tree more often than m-c is. At first I thought this could be the placeholder, but if I understand bug 901786 correctly, this should no longer be changing size. So I expect this might be triggered by our tweaking the menubar's bottom padding for the tabsintitlebar code. After landing bug 885062, I'll look at if we can add styling for some of this so that the net result for our calculations matches builtin CSS, and that might nudge this a bit more still. OTOH, this would gain us on the order of 1ms according to the profile, which based on experience I'd say is probably something like 0.4ms on the actual talos tests run without the reflow profiling... so not sure that'll be worth it, but I'll have a look anyway.
Whiteboard: [Australis:P?][Australis:M?]
Whiteboard: [Australis:P?][Australis:M?] → [Australis:P5][Australis:M?]
From https://wiki.mozilla.org/Firefox/Australis/Expected_talos_changes#tpaint

"tpaint Windows XP (+2.0% Non PGO, +2.0% PGO)
The performance team has expressed that they are not concerned about this regression. (See bug 889758 and dependent bugs for the investigation)"
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: