Closed Bug 1456728 Opened 3 years ago Closed 3 years ago

6.89 - 8.12% cpstartup content-process-startup (linux64) regression on push d3b5dfca6a90 (Tue Apr 24 2018)

Categories

(Core :: XUL, defect)

Unspecified
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 + fixed

People

(Reporter: igoldan, Assigned: timdream)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=2a760b1c0759ef67bb3fd7892a8cbac787f0a899&tochange=d3b5dfca6a90a05875837d846c3b299690b6e10b

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  cpstartup content-process-startup linux64 opt e10s stylo     196.50 -> 212.45
  7%  cpstartup content-process-startup linux64 pgo e10s stylo     190.00 -> 203.08


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12892

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
:timdream It looks like bug 1445214 caused a performance regression on Linux platforms. Is this something we can fix or should we consider accepting it or backing it out?
Flags: needinfo?(timdream)
Component: Untriaged → XUL
Product: Firefox → Core
I wouldn't have expected too much of a change from that patch since we are still building the same DOM. Interesting that it's only on Linux though.
Is there zero impact on this test on other platforms, or is it just a smaller regression that didn't get alerted?
We are building the same DOM but on a different timing, perhaps multiple times during start-up if bindings get load later.

I'll look into this.
Assignee: nobody → timdream
Status: NEW → ASSIGNED
Flags: needinfo?(timdream)
(In reply to Brian Grinstead [:bgrins] from comment #4)
> Is there zero impact on this test on other platforms, or is it just a
> smaller regression that didn't get alerted?

https://treeherder.mozilla.org/perf.html#/compare?originalProject=autoland&originalRevision=2a760b1c0759ef67bb3fd7892a8cbac787f0a899&newProject=autoland&newRevision=d3b5dfca6a90a05875837d846c3b299690b6e10b&framework=1

It varies... I am re-triggering tests to find out with more confidence.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #6)
> (In reply to Brian Grinstead [:bgrins] from comment #4)
> > Is there zero impact on this test on other platforms, or is it just a
> > smaller regression that didn't get alerted?

Actually, for this particular test, we don't seem to trigger it on all platforms. I have trouble with my account so I can't trigger it. Will have to wait for my account fixed.
OK. I don't know what happened to treeherder, but I can reproduce the performance regression on macOS.

Let me try to fix this before the merge, backout if we couldn't find out a fix before that.
What I trying to do right now is to locate the call site that makes the things run slower. Surprisingly, I can make it slow not by implementing nothing in nsScrollbarFrame and just remove the -moz-binding in CSS.

I don't find any reference to scrollbar that is relevant.

The next thing I will do is to look at the profile produced locally.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #9)
> What I trying to do right now is to locate the call site that makes the
> things run slower. Surprisingly, I can make it slow not by implementing
> nothing in nsScrollbarFrame and just remove the -moz-binding in CSS.

That is.. surprising. Why would removing a binding attachment slow things down?
Hi Mike,

Would you mind give me some pointers here? I am not entirely sure what I am seeing here. Let me know if you have some ideas that you would like me to try out. Thanks!
Flags: needinfo?(mconley)
(In reply to Brian Grinstead [:bgrins] from comment #10)
> That is.. surprising. Why would removing a binding attachment slow things
> down?

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newRevision=6e5e3f3a6d76&framework=1

False alarm; without both the patch and the binding speeds things up. I am probably not using Talos correctly locally here...
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #12)
> (In reply to Brian Grinstead [:bgrins] from comment #10)
> > That is.. surprising. Why would removing a binding attachment slow things
> > down?
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newR
> evision=6e5e3f3a6d76&framework=1
> 
> False alarm; without both the patch and the binding speeds things up. I am
> probably not using Talos correctly locally here...

Wow, the results there imply there are some real perf gains to be had by speeding up scrollbars (i.e. by implementing them with a single layout frame). Looks like there are significant improvements in:
* a11yr
* cpstartup
* sessionrestore
* sessionrestore_many_windows
* sessionrestore_no_auto_restore
* tabpaint
* tpaint
* ts_paint

I wonder if in the meantime we could limit the amount of NAC that we generate based on platform. For example, don't create extra scrollbarbuttons if they don't get rendered on that platform (right now that's controlled by media queries like -moz-scrollbar-start-backward).
It also disproves one theory I had which was that the XBL binding got lazily attached *after* whatever event cpstartup was waiting for, but that the NAC gets created before it. If that were the case I wouldn't expect cpstartup to budge when you removed the -moz-binding rule.
(In reply to Brian Grinstead [:bgrins] from comment #14)
> I wonder if in the meantime we could limit the amount of NAC that we
> generate based on platform. For example, don't create extra scrollbarbuttons
> if they don't get rendered on that platform (right now that's controlled by
> media queries like -moz-scrollbar-start-backward).

Maybe this the easy way out to make the nsScrolbarFrame implementation quicker than the XBL implementation :)

--

Anyway, for the profiles, I couldn't find any relevant functions with the word "scroll" in it so I started looking at the flame graph directly. The only box that is longer than 10ms in the "after" profiles comparing to the "before" profiles is "(root scope) chrome://browser/content/content.js". Could the difference of the DOM tree at the time affect the execution time of the script? I wonder if we iterate the DOM in any way from there.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #16)
> (In reply to Brian Grinstead [:bgrins] from comment #14)
> > I wonder if in the meantime we could limit the amount of NAC that we
> > generate based on platform. For example, don't create extra scrollbarbuttons
> > if they don't get rendered on that platform (right now that's controlled by
> > media queries like -moz-scrollbar-start-backward).
> 
> Maybe this the easy way out to make the nsScrolbarFrame implementation
> quicker than the XBL implementation :)

I guess you could do a compare push where you unconditionally remove them just to get a sense of how much potential win there is. Another thing to check could be to create the full NAC but don't propagate any attributes onto the children.
(In reply to Brian Grinstead [:bgrins] from comment #17)
> I guess you could do a compare push where you unconditionally remove them
> just to get a sense of how much potential win there is. Another thing to
> check could be to create the full NAC but don't propagate any attributes
> onto the children.

You would think improvement like this will show up in the profile, but it didn't. Is it possible that this function is fast but being called for too many times, so it didn't show up on the profile?

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newRevision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d&framework=1

(The patch to test here is https://hg.mozilla.org/try/rev/c5be4fbe6626ec15bda877f05bc0cb6232bed2e3, which makes nsScrollbarFrame::UpdateChildrenAttributeValue() no-op)

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #16)
> (In reply to Brian Grinstead [:bgrins] from comment #14)
> > I wonder if in the meantime we could limit the amount of NAC that we
> > generate based on platform. For example, don't create extra scrollbarbuttons
> > if they don't get rendered on that platform (right now that's controlled by
> > media queries like -moz-scrollbar-start-backward).
> 
> Maybe this the easy way out to make the nsScrolbarFrame implementation
> quicker than the XBL implementation :)

I am tempted to just start working on this. Simply, probing LookAndFeel from nsScrolbarFrame and get rid of the CSS selectors and style. If this is done before the branch we should be able to avoid a backout from beta?
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #19)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment
> #16)
> > (In reply to Brian Grinstead [:bgrins] from comment #14)
> > > I wonder if in the meantime we could limit the amount of NAC that we
> > > generate based on platform. For example, don't create extra scrollbarbuttons
> > > if they don't get rendered on that platform (right now that's controlled by
> > > media queries like -moz-scrollbar-start-backward).
> > 
> > Maybe this the easy way out to make the nsScrolbarFrame implementation
> > quicker than the XBL implementation :)
> 
> I am tempted to just start working on this. Simply, probing LookAndFeel from
> nsScrolbarFrame and get rid of the CSS selectors and style. If this is done
> before the branch we should be able to avoid a backout from beta?

I'd like to see some initial numbers that would indicate if doing this will fix the regression (like a try push that just removes those elements unconditionally). If it's not looking like a pretty easy fix we should consult with :mconley on the impact/reliability of this test, but I'm guessing we should back out to give ourselves more time to figure it out with the merge coming up.
(In reply to Brian Grinstead [:bgrins] from comment #20)
> > I am tempted to just start working on this. Simply, probing LookAndFeel from
> > nsScrolbarFrame and get rid of the CSS selectors and style. If this is done
> > before the branch we should be able to avoid a backout from beta?
> 
> I'd like to see some initial numbers that would indicate if doing this will
> fix the regression (like a try push that just removes those elements
> unconditionally). If it's not looking like a pretty easy fix we should
> consult with :mconley on the impact/reliability of this test, but I'm
> guessing we should back out to give ourselves more time to figure it out
> with the merge coming up.

Oh wait, is the compare link in Comment 19 doing that? Because if so, that's a pretty serious potential improvement on a bunch of tests.. so it's definitely worth trying!
Here are the first runs for each of the profiles that timdream generated in comment 18:

Without patch ("good"): https://perfht.ml/2Hzg1aC
With patch ("bad"): https://perfht.ml/2HA53lt

A few things stand out to me:

1. The biggest difference seems to be the length of time that we're waiting for the content process to initialize and start showing samples in the profiler. Using the end of the "CreateBrowser" samples in the parent process main thread as a starting point, the amount of waiting time waiting for the content process to start showing samples is:

Without patch ("good"): ~864ms
With patch ("bad"): ~2000ms

That's pretty staggering. Looking at the second run with the patch, the gap drops a bit down to ~1100s.

I'm going to request yet another set of profiles, but ask for all threads to be sampled.

2. As timdream already noted, gtk_init seems to take longer to run with the patch applied:

Without patch ("good"): https://perfht.ml/2vQrt0d (~18ms)
With patch ("bad"): https://perfht.ml/2HCRr8M (~43ms)

3. XRE_InitEmbedding2 also seems to take longer:

Without patch ("good"): https://perfht.ml/2vXM5DI (~32ms)
With patch ("bad"): https://perfht.ml/2vRr121 (~41ms)

4. The run with the patch seems to take a little longer running process and frame scripts, but I wouldn't call it a huge difference.


I think (1) is the one worth investigating first.
(In reply to Brian Grinstead [:bgrins] from comment #21)
> Oh wait, is the compare link in Comment 19 doing that? Because if so, that's
> a pretty serious potential improvement on a bunch of tests.. so it's
> definitely worth trying!

Sadly this is again false. It looks like I cannot generate Talos result with artifact mode. Look the difference with and without --artifact.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&framework=1

======

This is the result if I no-op nsScrollbarFrame::UpdateChildrenAttributeValue, it doesn't make any observable difference.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d&framework=1

This is the result if I no-op nsScrollbarFrame::CreateAnonymousContent (still building)

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=77e4e58fbfadc992b25dad5e899cf88f6f758522&framework=1
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #22)
> Here are the first runs for each of the profiles that timdream generated in
> comment 18:
> 
> Without patch ("good"): https://perfht.ml/2Hzg1aC
> With patch ("bad"): https://perfht.ml/2HA53lt
> 
> A few things stand out to me:
> 
> 1. The biggest difference seems to be the length of time that we're waiting
> for the content process to initialize and start showing samples in the
> profiler. Using the end of the "CreateBrowser" samples in the parent process
> main thread as a starting point, the amount of waiting time waiting for the
> content process to start showing samples is:
> 
> Without patch ("good"): ~864ms
> With patch ("bad"): ~2000ms

Sorry, the "with patch" profile here is build with artifact mode. I am going to spin up a non-artifact mode profile now.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #24)
> > Without patch ("good"): ~864ms
> > With patch ("bad"): ~2000ms
> 
> Sorry, the "with patch" profile here is build with artifact mode. I am going
> to spin up a non-artifact mode profile now.

I noticed the same thing on a different bug yesterday - is there a known issue with comparing artifact/non-artifact builds on talos?
(In reply to Brian Grinstead [:bgrins] from comment #20)
> > I am tempted to just start working on this. Simply, probing LookAndFeel from
> > nsScrolbarFrame and get rid of the CSS selectors and style. If this is done
> > before the branch we should be able to avoid a backout from beta?
> 
> I'd like to see some initial numbers that would indicate if doing this will
> fix the regression (like a try push that just removes those elements
> unconditionally).

Crippling nsScrollbarFrame doesn't do us any favor. This is no longer a viable route.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=77e4e58fbfadc992b25dad5e899cf88f6f758522&framework=1

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d&framework=1
Comparing the backout with the backout+remove -moz-binding is surprising:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=63adbdfb2cdb911eb379801b54fe2980277566c6&newProject=try&newRevision=6e5e3f3a6d76679a6b61aeddfea9636015eb1550&framework=1

In this case we seem to get the same regression that triggered this bug by simply _removing_ the -moz-binding. This would imply that the new code itself isn't the problem (also evidenced by the fact that we don't see it in any profiles). But something very strange is happening where doing less work (not attaching the binding) is slower than attaching it.
\(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #26)
> Backout and also remove moz-binding:
> Talos:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=6e5e3f3a6d76679a6b61aeddfea9636015eb1550
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb2dba70be8305144ddd90a4487557209206dd38
Looking at the profiles with and without the patch, it seems as if framescripts just take longer to run for some reason:

Without patch ("good"): https://perfht.ml/2HxyjsZ
With patch ("bad"): https://perfht.ml/2vRuwoZ
I'm wondering if having the binding on scrollbar is triggering some side effect (like warming up a cache) earlier than it otherwise would. Tim's working on a build that attaches an empty XBL binding to <scrollbar> to see if that's plausible.
I suspect this is going to be one of those annoying things where, because we remove a -moz-binding, we alter timing such that, via some Rube Goldberg style cause and effect, some thread ends up holding a lock and delaying the main thread in the content process or something. :/
(In reply to Brian Grinstead [:bgrins] from comment #31)

That's a great idea, yeah. That smells like the kind of problem we're dealing with here.
(In reply to Brian Grinstead [:bgrins] from comment #31)
> I'm wondering if having the binding on scrollbar is triggering some side
> effect (like warming up a cache) earlier than it otherwise would. Tim's
> working on a build that attaches an empty XBL binding to <scrollbar> to see
> if that's plausible.

empty scrollbar binding

Talos (and other tests):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=766d084731263d8aced50d7a61a48186a01b0355
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad65a0021859beb48409f68a7bd514b8f589b30c
Bring back the empty binding!
See Also: → 1436351
(In reply to Brian Grinstead [:bgrins] from comment #36)
> Bring back the empty binding!

In all seriousness - I'd prefer if we could land the empty binding here and figure out the timing weirdness in another bug instead of backing out Bug 1431246. We had to do something similar in Bug 1436351 with a XBL race condition.
Brian and I have found a reference to nsGkAtoms::scrollbar in nsXBLService.cpp. This might be the reason XBL cache get set up at different time because of the scrollbar binding.

This is the try push that restores the empty binding *and* removes the bit

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d22d8f13057d8394803181f0aecbea7609200e95

(That bit is actually something we should clean up given that we shouldn't have binding scrollbar and thumb bindings anymore, hopefully)
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #39)
> Brian and I have found a reference to nsGkAtoms::scrollbar in
> nsXBLService.cpp. This might be the reason XBL cache get set up at different
> time because of the scrollbar binding.
> 
> This is the try push that restores the empty binding *and* removes the bit
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=d22d8f13057d8394803181f0aecbea7609200e95

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=766d084731263d8aced50d7a61a48186a01b0355&newProject=try&newRevision=d22d8f13057d8394803181f0aecbea7609200e95&framework=1

Alright. This doesn't change anything.

Brian and I spoke a bit and I am going to find out what's triggered by the empty binding. Hopefully, we can find a right place to call it via other methods. If I couldn't find it in time, we should check the empty binding in and do it in a follow-up.

Thanks, everyone for the help!
The other thing worthy to point out is that none of the related stack like Loader::CreateSheet() or nsXBLService::LoadBindingDocumentInfo() shows up in the profiles on the main thread of content processes. I assume this is because it happens in the early stages? Regardless, this will takes some time to investigate.
Also: the fact that the cpstartup page loads over a chrome URI could give misleading results (i.e. if there are branches that only run in chrome contexts). I originally thought this could have been a source of this regression issue since, for example, Loader::CreateSheet only calls nsXULPrototypeCache::GetInstance in the case of a chrome URI. Then if the scrollbar binding warmed that up then maybe it would be faster in this case. It turned out this wasn't the problem - since apparently we run this block early on any content process startup due to pluginProblemBinding.css (https://searchfox.org/mozilla-central/source/toolkit/pluginproblem/pluginGlue.manifest#1).

Still, might be worth switching the cpstartup page to load over resource:// or http:// just to better match the real world case.
Update on what I have right now:

I am trying to locate the code that causes us to be faster by identifying and bisecting the code path triggered by the empty binding.

I started from |if (display->mBinding) {| in nsCSSFrameConstructor::AddFrameConstructionItemsInternal().
https://searchfox.org/mozilla-central/rev/8837610b6c999451435695e800f38d4acbc0a644/layout/base/nsCSSFrameConstructor.cpp#5670

Eventually, I've got to nsSyncLoadService::PushSyncStreamToListener() where is the binding XBL document gets read from the disk. It took some time for me to make sure I didn't fall into confirmation bias, but I am now certain that something called from OnStartRequest() and OnStopRequest() in there are the cause. This ruled out what'd actually read from the disk.
Attachment #8971720 - Flags: review?(mconley)
A proper fix would take more time to find & it would probably be not safe to land before merge. Let's get the workaround check-in for now.
Comment on attachment 8971720 [details]
Bug 1456728 - Bind an empty binding on scrollbar

https://reviewboard.mozilla.org/r/240484/#review246812

Thanks!

::: toolkit/content/widgets/general.xml:12
(Diff revision 1)
> +  <!-- This binding is needed on <scrollbar> to prevent content process
> +       startup performance regression -->
> +  <binding id="empty"/>

If one doesn't already exist, can you get a new bug on file for investigating this weirdness, and then refer to it here with a comment, and at each point where you use the empty binding?
Attachment #8971720 - Flags: review?(mconley) → review+
Pushed by timdream@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/efbdd0a0a0ba
Bind an empty binding on scrollbar r=mconley
https://hg.mozilla.org/mozilla-central/rev/efbdd0a0a0ba
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Push from comment 47 fixed the perf regression! \o/

== Change summary for alert #13013 (as of Tue, 01 May 2018 20:36:40 GMT) ==

Improvements:

  7%  cpstartup content-process-startup linux64 pgo e10s stylo     204.08 -> 189.42
  7%  cpstartup content-process-startup linux64 opt e10s stylo     211.33 -> 196.75

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13013
You need to log in before you can comment on or make changes to this bug.