1456728 - 6.89 - 8.12% cpstartup content-process-startup (linux64) regression on push d3b5dfca6a90 (Tue Apr 24 2018)

Reporter

Description

•

6 years ago

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=2a760b1c0759ef67bb3fd7892a8cbac787f0a899&tochange=d3b5dfca6a90a05875837d846c3b299690b6e10b

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  cpstartup content-process-startup linux64 opt e10s stylo     196.50 -> 212.45
  7%  cpstartup content-process-startup linux64 pgo e10s stylo     190.00 -> 203.08


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=12892

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Ionuț Goldan [:igoldan]

Reporter

Comment 1

•

6 years ago

:timdream It looks like bug 1445214 caused a performance regression on Linux platforms. Is this something we can fix or should we consider accepting it or backing it out?

Flags: needinfo?(timdream)

Ionuț Goldan [:igoldan]

Reporter

Updated

•

6 years ago

Component: Untriaged → XUL

Product: Firefox → Core

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

6 years ago

gecko profiles:

before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FJJvlgqdBS36xRPu0t8P-pw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_cpstartup.zip

after: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FUTBApocORiWvLyCiqmTOZA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_cpstartup.zip

Brian Grinstead [:bgrins]

Comment 3

•

6 years ago

I wouldn't have expected too much of a change from that patch since we are still building the same DOM. Interesting that it's only on Linux though.

Brian Grinstead [:bgrins]

Comment 4

•

6 years ago

Is there zero impact on this test on other platforms, or is it just a smaller regression that didn't get alerted?

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 5

•

6 years ago

We are building the same DOM but on a different timing, perhaps multiple times during start-up if bindings get load later.

I'll look into this.

Assignee: nobody → timdream

Status: NEW → ASSIGNED

Flags: needinfo?(timdream)

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 6

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #4)
> Is there zero impact on this test on other platforms, or is it just a
> smaller regression that didn't get alerted?

https://treeherder.mozilla.org/perf.html#/compare?originalProject=autoland&originalRevision=2a760b1c0759ef67bb3fd7892a8cbac787f0a899&newProject=autoland&newRevision=d3b5dfca6a90a05875837d846c3b299690b6e10b&framework=1

It varies... I am re-triggering tests to find out with more confidence.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 7

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #6)
> (In reply to Brian Grinstead [:bgrins] from comment #4)
> > Is there zero impact on this test on other platforms, or is it just a
> > smaller regression that didn't get alerted?

Actually, for this particular test, we don't seem to trigger it on all platforms. I have trouble with my account so I can't trigger it. Will have to wait for my account fixed.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 8

•

6 years ago

OK. I don't know what happened to treeherder, but I can reproduce the performance regression on macOS.

Let me try to fix this before the merge, backout if we couldn't find out a fix before that.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 9

•

6 years ago

What I trying to do right now is to locate the call site that makes the things run slower. Surprisingly, I can make it slow not by implementing nothing in nsScrollbarFrame and just remove the -moz-binding in CSS.

I don't find any reference to scrollbar that is relevant.

The next thing I will do is to look at the profile produced locally.

Brian Grinstead [:bgrins]

Comment 10

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #9)
> What I trying to do right now is to locate the call site that makes the
> things run slower. Surprisingly, I can make it slow not by implementing
> nothing in nsScrollbarFrame and just remove the -moz-binding in CSS.

That is.. surprising. Why would removing a binding attachment slow things down?

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox59: --- → unaffected

status-firefox60: --- → unaffected

status-firefox61: --- → affected

status-firefox-esr52: --- → unaffected

tracking-firefox61: --- → +

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 11

•

6 years ago

Hi Mike,

Would you mind give me some pointers here? I am not entirely sure what I am seeing here. Let me know if you have some ideas that you would like me to try out. Thanks!

Flags: needinfo?(mconley)

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 12

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #10)
> That is.. surprising. Why would removing a binding attachment slow things
> down?

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newRevision=6e5e3f3a6d76&framework=1

False alarm; without both the patch and the binding speeds things up. I am probably not using Talos correctly locally here...

Mike Conley (:mconley) (:⚙️)

Comment 13

•

6 years ago

Looking at profiles from comment 2...

Brian Grinstead [:bgrins]

Comment 14

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #12)
> (In reply to Brian Grinstead [:bgrins] from comment #10)
> > That is.. surprising. Why would removing a binding attachment slow things
> > down?
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newR
> evision=6e5e3f3a6d76&framework=1
> 
> False alarm; without both the patch and the binding speeds things up. I am
> probably not using Talos correctly locally here...

Wow, the results there imply there are some real perf gains to be had by speeding up scrollbars (i.e. by implementing them with a single layout frame). Looks like there are significant improvements in:
* a11yr
* cpstartup
* sessionrestore
* sessionrestore_many_windows
* sessionrestore_no_auto_restore
* tabpaint
* tpaint
* ts_paint

I wonder if in the meantime we could limit the amount of NAC that we generate based on platform. For example, don't create extra scrollbarbuttons if they don't get rendered on that platform (right now that's controlled by media queries like -moz-scrollbar-start-backward).

Brian Grinstead [:bgrins]

Comment 15

•

6 years ago

It also disproves one theory I had which was that the XBL binding got lazily attached *after* whatever event cpstartup was waiting for, but that the NAC gets created before it. If that were the case I wouldn't expect cpstartup to budge when you removed the -moz-binding rule.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 16

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #14)
> I wonder if in the meantime we could limit the amount of NAC that we
> generate based on platform. For example, don't create extra scrollbarbuttons
> if they don't get rendered on that platform (right now that's controlled by
> media queries like -moz-scrollbar-start-backward).

Maybe this the easy way out to make the nsScrolbarFrame implementation quicker than the XBL implementation :)

--

Anyway, for the profiles, I couldn't find any relevant functions with the word "scroll" in it so I started looking at the flame graph directly. The only box that is longer than 10ms in the "after" profiles comparing to the "before" profiles is "(root scope) chrome://browser/content/content.js". Could the difference of the DOM tree at the time affect the execution time of the script? I wonder if we iterate the DOM in any way from there.

Brian Grinstead [:bgrins]

Comment 17

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #16)
> (In reply to Brian Grinstead [:bgrins] from comment #14)
> > I wonder if in the meantime we could limit the amount of NAC that we
> > generate based on platform. For example, don't create extra scrollbarbuttons
> > if they don't get rendered on that platform (right now that's controlled by
> > media queries like -moz-scrollbar-start-backward).
> 
> Maybe this the easy way out to make the nsScrolbarFrame implementation
> quicker than the XBL implementation :)

I guess you could do a compare push where you unconditionally remove them just to get a sense of how much potential win there is. Another thing to check could be to create the full NAC but don't propagate any attributes onto the children.

Comment hidden (obsolete)

I did two other try pushes in an attempt to get higher resolution profiles

https://treeherder.mozilla.org/#/jobs?repo=try&revision=272ba72132105be7a779e0cf789463f41b453ea3 with patch
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e4f05dd7bdeb592febbfc3e5255fa727961a7cc without patch

Other than content.js, this time, I can also see gtk_init slows down a lot. Could this be related?

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 19

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #17)
> I guess you could do a compare push where you unconditionally remove them
> just to get a sense of how much potential win there is. Another thing to
> check could be to create the full NAC but don't propagate any attributes
> onto the children.

You would think improvement like this will show up in the profile, but it didn't. Is it possible that this function is fast but being called for too many times, so it didn't show up on the profile?

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newRevision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d&framework=1

(The patch to test here is https://hg.mozilla.org/try/rev/c5be4fbe6626ec15bda877f05bc0cb6232bed2e3, which makes nsScrollbarFrame::UpdateChildrenAttributeValue() no-op)

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #16)
> (In reply to Brian Grinstead [:bgrins] from comment #14)
> > I wonder if in the meantime we could limit the amount of NAC that we
> > generate based on platform. For example, don't create extra scrollbarbuttons
> > if they don't get rendered on that platform (right now that's controlled by
> > media queries like -moz-scrollbar-start-backward).
> 
> Maybe this the easy way out to make the nsScrolbarFrame implementation
> quicker than the XBL implementation :)

I am tempted to just start working on this. Simply, probing LookAndFeel from nsScrolbarFrame and get rid of the CSS selectors and style. If this is done before the branch we should be able to avoid a backout from beta?

Brian Grinstead [:bgrins]

Comment 20

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #19)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment
> #16)
> > (In reply to Brian Grinstead [:bgrins] from comment #14)
> > > I wonder if in the meantime we could limit the amount of NAC that we
> > > generate based on platform. For example, don't create extra scrollbarbuttons
> > > if they don't get rendered on that platform (right now that's controlled by
> > > media queries like -moz-scrollbar-start-backward).
> > 
> > Maybe this the easy way out to make the nsScrolbarFrame implementation
> > quicker than the XBL implementation :)
> 
> I am tempted to just start working on this. Simply, probing LookAndFeel from
> nsScrolbarFrame and get rid of the CSS selectors and style. If this is done
> before the branch we should be able to avoid a backout from beta?

I'd like to see some initial numbers that would indicate if doing this will fix the regression (like a try push that just removes those elements unconditionally). If it's not looking like a pretty easy fix we should consult with :mconley on the impact/reliability of this test, but I'm guessing we should back out to give ourselves more time to figure it out with the merge coming up.

Brian Grinstead [:bgrins]

Comment 21

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #20)
> > I am tempted to just start working on this. Simply, probing LookAndFeel from
> > nsScrolbarFrame and get rid of the CSS selectors and style. If this is done
> > before the branch we should be able to avoid a backout from beta?
> 
> I'd like to see some initial numbers that would indicate if doing this will
> fix the regression (like a try push that just removes those elements
> unconditionally). If it's not looking like a pretty easy fix we should
> consult with :mconley on the impact/reliability of this test, but I'm
> guessing we should back out to give ourselves more time to figure it out
> with the merge coming up.

Oh wait, is the compare link in Comment 19 doing that? Because if so, that's a pretty serious potential improvement on a bunch of tests.. so it's definitely worth trying!

Mike Conley (:mconley) (:⚙️)

Comment 22

•

6 years ago

Here are the first runs for each of the profiles that timdream generated in comment 18:

Without patch ("good"): https://perfht.ml/2Hzg1aC
With patch ("bad"): https://perfht.ml/2HA53lt

A few things stand out to me:

1. The biggest difference seems to be the length of time that we're waiting for the content process to initialize and start showing samples in the profiler. Using the end of the "CreateBrowser" samples in the parent process main thread as a starting point, the amount of waiting time waiting for the content process to start showing samples is:

Without patch ("good"): ~864ms
With patch ("bad"): ~2000ms

That's pretty staggering. Looking at the second run with the patch, the gap drops a bit down to ~1100s.

I'm going to request yet another set of profiles, but ask for all threads to be sampled.

2. As timdream already noted, gtk_init seems to take longer to run with the patch applied:

Without patch ("good"): https://perfht.ml/2vQrt0d (~18ms)
With patch ("bad"): https://perfht.ml/2HCRr8M (~43ms)

3. XRE_InitEmbedding2 also seems to take longer:

Without patch ("good"): https://perfht.ml/2vXM5DI (~32ms)
With patch ("bad"): https://perfht.ml/2vRr121 (~41ms)

4. The run with the patch seems to take a little longer running process and frame scripts, but I wouldn't call it a huge difference.


I think (1) is the one worth investigating first.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 23

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #21)
> Oh wait, is the compare link in Comment 19 doing that? Because if so, that's
> a pretty serious potential improvement on a bunch of tests.. so it's
> definitely worth trying!

Sadly this is again false. It looks like I cannot generate Talos result with artifact mode. Look the difference with and without --artifact.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=60c7341fb0a9&newProject=try&newRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&framework=1

======

This is the result if I no-op nsScrollbarFrame::UpdateChildrenAttributeValue, it doesn't make any observable difference.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d&framework=1

This is the result if I no-op nsScrollbarFrame::CreateAnonymousContent (still building)

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=77e4e58fbfadc992b25dad5e899cf88f6f758522&framework=1

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 24

•

6 years ago

(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #22)
> Here are the first runs for each of the profiles that timdream generated in
> comment 18:
> 
> Without patch ("good"): https://perfht.ml/2Hzg1aC
> With patch ("bad"): https://perfht.ml/2HA53lt
> 
> A few things stand out to me:
> 
> 1. The biggest difference seems to be the length of time that we're waiting
> for the content process to initialize and start showing samples in the
> profiler. Using the end of the "CreateBrowser" samples in the parent process
> main thread as a starting point, the amount of waiting time waiting for the
> content process to start showing samples is:
> 
> Without patch ("good"): ~864ms
> With patch ("bad"): ~2000ms

Sorry, the "with patch" profile here is build with artifact mode. I am going to spin up a non-artifact mode profile now.

Brian Grinstead [:bgrins]

Comment 25

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #24)
> > Without patch ("good"): ~864ms
> > With patch ("bad"): ~2000ms
> 
> Sorry, the "with patch" profile here is build with artifact mode. I am going
> to spin up a non-artifact mode profile now.

I noticed the same thing on a different bug yesterday - is there a known issue with comparing artifact/non-artifact builds on talos?

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 26

•

6 years ago

I've confused myself as well, so this is all the valid, non-artifact-mode, pushes

Base, central (with patch):
Talos:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3231d0a83c923f65ba9486e860ef8892138dcfa
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5988f1036f54a31631b857ada74fa7569693b1ca

Backout (without patch):
Talos:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63adbdfb2cdb911eb379801b54fe2980277566c6
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e4f05dd7bdeb592febbfc3e5255fa727961a7cc

no-op nsScrollbarFrame::CreateAnonymousContent
Talos: 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=77e4e58fbfadc992b25dad5e899cf88f6f758522
Profile: 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62245b088d0a47713bd4e674b038a7f0ae553325

no-op nsScrollbarFrame::UpdateChildrenAttributeValue
Talos:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e96ec1c535b4a4f87c79d6ef6d6aad3b9eab6bd2

Backout and also remove moz-binding:
Talos:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6e5e3f3a6d76679a6b61aeddfea9636015eb1550

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Updated

•

6 years ago

Updated

•

6 years ago

Depends on: 1457510

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 27

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #20)
> > I am tempted to just start working on this. Simply, probing LookAndFeel from
> > nsScrolbarFrame and get rid of the CSS selectors and style. If this is done
> > before the branch we should be able to avoid a backout from beta?
> 
> I'd like to see some initial numbers that would indicate if doing this will
> fix the regression (like a try push that just removes those elements
> unconditionally).

Crippling nsScrollbarFrame doesn't do us any favor. This is no longer a viable route.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=77e4e58fbfadc992b25dad5e899cf88f6f758522&framework=1

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d3231d0a83c923f65ba9486e860ef8892138dcfa&newProject=try&newRevision=f8392445a6aea4ff97c304a6438cb5bcb0713d5d&framework=1

Brian Grinstead [:bgrins]

Comment 28

•

6 years ago

Comparing the backout with the backout+remove -moz-binding is surprising:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=63adbdfb2cdb911eb379801b54fe2980277566c6&newProject=try&newRevision=6e5e3f3a6d76679a6b61aeddfea9636015eb1550&framework=1

In this case we seem to get the same regression that triggered this bug by simply _removing_ the -moz-binding. This would imply that the new code itself isn't the problem (also evidenced by the fact that we don't see it in any profiles). But something very strange is happening where doing less work (not attaching the binding) is slower than attaching it.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 29

•

6 years ago

\(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #26)
> Backout and also remove moz-binding:
> Talos:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=6e5e3f3a6d76679a6b61aeddfea9636015eb1550
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb2dba70be8305144ddd90a4487557209206dd38

Mike Conley (:mconley) (:⚙️)

Comment 30

•

6 years ago

Looking at the profiles with and without the patch, it seems as if framescripts just take longer to run for some reason:

Without patch ("good"): https://perfht.ml/2HxyjsZ
With patch ("bad"): https://perfht.ml/2vRuwoZ

Brian Grinstead [:bgrins]

Comment 31

•

6 years ago

I'm wondering if having the binding on scrollbar is triggering some side effect (like warming up a cache) earlier than it otherwise would. Tim's working on a build that attaches an empty XBL binding to <scrollbar> to see if that's plausible.

Mike Conley (:mconley) (:⚙️)

Comment 32

•

6 years ago

I suspect this is going to be one of those annoying things where, because we remove a -moz-binding, we alter timing such that, via some Rube Goldberg style cause and effect, some thread ends up holding a lock and delaying the main thread in the content process or something. :/

Mike Conley (:mconley) (:⚙️)

Comment 33

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #31)

That's a great idea, yeah. That smells like the kind of problem we're dealing with here.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 34

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #31)
> I'm wondering if having the binding on scrollbar is triggering some side
> effect (like warming up a cache) earlier than it otherwise would. Tim's
> working on a build that attaches an empty XBL binding to <scrollbar> to see
> if that's plausible.

empty scrollbar binding

Talos (and other tests):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=766d084731263d8aced50d7a61a48186a01b0355
Profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad65a0021859beb48409f68a7bd514b8f589b30c

Mike Conley (:mconley) (:⚙️)

Comment 35

•

6 years ago

Yep, the empty binding makes the regression disappear. Lovely.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=766d08473126&framework=1&showOnlyImportant=1&selectedTimeRange=172800

Flags: needinfo?(mconley)

Brian Grinstead [:bgrins]

Comment 36

•

6 years ago

Bring back the empty binding!

Comment 37

•

6 years ago

(In reply to Brian Grinstead [:bgrins] from comment #36)
> Bring back the empty binding!

In all seriousness - I'd prefer if we could land the empty binding here and figure out the timing weirdness in another bug instead of backing out Bug 1431246. We had to do something similar in Bug 1436351 with a XBL race condition.

Comment hidden (mozreview-request)

For some reason, removing the scrollbar binding triggers
content process performance regression on Linux.

We'll add an empty binding for now until we find the real cause.

Review commit: https://reviewboard.mozilla.org/r/240484/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/240484/

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 39

•

6 years ago

Brian and I have found a reference to nsGkAtoms::scrollbar in nsXBLService.cpp. This might be the reason XBL cache get set up at different time because of the scrollbar binding.

This is the try push that restores the empty binding *and* removes the bit

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d22d8f13057d8394803181f0aecbea7609200e95

(That bit is actually something we should clean up given that we shouldn't have binding scrollbar and thumb bindings anymore, hopefully)

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 40

•

6 years ago

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #39)
> Brian and I have found a reference to nsGkAtoms::scrollbar in
> nsXBLService.cpp. This might be the reason XBL cache get set up at different
> time because of the scrollbar binding.
> 
> This is the try push that restores the empty binding *and* removes the bit
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=d22d8f13057d8394803181f0aecbea7609200e95

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=766d084731263d8aced50d7a61a48186a01b0355&newProject=try&newRevision=d22d8f13057d8394803181f0aecbea7609200e95&framework=1

Alright. This doesn't change anything.

Brian and I spoke a bit and I am going to find out what's triggered by the empty binding. Hopefully, we can find a right place to call it via other methods. If I couldn't find it in time, we should check the empty binding in and do it in a follow-up.

Thanks, everyone for the help!

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 41

•

6 years ago

The other thing worthy to point out is that none of the related stack like Loader::CreateSheet() or nsXBLService::LoadBindingDocumentInfo() shows up in the profiles on the main thread of content processes. I assume this is because it happens in the early stages? Regardless, this will takes some time to investigate.

Brian Grinstead [:bgrins]

Comment 42

•

6 years ago

Also: the fact that the cpstartup page loads over a chrome URI could give misleading results (i.e. if there are branches that only run in chrome contexts). I originally thought this could have been a source of this regression issue since, for example, Loader::CreateSheet only calls nsXULPrototypeCache::GetInstance in the case of a chrome URI. Then if the scrollbar binding warmed that up then maybe it would be faster in this case. It turned out this wasn't the problem - since apparently we run this block early on any content process startup due to pluginProblemBinding.css (https://searchfox.org/mozilla-central/source/toolkit/pluginproblem/pluginGlue.manifest#1).

Still, might be worth switching the cpstartup page to load over resource:// or http:// just to better match the real world case.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 43

•

6 years ago

Update on what I have right now:

I am trying to locate the code that causes us to be faster by identifying and bisecting the code path triggered by the empty binding.

I started from |if (display->mBinding) {| in nsCSSFrameConstructor::AddFrameConstructionItemsInternal().
https://searchfox.org/mozilla-central/rev/8837610b6c999451435695e800f38d4acbc0a644/layout/base/nsCSSFrameConstructor.cpp#5670

Eventually, I've got to nsSyncLoadService::PushSyncStreamToListener() where is the binding XBL document gets read from the disk. It took some time for me to make sure I didn't fall into confirmation bias, but I am now certain that something called from OnStartRequest() and OnStopRequest() in there are the cause. This ruled out what'd actually read from the disk.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Updated

•

6 years ago

Attachment #8971720 - Flags: review?(mconley)

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 44

•

6 years ago

A proper fix would take more time to find & it would probably be not safe to land before merge. Let's get the workaround check-in for now.

Mike Conley (:mconley) (:⚙️)

Comment 45

•

6 years ago

mozreview-review

Comment on attachment 8971720 [details]
Bug 1456728 - Bind an empty binding on scrollbar

https://reviewboard.mozilla.org/r/240484/#review246812

Thanks!

::: toolkit/content/widgets/general.xml:12
(Diff revision 1)
> +  <!-- This binding is needed on <scrollbar> to prevent content process
> +       startup performance regression -->
> +  <binding id="empty"/>

If one doesn't already exist, can you get a new bug on file for investigating this weirdness, and then refer to it here with a comment, and at each point where you use the empty binding?

Attachment #8971720 - Flags: review?(mconley) → review+

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Updated

•

6 years ago

Comment 47

•

6 years ago

Pushed by timdream@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/efbdd0a0a0ba
Bind an empty binding on scrollbar r=mconley

Cosmin Sabou [:CosminS]

Comment 48

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/efbdd0a0a0ba

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox61: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Ionuț Goldan [:igoldan]

Reporter

Comment 49

•

6 years ago

Push from comment 47 fixed the perf regression! \o/

== Change summary for alert #13013 (as of Tue, 01 May 2018 20:36:40 GMT) ==

Improvements:

  7%  cpstartup content-process-startup linux64 pgo e10s stylo     204.08 -> 189.42
  7%  cpstartup content-process-startup linux64 opt e10s stylo     211.33 -> 196.75

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13013