1268508 - 3.12% sessionrestore/ts_paint (linux64 pgo only) regression on push 86e0ea42f55c (Mon Apr 25 2016)

Reporter

Description

•

8 years ago

Talos has detected a Firefox performance regression from push 86e0ea42f55c. As author of one of the patches included in that push, we need your help to address this regression.

A link to the push:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=14952428292c90aa00c28fdfb1f1e79bd8257ce2&tochange=86e0ea42f55c34fc90dd9e32439783a9929f28f4

This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=1038

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:

https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64 -u none -t other --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a sessionrestore

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

8 years ago

:mixedpuppy, thanks for landing the code to fix the regression- this seemed to have caused a pgo (what we ship) only regression on linux64.  Can you look into ways to fix this performance regression or help explain what this code does and why it is causing a regression.

Flags: needinfo?(mixedpuppy)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

8 years ago

oh, a new alert came in, ts_paint.  probably something in this code is causing startup regressions- I am not sure if we can offload that somehow.

Summary: 3.12% sessionrestore (linux64 pgo only) regression on push 86e0ea42f55c (Mon Apr 25 2016) → 3.12% sessionrestore/ts_paint (linux64 pgo only) regression on push 86e0ea42f55c (Mon Apr 25 2016)

:Gijs (he/him)

Comment 3

•

8 years ago

If I had to guess, this is probably happening because we're force-loading CUI "early" in the particular circumstances of the x64 linux build.

We can workaround by tiptoeing around the issue: if the module is not loaded, CUI.windows is going to be empty anyway. So wrapping this in Cu.isModuleLoaded("resource:///modules/CustomizableUI.jsm") should get rid of it.

It'd be useful to get confirmation that that's the case from try, though.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 4

•

8 years ago

oh try we will need to do a pgo build:
https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build

Shane Caraveo (:mixedpuppy)

Comment 5

•

8 years ago

hrm, this is weird.  

Gijs, I don't think checking the module load will work here (we already did things in CUI above the changed line, thus if we were causing CUI to load, we were doing so before this change).  

Since the prior iterator had failures that using CUI.windows fixes, we have to assume that merely adding 2 menuitems (sub-menu, not top level, not even visible on startup) causes the performance issues.

Flags: needinfo?(mixedpuppy)

:Gijs (he/him)

Comment 6

•

8 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #5)
> hrm, this is weird.  
> 
> Gijs, I don't think checking the module load will work here (we already did
> things in CUI above the changed line, thus if we were causing CUI to load,
> we were doing so before this change).  

Oh, d'oh, yes, you're right.

> Since the prior iterator had failures that using CUI.windows fixes, we have
> to assume that merely adding 2 menuitems (sub-menu, not top level, not even
> visible on startup) causes the performance issues.

that would be... surprising? Possible, I guess... but surprising.

Shane Caraveo (:mixedpuppy)

Comment 7

•

8 years ago

So, what we do in onWindowOpened:

add a stylesheet
add 4 window properties (lazy getters)
add 2 submenu items that are not visible (CUI.getPlacementOfWidget is called here)

It feels like the most likely culprit is the spreadsheet, I wonder if it is being loaded after CUI adds the pocket button to the toolbar.  That could conceivably cause a repaint since the button icon is via the spreadsheet.  Maybe this change is affecting order here somehow.

Shane Caraveo (:mixedpuppy)

Comment 8

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5b3b403b01c

Shane Caraveo (:mixedpuppy)

Comment 9

•

8 years ago

Gijs, above try is the only obvious thing I can see as a cause of these regressions.  Thoughts?

Flags: needinfo?(gijskruitbosch+bugs)

:Gijs (he/him)

Comment 10

•

8 years ago

(In reply to Shane Caraveo (:mixedpuppy) from comment #7)
> So, what we do in onWindowOpened:
> 
> add a stylesheet
> add 4 window properties (lazy getters)
> add 2 submenu items that are not visible (CUI.getPlacementOfWidget is called
> here)
> 
> It feels like the most likely culprit is the spreadsheet,

I'm assuming you mean the stylesheet... and sorry to hear you've been thinking about spreadsheets? :-)

> I wonder if it is
> being loaded after CUI adds the pocket button to the toolbar.  That could
> conceivably cause a repaint since the button icon is via the spreadsheet. 

Hm. Yes, that is possible.

> Maybe this change is affecting order here somehow.

So... let's think very carefully about this for a bit. The only change (relevant to startup) that we made was this loop at the bottom of bootstrap's startup.

We used to loop through all open windows, and now we're only looping through the ones CUI knows about. So if anything that loop is now doing less work. The windows are a subset of the ones we used to use rather than a superset.

If that loop is now doing less work, that means more work will happen once CUI fires onWindowOpened. But even that, looking at the code, should be happening before we stick the pocket button in the toolbar... so I don't really understand why that would have made a difference.

Unfortunately the patch seems to have not been pgo'd properly, or something. It's a bit of a faff trying to compare with fx-team because of the pgo-to-opt renaming that try does, so you have to compare the opt numbers from "new" with the pgo numbers from "old", but:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=4dc48f3688f0&newProject=try&newRevision=a5b3b403b01c&framework=1&showOnlyImportant=0

shows 1222.33 for ts_paint on the opt non-e10s trypush, but 1046.67 for the pgo fx-team push. That makes no sense, because that's more like a 20% increase, much more than the 3% this bug is about... and the numbers are more in the right ballpark for a 'regular' opt build - but the trypush clearly has the mk_add_options mozconfig switch. Joel, can you check if that really worked correctly?

In the meantime, that push also has failing tests, so clearly something is not quite right...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 11

•

8 years ago

the build on try is pgo as far as I can tell (both the options, build times and what i see in the logs), why it is showing much different numbers than normal pgo and more like opt is really confusing.

I would recommend that since it seems as though all aspects of the changes have been scrutinized and we have attempted to fix them that we either push these fixes forward or ignore this.

If we want to look more at this on try- we could:
* push a blank changeset with just the pgo options
* compare that to the above one (assuming we have the same base revision)

Flags: needinfo?(jmaher)

Shane Caraveo (:mixedpuppy)

Comment 12

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=440515524aae

Shane Caraveo (:mixedpuppy)

Comment 13

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c1f35393ed9

Shane Caraveo (:mixedpuppy)

Comment 14

•

8 years ago

The push in comment 12 is just a baseline fx-team pgo build.  in comment 13 is a fixed patch for the test failures in the previous try.

I did a ton of timing stuff yesterday, everything seems to always happen in the same order (with or without the patch that caused this bug) except that (prior to said patch) sometimes pocket initialized *earlier* than the window causing the failure to add the menu items.  Otherwise, the stylesheet was always added prior to the MozAfterPaint event and prior to CUI creating the widget.  

The only difference I could see is whether or not the menu items got added.  This patch moves adding the menu items to the browser-delayed-startup-finished observer.

Shane Caraveo (:mixedpuppy)

Comment 15

•

8 years ago

oh, btw, the test failures was a typo which could have been the cause of the much faster times

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 16

•

8 years ago

unfortunately there is no difference here- I think both pushes showed the normal pgo range for ts_paint and sessionrestore, which is a good sign.  Possibly we don't have much we can do here?

Shane Caraveo (:mixedpuppy)

Comment 17

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a54a9f8d8110

Shane Caraveo (:mixedpuppy)

Comment 18

•

8 years ago

one last try, reverting the changes from bug 1263599 to see if it stays in a normal range.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 19

•

8 years ago

and this is showing a ts_paint and sessionrestore improvement- so there is something in the original changes, although it seems to be a mystery.

Shane Caraveo (:mixedpuppy)

Comment 20

•

8 years ago

per irc with jmaher we'll wontfix this.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

Bugzilla

Quick Search

3.12% sessionrestore/ts_paint (linux64 pgo only) regression on push 86e0ea42f55c (Mon Apr 25 2016)

Categories

(Firefox :: Pocket, defect)

Tracking

()

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Comment 17

Comment 18

Comment 19

Comment 20