Closed Bug 1380291 Opened 2 years ago Closed 2 years ago

6.38 - 9.27% tart (linux64, windows10-64, windows7-32) regression on push 3dfb4c796b7b87cc6e90bf83ef7822c8ba9c4767 (Tue Jul 11 2017)

Categories

(Firefox :: General, defect, P3)

55 Branch
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: igoldan, Assigned: Fischer)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=3dfb4c796b7b87cc6e90bf83ef7822c8ba9c4767

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  9%  tart summary linux64 opt e10s     5.51 -> 6.02
  8%  tart summary linux64 pgo e10s     4.71 -> 5.11
  7%  tart summary windows7-32 opt e10s 7.13 -> 7.64
  6%  tart summary windows10-64 opt e10s6.02 -> 6.41
  6%  tart summary windows7-32 pgo e10s 5.20 -> 5.53


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7827

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
:Fischer Could you look over this regression and estimate a fix time? I may have missed adding a performance improvement to the above list: the a11yr summary test on Windows 10 improved by 6-7%, but I'm not 100% it's related to this code change.
If you find use in looking over that improvement too, you can find it here:
https://treeherder.mozilla.org/perf.html#/alerts?id=7836
Flags: needinfo?(fliu)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> :Fischer Could you look over this regression and estimate a fix time? I may
> have missed adding a performance improvement to the above list: the a11yr
> summary test on Windows 10 improved by 6-7%, but I'm not 100% it's related
> to this code change.
> If you find use in looking over that improvement too, you can find it here:
> https://treeherder.mozilla.org/perf.html#/alerts?id=7836
This is a bit weird because the bug 1377433 is doing *less* thing, which removes the css transition animation effect.
I will take a look at this more.
I agree this is odd, as I noticed this thing too. Thanks for investigating this.
(In reply to Fischer [:Fischer] from comment #2)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> > :Fischer Could you look over this regression and estimate a fix time? I may
> > have missed adding a performance improvement to the above list: the a11yr
> > summary test on Windows 10 improved by 6-7%, but I'm not 100% it's related
> > to this code change.
> > If you find use in looking over that improvement too, you can find it here:
> > https://treeherder.mozilla.org/perf.html#/alerts?id=7836
> This is a bit weird because the bug 1377433 is doing *less* thing, which
> removes the css transition animation effect.
> I will take a look at this more.

It used to set the onboarding-opened class after the animation, and now it sets it immediately, presumably immediately triggering reflows (not necessarily sync ones, though I don't know if later code in that module will trigger those, I haven't looked at context) which will be 'work' which will compete for CPU/GPU cycles with the tab animations as they open/close.

I expect that if we can determine whether or not the new tour stuff needs to be opened 'sooner' (before first paint of the new tab page) we can hopefully avoid this additional cost.
As I mentioned in #perf I expected this regression since it is the inverse of bug 1357641 comment 67 from when the notification bar landed (which should not have caused a perf improvement at all!). In discussion with mconley we have a theory that the animation was messing with what tart is actually measuring.
Related to the a11y test from comment 1: that improvement was done by bug 1373079, so it's unrelated to this one.
Assignee: nobody → fliu
Status: NEW → ASSIGNED
Flags: needinfo?(fliu)
Priority: -- → P1
Whiteboard: [photon-onboarding]
Flags: qe-verify-
The newly landed bug 1372067 delays the init of the notification on idle so the impact from notification setup works should be lowered by some degree, which may help a bit [1].

Then about the UI part, the bug 1377433 removed the line adding the css selector inside `requestAnimationFrame`. That could bring some impact so a solution of appending notification element and manipulating css inside `requestAnimationFrame` is proposed.
Now running the talos to see how it is going.

[1] https://dxr.mozilla.org/mozilla-central/rev/30ea2905130e85f9e1d8d56fa3097901eec6514b/browser/extensions/onboarding/content/onboarding.js#289
The talos results[3] with the proposal in the comment 7. 

Overall not making thing worse which is expected since `requestAnimationFrame` should help some.

Below are some items to discuss:
- The tc-T-e10s on Win7 opt and Win10 x64 opt failed on infrastructure exception. Tried to push other tries but still the same so a bit afraid about the results [1][2].

- For the regression of tart opt e10s, osx-10-10 is having some gain of -5.6% but not many differences on other platform.

- For bloom_basic opt e10s, a bit interesting, could saw quite some improvements on linux/windows except for osx. And looked into the last 7-days results on osx [4], our results seems to fall within the range pattern. Not sure how the onoarding notification on about:newtab could affect this (page load time?) because it seem what this test does is to load a page and get computed style[5].

- For glvideo Mean tick time, an -16.84% improvement on win7 but there is an outlier on the last 7-days results[6]. Exclude that outlier, no much difference in fact.

- For quantum_pageload_facebook, an -13.49% improvement on win7 but an outlier there as well[7].

- For sessionrestore opt e10s, 9.74% slower on osx but seems sill fall into the last 30-days range pattern.

- For tp5o XRes, a huge 7,561.95% difference but looked into the last 30-days results[9]. It seems we are just following the pattern like everybody. This test is about memory so guess maybe has to do with gc jobs.

- For tp5o_webext XRes, a -49.80% gain but it seems just like the tp5o XRes case[10].

- For tp5n main_normal_fileio, tp5o % Processor Time, tp5o_webext responsiveness, and tresize opt, a bit around -5% improvement but not sure how the onoarding notification could affect this.

- For tscrollx, saw some gains but seems also follow the range pattern,

[1] Talos with the patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f226e0423b96bd6089f58a9c3ef6bb3f3ea2f4e6
[2] Talos without the patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=93e2b2eb23f0f991c38e68fad5b2a91b0d582f82
[3] Talos comparison: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=93e2b2eb23f0f991c38e68fad5b2a91b0d582f82&newProject=try&newRevision=f226e0423b96bd6089f58a9c3ef6bb3f3ea2f4e6&framework=1&showOnlyImportant=0
[4] https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=%5Btry,7541c1a3b0c4464bdbc1c51dfc6cb54ad0b37caf,1,1%5D&highlightedRevisions=93e2b2eb23f0&highlightedRevisions=f226e0423b96
[5] https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/perf-reftest/bloom-basic.html
[6] https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=%5Btry,5f55ddec4454cc4c2ad2ecbb838c2338900d7575,1,1%5D&highlightedRevisions=93e2b2eb23f0&highlightedRevisions=f226e0423b96
[7] https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=93e2b2eb23f0&highlightedRevisions=f226e0423b96&series=%5B%22try%22,%2258e04c8c27f32f2bb885c36ea0bad9b0887fbf37%22,1,1%5D&timerange=604800
[8] https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Btry,c22155824871783bb5d8f514b3a01b4c885034c0,1,1%5D&highlightedRevisions=93e2b2eb23f0&highlightedRevisions=f226e0423b96
[9] https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Btry,a20264b4683ad3ae2483813370c9a211daf2a348,1,1%5D&highlightedRevisions=93e2b2eb23f0&highlightedRevisions=f226e0423b96
[10] https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Btry,6f60500acbb8e13ea9a3ad684d8fade18aec088b,1,1%5D&highlightedRevisions=93e2b2eb23f0&highlightedRevisions=f226e0423b96
(In reply to Fischer [:Fischer] from comment #9)
> Created attachment 8887144 [details]
> Bug 1380291 - Add .onboarding-opened and append notification bar inside
> requestAnimationFrame
> 
> Review commit: https://reviewboard.mozilla.org/r/157550/diff/#index_header
> See other reviews: https://reviewboard.mozilla.org/r/157550/
The actual patch tested on the talos in the comment 8
(In reply to Fischer [:Fischer] from comment #8)
> The talos results[3] with the proposal in the comment 7. 

Some notes:
- we should focus on fixing the tart regression. There was no regression on OS X given comment #0, only Windows/Linux, and those seem more or less unaffected
- comment #5 suggests this is simply the inverse of the win reported for bug 1357641 comment 67. If correct, I don't think it will necessarily be all that useful to try to analyze this here, but mconley or Mossop might know more.
As I already mentioned, this performance regression just looks like the inverse of https://treeherder.mozilla.org/perf.html#/alerts?id=7570 which happened when we turned on the notification bar in the first place. That change should not have improved performance so I don't think we were seeing a real improvement here but some artifact from how the test measures things. I don't think we should spend any further time working on this regression.
based on comment12, change priority to P3.
Priority: P1 → P3
hi Fischer, 

please provide any further info if necessary.
Flags: needinfo?(fliu)
(In reply to Francis Lee [:frlee] from comment #14)
> hi Fischer, 
> 
> please provide any further info if necessary.
Yes, will do after the major onboarding tasks for 57 are completed.
Flags: needinfo?(fliu)
the improvement on July 18th is related to turning on activity stream in bug 1381569, that seems unrelated to the original changes in this patch.

Fischer is there a bug to track the major onboarding tasks?
Flags: needinfo?(fliu)
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #17)
> the improvement on July 18th is related to turning on activity stream in bug 1381569, that seems unrelated to the original changes in this patch.
> 
> Fischer is there a bug to track the major onboarding tasks?
Here it is: Bug 1354046 - [meta] Implement the OnBoarding overlay.
AFAIK, with the bug 1381569 to turn on activity stream, the about:newtab wouldn't be loaded in the main process[1].

[1] https://dxr.mozilla.org/mozilla-central/rev/7d2e89fb92331d7e4296391213c1e63db628e046/browser/components/about/AboutRedirector.cpp#223
Flags: needinfo?(fliu)
What's the status of things here? Am I right that this was a Photon-only regression and therefore doesn't affect 56 at this point? But 57 still has work to be done?
Flags: needinfo?(fliu)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> What's the status of things here? Am I right that this was a Photon-only
> regression and therefore doesn't affect 56 at this point? But 57 still has
> work to be done?
From the comment 12 and the comment 16, this may not be a real issue. To avoid the confusion, close the bug first. We could revisit it if any happened again.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(fliu)
Resolution: --- → WORKSFORME
remove whiteboard tag due to its WORKSFORME
Whiteboard: [photon-onboarding]
You need to log in before you can comment on or make changes to this bug.