3.23 - 9.81% tp5o responsiveness / tps (linux64, windows8-64, windowsxp) regression on push c2ffbe22285487d5443d840e4c5feaa391f66c8d (Fri Jun 10 2016)

RESOLVED FIXED in Firefox 50

Status

()

defect
P1
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: rwood, Assigned: johannh)

Tracking

({perf, regression})

Trunk
Firefox 50
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox50 fixed)

Details

(Whiteboard: [fxprivacy][talos_regression])

Attachments

(1 attachment)

Talos has detected a Firefox performance regression from push c2ffbe22285487d5443d840e4c5feaa391f66c8d. As author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:

https://treeherder.mozilla.org/perf.html#/alerts?id=1529

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:

https://wiki.mozilla.org/Buildbot/Talos/Tests#tps
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p win64,linux64,win32 -u none -t g2[Windows 8,Windows XP],tp5o-e10s[Windows 8,Windows XP] --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:

https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:

talos --develop -e [path]/firefox -a tps:tp5o

(add --e10s to run tests in e10s mode)

Making a decision:

As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Hi, we triggered jobs on try to bisect this regression:

https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&filter-searchStr=talos&tochange=9d12c50f7482530ff6e9008036836e22aee68426&fromchange=aa9c00be0b3b2aa1d46c7283f783e18d64a8cabb

We believe these regressions occurred at push c2ffbe22285487d5443d840e4c5feaa391f66c8d on fx-team:

tp5o responsiveness windowsxp opt e10s: 9.81%
tps summary linux64 opt: 3.34%
tps summary linux64 pgo: 3.23%
tps summary windows8-64 opt: 4.51%

We are still triaging performance alerts on other platforms and we will update this bug with any new information we find.
Hi Johann, as you are the patch author, if you could please have a look at this anytime next week that would be appreciated, thanks!
Flags: needinfo?(jhofmann)
We're looking into it, thanks for catching that. I'll get back to you soon!
Flags: needinfo?(jhofmann)
awesome, thanks!
Whiteboard: [talos_regression] → [fxprivacy][triage][talos_regression]
Assignee: nobody → jhofmann
Priority: -- → P1
Whiteboard: [fxprivacy][triage][talos_regression] → [fxprivacy][talos_regression]
Gijs, I'd like a fresh pair of eyes on this. Would you have time to take a look?

This is a very simple caching approach to solve the problem of loading and sorting all translations every time.

These are the results from the try push comparing completely disabled (base) to optimized (new): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dfa22ac5e59c&newProject=try&newRevision=0cf7379ea46c&framework=1&showOnlyImportant=0.

It still goes up to 1.8% slower in tps opt. Is that an acceptable regression range? The feature is (literally) quite small, but to optimize even further we'd have to go great lengths and completely change the permissions database.
from my point of view, 1.8% is probably ok- as long as we understand where it is coming from.  This looks like a good set of fixes.  I assume Gijs can weigh in as well in his viewpoint of the regression and how efficient/stable the patch looks.

Thanks for getting a fix for this!
Comment on attachment 8763810 [details]
Bug 1280485 - Don't sort permission list to improve tab load performance.

https://reviewboard.mozilla.org/r/59936/#review56862

The fix will break sorting of options when the locale changes. It feels to me like we should cache `Object.keys(gPermissionObject)` which can't change at runtime, and use that for the API call that determines whether a site has any permissions, rather than the locale-sorted list, as the order of the permissions makes no difference to that check.

It would also be... prudent... to verify that the SVG changes did not in and of themselves cause regressions, if you haven't done so already.

For the simple check of 'do we have any non-default permissions for this site', I think a separate API actually would make sense. It'd still make sense to land this caching change, but I would also want a way of just asking whether we have any specific permissions stored for a URI, so please make sure we do that in a followup bug if you don't want to tackle it in here.
Attachment #8763810 - Flags: review?(gijskruitbosch+bugs) → review-
(In reply to :Gijs Kruitbosch from comment #9)
> The fix will break sorting of options when the locale changes.

Can the locale and the translated strings ever change at runtime without a restart? I know we are moving to a model where we can do this, but unless something changed that I don't know about, it will be a long time before this will be possible. There are also other places where we cache strings.

Honestly I'd not worry too much if it's the OS locale that is changing (like collation rules), if strings are staying in the same language... and I'd exclude runtime locale changes triggered by add-ons in this picture.

> It feels to
> me like we should cache `Object.keys(gPermissionObject)` which can't change
> at runtime, and use that for the API call that determines whether a site has
> any permissions, rather than the locale-sorted list, as the order of the
> permissions makes no difference to that check.

Note that we'll use the same code path very soon to display a series of icons representing the permissions that have been blocked, and I guess we'd like to keep the same order as the control center if there is more than one icon (although that may not be fundamental).

Maybe we should just avoid sorting by locale at all in the control center?

Actually, there is no particular reason to do so at the cost of a performance or code complexity hit. In most cases, only one or two permissions will be listed. The sorting may be an artifact of the previous design where all permissions were supposed to be listed at the same time in a detail view in a subpanel.
Not sorting at all sounds like an excellent plan. Thanks Paolo!
(In reply to :Gijs Kruitbosch from comment #9)
> Comment on attachment 8763810 [details]
> Bug 1280485 - Cache permission list to improve tab load performance.
> 
> https://reviewboard.mozilla.org/r/59936/#review56862
> 
> It would also be... prudent... to verify that the SVG changes did not in and
> of themselves cause regressions, if you haven't done so already.
> 

Yes, I didn't do that. I'll try to think of a way to test that.

> For the simple check of 'do we have any non-default permissions for this
> site', I think a separate API actually would make sense. It'd still make
> sense to land this caching change, but I would also want a way of just
> asking whether we have any specific permissions stored for a URI, so please
> make sure we do that in a followup bug if you don't want to tackle it in
> here.

That's a good idea. Let me look into it!
(In reply to :Paolo Amadini from comment #10)
> (In reply to :Gijs Kruitbosch from comment #9)
> > The fix will break sorting of options when the locale changes.
> 
> Can the locale and the translated strings ever change at runtime without a
> restart? I know we are moving to a model where we can do this, but unless
> something changed that I don't know about, it will be a long time before
> this will be possible. There are also other places where we cache strings.

The locale can change at runtime, yes. It's just a pref. There are add-ons to make the experience slightly better, as there are APIs that 'should' be usable to make this work. In any case, all of this goes away if we just remove the sorting.
Iteration: --- → 50.2
Status: NEW → ASSIGNED
Flags: qe-verify-
Comment on attachment 8763810 [details]
Bug 1280485 - Don't sort permission list to improve tab load performance.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59936/diff/1-2/
Attachment #8763810 - Attachment description: Bug 1280485 - Cache permission list to improve tab load performance. → Bug 1280485 - Don't sort permission list to improve tab load performance.
Attachment #8763810 - Flags: review- → review?(gijskruitbosch+bugs)
Comment on attachment 8763810 [details]
Bug 1280485 - Don't sort permission list to improve tab load performance.

https://reviewboard.mozilla.org/r/59936/#review57076

r=me with the below addressed. Can you file a followup for the permissions API if you haven't already? Thanks!

::: browser/modules/SitePermissions.jsm:255
(Diff revision 2)
>    "pointerLock": {
>      exactHostMatch: true
>    }
>  };
> +
> +var gPermissionIDs = Object.keys(gPermissionObject);

Nit: no var, use let or even const (in which case `kPermissionIDs`) as this doesn't change.
Attachment #8763810 - Flags: review?(gijskruitbosch+bugs) → review+
Some try's for this:

This patch vs. central
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=08bc2c836ec9&newProject=try&newRevision=428cee814832&framework=1&showOnlyImportant=0

This patch vs. the original changes commented out
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dfa22ac5e59c&newProject=try&newRevision=428cee814832&framework=1&showOnlyImportant=0

This patch + reverted SVG & CSS vs. central
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=08bc2c836ec9&newProject=try&newRevision=e60da1979168&framework=1&showOnlyImportant=0

This patch + reverted SVG & CSS vs. the original changes commented out
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dfa22ac5e59c&newProject=try&newRevision=e60da1979168&framework=1&showOnlyImportant=0

I'm not 100% sure what do make of this, it seems like the SVG has a small impact but that might just be noise. Might be just because it's a bit bigger now, but guessing the cause for these slight differences is really hard.

We can definitely see that the optimization has some impact.
Comment on attachment 8763810 [details]
Bug 1280485 - Don't sort permission list to improve tab load performance.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59936/diff/2-3/
After check-in, is there any way to verify that this had some positive effect?
Keywords: checkin-needed
When the change merges to central I always validate that it has the desired effect or some effect and comment in the bug.
Ok, great!
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4005a1e876dd
Don't sort permission list to improve tab load performance. r=Gijs
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/4005a1e876dd
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 50
hey, I see alerts for the fix coming in and verified on the graphs this looks great:
https://treeherder.mozilla.org/perf.html#/alerts?id=1589
This regression patch c2ffbe222854 has been merged into Aurora, so please make sure to uplift any fixes to Aurora, thank you.

https://treeherder.mozilla.org/perf.html#/alerts?id=2185

  tps summary linux64 pgo - 7.79% worse
  tps summary linux64 pgo e10s - 4.21% worse 
  tps summary osx-10-10 opt - 26.26% worse
  tps summary windows7-32 pgo - 5.89% worse
  tps summary windows7-32 pgo e10s - 10.64% worse
  tps summary windows8-64 pgo - 9.99% worse
  tps summary windows8-64 pgo e10s - 8.57% worse
  tps summary windowsxp pgo - 12.92% worse
(In reply to Alison Shiue from comment #25)
> This regression patch c2ffbe222854 has been merged into Aurora, so please
> make sure to uplift any fixes to Aurora, thank you.

The original regression should have already been fixed by changeset c3370c98e4ed.
thanks for mentioning this Paolo!  I looked deeper into this and found out this is related to bug 1284350 instead.  Sorry for the confusion- it isn't always straightforward on how to match a regression with a missing improvement, etc.
You need to log in before you can comment on or make changes to this bug.