Closed Bug 1504139 Opened 6 years ago Closed 3 years ago

2.74 - 4.78% about_preferences_basic (linux64, osx-10-10, windows7-32) regression on push 4548dcc2c7f64eae5d4cbcee16e700f05c31a129 (Thu Nov 1 2018)

Categories

(Toolkit :: UI Widgets, defect, P2)

defect

Tracking

()

RESOLVED WONTFIX
Performance Impact none
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox63 --- unaffected
firefox64 --- unaffected
firefox65 - wontfix
firefox66 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fix-optional

People

(Reporter: igoldan, Assigned: mconley)

References

Details

(4 keywords)

Attachments

(1 obsolete file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=4548dcc2c7f64eae5d4cbcee16e700f05c31a129

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  about_preferences_basic windows7-32 opt e10s stylo     130.52 -> 136.76
  4%  about_preferences_basic windows7-32 pgo e10s stylo     120.69 -> 125.78
  3%  about_preferences_basic osx-10-10 opt e10s stylo       197.16 -> 202.67
  3%  about_preferences_basic linux64 opt e10s stylo         143.33 -> 147.26


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=17325

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Component: General → XUL Widgets
Product: Testing → Toolkit
Flags: needinfo?(vporof)
Brian, thoughts?
Flags: needinfo?(vporof) → needinfo?(bgrinstead)
As I mentioned in bug 1499423, uses of <treecol> within <listheader> should be replaced with something else. This might mitigate the regression, but I'm not sure it will be enough to entirely cancel it, since we still have other uses within <treecols> (bug 1446367, bug 1446360, bug 1446369, and also bug 1446369 which I hope we don't have to spend too much time on).

If there is no better way, we might just have to fix these bugs to remove the trees from Preferences.
I looked at the 'after' profiles on windows 7 and the weird thing is that I never see more than 1ms when filtering on 'tree' in JS. I can take a closer look.
Huh, so even when I comment out the updateAttributes function (https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=da0e6a1e8bb50874dad7b1a6f8364ceecbc0e6d8&newProject=try&newRevision=1103d843b1ef119c77acec5580613626d9113d4f&framework=1&showOnlyImportant=1) or even the entire custom element implementation (https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=da0e6a1e8bb50874dad7b1a6f8364ceecbc0e6d8&newProject=try&newRevision=8bc60daee202a9555807f694ffa041f9e1914c4f&framework=1&showOnlyImportant=1) I'm not seeing any important changes to about_preferences_basic.

If I uncheck "Show Important Changes" on the last one (entire CE impl commented out), I do see some ~2% improvements but they are low confidence.

I'll do one more check where we stop extending from MozBaseControl.
For future reference, I've been pushing with `./mach try fuzzy -q '!pgo !qr !msvc !linux /opt-talos-chrome-e10s' --rebuild 6`
Apparently having it extend MozXULElement instead of MozBaseControl doesn't help either.. https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=da0e6a1e8bb50874dad7b1a6f8364ceecbc0e6d8&newProject=try&newRevision=1ad3de607565df6da6bd222bd2c403b4648cf239&framework=1&showOnlyImportant=1.

Will try completely not loading tree.js which should certainly be faster, right?
I have a patch that loads tree.js but it does nothing else: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=da0e6a1e8bb50874dad7b1a6f8364ceecbc0e6d8&newProject=try&newRevision=1f391f053fa9b4691f4ff91bed029d0fd6de6dae&framework=1&showOnlyImportant=1. That doesn't seem to fix the about preferences regression (and somehow is causing a strong tresze regression which makes no sense).
If loading the file is causing a problem I would have expected to see it show up on other tests, since this is loaded in all chrome XUL docs.
Now I'm really confused. If I move the treecol custom element directly into customElements.js instead of loading tree.js separately I see a _regression_: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=da0e6a1e8bb50874dad7b1a6f8364ceecbc0e6d8&newProject=try&newRevision=c18ea34537a7e8dd6fa5608a92b6a26158bf6055&framework=1&showOnlyImportant=1.

The fact that _not_ loading tree.js fixes it (Comment 8), while loading an empty tree.js doesn't fix it (Comment 9) implies to me that loading the extra script was hurting us. So, the fact that moving the CE impl out of tree.js and into customElements.js doesn't fix it is bizarre.
Priority: -- → P2
I just did another push that injects an empty Custom Element in tree.js since it seems so strange that it wouldn't fix the regression: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=da0e6a1e8bb50874dad7b1a6f8364ceecbc0e6d8&newProject=try&newRevision=9830645bf043c01fb80d96f14d90c5c823e6f1b1&framework=1&showOnlyImportant=1. Given this and Comment 9 (where loading tree.js but with no code inside also doesn't fix it), I don't know what to do next.

Mike, do you have any ideas? This was introduced when we started landing parts of the tree widget as Custom Element (loaded via tree.js here: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/toolkit/content/customElements.js#299)

There's a regression here that's fixed by not loading tree.js at all (Comment 8), but when loading tree.js as an empty file it doesn't go away (Comment 9). I'm wondering if we are somehow hitting a bad case with script caching.
Flags: needinfo?(bgrinstead) → needinfo?(mconley)
Sorry for the delay. Looking at this today.
I've never worked with this particular Talos test, so I dug into how it works, and here are my findings:

1. It appears to load the following pages in this order:

about:preferences
about:blank
about:preferences#search
about:blank
about:preferences#privacy
about:blank
about:preferences#sync
about:blank
about:preferences#home

2. When we run with profiling, each profile contains a _single_ page load. Some of those are for about:blank, which are useless.

3. Looking at the alert that igoldan linked to in comment 0, looking at the subtests, apparently only some of the pageloads are reporting regressions:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=7131a22bb5e73218aa244df81feb14704ac0481d&newProject=mozilla-inbound&newRevision=4548dcc2c7f64eae5d4cbcee16e700f05c31a129&originalSignature=377faa822f4779c094f55a8c5ee0960df43f25da&newSignature=377faa822f4779c094f55a8c5ee0960df43f25da&framework=1

4. Given 1, 2, and 3, this means that chances are there are only a few profiles that we care about here. We can discard the about:blank ones. Let's choose the about:preferences#sync one arbitrarily:

Good: https://perfht.ml/2zU6AAh
Bad: https://perfht.ml/2zWezge

My analysis is going to focus on these profiles moving forward.
Looking at the profile markers, the "Good" profile reports a non-blank paint for about:preferences#sync at 137ms from the start of the page load.

The "Bad" profile reports a non-blank paint for about:preferences#sync at 150ms.

So this is good - we appear to have captured _some_ kind of difference in these profiles. At least from these two snapshots, the difference is about 13ms, or about a single frame.
Looking at the very end of the two profiles, I _think_ I see where the time is going.

Here are the two profiles, but only the samples after the DOMContentLoaded event fires:

Good: https://perfht.ml/2zU7bBV
Bad: https://perfht.ml/2zRVonE

In the "Good" case, the paint occurs before a cluster of load events and the pageshow event. In the "Bad" case, the paint occurs after.

I suspect what's happening is that there's a tiny cliff here, where enough delay is introduced by the change that we end up painting for the next frame's vsync.

Does this regression go away if we use "ASAP" mode? ASAP mode causes us to paint ASAP after the DOM has been dirtied, without waiting for vsync. ASAP mode can be enabled by setting:

preferences = {'layout.frame_rate': 0 }

For the test. Looking through the rest of the tests, I seem to see these other two preferences usually set along with it:

'docshell.event_starvation_delay_hint': 1,
'dom.send_after_paint_to_content': False

I'm not sure if they're necessary.

At any rate, does the regression go away if we use ASAP mode?
Flags: needinfo?(mconley) → needinfo?(bgrinstead)
(In reply to Mike Conley (:mconley) (:⚙️) from comment #16)
> Looking at the very end of the two profiles, I _think_ I see where the time
> is going.
> 
> Here are the two profiles, but only the samples after the DOMContentLoaded
> event fires:
> 
> Good: https://perfht.ml/2zU7bBV
> Bad: https://perfht.ml/2zRVonE
> 
> In the "Good" case, the paint occurs before a cluster of load events and the
> pageshow event. In the "Bad" case, the paint occurs after.
> 
> I suspect what's happening is that there's a tiny cliff here, where enough
> delay is introduced by the change that we end up painting for the next
> frame's vsync.
> 
> Does this regression go away if we use "ASAP" mode? ASAP mode causes us to
> paint ASAP after the DOM has been dirtied, without waiting for vsync. ASAP
> mode can be enabled by setting:
> 
> preferences = {'layout.frame_rate': 0 }
> 
> For the test. Looking through the rest of the tests, I seem to see these
> other two preferences usually set along with it:
> 
> 'docshell.event_starvation_delay_hint': 1,
> 'dom.send_after_paint_to_content': False
> 
> I'm not sure if they're necessary.
> 
> At any rate, does the regression go away if we use ASAP mode?

Sure enough, with the ASAP mode prefs set we see 5-7% improvements on win/linux (OSX builds still pending): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=137208d58b5983654b5ab5bde9530d65904f648b&newProject=try&newRevision=cc9d7d5fc182926dbcae7aea347d61f7172bfd5f&framework=1&showOnlyImportant=1. This is actually a bit larger than the initial regression triggered from Bug 1499423.

Is this something we should aim to actually land, or just a step along the way for debugging what to do next?
Flags: needinfo?(bgrinstead) → needinfo?(mconley)
Assignee: nobody → mconley
(In reply to Brian Grinstead [:bgrins] from comment #17)
> Is this something we should aim to actually land, or just a step along the
> way for debugging what to do next?

Yeah, imo, this is something we should consider landing. Without ASAP mode, we risk falling into these 16ms traps, and detecting real regressions and improvements gets harder.

So yes, I support switching this test to using ASAP mode. Frankly, any test that is measuring how long it takes to paint should probably be using ASAP mode to avoid the 16ms coarseness.
Flags: needinfo?(mconley)
Mike / Brian, can one of you take the reins to land the switch to ASAP mode?
Flags: needinfo?(mconley)
Flags: needinfo?(bgrinstead)
I think this can be un-tracked and wontfix for 65. The fix here will be entirely in the test harness and shouldn't need an uplift.
Flags: needinfo?(bgrinstead)
Mike, you mentioned in Comment 18 also switching to ASAP for other tests that wait for paint. Do you have a list of those tests?

(In reply to Brian Grinstead [:bgrins] from comment #21)

Mike, you mentioned in Comment 18 also switching to ASAP for other tests
that wait for paint. Do you have a list of those tests?

Mike, can you provide this list?

I can. Note that comment 18 is just my opinion - you'll probably want to double-check with the people who tend to work on those tests before making any changes to their underlying assumptions.

I think this is the list:

tabpaint
tp5n
tp5o (all variants)
tpaint
tps
tresize
tsvg (all variants - though you might want to double-check with someone from Graphics on that)
tscrollx
ts_paint (all variants)

Flags: needinfo?(mconley)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #23)

I can. Note that comment 18 is just my opinion - you'll probably want to double-check with the people who tend to work on those tests before making any changes to their underlying assumptions.

I think this is the list:

tabpaint
tp5n
tp5o (all variants)
tpaint
tps
tresize
tsvg (all variants - though you might want to double-check with someone from Graphics on that)
tscrollx
ts_paint (all variants)

:rwood, :jmaher are you ok with updating these tests to use the ASAP mode?

Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #24)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #23)

I can. Note that comment 18 is just my opinion - you'll probably want to double-check with the people who tend to work on those tests before making any changes to their underlying assumptions.

I think this is the list:

tabpaint
tp5n
tp5o (all variants)
tpaint
tps
tresize
tsvg (all variants - though you might want to double-check with someone from Graphics on that)
tscrollx
ts_paint (all variants)

:rwood, :jmaher are you ok with updating these tests to use the ASAP mode?

I'm fine with that if dev has deemed that mode is required. I am not the author of these tests though, the original test authors/developers would need to confirm if that is ok.

Flags: needinfo?(rwood)

great question and I am glad to see progress here. We do switch ts_paint (and related startup tests) every year, it is due for another update.

As for tp5n (this is really xperf where we look for file IO) there might not be a need. tp5o is on the way out the door to be fully replaced by tp6 inside of raptor- we are waiting for the responsiveness metric to be working, probaby in Q2.

talos does have good support for ASAP mode, on the wiki:
https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

we have ownership information if there are questions.

Flags: needinfo?(jmaher)

:bgrins could we land the ASAP mode only for our about_preference_basic test? I'd rather file a separate bug asking the owners of the other tests whether they're ok with also switching to this mode.

Flags: needinfo?(bgrinstead)

Our best guess as to what caused this regression is that we hit a tiny cliff,
where enough delay was introduced by the change that we ended up painting
for the next frame's vsync. ASAP mode causes us to paint ASAP after the DOM
has been dirtied, without waiting for vsync.

Pushed by bgrinstead@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59e2f15b38f5
Enable ASAP mode in about_preferences_basic;r=jaws
Flags: needinfo?(bgrinstead)

== Change summary for alert #19146 (as of Mon, 04 Feb 2019 18:54:04 GMT) ==

Improvements:

18% about_preferences_basic linux64 pgo e10s stylo 117.26 -> 96.35
17% about_preferences_basic linux64 opt e10s stylo 127.18 -> 104.98
16% about_preferences_basic linux64-qr opt e10s stylo 128.67 -> 108.10
16% about_preferences_basic windows7-32 pgo e10s stylo 99.23 -> 83.52
15% about_preferences_basic windows10-64 pgo e10s stylo 101.03 -> 85.78
15% about_preferences_basic windows10-64 opt e10s stylo 109.28 -> 93.33
14% about_preferences_basic windows7-32 opt e10s stylo 107.50 -> 91.97
13% about_preferences_basic windows10-64-qr opt e10s stylo 108.59 -> 94.07

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19146

Backout regression:

== Change summary for alert #19159 (as of Tue, 05 Feb 2019 14:04:56 GMT) ==

Regressions:

29% about_preferences_basic linux64 pgo e10s stylo 95.19 -> 122.47
20% about_preferences_basic windows7-32 pgo e10s stylo 83.58 -> 100.65
20% about_preferences_basic linux64-qr opt e10s stylo 107.74 -> 129.38
19% about_preferences_basic linux64 opt e10s stylo 104.58 -> 124.40
16% about_preferences_basic windows7-32 opt e10s stylo 91.67 -> 106.77
16% about_preferences_basic windows10-64 opt e10s stylo 94.00 -> 109.30
16% about_preferences_basic windows10-64 pgo e10s stylo 86.03 -> 99.98
15% about_preferences_basic windows10-64-qr opt e10s stylo 94.31 -> 108.08

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19159

Brian, how's this looking for 66?

Flags: needinfo?(bgrinstead)

(In reply to Tim Spurway [:tspurway] from comment #33)

Brian, how's this looking for 66?

There's no need to uplift this, it's a test-only change and not affecting performance for users.

Flags: needinfo?(bgrinstead)

:jaws could you look over the failure reasons for our perf test? We need to update it, so we store correct metrics.

Flags: needinfo?(jaws)
Keywords: stalled
Whiteboard: qf
Whiteboard: qf → [qf]

comment 34 suggests this is not a qf-relevant bug. --> [qf-]

Whiteboard: [qf] → [qf-]

I'm going through some old perf regression bugs, and this one seems like it stalled a little over a year ago after a patch was backed out for causing failures. There are a couple of open needinfo's, but I'm adding a new one for :bgrinstead as the author of the patch in case that getes us moving again.

Flags: needinfo?(bgrinstead)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #37)

I'm going through some old perf regression bugs, and this one seems like it stalled a little over a year ago after a patch was backed out for causing failures. There are a couple of open needinfo's, but I'm adding a new one for :bgrinstead as the author of the patch in case that getes us moving again.

I just checked and adding 'dom.send_after_paint_to_content': False as other ASAP tests do seems to get past the hang locally, though I don't understand why. Try is closed atm but I'll check later to confirm and see if it can be re-pushed.

(In reply to Brian Grinstead [:bgrins] from comment #38)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #37)

I'm going through some old perf regression bugs, and this one seems like it stalled a little over a year ago after a patch was backed out for causing failures. There are a couple of open needinfo's, but I'm adding a new one for :bgrinstead as the author of the patch in case that getes us moving again.

I just checked and adding 'dom.send_after_paint_to_content': False as other ASAP tests do seems to get past the hang locally, though I don't understand why. Try is closed atm but I'll check later to confirm and see if it can be re-pushed.

False alarm - those tests still fail in automation when changing that preference (https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b0b58ac2a30d0352de2806c56652bd0141d3704&selectedTaskRun=YLXWaE3jRACSlE9coW0UIA.0). I also tried setting tploadnocache = True which I noticed other ASAP tests do and it failed. Ditto with changing both of them in the same patch.

Given that this isn't affecting users, I'm not planning to spend more time debugging this. I'll suggest we wontfix it, but if Jared or Mike would like to keep it open for further investigation that's fine with me too.

Flags: needinfo?(bgrinstead)
Attachment #9041238 - Attachment is obsolete: true

Ionut, the data for this regression is 2 years old, whch means the graph is not available anymore. Could you look into and eventually update the status of the bug?

Flags: needinfo?(igoldan)

I'm inclined to close this as WONTFIX, as long as Mike or Jared are OK with this. Too much time has passed without this issue being addressed.

Flags: needinfo?(igoldan)
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mconley)
Flags: needinfo?(jaws)
Resolution: --- → WONTFIX

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Performance Impact: --- → -
Whiteboard: [qf-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: