Closed Bug 1522827 Opened 6 years ago Closed 5 years ago

9.5 - 20.38% raptor-tp6-reddit-firefox (linux64, linux64-qr, windows10-64, windows10-64-qr, windows7-32) regression on push d9cde93070b09b7f78a2d01208ed78cde55db092 (Wed Jan 23 2019)

Categories

(Core :: XPCOM, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fix-optional

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, perf-alert, regression)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a02dd51c39da142f3cbe22569483eab33581d305&tochange=d9cde93070b09b7f78a2d01208ed78cde55db092

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

20% raptor-tp6-reddit-firefox windows10-64-qr opt 854.29 -> 1,028.38
15% raptor-tp6-reddit-firefox windows7-32 opt 829.79 -> 954.27
14% raptor-tp6-reddit-firefox windows10-64 opt 876.29 -> 1,000.94
12% raptor-tp6-reddit-firefox linux64-qr opt 846.29 -> 945.33
10% raptor-tp6-reddit-firefox linux64 opt 811.89 -> 889.00

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18849

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → XPCOM
Product: Testing → Core
Flags: needinfo?(nfroyd)

This change makes zero sense to me. The commit in question only removes code that never gets executed, but somehow that code is responsible for a 10% performance hit? I don't buy it.

acreskey, I know you've been looking at raptor/performance. Do you have any suggestions as to what I should be looking for here?

Flags: needinfo?(nfroyd) → needinfo?(acreskey)

Those numbers look strange for dead code removal.
What about applying the reverse of this patch (i.e. re-adding the Scheduler code), and pushing that to try?

Flags: needinfo?(acreskey)

(In reply to Andrew Creskey from comment #3)

Those numbers look strange for dead code removal.
What about applying the reverse of this patch (i.e. re-adding the Scheduler code), and pushing that to try?

I've pushed this to try. On the Base side (left) there's the dead code removal; on the New side there's the backout of bug 1485216:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f4a3e6e29701db51dd414dd903a324fa74517195&newProject=try&newRevision=097b669e8da95cc55c04e437949ab7291f8135b8&framework=10

Thanks :igoldan
Looking into the subtests for windows10-64, it's ttfi that looks to be improved by the backout:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=f4a3e6e29701db51dd414dd903a324fa74517195&newProject=try&newRevision=097b669e8da95cc55c04e437949ab7291f8135b8&originalSignature=1822323&newSignature=1822323&framework=10

And if I have this right, this is the recent history for ttfi for that configuration:
https://treeherder.mozilla.org/perf.html#/graphs?series=try,1822326

It seems that on some days it ranges from ~2500 to ~7500 ms?

Any updates on this matter?

Flags: needinfo?(acreskey)

:igoldan, I'm not sure how I can help, but from looking at your rollback it's the ttfi subtest which is regressing.
But ttfi is such a noisy metric that I'm not sure if it's a good one to be concerned about:
https://treeherder.mozilla.org/perf.html#/graphs?series=try,1822326

All I can suggest is to get a few more runs in on that rollback.

Flags: needinfo?(acreskey)
Depends on: 1535551

(In reply to Andrew Creskey from comment #7)

:igoldan, I'm not sure how I can help, but from looking at your rollback it's the ttfi subtest which is regressing.
But ttfi is such a noisy metric that I'm not sure if it's a good one to be concerned about.

ttfi used to be a stable metric. Unfortunately, the landing of bug 1485216 caused it to become very noisy for a full week. On OSX, ttfi has remained noisy ever since.

Somewhere around January 30, a patch landed (I don't yet know which) that eliminated most of this noise on the Windows & Linux platforms. It also partially brought ttfi back down toward its previous baselines. This confirms that bug 1485216 caused a real regression.

Once we find out which bug re-stabilized ttfi, we'll have to see whether the two are somehow related.
If they're not, then we need to consider getting back the 10-20% Raptor losses.

Good investigation :igoldan, let me know if I can help.

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)

(In reply to Andrew Creskey from comment #7)

:igoldan, I'm not sure how I can help, but from looking at your rollback it's the ttfi subtest which is regressing.
But ttfi is such a noisy metric that I'm not sure if it's a good one to be concerned about.

ttfi used to be a stable metric. Unfortunately, the landing of bug 1485216 caused it to become very noisy for a full week. On OSX, ttfi has remained noisy ever since.

Somewhere around January 30, a patch landed (I don't yet know which) that eliminated most of this noise on the Windows & Linux platforms.

I looked up the patch. One of bug 1523158, bug 1521786, or bug 1506949 made the improvement.

Is any of them related to bug 1485216?

Flags: needinfo?(acreskey)

Olli Pettay says bug 1521786 influenced fcp, while bug 1506949 influenced loadtime.

Bug 1523158 has nothing to do with the Raptor tests from comment 0.

I can see how bug 1521786 could impact ttfi -- we only start looking for ttfi after fcp has occurred.
So making fcp come sooner would give us a larger window for ttfi to be detected.

I'm not 100% sure about bug 1506949, but if it affects the idle queue then I can see how it could impact ttfi, since ttfi requires 50 ms of no activity on the idle queue.
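
As a rough model of that heuristic (a sketch only, based on the 50 ms quiet-window description in this thread; names and types are illustrative, not the actual Raptor/Gecko implementation):

#include <vector>

struct Task {
  double start;  // ms relative to navigation start
  double end;    // ms relative to navigation start
};

// `tasks` is the post-navigation main-thread/idle-queue activity, sorted by
// start time.
double ApproximateTTFI(double fcp, const std::vector<Task>& tasks) {
  double candidate = fcp;  // we only start looking once fcp has occurred
  for (const Task& t : tasks) {
    if (t.end <= candidate) {
      continue;  // finished before the candidate window opened
    }
    if (t.start - candidate >= 50.0) {
      break;  // a 50 ms quiet gap fit in before this task; candidate stands
    }
    candidate = t.end;  // activity interrupted the window; retry after it
  }
  return candidate;
}

In that model, an earlier fcp (bug 1521786) opens the search window sooner, and anything that keeps the idle queue busy pushes the result later, which fits both observations above.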

See Also: → 1524189

So I'm not very familiar with the event queues, but I think that the removal of the LabeledEventQueue in bug 1485216 may have caused the instability and regression in ttfi.

We now create a single event queue for content-process main threads, but previously we created a LabeledEventQueue, which is a collection of queues.
See:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9cde93070b0#l21.87

Some description here:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9cde93070b0#l15.45

Apparently LabeledEventQueue was problematic performance-wise.
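
For anyone following along, a very rough sketch of the structural difference (illustrative types only, not the real mozilla::EventQueue / LabeledEventQueue classes):

#include <deque>
#include <functional>
#include <map>

using Runnable = std::function<void()>;
using Label = int;  // stand-in for the tab/document/DocGroup of a runnable

// After bug 1485216: the content-process main thread has a single FIFO.
struct SingleQueueSketch {
  std::deque<Runnable> mQueue;
};

// Before: a collection of queues keyed by label, so runnables belonging to
// the same tab/document stay grouped and the scheduler can decide which
// group to serve next instead of strictly interleaving everything.
struct LabeledQueueSketch {
  std::map<Label, std::deque<Runnable>> mQueues;
};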

:igoldan -- would it be useful if I reverted just the LabeledEventQueue change of the commit and tested that?

(In reply to Andrew Creskey from comment #13)

:igoldan -- would it be useful if I reverted just the LabeledEventQueue change of the commit and tested that?

Yes, let's try this out.

Ah, I didn't realize that the LabeledEventQueue depends on the Scheduler.
So I'll put them both back in and we can toggle the main thread creation logic here in different revisions to see if that's the source of the perf difference:
https://searchfox.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#226

So my idea of re-adding the LabeledEventQueue in Comment 13 was overly optimistic.
Other dependencies of that system have also been removed -- See Bug 1525031.

I think we should get the input of someone more familiar with the event queue scheduling to see if my Comment 13 makes sense.
:froydnj ?

Flags: needinfo?(acreskey) → needinfo?(nfroyd)

(In reply to Andrew Creskey from comment #16)

So my idea of re-adding the LabeledEventQueue in Comment 13 was overly optimistic.
Other dependencies of that system have also been removed -- See Bug 1525031.

I think we should get the input of someone more familiar with the event queue scheduling to see if my Comment 13 makes sense.
:froydnj ?

LabeledEventQueue was only used if certain preferences were set:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d9cde93070b0#l21.86

and we never set those preferences. Unless the test infrastructure somehow was...?

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #17)

(In reply to Andrew Creskey from comment #16)

So my idea of re-adding the LabeledEventQueue in Comment 13 was overly optimistic.
Other dependencies of that system have also been removed -- See Bug 1525031.

I think we should get the input of someone more familiar with the event queue scheduling to see if my Comment 13 makes sense.
:froydnj ?

LabeledEventQueue was only used if certain preferences were set:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d9cde93070b0#l21.86

and we never set those preferences. Unless the test infrastructure somehow was...?

As far as I can tell, the UseMultipleQueues pref was set, but the codepath is quite winding.

The pref is defaulted to true here:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9cde93070b0#l7.13

Scheduler::GetPrefs() returns this pref:
https://hg.mozilla.org/integration/mozilla-inbound/rev/784f80261f91#l2.731

And the result of Scheduler::GetPrefs() is eventually passed into Scheduler::SetPrefs():
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9cde93070b0#l3.32
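
So, as far as I can trace it, the decision boils down to something like this (a heavily simplified sketch of the removed nsThreadManager/Scheduler logic linked above, not an exact copy; the helper below just stands in for the pref plumbing):

enum class MainThreadQueueKind { Single, Labeled };

// Placeholder for the pref plumbing traced above
// (Scheduler::GetPrefs() -> Scheduler::SetPrefs()); it defaulted to true.
static bool UseMultipleQueues() { return true; }

MainThreadQueueKind PickMainThreadQueue(bool aIsContentProcess) {
  if (aIsContentProcess && UseMultipleQueues()) {
    return MainThreadQueueKind::Labeled;  // pre-bug-1485216 behavior
  }
  return MainThreadQueueKind::Single;  // behavior after the removal
}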

Ugh, ugh, ugh. So bits of the Quantum DOM work were having a performance effect. I think what we were (inadvertently?) doing before was grouping runnables related to a particular tab/document/docgroup together, and then attempting to prioritize that group before looking at other things.

I guess we should try putting bug 1525031 and (parts of) bug 1485216 back in. Or we can come up with an alternative mechanism to prioritize events based on their associated tabs.
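
To make that concrete, a sketch of the selection policy being described (my reading only, reusing the Label/Runnable aliases from the earlier sketch; this is not the actual Scheduler code):

// Illustrative only: serve the prioritized group (e.g. the tab that just
// loaded) before looking at the other groups.
Runnable GetNextEvent(std::map<Label, std::deque<Runnable>>& aQueues,
                      Label aPrioritizedLabel) {
  auto it = aQueues.find(aPrioritizedLabel);
  if (it != aQueues.end() && !it->second.empty()) {
    Runnable event = std::move(it->second.front());
    it->second.pop_front();
    return event;
  }
  for (auto& entry : aQueues) {
    if (!entry.second.empty()) {
      Runnable event = std::move(entry.second.front());
      entry.second.pop_front();
      return event;
    }
  }
  return nullptr;  // nothing pending; an empty std::function
}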

I know that ttfi is weighted equally with the other subtests in the geomean raptor tp6 scores that this regressed.
But from what I saw, the other pageload metrics did not regress.

It might be worth discussing this with an expert, e.g. :jesup, to get his opinion.

Shifting ttfi while improving the architecture might not be a bad tradeoff, particularly if scheduling in general is going to get a lot of focus in the near future.

(In reply to Andrew Creskey from comment #20)

It might be worth discussing this with an expert, e.g. :jesup, to get his opinion.

Randell, could you provide some assistance here?

Flags: needinfo?(rjesup)

I'm on PTO (sorry I missed this), but...

TTFI is very sensitive to any event that takes 50ms to run; if you push a 50+ms event to later in order to run <50ms events earlier (instead of the other way around), it will delay TTFI by that amount. So changes to ordering, especially in events that happen after "load" can have a big impact on TTFI. The question is what events are getting pushed - if they're events related to the tab that just loaded, that might be the cause. And events related to the just-loaded-tab actually are more important here for user experience - while TTFI/TTI are problematic measurements, they do try to get at an important part of user experience - when a page becomes smoothly responsive.
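
To put illustrative numbers on that (not measured values): if a single 60 ms event that would otherwise have run shortly after load gets deferred by roughly 300 ms so that a batch of sub-50 ms events can run first, the last 50+ ms task now finishes about 300 ms later, and TTFI slips by roughly that same 300 ms even though no extra work was executed.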

This is an aspect of Scheduling; I had thought per the removals referenced here that the scheduling code we had was turned off; apparently it wasn't (but is now removed).

We might need/want to reintroduce it, and as part of the ongoing scheduling work/discussions we may need to consider using it to implement some of the scheduling decisions we make. We also need to think about how this will work (and whether it can) in a Fission world, which may mean dumping the current structure and implementing some master cross-process structure (Quantum scheduling was mostly focused on making multiple tabs play better in the same processes).

NI smaug and bas for visibility for the Scheduling meetings

Flags: needinfo?(rjesup)
Flags: needinfo?(bugs)
Flags: needinfo?(bas)

It sounds like this regression was primarily related to ttfi. Due to the excessive noise and the uncertain value we were getting from the ttfi measurement, we disabled it in bug 1536874. Even if this regression were fixed now, we wouldn't see an improvement in these tests. Randell: do you think we can close this bug, or is there value in keeping it open?

Flags: needinfo?(rjesup)

There is value - this impacts the scheduling work, and while ttfi is not something we're monitoring currently, it is something (or close to something) we care about; it is an indirect (and problematic) look at a form of jank.

Flags: needinfo?(rjesup)

I'm not sure our TTFI measures jank very accurately. If the user was interacting with the page, there would be pending input events and we'd yield certain slow operations sooner (like reflow and DOM creation).

Flags: needinfo?(bugs)

At this point we're not measuring TTFI and I don't believe there are immediate plans to reintroduce it. With this in mind, are we going to be able to resolve this regression?

Closing this as WONTFIX, as we have not been running the TTFI tests for some time now.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Flags: needinfo?(bas)