2.73 - 11.13% a11yr / tps (linux64, osx-10-10, windows7-32, windows8-64, windowsxp) regression on push 24d97dc514d4b8fe608e9aee7527615c1e613606 (Tue Nov 8 2016)

RESOLVED WONTFIX

Status

()

defect
RESOLVED WONTFIX
3 years ago
3 months ago

People

(Reporter: ashiue, Assigned: bkelly)

Tracking

({perf, regression, talos-regression})

52 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 affected)

Details

Reporter

Description

3 years ago
Talos has detected a Firefox performance regression from push 24d97dc514d4b8fe608e9aee7527615c1e613606. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 11%  a11yr summary osx-10-10 opt e10s       415.75 -> 462.02
  9%  a11yr summary linux64 opt e10s         734.29 -> 802.02
  9%  a11yr summary windowsxp opt e10s       741.4 -> 804.7
  8%  a11yr summary windows7-32 opt e10s     772.9 -> 838.31
  6%  a11yr summary windows8-64 opt e10s     609.14 -> 644.72
  5%  tps summary linux64 opt                59.36 -> 62.07
  4%  tps summary osx-10-10 opt              64.67 -> 67.58
  4%  tps summary windows7-32 opt            70.18 -> 72.97
  3%  tps summary windows8-64 opt            64.79 -> 66.55


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=4020

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter

Comment 1

3 years ago
I did a lot of retriggers and found this issue might caused by some of following patches:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d19cffeae1febc6d669a024340de25228e544475&tochange=24d97dc514d4b8fe608e9aee7527615c1e613606

Hi Ben, as you are the patch author, can you take a look at this and determine what is the root cause? Thanks!
Blocks: 1302124, 1300659
Flags: needinfo?(bkelly)
Two thoughts here:

1) Its possible we are triggering the back pressure mechanism in talos.  This will cause us to suspend the window while the main thread catches up. Obviously this would have a big impact on talos test results.
2) Timers in the tests and pages could just be getting delayed due to a busy main thread.  By design we are causing timers to yield to the main thread more in order to keep the browser and content interactive.

If the problem is (1), then we can adjust the back pressure threshold.  I did not expect to hit it during talos.

If the problem is (2), then I'm not sure we want to take any action.  While the overall time may have increased, the change should be reducing jank during load.  That seems preferable to me.

I'll run some tests to see what is happening.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
So, it seems talos has a large number of setTimeout() calls inside its framework.  I count about 12 in here:

https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/pageloader/chrome/pageloader.js

Since we now explicitly cause timers to yield the main thread we have changed the framework itself.

Joel, what do you think about changing talos to use nsITimer directly instead of the window's setTimeout()?  This would avoid any interaction with the content setTimeout().
Flags: needinfo?(jmaher)
Also, a11yr dhtml talos test dubiously uses a setTimeout() at the end of its work before measuring the completion time:

https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/a11y/dhtml.html#39
I would have no problem if pageloader used nsiTimer vs setTimeout- pageloader is long overdue to be re-written, in fact the code is probably coming on 10 years old.

As for the a11y test specifically, we should coordinate with the a11y team if we want to change that.

Ben, if you want to make a patch, it should work on try and I can ensure it gets signed/deployed properly.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) from comment #5)
> As for the a11y test specifically, we should coordinate with the a11y team
> if we want to change that.

I think the test just wants a "next tick" call.  We should probably expose something from pageloader to do that since we don't reliably provide an interface for content to do that.
Interesting, the a11yr regression appears to be due to removing the timer coalescing.  I think what was happening previously was:

1) A timer was waiting to run
2) The test body executes and triggers a bunch of events/runnables which go on the main thread.
3) Final test setTimeout() is queued at time zero.
4) Timer from (1) runs and coalesces the timer from (3) as well.  This causes (3) to run before the events from (2).

I think this was almost certainly wrong for the test.  Why was it using a setTimeout() if not to measure the cost of executing the test?  Getting to run out of order was probably artificially improving its numbers.

Alternatively, if the test does not want to wait for the events/runnable from the test it could just call tprecord() without using setTimeout().

David, do you know who on the a11y team might know about the intent of this test here?
Flags: needinfo?(dbolter)
Note, the talos framework timeouts were unaffected by my patches because they are executed from a chrome window.
(In reply to Ben Kelly [:bkelly] from comment #7)

> David, do you know who on the a11y team might know about the intent of this
> test here?

Probably me, but I don't recall details and unfortunately I didn't document much in the original bug 572486. My quick guess is that I wanted to test our a11y event queue coalescence performance and the flush + setTimeOut was the best (?) hack at the time. I'm very willing to believe there is a better way these days.

I'm not at all familiar with "timer coalescing"...

Alex is this something you could help with?
Flags: needinfo?(dbolter) → needinfo?(surkov.alexander)
(In reply to Ben Kelly [:bkelly] from comment #7)
> Interesting, the a11yr regression appears to be due to removing the timer
> coalescing.  I think what was happening previously was:
> 
> 1) A timer was waiting to run
> 2) The test body executes and triggers a bunch of events/runnables which go
> on the main thread.
> 3) Final test setTimeout() is queued at time zero.
> 4) Timer from (1) runs and coalesces the timer from (3) as well.  This
> causes (3) to run before the events from (2).
> 
> I think this was almost certainly wrong for the test.  Why was it using a
> setTimeout() if not to measure the cost of executing the test?  Getting to
> run out of order was probably artificially improving its numbers.

then we should be in a good shape now, we indeed want to make measuring after the main thread finishes all computations.
Flags: needinfo?(surkov.alexander)
(In reply to alexander :surkov from comment #10)
> then we should be in a good shape now, we indeed want to make measuring
> after the main thread finishes all computations.

Awesome, thanks for confirming!  I think the a11yr regression is WONTFIX then.

I'm going to look at the tps regression now.
Sorry for the wall of text that follows.  I'd like to WONTFIX this, but I think it requires some explanation.

I believe what is happening in the tps regression is:

1) My patches make consecutive timers yield the main thread.
2) This reduces our max pause for timers, but also spreads them out over time a bit.
3) As a result we are more likely to have a timer running or about to run when the tab switch is triggered.
4) In non-e10s we cannot interrupt js for tab switching, so we see a mean regression in tab switch time.
5) We also, however, see the max time for a tab switch drop a bit.

I combined the samples for all sub-tests across multiple runs from treeherder and got these stats:

		Old		New
Min		38.0		38.0
Mean		61.8		64.1
Median		60.0		62.0
Max		137.0		132.0
Std Dev		12.1		11.9
Total		92745.0		96169.0
T Test				0.000000

This data is consistent with the theory above.  The mean and median have regressed by about 2ms.  The max pause, however, has dropped by 5ms.  The t-test shows the mean change is statistically significant, although its hard to say about the max pause.

This only regresses in non-e10s, however.

In e10s we would expect the yielding to produce the same timer spreading and conflicts, but e10s tab switching can interrupt js.  So we would not expect to see the same regression.

Here are some e10s stats I generated locally (again, combining all tps subtest samples):

		Old		New
Min		17.0		13.0
Mean		39.9		36.5
Median		37.0		34.0
Max		92.0		99.0
Std Dev		11.3		11.8
Total		9978.0		9118.0
T Test				0.000001

This shows that e10s does not regress.  In fact, it shows a statistically significant improvement in mean for e10s tab switch time.  However, I don't see this improvement in the perfherder results for the e10s retriggers:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=d19cffeae1fe&newProject=mozilla-inbound&newRevision=24d97dc514d4b8fe608e9aee7527615c1e613606&framework=1&showOnlyImportant=0

At the very least, though, I think we can say that it does not regress like non-e10s.

With all that in mind, I'd like to recommend we WONTFIX this regression.  My reasoning is:

1) Reducing max pauses is beneficial for overall responsiveness in the browser.  At the extreme end of the upside both the browser and content remain responsive even with exponential setTimeout usage.  This benefit is achieved in both e10s and non-e10s.
2) The regression only affects non-e10s.  We are targeting the majority of our perf work at e10s as the future of the platform architecture.
3) The regression is relatively small.  Its 2ms that averages ~60ms, but can vary anywhere from 30ms up to 100+ms.
4) The mitigations I could put in place for this would complicate a very complex piece of code already and reduce the other benefits we see from yielding timers.

Mike, what do you think?
Flags: needinfo?(mconley)
Component: Untriaged → DOM
Product: Firefox → Core
(In reply to Ben Kelly [:bkelly] from comment #12)
> 
> Mike, what do you think?

I think this is some solid reasoning. Thanks for taking the time to put this together!

This is a case where the benefits of this patch clearly outweigh any drawbacks. The fact that it advantages e10s tab switching is particularly compelling for me.

Let's WONTFIX this. Thanks for the fine investigative work, bkelly!
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mconley)
Resolution: --- → WONTFIX
Note, we ended up addressing these talos results as a side effect of the mitigations in bug 1342854.  I still don't fully believe the perf impact is legit, though, based on comment 12.
See Also: → 1342854
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.