Closed Bug 1302347 Opened 5 years ago Closed 5 years ago

7.74 - 8.46% tart (windows7-32, windows8-64) regression on push 32c103f71eff47445929cc08ce0947b9b5c11ab6 (Tue Sep 13 2016)

Categories

(Firefox :: Untriaged, defect)

51 Branch
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox51 --- affected

People

(Reporter: ashiue, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push 32c103f71eff47445929cc08ce0947b9b5c11ab6. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  tart summary windows7-32 opt     6.06 -> 6.57
  8%  tart summary windows8-64 opt     5.43 -> 5.85


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=3078

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Whaat, again. I did run the patch via try.
As I mentioned to Olli in person, I think the issue here is that this patch assumes we have a few free ms in every frame, because we only need a new frame every 15ms, but TART does not have any limit on how often it produces frames, so any additional overhead is a regression. Note that the GC already does this, but it was hooked into this mechanism before TART landed, so there was no regression. Either we need some way to detect this test is running, or come up with a smarter way to decide on the budget (eg in this case we have 0ms left over so we wouldn't bother running it) or we will just need to accept the regression.
I think I'll add a pref check for layout.frame_rate and if it is 0 or so, skip running GC/CC slices.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #4)
> I think I'll add a pref check for layout.frame_rate and if it is 0 or so,
> skip running GC/CC slices.

Please don't. See next.

(In reply to Andrew McCreight [:mccr8] from comment #3)
> Either we need some way to detect this test is running

That doesn't sound like a great idea to me. We don't want the code to run in a special way just for the test, both due to non-required complexity of the code, but also to be able to test "real" code as much as possible.

> or come up with a smarter way to decide on the budget

If you think there's a smarter way to decide on the budget, then this sounds like a good thing to have regardless of this test.

The case where there's no budget left is not unique to this or other tests - it actually does happen in the wild too, depending on the page the user is at, on the system, etc.

If you think the tradeoff here is both expected and fair, and the test results represent this tradeoff reasonably, just say so and we'll take the regression.

Your call.
(In reply to Avi Halachmi (:avih) from comment #5)
> The case where there's no budget left is not unique to this or other tests -
> it actually does happen in the wild too, depending on the page the user is
> at, on the system, etc.

Also, TART actually explicitly take such issues into account, and leaves the browser idle for ~1000ms IIRC between animations. The animations themselves are typically ~200ms, so the browser is idle (animation/test wise) for the majority of the test run.
but it is totally non-realistic to effectively spin event loop all the time for refreshdriver.
GC/CC expect that frame budget larger, in practice 16ms.
So, if we take out also GC slices from this picture, we should get TART which gives more accurate information about frame speed without some randomly kicking GC slices.
So, I think not running GC/CC slices here is reasonable.
We may want to have different test where refresh driver ticks normally and also keep GC/CC slices working there.
(In reply to Olli Pettay [:smaug] from comment #7)
> but it is totally non-realistic to effectively spin event loop all the time
> for refreshdriver.
> GC/CC expect that frame budget larger, in practice 16ms.
> So, if we take out also GC slices from this picture, we should get TART
> which gives more accurate information about frame speed without some
> randomly kicking GC slices.

As far as I understand, ASAP mode, while probably not without fault, is the only way we can measure performance changes, and is as close as we can get to real world cases where we do run out of frame budget because the page is heavy enough or the system is weak enough.

We've discussed this on IRC, and decided to take this regression (and also measure the GC/CC slices), and if at some stage we notice that the test became less useful or more noisy due to that, then @smaug will go ahead and exclude those while running the test.
ok, landing the patch again and marking this wontfix, per discussion with Avih.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
thank you for gettting to a resolution so fast!
You need to log in before you can comment on or make changes to this bug.