Closed Bug 1302647 Opened 6 years ago Closed 6 years ago

4.81 - 6.18% damp (linux64, osx-10-10, windows8-64) regression on push a71db5e5d89b7523049f6264451f0cda592f841d (Tue Sep 13 2016)

Categories

(Firefox :: Untriaged, defect)

Version: 51 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX
Tracking Status
firefox48 --- unaffected
firefox49 --- unaffected
firefox50 --- unaffected
firefox51 --- affected

People

(Reporter: ashiue, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push a71db5e5d89b7523049f6264451f0cda592f841d. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

  6%  damp summary osx-10-10 opt     293.59 -> 311.74
  5%  damp summary linux64 opt       294.95 -> 310.05
  5%  damp summary windows8-64 opt   298.24 -> 312.58
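The percentages above are the relative change from the old summary score to the new one; a quick standalone sketch (not part of the Talos tooling) of that arithmetic:

```python
def pct_change(old, new):
    """Relative regression in percent; positive means the new score is slower."""
    return (new - old) / old * 100.0

# Old -> new damp summary scores from the alert above.
scores = [
    ("osx-10-10", 293.59, 311.74),
    ("linux64", 294.95, 310.05),
    ("windows8-64", 298.24, 312.58),
]
for platform, old, new in scores:
    print(f"{platform}: {pct_change(old, new):.2f}%")
```

The results match the 4.81 - 6.18% range quoted in the bug title.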


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=3096

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
After doing some retriggers, it looks like this issue might be caused by the following push:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=1f293c131736fe5e9ce86c27954ca9364a9a6a6d&tochange=a71db5e5d89b7523049f6264451f0cda592f841d

Hi James, since you are the patch author, can you take a look at this and determine the root cause? Thanks!
Blocks: 1300861, 1291351
Flags: needinfo?(jlong)
Looking at the subtests I don't understand why this would be affecting any non-debugger probes:

* windows: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=1f293c131736fe5e9ce86c27954ca9364a9a6a6d&newProject=fx-team&newRevision=a71db5e5d89b7523049f6264451f0cda592f841d&originalSignature=01c744b65cb2d891e759a1d86b83fcf59b2f3325&newSignature=01c744b65cb2d891e759a1d86b83fcf59b2f3325&framework=1
* osx: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=1f293c131736fe5e9ce86c27954ca9364a9a6a6d&newProject=fx-team&newRevision=a71db5e5d89b7523049f6264451f0cda592f841d&originalSignature=2443f35ea4dcddc22273f07bad4be47e2d43d4a0&newSignature=2443f35ea4dcddc22273f07bad4be47e2d43d4a0&framework=1
* linux: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=1f293c131736fe5e9ce86c27954ca9364a9a6a6d&newProject=fx-team&newRevision=a71db5e5d89b7523049f6264451f0cda592f841d&originalSignature=6fb4fded0ef2af3f1e6945d9ecfc12d9f0a50ad1&newSignature=6fb4fded0ef2af3f1e6945d9ecfc12d9f0a50ad1&framework=1

I think any of those must be noise within the test, since switching to the new frontend should have no effect on how quickly the toolbox and other tools open.

We should only be concerned about regressions in the debugger probes (simple.jsdebugger.close.DAMP, simple.jsdebugger.open.DAMP, simple.jsdebugger.reload.DAMP, complicated.jsdebugger.close.DAMP, complicated.jsdebugger.open.DAMP, complicated.jsdebugger.reload.DAMP). It'd be worth investigating startup and teardown time a bit within the debugger to see if there's any low-hanging fruit there. I don't want to back this out, though, since the new frontend is only enabled on Nightly and it's at an early stage of gathering feedback.
What's also very weird is that `complicated.jsdebugger.close` and `complicated.jsdebugger.reload` didn't change at all. Considering those should be more complicated (and slower), you'd expect to see even more of a difference there.

Honestly, I've found damp too difficult to get useful information out of for devtools. This is a completely different debugger, so I'm not sure we should worry about 50ms differences. We should focus a sprint on performance and see what we can do, but we're not going to strive to hit the old numbers exactly. There are improvements to be made, I'm sure, but we'll probably focus a sprint on them later.

We don't really do much on startup. We're definitely loading a lot more JS up-front, so that may be a little slower, but I'm not really sure what we would optimize. And we don't do anything on teardown; we just let the GC eat us up.
Flags: needinfo?(jlong)
I think it is reasonable to accept this and do a small sprint twice a year to boost perf. In fact, that is the model we have come to expect for many tests.

:jlongster, :bgrins, would a small sprint later in Q4 (maybe hawaii) be a reasonable thing?  I could draft up differences in numbers over the last 6 months (subtests specifically) as a starting point.
(In reply to Joel Maher ( :jmaher) from comment #4)
> I think it is reasonable to accept this and do a small sprint twice/year to
> boost perf.  In fact that is a model we have expected for many tests.
> 
> :jlongster, :bgrins, would a small sprint later in Q4 (maybe hawaii) be a
> reasonable thing?  I could draft up differences in numbers over the last 6
> months (subtests specifically) as a starting point.

A focused sprint on perf would be great. I think we'd best be able to focus on it outside of the work week, and I know the debugger team has been doing 2-week sprints. Maybe this could be targeted for right after the project is stable enough to be turned on in Dev Edition.

Any kind of extra info we can get on subtests would be helpful - as I mentioned in Comment 2 I don't understand why some subtests are showing regressions in this case.  Could this be an issue with the damp code, or is this sort of variability / spillover between subtests expected?
OK, give me a heads-up when this becomes an item on a 2-week sprint and I can make sure to prepare for it.

I think there is expected spillover between subtests; we should analyze raw values before/after to see. This is why we load each subtest 25 times: it reduces the amount of odd GC/CC behaviour straddling different subtests.
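To illustrate the point about repeated loads (a hypothetical sketch, not the actual DAMP harness; the noise model and numbers are invented), taking the median of many cycles damps occasional GC/CC spikes that would dominate a single measurement:

```python
import random
import statistics

def one_sample(true_ms, noise_ms, rng):
    # One simulated subtest timing: true cost, Gaussian jitter, and an
    # occasional large spike standing in for GC/CC work straddling the test.
    spike = 5 * noise_ms if rng.random() < 0.1 else 0.0
    return true_ms + rng.gauss(0, noise_ms) + spike

def estimate(true_ms, cycles, rng, noise_ms=10.0):
    # Harness-style summary: median over `cycles` repeated loads.
    return statistics.median(one_sample(true_ms, noise_ms, rng) for _ in range(cycles))

rng = random.Random(1302647)  # seeded for reproducibility
singles = [estimate(300.0, 1, rng) for _ in range(200)]
medians = [estimate(300.0, 25, rng) for _ in range(200)]
print(f"spread of single runs:    {statistics.pstdev(singles):.1f} ms")
print(f"spread of 25-run medians: {statistics.pstdev(medians):.1f} ms")
```

The spread of the 25-cycle medians comes out much tighter than the spread of single runs, which is why a spike that lands in one subtest shows up as noise rather than a step change.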
Marking as wontfix for now since there are no actionable work items on this; looking forward to a focused effort to hack on perf in the future.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX