Bug 1316575: 4.35% tp5o Main_RSS (osx-10-10) regression on push fab432069073 (Tue Nov 8 2016)
Status: Closed (opened 8 years ago, closed 7 years ago)
Categories: Core :: XPCOM, defect

Tracking: RESOLVED FIXED

Tracking | Status |
---|---|
firefox50 | unaffected |
firefox51 | unaffected |
firefox52 | wontfix |
firefox53 | wontfix |
firefox54 | wontfix |
People: Reporter: jmaher; Unassigned
References: Depends on 1 open bug
Keywords: perf, regression, talos-regression
Talos has detected a Firefox performance regression from push fab432069073. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:
4% tp5o Main_RSS osx-10-10 opt e10s 315003903.33 -> 328711710.23

Improvements:
4% tps summary linux64 opt e10s 43.1 -> 41.56

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=4064

On the page above you can see an alert for each affected platform, as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Comment 1 (Reporter) • 8 years ago
Here is a compare view with the retriggers, to show the effects of this change only:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=bd9dc9379305&newProject=mozilla-inbound&newRevision=fab432069073&framework=1

For the regression, we have gone bi-modal :(

:smaug, can you take a look at this and help determine what we should do to resolve this bug, as per the guidelines in the original comment?
Flags: needinfo?(bugs)
Comment 2 • 8 years ago
I was expecting regressions, but we should also see improvements in some tests measuring the stability of painting performance; I guess tps ends up sort of doing that. (Tryserver is unfortunately often too unreliable to detect Talos regressions, or we would need to trigger all the tests many, many times.)

Based on manual testing, and some feedback from others too, I'm pretty confident that we should not back out the patch. What we possibly could do is slow down vsync processing, or actually refresh driver handling, while we're loading top-level pages... except that it would make using multiple windows jankier.
Flags: needinfo?(bugs)
Comment 3 • 8 years ago
I'm pretty sure this is just the timing of GC and the memory snapshot getting changed. Maybe we should make the test framework use requestIdleCallback() somehow to wait for the browser to truly idle before trying to snapshot the memory.
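The idea could look roughly like the sketch below. This is hypothetical, not the pageloader's actual code: `snapshotMemory` stands in for the harness's real memory-collection call, and the `requestIdleCallback` fallback exists only so the sketch runs outside a browser.

```javascript
// Sketch of comment 3's idea: wait for an idle period before snapshotting
// memory, so pending work (such as a GC) has a chance to finish first.
// Fallback shim so the sketch also runs outside a browser; a real
// requestIdleCallback reports the actual idle time available.
const ric = globalThis.requestIdleCallback ??
  ((cb) => setTimeout(() => cb({ timeRemaining: () => 50, didTimeout: false }), 0));

// Hypothetical stand-in for the pageloader's real memory collection.
function snapshotMemory() {
  return { mainRss: process.memoryUsage().rss };
}

// Keep asking for idle callbacks until one arrives with enough headroom
// (or fires because of the hard timeout), then take the snapshot.
function snapshotWhenIdle(minIdleMs = 10) {
  return new Promise((resolve) => {
    const tryOnce = (deadline) => {
      if (deadline.didTimeout || deadline.timeRemaining() >= minIdleMs) {
        resolve(snapshotMemory());
      } else {
        ric(tryOnce, { timeout: 1000 });
      }
    };
    ric(tryOnce, { timeout: 1000 });
  });
}
```

The `timeout` option guards against a busy browser never going idle, which would otherwise stall the test indefinitely.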
Comment 4 • 8 years ago
Or hmm, maybe I could do the slowing down at a slightly different level. Investigating.
Comment 5 • 8 years ago
(In reply to Ben Kelly [:bkelly] from comment #3)
> I'm pretty sure this is just the timing of GC and the memory snapshot getting changed.

Why do you think so? With the new setup we can process vsync sooner, so while still loading, the page may end up doing painting more often.
Comment 6 • 8 years ago
(In reply to Olli Pettay [:smaug] from comment #5)
> (In reply to Ben Kelly [:bkelly] from comment #3)
> > I'm pretty sure this is just the timing of GC and the memory snapshot getting changed.
>
> Why do you think so? With the new setup we can process vsync sooner, so while still loading, the page may end up doing painting more often.

Well, vsync and paint running first could delay a GC until after the test decides to snapshot memory. This would look like a memory regression.
Comment 7 • 8 years ago
Hmm, right. Yeah, I'm not at all familiar with what tp5o actually does, or how the memory is measured. It doesn't seem to be documented in https://wiki.mozilla.org/Buildbot/Talos/Tests
Comment 8 • 8 years ago

I think it's here: http://searchfox.org/mozilla-central/source/testing/talos/talos/pageloader/chrome/memory.js

The critical call seems to be here: http://searchfox.org/mozilla-central/search?q=symbol:%23collectMemory&redirect=false

Interestingly, we don't force any GCs or wait for anything before the measurement.
Comment 9 (Reporter) • 8 years ago
I am open to changing that in pageloader if we need to.
Comment 10 • 8 years ago
Or we could just say that the regression is understandable, and do nothing. I would expect us to get lots of similar-ish changes to the numbers while we're adding more quantum stuff.
Comment 11 (Reporter) • 8 years ago
Would it make sense to review how we collect memory, so we have something that we can track and trust?
Comment 12 • 8 years ago
(In reply to Joel Maher ( :jmaher) from comment #11)
> would it make sense to review how we collect memory so we have something that we can track and trust?

We already know that measurement is no good on e10s (bug 1250169, comment 15). Whether or not to force a GC is an interesting question; we've seen leaks in the past that were cleaned up by a forced GC (minimize in about:memory) but weren't with just an incremental one. For AWSY we measure both.
Updated • 8 years ago
Component: Untriaged → XPCOM
Product: Firefox → Core
Updated • 8 years ago
status-firefox50: --- → unaffected
status-firefox51: --- → unaffected
status-firefox53: --- → affected
Comment 13 • 8 years ago
(In reply to Eric Rahm [:erahm] from comment #12)
> (In reply to Joel Maher ( :jmaher) from comment #11)
> > would it make sense to review how we collect memory so we have something that we can track and trust?
>
> We already know that measurement is no good on e10s (bug 1250169, comment 15).

Can you clarify a bit, Eric? Does this mean we should close this regression?
Flags: needinfo?(erahm)
Comment 14 • 7 years ago
(In reply to Andrew Overholt [:overholt] from comment #13)
> (In reply to Eric Rahm [:erahm] from comment #12)
> > (In reply to Joel Maher ( :jmaher) from comment #11)
> > > would it make sense to review how we collect memory so we have something that we can track and trust?
> >
> > We already know that measurement is no good on e10s (bug 1250169, comment 15).
>
> Can you clarify a bit, Eric? Does this mean we should close this regression?

As long as we have a test that sums the RSS(main) + USS(child processes) -- I think we do -- I'd say we can disable this test on e10s and ignore the regression.
Flags: needinfo?(erahm)
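The single combined number comment 14 describes could be computed along these lines. This is a sketch only: the sample values are made up, and `combinedMemory` is a hypothetical helper, not anything that exists in the Talos harness.

```javascript
// Combine one RSS sample from the parent process with a USS sample from
// each content process, giving the single memory number comment 14 suggests
// tracking instead of Main_RSS alone.
function combinedMemory(parentRssBytes, childUssBytes) {
  return childUssBytes.reduce((sum, uss) => sum + uss, parentRssBytes);
}

// Made-up numbers: 300 MB parent RSS plus two content processes.
console.log(combinedMemory(300e6, [80e6, 75e6])); // 455000000
```

Summing USS rather than RSS for the children avoids double-counting memory shared with the parent process.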
Comment 16 (Reporter) • 7 years ago
There are a handful of things here that confuse me; dealing with memory management in talos has always confused me. I don't believe we have proper memory reporting in talos when it comes to main/child processes, and unfortunately I am not clear on what to do to fix this inside of talos. Ideally we could get AWSY running on all platforms and track memory that way.

As to the question of closing this, the regression in the talos numbers still exists:

https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=%5Bmozilla-inbound,d9b2865ade63ca5702db9cbd1572d36051440fd5,1%5D&series=%5Bautoland,d9b2865ade63ca5702db9cbd1572d36051440fd5,0%5D&selected=%5Bmozilla-inbound,d9b2865ade63ca5702db9cbd1572d36051440fd5%5D

It would be sad to close this with no plan to fix talos or to ensure we are measuring the right numbers.
Flags: needinfo?(jmaher)
Comment 17 • 7 years ago
(In reply to Joel Maher ( :jmaher) from comment #16)
> It would be sad to close this with no plan to fix talos or ensure we are measuring the right numbers.

Can you file this and close this bug out?
Flags: needinfo?(jmaher)
Comment 19 • 7 years ago
(In reply to Jim Mathies [:jimm] from comment #17)
> Can you file this and close this bug out?

Agreed, as long as it's been clarified and there is a follow-up bug being tracked by the module owner with a priority.
Updated • 7 years ago
Updated (Reporter) • 7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED