Closed Bug 1138999 Opened 9 years ago Closed 8 years ago

4-8% Linux*/Win*/MacOS* most tests! regression on Mozilla-Inbound-Non-PGO (v.39) on March 02, 2015 from push a1a89ff4ee31

Product/Component: Testing :: Talos
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED WONTFIX
Reporter: mishravikas
Assignee: Unassigned
Keywords: perf, regression
Whiteboard: [talos_regression]

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36

Steps to reproduce:


Talos has detected a Firefox performance regression from your commit a1a89ff4ee31 in bug 762449.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=a1a89ff4ee31&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux,win64,win32,macosx64 -u none -t other  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a sessionrestore_no_auto_restore

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Friday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Blocks: 1138995
Hey Glandium! It has been a couple of months since your changes last tickled the Talos tests. Can you comment on this? It is quite a sweeping set of regressions; maybe there is something we can do to minimize it? Or maybe it is worth keeping?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mh+mozilla)
Summary: 4-8% Linux*/Win*/MacOS* tcanvasmark/cart/tpaint/tsvgr_opacity/tsvgx/sessionrestore/sessionrestore_no_auto_restore/tp5o_scroll/tresize/tart/tp5o/tp5o/tp5o/tp5o/ts_paint/a11yr regression on Mozilla-Inbound-Non-PGO on March 02, 2015 from push a1a89ff4ee31 → 4-8% Linux*/Win*/MacOS* all tests! regression on Mozilla-Inbound-Non-PGO (v.39) on March 02, 2015 from push a1a89ff4ee31
A few notes:
- This is not set to ride the trains yet. So the regression will stay on nightly. Presumably, regressions from other things can still be detected and would lead to the same regressions without bug 762449. That is, I don't expect the change of allocator to be affecting the detection of other regressions.
- So far, I haven't received as many regression notices for the PGO counterparts, which actually matter more. As far as try can tell me, tweaking the optimization level for jemalloc on non-PGO builds does reduce the regression. Is it worth doing? Good question, I don't know. At least, it's possibly worth doing on mac, because those builds are not PGOed, but otoh, mac is not affected (and sadly, the one test that regressed on mac doesn't run on try, but it's not a speed test).
- I'm testing various things to try to make the regression go away, or at least be less severe. I don't know how that will end, but at least, keeping jemalloc3 on makes it possible to find bugs such as bug 1138705.
Flags: needinfo?(mh+mozilla)
> Summary: 4-8% Linux*/Win*/MacOS* tcanvasmark/cart/tpaint/tsvgr_opacity/tsvgx/sessionrestore/sessionrestore_no_auto_restore/tp5o_scroll/tresize/tart/tp5o/tp5o/tp5o/tp5o/ts_paint/a11yr regression on Mozilla-Inbound-Non-PGO on March 02, 2015 from push a1a89ff4ee31 → 4-8% Linux*/Win*/MacOS* all tests! 

Note this is far from being all tests.
Thanks for the feedback, glandium! I agree that PGO is the most important, as that is what we ship.
Summary: 4-8% Linux*/Win*/MacOS* all tests! regression on Mozilla-Inbound-Non-PGO (v.39) on March 02, 2015 from push a1a89ff4ee31 → 4-8% Linux*/Win*/MacOS* most tests! regression on Mozilla-Inbound-Non-PGO (v.39) on March 02, 2015 from push a1a89ff4ee31
Note that I *did* receive PGO regression notifications, but far fewer.
(In reply to Vikas Mishra [:mishravikas] from comment #6)
> We have more pgo regressions:

You mean less. 14 vs. 30-something.
(In reply to Mike Hommey [:glandium] from comment #7)
> (In reply to Vikas Mishra [:mishravikas] from comment #6)
> > We have more pgo regressions:
> 
> You mean less. 14 vs. 30-something.

I'm sorry if my comment wasn't clear. I meant that we have some additional PGO regressions, on top of the non-PGO ones posted in the first comment.
Keywords: perf, regression
Whiteboard: [talos_regression]
:glandium, what is the plan with this?  How will we ensure this doesn't ride the trains?  Is there work being done to fix this?
Flags: needinfo?(mh+mozilla)
The code that enables jemalloc3 in configure.in is in an if test -n "$NIGHTLY_BUILD" block, which means it only happens when the version number is *a1, so it won't ride the trains.
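As a rough illustration of the gating described above, here is a minimal shell sketch. It is not the actual configure.in code: the function name select_allocator, the variable name MOZ_JEMALLOC3, and the printed allocator labels are assumptions made for the example. Only the if test -n "$NIGHTLY_BUILD" condition comes from the comment.

```shell
# Hypothetical stand-in for the configure.in logic described above.
# select_allocator, MOZ_JEMALLOC3 and the echoed labels are illustrative names.
select_allocator() {
  NIGHTLY_BUILD="$1"
  MOZ_JEMALLOC3=

  # jemalloc3 is only switched on when NIGHTLY_BUILD is non-empty.
  if test -n "$NIGHTLY_BUILD"; then
    MOZ_JEMALLOC3=1
  fi

  if test -n "$MOZ_JEMALLOC3"; then
    echo "jemalloc3"
  else
    echo "mozjemalloc"
  fi
}

select_allocator 1    # Nightly-style build
select_allocator ""   # any other channel
```

With this shape, a build where NIGHTLY_BUILD is unset never takes the jemalloc3 branch, which is why the regression is expected to stay on nightly.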
Flags: needinfo?(mh+mozilla)
Depends on: 1141079
can we resolve this?
Do you prefer it resolved? I was thinking of keeping it as a reminder of what landing bug 762449 does (as this bug blocks it).
It doesn't matter either way; let's leave it open!
Blocks: 1201802
We don't have PGO results yet, do we?
If I remember correctly (really hard to find now), jemalloc 3 was a great win on AWFY. It improved a lot of tests. But jemalloc 4 didn't improve or regress anything there. Why? Were the improvements due to a bug in jemalloc 3, a trade-off not taken in jemalloc 4, or something else? (Posting here since this is about regressions/improvements due to jemalloc.)
No longer blocks: 1138995
No longer blocks: 1194348
The latest alerts are here:
http://alertmanager.allizom.org:8080/alerts.html?rev=949d31ea7ce7529eb3316f7094d51ff56e099c82&showAll=1&testIndex=0&platIndex=0

Fewer alerts overall, though a few seem higher. I broke Talos for 5 pushes whilst landing jemalloc4, and I am not 100% sure whether any of this is related to the few other pushes in there.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX