Closed
Bug 1134303
Opened 9 years ago
Closed 9 years ago
20% MacOS* tsvgx regression on Mozilla-Inbound (v.38) on February 17, 2015 from push 4a21032847e0
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
FIXED
People
(Reporter: mishravikas, Unassigned)
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36

Steps to reproduce:

Talos has detected a Firefox performance regression from your commit 4a21032847e0 in bug 1131759. We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=4a21032847e0&showAll=1

On the page above you can see a Talos alert for each affected platform, as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvg.2C_tsvgx

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p macosx64 -u none -t svgr  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:

talos --develop -e <path>/firefox -a tsvgx

Making a decision:

As the patch author, we need your feedback to help us handle this regression.

*** Please let us know your plans by Monday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Updated•9 years ago
Comment 1•9 years ago
:shu, can you take a look at this? We did a series of retriggers, and for the tsvgx results your push had 3/4 results >325, whereas previous pushes rarely had a value >325.
Flags: needinfo?(shu)
Summary: 20% MacOS* tsvgx regression on Mozilla-Inbound on February 17, 2015 from push 4a21032847e0 → 20% MacOS* tsvgx regression on Mozilla-Inbound (v.38) on February 17, 2015 from push 4a21032847e0
Comment 2•9 years ago
Hi Joel. I don't really know how to look into this. Nor do I know how a 1-liner fix to the JS engine can trigger SVG regressions.
Flags: needinfo?(shu)
Comment 3•9 years ago
Joel, on Linux and Windows this graph is very stable over the past 3 months (without a regression on Feb 17th), while on OS X it seems very noisy and possibly with a regression.

But TBH, I'm not sure I can see a meaningful change in the graph around Feb 17th for OS X:
http://graphs.mozilla.org/graph.html#tests=%5B%5B281,63,24%5D,%5B281,64,24%5D,%5B281,63,25%5D,%5B281,63,33%5D,%5B281,63,31%5D%5D&sel=1423152458053.4358,1424288387834,36.11105459707733,406.48142496744777&displayrange=90&datatype=geo

And if there is one, it might be due to the specific noise pattern we see on OS X for this test. Not necessarily, but possibly.

Should the supposedly offending patch affect different platforms differently?
Comment 4•9 years ago
(In reply to Avi Halachmi (:avih) from comment #3)
> Joel, on Linux and Windows this graph is very stable over the past 3 months (without a regression on Feb 17th), while on OS X it seems very noisy and possibly with a regression.
>
> But TBH, I'm not sure I can see a meaningful change in the graph around Feb 17th for OS X:
> http://graphs.mozilla.org/graph.html#tests=%5B%5B281,63,24%5D,%5B281,64,24%5D,%5B281,63,25%5D,%5B281,63,33%5D,%5B281,63,31%5D%5D&sel=1423152458053.4358,1424288387834,36.11105459707733,406.48142496744777&displayrange=90&datatype=geo
>
> And if there is one, it might be due to the specific noise pattern we see on OS X for this test. Not necessarily, but possibly.
>
> Should the supposedly offending patch affect different platforms differently?

My patch fixes a data race by making the field of a JS engine-internal data structure mozilla::Atomic. AFAIU, mozilla::Atomic is implemented via libc++ <atomic> on both OS X and Linux. (libc++ <atomic>, of course, may differ between OS X and Linux, but it's such a widely used thing...) I'm unsure about Windows.
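For readers unfamiliar with that kind of change: roughly, it turns a plain field into an atomic one. A minimal sketch under that assumption, with a hypothetical struct and field name (the actual patch is in bug 1131759):

```cpp
#include "mozilla/Atomics.h"

struct EngineState {  // hypothetical; stands in for the real JS engine struct
  // Before: `bool interruptRequested_;` read and written from multiple
  // threads without synchronization -- a data race.
  //
  // After: mozilla::Atomic, which defaults to sequentially consistent
  // ordering and lowers to libc++ <atomic> operations on OS X and Linux.
  mozilla::Atomic<bool> interruptRequested_;

  void requestInterrupt() { interruptRequested_ = true; }          // atomic store
  bool interruptRequested() const { return interruptRequested_; }  // atomic load
};
```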
Comment 5•9 years ago
https://tbpl.mozilla.org/?tree=Try&rev=36610eaaf033
Comment 6•9 years ago
https://tbpl.mozilla.org/?tree=Try&rev=c34948a8fe25
Comment 7•9 years ago
The first push is a baseline; the second push tests a backout of the two patches in the suspected revision.
Comment 8•9 years ago
Joel, could you also try backing out bug 1131759 but not bug 1133530? If these get backed out, there is nothing actionable for me. I cannot try to fix these another way: these are minimal fixes. Suggestions on how to proceed in that case?
Comment 9•9 years ago
https://tbpl.mozilla.org/?tree=Try&rev=be186d846b05 with only bug 1131759 backed out.
Comment 10•9 years ago
https://tbpl.mozilla.org/?tree=Try&rev=e2a76c9b8574 using relaxed instead of release-acquire for bug 1131759
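For context on what "relaxed instead of release-acquire" means here: mozilla::Atomic takes a memory-ordering template parameter, so the change being tested is roughly the following sketch (hypothetical flag names; Relaxed keeps each access atomic but imposes no ordering on surrounding memory operations, which can be cheaper on weakly-ordered hardware):

```cpp
#include "mozilla/Atomics.h"

// Release-acquire: a store "releases" and a load "acquires", creating a
// happens-before edge between the writing and reading threads.
mozilla::Atomic<bool, mozilla::ReleaseAcquire> flagRelAcq;

// Relaxed: only the flag accesses themselves are atomic; no ordering is
// imposed on other reads/writes around them.
mozilla::Atomic<bool, mozilla::Relaxed> flagRelaxed;
```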
Comment 11•9 years ago
(In reply to Joel Maher (:jmaher) from comment #5)
> https://tbpl.mozilla.org/?tree=Try&rev=36610eaaf033

TBPL ended up not running T(s). I re-pushed what I had in comment 10 at https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f2decfeb9c5
Comment 12•9 years ago
Pushed backout of bug 1131759 to see how Talos goes. https://hg.mozilla.org/integration/mozilla-inbound/rev/8d38dc1dcdeb
Comment 13•9 years ago
Please don't back out bug 1133530, by the way. If the backout of bug 1131759 doesn't fix it, I would rather bug 1133530 not be backed out, given the noisiness of the OS X SVG-ASAP test and the completely unrelated code in bug 1133530. I would rather have someone familiar with SVG code investigate.
Comment 14•9 years ago
So the regression is greatly reduced, and the improvement starts with the backout, but it could also be related to the push right afterwards: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=73715117edd4

Eventually I got the correct try server syntax and have a push with both patches backed out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d34d446bf5d2
I understand that is not ideal, but I wanted to see if the push was the problem.

The data coming in is showing data <325 consistently. This is a large regression, so I would like to be a bit more careful about landing this. If we can reduce it a bit more, it would be easier to justify accepting a regression for a new feature/bug fix. The call is ultimately in your hands, though.
Comment 15•9 years ago
(In reply to Joel Maher (:jmaher) from comment #14)
> The data coming in is showing data <325 consistently.

While that statement might be correct, in itself it's not an indication of a ~20% regression. IMO it would be better to assess the change by stating the average and stddev before and after. If the average changed by ~20% and the stddev is meaningfully lower than 20%, then that is indeed a clear indication of a regression. However, other combinations of these metrics will need further assessment, IMO.
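As an illustration of the suggested assessment, a minimal sketch (not part of any Talos tooling; the sample values are placeholders, to be replaced with the actual before/after tsvgx results):

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Print the sample mean and sample standard deviation of a set of runs.
static void summarize(const char* label, const std::vector<double>& runs) {
  double sum = 0.0;
  for (double r : runs) sum += r;
  double mean = sum / runs.size();

  double sqDiff = 0.0;
  for (double r : runs) sqDiff += (r - mean) * (r - mean);
  double stddev = std::sqrt(sqDiff / (runs.size() - 1));

  std::cout << label << ": mean=" << mean << " stddev=" << stddev
            << " (" << 100.0 * stddev / mean << "% of mean)\n";
}

int main() {
  // Placeholder numbers; substitute the before/after tsvgx run results.
  summarize("before", {310, 305, 312, 298, 308});
  summarize("after",  {372, 365, 380, 360, 377});
}
```

If the means differ by ~20% while each stddev is well below 20% of its mean, the change stands out clearly from the noise.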
Comment 16•9 years ago
Is there existing data to help identify which of the http://hg.mozilla.org/build/talos/file/default/talos/page_load_test/svgx tests regressed?
Comment 17•9 years ago
My own try pushes I forgot to push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee1d6a245a44 backout 1131759:
343, 313, 260, 297, 260, 313, 332, 315, 330, 277, 330, 345, 312, 340, 247, 317, 253

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6916b80fa326 backout 1133530:
339, 324, 322, 283, 338, 314, 282, 281, 338, 339, 338, 323, 322, 322, 339, 326, 328

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3313b7c6f9c8 backout both:
318, 312, 312, 315, 313, 264, 318, 313, 260, 298, 260, 313, 262, 312, 262, 298, 260, 313

So indeed, backing out both seems to reduce the regression the most. Bug 1133530 is a minimal fix for a JS crasher, though, so I don't really view backing it out as an option.
Comment 18•9 years ago
s/forgot to push/forgot to comment
Comment 19•9 years ago
Looking at the raw test files: https://treeherder.allizom.org/perf.html#/graphs?timerange=1209600&seriesList=[["mozilla-inbound","11f6e2252086eb1cd1df48f8c3f48ed188fa8614"],["mozilla-inbound","29e209f7a9f70362d45f1ec8f5942af2c231c9fd"]]

composite-scale.svg looks like a culprit, and tsvgx composite-scale-rotate.svg is suspected of changing its pattern, but it isn't clear that it is affected.

Keeping a JS crasher fix landed is a good thing!
Comment 20•9 years ago
The composite-scale* tests have absolutely no script in them. Perhaps this is a regression in the test harness?
Comment 21•9 years ago
The harness hasn't changed for at least a week. Buildbot changes a few times a week, but it would be odd for just tsvgx on OS X to regress 20%. Also, if a backout resolves most of this, then this is most likely really related to the patch.
Comment 22•9 years ago
"Related" beyond my ken. Like, icache issues?
Comment 23•9 years ago
This regression is fully fixed. If we are content with the backout, let's close this; otherwise we can leave it open to track changes when we reland the patch or a variation of it. Keep in mind that we will be uplifting on Monday, so most likely we will close this out and open a new bug if/when the time comes.
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 24•9 years ago
(In reply to Joel Maher (:jmaher) from comment #21)
> The harness hasn't changed for at least a week...

When I said "Perhaps this is a regression in the test harness?" I meant *of* the test harness.
Updated•9 years ago
Keywords: perf, regression
Whiteboard: [talos_regression]