Closed Bug 1336781 Opened 8 years ago Closed 8 years ago

6.09 - 23.26% tp5o responsiveness (linux64) regression on push d2758f635f72 (Thu Feb 2 2017)

Categories

(Core :: JavaScript Engine, defect)

53 Branch
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push d2758f635f72. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

23% tp5o responsiveness linux64 pgo 3.74 -> 4.61
22% tp5o responsiveness linux64 opt 4.5 -> 5.5
6% tp5o responsiveness linux64 pgo 26.76 -> 28.39

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=5004

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
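The percentages in the alert follow from the before/after scores. The sketch below is only a sanity check of that arithmetic; it is not how Talos or Perfherder computes alerts, and the helper name `percent_change` is made up for illustration. For the responsiveness metric, a higher score is the regression direction.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# (label, before, after) exactly as listed in the alert above.
reported = [
    ("tp5o responsiveness linux64 pgo", 3.74, 4.61),
    ("tp5o responsiveness linux64 opt", 4.5, 5.5),
    ("tp5o responsiveness linux64 pgo", 26.76, 28.39),
]

changes = [percent_change(before, after) for _, before, after in reported]

for (label, before, after), change in zip(reported, changes):
    print(f"{label}: {before} -> {after} ({change:.2f}%)")
```

The smallest and largest values match the 6.09 - 23.26% range in the bug title.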
:bhackett, can you help us figure this out? This is only a regression in responsiveness, but a pretty large one at that.
Flags: needinfo?(bhackett1024)
This was a large reorganizing patch that shouldn't have a measurable effect on performance anywhere. Are there similar responsiveness tests on other platforms? How did they react to this patch?
We measure this test on win7/win8 as well, and both of them look stable. Without question this points to the patch that landed here, and it affects opt as well as pgo. Sometimes pgo will have hiccups due to the pgo process, but in this case we are seeing a common trend.

e10s is 20%
non-e10s is 5%

One other possibility is that this could be related to when we do a clobber build vs. an incremental one; maybe the issue is somewhere in a prior commit.

I was unable to back out the patch without many conflicts. Pushing to try with a baseline and a backout using this try syntax could confirm or deny any questions:
./mach try -b o -p linux64 -u none -t tp5o --rebuild 6
(In reply to Joel Maher ( :jmaher) from comment #3)
> we measure this test on win7/win8 as well, both of them look stable. This
> looks to be without question the patch landed here and it affects opt as
> well as pgo. Sometimes pgo will have hiccups due to the pgo process, but in
> this case we are seeing a common trend.
>
> e10s is 20%
> non-e10s is 5%
>
> One other possibility is that this could be related to when we do a clobber
> build vs incremental, maybe the issue is somewhere in a prior commit-
>
> I was unable to backout without many conflicts, possibly pushing to try with
> a baseline and a backout using this try syntax would confirm/deny any
> questions:
> ./mach try -b o -p linux64 -u none -t tp5o --rebuild 6

I have seen bogus regressions plenty of times which did not involve PGO. How do I test this locally on my own machine?
Flags: needinfo?(bhackett1024)
We have a mach command for talos:
./mach talos-test -a tp5o
(In reply to Joel Maher ( :jmaher) from comment #5)
> we have mach command for talos:
> ./mach talos-test -a tp5o

Hmm, I built a linux x64 browser and tried to run this, but it gets stuck on "Attempting to fetch from 'https://api.pub.build.mozilla.org/tooltool/'...". Is there a place to download these tests from so that I can run them offline?
I think you have to authenticate to the service first. Can you manually log into:
https://api.pub.build.mozilla.org/login_request?next=%2Ftooltool%2F

The pageset is located and described a bit here:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5n_pages_set

If you do things manually, then mach won't necessarily work, so I would recommend logging in first. Otherwise you could run a different talos test, then manually put the tp5n.zip pageset into the virtualenv working directory and run mach again with the --no-download option.
OK, I manually downloaded the tp5n.zip file and was able to run tp5 locally on d2758f635f72 and its parent revision. On linux 64 for responsiveness I get 6.08 before and 6.62 after, a 9% regression. This seems smaller than the report in comment 0, though I don't know what the second "tp5o responsiveness linux64 pgo" line in comment 0 is for.

I guess that before looking into this further I'd like to ask a couple questions.

- How does this measurement tie into the JS engine? This measurement seems designed to test the behavior of the event loop mechanism. d2758f635f72 does not touch anything outside the JS engine, so any (unintended!) effects on behavior it will have will be inside the engine. Would looking at the distribution of times we spend inside JS requests before/after the change help in determining whether this is a bogus regression report?

- How important is this measurement? I'm not sure what to make of comment 1, and the regression I'm seeing is considerably smaller than in comment 0.
Thanks for getting this running and looking into this.

The responsiveness metric is interesting: we do not see a lot of change on it compared to other tests. In the early days we had some discrepancies in the regressions. As a rule of thumb, a regression that shows up across many tests or platforms is more important.

Typically we will see different results locally vs. in integration, primarily due to the environment. We have clean environments yet older machines:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

By default you are running in e10s mode, where we saw the 20%+ regression; it looks like you are seeing roughly 40% of that locally. It is typical to see a smaller regression locally, although this seems a bit more extreme than normal.

One idea for further investigation is to use the SPS profiling hooked up to talos. This can be done on try with syntax like so:
./mach try -b o -p linux64 -u none -t tp5o mozharness: --spsProfile

Likewise, you can run it locally by adding |--spsProfile| to the mach command line. This has helped find cases where we were tracing code paths without realizing it.

Given that the change here is quite large:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d2758f635f72
and also given that this is not affecting other tests, if there is no clear path for fixing this regression, then it would be worth considering a WONTFIX. What is odd is that we don't get other JS regressions here, only a responsiveness regression.
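To put the local-versus-automation comparison in numbers, the sketch below takes the figures quoted in this thread at face value. It assumes the local run is comparable to the linux64 opt and pgo alert lines from comment 0, which the thread does not state, and `percent_change` is a hypothetical helper rather than anything from Talos.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Local run reported in this thread: 6.08 before, 6.62 after.
local = percent_change(6.08, 6.62)

# Automation figures from comment 0. Which of them the local build
# corresponds to is an assumption; the thread does not say.
automation_opt = percent_change(4.5, 5.5)
automation_pgo = percent_change(3.74, 4.61)

# Size of the local regression as a fraction of each automation regression.
ratio_opt = local / automation_opt
ratio_pgo = local / automation_pgo

print(f"local regression:       {local:.2f}%")
print(f"vs. linux64 opt alert:  {ratio_opt:.0%} of {automation_opt:.2f}%")
print(f"vs. linux64 pgo alert:  {ratio_pgo:.0%} of {automation_pgo:.2f}%")
```

Under these assumptions the local regression is a bit under 9%, or around two fifths of what automation reported.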
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Brian, I am checking in to see whether you are stuck on anything related to this bug. I would like to get to a resolution one way or another.
Flags: needinfo?(bhackett1024)
I haven't made any progress on tracking this down. I think that given the isolated and pretty small nature of this regression and that it doesn't make sense for d2758f635f72 to affect this test and no others, it would probably be best to WONTFIX.
Flags: needinfo?(bhackett1024)
thanks for the update!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX