Closed Bug 1419758 Opened 7 years ago Closed 7 years ago

11.77 - 33.87% tp6_facebook / tp6_facebook_heavy (linux64, windows10-64) regression on push b81d21aaf172b4edff7ae90041d707950121dd34 (Mon Nov 20 2017)

Categories

(Core :: JavaScript Engine: JIT, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- unaffected
firefox59 --- fixed

People

(Reporter: igoldan, Assigned: jandem)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=b81d21aaf172b4edff7ae90041d707950121dd34

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 34% tp6_facebook_heavy summary windows10-64 pgo e10s     264.44 -> 354.00
 30% tp6_facebook summary windows10-64 pgo e10s           264.19 -> 342.17
 27% tp6_facebook_heavy summary linux64 pgo e10s          303.25 -> 385.88
 23% tp6_facebook summary linux64 pgo 1_thread e10s       310.73 -> 383.42
 12% tp6_facebook summary windows10-64 opt e10s           298.17 -> 333.25

You can find links to graphs and comparison views for each of the above tests at:
https://treeherder.mozilla.org/perf.html#/alerts?id=10647

On the page above you can see an alert for each affected platform, as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see:
https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → JavaScript Engine: JIT
Product: Firefox → Core
:jandem It looks like bug 1415853 caused some serious performance regressions on our facebook tp6 tests. Can you please investigate this issue?
Flags: needinfo?(jdemooij)
I'll take a look tomorrow. I wonder if it's spawning workers :/
I'll take a closer look tomorrow, but note that (some of) these graphs look very bimodal: https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0 Not sure yet what to make of this.
Joel, any thoughts on comment 3? I'm happy to investigate (I know it showed up on multiple platforms), but the graph looks super bimodal since the bytecode cache landed so I'm not sure if I'll find anything.
Flags: needinfo?(jmaher)
Bi-modal data is still useful for finding patterns, but it does require a bit more flexibility and a judgement call. In this case on mozilla-inbound we show a clear bimodal pattern, but then it becomes top-heavy on November 21st instead of more evenly spread out.

:igoldan, can you do two things:
1) Look at a few data points prior to the regression (maybe 2 low ones and 2 high ones) and look at the raw values for facebook; possibly we are just getting one or more higher points posting high, but the average might be the same.
2) Look at 2 data points after the shift to high and see what the raw values are.

I used this series: https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0

If we find that the values are just slightly off, then we are merely straddling a bimodal issue and could be looking at a 1% regression instead of a 10%+ regression. I also thought this might be related to a weekend case, where a lower frequency of test runs yields different results, but that doesn't seem to be the case; worth extra eyes to confirm. If we find that all the values are much higher, then I think we have reason to assume there is a regression worth investigating.

It also seems as though we keep getting regressions on facebook and not as much on the other pages; possibly we need to look at why that is the case.
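The check described in the two steps above boils down to comparing the mean of a handful of raw replicates before the suspect push against the mean after it. A minimal sketch of that comparison, with placeholder numbers standing in for the real tp6_facebook values from the retriggered jobs (none of this is part of any Mozilla tooling):

```cpp
// Compare pre-push and post-push Talos replicates: if the means are close,
// the alert is likely just straddling a bimodal distribution; if the "after"
// mean is clearly higher, the regression is real.
#include <cstdio>
#include <numeric>
#include <vector>

static double Mean(const std::vector<double>& values) {
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
}

int main() {
    // Illustrative values only; real ones would be read off the graph above.
    std::vector<double> before = {265.0, 268.5, 340.2, 343.9};  // bimodal: lows and highs
    std::vector<double> after  = {352.4, 349.7, 355.1, 351.8};  // consistently high

    double beforeMean = Mean(before);
    double afterMean = Mean(after);
    std::printf("before: %.2f  after: %.2f  delta: %+.1f%%\n",
                beforeMean, afterMean, (afterMean / beforeMean - 1.0) * 100.0);
    return 0;
}
```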
Flags: needinfo?(jmaher) → needinfo?(igoldan)
OK so if I disable the bytecode cache, the regression seems to be more visible/clear on Try, probably because it gets rid of the bimodal results. I'll try bisecting the changes in this patch so we know better where the problem is.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5) > :igoldan, can you do two things: > 1) look at a few data points prior to the regression (maybe 2 low ones and 2 > high ones) and look at the raw values for facebook, possibly we are just > getting one or more higher points to post high, but the average might be the > same > 2) look at 2 data points after the shift to high and see what the raw values > are > > I used this series: > https://treeherder.mozilla.org/perf.html#/ > graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland, > 20feb5a680732d65835c0e18ab64d717f1897bfb,0 I did multiple retriggers on pushes before and after. The regression point is pretty clear.
Flags: needinfo?(igoldan)
I tracked this down to a small part of the patch, fortunately it's not something fundamental. It still seems pretty random and weird but I think we can work around it. I'll do some more Try pushes to see if there's a better fix.
Attached patch Patch (Splinter Review)
In Baseline's TryAttachCallStub, we used to attach a stub when the callee has JIT code, but now with the interpreter stub we can attach a stub even if the callee doesn't have JIT code (yet). What seems to happen is that we now allocate more template objects (we allocate them when we attach an IC stub when we're constructing) and I think that triggers a GC somehow. Here's a workaround to attach a constructing stub only if the callee has JIT code. I hope this will fix the Talos regression.
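For readers less familiar with the Baseline ICs, here is a minimal, self-contained sketch of the decision the workaround introduces. The types and names below are simplified stand-ins, not SpiderMonkey's actual API; the real change lives in TryAttachCallStub.

```cpp
// Sketch of the workaround: only attach a constructing IC stub (which
// allocates a template object) once the callee already has JIT code,
// restoring the pre-interpreter-stub behavior described above.
#include <cstdio>

struct Callee {
    bool hasJitCode;  // has this function been compiled by the JIT yet?
};

// Decide whether Baseline may attach a call IC stub for this callee.
bool MayAttachCallStub(const Callee& callee, bool isConstructing) {
    if (isConstructing && !callee.hasJitCode) {
        return false;  // defer until the callee warms up and gets JIT code
    }
    return true;  // plain calls, or constructors with JIT code, are fine
}

int main() {
    Callee coldCtor{false};
    Callee warmCtor{true};
    std::printf("cold constructor: %d\n", MayAttachCallStub(coldCtor, true));   // 0
    std::printf("warm constructor: %d\n", MayAttachCallStub(warmCtor, true));   // 1
    std::printf("cold plain call:  %d\n", MayAttachCallStub(coldCtor, false));  // 1
    return 0;
}
```

The point of deferring is to keep the number of template-object allocations (and the GC pressure they appear to cause) close to what it was before the interpreter stub landed.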
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8931660 - Flags: review?(bbouvier)
Comment on attachment 8931660 [details] [diff] [review]
Patch

Review of attachment 8931660 [details] [diff] [review]:
-----------------------------------------------------------------

Hah, it's a bit sad, but if it fixes the regression, let's do it. Thanks.
Attachment #8931660 - Flags: review?(bbouvier) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0393a89f8a55
Don't attach a constructing stub if the callee doesn't have JIT code yet. r=bbouvier
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Improvements are coming from here:

== Change summary for alert #10785 (as of Tue, 28 Nov 2017 22:00:11 GMT) ==

Improvements:

 25% tp6_facebook windows7-32 pgo 1_thread e10s  343.71 -> 259.38
 24% tp6_facebook windows10-64 pgo e10s          343.33 -> 259.33
 11% tp6_facebook linux64 opt 1_thread e10s      352.38 -> 313.46

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10785