Closed
Bug 1419758
Opened 7 years ago
Closed 7 years ago
11.77 - 33.87% tp6_facebook / tp6_facebook_heavy (linux64, windows10-64) regression on push b81d21aaf172b4edff7ae90041d707950121dd34 (Mon Nov 20 2017)
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla59
Flag | Tracking | Status
---|---|---
firefox-esr52 | --- | unaffected
firefox57 | --- | unaffected
firefox58 | --- | unaffected
firefox59 | --- | fixed
People
(Reporter: igoldan, Assigned: jandem)
References
Details
(Keywords: perf, regression, talos-regression)
Attachments
(1 file)
Patch (1.21 KB), review+ from bbouvier
Talos has detected a Firefox performance regression from push:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=b81d21aaf172b4edff7ae90041d707950121dd34
Since you are the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
34% tp6_facebook_heavy summary windows10-64 pgo e10s 264.44 -> 354.00
30% tp6_facebook summary windows10-64 pgo e10s 264.19 -> 342.17
27% tp6_facebook_heavy summary linux64 pgo e10s 303.25 -> 385.88
23% tp6_facebook summary linux64 pgo 1_thread e10s 310.73 -> 383.42
12% tp6_facebook summary windows10-64 opt e10s 298.17 -> 333.25
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10647
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter
Updated • 7 years ago
Component: Untriaged → JavaScript Engine: JIT
Product: Firefox → Core
Reporter
Comment 1 • 7 years ago
:jandem, it looks like bug 1415853 caused some serious performance regressions on our facebook tp6 tests. Can you please investigate this issue?
Flags: needinfo?(jdemooij)
Assignee
Comment 2 • 7 years ago
I'll take a look tomorrow. I wonder if it's spawning workers :/
Assignee
Comment 3 • 7 years ago
I'll take a closer look tomorrow, but note that (some of) these graphs look very bimodal:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0
Not sure yet what to make of this.
Assignee
Comment 4 • 7 years ago
Joel, any thoughts on comment 3? I'm happy to investigate (I know it showed up on multiple platforms), but the graph looks super bimodal since the bytecode cache landed so I'm not sure if I'll find anything.
Flags: needinfo?(jmaher)
Comment 5 • 7 years ago
Bimodal data is still useful for finding patterns, but it requires a bit more flexibility and a judgement call. In this case mozilla-inbound shows a clear bimodal pattern, but it then becomes top-heavy on November 21st instead of more evenly spread out.
:igoldan, can you do two things:
1) look at a few data points prior to the regression (maybe 2 low ones and 2 high ones) and check the raw values for facebook; possibly one or more runs are just posting high while the average stays the same
2) look at 2 data points after the shift to high and see what the raw values are
I used this series:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0
If we find that we are only slightly off, then we are just straddling a bimodal issue, and we could be seeing a 1% regression instead of a 10%+ regression. I also thought this might be a weekend effect, where a lower frequency of test runs yields different results, but that doesn't seem to be the case; worth extra eyes to confirm.
If we find that all the values are much higher, then I think we have reason to assume there is a regression worth investigating. It seems as though we keep getting regressions on facebook and not as much on the other pages; possibly we need to look at why that is the case.
Flags: needinfo?(jmaher) → needinfo?(igoldan)
Assignee
Comment 6 • 7 years ago
OK so if I disable the bytecode cache, the regression seems to be more visible/clear on Try, probably because it gets rid of the bimodal results.
I'll try bisecting the changes in this patch so we know better where the problem is.
Reporter
Comment 7 • 7 years ago
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5)
> :igoldan, can you do two things:
> 1) look at a few data points prior to the regression (maybe 2 low ones and 2 high ones) and look at the raw values for facebook, possibly we are just getting one or more higher points to post high, but the average might be the same
> 2) look at 2 data points after the shift to high and see what the raw values are
>
> I used this series:
> https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0
I did multiple retriggers on pushes before and after. The regression point is pretty clear.
Flags: needinfo?(igoldan)
Assignee
Comment 8 • 7 years ago
I tracked this down to a small part of the patch, fortunately it's not something fundamental. It still seems pretty random and weird but I think we can work around it. I'll do some more Try pushes to see if there's a better fix.
Assignee
Comment 9 • 7 years ago
In Baseline's TryAttachCallStub, we used to attach a stub only when the callee has JIT code, but now, with the interpreter stub, we can attach a stub even if the callee doesn't have JIT code (yet).
What seems to happen is that we now allocate more template objects (we allocate them when we attach an IC stub when we're constructing) and I think that triggers a GC somehow.
Here's a workaround to attach a constructing stub only if the callee has JIT code. I hope this will fix the Talos regression.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8931660 - Flags: review?(bbouvier)
Updated • 7 years ago
status-firefox57: --- → unaffected
status-firefox58: --- → unaffected
status-firefox59: --- → affected
Comment 10 • 7 years ago
Comment on attachment 8931660 [details] [diff] [review]
Patch
Review of attachment 8931660 [details] [diff] [review]:
-----------------------------------------------------------------
Hah, it's a bit sad, but if it fixes the regression, let's do it. Thanks.
Attachment #8931660 - Flags: review?(bbouvier) → review+
Comment 11 • 7 years ago
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0393a89f8a55
Don't attach a constructing stub if the callee doesn't have JIT code yet. r=bbouvier
Comment 12 • 7 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Updated • 7 years ago
status-firefox-esr52: --- → unaffected
Comment 13 • 7 years ago
Improvements are coming from here:
== Change summary for alert #10785 (as of Tue, 28 Nov 2017 22:00:11 GMT) ==
Improvements:
25% tp6_facebook windows7-32 pgo 1_thread e10s 343.71 -> 259.38
24% tp6_facebook windows10-64 pgo e10s 343.33 -> 259.33
11% tp6_facebook linux64 opt 1_thread e10s 352.38 -> 313.46
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10785