Closed
Bug 1419758
Opened 7 years ago
Closed 7 years ago
11.77 - 33.87% tp6_facebook / tp6_facebook_heavy (linux64, windows10-64) regression on push b81d21aaf172b4edff7ae90041d707950121dd34 (Mon Nov 20 2017)
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla59
Flag | Tracking | Status
---|---|---
firefox-esr52 | --- | unaffected
firefox57 | --- | unaffected
firefox58 | --- | unaffected
firefox59 | --- | fixed
People
(Reporter: igoldan, Assigned: jandem)
References
Details
(Keywords: perf, regression, talos-regression)
Attachments
(1 file)
Patch (1.21 KB), review+ from bbouvier
Talos has detected a Firefox performance regression from push:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=b81d21aaf172b4edff7ae90041d707950121dd34
Since you are the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
34% tp6_facebook_heavy summary windows10-64 pgo e10s 264.44 -> 354.00
30% tp6_facebook summary windows10-64 pgo e10s 264.19 -> 342.17
27% tp6_facebook_heavy summary linux64 pgo e10s 303.25 -> 385.88
23% tp6_facebook summary linux64 pgo 1_thread e10s 310.73 -> 383.42
12% tp6_facebook summary windows10-64 opt e10s 298.17 -> 333.25
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10647
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter
Updated • 7 years ago
Component: Untriaged → JavaScript Engine: JIT
Product: Firefox → Core
Reporter
Comment 1 • 7 years ago
:jandem, it looks like bug 1415853 caused some serious performance regressions on our facebook tp6 tests. Can you please investigate this issue?
Flags: needinfo?(jdemooij)
Assignee
Comment 2 • 7 years ago
I'll take a look tomorrow. I wonder if it's spawning workers :/
Assignee
Comment 3 • 7 years ago
I'll take a closer look tomorrow, but note that (some of) these graphs look very bimodal:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0
Not sure yet what to make of this.
Assignee
Comment 4 • 7 years ago
Joel, any thoughts on comment 3? I'm happy to investigate (I know it showed up on multiple platforms), but the graph looks super bimodal since the bytecode cache landed so I'm not sure if I'll find anything.
Flags: needinfo?(jmaher)
Comment 5 • 7 years ago
Bimodal data is still useful for finding patterns, but it requires a bit more flexibility and a judgement call. In this case mozilla-inbound shows a clear bimodal pattern, but it then becomes top-heavy on November 21st instead of more evenly spread out.
:igoldan, can you do two things:
1) look at a few data points prior to the regression (maybe 2 low ones and 2 high ones) and check the raw values for facebook; possibly one or more runs are just posting high while the average stays the same
2) look at 2 data points after the shift to high and see what the raw values are
I used this series:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0
If we find that we are only slightly off, then we are just straddling a bimodal issue, and we could be seeing a 1% regression instead of a 10%+ regression. I also thought this might be a weekend effect, where a lower frequency of test runs yields different results, but that doesn't seem to be the case; worth extra eyes to confirm.
If we find that all the values are much higher, then I think we have reason to assume there is a regression worth investigating. It seems as though we keep getting regressions on facebook and not as much on the other pages; possibly we need to look at why that is the case.
Flags: needinfo?(jmaher) → needinfo?(igoldan)
Assignee
Comment 6 • 7 years ago
OK so if I disable the bytecode cache, the regression seems to be more visible/clear on Try, probably because it gets rid of the bimodal results.
I'll try bisecting the changes in this patch so we know better where the problem is.
Reporter
Comment 7 • 7 years ago
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5)
> :igoldan, can you do two things:
> 1) look at a few data points prior to the regression (maybe 2 low ones and 2 high ones) and look at the raw values for facebook, possibly we are just getting one or more higher points to post high, but the average might be the same
> 2) look at 2 data points after the shift to high and see what the raw values are
>
> I used this series:
> https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1574798,1&series=autoland,20feb5a680732d65835c0e18ab64d717f1897bfb,0
I did multiple retriggers on pushes before and after. The regression point is pretty clear.
Flags: needinfo?(igoldan)
Assignee
Comment 8 • 7 years ago
I tracked this down to a small part of the patch, fortunately it's not something fundamental. It still seems pretty random and weird but I think we can work around it. I'll do some more Try pushes to see if there's a better fix.
Assignee
Comment 9 • 7 years ago
In Baseline's TryAttachCallStub, we used to attach a stub only when the callee has JIT code, but now, with the interpreter stub, we can attach a stub even if the callee doesn't have JIT code (yet).
What seems to happen is that we now allocate more template objects (we allocate them when we attach an IC stub when we're constructing) and I think that triggers a GC somehow.
Here's a workaround to attach a constructing stub only if the callee has JIT code. I hope this will fix the Talos regression.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8931660 - Flags: review?(bbouvier)
Updated • 7 years ago
status-firefox57: --- → unaffected
status-firefox58: --- → unaffected
status-firefox59: --- → affected
Comment 10 • 7 years ago
Comment on attachment 8931660 [details] [diff] [review]
Patch
Review of attachment 8931660 [details] [diff] [review]:
-----------------------------------------------------------------
Hah, it's a bit sad, but if it fixes the regression, let's do it. Thanks.
Attachment #8931660 - Flags: review?(bbouvier) → review+
Comment 11 • 7 years ago
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0393a89f8a55
Don't attach a constructing stub if the callee doesn't have JIT code yet. r=bbouvier
Comment 12 • 7 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Updated • 7 years ago
status-firefox-esr52: --- → unaffected
Comment 13 • 7 years ago
Improvements are coming from here:
== Change summary for alert #10785 (as of Tue, 28 Nov 2017 22:00:11 GMT) ==
Improvements:
25% tp6_facebook windows7-32 pgo 1_thread e10s 343.71 -> 259.38
24% tp6_facebook windows10-64 pgo e10s 343.33 -> 259.33
11% tp6_facebook linux64 opt 1_thread e10s 352.38 -> 313.46
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10785