Closed Bug 1392969 Opened 7 years ago Closed 7 years ago

23.35% tp6_facebook (windows10-64) regression on push 049ef1d4eb2690857f88b03ea7bfeeb130ea2834 (Tue Aug 22 2017)

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox56 --- unaffected
firefox57 --- disabled
firefox58 --- affected

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=049ef1d4eb2690857f88b03ea7bfeeb130ea2834 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 23% tp6_facebook summary windows10-64 pgo e10s 258.65 -> 319.04 Improvements: 29% tp6_facebook summary windows10-64 opt e10s stylo 432.38 -> 306.17 26% tp6_facebook summary windows10-64 opt 1_thread e10s stylo409.00 -> 302.04 19% tp6_facebook summary windows10-64 opt e10s 374.88 -> 304.67 15% tp6_google summary windows10-64 opt e10s 522.12 -> 441.79 15% tp6_google summary windows10-64 opt 1_thread e10s stylo539.46 -> 458.54 14% tp6_google summary windows10-64 opt e10s stylo 538.00 -> 461.75 14% tp6_facebook summary windows7-32 opt 1_thread e10s stylo382.50 -> 329.25 13% tp6_google summary windows10-64 pgo e10s 420.04 -> 364.29 13% tp6_google summary windows10-64 pgo e10s stylo 453.04 -> 392.96 13% tp6_google summary windows7-32 opt 1_thread e10s stylo545.12 -> 474.79 13% tp6_facebook summary windows7-32 opt e10s 388.02 -> 338.58 13% tp6_google summary windows7-32 opt e10s stylo 535.56 -> 468.12 12% tp6_google summary windows7-32 opt e10s 540.17 -> 473.29 12% tp6_google summary windows7-32 pgo 1_thread e10s stylo440.67 -> 389.50 11% tp6_google summary windows10-64 pgo 1_thread e10s stylo446.38 -> 396.79 10% tp6_google summary osx-10-10 opt e10s 496.19 -> 444.25 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8921 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Even with all the huge improvements above, I still had to file this bug, due to the PGO regression noticed on Win10. It actually looks like a PGO build breakage, because the equivalent OPT build shows a 31% improvement. This bug may likely end with a WONTFIX or WORKSFORME, but I still need your thoughts on this. Nicolas, what do you think? Also, how likely is for these perf changes to be related with the other bug from the patch: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=dd0ce550bf64fa9f41c7f90329102d5c246ecaad&tochange=049ef1d4eb2690857f88b03ea7bfeeb130ea2834 namely, bug 1390863? I want to make sure I'm blocking the right bug.
Flags: needinfo?(nicolas.b.pierron)
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
The improvements are likely due to enabling the JSBC (Bug 900784) Can you look at the about:telemetry, in the histograms of the client process, which columns of the DOM_SCRIPT_ENCODING_STATUS histogram are raising after reloading the test case. They are not supposed to change at each reload, only once. If there is an encoding issue, this might be consuming more time as we keep trying to encode at each new load. I do not know how the same test case could improve on opt build and regress on pgo builds. Note that the encoding only happens when there is some idle time after the load end. Maybe the PGO build does not wait long enough for the idle callback to be triggered and the event got discarded, leaving the test case always encoding the bytecode instead of saving it. (see Bug 1372942)
Flags: needinfo?(nicolas.b.pierron)
when the root cause was backed out we saw a lot of regressions: == Change summary for alert #8949 (as of August 23 2017 10:51 UTC) == Regressions: 42% tp6_facebook summary windows10-64 opt e10s stylo 304.33 -> 431.67 35% tp6_facebook summary windows10-64 opt 1_thread e10s stylo306.08 -> 412.83 17% tp6_google summary windows10-64 opt 1_thread e10s stylo458.83 -> 537.42 17% tp6_google summary windows10-64 opt e10s 445.46 -> 520.62 15% tp6_google summary windows10-64 pgo e10s 365.17 -> 421.50 15% tp6_google summary windows7-32 opt e10s 472.42 -> 544.92 15% tp6_google summary windows10-64 opt e10s stylo 462.58 -> 532.83 14% tp6_google summary windows7-32 opt 1_thread e10s stylo476.46 -> 544.88 14% tp6_google summary windows7-32 pgo 1_thread e10s stylo388.79 -> 444.38 14% tp6_google summary windows7-32 pgo e10s 358.69 -> 409.17 14% tp6_google summary windows7-32 opt e10s stylo 471.79 -> 537.12 14% tp6_google summary windows10-64 pgo 1_thread e10s stylo395.96 -> 450.42 14% tp6_facebook summary windows7-32 opt e10s 338.81 -> 385.12 14% tp6_google summary windows7-32 pgo e10s stylo 383.75 -> 436.17 14% tp6_google summary windows10-64 pgo e10s stylo 397.04 -> 450.67 12% tp6_google summary osx-10-10 opt e10s 448.58 -> 504.17 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8949
I did some manual testing, both with Nightly and Beta on a Windows x64 computer. In both cases with a new profile with the Gecko Profiler add-on enabled and recording. Here are the Beta¹ profiles: http://bit.ly/2gkoYvU (without JSBC) http://bit.ly/2gkpNou (with JSBC) I do not have the Helper Thread profiles, but the overall timing confirms that we do save a bit of execution time, from 124ms to 114ms (under nsJSUtils::ExecutionContext), and from 641ms to 522ms between the first Navigation::Start and the Navigation::DOMComplete. Nightly results are showing similar improvements, when the JSBC (Bug 900784) is enabled. ¹ I have no idea how to check if these are pgo builds or not.
all beta builds are pgo. There are probably realistic scenarios here where the prefs we use and the hardware/os configuration all end up making results hard to replicate fully- in many cases we can, sometimes we see large differences.
(In reply to Nicolas B. Pierron [:nbp] from comment #4) > Here are the Beta¹ profiles: > […] > > ¹ I have no idea how to check if these are pgo builds or not. I can see a MOZ_PGO=1 in the Configure options under about:buildconfig. So, yes.
Priority: -- → P3
Hi Naveed, can you comment with the reason this is deferred from 57? This popped up on the Performance Report for 57 from Kevin and we'd just like it documented. Many thanks.
Flags: needinfo?(nihsanullah)
(In reply to Erin Lancaster [:elan] from comment #7) > Hi Naveed, can you comment with the reason this is deferred from 57? This > popped up on the Performance Report for 57 from Kevin and we'd just like it > documented. Many thanks. 2 reasons: - The JSBC got backed out, and would only be re-enabled in 58. - This is a false positive from Talos. (comment 4)
Flags: needinfo?(nihsanullah)
Updating the tracking flags to show that 57 has the feature disabled.
As this bug is blocking Bug 900784, and as it was shown to be a false positive (comment 4) I will close it. I guess we can open another bug if we care about fixing this false positive in Talos.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.