1237993 - 2-9% Linux 64/Win*/MacOS* sessionrestore/v8_7 regression on Mozilla-Inbound-Non-PGO on January 07, 2016 from push a0c3bdd0559b936e5b2865783411231181c84d3c

Reporter

Description

•

10 years ago

Talos has detected a Firefox performance regression from your commit a0c3bdd0559b936e5b2865783411231181c84d3c in bug 1234880. We need you to address this regression. This is a list of all known regressions and improvements related to your bug: http://alertmanager.allizom.org:8080/alerts.html?rev=a0c3bdd0559b936e5b2865783411231181c84d3c&showAll=1 On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore and: https://wiki.mozilla.org/Buildbot/Talos/Tests#V8.2C_version_7 Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p linux64,win64,win32,macosx64 -u none -t dromaeojs,other # add "mozharness: --spsProfile" to generate profile data To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e <path>/firefox -a sessionrestore:v8_7 Making a decision: As the patch author we need your feedback to help us handle this regression. *** Please let us know your plans by Tuesday, or the offending patch will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

10 years ago

looking in the original bug, this was called out as a performance regression: https://bugzilla.mozilla.org/show_bug.cgi?id=1207922#c21 We have a hit on session restore and for talos v8_7. Here are some graphs: v8 example: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,03709275d00df65103cd63ec371f699c68f62a5e,1] linux64 e10s session restore (the only session restore issue I see): https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,1]&selected=[mozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,24845,19502535] a compare view will start to be more accurate when more data comes in: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4f4ef7073bb5&newProject=mozilla-inbound&newRevision=a0c3bdd0559b&framework=1&showOnlyConfident=1 :arai, can you comment about how much of this is expected and the plan to fix this going forward?

Flags: needinfo?(arai.unmht)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

10 years ago

looking in detail at the compare view: * the v8 stuff confirms the regexp regression from AWFY * the session restore regression is real and confirmed * dromaeo css has a slight regression on windows: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4f4ef7073bb5&newProject=mozilla-inbound&newRevision=a0c3bdd0559b&framework=1&filter=win%20dromaeo&showOnlyConfident=1 (in jquery and yui for winxp e10s, win7 e10s, winxp non-e10s) the rest of the data in there is just noise

Tooru Fujisawa [:arai]

Comment 3

•

10 years ago

The regression in v8_7 is more than expected. my performance comparison there did not cover well. I expected a kind of offset, and same performance in Ion execution, but in the case tested in v8_7, Ion execution also gets slower. Will investigate it.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 4

•

10 years ago

Thanks :arai! Lets fix what we can and understand what we leave in :)

Tooru Fujisawa [:arai]

Comment 5

•

10 years ago

my performance comparison in bug 1207922 was wrong, especially, I did 'r.lastIndex = 0' even with no-flags case, and when I remove it, the performance difference in ion-execution becomes big. Then, the performance regression in v8_7 comes from following: * additional function(s) between regexp.exec(string) and inlinable function * access to global and sticky flags Tested following 2 modification to unpatched tree: A. wrap regexp_exec with self-hosted JS (before+wrap) > function RegExp_prototype_exec(s) { > return callFunction(regexp_exec, this, s); > } B. wrap regexp_exec with self-hosted JS, and get global and sticky flags (before+wrap+flags) > function RegExp_prototype_exec(s) { > var global = this.global; > var sticky = this.sticky; > > return callFunction(regexp_exec, this, s); > } v8_7 score is following before: 5614 A.before+wrap: 4095 B.before+wrap+flags: 3705 after: 3424 So, most of the performance regression comes from A, and a little another regression from B. maybe we could optimize out the global/sticky flags access with almost same way as we're going to use in bug 887016. till, how do you think we can address the regression from A?

Flags: needinfo?(till)

Tooru Fujisawa [:arai]

Comment 6

•

10 years ago

Sorry, I got it. it's because we used inlined code for RegExp.prototype.test when the return value is not used. it means we did not create match object. it boosts the performance there before bug 1207922. https://hg.mozilla.org/mozilla-central/file/66bf206c4882/js/src/jit/MCallOptimize.cpp#l180 > case InlinableNative::RegExpExec: > return CallResultEscapes(pc) ? inlineRegExpExec(callInfo) : inlineRegExpTest(callInfo); and all .exec result is not used in v8_7 regexp.js :| https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/v8_7/regexp.js?case=true&from=v8_7%2Fregexp.js#122 > for (var i = 0; i < 6511; i++) { > re0.exec(s0[i]); > } When I remove inlineRegExpTest branch there, v8 score becomes 4260. > case InlinableNative::RegExpExec: > return inlineRegExpExec(callInfo); I'll try figuring out how/if we could apply similar optimization even with self-hosted JS. Also, might be better looking into session restore first. btw, is it common to call .exec without using the return value? maybe they use RegExp.$1 instead?

Till Schneidereit [:till]

Comment 7

•

10 years ago

(In reply to Tooru Fujisawa [:arai] from comment #6) > Sorry, I got it. > it's because we used inlined code for RegExp.prototype.test when the return > value is not used. > it means we did not create match object. it boosts the performance there > before bug 1207922. > [..] > and all .exec result is not used in v8_7 regexp.js :| > [..] > When I remove inlineRegExpTest branch there, v8 score becomes 4260. The reason we added this optimization is exactly this benchmark :( However, v8_7 is a really old version of the benchmark, newer versions do use the result. I don't think we should back out a change because it regresses this particular test. In fact, I think we should remove the test from our infrastructure and not spend any time on worrying about it. Then again, badly written benchmarks pop up all the time, e.g. on sites like jsbench.com, where it is, imo, way too easy to very quickly write seemingly sensible benchmarks, that really don't test anything useful because modern JS engines are way more complex than most developers assume. (To be clear, that's not their fault, and they shouldn't need to know better.) > I'll try figuring out how/if we could apply similar optimization even with > self-hosted JS. Mainly because of the benchmarking situation, I think having something like that would be really useful, as long as it's done within a reasonable complexity budget. Something along the lines of a _ReturnValueIsUsed intrinsic. I can't really comment on how hard that'd be to implement, though, and I don't think it's of the highest priority. > Also, might be better looking into session restore first. Yes, that regression is what we should worry about. I don't know much about that test, so for all I know it might be equally invalid, but it seems likely that it's more real-world applicable. > btw, is it common to call .exec without using the return value? maybe they > use RegExp.$1 instead? No, this is just a shoddy benchmark :( The TL;DR of all this is that I think the v8_7 and SpiderMonkey regressions should be WONTFIX, but that the session restore needs investigation. NI? jandem because none of this is really my call.

Flags: needinfo?(till) → needinfo?(jdemooij)

Tooru Fujisawa [:arai]

Comment 8

•

10 years ago

while running talos sessionrestore locally, RegExp.prototype.exec() is called 20 times, and RegExp.prototype.test() is called 309 times, so all those call do not use Ion, however, it's strange that only 329 calls take extra 50ms. am I reading the data correctly? it means that the time taken by the test's single run increases from 2200ms to 2250ms, right? https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,1]

Till Schneidereit [:till]

Comment 9

•

10 years ago

(In reply to Tooru Fujisawa [:arai] from comment #8) > while running talos sessionrestore locally, RegExp.prototype.exec() is > called 20 times, and RegExp.prototype.test() is called 309 times, so all > those call do not use Ion, however, it's strange that only 329 calls take > extra 50ms. > > am I reading the data correctly? it means that the time taken by the test's > single run increases from 2200ms to 2250ms, right? Yes. And honestly, I have a hard time believing that this is a regression that has its direct reason in the RegExp changes. If you look at the graphs for the same test on other platforms, and for the same test in non-e10s, and on the same platform with pgo[1], with or without e10s: none of all those are affected. What seems most likely is that the code changes cause some perturbation in what the compiler does with the code, in a way that affects only e10s. Or something like that. [1] which is insanely helpful for this particular test. So much so that I think we should look into why that is and if we can reap some of the benefits without pgo, too.

Tooru Fujisawa [:arai]

Comment 10

•

10 years ago

The sessionrestore's performance regression reproduced locally with opt build on linux 64bit, with e10s enabled. the time increases about 40ms (2140 -> 2180), by applying the 3 changesets in bug 1207922. indeed, it does not happen on non-e10s run.

Tooru Fujisawa [:arai]

Comment 11

•

10 years ago

Attached image talos result comparoson (obsolete) — Details

locally, wrapping RegExp.prototype.exec/test with simple self-hosted function increases most of the time (about 30ms of 40ms total). added following function to before bug 1207922, and changed to use them instead of calling native function directly (D) > function RegExp_Exec(s) { > return callFunction(regexp_exec, this, s); > } > function RegExp_Test(s) { > return callFunction(regexp_test, this, s); > } especially, wrapping exec (B) increases most of the time. applying whole changes to RegExp.js (E) also increases a little from (D). this time CallResultEscapes is not relevant. removing it (G) does not increase the time.

Tooru Fujisawa [:arai]

Comment 12

•

10 years ago

RegExp.prototype.exec is called in following places: * about:neterror:324 [1] * about:neterror:326 [2] * dist/bin/browser/components/WebContentConverter.js:790 [3] * resource://gre/modules/Task.jsm:120 [4] Will try replacing them with String#match, to see what happens. [1] https://dxr.mozilla.org/mozilla-central/rev/c33f30666b37dbceffb9fbe5089a668db8893a85/browser/base/content/aboutNetError.xhtml#324 [2] https://dxr.mozilla.org/mozilla-central/rev/c33f30666b37dbceffb9fbe5089a668db8893a85/browser/base/content/aboutNetError.xhtml#326 [3] https://dxr.mozilla.org/mozilla-central/rev/c33f30666b37dbceffb9fbe5089a668db8893a85/browser/components/feeds/WebContentConverter.js#790 [4] https://dxr.mozilla.org/mozilla-central/rev/c33f30666b37dbceffb9fbe5089a668db8893a85/toolkit/modules/Task.jsm#120

Tooru Fujisawa [:arai]

Comment 13

•

10 years ago

same patch as D in comment #11 shows different result on try. https://treeherder.mozilla.org/#/jobs?repo=try&revision=31fb34652d1c&selectedJob=15264891 > sessionrestore: 2173 and from log: > 2477.0, 2207.0, 2186.0, 2171.0, 2180.0, 2173.0, 2152.0, 2187.0, 2162.0, 2167.0 they don't show any regression. so, it might be more environment specific issue. anyway, I will have to test everything on try server... :(

Tooru Fujisawa [:arai]

Comment 14

•

10 years ago

Attached image talos result comparoson — Details

so far, applying changes for RegExp.js increases the time (C). even only with exec (J) or test(I). Also, applying changes to irregexp does not change the performance (F)

Attachment #8706097 - Attachment is obsolete: true

Hannes Verschore [:h4writer]

Comment 15

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #0) > Making a decision: > As the patch author we need your feedback to help us handle this regression. > *** Please let us know your plans by Tuesday, or the offending patch will be > backed out! *** - v8_7: this benchmark is outdated and we had a optimization got triggered on v8-regexp. In octane, rightfully, the code has been changed that 'disables' this optimization. That optimization doesn't seem to give a lot of benefit anymore and is more technical debt. - ss: Is a real regression and also noticed in the original landing bug. We decided against blocking that to land this. The issue is that our Interpreter/Baseline will be a little bit slower, in order to facilitate faster IonMonkey speed. Which is our high tier compiler and generates the fastest code. Now SS is an old benchmark which was created when performance in JS didn't exist and started the performance rollercoaster we have now. Kraken and Octane were designed to specifically fix the faults SS has as a benchmark. The regression in SS is such an issue, where the code doesn't run enough to be considered important/hot (to be handled in IonMonkey). - linux64 e10s session restore: here we debated a little bit about. But it seems linux64 only? Which would be surprising given the change we introduced. We ruled against blocking because of this, since only non-pgo linux64 is affected. We assume this is timing issue, not a code issue. If other platforms would be affected we might have ruled different. This change improves code readability/maintainibility a lot. It also increases our performance on searches using "/g". Given the above statements: me and till would be against backing this out, even with the given regressions.

Flags: needinfo?(jdemooij)

Flags: needinfo?(arai.unmht)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 16

•

10 years ago

sounds great! It looks like :arai is still doing some hunting on the sessionrestore issue, thanks for working on this.

Tooru Fujisawa [:arai]

Comment 17

•

10 years ago

Attached image another talos result comparoson — Details

got some interesting result. applying same patch (wrap exec/test with self-hosted) introduces different result for following 2 cases: * before bug 1207922 (m-c 0876b808c677) + wrap exec/test [1] it does not increase the time * after bug 1207922 (m-c 6020a4cb41a7) + backout bug 1207922 + wrap exec/test [2] it increases the time so, there should be another reason than bug 1207922's direct impact, and backing it out won't help much, it should happen again near future with same underlying reason. I could keep bisecting tho, does it worth consuming server resource? if not, I'd like to close this as WONTFIX. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=31fb34652d1c [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=b40999be2334

Hannes Verschore [:h4writer]

Comment 18

•

10 years ago

Oh interesting. Maybe it is caused by the inlining heuristics in that case. In any case we can better divert time towards improvements instead of investigating here.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → WONTFIX

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 19

•

10 years ago

thanks again for looking into this!

Lawrence Mandel [:lmandel] (use needinfo)

Updated

•

10 years ago

status-firefox46: affected → wontfix

talos result comparoson 10 years ago Tooru Fujisawa [:arai] 9.30 KB, image/png		Details
talos result comparoson 10 years ago Tooru Fujisawa [:arai] 11.17 KB, image/png		Details
another talos result comparoson 10 years ago Tooru Fujisawa [:arai] 8.70 KB, image/png		Details