Closed Bug 1237993 Opened 8 years ago Closed 8 years ago

2-9% Linux 64/Win*/MacOS* sessionrestore/v8_7 regression on Mozilla-Inbound-Non-PGO on January 07, 2016 from push a0c3bdd0559b936e5b2865783411231181c84d3c

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox46 --- wontfix

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(2 files, 1 obsolete file)

Talos has detected a Firefox performance regression from your commit a0c3bdd0559b936e5b2865783411231181c84d3c in bug 1234880.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=a0c3bdd0559b936e5b2865783411231181c84d3c&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore

and:
https://wiki.mozilla.org/Buildbot/Talos/Tests#V8.2C_version_7


Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64,win64,win32,macosx64 -u none -t dromaeojs,other  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a sessionrestore:v8_7

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
looking in the original bug, this was called out as a performance regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1207922#c21

We have a hit on session restore and for talos v8_7.  Here are some graphs:
v8 example:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,03709275d00df65103cd63ec371f699c68f62a5e,1]

linux64 e10s session restore (the only session restore issue I see):
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,1]&selected=[mozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,24845,19502535]


a compare view will start to be more accurate when more data comes in:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4f4ef7073bb5&newProject=mozilla-inbound&newRevision=a0c3bdd0559b&framework=1&showOnlyConfident=1

:arai, can you comment about how much of this is expected and the plan to fix this going forward?
Flags: needinfo?(arai.unmht)
looking in detail at the compare view:
* the v8 stuff confirms the regexp regression from AWFY
* the session restore regression is real and confirmed 
* dromaeo css has a slight regression on windows: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4f4ef7073bb5&newProject=mozilla-inbound&newRevision=a0c3bdd0559b&framework=1&filter=win%20dromaeo&showOnlyConfident=1 (in jquery and yui for winxp e10s, win7 e10s, winxp non-e10s)

the rest of the data in there is just noise
The regression in v8_7 is more than expected.  my performance comparison there did not cover well.
I expected a kind of offset, and same performance in Ion execution, but in the case tested in v8_7, Ion execution also gets slower.
Will investigate it.
Thanks :arai!  Lets fix what we can and understand what we leave in :)
my performance comparison in bug 1207922 was wrong, especially, I did 'r.lastIndex = 0' even with no-flags case, and when I remove it, the performance difference in ion-execution becomes big.


Then, the performance regression in v8_7 comes from following:
  * additional function(s) between regexp.exec(string) and inlinable function
  * access to global and sticky flags

Tested following 2 modification to unpatched tree:

A. wrap regexp_exec with self-hosted JS (before+wrap)

> function RegExp_prototype_exec(s) {
>     return callFunction(regexp_exec, this, s);
> }


B. wrap regexp_exec with self-hosted JS, and get global and sticky flags (before+wrap+flags)

> function RegExp_prototype_exec(s) {
>     var global = this.global;
>     var sticky = this.sticky;
> 
>     return callFunction(regexp_exec, this, s);
> }


v8_7 score is following

  before:                5614
  A.before+wrap:         4095
  B.before+wrap+flags:   3705
  after:                 3424

So, most of the performance regression comes from A, and a little another regression from B.
maybe we could optimize out the global/sticky flags access with almost same way as we're going to use in bug 887016.

till, how do you think we can address the regression from A?
Flags: needinfo?(till)
Sorry, I got it.
it's because we used inlined code for RegExp.prototype.test when the return value is not used.
it means we did not create match object.  it boosts the performance there before bug 1207922.

https://hg.mozilla.org/mozilla-central/file/66bf206c4882/js/src/jit/MCallOptimize.cpp#l180
>       case InlinableNative::RegExpExec:
>         return CallResultEscapes(pc) ? inlineRegExpExec(callInfo) : inlineRegExpTest(callInfo);

and all .exec result is not used in v8_7 regexp.js  :|
https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/tests/v8_7/regexp.js?case=true&from=v8_7%2Fregexp.js#122
>     for (var i = 0; i < 6511; i++) {
>       re0.exec(s0[i]);
>     }

When I remove inlineRegExpTest branch there, v8 score becomes 4260.
>       case InlinableNative::RegExpExec:
>         return inlineRegExpExec(callInfo);

I'll try figuring out how/if we could apply similar optimization even with self-hosted JS.
Also, might be better looking into session restore first.


btw, is it common to call .exec without using the return value?  maybe they use RegExp.$1 instead?
(In reply to Tooru Fujisawa [:arai] from comment #6)
> Sorry, I got it.
> it's because we used inlined code for RegExp.prototype.test when the return
> value is not used.
> it means we did not create match object.  it boosts the performance there
> before bug 1207922.
> [..]
> and all .exec result is not used in v8_7 regexp.js  :|
> [..]
> When I remove inlineRegExpTest branch there, v8 score becomes 4260.

The reason we added this optimization is exactly this benchmark :( However, v8_7 is a really old version of the benchmark, newer versions do use the result. I don't think we should back out a change because it regresses this particular test. In fact, I think we should remove the test from our infrastructure and not spend any time on worrying about it.

Then again, badly written benchmarks pop up all the time, e.g. on sites like jsbench.com, where it is, imo, way too easy to very quickly write seemingly sensible benchmarks, that really don't test anything useful because modern JS engines are way more complex than most developers assume. (To be clear, that's not their fault, and they shouldn't need to know better.)

> I'll try figuring out how/if we could apply similar optimization even with
> self-hosted JS.

Mainly because of the benchmarking situation, I think having something like that would be really useful, as long as it's done within a reasonable complexity budget. Something along the lines of a _ReturnValueIsUsed intrinsic. I can't really comment on how hard that'd be to implement, though, and I don't think it's of the highest priority.

> Also, might be better looking into session restore first.

Yes, that regression is what we should worry about. I don't know much about that test, so for all I know it might be equally invalid, but it seems likely that it's more real-world applicable.

> btw, is it common to call .exec without using the return value?  maybe they
> use RegExp.$1 instead?

No, this is just a shoddy benchmark :(


The TL;DR of all this is that I think the v8_7 and SpiderMonkey regressions should be WONTFIX, but that the session restore needs investigation.

NI? jandem because none of this is really my call.
Flags: needinfo?(till) → needinfo?(jdemooij)
while running talos sessionrestore locally, RegExp.prototype.exec() is called 20 times, and RegExp.prototype.test() is called 309 times, so all those call do not use Ion, however, it's strange that only 329 calls take extra 50ms.

am I reading the data correctly?  it means that the time taken by the test's single run increases from 2200ms to 2250ms, right?
  https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,1]
(In reply to Tooru Fujisawa [:arai] from comment #8)
> while running talos sessionrestore locally, RegExp.prototype.exec() is
> called 20 times, and RegExp.prototype.test() is called 309 times, so all
> those call do not use Ion, however, it's strange that only 329 calls take
> extra 50ms.
> 
> am I reading the data correctly?  it means that the time taken by the test's
> single run increases from 2200ms to 2250ms, right?

Yes. And honestly, I have a hard time believing that this is a regression that has its direct reason in the RegExp changes. If you look at the graphs for the same test on other platforms, and for the same test in non-e10s, and on the same platform with pgo[1], with or without e10s: none of all those are affected. What seems most likely is that the code changes cause some perturbation in what the compiler does with the code, in a way that affects only e10s. Or something like that.


[1] which is insanely helpful for this particular test. So much so that I think we should look into why that is and if we can reap some of the benefits without pgo, too.
The sessionrestore's performance regression reproduced locally with opt build on linux 64bit, with e10s enabled.  the time increases about 40ms (2140 -> 2180), by applying the 3 changesets in bug 1207922. 
indeed, it does not happen on non-e10s run.
Attached image talos result comparoson (obsolete) —
locally, wrapping RegExp.prototype.exec/test with simple self-hosted function increases most of the time (about 30ms of 40ms total).

added following function to before bug 1207922, and changed to use them instead of calling native function directly (D)

> function RegExp_Exec(s) {
>     return callFunction(regexp_exec, this, s);
> }
> function RegExp_Test(s) {
>     return callFunction(regexp_test, this, s);
> }

especially, wrapping exec (B) increases most of the time.

applying whole changes to RegExp.js (E) also increases a little from (D).

this time CallResultEscapes is not relevant.  removing it (G) does not increase the time.
same patch as D in comment #11 shows different result on try.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=31fb34652d1c&selectedJob=15264891
> sessionrestore: 2173
and from log:
> 2477.0, 2207.0, 2186.0, 2171.0, 2180.0, 2173.0, 2152.0, 2187.0, 2162.0, 2167.0

they don't show any regression.
so, it might be more environment specific issue.

anyway, I will have to test everything on try server... :(
Attached image talos result comparoson
so far, applying changes for RegExp.js increases the time (C).
even only with exec (J) or test(I).

Also, applying changes to irregexp does not change the performance (F)
Attachment #8706097 - Attachment is obsolete: true
(In reply to Joel Maher (:jmaher) from comment #0)
> Making a decision:
> As the patch author we need your feedback to help us handle this regression.
> *** Please let us know your plans by Tuesday, or the offending patch will be
> backed out! ***

- v8_7: this benchmark is outdated and we had a optimization got triggered on v8-regexp. In octane, rightfully, the code has been changed that 'disables' this optimization. That optimization doesn't seem to give a lot of benefit anymore and is more technical debt.
- ss: Is a real regression and also noticed in the original landing bug. We decided against blocking that to land this. The issue is that our Interpreter/Baseline will be a little bit slower, in order to facilitate faster IonMonkey speed. Which is our high tier compiler and generates the fastest code. Now SS is an old benchmark which was created when performance in JS didn't exist and started the performance rollercoaster we have now. Kraken and Octane were designed to specifically fix the faults SS has as a benchmark. The regression in SS is such an issue, where the code doesn't run enough to be considered important/hot (to be handled in IonMonkey).
- linux64 e10s session restore: here we debated a little bit about. But it seems linux64 only? Which would be surprising given the change we introduced. We ruled against blocking because of this, since only non-pgo linux64 is affected. We assume this is timing issue, not a code issue. If other platforms would be affected we might have ruled different.

This change improves code readability/maintainibility a lot. It also increases our performance on searches using "/g".

Given the above statements: me and till would be against backing this out, even with the given regressions.
Flags: needinfo?(jdemooij)
Flags: needinfo?(arai.unmht)
sounds great!  It looks like :arai is still doing some hunting on the sessionrestore issue, thanks for working on this.
got some interesting result.

applying same patch (wrap exec/test with self-hosted) introduces different result for following 2 cases:
  * before bug 1207922 (m-c 0876b808c677) + wrap exec/test [1]
    it does not increase the time
  * after bug 1207922 (m-c 6020a4cb41a7) + backout bug 1207922 + wrap exec/test [2]
    it increases the time

so, there should be another reason than bug 1207922's direct impact, and backing it out won't help much, it should happen again near future with same underlying reason.

I could keep bisecting tho, does it worth consuming server resource?
if not, I'd like to close this as WONTFIX.


[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=31fb34652d1c
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=b40999be2334
Oh interesting. Maybe it is caused by the inlining heuristics in that case. In any case we can better divert time towards improvements instead of investigating here.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
thanks again for looking into this!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: