Closed Bug 813559 Opened 12 years ago Closed 12 years ago

IonMonkey: enable off thread compilation by default

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: bhackett1024, Unassigned)

References

Details

Attachments

(5 files, 1 obsolete file)

Enough bugs that are wanted for off thread compilation to be an advantage are in the tree now, and it'd be good to enable it by default.  The main things this needs are (a) performance testing on the benchmarks / emscripten etc., and (b) fuzz coverage.
Off thread compilation can be fuzzed by compiling a thread safe build and running with --ion-parallel-compile=on.  This will only work on multicore machines, but you should get an immediate failure when trying to use this option on a single core machine.
I think we should have a way to force the number of threads that are used on the CLI, it seems like a good idea for testing in general.
Attached patch tweaks (bcbac30160a9) (obsolete) — Splinter Review
Fix a few small things:

- Adds a --thread-count=N per comment 2.

- Restricts things so that at most one off thread compilation can occur at once.  This should address a bogus reentrance assert decoder was hitting where multiple threads were simultaneously trying to read from the VM wrappers table.  It is, however, a change I wanted to make anyways (will avoid real races that could otherwise once we are running IonBuilder off thread).
Attachment #683997 - Flags: review?(dvander)
Attached patch patchSplinter Review
Oops, first patch had a third unwanted change.
Attachment #683997 - Attachment is obsolete: true
Attachment #683997 - Flags: review?(dvander)
Attachment #683999 - Flags: review?(dvander)
decoder was hitting a second assertion which seemed to indicate that GC marking was occurring from the compilation thread.  I think this is due to the read barriers when getting JSObjects and TypeObjects from the Types in TypeSets.  I don't know whether these are really necessary Ion, but they are for JM and rather than try to refine these barriers it seems simpler to just not do off thread compilation in the middle of an incremental GC.  Since Ion doesn't need to throw away code when the GC starts, most hot code should already have been compiled by the start of the IGC and this doesn't seem an onerous restriction.
Attachment #684012 - Flags: review?(dvander)
Attachment #683999 - Flags: review?(dvander) → review+
Attachment #684012 - Flags: review?(dvander) → review+
What were the results of the performance tests?
Ah I misread sorry, I thought that cset was enabling it by default
Attached file Crashing testcase
The attached test crashes on 32 bit threadsafe builds (m-c e685ffb06975), when run with options --ion-parallel-compile=on --thread-count=2 --ion-eager. Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0xf7fcc7b2 in ?? ()
(gdb) bt 8
#0  0xf7fcc7b2 in ?? ()
#1  0x084bc7df in EnterIon (cx=0x891d000, fp=0xf6d00138, jitcode=0xf7fcc770) at /srv/repos/mozilla-central/js/src/ion/Ion.cpp:1479
#2  0x084bcaa5 in js::ion::Cannon (cx=0x891d000, fp=0xf6d00138) at /srv/repos/mozilla-central/js/src/ion/Ion.cpp:1517
#3  0x08185328 in js::Interpret (cx=0x891d000, entryFrame=0xf6d000e8, interpMode=js::JSINTERP_NORMAL) at /srv/repos/mozilla-central/js/src/jsinterp.cpp:2366
#4  0x0817cd24 in js::RunScript (cx=0x891d000, script=..., fp=0xf6d000e8) at /srv/repos/mozilla-central/js/src/jsinterp.cpp:326
#5  0x0817dc94 in js::ExecuteKernel (cx=0x891d000, script=..., scopeChain=..., thisv=..., type=js::EXECUTE_GLOBAL, evalInFrame=0x0, result=0xf6d000c0) at /srv/repos/mozilla-central/js/src/jsinterp.cpp:515
#6  0x0817def7 in js::Execute (cx=0x891d000, script=..., scopeChainArg=..., rval=0xf6d000c0) at /srv/repos/mozilla-central/js/src/jsinterp.cpp:553
#7  0x0809c5ef in JS_ExecuteScript (cx=0x891d000, objArg=0xf5a0b040, scriptArg=0xf5a1a780, rval=0xf6d000c0) at /srv/repos/mozilla-central/js/src/jsapi.cpp:5572
(More stack frames follow...)
(gdb) x /i $pc
=> 0xf7fcc7b2:  mov    (%ecx),%eax
(gdb) info reg ecx eax
ecx            0x0      0
eax            0x893a640        143894080
Patch for the testcase in comment 10.  Adding a type barrier or monitoring a bytecode in a script that is being compiled off thread would not properly cancel / throw away that compilation.
Attachment #684810 - Flags: review?(dvander)
Attachment #684810 - Flags: review?(dvander) → review+
The last patch seems to have resolved a whole load of other crashes too that I was seeing. I'm going to fuzz it a bit more now but so far I don't see new crashes.
OS: Mac OS X → All
Hardware: x86 → All
Version: Other Branch → Trunk
Blocks: 815199
No longer blocks: 815199
Depends on: 815199
Depends on: 815258
Depends on: 815906
Depends on: 817635
Off thread compilation seems to be stable now and I think it should be fine to enable it by default.  Also, a week or two back I tried this out in a Windows browser and saw similar behavior to a Mac threadsafe shell, 3-5% improvement on SS (browser SS is very noisy).
Attachment #690568 - Flags: review?(dvander)
Attachment #690568 - Flags: review?(dvander) → review+
This seems to have regressed Kraken:



Regression  Kraken Benchmark increase 6.14% on Win7 Mozilla-Inbound-Non-PGO
-----------------------------------------------------------------------------
    Previous: avg 3678.417 stddev 48.468 of 30 runs up to revision 057218b30dcb
    New     : avg 3904.320 stddev 19.020 of 5 runs since revision b851b1485547
    Change  : +225.903 (6.14% / z=4.661)
    Graph   : http://mzl.la/TQoTfy

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=057218b30dcb&tochange=b851b1485547

Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/7b3ffe203ebf
    : Theo Chevalier <theo.chevalier11@gmail.com> - Bug 737596 - Enable Telemetry by default on Nightly and Aurora channels (mobile). r=bnicholson
    : http://bugzilla.mozilla.org/show_bug.cgi?id=737596

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/ad491d4e39da
    : Theo Chevalier <theo.chevalier11@gmail.com> - Bug 725987 - Create Telemetry (opt-out) notification for Nightly and Aurora (mobile). r=bnicholson
    : http://bugzilla.mozilla.org/show_bug.cgi?id=725987

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/4c4bdc7fc6c6
    : Theo Chevalier <theo.chevalier11@gmail.com> - Bug 737600 - When telemetry is disabled in the pref pane, we should update toolkit.telemetry.rejected. r=bnicholson
    : http://bugzilla.mozilla.org/show_bug.cgi?id=737600

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/7271e935c397
    : Brian Hackett <bhackett1024@gmail.com> - Bug 813559 - Enable off thread Ion compilation by default, r=dvander.
    : http://bugzilla.mozilla.org/show_bug.cgi?id=813559

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/b851b1485547
    : Ehsan Akhgari <ehsan@mozilla.com> - Bug 820120 - Remove the browser.privatebrowsing.dont_prompt_on_enter pref in per-window private browsing builds; r=jdm
    : http://bugzilla.mozilla.org/show_bug.cgi?id=820120

Bugs:
  * http://bugzilla.mozilla.org/show_bug.cgi?id=737600 - When telemetry is disabled in the pref pane, we should update toolkit.telemetry.rejected
  * http://bugzilla.mozilla.org/show_bug.cgi?id=820120 - Remove the pref: 'browser.privatebrowsing.dont_prompt_on_enter'
  * http://bugzilla.mozilla.org/show_bug.cgi?id=725987 - Create Telemetry (opt-out) notification for Nightly and Aurora (mobile)
  * http://bugzilla.mozilla.org/show_bug.cgi?id=737596 - Enable Telemetry by default on Nightly and Aurora channels (mobile)
  * http://bugzilla.mozilla.org/show_bug.cgi?id=813559 - IonMonkey: enable off thread compilation by default
_______________________________________________
dev-tree-management mailing list
dev-tree-management@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tree-management
V8 too!



Regression  V8 version 7 decrease 5.24% on MacOSX 10.7 Mozilla-Inbound
------------------------------------------------------------------------
    Previous: avg 5886.904 stddev 78.551 of 30 runs up to revision 057218b30dcb
    New     : avg 5578.474 stddev 51.378 of 5 runs since revision b851b1485547
    Change  : -308.430 (5.24% / z=3.926)
    Graph   : http://mzl.la/TQsxpV

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=057218b30dcb&tochange=b851b1485547

Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/7b3ffe203ebf
    : Theo Chevalier <theo.chevalier11@gmail.com> - Bug 737596 - Enable Telemetry by default on Nightly and Aurora channels (mobile). r=bnicholson
    : http://bugzilla.mozilla.org/show_bug.cgi?id=737596

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/ad491d4e39da
    : Theo Chevalier <theo.chevalier11@gmail.com> - Bug 725987 - Create Telemetry (opt-out) notification for Nightly and Aurora (mobile). r=bnicholson
    : http://bugzilla.mozilla.org/show_bug.cgi?id=725987

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/4c4bdc7fc6c6
    : Theo Chevalier <theo.chevalier11@gmail.com> - Bug 737600 - When telemetry is disabled in the pref pane, we should update toolkit.telemetry.rejected. r=bnicholson
    : http://bugzilla.mozilla.org/show_bug.cgi?id=737600

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/7271e935c397
    : Brian Hackett <bhackett1024@gmail.com> - Bug 813559 - Enable off thread Ion compilation by default, r=dvander.
    : http://bugzilla.mozilla.org/show_bug.cgi?id=813559

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/b851b1485547
    : Ehsan Akhgari <ehsan@mozilla.com> - Bug 820120 - Remove the browser.privatebrowsing.dont_prompt_on_enter pref in per-window private browsing builds; r=jdm
    : http://bugzilla.mozilla.org/show_bug.cgi?id=820120

Bugs:
  * http://bugzilla.mozilla.org/show_bug.cgi?id=737600 - When telemetry is disabled in the pref pane, we should update toolkit.telemetry.rejected
  * http://bugzilla.mozilla.org/show_bug.cgi?id=820120 - Remove the pref: 'browser.privatebrowsing.dont_prompt_on_enter'
  * http://bugzilla.mozilla.org/show_bug.cgi?id=725987 - Create Telemetry (opt-out) notification for Nightly and Aurora (mobile)
  * http://bugzilla.mozilla.org/show_bug.cgi?id=737596 - Enable Telemetry by default on Nightly and Aurora channels (mobile)
  * http://bugzilla.mozilla.org/show_bug.cgi?id=813559 - IonMonkey: enable off thread compilation by default
_______________________________________________
dev-tree-management mailing list
dev-tree-management@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tree-management
I just ran kraken 5 times each with and without off thread compilation in a recent Windows 7 nightly.

with ion.parallel_compilation

1714
1697
1727
1757
1689

average 1717

without ion.parallel_compilation

1758
1728
1707
1715
1711

average 1724

So, if anything, this seems a tiny speedup for kraken.  I've gotten similar results testing opt threadsafe shells on my mac.
https://hg.mozilla.org/mozilla-central/rev/7271e935c397
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Well, looking at the graph URLs, both regressions seem quite real, so my guess is that your patch regresses something which your local tests do not examine.
I mean, I think this is at least worth investigating.  We could be regressing on machines with fewer number of cores for example.  It seems quite odd to me to let this regression slide in without at least understanding why it happens...
Well, a few things:

- My local tests were running the Kraken benchmark at krakenbenchmark.mozilla.org.  This is exactly what Talos should be measuring.

- This is really easy for anyone to test themselves; you don't need to compare different builds, just change the javascript.options.ion.parallel_compilation config option.  If someone finds a configuration where they *do* see a regression or other strange behavior, file it and we'll get it fixed.

- I have zero faith that Talos is a reliable tool for tracking JS benchmark performance.  I don't know what the problem is, but I've had a long history of seeing and investigating fake regressions reported on dev-tree-management.  Most recent is bug 811349.  Some real regressions too, but the signal/noise ratio is just too low.
(In reply to Brian Hackett (:bhackett) from comment #23)
> - I have zero faith that Talos is a reliable tool for tracking JS benchmark
> performance.  I don't know what the problem is, but I've had a long history
> of seeing and investigating fake regressions reported on
> dev-tree-management.  Most recent is bug 811349.  Some real regressions too,
> but the signal/noise ratio is just too low.

Yeah, indeed. In recent memory, Talos for JS has served much better at finding correctness issues than actual performance ones.
Btw, should we have seen a change on awfy with this landing? I don't think I see a change either up or down.

Regarding talos, I believe we use fairly old hardware for it for consistency? That's one possible reason for results on talos differing from real-world results I think (less modern CPU, fewer cores, etc.).
(In reply to Alon Zakai (:azakai) from comment #25)
> Btw, should we have seen a change on awfy with this landing? I don't think I
> see a change either up or down.

No, awfy runs non-threadsafe builds so doesn't use off thread compilation (or other multithreaded stuff like background finalization).
Oh ok, I see. Is that due to technical limitations, or intentional?
(In reply to Alon Zakai (:azakai) from comment #27)
> Oh ok, I see. Is that due to technical limitations, or intentional?

I think it's just inertia.  awfy's main use in the JS team is for tracking beenchmark regressions/improvements, and very few engine changes will differ in their effect between threadsafe and non-threadsafe builds.  This bug is an exception to that.
(In reply to Brian Hackett (:bhackett) from comment #23)
> - I have zero faith

I find your lack of faith disturbing

(Sorry, had to be done... ;-) http://www.youtube.com/watch?v=mI8GEidaMMI)
The Talos machines are using a certain configuration which is probably different to what our developers use, but calling that a non-real world hardware configuration is a stretch.  These are the same kind of machines that others buy and use as well.  It is quite conceivable for this patch to have changed things in a way that favor machines which can do more parallelism and hurt the ones that can't.  I haven't tested that theory of course, but I don't see any reason to believe that the regression that Talos is showing here is bogus, and the fact that someone cannot reproduce locally is not really indicating much, since they may use a hardware configuration that does not trigger the problem.

That all being said, if you guys really believe that Talos for js benchmarks is not useful, there's no point in running those tests, no matter which side of the argument is right, because the end result is that the results of those measurements are ignored.  I'd appreciate if you file a bug and ask for those tests to be turned off.  Thanks!
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #29)
> (In reply to Brian Hackett (:bhackett) from comment #23)
> > - I have zero faith
> 
> I find your lack of faith disturbing
> 
> (Sorry, had to be done... ;-) http://www.youtube.com/watch?v=mI8GEidaMMI)

(Note I agree with you fwiw)
(In reply to Brian Hackett (:bhackett) from comment #26)
> (In reply to Alon Zakai (:azakai) from comment #25)
> > Btw, should we have seen a change on awfy with this landing? I don't think I
> > see a change either up or down.
> 
> No, awfy runs non-threadsafe builds so doesn't use off thread compilation
> (or other multithreaded stuff like background finalization).

AWFY runs threadsafe builds.
Off topic, but wrt comment 23 and comment 30 , there is a considerable effort in improving Talos' signal from noise: https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise . You can read about what we've done so far here: https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise#Status

Not intended as a summary, but the way we currently do statistics *is* bad and noisy.  Most regression emails (which are not generated by Talos, fwiw) are incorrect.  You can read more about it at the above cited links. While we're working hard to get a replacement system in place, in the interim we have a few options:

- turn off Talos (and possibly other) perf testing knowing that they're wrong
- dealing with Talos/Graphserver results but taking them with a grain of salt knowing that they're wrong
- panic and keep beating this horse

We'd love any help with the SfN effort.  We know our results are neither statistically rigorous or heuristically accurate.
(In reply to David Anderson [:dvander] from comment #32)
> (In reply to Brian Hackett (:bhackett) from comment #26)
> > (In reply to Alon Zakai (:azakai) from comment #25)
> > > Btw, should we have seen a change on awfy with this landing? I don't think I
> > > see a change either up or down.
> > 
> > No, awfy runs non-threadsafe builds so doesn't use off thread compilation
> > (or other multithreaded stuff like background finalization).
> 
> AWFY runs threadsafe builds.

Hrmm... I lied. Yikes. AWFY can't re-autoconf so I must have messed up the last time the build scripts changed. I'll fix that now and make sure AWFY is testing with --ion-parallel-compile=on.
(In reply to comment #33)
> - turn off Talos (and possibly other) perf testing knowing that they're wrong
> - dealing with Talos/Graphserver results but taking them with a grain of salt
> knowing that they're wrong
> - panic and keep beating this horse

My point is that if we're going to ignore the results of the JS tests (v8 and Kraken) anyway, we might as well turn them off.  That will help people like me better focus on things such as Ts which are highly stable and directly corresponding to the performance metrics that users are likely to observe.

Anyway, we're getting very off-topic for this bug.  I've filed bug 820598 to turn off those two tests and CCed bhackett and dvander to give them a chance to object if they disagree.  Please let's continue the Talos related discussion there.
I ran kraken 5 times with and without the parallel compilation pref on, on my i7 laptop, latest nightly. I get a 1.5% slowdown when it is turned on.

I wouldn't be surprised if a change like this ends up a little faster in some cases and a little slower in others. The important thing is that it's an architectural change in the right direction IMO.
we can test this by running talos on try server with and without the preference.  I looked on my local build of inbound and didn't see ion.parallel_compilation.  Is this only set on a specific branch?  

I can either help you guys run with/without the setting on try server, or I can do it and point you to the try server runs to see the results.  I really need to know what to set and what the values should be or not be.
This appears to have caused bug 819329 to spike, which has resulted in lots of crashes on the latest Nightly.
Depends on: 819329
(In reply to Joel Maher (:jmaher) from comment #37)
> I can either help you guys run with/without the setting on try server, or I
> can do it and point you to the try server runs to see the results.  I really
> need to know what to set and what the values should be or not be.

That would be really helpful! The pref is "javascript.options.ion.parallel_compilation", or just a backout of 7271e935c397 works too.

I changed AWFY to use threadsafe builds, and it looks like this patch was about a 1-2% win on Kraken and a 3% win on SunSpider. V8 unchanged. That's still not a full browser build, but it's a good sign.
I'm seeing regressions on AWFY Kraken and SS, not wins.  Am I looking in the wrong place?
IIANM it looks like going threadsafe was a regression, then going to parallel compilation gave wins on sunspider and kraken.
> Created attachment 690568 [details] [diff] [review]
> enable off thread compilation by default

I fuzzed this for a bit on a 64-bit js threadsafe shell and didn't immediately find any large problems.
(In reply to David Anderson [:dvander] from comment #39)
> (In reply to Joel Maher (:jmaher) from comment #37)
> 
> That would be really helpful! The pref is
> "javascript.options.ion.parallel_compilation", or just a backout of
> 7271e935c397 works too.
> 
I looked on my local inbound build and I don't have javascript.options.ion.parallel_compilation in about:config.  What are the values for it?  True/False? 1/0?
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #43)
> > Created attachment 690568 [details] [diff] [review]
> > enable off thread compilation by default
> 
> I fuzzed this for a bit on a 64-bit js threadsafe shell and didn't
> immediately find any large problems.

Just to clarify, I ran this against --ion-parallel-compile=on and --thread-count=N where N was a random number between 2 and number of cores that appeared on my OS (8, for me, due to hyperthreadedness)

Jesse, if --thread-count=? was not specified, bhackett mentions that --ion-parallel-compile=on uses the # of cores on the machine - 1.
(In reply to Joel Maher (:jmaher) from comment #44)
> (In reply to David Anderson [:dvander] from comment #39)
> > (In reply to Joel Maher (:jmaher) from comment #37)
> > 
> > That would be really helpful! The pref is
> > "javascript.options.ion.parallel_compilation", or just a backout of
> > 7271e935c397 works too.
> > 
> I looked on my local inbound build and I don't have
> javascript.options.ion.parallel_compilation in about:config.  What are the
> values for it?  True/False? 1/0?

True/false - what revision is your inbound build?
Depends on: 822089
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: