Closed Bug 1505271 Opened 10 months ago Closed 10 months ago

Stop creating 1 mach exception handler thread per JSRuntime

Categories

(Core :: Javascript: WebAssembly, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: luke, Assigned: luke)

Details

Attachments

(2 files, 2 obsolete files)

On OSX, we currently eagerly create one extra thread per JSRuntime (so one for the main-thread and one for each worker) that sits waiting for Mach exceptions from that JSRuntime's owner thread.  Instead, we should create at most one thread per process, and lazily, only when wasm is used.

The reason for the current eagerness is that, for a while, the signal handlers were used for all JS Ion code's async interrupts, requiring eager installation, but no longer: bug 864220.

The reason for the current auxiliary-thread-per-JSRuntime is that there wasn't, until recently, a good way to find the faulting JSContext from a process-wide handler thread.  Nowadays, wasm code preserves fp which allows, at valid wasm trap sites, the JSContext to be found via fp->tls->instance->realm->runtime->cx.

Having a single global handler simplifies a bunch of other things too, because you can just let it stay alive.
Attached patch use-fp-not-cxSplinter Review
This simple patch switches to fp as the source of truth.
Assignee: nobody → luke
Attachment #9023176 - Flags: review?(bbouvier)
Attached patch single-debug-thread (obsolete) — Splinter Review
This patch kills the thread-per-JSRuntime, making the debug thread lazily-created per-process.  ~100 lines of code net removed.
Attachment #9023177 - Flags: review?(bbouvier)
Comment on attachment 9023176 [details] [diff] [review]
use-fp-not-cx

Review of attachment 9023176 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #9023176 - Flags: review?(bbouvier) → review+
Comment on attachment 9023177 [details] [diff] [review]
single-debug-thread

Review of attachment 9023177 [details] [diff] [review]:
-----------------------------------------------------------------

Won't there be an issue if the port is partially initialized, and fails in the middle of initialization? There was code to cleanup this situation in uninstall() (and a ScopeExit in install() call this uninstall()), but it's been removed as far as I can tell.

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +610,2 @@
>  {
> +    while(true) {

pre-existing: space between while and opening parenthesis

@@ +610,3 @@
>  {
> +    while(true) {
> +        kern_return_t kret;

nit: fuse declaration and definition one line below?

@@ +622,5 @@
>              MOZ_CRASH();
>          }
>  
> +        // Magic constant taken from mach_exc in /usr/include/mach/mach_exc.defs.
> +        if (request.body.Head.msgh_id != 2405) {

nit: prefer an ALL_UPPERCASE constant here for the msg id?

@@ +711,5 @@
>  
> +struct ProcessSignalHandlerState
> +{
> +    bool tried;
> +    bool have;

These names, albeit short, are not very eloquent; how about `triedSetup` and `areSetup` (or just `setup`)?

@@ +780,5 @@
> +    }
> +    kret = mach_port_insert_right(mach_task_self(), sMachDebugPort, sMachDebugPort,
> +                                  MACH_MSG_TYPE_MAKE_SEND);
> +    if (kret != KERN_SUCCESS) {
> +        return false;

What about all the cleanup code that was in uninstall() above? Seems it still should be run if anything goes wrong?

@@ +865,1 @@
>          return false;

Note the cleanup code would be needed here too!
Attachment #9023177 - Flags: review?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #4)
> Won't there be an issue if the port is partially initialized, and fails in
> the middle of initialization? There was code to cleanup this situation in
> uninstall() (and a ScopeExit in install() call this uninstall()), but it's
> been removed as far as I can tell.

The worst that can happen is that the partially-initialized port lives until the end of the process, which it was going to do anyway if the initialization fails.  In the original code, this thorough cleanup was necessary because this was per-JSRuntime, not per-process.  The rest of the per-process signal-handler initialization works this way, incidentally.  Notice also that ProcessSignalState::tried is eagerly set to true so if the init fails once, it won't try again so there is at most one port leaked.

> What about all the cleanup code that was in uninstall() above? Seems it
> still should be run if anything goes wrong?
...
> Note the cleanup code would be needed here too!

... thus I don't think any of this cleanup code is needed.
Attached patch single-debug-thread (obsolete) — Splinter Review
This addresses the comments and also changes the logic so that the non-OSX signal/exception handlers continue to be installed eagerly (by JS_Init()) and only the OSX exception-handler thread gets created lazily (for the reason commented above wasm::InitEagerProcessSignalHandlers()).

Note that the LazyInstallState field names remain short/terse because they are always read in the the context of `lazyInstallState->tried` which I think gives a pretty good explanation of the meaning.
Attachment #9023177 - Attachment is obsolete: true
Attachment #9023747 - Flags: review?(bbouvier)
Try showed that, in some configurations, breakpad signal handlers were getting installed after the wasm ones (which means breakpad was getting called first, triggering a crash, before wasm got a chance).  This updated patch goes back to the previous scheme of waiting to install the wasm signal handlers until the first JSRuntime/JSContext is created.
Attachment #9023747 - Attachment is obsolete: true
Attachment #9023747 - Flags: review?(bbouvier)
Attachment #9023820 - Flags: review?(bbouvier)
Comment on attachment 9023820 [details] [diff] [review]
single-debug-thread

Review of attachment 9023820 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks.

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +883,5 @@
> +    // are also task-level (i.e. process-level) exception ports, those are
> +    // "claimed" by breakpad and chaining Mach exceptions is dark magic that we
> +    // avoid by instead intercepting exceptions at the thread level before they
> +    // propagate to the process-level. This works because there are no other
> +    // uses of thread-level exception ports.

Thanks, this comment explains the situation much better.

::: js/src/wasm/WasmSignalHandlers.h
@@ +45,5 @@
> +// Assuming EnsureEagerProcessSignalHandlers() has already been called,
> +// this function performs the full installation of signal handlers which must
> +// be performed per-thread/JSContext. This operation may incur some overhead and
> +// so should be done only when needed to use wasm. Currently, this is done in
> +// wasm::HasSupport() which is called when decideing whether to expose the

nit: "deciding", I think?
also nit: wasm::HasCompilerSupport()
Attachment #9023820 - Flags: review?(bbouvier) → review+
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dfdd63aae0c5
Baldr: find the JSContext via fp in the signal handler (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/0abbc0f316d3
Baldr: only create one wasm exception handler thread per process, and lazily (r=bbouvier)
Backed out changeset 0abbc0f316d3 (Bug 1505271) for raptor perma failures CLOSED TREE 
Backed out changeset dfdd63aae0c5 (bug 1505271) for raptor perma failures CLOSED TREE
https://hg.mozilla.org/mozilla-central/rev/dfdd63aae0c5
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Oops, I guess these didn't run because I usually try with "-t none".

Joel: sorry for ni? out of the blue, and feel free to forward me to someone more appropriate, but two questions:
 1. I don't see raptor in mozilla-releng.net/trychooser; does it fall under one of the existing labels or is it missing?
 2. A symbolicated crash stack would be super-helpful (this looks like it might be a MOZ_CRASH()); is there any way to enable that for raptor runs?

Thanks!
Flags: needinfo?(luke) → needinfo?(jmaher)
Actually, n/m; I thought this might require some arcane build steps to repro, but actually it repros on any build and raptor was delightfully easy to run (`./mach raptor-test --test raptor-unity-webgl`, no other setup!).

The bug was that I added some MOZ_RELEASE(cx->wasmHasSignalHandlers)s at the last moment to asm.js paths, forgetting that asm.js doesn't ensure or require signal handlers (anymore).

(Still, the two feature requests in comment 13 might be nice for future devs.  I'd add a third too: it'd be useful to capture the crash string of MOZ_CRASH() if that isn't done by #2.)
Flags: needinfo?(jmaher)
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5e300fa7763d
Baldr: find the JSContext via fp in the signal handler (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbe3c9df4f9c
Baldr: only create one wasm exception handler thread per process, and lazily (r=bbouvier)
You need to log in before you can comment on or make changes to this bug.