Closed Bug 1505271 Opened 6 years ago Closed 6 years ago

Stop creating 1 mach exception handler thread per JSRuntime

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla65

Tracking Flags:

Tracking

Status

firefox65

---

fixed

People

(Reporter: luke, Assigned: luke)

Details

Attachments

(2 files, 2 obsolete files)

use-fp-not-cx 6 years ago Luke Wagner [:luke] 2.96 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review
single-debug-thread 6 years ago Luke Wagner [:luke] 20.16 KB, patch		Details \| Diff \| Splinter Review
single-debug-thread 6 years ago Luke Wagner [:luke] 33.04 KB, patch		Details \| Diff \| Splinter Review
single-debug-thread 6 years ago Luke Wagner [:luke] 33.36 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review

Luke Wagner [:luke]

Assignee

Description

•

6 years ago

On OSX, we currently eagerly create one extra thread per JSRuntime (so one for the main-thread and one for each worker) that sits waiting for Mach exceptions from that JSRuntime's owner thread. Instead, we should create at most one thread per process, and lazily, only when wasm is used. The reason for the current eagerness is that, for a while, the signal handlers were used for all JS Ion code's async interrupts, requiring eager installation, but no longer: bug 864220. The reason for the current auxiliary-thread-per-JSRuntime is that there wasn't, until recently, a good way to find the faulting JSContext from a process-wide handler thread. Nowadays, wasm code preserves fp which allows, at valid wasm trap sites, the JSContext to be found via fp->tls->instance->realm->runtime->cx. Having a single global handler simplifies a bunch of other things too, because you can just let it stay alive.

Luke Wagner [:luke]

Assignee

Comment 1

•

6 years ago

Attached patch use-fp-not-cx — Details — Splinter Review

This simple patch switches to fp as the source of truth.

Assignee: nobody → luke

Attachment #9023176 - Flags: review?(bbouvier)

Luke Wagner [:luke]

Assignee

Comment 2

•

6 years ago

Attached patch single-debug-thread (obsolete) — Details — Splinter Review

This patch kills the thread-per-JSRuntime, making the debug thread lazily-created per-process. ~100 lines of code net removed.

Attachment #9023177 - Flags: review?(bbouvier)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 3

•

6 years ago

Comment on attachment 9023176 [details] [diff] [review] use-fp-not-cx Review of attachment 9023176 [details] [diff] [review]: ----------------------------------------------------------------- Nice.

Attachment #9023176 - Flags: review?(bbouvier) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Comment 4

•

6 years ago

Comment on attachment 9023177 [details] [diff] [review] single-debug-thread Review of attachment 9023177 [details] [diff] [review]: ----------------------------------------------------------------- Won't there be an issue if the port is partially initialized, and fails in the middle of initialization? There was code to cleanup this situation in uninstall() (and a ScopeExit in install() call this uninstall()), but it's been removed as far as I can tell. ::: js/src/wasm/WasmSignalHandlers.cpp @@ +610,2 @@ > { > + while(true) { pre-existing: space between while and opening parenthesis @@ +610,3 @@ > { > + while(true) { > + kern_return_t kret; nit: fuse declaration and definition one line below? @@ +622,5 @@ > MOZ_CRASH(); > } > > + // Magic constant taken from mach_exc in /usr/include/mach/mach_exc.defs. > + if (request.body.Head.msgh_id != 2405) { nit: prefer an ALL_UPPERCASE constant here for the msg id? @@ +711,5 @@ > > +struct ProcessSignalHandlerState > +{ > + bool tried; > + bool have; These names, albeit short, are not very eloquent; how about `triedSetup` and `areSetup` (or just `setup`)? @@ +780,5 @@ > + } > + kret = mach_port_insert_right(mach_task_self(), sMachDebugPort, sMachDebugPort, > + MACH_MSG_TYPE_MAKE_SEND); > + if (kret != KERN_SUCCESS) { > + return false; What about all the cleanup code that was in uninstall() above? Seems it still should be run if anything goes wrong? @@ +865,1 @@ > return false; Note the cleanup code would be needed here too!

Attachment #9023177 - Flags: review?(bbouvier)

Luke Wagner [:luke]

Assignee

Comment 5

•

6 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #4) > Won't there be an issue if the port is partially initialized, and fails in > the middle of initialization? There was code to cleanup this situation in > uninstall() (and a ScopeExit in install() call this uninstall()), but it's > been removed as far as I can tell. The worst that can happen is that the partially-initialized port lives until the end of the process, which it was going to do anyway if the initialization fails. In the original code, this thorough cleanup was necessary because this was per-JSRuntime, not per-process. The rest of the per-process signal-handler initialization works this way, incidentally. Notice also that ProcessSignalState::tried is eagerly set to true so if the init fails once, it won't try again so there is at most one port leaked. > What about all the cleanup code that was in uninstall() above? Seems it > still should be run if anything goes wrong? ... > Note the cleanup code would be needed here too! ... thus I don't think any of this cleanup code is needed.

Luke Wagner [:luke]

Assignee

Comment 6

•

6 years ago

Attached patch single-debug-thread (obsolete) — Details — Splinter Review

This addresses the comments and also changes the logic so that the non-OSX signal/exception handlers continue to be installed eagerly (by JS_Init()) and only the OSX exception-handler thread gets created lazily (for the reason commented above wasm::InitEagerProcessSignalHandlers()). Note that the LazyInstallState field names remain short/terse because they are always read in the the context of `lazyInstallState->tried` which I think gives a pretty good explanation of the meaning.

Attachment #9023177 - Attachment is obsolete: true

Attachment #9023747 - Flags: review?(bbouvier)

Luke Wagner [:luke]

Assignee

Comment 7

•

6 years ago

Attached patch single-debug-thread — Details — Splinter Review

Try showed that, in some configurations, breakpad signal handlers were getting installed after the wasm ones (which means breakpad was getting called first, triggering a crash, before wasm got a chance). This updated patch goes back to the previous scheme of waiting to install the wasm signal handlers until the first JSRuntime/JSContext is created.

Attachment #9023747 - Attachment is obsolete: true

Attachment #9023747 - Flags: review?(bbouvier)

Attachment #9023820 - Flags: review?(bbouvier)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 8

•

6 years ago

Comment on attachment 9023820 [details] [diff] [review] single-debug-thread Review of attachment 9023820 [details] [diff] [review]: ----------------------------------------------------------------- Nice, thanks. ::: js/src/wasm/WasmSignalHandlers.cpp @@ +883,5 @@ > + // are also task-level (i.e. process-level) exception ports, those are > + // "claimed" by breakpad and chaining Mach exceptions is dark magic that we > + // avoid by instead intercepting exceptions at the thread level before they > + // propagate to the process-level. This works because there are no other > + // uses of thread-level exception ports. Thanks, this comment explains the situation much better. ::: js/src/wasm/WasmSignalHandlers.h @@ +45,5 @@ > +// Assuming EnsureEagerProcessSignalHandlers() has already been called, > +// this function performs the full installation of signal handlers which must > +// be performed per-thread/JSContext. This operation may incur some overhead and > +// so should be done only when needed to use wasm. Currently, this is done in > +// wasm::HasSupport() which is called when decideing whether to expose the nit: "deciding", I think? also nit: wasm::HasCompilerSupport()

Attachment #9023820 - Flags: review?(bbouvier) → review+

Pulsebot

Comment 9

•

6 years ago

Pushed by lwagner@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/dfdd63aae0c5 Baldr: find the JSContext via fp in the signal handler (r=bbouvier) https://hg.mozilla.org/integration/mozilla-inbound/rev/0abbc0f316d3 Baldr: only create one wasm exception handler thread per process, and lazily (r=bbouvier)

Andrei Ciure[:aciure]

Comment 10

•

6 years ago

Backed out changeset 0abbc0f316d3 (Bug 1505271) for raptor perma failures push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&searchStr=ugl&revision=0abbc0f316d3d1b4c47338ff34e8a2e02e6479f0 failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&fromchange=8ea9a919916f7dfd45d9638d9b1f16bf2eb9e97c&searchStr=linux%2Cx64%2Cquantumrender%2Copt%2Craptor%2Cperformance%2Ctests%2Con%2Cfirefox%2Cwith%2Ce10s%2Ctest-linux64-qr%2Fopt-raptor-unity-webgl-firefox-e10s%2Crap-e10s%28ugl%29&selectedJob=210919086 backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/111655db0b164cbf3eaaac6feccf3c98e8d8066d

Flags: needinfo?(luke)

Andrei Ciure[:aciure]

Comment 11

•

6 years ago

Backed out changeset 0abbc0f316d3 (Bug 1505271) for raptor perma failures CLOSED TREE Backed out changeset dfdd63aae0c5 (bug 1505271) for raptor perma failures CLOSED TREE

Stefan Hindli [:stefan_hindli]

Comment 12

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/dfdd63aae0c5

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox65: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla65

Luke Wagner [:luke]

Assignee

Comment 13

•

6 years ago

Oops, I guess these didn't run because I usually try with "-t none". Joel: sorry for ni? out of the blue, and feel free to forward me to someone more appropriate, but two questions: 1. I don't see raptor in mozilla-releng.net/trychooser; does it fall under one of the existing labels or is it missing? 2. A symbolicated crash stack would be super-helpful (this looks like it might be a MOZ_CRASH()); is there any way to enable that for raptor runs? Thanks!

Flags: needinfo?(luke) → needinfo?(jmaher)

Luke Wagner [:luke]

Assignee

Comment 14

•

6 years ago

Actually, n/m; I thought this might require some arcane build steps to repro, but actually it repros on any build and raptor was delightfully easy to run (`./mach raptor-test --test raptor-unity-webgl`, no other setup!). The bug was that I added some MOZ_RELEASE(cx->wasmHasSignalHandlers)s at the last moment to asm.js paths, forgetting that asm.js doesn't ensure or require signal handlers (anymore). (Still, the two feature requests in comment 13 might be nice for future devs. I'd add a third too: it'd be useful to capture the crash string of MOZ_CRASH() if that isn't done by #2.)

Flags: needinfo?(jmaher)

Pulsebot

Comment 15

•

6 years ago

Pushed by lwagner@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5e300fa7763d Baldr: find the JSContext via fp in the signal handler (r=bbouvier) https://hg.mozilla.org/integration/mozilla-inbound/rev/cbe3c9df4f9c Baldr: only create one wasm exception handler thread per process, and lazily (r=bbouvier)

Cosmin Sabou [:CosminS]

Comment 16

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5e300fa7763d https://hg.mozilla.org/mozilla-central/rev/cbe3c9df4f9c

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Stop creating 1 mach exception handler thread per JSRuntime

Categories

(Core :: JavaScript: WebAssembly, enhancement)

Tracking

()

People

(Reporter: luke, Assigned: luke)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files, 2 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Attachment

General

Description

File Name

Content Type