Closed Bug 851964 Opened 11 years ago Closed 11 years ago

Re-enable OdinMonkey on OSX


(Core :: JavaScript Engine, defect)

Not set





(Reporter: luke, Assigned: luke)




(1 file, 9 obsolete files)

33.08 KB, patch
: review+
Details | Diff | Splinter Review
On OSX, Breakpad uses the underlying Mach exception mechanism to catch errors and print minidumps.  This means that OdinMonkey's SIGSEGV/SIGBUS handlers are never called when Breakpad is enabled.  (Breakpad was tested on Windows/Linux, I foolishly forgot about OSX.  The shell tests the signal handlers, but clearly, we need some Mochitests as well.)  For now, we'll disable Odin on OSX.  This fix mostly just requires spending some time to understand Mach exceptions and the right way to integrate with Breakpad.

Ted, are you familiar with this part of Breakpad, or can you recommend anyone else familiar with Mach exceptions?
I probably know about as much as anyone else you're going to find, and I have a copy of "Mac OS X Internals" laying around.
I did some digging; this might be old news to Ted, but might be useful.

As best I can tell, the mach exception handling stuff is dependent on a magic symbol ("catch_exception_raise") being present in the process.  Breakpad already provides one of those, so we might want to use the same mechanism... I was starting to add a "BreakpadPreHandleExceptionHandler" symbol that breakpad would look for and forward to before doing its own thing.

However, it looks like breakpad is using the 32-bit handling mechanism -- see .  (It's using exc_server instead of mach_exc_server generated by mig, etc.)  This is a problem because while breakpad doesn't care about the exception params, so 32-bit truncation is perfectly fine, Odin very much cares.  This likely means that we'd need to rewrite this part of breakpad to use the 64-bit routines; that doesn't look particularly bad, but it's annoying that we need to do it.
Attached patch 95% of a fix (obsolete) — Splinter Review
Blah, I should have slept, but got close.  This is 95% of a fix; I have this separated out into 3 nice patches locally.  Here's WIP that does the following:

1) Uses the 64-bit handler infrastructure on OSX instead of the 32-bit.  Note that as things are currently, we will always truncate access exception addresses as similar in our reports, because we're only getting the low 32 bits of those addresses.

2) Adds a user exception handler function, defined via a magic symbol in the binary.

3) Uses that in Asm.js.

This all should in theory work, except that I need to read a TLS slot in another thread to get that thread's JS per thread data.  Unfortunately, state.__gs always seems to be 0; I'm not reading the segment register addresses with THREAD_STATE64.  No idea how to solve this, unfortunately :(
Assignee: luke → vladimir
Do we also have the option to not use signal handling on OS X and like Windows64 emit slightly-slower code (with branching), temporarily?
I think Vlad has this problem fixed already :)
Part 1 - add TARGET_OSX_USE_64BIT_EXCEPTIONS to breakpad.

This has us go through the full 64-bit path in breakpad, instead of receiving truncated 32-bit values.  Before this, even current breakpad is getting just truncated values for reporting (but just this won't fix it -- we need to make sure the 64-bit values are plumbed all the way through).

I tested breakpad in a few crashes and things seemed to be ok, but please let me know if there's a full test suite I can run.
Attachment #726027 - Attachment is obsolete: true
Attachment #726398 - Flags: review?(ted)
Adds a callback symbol that breakpad looks for in the current process to give it a chance to handle an exception first.  This approach (look for a symbol in the process) is the same one used by the (32-bit) mach handling setup, so I didn't feel too bad about doing it this way.
Attachment #726400 - Flags: review?(ted)
Adds a linked list of JSRuntimes to JSRuntime, specifically so that we can look up the AsmJS invocation from the Mach exception handler.  We get the exception on a different thread, and despite many, many efforts, I couldn't figure out how to get a hold of a TLS slot on a different thread.  This was the path of least resistance.
Attachment #726401 - Flags: review?(luke)
Attachment #726401 - Attachment description: add JSRuntime tracking in a linked list, so that we can find AsmJS info from any thread → 3 - add JSRuntime tracking in a linked list, so that we can find AsmJS info from any thread
The duplicated code in the handler is unfortunate, but trying to mix everything together may have just lead to more convoluted code flow.
Attachment #726402 - Flags: review?(luke)
Comment on attachment 726401 [details] [diff] [review]
3 - add JSRuntime tracking in a linked list, so that we can find AsmJS info from any thread

Review of attachment 726401 [details] [diff] [review]:

::: js/src/jsapi.cpp
@@ +1008,5 @@
> +            runtimeListHead_ = runtimeListNext_;
> +        } else {
> +            while (r && r->runtimeListNext_ != this) {
> +                r = r->runtimeListNext_;
> +            }

No { } around single-line body

::: js/src/jscntxt.cpp
@@ +1423,5 @@
> +        activation = r->mainThread.asmJSActivationStackFromAnyThread();
> +        if (activation) {
> +            const AsmJSModule &module = activation->module();
> +            uint8_t *code = module.functionCode();
> +            if (pc >= code && pc < (code + module.functionBytes())) {

This duplicates PCIsInModule in AsmJSSignalHandler.  Could you common this out by adding bool AsmJSModule::pcIsInModule(void *pc)?

@@ +1426,5 @@
> +            uint8_t *code = module.functionCode();
> +            if (pc >= code && pc < (code + module.functionBytes())) {
> +                // found it
> +                break;
> +            }

No { } around single-line then/else

::: js/src/jscntxt.h
@@ +681,5 @@
> +#ifdef XP_MACOSX
> +    // On OSX, we need a way to find all the live runtimes from an arbitrary
> +    // thread (specifically, the exception thread, for fixing up Odin
> +    // accesses).

s/Odin accesses/asm.js faults/
Attachment #726401 - Flags: review?(luke) → review+
Comment on attachment 726402 [details] [diff] [review]
4 - use previous patches to add support for Mach exception catching to Odin

Review of attachment 726402 [details] [diff] [review]:

Nice job digging into all this!

(Awaiting the OSX 32-bit patch.)

::: js/src/ion/AsmJSSignalHandlers.cpp
@@ +306,5 @@
> +    boolean_t BreakpadUserExceptionHandler64(exception_type_t extype,
> +                                             mach_exception_data_t code,
> +                                             mach_msg_type_number_t code_count,
> +                                             mach_port_t thread)
> +        __attribute__((visibility("default")));

I'd really much prefer an explicit call from breakpad into with AsmJS in the name rather than this implicit linking strategy.

@@ +443,2 @@
>      } else {
> +        SetThreadStateRegisterToCoercedUndefined(&context->__ss, isFloat32, reg);

No { } around single-line then/else here and several times below.
Attachment #726402 - Flags: review?(luke) → review+
I'll take a look at your patches either today or tomorrow. Breakpad does have a unit test suite, you can run the Mac client tests by pulling Breakpad from SVN, applying your patches, and using the XCode project in src/client/mac. There's an "all_unittests" target that will build and run the client tests in XCode.

Additionally, I have a patch to the XCode project to get it building on modern versions of XCode:
Comment on attachment 726398 [details] [diff] [review]

Review of attachment 726398 [details] [diff] [review]:

This looks about as reasonable as this patch could look. I know we talked about it on IRC (and you sent me a patch even), but I'd like the 64-bit behavior to be on-by-default on x86-64.

::: toolkit/crashreporter/google-breakpad/src/client/mac/handler/
@@ +202,5 @@
>      NDR_record_t NDR;
>      exception_type_t exception;
>      mach_msg_type_number_t codeCnt;
> +    int64_t code[2];

We should fix the MinidumpGenerator code to store these properly so we get valid data in the minidump. That could be a followup though.
Attachment #726398 - Flags: review?(ted) → review+
Comment on attachment 726400 [details] [diff] [review]
2 - add BreakpadUserExceptionHandler

Review of attachment 726400 [details] [diff] [review]:

I think this is OK, we should find out what upstream thinks, but you can land it for now. I'll get it up for upstream review tomorrow.

::: toolkit/crashreporter/google-breakpad/src/client/mac/handler/
@@ +716,5 @@
> +        if (s_UserExceptionHandler &&
> +            s_UserExceptionHandler(receive.exception, receive.code, receive.code_count,
> +                          == TRUE)
> +        {

nit: open brackets go on the same line as the if

@@ +718,5 @@
> +            s_UserExceptionHandler(receive.exception, receive.code, receive.code_count,
> +                          == TRUE)
> +        {
> +          // Userspace handler got it.  We create an approriate reply indicating success,
> +          // and send it back to the kernel.

Generally Breakpad code tries to avoid first-person comments, FWIW.
Attachment #726400 - Flags: review?(ted) → review+
Attached patch followup JS patch (obsolete) — Splinter Review
followup to parts 3 and 4
Attachment #726968 - Flags: review?(luke)
Attachment #726968 - Flags: review?(luke) → review+
The checked-in code doesn't compile unless I --enable-threadsafe, because PR_Lock is not #included otherwise.  Is that expected?  As in, have we dropped --disable-threadsafe support?
Attached patch fix non-threadsafe (obsolete) — Splinter Review
This (untested) patch should fix non-threadsafe.  I haven't landed it yet due to tree and to see whether things will get backed out or whatnot.
Backed out in for intermittent (but far far too frequent) timeouts in mostly, but not quite exclusively, 10.8 crash-related tests. In case the list comes in handy:

M3  dom/plugins/test/test_crashing.html (the only one that failed, once, on 10.7)
bc  browser_pluginCrashCommentAndURL.js
oth dom/plugins/test/test_crash_notify.xul

Unfortunately, hanging like that kills the rest of the suite, so it's not possible to say for sure that there wouldn't be other tests later in any of those suites that would also hang in a run that's planning on hanging.
Attached patch standalone JS patch (wip) (obsolete) — Splinter Review
Here's a patch that sets up an exception server in JS, and handles thread exceptions only, which should have first go at things.  This seems to mostly work, but even though the exception gets "handled", we still end up crashing with an access exception even though we shouldn't (because we handled it!).
Attached patch patch (obsolete) — Splinter Review
Continuing with Vlad's fine work, this patch exclusively uses Mach exceptions (in JS_THREADSAFE and non-JS_THREADSAFE) builds which has the added benefits that I can run the asm.js unit tests, which now pass.  (Also gdb works again; yeah!).

Trying a browser build now...
Assignee: vladimir → luke
Attachment #726398 - Attachment is obsolete: true
Attachment #726400 - Attachment is obsolete: true
Attachment #726401 - Attachment is obsolete: true
Attachment #726402 - Attachment is obsolete: true
Attachment #726968 - Attachment is obsolete: true
Attachment #727048 - Attachment is obsolete: true
Attachment #727360 - Attachment is obsolete: true
Attached patch qref'd patchSplinter Review
Attachment #728333 - Attachment is obsolete: true
Comment on attachment 728339 [details] [diff] [review]
qref'd patch

Builds a browser, survives MOZ_CRASHREPORTER=1, runs BBench/ammo.js, and successfully submitted a crash report.

Try push:
I see it doesn't have OSX10.8 in the list, but I developed/tested on OSX, so I think we're covered.
Attachment #728339 - Flags: review?(vladimir)
Comment on attachment 728339 [details] [diff] [review]
qref'd patch

r+ from Vlad (from his phone), green on try, tree's open, let's do this!
Leeeeeeroooy Jeeenkins
Attachment #728339 - Flags: review?(vladimir) → review+
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
You need to log in before you can comment on or make changes to this bug.