Last Comment Bug 851964 - Re-enable OdinMonkey on OSX
: Re-enable OdinMonkey on OSX
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All Mac OS X
: -- normal (vote)
: mozilla22
Assigned To: Luke Wagner [:luke]
:
Mentors:
Depends on: 884630
Blocks:
  Show dependency treegraph
 
Reported: 2013-03-17 14:26 PDT by Luke Wagner [:luke]
Modified: 2013-06-18 21:36 PDT (History)
16 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
95% of a fix (27.02 KB, patch)
2013-03-18 01:48 PDT, Vladimir Vukicevic [:vlad] [:vladv]
no flags Details | Diff | Splinter Review
1 - add breakpad TARGET_OSX_USE_64BIT_EXCEPTIONS (18.20 KB, patch)
2013-03-18 16:25 PDT, Vladimir Vukicevic [:vlad] [:vladv]
ted: review+
Details | Diff | Splinter Review
2 - add BreakpadUserExceptionHandler (3.98 KB, patch)
2013-03-18 16:27 PDT, Vladimir Vukicevic [:vlad] [:vladv]
ted: review+
Details | Diff | Splinter Review
3 - add JSRuntime tracking in a linked list, so that we can find AsmJS info from any thread (3.81 KB, patch)
2013-03-18 16:28 PDT, Vladimir Vukicevic [:vlad] [:vladv]
luke: review+
Details | Diff | Splinter Review
4 - use previous patches to add support for Mach exception catching to Odin (12.05 KB, patch)
2013-03-18 16:30 PDT, Vladimir Vukicevic [:vlad] [:vladv]
luke: review+
Details | Diff | Splinter Review
followup JS patch (5.34 KB, patch)
2013-03-19 16:45 PDT, Vladimir Vukicevic [:vlad] [:vladv]
luke: review+
Details | Diff | Splinter Review
fix non-threadsafe (2.40 KB, patch)
2013-03-19 21:25 PDT, Vladimir Vukicevic [:vlad] [:vladv]
no flags Details | Diff | Splinter Review
standalone JS patch (wip) (29.59 KB, patch)
2013-03-20 14:04 PDT, Vladimir Vukicevic [:vlad] [:vladv]
no flags Details | Diff | Splinter Review
patch (33.11 KB, patch)
2013-03-22 11:26 PDT, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
qref'd patch (33.08 KB, patch)
2013-03-22 11:38 PDT, Luke Wagner [:luke]
luke: review+
Details | Diff | Splinter Review

Description Luke Wagner [:luke] 2013-03-17 14:26:04 PDT
On OSX, Breakpad uses the underlying Mach exception mechanism to catch errors and print minidumps.  This means that OdinMonkey's SIGSEGV/SIGBUS handlers are never called when Breakpad is enabled.  (Breakpad was tested on Windows/Linux, I foolishly forgot about OSX.  The shell tests the signal handlers, but clearly, we need some Mochitests as well.)  For now, we'll disable Odin on OSX.  This fix mostly just requires spending some time to understand Mach exceptions and the right way to integrate with Breakpad.

Ted, are you familiar with this part of Breakpad, or can you recommend anyone else familiar with Mach exceptions?
Comment 1 Ted Mielczarek [:ted.mielczarek] 2013-03-17 16:54:35 PDT
I probably know about as much as anyone else you're going to find, and I have a copy of "Mac OS X Internals" laying around.
Comment 2 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-17 20:44:38 PDT
I did some digging; this might be old news to Ted, but might be useful.

As best I can tell, the mach exception handling stuff is dependent on a magic symbol ("catch_exception_raise") being present in the process.  Breakpad already provides one of those, so we might want to use the same mechanism... I was starting to add a "BreakpadPreHandleExceptionHandler" symbol that breakpad would look for and forward to before doing its own thing.

However, it looks like breakpad is using the 32-bit handling mechanism -- see http://stackoverflow.com/a/2839113 .  (It's using exc_server instead of mach_exc_server generated by mig, etc.)  This is a problem because while breakpad doesn't care about the exception params, so 32-bit truncation is perfectly fine, Odin very much cares.  This likely means that we'd need to rewrite this part of breakpad to use the 64-bit routines; that doesn't look particularly bad, but it's annoying that we need to do it.
Comment 3 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-18 01:48:48 PDT
Created attachment 726027 [details] [diff] [review]
95% of a fix

Blah, I should have slept, but got close.  This is 95% of a fix; I have this separated out into 3 nice patches locally.  Here's WIP that does the following:

1) Uses the 64-bit handler infrastructure on OSX instead of the 32-bit.  Note that as things are currently, we will always truncate access exception addresses as similar in our reports, because we're only getting the low 32 bits of those addresses.

2) Adds a user exception handler function, defined via a magic symbol in the binary.

3) Uses that in Asm.js.

This all should in theory work, except that I need to read a TLS slot in another thread to get that thread's JS per thread data.  Unfortunately, state.__gs always seems to be 0; I'm not reading the segment register addresses with THREAD_STATE64.  No idea how to solve this, unfortunately :(
Comment 4 Alon Zakai (:azakai) 2013-03-18 15:40:19 PDT
Do we also have the option to not use signal handling on OS X and like Windows64 emit slightly-slower code (with branching), temporarily?
Comment 5 Luke Wagner [:luke] 2013-03-18 15:43:20 PDT
I think Vlad has this problem fixed already :)
Comment 6 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-18 16:25:43 PDT
Created attachment 726398 [details] [diff] [review]
1 - add breakpad TARGET_OSX_USE_64BIT_EXCEPTIONS

Part 1 - add TARGET_OSX_USE_64BIT_EXCEPTIONS to breakpad.

This has us go through the full 64-bit path in breakpad, instead of receiving truncated 32-bit values.  Before this, even current breakpad is getting just truncated values for reporting (but just this won't fix it -- we need to make sure the 64-bit values are plumbed all the way through).

I tested breakpad in a few crashes and things seemed to be ok, but please let me know if there's a full test suite I can run.
Comment 7 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-18 16:27:11 PDT
Created attachment 726400 [details] [diff] [review]
2 - add BreakpadUserExceptionHandler

Adds a callback symbol that breakpad looks for in the current process to give it a chance to handle an exception first.  This approach (look for a symbol in the process) is the same one used by the (32-bit) mach handling setup, so I didn't feel too bad about doing it this way.
Comment 8 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-18 16:28:39 PDT
Created attachment 726401 [details] [diff] [review]
3 - add JSRuntime tracking in a linked list, so that we can find AsmJS info from any thread

Adds a linked list of JSRuntimes to JSRuntime, specifically so that we can look up the AsmJS invocation from the Mach exception handler.  We get the exception on a different thread, and despite many, many efforts, I couldn't figure out how to get a hold of a TLS slot on a different thread.  This was the path of least resistance.
Comment 9 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-18 16:30:39 PDT
Created attachment 726402 [details] [diff] [review]
4 - use previous patches to add support for Mach exception catching to Odin

The duplicated code in the handler is unfortunate, but trying to mix everything together may have just lead to more convoluted code flow.
Comment 10 Luke Wagner [:luke] 2013-03-18 17:06:47 PDT
Comment on attachment 726401 [details] [diff] [review]
3 - add JSRuntime tracking in a linked list, so that we can find AsmJS info from any thread

Review of attachment 726401 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +1008,5 @@
> +            runtimeListHead_ = runtimeListNext_;
> +        } else {
> +            while (r && r->runtimeListNext_ != this) {
> +                r = r->runtimeListNext_;
> +            }

No { } around single-line body

::: js/src/jscntxt.cpp
@@ +1423,5 @@
> +        activation = r->mainThread.asmJSActivationStackFromAnyThread();
> +        if (activation) {
> +            const AsmJSModule &module = activation->module();
> +            uint8_t *code = module.functionCode();
> +            if (pc >= code && pc < (code + module.functionBytes())) {

This duplicates PCIsInModule in AsmJSSignalHandler.  Could you common this out by adding bool AsmJSModule::pcIsInModule(void *pc)?

@@ +1426,5 @@
> +            uint8_t *code = module.functionCode();
> +            if (pc >= code && pc < (code + module.functionBytes())) {
> +                // found it
> +                break;
> +            }

No { } around single-line then/else

::: js/src/jscntxt.h
@@ +681,5 @@
>  
> +#ifdef XP_MACOSX
> +    // On OSX, we need a way to find all the live runtimes from an arbitrary
> +    // thread (specifically, the exception thread, for fixing up Odin
> +    // accesses).

s/Odin accesses/asm.js faults/
Comment 11 Luke Wagner [:luke] 2013-03-18 17:13:22 PDT
Comment on attachment 726402 [details] [diff] [review]
4 - use previous patches to add support for Mach exception catching to Odin

Review of attachment 726402 [details] [diff] [review]:
-----------------------------------------------------------------

Nice job digging into all this!

(Awaiting the OSX 32-bit patch.)

::: js/src/ion/AsmJSSignalHandlers.cpp
@@ +306,5 @@
> +    boolean_t BreakpadUserExceptionHandler64(exception_type_t extype,
> +                                             mach_exception_data_t code,
> +                                             mach_msg_type_number_t code_count,
> +                                             mach_port_t thread)
> +        __attribute__((visibility("default")));

I'd really much prefer an explicit call from breakpad into with AsmJS in the name rather than this implicit linking strategy.

@@ +443,2 @@
>      } else {
> +        SetThreadStateRegisterToCoercedUndefined(&context->__ss, isFloat32, reg);

No { } around single-line then/else here and several times below.
Comment 12 Ted Mielczarek [:ted.mielczarek] 2013-03-19 05:13:32 PDT
I'll take a look at your patches either today or tomorrow. Breakpad does have a unit test suite, you can run the Mac client tests by pulling Breakpad from SVN, applying your patches, and using the XCode project in src/client/mac. There's an "all_unittests" target that will build and run the client tests in XCode.

Additionally, I have a patch to the XCode project to get it building on modern versions of XCode:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-mq/raw-file/7f9524222945/xcode-project-update
Comment 13 Ted Mielczarek [:ted.mielczarek] 2013-03-19 12:27:53 PDT
Comment on attachment 726398 [details] [diff] [review]
1 - add breakpad TARGET_OSX_USE_64BIT_EXCEPTIONS

Review of attachment 726398 [details] [diff] [review]:
-----------------------------------------------------------------

This looks about as reasonable as this patch could look. I know we talked about it on IRC (and you sent me a patch even), but I'd like the 64-bit behavior to be on-by-default on x86-64.

::: toolkit/crashreporter/google-breakpad/src/client/mac/handler/exception_handler.cc
@@ +202,5 @@
>      NDR_record_t NDR;
>      exception_type_t exception;
>      mach_msg_type_number_t codeCnt;
> +#if TARGET_OSX_USE_64BIT_EXCEPTIONS
> +    int64_t code[2];

We should fix the MinidumpGenerator code to store these properly so we get valid data in the minidump. That could be a followup though.
Comment 14 Ted Mielczarek [:ted.mielczarek] 2013-03-19 12:29:21 PDT
Comment on attachment 726400 [details] [diff] [review]
2 - add BreakpadUserExceptionHandler

Review of attachment 726400 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is OK, we should find out what upstream thinks, but you can land it for now. I'll get it up for upstream review tomorrow.

::: toolkit/crashreporter/google-breakpad/src/client/mac/handler/exception_handler.cc
@@ +716,5 @@
>  
> +        if (s_UserExceptionHandler &&
> +            s_UserExceptionHandler(receive.exception, receive.code, receive.code_count,
> +                                   receive.thread.name) == TRUE)
> +        {

nit: open brackets go on the same line as the if

@@ +718,5 @@
> +            s_UserExceptionHandler(receive.exception, receive.code, receive.code_count,
> +                                   receive.thread.name) == TRUE)
> +        {
> +          // Userspace handler got it.  We create an approriate reply indicating success,
> +          // and send it back to the kernel.

Generally Breakpad code tries to avoid first-person comments, FWIW.
Comment 15 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-19 16:45:07 PDT
Created attachment 726968 [details] [diff] [review]
followup JS patch

followup to parts 3 and 4
Comment 17 Boris Zbarsky [:bz] 2013-03-19 20:48:47 PDT
The checked-in code doesn't compile unless I --enable-threadsafe, because PR_Lock is not #included otherwise.  Is that expected?  As in, have we dropped --disable-threadsafe support?
Comment 18 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-19 21:25:04 PDT
Created attachment 727048 [details] [diff] [review]
fix non-threadsafe

This (untested) patch should fix non-threadsafe.  I haven't landed it yet due to tree and to see whether things will get backed out or whatnot.
Comment 19 Phil Ringnalda (:philor, back in August) 2013-03-19 21:59:32 PDT
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/26653529ea8b for intermittent (but far far too frequent) timeouts in mostly, but not quite exclusively, 10.8 crash-related tests. In case the list comes in handy:

M3  dom/plugins/test/test_crashing.html (the only one that failed, once, on 10.7)
bc  browser_pluginCrashCommentAndURL.js
oth dom/plugins/test/test_crash_notify.xul

Unfortunately, hanging like that kills the rest of the suite, so it's not possible to say for sure that there wouldn't be other tests later in any of those suites that would also hang in a run that's planning on hanging.
Comment 20 Vladimir Vukicevic [:vlad] [:vladv] 2013-03-20 14:04:15 PDT
Created attachment 727360 [details] [diff] [review]
standalone JS patch (wip)

Here's a patch that sets up an exception server in JS, and handles thread exceptions only, which should have first go at things.  This seems to mostly work, but even though the exception gets "handled", we still end up crashing with an access exception even though we shouldn't (because we handled it!).
Comment 21 Luke Wagner [:luke] 2013-03-22 11:26:57 PDT
Created attachment 728333 [details] [diff] [review]
patch

Continuing with Vlad's fine work, this patch exclusively uses Mach exceptions (in JS_THREADSAFE and non-JS_THREADSAFE) builds which has the added benefits that I can run the asm.js unit tests, which now pass.  (Also gdb works again; yeah!).

Trying a browser build now...
Comment 22 Luke Wagner [:luke] 2013-03-22 11:38:53 PDT
Created attachment 728339 [details] [diff] [review]
qref'd patch
Comment 23 Luke Wagner [:luke] 2013-03-22 13:42:13 PDT
Comment on attachment 728339 [details] [diff] [review]
qref'd patch

Builds a browser, survives MOZ_CRASHREPORTER=1, runs BBench/ammo.js, and successfully submitted a crash report.

Try push:
https://tbpl.mozilla.org/?tree=Try&rev=8ca15d7d7c49
I see it doesn't have OSX10.8 in the list, but I developed/tested on OSX, so I think we're covered.
Comment 24 Luke Wagner [:luke] 2013-03-22 16:06:39 PDT
Comment on attachment 728339 [details] [diff] [review]
qref'd patch

r+ from Vlad (from his phone), green on try, tree's open, let's do this!
Leeeeeeroooy Jeeenkins
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5d90cd59be8
Comment 25 Joe Drew (not getting mail) 2013-03-23 16:00:13 PDT
https://hg.mozilla.org/mozilla-central/rev/a5d90cd59be8

Note You need to log in before you can comment on or make changes to this bug.