Closed Bug 1731162 Opened 4 years ago Closed 3 years ago

Crash in [@ nsXPCWrappedJS::Release] on the Breakpad ExceptionHandler thread

Categories

(Toolkit :: Crash Reporting, defect)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: mccr8, Assigned: gsvelto)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

This is a weird one. We're on the Breakpad ExceptionHandler thread, and calling exit. Somehow this is triggering the XPCWJS thread check release assert. There's at least a couple of these crashes

Crash report: https://crash-stats.mozilla.org/report/index/17766243-6c90-4f19-84f8-767e90210913

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(NS_IsMainThread()) (nsXPCWrappedJS::Release called off main thread)

Top 6 frames of crashing thread:

0 XUL nsXPCWrappedJS::Release js/xpconnect/src/XPCWrappedJS.cpp:259
1 libsystem_c.dylib __cxa_finalize_ranges 
2 libsystem_c.dylib exit 
3 XUL google_breakpad::ExceptionHandler::WaitForMessage toolkit/crashreporter/breakpad-client/mac/handler/exception_handler.cc:699
4 libsystem_pthread.dylib _pthread_start 
5 libsystem_pthread.dylib thread_start 

Still happening. Should this be using _exit() instead of exit() or something? Whatever skips the destructors.

Flags: needinfo?(gsvelto)

Yes, we should probably call _exit() however... I don't know what the correct behavior would be here. What we're doing is trying to populate the reply by forwarding the exception in case anybody cares, then replying to the kernel (which originally sent us the exception message). We call exit() only if we fail to populate the reply which in the current code only happens if the msgh_id of the inbound exception is different from 2405 which should be impossible.

This is odd, but there's something else that's odd: the crashes indicate that we're in the main process but if we were we should have called _exit() already. This happens implicitly here, the second-last parameter is telling Breakpad to quit after having written the minidump.

So if we haven't quit already that could mean two things:

  1. We're in a forked version of the main process
  2. Launching the crash reporter client failed

1 should be impossible and 2 is very unlikely. Either way this indicate a fundamental problem, and I'd rule out a hardware issue because it would be odd that a bit flip would affect this specific code path. So instead of calling _exit() I'm tempted to turn this into a crash where I store the contents of the exception id and the result of launching the crash reporter client. Then we'd know what's going on and we might react properly.

Flags: needinfo?(gsvelto)

I've got a patch to try and detect what's wrong here. Let's see if I can get some facts.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Keywords: leave-open
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e83ac05b890f Crash with more information when receiving odd exceptions on macOS r=KrisWright

Adding the new signature I introduced for this issue. It seems that MOZ_CRASH_UNSAFE_PRINTF() doesn't understand the "pretty hexadecimal" modifier and so it's not printing out the first parameter correctly, I should adjust that. The second parameter appears to be always zero - which would indicate that we didn't write out the minidump correctly - however given that the first parameter wasn't printed out correctly I fear we might really be looking at the contents of the first one.

Crash Signature: [@ nsXPCWrappedJS::Release] → [@ nsXPCWrappedJS::Release] [@ google_breakpad::ExceptionHandler::WaitForMessage | google_breakpad::ExceptionHandler::WaitForMessage | _pthread_start]
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0e015cdbe6ba Don't attempt to pretty print the mach message ID in the crash message r=KrisWright

Here's a couple more signatures. The mach message ID appears to be bogus sometimes, however what all these crashes have in common is that we failed to write a minidump. I'm considering leaving this as-is for the time being because this would have been a crash anyway - we're just not getting the minidump because it's not being written out. Having this crash anyway later lets us track the volume and eventually verify the fix for the minidump writing logic when get around to that.

Crash Signature: [@ nsXPCWrappedJS::Release] [@ google_breakpad::ExceptionHandler::WaitForMessage | google_breakpad::ExceptionHandler::WaitForMessage | _pthread_start] → [@ google_breakpad::ExceptionHandler::WaitForMessage | google_breakpad::ExceptionHandler::WaitForMessage | _pthread_start] [@ google_breakpad::ExceptionHandler::WaitForMessage | _pthread_body] [@ google_breakpad::ExceptionHandler::WaitForMessage | _pthr…

The leave-open keyword is there and there is no activity for 6 months.
:gsvelto, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(gsvelto)

Looking at the crash data we're getting mach messages with ID we don't know how to handle (4917, 29434 and more). In all cases we've not been able to write a minidump, we don't really know what to do and the main process crashed anyway so I'll call this fixed since it's now a controlled crash.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(gsvelto)
Resolution: --- → FIXED
Crash Signature: _pthread_start] → _pthread_start] [@ google_breakpad::ExceptionHandler::WaitForMessage | libsystem_pthread.dylib@0x32ea] [@ google_breakpad::ExceptionHandler::WaitForMessage | libsystem_pthread.dylib@0x3660]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: