Closed Bug 1731162 Opened 4 years ago Closed 3 years ago

Crash in [@ nsXPCWrappedJS::Release] on the Breakpad ExceptionHandler thread

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: mccr8, Assigned: gsvelto)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Bug 1731162 - Crash with more information when receiving odd exceptions on macOS r=KrisWright 4 years ago Gabriele Svelto [:gsvelto] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1731162 - Don't attempt to pretty print the mach message ID in the crash message r=KrisWright 4 years ago Gabriele Svelto [:gsvelto] 48 bytes, text/x-phabricator-request		Details \| Review

Andrew McCreight [:mccr8]

Reporter

Description

•

4 years ago

This is a weird one. We're on the Breakpad ExceptionHandler thread, and calling exit. Somehow this is triggering the XPCWJS thread check release assert. There's at least a couple of these crashes

Crash report: https://crash-stats.mozilla.org/report/index/17766243-6c90-4f19-84f8-767e90210913

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(NS_IsMainThread()) (nsXPCWrappedJS::Release called off main thread)

Top 6 frames of crashing thread:

0 XUL nsXPCWrappedJS::Release js/xpconnect/src/XPCWrappedJS.cpp:259
1 libsystem_c.dylib __cxa_finalize_ranges 
2 libsystem_c.dylib exit 
3 XUL google_breakpad::ExceptionHandler::WaitForMessage toolkit/crashreporter/breakpad-client/mac/handler/exception_handler.cc:699
4 libsystem_pthread.dylib _pthread_start 
5 libsystem_pthread.dylib thread_start

Andrew McCreight [:mccr8]

Reporter

Comment 1

•

4 years ago

Still happening. Should this be using _exit() instead of exit() or something? Whatever skips the destructors.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Assignee

Comment 2

•

4 years ago

Yes, we should probably call _exit() however... I don't know what the correct behavior would be here. What we're doing is trying to populate the reply by forwarding the exception in case anybody cares, then replying to the kernel (which originally sent us the exception message). We call exit() only if we fail to populate the reply which in the current code only happens if the msgh_id of the inbound exception is different from 2405 which should be impossible.

This is odd, but there's something else that's odd: the crashes indicate that we're in the main process but if we were we should have called _exit() already. This happens implicitly here, the second-last parameter is telling Breakpad to quit after having written the minidump.

So if we haven't quit already that could mean two things:

We're in a forked version of the main process
Launching the crash reporter client failed

1 should be impossible and 2 is very unlikely. Either way this indicate a fundamental problem, and I'd rule out a hardware issue because it would be odd that a bit flip would affect this specific code path. So instead of calling _exit() I'm tempted to turn this into a crash where I store the contents of the exception id and the result of launching the crash reporter client. Then we'd know what's going on and we might react properly.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Assignee

Comment 3

•

4 years ago

I've got a patch to try and detect what's wrong here. Let's see if I can get some facts.

Assignee: nobody → gsvelto

Status: NEW → ASSIGNED

Keywords: leave-open

Gabriele Svelto [:gsvelto]

Assignee

Comment 4

•

4 years ago

Attached file Bug 1731162 - Crash with more information when receiving odd exceptions on macOS r=KrisWright — Details

Pulsebot

Comment 5

•

4 years ago

Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e83ac05b890f Crash with more information when receiving odd exceptions on macOS r=KrisWright

Cristian Tuns

Comment 6

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e83ac05b890f

Gabriele Svelto [:gsvelto]

Assignee

Comment 7

•

4 years ago

Adding the new signature I introduced for this issue. It seems that MOZ_CRASH_UNSAFE_PRINTF() doesn't understand the "pretty hexadecimal" modifier and so it's not printing out the first parameter correctly, I should adjust that. The second parameter appears to be always zero - which would indicate that we didn't write out the minidump correctly - however given that the first parameter wasn't printed out correctly I fear we might really be looking at the contents of the first one.

Crash Signature: [@ nsXPCWrappedJS::Release] → [@ nsXPCWrappedJS::Release] [@ google_breakpad::ExceptionHandler::WaitForMessage | google_breakpad::ExceptionHandler::WaitForMessage | _pthread_start]

Gabriele Svelto [:gsvelto]

Assignee

Comment 8

•

4 years ago

Attached file Bug 1731162 - Don't attempt to pretty print the mach message ID in the crash message r=KrisWright — Details

Pulsebot

Comment 9

•

4 years ago

Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0e015cdbe6ba Don't attempt to pretty print the mach message ID in the crash message r=KrisWright

Cristina Cozmuta (:CrissCozmuta)

Comment 10

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0e015cdbe6ba

Gabriele Svelto [:gsvelto]

Assignee

Comment 11

•

4 years ago

Here's a couple more signatures. The mach message ID appears to be bogus sometimes, however what all these crashes have in common is that we failed to write a minidump. I'm considering leaving this as-is for the time being because this would have been a crash anyway - we're just not getting the minidump because it's not being written out. Having this crash anyway later lets us track the volume and eventually verify the fix for the minidump writing logic when get around to that.

Crash Signature: [@ nsXPCWrappedJS::Release] [@ google_breakpad::ExceptionHandler::WaitForMessage | google_breakpad::ExceptionHandler::WaitForMessage | _pthread_start] → [@ google_breakpad::ExceptionHandler::WaitForMessage | google_breakpad::ExceptionHandler::WaitForMessage | _pthread_start] [@ google_breakpad::ExceptionHandler::WaitForMessage | _pthread_body] [@ google_breakpad::ExceptionHandler::WaitForMessage | _pthr…

BugBot [:suhaib / :marco/ :calixte]

Comment 13

•

3 years ago

The leave-open keyword is there and there is no activity for 6 months.
:gsvelto, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Assignee

Comment 14

•

3 years ago

Looking at the crash data we're getting mach messages with ID we don't know how to handle (4917, 29434 and more). In all cases we've not been able to write a minidump, we don't really know what to do and the main process crashed anyway so I'll call this fixed since it's now a controlled crash.

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Flags: needinfo?(gsvelto)

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Keywords: leave-open

Gabriele Svelto [:gsvelto]

Assignee

Updated

•

3 years ago

Crash Signature: _pthread_start] → _pthread_start] [@ google_breakpad::ExceptionHandler::WaitForMessage | libsystem_pthread.dylib@0x32ea] [@ google_breakpad::ExceptionHandler::WaitForMessage | libsystem_pthread.dylib@0x3660]

Steven Michaud [:smichaud] (Retired)

Updated

•

2 years ago

Bugzilla

Crash in [@ nsXPCWrappedJS::Release] on the Breakpad ExceptionHandler thread

Categories

(Toolkit :: Crash Reporting, defect)

Tracking

()

People

(Reporter: mccr8, Assigned: gsvelto)

References

Details

(Keywords: crash)

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 13

Comment 14

Updated

Updated

Updated

Attachment

General

Description

File Name

Content Type