[macOS 13] Crash in [@ js::MachExceptionHandler] at "Failed to forward to the previous handler!"
Categories: Core :: JavaScript Engine, defect
People: Reporter: mccr8, Assigned: jandem
References: Blocks 1 open bug
Keywords: crash
Attachments: 3 files
Crash report: https://crash-stats.mozilla.org/report/index/ea2539e9-e74d-4796-8a4f-e6a360220609
MOZ_CRASH Reason: MOZ_CRASH(MachExceptionHandler: Failed to forward to the previous handler!)
Top 4 frames of crashing thread:
0 XUL js::MachExceptionHandler js/src/ds/MemoryProtectionExceptionHandler.cpp:561
1 XUL js::detail::ThreadTrampoline<void js/src/threading/Thread.h:209
2 libsystem_pthread.dylib libsystem_pthread.dylib@0x000000000000728c
3 libsystem_pthread.dylib libsystem_pthread.dylib@0x000000000000728c
I'm filing this in the JS engine because that's the origin of this code, but it looks like all of the crashes with this signature are on macOS 13, so there might be something interesting there.
Comment 1 (Reporter) • 2 years ago
Crashes are on both arm64 and amd64.
Comment 2 • 2 years ago
At this time, macOS Ventura isn't available to me as part of Apple's pre-release program. Keeping n-i set to follow up on this.
Comment 3 • 2 years ago
I was able to download macOS Ventura, but I have not run into this particular crash so far.
Comment 4 • 2 years ago
I haven't either. I've now been using FF 101.0 and 101.0.1 for a couple of days on both Intel and Apple Silicon hardware.
Comment 5 • 2 years ago
Crash stacks for macOS 13 Beta build 22A5266r (the latest and so far only released macOS 13 beta) are now symbolicated. I manually scraped its symbols and sent them to Gabriele Svelto. I'll keep doing that for subsequent macOS 13 betas.
bp-011b7d0d-160c-42fd-8851-268c50220612
0 XUL js::MachExceptionHandler() js/src/ds/MemoryProtectionExceptionHandler.cpp:652 context
1 XUL js::detail::ThreadTrampoline<void (&)(), >::Start(void*) js/src/threading/Thread.h:209 cfi
2 libsystem_pthread.dylib _pthread_start cfi
3 libsystem_pthread.dylib thread_start cfi
Comment 6 • 2 years ago
Note that the procedure for manual symbol scraping is different on macOS 13. See bug 1661771 comment #23.
Comment 7 • 2 years ago
Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler anymore.
There is also another recent spike of Mac ARM64 crashes, coinciding with Jan's changes to JIT frames, but not on amd64 as mentioned in comment 1.
Comment 8 • 2 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #7)
> Steve, any idea what this might be related to?
> I would have thought that we were no longer using the MemoryProtectionHandler anymore.
This is MachExceptionHandler, which seems like it has a much wider range of uses.
> There is also another recent spike of Mac ARM64 crashes, coinciding with Jan's changes to JIT frames, but not on amd64 as mentioned in comment 1.
For this crash, it would be really helpful to know what the error message is, since there are many ways that mach_msg can fail. This could be an OOM, for example. Something like:
MOZ_CRASH_UNSAFE_PRINTF("MachExceptionHandler: Failed to forward to the previous handler: %s", mach_error_string(ret));
would be really helpful here (though it'll require data review).
(@smichaud, it's great to see you around here! Thank you for keeping an eye on things.)
Comment 9 • 2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #8)
> (In reply to Nicolas B. Pierron [:nbp] from comment #7)
> > Steve, any idea what this might be related to?
> > I would have thought that we were no longer using the MemoryProtectionHandler anymore.
> This is MachExceptionHandler, which seems like it has a much wider range of uses.
...or not. I think you're right: this is only installed from the memory protection stuff. Hm, do we still need it? I'll look further.
Comment 10 • 2 years ago
I'm wondering if this is really a crash in WebRenderCommandBuilder::Destroy that is getting caught by the exception handling thread and failing there. Could it even be the same as bug 1759481?
Comment 11 • 2 years ago
If so, this could be indicative of a crash reporting issue, if MOZ_CRASH is getting overshadowed by the failure here. gsvelto, do you know if that's possible?
Comment 12 • 2 years ago
Comment 13 • 2 years ago
From looking at the blame annotations, I think Jan might be the best person to answer whether we still need this memory protection handler at all.
Comment 14 • 2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #11)
> If so, this could be indicative of a crash reporting issue, if MOZ_CRASH is getting overshadowed by the failure here. gsvelto, do you know if that's possible?
It's possible that the message was not forwarded correctly (i.e. Breakpad's exception handler ultimately rejects it), but here we're not waiting for a reply, just sending the message (mach_msg(..., MACH_SEND_MSG, ...)), so the crash reporter's reply shouldn't matter and shouldn't affect the return code.
That being said, I've seen odd things happen in the exception handler, though only rarely: sometimes we get an exception with a msgh_id that doesn't match what we expect. Additionally, sometimes we get an exception that's not meant for the target process (i.e. in the message, task.name != mach_task_self()). We explicitly ignore those exceptions. I don't know what would happen if we tried to forward them instead, but since MachExceptionHandler does not do that particular check, I guess it might be trying to forward an exception meant for someone else.
Last but not least, it seems that MachExceptionHandler only listens for one exception; it doesn't wait in a loop. This might not be related to this particular crash, but it seems odd to me.
Comment 15 (Assignee) • 2 years ago
PageProtectingVector has been unused since bug 1342023.
MemoryProtectionExceptionHandler was only used to annotate crashes affecting LifoAlloc memory.
Comment 16 • 2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #8)
> (@smichaud, it's great to see you around here! Thank you for keeping an eye on things.)
Yup, still puttering around :-)
Comment 17 • 2 years ago
I can reliably reproduce this on an M1 running the macOS 13.0 Beta with Nightly by clicking the Allow button for location services, for example on maps.google.com after clicking the button to go to your current location.
Crash: https://crash-stats.mozilla.org/report/index/9df78665-a296-4a71-8ed4-878100220622
Hardware: 13-inch, M1, 2020
OS: macOS 13.0 Beta (22A5266r)
Browser: Nightly 103.0a1 (2022-06-22) (64-bit)
Revision: https://hg.mozilla.org/mozilla-central/rev/0242545b34ca3f3290c68496c2e921ddfdf5cdc3
Comment 18 • 2 years ago
(In reply to comment #17)
Your STR doesn't work for me on either Apple Silicon (a 2020 Mac Mini) or Intel (a VMware Fusion VM), using either the current FF release (101.0.1) or today's mozilla-central Nightly.
By the way, a new macOS 13.0 beta has just been released. It'll be interesting to see if that makes a difference here.
Edit:
> Crash: https://crash-stats.mozilla.org/report/index/9df78665-a296-4a71-8ed4-878100220622
Note that this is partially corrupt: the lowest line is wrong.
Comment 19 • 2 years ago
Having WiFi disabled seems to trigger this. All the crashes I've hit have been with WiFi disabled, and I haven't been able to reproduce with WiFi enabled.
The machine is connected to a USB-C hub for Ethernet, power, and an external monitor.
It is still reproducible with today's new beta update (22A5286j).
Another crash report from Nightly in js::MachExceptionHandler: https://crash-stats.mozilla.org/report/index/5df1780c-0227-4246-a97a-1c3f10220622
On Firefox Beta, the crash is different; it happens on the WiFi Monitor thread: https://crash-stats.mozilla.org/report/index/fe4f3613-376f-4b15-ac3a-2de240220622
Comment 20 • 2 years ago
Interesting, me too.
I have an Ethernet connection (to a WiFi base station) on my 2020 Mac Mini, but I'd forgotten to disable WiFi (even though I wasn't using it). When I did, your STR started working.
Comment 21 • 2 years ago
I also crash with the FF release (101.0.1), but the crash stack is corrupt, and very weird:
bp-5d724d3d-ebf9-46c0-8956-40a550220622
[@ EMPTY: no crashing thread identified; MissingThreadList ]
Edit: I get the same kind of crash as you on FF 102.0 Beta 9.
Comment 22 • 2 years ago
This is the signature for the crashes on the WiFi Monitor thread.
Edit: Unfortunately, a lot of the ___chkstk_darwin crash stacks don't match this bug (and are corrupt), possibly all of those not on macOS 13.
Comment 23 • 2 years ago
Additionally, with WiFi disabled and a non-debug local build with ac_add_options --disable-optimize, I can't reproduce the problem.
Comment 24 • 2 years ago
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8873669c6014 Remove MemoryProtectionExceptionHandler and PageProtectingVector. r=nbp
Comment 25 (Assignee) • 2 years ago
This exception handler was only enabled for MOZ_DIAGNOSTIC_ASSERT_ENABLED builds, so Nightly and early beta IIRC. It sounds like there's an unrelated WiFi monitor issue we still need to fix.
Comment 26 • 2 years ago
bugherder
Comment 27 • 2 years ago
The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox102 to wontfix.
For more information, please visit auto_nag documentation.
Comment 28 • 2 years ago
The underlying issue here is probably a bug in the WiFi monitor, triggered by a behavior change in macOS 13. Jan's patch presumably won't fix it. I'll spin off a new bug to cover it.
Comment 29 (Assignee) • 2 years ago
(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #27)
> The patch landed in nightly and beta is affected.
> :jandem, is this bug important enough to require an uplift?
The next merge is in a few days and this code is disabled on release and late beta, so there's no need to uplift at this point.
Comment 30 • 2 years ago
(In reply to Haik Aftandilian [:haik] from comment #23)
> Additionally, with WiFi disabled and a non-debug local build with ac_add_options --disable-optimize, I can't reproduce the problem.
Please try ac_add_options --disable-jemalloc and let us know your results. I'll try that, too.
See bug 1776210 comment #5 for context.
Comment 31 • 2 years ago
(Following up on comment #30)
I still crash with jemalloc disabled. See bug 1776210 comment #7.
Comment 32 • 2 years ago
(Following up on comment #30)
But I don't crash with optimization disabled. See bug 1776210 comment #8.