Closed Bug 1773584 Opened 2 years ago Closed 2 years ago

[macOS 13] Crash in [@ js::MachExceptionHandler] at "Failed to forward to the previous handler!"

Categories

(Core :: JavaScript Engine, defect)

All
macOS
defect

Tracking

RESOLVED FIXED
103 Branch
Tracking Status
firefox-esr91 --- disabled
firefox-esr102 --- disabled
firefox101 --- disabled
firefox102 --- disabled
firefox103 --- fixed

People

(Reporter: mccr8, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Attachments

(3 files)

Crash report: https://crash-stats.mozilla.org/report/index/ea2539e9-e74d-4796-8a4f-e6a360220609

MOZ_CRASH Reason: MOZ_CRASH(MachExceptionHandler: Failed to forward to the previous handler!)

Top 4 frames of crashing thread:

0 XUL js::MachExceptionHandler js/src/ds/MemoryProtectionExceptionHandler.cpp:561
1 XUL js::detail::ThreadTrampoline<void  js/src/threading/Thread.h:209
2 libsystem_pthread.dylib libsystem_pthread.dylib@0x000000000000728c 
3 libsystem_pthread.dylib libsystem_pthread.dylib@0x000000000000728c 

I'm filing this in the JS engine because that's the origin of this code, but it looks like all of the crashes with this signature are on macOS 13, so there might be something interesting there.

Flags: needinfo?(spohl.mozilla.bugs)
Summary: Crash in [@ js::MachExceptionHandler] → Crash in [@ js::MachExceptionHandler] at "Failed to forward to the previous handler!"

Crashes are on both arm64 and amd64.

OS: Unspecified → macOS
Hardware: Unspecified → All
Summary: Crash in [@ js::MachExceptionHandler] at "Failed to forward to the previous handler!" → [macOS 13] Crash in [@ js::MachExceptionHandler] at "Failed to forward to the previous handler!"
Blocks: 1773708

At this time, macOS Ventura isn't available to me as part of Apple's pre-release program. Keeping n-i set to follow up on this.

I was able to download macOS Ventura, but I have not run into this particular crash so far.

Flags: needinfo?(spohl.mozilla.bugs)

I haven't either. I've now been using FF 101.0 and 101.0.1 for a couple of days on both Intel and Apple Silicon hardware.

Crash stacks for macOS 13 Beta build 22A5266r (the latest and so far only released macOS 13 beta) are now symbolicated. I manually scraped its symbols and sent them to Gabriele Svelto. I'll keep doing that for subsequent macOS 13 betas.

bp-011b7d0d-160c-42fd-8851-268c50220612

    0  XUL  js::MachExceptionHandler()  js/src/ds/MemoryProtectionExceptionHandler.cpp:652  context
    1  XUL  js::detail::ThreadTrampoline<void (&)(), >::Start(void*)  js/src/threading/Thread.h:209  cfi
    2  libsystem_pthread.dylib  _pthread_start   cfi
    3  libsystem_pthread.dylib  thread_start   cfi

Note that the procedure for manual symbol scraping is different on macOS 13. See bug 1661771 comment #23.

Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler.

There has been another spike of Mac ARM64 crashes recently, with Jan's changes on JIT frames, but not on amd64 as mentioned in comment 1.

Flags: needinfo?(sphink)

(In reply to Nicolas B. Pierron [:nbp] from comment #7)

Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler.

This is MachExceptionHandler, which seems like it has a much wider range of uses.

There has been another spike of Mac ARM64 crashes recently, with Jan's changes on JIT frames, but not on amd64 as mentioned in comment 1.

For this crash, it would be really helpful to know what the error message is, since there are many ways that mach_msg can fail. This could be an OOM, for example. Something like:

MOZ_CRASH_UNSAFE_PRINTF("MachExceptionHandler: Failed to forward to the previous handler: %s", mach_error_string(ret));

would be really helpful here (though it'll require data review).
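As a rough illustration of the suggestion above, here is a minimal, portable sketch of building such a message. The helper name `DescribeForwardFailure` is hypothetical and does not exist in the Firefox tree; on macOS it calls `mach_error_string()` from `<mach/mach_error.h>`, and on other platforms it substitutes a placeholder so the sketch still compiles. It only formats a string and does not crash the process.

```cpp
#include <cstdio>
#include <string>

#if defined(__APPLE__)
#  include <mach/mach_error.h>
#endif

// Hypothetical helper (not from the Firefox tree): build the crash message
// for a failed forward, including a description of the Mach return code.
std::string DescribeForwardFailure(int ret) {
#if defined(__APPLE__)
  // mach_error_string() maps a Mach return code to a readable description.
  const char* detail = mach_error_string(ret);
#else
  // Stand-in so the sketch compiles on platforms without Mach headers.
  const char* detail = "description unavailable off macOS";
#endif
  char buf[256];
  std::snprintf(buf, sizeof(buf),
                "MachExceptionHandler: Failed to forward to the previous "
                "handler: %s (0x%x)",
                detail, static_cast<unsigned>(ret));
  return std::string(buf);
}
```

Including the raw code next to the description would let crash reports separate, for example, an invalid destination port from a resource shortage, even if the description text changes between macOS versions.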

(@smichaud, it's great to see you around here! Thank you for keeping an eye on things.)

Flags: needinfo?(sphink)

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

(In reply to Nicolas B. Pierron [:nbp] from comment #7)

Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler.

This is MachExceptionHandler, which seems like it has a much wider range of uses.

...or not. I think you're right, this is only installed from the memory protection stuff. Hm, do we need this now? I'll look further.

I'm wondering if this is really a crash in WebRenderCommandBuilder::Destroy that is getting caught by the exception handling thread and failing there. It could even be the same as bug 1759481?

See Also: → 1759481

If so, this could be indicative of a crash reporting issue, if MOZ_CRASH is getting overshadowed by the failure here. gsvelto, do you know if that's possible?

Flags: needinfo?(gsvelto)

From looking at the blame annotations, I think Jan might be the best to answer the question of whether we need this memory protection handler at all now.

Flags: needinfo?(jdemooij)

(In reply to Steve Fink [:sfink] [:s:] from comment #11)

If so, this could be indicative of a crash reporting issue, if MOZ_CRASH is getting overshadowed by the failure here. gsvelto, do you know if that's possible?

It's possible that the message was not forwarded correctly (i.e. Breakpad's exception handler ultimately rejects it), but here we're not waiting for a reply, just sending the message (mach_msg(..., MACH_SEND_MSG, ...)), so the crash reporter's reply shouldn't matter and shouldn't affect the return code. That being said, I've seen odd things happen in the exception handler, though only rarely: sometimes we get an exception with a msgh_id which doesn't match what we expect. Additionally, sometimes we get an exception that's not meant for the target process (i.e. in the message task.name != mach_task_self()). We explicitly ignore those exceptions. I don't know what would happen if we tried to forward them instead, but since MachExceptionHandler does not do that particular check, I guess we might be trying to forward an exception meant for someone else. Last but not least, it seems that MachExceptionHandler only listens for one exception; it doesn't wait in a loop. This might not be related to this particular crash, but it seems odd to me.
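To make the checks described above concrete, here is a simplified model of them. It does not use the real Mach API: `FakeExceptionMessage`, `kExpectedExceptionId`, `kSelfTask`, `Classify` and `CountForwarded` are all hypothetical names with placeholder values, and the sketch only shows the shape of "ignore messages with an unexpected ID or a foreign task, and keep handling messages in a loop". It is not a claim that adding these checks would have fixed this crash.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for the fields of an exception message mentioned in
// the comment above. These are NOT the real Mach types.
struct FakeExceptionMessage {
  int32_t msgh_id;     // message ID from the header
  uint32_t task_name;  // task port name carried in the message body
};

// Placeholder values, assumptions for illustration only.
constexpr int32_t kExpectedExceptionId = 1;
constexpr uint32_t kSelfTask = 100;  // stands in for mach_task_self()

enum class Action { Forward, Ignore };

// Decide what to do with one message: only forward exceptions that carry
// the expected ID and that are addressed to our own task.
Action Classify(const FakeExceptionMessage& msg) {
  if (msg.msgh_id != kExpectedExceptionId) return Action::Ignore;
  if (msg.task_name != kSelfTask) return Action::Ignore;
  return Action::Forward;
}

// Process every queued message instead of only the first one, and report
// how many would have been forwarded to the previous handler.
int CountForwarded(const std::vector<FakeExceptionMessage>& queue) {
  int forwarded = 0;
  for (const auto& msg : queue) {
    if (Classify(msg) == Action::Forward) ++forwarded;
  }
  return forwarded;
}
```

In a real handler the loop would block on receiving the next message rather than walking a vector; the vector is used here only to keep the sketch self-contained.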

Flags: needinfo?(gsvelto)

PageProtectingVector has been unused since bug 1342023.

MemoryProtectionExceptionHandler was only used to annotate crashes affecting
LifoAlloc memory.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

(@smichaud, it's great to see you around here! Thank you for keeping an eye on things.)

Yup, still puttering around :-)

Attached image Crash on Clicking Allow

I'm able to reliably reproduce this on an M1 with macOS 13.0 Beta with Nightly when clicking the Allow button for location services. For example, on maps.google.com when clicking the button to go to your current location.

Crash:
https://crash-stats.mozilla.org/report/index/9df78665-a296-4a71-8ed4-878100220622

13-inch, M1, 2020
macOS 13.0 Beta (22A5266r)
Nightly 103.0a1 (2022-06-22) (64-bit)
https://hg.mozilla.org/mozilla-central/rev/0242545b34ca3f3290c68496c2e921ddfdf5cdc3

(In reply to comment #17)

Your STR doesn't work for me, on either Apple Silicon (a 2020 Mac Mini) or Intel (a VMware Fusion VM), using either the current FF release (101.0.1) or today's mozilla-central nightly.

By the way, a new macOS 13.0 beta has just been released. It'll be interesting to see if that makes a difference here.

Edit:

Crash:
https://crash-stats.mozilla.org/report/index/9df78665-a296-4a71-8ed4-878100220622

Note that this is partially corrupt -- the lowest line is wrong.

Having WIFI disabled seems to trigger this. All the crashes I've hit have been with WIFI disabled and I haven't been able to reproduce with WIFI enabled.

The machine is connected to a USB-C hub for ethernet, power, and external monitor.

It is still reproducible with the new Beta update today (22A5286j).

Another crash report from Nightly in js::MachExceptionHandler:
https://crash-stats.mozilla.org/report/index/5df1780c-0227-4246-a97a-1c3f10220622

On Firefox Beta, the crash is different, hitting on the Wifi Monitor thread:
https://crash-stats.mozilla.org/report/index/fe4f3613-376f-4b15-ac3a-2de240220622

Interesting, me too.

I have an Ethernet connection (to a WiFi base station) on my 2020 Mac Mini, but I'd forgotten to disable WiFi (even though I wasn't using it). When I did, your STR started working.

bp-bd01dc0b-338c-4a67-b037-ddd350220622

I also crash with the FF release (101.0.1), but the crash stack is corrupt, and very weird:

bp-5d724d3d-ebf9-46c0-8956-40a550220622

[@ EMPTY: no crashing thread identified; MissingThreadList ]

Edit: I get the same kind of crash as you on FF 102.0 Beta 9.

This is the signature for the crashes on the WiFi Monitor thread.

Edit: Unfortunately, a lot of the ___chkstk_darwin crash stacks don't match this bug (and are corrupt) -- possibly all those not on macOS 13.

Crash Signature: [@ js::MachExceptionHandler] → [@ js::MachExceptionHandler] [@ ___chkstk_darwin ]

Additionally, with WIFI disabled and a non-debug local build with ac_add_options --disable-optimize, I can't reproduce the problem.

Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8873669c6014
Remove MemoryProtectionExceptionHandler and PageProtectingVector. r=nbp

This exception handler was only enabled for MOZ_DIAGNOSTIC_ASSERT_ENABLED builds, so Nightly and early beta IIRC. It sounds like there's an unrelated WiFi monitor issue we still need to fix.
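For readers unfamiliar with this kind of gating, a minimal sketch of a build-time guard follows. The function name `InstallDiagnosticHandler` is hypothetical, and the actual call site in the removed code may have looked different; only the macro name MOZ_DIAGNOSTIC_ASSERT_ENABLED comes from the comment above.

```cpp
// Hypothetical sketch: install a diagnostic-only handler solely in builds
// that define MOZ_DIAGNOSTIC_ASSERT_ENABLED.
// Returns true if the handler would be installed in this build.
bool InstallDiagnosticHandler() {
#ifdef MOZ_DIAGNOSTIC_ASSERT_ENABLED
  // Diagnostic builds: the real installation code would go here.
  return true;
#else
  // Other builds: the handler is compiled out entirely.
  return false;
#endif
}
```

With a guard like this, builds that do not define the macro never run the handler code, which is consistent with the signature appearing only on the channels named above.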

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 103 Branch

The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox102 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jdemooij)

The underlying issue here is probably a bug in the WiFi monitor, triggered by a behavior change in macOS 13. Jan's patch presumably won't fix it. I'll spin off a new bug to cover it.

Crash Signature: [@ js::MachExceptionHandler] [@ ___chkstk_darwin ] → [@ js::MachExceptionHandler]

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #27)

The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?

The next merge is in a few days and this code is disabled on release and late beta, so there's no need to uplift at this point.

Flags: needinfo?(jdemooij)

(In reply to Haik Aftandilian [:haik] from comment #23)

Additionally, with WIFI disabled and a non-debug local build with ac_add_options --disable-optimize, I can't reproduce the problem.

Please try ac_add_options --disable-jemalloc and let us know your results. I'll try that, too.

See bug 1776210 comment #5 for context.

(Following up comment #30)

I still crash with jemalloc disabled. See bug 1776210 comment #7.

(Following up comment #30)

But I don't crash with optimization disabled. See bug 1776210 comment #8.
