Comment 4

•

3 years ago

I haven't either. I've now been using FF 101.0 and 101.0.1 for a couple of days on both Intel and Apple Silicon hardware.

bp-011b7d0d-160c-42fd-8851-268c50220612

Comment 5

•

3 years ago

Crash stacks for macOS 13 Beta build 22A5266r (the latest and so far only released macOS 13 beta) are now symbolicated. I manually scraped its symbols and sent them to Gabriele Svelto. I'll keep doing that for subsequent macOS 13 betas.

    0  XUL  js::MachExceptionHandler()  js/src/ds/MemoryProtectionExceptionHandler.cpp:652  context
    1  XUL  js::detail::ThreadTrampoline<void (&)(), >::Start(void*)  js/src/threading/Thread.h:209  cfi
    2  libsystem_pthread.dylib  _pthread_start   cfi
    3  libsystem_pthread.dylib  thread_start   cfi

Nicolas B. Pierron [:nbp]

Comment 6

•

3 years ago

Note that the procedure for manual symbol scraping is different on macOS 13. See bug 1661771 comment #23.

Comment 7

•

3 years ago

Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler.

There is another recent spike of Mac ARM64 crashes, with Jan's changes on JIT frames, but not on amd64 as mentioned in comment 1.

Flags: needinfo?(sphink)

Comment 8

•

3 years ago

(In reply to Nicolas B. Pierron [:nbp] from comment #7)

Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler.

This is MachExceptionHandler, which seems like it has a much wider range of uses.

There is another recent spike of Mac ARM64 crashes, with Jan's changes on JIT frames, but not on amd64 as mentioned in comment 1.

For this crash, it would be really helpful to know what the error message is, since there are many ways that mach_msg can fail. This could be an OOM, for example. Something like:

    MOZ_CRASH_UNSAFE_PRINTF("MachExceptionHandler: Failed to forward to the previous handler: %s", mach_error_string(ret));

would be really helpful here (though it'll require data review).
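
As a rough illustration of the logging suggested above, here is a minimal, portable C++ sketch. It does not call the real Mach APIs: `FakeKernReturn`, `FakeErrorString`, and `DescribeForwardFailure` are hypothetical stand-ins, and the error codes and strings are invented for the example. In the real handler, the value returned by mach_msg would be passed to mach_error_string instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for kern_return_t so the sketch builds off macOS.
using FakeKernReturn = int;
constexpr FakeKernReturn kFakeSuccess = 0;

// Hypothetical stand-in for mach_error_string(). The codes and strings
// here are invented and do not correspond to real Mach error values.
const char* FakeErrorString(FakeKernReturn ret) {
  switch (ret) {
    case 0:
      return "success";
    case 1:
      return "no memory (invented)";
    case 2:
      return "invalid destination port (invented)";
    default:
      return "unknown error";
  }
}

// Builds the text that would accompany the crash when forwarding fails.
// Only meaningful for failure codes, hence the precondition.
std::string DescribeForwardFailure(FakeKernReturn ret) {
  assert(ret != kFakeSuccess);
  return std::string(
             "MachExceptionHandler: Failed to forward to the previous "
             "handler: ") +
         FakeErrorString(ret);
}
```

With the error text in the crash reason, an out-of-memory failure could be told apart from, say, a bad destination port without guessing from the stack alone.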

(@smichaud, it's great to see you around here! Thank you for keeping an eye on things.)

Flags: needinfo?(sphink)

Comment 9

•

3 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

(In reply to Nicolas B. Pierron [:nbp] from comment #7)

Steve, any idea what this might be related to?
I would have thought that we were no longer using the MemoryProtectionHandler.

This is MachExceptionHandler, which seems like it has a much wider range of uses.

...or not. I think you're right, this is only installed from the memory protection stuff. Hm, do we need this now? I'll look further.

Comment 10

•

3 years ago

I'm wondering if this is really a crash in WebRenderCommandBuilder::Destroy that is getting caught by the exception handling thread and failing there. It could even be the same as bug 1759481?

Comment 11

•

3 years ago

If so, this could be indicative of a crash reporting issue, if MOZ_CRASH is getting overshadowed by the failure here. gsvelto, do you know if that's possible?

Flags: needinfo?(gsvelto)

Comment 12

•

3 years ago

Attached file Bug 1773584 - report mach error when mach_msg fails — Details

Gabriele Svelto [:gsvelto]

Comment 13

•

3 years ago

From looking at the blame annotations, I think Jan might be the best to answer the question of whether we need this memory protection handler at all now.

Flags: needinfo?(jdemooij)

Comment 14

•

3 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #11)

If so, this could be indicative of a crash reporting issue, if MOZ_CRASH is getting overshadowed by the failure here. gsvelto, do you know if that's possible?

It's possible that the message was not forwarded correctly (i.e. Breakpad's exception handler ultimately rejects it), but here we're not waiting for a reply, just sending the message (mach_msg(..., MACH_SEND_MSG, ...)), so the crash reporter's reply shouldn't matter and shouldn't affect the return code.

That being said, I've seen odd things happen in the exception handler, though only rarely: sometimes we get an exception with a msgh_id that doesn't match what we expect. Additionally, sometimes we get an exception that's not meant for the target process (i.e. in the message task.name != mach_task_self()). We explicitly ignore those exceptions. I don't know what would happen if we tried to forward them instead, but since MachExceptionHandler does not do that particular check, I guess we might be trying to forward an exception meant for someone else.

Last but not least, it seems that MachExceptionHandler only listens for one exception; it doesn't wait in a loop. This might not be related to this particular crash, but it seems odd to me.
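
To make the two checks and the loop concrete, here is a small portable C++ sketch. It only models the logic and is not the Breakpad or SpiderMonkey code: `FakeExceptionMessage`, `Classify`, `CountForwarded`, and the value of `kExpectedMsgId` are all hypothetical, and no real Mach calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of an incoming exception message.
struct FakeExceptionMessage {
  int32_t msgh_id;     // message id carried in the header
  uint32_t task_name;  // task the exception was raised for
};

enum class Action { Forward, Ignore };

// Arbitrary value chosen for illustration, not a real Mach message id.
constexpr int32_t kExpectedMsgId = 1000;

// Applies the two checks described above: the message id must be the
// expected one, and the exception must be meant for our own task.
Action Classify(const FakeExceptionMessage& msg, uint32_t selfTask) {
  if (msg.msgh_id != kExpectedMsgId) {
    return Action::Ignore;
  }
  if (msg.task_name != selfTask) {
    return Action::Ignore;
  }
  return Action::Forward;
}

// Walks every queued message instead of handling only the first one,
// and counts how many would be forwarded to the previous handler.
std::size_t CountForwarded(const std::vector<FakeExceptionMessage>& queue,
                           uint32_t selfTask) {
  std::size_t forwarded = 0;
  for (const FakeExceptionMessage& msg : queue) {
    if (Classify(msg, selfTask) == Action::Forward) {
      ++forwarded;
    }
  }
  return forwarded;
}
```

In this model, a message with an unexpected id or one addressed to another task is dropped rather than forwarded, and a later valid message still gets processed because the handler keeps looping.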

Flags: needinfo?(gsvelto)

Assignee

Comment 15

•

3 years ago

Attached file Bug 1773584 - Remove MemoryProtectionExceptionHandler and PageProtectingVector. r?nbp! — Details

PageProtectingVector has been unused since bug 1342023.

MemoryProtectionExceptionHandler was only used to annotate crashes affecting LifoAlloc memory.

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → jdemooij

Status: NEW → ASSIGNED

Assignee

Updated

•

3 years ago

Flags: needinfo?(jdemooij)

Crash:
https://crash-stats.mozilla.org/report/index/9df78665-a296-4a71-8ed4-878100220622

Comment 16

•

3 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

(@smichaud, it's great to see you around here! Thank you for keeping an eye on things.)

Yup, still puttering around :-)

Haik Aftandilian [:haik]

Comment 17

•

3 years ago

Attached image Crash on Clicking Allow — Details

I'm able to reliably reproduce this on an M1 with macOS 13.0 Beta with Nightly when clicking the Allow button for location services. For example, on maps.google.com when clicking the button to go to your current location.

13-inch, M1, 2020
macOS 13.0 Beta (22A5266r)
Nightly 103.0a1 (2022-06-22) (64-bit)
https://hg.mozilla.org/mozilla-central/rev/0242545b34ca3f3290c68496c2e921ddfdf5cdc3

Comment 18

•

3 years ago

•

Edited

(In reply to comment #17)

Your STR doesn't work for me, on either Apple Silicon (a 2020 Mac Mini) or Intel (a VMware Fusion VM), using either the current FF release (101.0.1) or today's mozilla-central nightly.

By the way, a new macOS 13.0 beta has just been released. It'll be interesting to see if that makes a difference here.

Edit:

Crash:
https://crash-stats.mozilla.org/report/index/9df78665-a296-4a71-8ed4-878100220622

Note that this is partially corrupt -- the lowest line is wrong.

Haik Aftandilian [:haik]

Comment 19

•

3 years ago

•

Edited

Having WIFI disabled seems to trigger this. All the crashes I've hit have been with WIFI disabled and I haven't been able to reproduce with WIFI enabled.

The machine is connected to a USB-C hub for ethernet, power, and external monitor.

It is still reproducible with the new Beta update today (22A5286j).

Another crash report from Nightly in js::MachExceptionHandler:
https://crash-stats.mozilla.org/report/index/5df1780c-0227-4246-a97a-1c3f10220622

On Firefox Beta, the crash is different, hitting on the Wifi Monitor thread:
https://crash-stats.mozilla.org/report/index/fe4f3613-376f-4b15-ac3a-2de240220622

bp-bd01dc0b-338c-4a67-b037-ddd350220622

Comment 20

•

3 years ago

•

Edited

Interesting, me too.

I have an Ethernet connection (to a WiFi base station) on my 2020 Mac Mini, but I'd forgotten to disable WiFi (even though I wasn't using it). When I did, your STR started working.

bp-5d724d3d-ebf9-46c0-8956-40a550220622

Comment 21

•

3 years ago

•

Edited

I also crash with the FF release (101.0.1), but the crash stack is corrupt, and very weird:

[@ EMPTY: no crashing thread identified; MissingThreadList ]

Edit: I get the same kind of crash as you on FF 102.0 Beta 9.

Comment 22

•

3 years ago

•

Edited

This is the signature for the crashes on the WiFi Monitor thread.

Edit: Unfortunately, a lot of the ___chkstk_darwin crash stacks don't match this bug (and are corrupt) -- possibly all those not on macOS 13.

Crash Signature: [@ js::MachExceptionHandler] → [@ js::MachExceptionHandler] [@ ___chkstk_darwin ]

Haik Aftandilian [:haik]

Comment 23

•

3 years ago

Additionally, with WIFI disabled and a non-debug local build with ac_add_options --disable-optimize, I can't reproduce the problem.

Pulsebot

Comment 24

•

3 years ago

Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8873669c6014 Remove MemoryProtectionExceptionHandler and PageProtectingVector. r=nbp

https://hg.mozilla.org/mozilla-central/rev/8873669c6014

Assignee

Comment 25

•

3 years ago

This exception handler was only enabled for MOZ_DIAGNOSTIC_ASSERT_ENABLED builds, so Nightly and early beta IIRC. It sounds like there's an unrelated WiFi monitor issue we still need to fix.

bszekely

Comment 26

•

3 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox103: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 103 Branch

BugBot [:suhaib / :marco/ :calixte]

Comment 27

•

3 years ago

The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?

If yes, please nominate the patch for beta approval.
If no, please set status-firefox102 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jdemooij)

Comment 28

•

3 years ago

The underlying issue here is probably a bug in the WiFi monitor, triggered by a behavior change in macOS 13. Jan's patch presumably won't fix it. I'll spin off a new bug to cover it.

Crash Signature: [@ js::MachExceptionHandler] [@ ___chkstk_darwin ] → [@ js::MachExceptionHandler]

Updated

•

3 years ago

Comment 29

•

3 years ago

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #27)

The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?

The next merge is in a few days and this code is disabled on release and late beta, so there's no need to uplift at this point.

status-firefox102: affected → wontfix

Flags: needinfo?(jdemooij)

Assignee

Updated

•

3 years ago

status-firefox102: wontfix → disabled

Comment 30

•

3 years ago

(In reply to Haik Aftandilian [:haik] from comment #23)

Additionally, with WIFI disabled and a non-debug local build with ac_add_options --disable-optimize, I can't reproduce the problem.

Please try ac_add_options --disable-jemalloc and let us know your results. I'll try that, too.

See bug 1776210 comment #5 for context.

Comment 31

•

3 years ago

(Following up on comment #30)

I still crash with jemalloc disabled. See bug 1776210 comment #7.