Poison crash in [@ objc_msgSend | -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]]
Categories
(Core :: Widget: Cocoa, defect, P2)
Tracking
()
People
(Reporter: mccr8, Assigned: bradwerth)
References
Details
(4 keywords, Whiteboard: [tbird crash][adv-main133+r][adv-esr128.5+r])
Crash Data
Attachments
(7 files, 4 obsolete files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
175.14 KB,
text/plain
|
Details | |
38.53 KB,
text/plain
|
Details | |
2.09 KB,
text/plain
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-esr128+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/1d1458ba-0b39-4471-b7be-1076d0240215
Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Top 10 frames of crashing thread:
0 libobjc.A.dylib objc_msgSend
1 AppKit -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
2 AppKit __58-[_NSTrackingAreaAKManager _activeTrackingAreasNeedUpdate]_block_invoke
3 AppKit ___NSMainRunLoopPerformBlockInModes_block_invoke
4 CoreFoundation __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__
5 CoreFoundation __CFRunLoopDoBlocks
6 CoreFoundation __CFRunLoopRun
7 CoreFoundation CFRunLoopRunSpecific
8 HIToolbox RunCurrentEventLoopInMode
9 HIToolbox ReceiveNextEventCommon
Solid volume of these. I couldn't find the string "activetracking" in our codebase anywhere, so I'm not sure what this is from. They are all on various releases of MacOS 14, so not super old, but only on 14.2 and older so not the latest versions.
Comment 1•2 years ago
|
||
worrying that it seems to be getting more frequent even though it involves old versions of the OS -- seems to imply we've changed something on our end. Or, I suppose, that some popular web site did something to tickle this bug.
This might be an unactionable bug in macOS
Updated•1 years ago
|
Reporter | ||
Comment 2•1 years ago
|
||
NSTrackingArea is related to mouse and cursor tracking events. We do have a few references to that in our code base.
Comment 3•1 years ago
|
||
The severity field for this bug is set to S3. However, the bug is flagged with the sec-high
keyword.
:spohl, could you consider increasing the severity of this security bug?
For more information, please visit BugBot documentation.
Updated•1 years ago
|
Reporter | ||
Comment 5•1 years ago
|
||
The comments seem to talk about specific user actions, and it seems tricky to trigger even then, so I'll reduce this to sec-moderate. It would still be nice to fix, as the volume is quite high for a poisoning crash.
(In reply to Stephen A Pohl [:spohl] from comment #2)
(In reply to Steven Michaud [:smichaud] (Retired) from comment #1)
Someone please look for URLs and useful comments from this bug's crash data, then post them here. I'm not a Mozilla employee, so I'm not able to access this information myself.
"Closing last private tab", "closing a tab", "closing a tab within a private window" seem to be the most interesting comments. There are also two comments about the crash occurring when watching a youtube video, one of the two mentions youtube in fullscreen. Could this maybe be triggered by clicking the "x" button on a tab but holding the mouse down, initiating a tab drag and then releasing the mouse?
Comment 6•1 years ago
•
|
||
Could this maybe be triggered by clicking the "x" button on a tab but holding the mouse down, initiating a tab drag and then releasing the mouse?
I tried this in a window (private and standard) and a tab -- sometimes in the last window (or tab). No crashes, and no ASAN errors.
These crashes still happen on macOS 14.4 (which came out last Thursday), so it's not that Apple has fixed the underlying bug. It's still too soon to know, though, whether or not 14.4 will reduce the volume of these crashes. Edit: It hasn't.
Running my ASAN build I got a bunch of malloc: nano zone abandoned due to inability to reserve vm space
errors. I was able to get rid of these by using MallocNanoZone=0 ./mach run
. Their absence made no difference.
Comment 7•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Mac on release
:spohl, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 8•1 year ago
|
||
I just stumbled upon this while doing nightly crash triage. This looks like a regression to me as the crashes seem to have a specific starting date when restricting the query for nightly builds: buildid 20240201095346. There's a few macOS-specific changes on our side in that timeframe with bug 1865372 being the most suspicious, given it deals with destroying windows. Here's several comments from these crashes that point to closing a window being the culprit here:
I closed a couple of tabs of a window using Cmd+W, and Firefox crashed just after I closed the last tab in the window.
Closing last private tab
I had a YouTube video popped out, and I closed the popped out window (pressed X on the popped out window) leading to a crash.
After logging out from ‘Google Docs, I changed a Google Docs spreadsheet in another tab and then clicked close on the tab. It asked whether I was sure because it would not store the change. I was sure and closed Firefox. Then I was informed about the crash which had happened.
From the looks of it this is affecting several hundred users per day, we might want to prioritize fixing it.
Assignee | ||
Comment 10•1 year ago
|
||
I'll see if I can replicate.
Assignee | ||
Comment 11•1 year ago
|
||
Bug 1891608 is possibly relevant.
Assignee | ||
Comment 12•1 year ago
|
||
I think crash rates are going to zero with the landing of Bug 1891608. I'll wait another two weeks and then resolve this Bug.
Assignee | ||
Comment 13•1 year ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #12)
I think crash rates are going to zero with the landing of Bug 1891608. I'll wait another two weeks and then resolve this Bug.
This is fixed.
Comment 14•1 year ago
|
||
I'm pretty sure this bug isn't fixed.
As best I can tell, all beta releases on the 127 channel have the fix for bug 1891608. But there are still quite a few of this bug's crashes in them:
Until FF 127 is released, we won't be able to tell whether or not the frequency of this bug's crashes has been reduced.
Assignee | ||
Comment 15•1 year ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #14)
I'm pretty sure this bug isn't fixed.
As best I can tell, all beta releases on the 127 channel have the fix for bug 1891608. But there are still quite a few of this bug's crashes in them:
Until FF 127 is released, we won't be able to tell whether or not the frequency of this bug's crashes has been reduced.
Sorry, I misunderstood the crash data. I'll keep thinking about what might be happening here, and I'll look at new crash stacks in builds with patches for Bug 1891608 applied to see what has changed, if anything.
Assignee | ||
Comment 16•1 year ago
|
||
Okay, new theory. nsCocoaWindow
has a property accessor to get a tracking area view, and calls it when adding the tracking area, and calls it again during deallocation. That works fine as long as the tracking view accessor provides the same value at both call times. But it might not! So the fix will be to also store the view that was giving the tracking area, and then remove the tracking area from that same view later. Patch coming shortly.
Assignee | ||
Comment 17•1 year ago
|
||
It is important to remove the NSTrackingArea before the nsCocoaWindow is
deallocated. This patch ensures that we remember which NSView is holding
the tracking area and remove the tracking area from that view during
dealloc (as well as whenever the tracking area is updated).
Since the trackingAreaView
property accessor returns a value based on
contentView
or its superview, manipulations of either view will change
the return value of the accessor. We can't rely on it being consistent
throughout the lifetime of the window. This change ensures that we will
always remove the tracking area from the view that added it.
Updated•1 year ago
|
Comment 18•1 year ago
|
||
Comment 19•1 year ago
|
||
Comment 20•1 year ago
|
||
The patch landed in nightly and beta is affected.
:bradwerth, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox128
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 21•1 year ago
|
||
I don't want to uplift, because I first want to confirm it fixes the issue, and that will take some time. I think this can ride the trains.
Assignee | ||
Updated•1 year ago
|
Comment 22•1 year ago
•
|
||
The 129 branch has been patched for a week (starting with the 20240619095752 build). In that week there've been five of this bug's crashes, three of them in builds with the patch. It's pretty clear the patch hasn't fixed these crashes.
bp-b65bcf12-1bdc-4b9e-8e09-4947b0240628
bp-8852db2e-a87a-49b0-966e-8dae90240627
bp-4c6752b2-fc17-403a-a0a8-3961e0240624
Comment 23•1 year ago
•
|
||
These crashes happen almost exclusively on macOS 14, and have not diminished in recent minor versions of it. Though they started on the 124 branch, this is at least partly an Apple bug, and Mozilla will probably not be able to do anything about it. Let's report it to Apple and see what they do. Haik? :-)
Assignee | ||
Comment 24•1 year ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #22)
The 129 branch has been patched for a week (starting with the 20240619095752 build). In that week there've been five of this bug's crashes, three of them in builds with the patch. It's pretty clear the patch hasn't fixed these crashes.
Sorry for the premature closing of the Bug. Let's re-open and keep it open while we try things.
Comment hidden (obsolete) |
Comment 26•1 year ago
•
|
||
Gabriele, on the off chance that OCLP is involved here, could you check the crash reports' boot args?
Comment 27•1 year ago
•
|
||
(Following up comment #25)
Oops, I got my previous analysis badly wrong. I'll try again:
For what it's worth, these crashes happen when -[NSView(NSTrackingArea) trackingAreas]
is called on a deleted NSView
object. I found this out by looking at the AppKit framework's -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
in a disassembler.
In objc_msgSend()
the self
argument (arg0
) isn't invalid. But since it now points to a block of poisoned memory, self->isa
(r10
) is invalid (== 0x000065e5e5e5e5e0
).
Assignee | ||
Comment 28•1 year ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #27)
(Following up comment #25)>
For what it's worth, these crashes happen when-[NSView(NSTrackingArea) trackingAreas]
is called on a deletedNSView
object.
This sounds very plausible. I think our whole approach to tracking areas is fragile. We set a tracking area on a BaseWindow
content view's superview, and clear the tracking area when the window itself is destroyed. It would be conceptually cleaner if whatever view that was would instead set its own tracking area and remove it upon destruction. I'll see if I can untangle our calling patterns here and change to a more direct pattern.
Comment hidden (obsolete) |
Comment 30•1 year ago
•
|
||
I discovered something interesting with my HookCase hook library:
All the stack traces for calls to -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
look like this. I'm not able to reproduce any traces that look like the ones recorded for this bug's crashes.
(hook.dylib) -[NSObject(_NSTrackingAreaAKManagerSwizzling) _NSTrackingAreaAKManager__updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0x87
(AppKit) -[_NSTrackingAreaAKManager _mouseMoved:] + 0x1d3
(AppKit) _routeMouseMovedEvent + 0x126
(AppKit) -[NSWindow(NSEventRouting) _reallySendEvent:isDelayedEvent:] + 0x17d
(AppKit) -[NSWindow(NSEventRouting) sendEvent:] + 0x159
(XUL) -[ToolbarWindow sendEvent:] + 0x5c
(AppKit) routeMouseMovedEvent + 0xb4
(AppKit) -[NSApplication(NSEventRouting) sendEvent:] + 0x68e
(XUL) -[GeckoNSApplication sendEvent:] + 0x8a
(AppKit) -[NSApplication _handleEvent:] + 0x41
(AppKit) -[NSApplication run] + 0x280
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x658
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x76
(firefox) main + 0x22f
(dyld) start + 0x796
I don't know what this means. I'll keep digging.
Edit: No wait, I am able to reproduce stacks like this bug's crash stacks by double-clicking on a link in a page. I'll redo my ASAN build and see if doing that there triggers something interesting.
Edit: No luck with the ASAN build :-( I'll keep digging.
Comment 31•1 year ago
|
||
Possibly a coincidence, but I just looked at about ten crash reports and all have RawCamera
in their modules list.
Comment 32•1 year ago
|
||
Only four crashes have the mac_boot_args
field populated, one is -v watchdog=0 agdpmod=pikera alcid=11 -wegnoigpu -radcodec npci=0x2000 revpatch=sbvmm,cpuname
and the others -arm64e_preview_abi
.
Comment 33•1 year ago
|
||
Thanks! So OCLP isn't involved here.
Comment 34•1 year ago
|
||
There are several kinds of deleted NSView
objects that might be involved here, but only two of them are created and managed by Mozilla code: PixelHostingView
and ChildView
. I'll be looking through the code for any changes, relevant to these objects, on the 124 branch. With luck I'll find what triggered these crashes.
Comment 35•1 year ago
•
|
||
Actually, very few changes landed on the 124 branch in nsChildView.mm
or nsCocoaWindow.mm
.
Bug 1833814 landed a "change in nsIWidget's cursor logic", and seems unlikely to be relevant here.
Bug 1865372 and bug 1877749 seem much more likely to be relevant -- particularly bug 1865372. Brad, you wrote both of these patches. Could you go over them again and look for problems? Could we back out one or both of them for a while on mozilla-central, to see if that makes this bug's crashes go away?
(Following up comment #8) Gabriele, I now notice that you also suspect bug 1865372 might have regressed this bug.
Updated•1 year ago
|
Comment 36•1 year ago
|
||
(Following up comment #35)
Specifically, Brad, I wonder if this bug's crashes will disappear if you comment out the call to DestroyNativeWindow()
in nsCocoaWindow::Destroy()
.
Assignee | ||
Comment 37•1 year ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #36)
(Following up comment #35)
Specifically, Brad, I wonder if this bug's crashes will disappear if you comment out the call to
DestroyNativeWindow()
innsCocoaWindow::Destroy()
.
I find it credible that the changes in Bug 1865372 triggered these crashes, but those changes solved some problems also, so I don't want to back them out. Perhaps we can get all the value if the call to DestroyNativeWindow
is deferred to the next event loop. I'll think about it some more and build a patch that does this.
Comment 38•1 year ago
•
|
||
It was never my intention to have you back out your patch for bug 1865372 permanently. I just wanted to see if backing it out for a short while (say a week) got rid of this bug's crashes. They're so plentiful that a week would be enough to tell. The results of this test, either way, would give you more information on how to fix this bug's crashes.
But if you'd prefer, go ahead and try to postpone the call to DestroyNativeWindow()
from nsCocoaWindow::Destroy()
. That may be enough to fix things.
Comment 39•1 year ago
•
|
||
I've been trying to use my hook library to trigger these crashes, but without success. Not even (indirectly) calling nsCocoaWindow::Destroy
and nsCocoaWindow::DestroyNativeWindow()
right before a call to [NSView traceEvents]
(with exactly the same call stack as this bug's crashes) does the trick. It's possible I'm doing something wrong -- that I'm not calling these entirely as they normally would be called. But I think that's unlikely.
So it's possible this bug isn't caused by calls to nsCocoaWindow::DestroyNativeWindow()
, premature or otherwise. And it may be worthwhile to back the entire patch for bug 1865372 out for a week, and see what happens. If you do this, I suggest waiting until mozilla-central moves to the 130 branch. That's due to happen soon, and we wouldn't want the backout to get into a beta.
See comment #41.
Assignee | ||
Comment 40•1 year ago
|
||
Enough has changed in nsCocoaWindow.cpp
since the landing of Bug 1865372 that a pure backout isn't even possible. A backout would become a selective merge/removal. We can first try to remove the call to DestroyNativeWindow
(reverting the issue that the landing of Bug 1865372 resolved) and see how that affects these crashes.
Comment 41•1 year ago
•
|
||
(Following up comment #39)
I've kept digging, and have found some new information. But I still haven't figured out this bug's crashes.
They happen at several different offsets in -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
. Two of them involve calls to -[NSView traceEvents]
or -[NSView superview]
on several different kinds of NSView
objects. But the most common involves a call to -[NSWindow windowNumber]
on a ToolbarWindow
object.
All objects that can be involved in these crashes are deallocated on macOS system calls to "observers" or "source0 sources" (ultimately from CFRunLoopRunSpecific()
), on later iterations of the run loop than the one where their retain count reached zero. In principle it's possible for objects to be deallocated prematurely if certain "observers" or "sources" are run prematurely (not in their correct order). I can use my hook library to trigger this kind of behavior, but doing this hasn't yet triggered this bug's crashes.
Thread contention might be involved. The calls to -[NSView traceEvents]
and -[NSView superview]
that can trigger crashes usually happen on the same object. If this object survives the first call and only crashes on the second, that means it's being deallocated while -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
is running. And that can only be happening on a secondary thread. But though CFRunLoopRunSpecific()
can (and does) call "observers" and "sources" on secondary threads, I've never seen it do deallocations (on objects involved in this bug's crashes) anywhere but the main thread.
So there's strong evidence that these crashes are truly weird, and an Apple bug. But Mozilla might still be able to do something about them.
Brad: Once mozilla-central
is on the 130 branch, please do land a trial patch that removes the call to nsCocoaWindow::DestroyNativeWindow()
from nsCocoaWindow::Destroy()
. A week on the trunk should be enough to tell us whether or not this gets rid of this bug's crashes. If it does, you can try something like CFRunLoopTimerCreate()
and CFRunLoopAddTimer()
to postpone nsCocoaWindow::DestroyNativeWindow()
to a later iteration of the run loop.
Note, though, that after you remove nsCocoaWindow::DestroyNativeWindow()
from nsCocoaWindow::Destroy()
, nsCocoaWindow::DestroyNativeWindow()
will still be called, from nsCocoaWindow::~nsCocoaWindow()
, on a later iteration of the run loop. It may be tricky to get your "timer" to run before these calls. And it may not even be necessary -- first check that the trial patch does indeed regress bug 1865372.
Comment 42•1 year ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #41)
first check that the trial patch does indeed regress bug 1865372.
Yes, it does :-( I found out by using my hook library to emulate the trial patch in today's mozilla-central nightly.
I'll play around with CFRunLoopTimerCreate()
and CFRunLoopAddTimer()
to see if I can get around this.
Assignee | ||
Comment 43•1 year ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #41)
Brad: Once
mozilla-central
is on the 130 branch, please do land a trial patch that removes the call tonsCocoaWindow::DestroyNativeWindow()
fromnsCocoaWindow::Destroy()
. A week on the trunk should be enough to tell us whether or not this gets rid of this bug's crashes. If it does, you can try something likeCFRunLoopTimerCreate()
andCFRunLoopAddTimer()
to postponensCocoaWindow::DestroyNativeWindow()
to a later iteration of the run loop.
Good to know that delaying the call to DestroyNativeWindow()
will fix this (edit: we haven't discerned this yet); too bad about regressing Bug 1865372. I've built a local patch that pushes the call to a later iteration of the event loop using NS_DispatchToCurrentThread
, but that had a visual side effect that is unacceptable: the window disappears, reappears, then disappears again. I'm trying to figure out what is causing that, and I hope to get a patch up here soon.
Comment 44•1 year ago
•
|
||
I've found a way to delay the call to nsCocoaWindow::DestroyNativeWindow()
without regressing bug 1865372. The code looks something like this:
static void nsCocoaWindow_DestroyNativeWindow_callback(CFRunLoopTimerRef timer, void *info)
{
nsCocoaWindow *thisptr = (nsCocoaWindow *) info;
thisptr->DestroyNativeWindow();
NS_IF_RELEASE(thisptr);
}
void nsCocoaWindow::Destroy()
{
...
if (mWindow && mWindowMadeHere) {
CancelAllTransitions();
//DestroyNativeWindow();
CFRunLoopTimerContext timer_context = CFRunLoopTimerContext();
NS_ADDREF_THIS();
timer_context.info = this;
CFRunLoopTimerRef timer =
CFRunLoopTimerCreate(NULL, CFAbsoluteTimeGetCurrent() + 0.1, 0, 0, 0,
nsCocoaWindow_DestroyNativeWindow_callback, &timer_context);
CFRunLoopAddTimer(CFRunLoopGetCurrent(), timer, kCFRunLoopCommonModes);
CFRelease(timer);
}
}
Of course I don't know if this will also fix this bug's crashes. We really need to land a trial patch, for a week, that only removes the call to nsCocoaWindow::DestroyNativeWindow()
from nsCocoaWindow::Destroy()
. This will, of course, regress bug 1865372. But it's the only way we'll be able to tell if we're on the right path. If this very simple trial patch does eliminate this bug's crashes, we can start trying possible solutions that don't regress bug 1865372. Each, in turn, should be left in place for another week, to see what effect it has on the crashes.
Edit: I think this code doesn't leak timer
. And since timer
is a one-shot, I don't think you need to call CFRunLoopRemoveTimer()
afterwards. But I'm not completely sure. I based my code on the machine code for -[NSObject(NSDelayedPerforming) performSelector:withObject:afterDelay:inModes:]
in the Foundation framework.
Edit: On second thought it probably does leak timer
. As I remember it, Apple code uses automatic reference counting, but Mozilla's doesn't. I've corrected the above code accordingly (after I tried out my correction in my hook library and didn't see any crashes). The call to CFRunLoopAddTimer()
presumably adds a reference to timer
, which presumably gets released when it's invalidated after being used once.
Edit: Yes, I did it right. Here's confirmation. A one-shot timer is automatically invalidated after it runs once.
Edit: Oops, forgot to AddRef()
and Release()
thisptr
. Now corrected.
Comment 45•1 year ago
|
||
As of today's mozilla-central nightly, the trunk is now on the 130 branch.
Assignee | ||
Comment 46•1 year ago
|
||
This is a temporary measure. We would like to destroy the native window
here, but we want to see the effect of removal on crash volume.
Comment 47•1 year ago
|
||
For what it's worth, I tried deleting objects related to this bug's crashes on com.apple.NSEventThread
(using a CFRunLoopTimer
in my hook library). I do crash, but the crashes in no way resemble this bug's crashes. This doesn't rule out that this bug's crashes are caused by thread contention. But it does eliminate one particular kind of thread contention from consideration.
I'll keep digging.
Comment 48•1 year ago
|
||
Comment 49•1 year ago
|
||
Backed out for causing OSX bc failures on browser_ext_windows_create_params.js
Comment 50•1 year ago
|
||
(In reply to Norisz Fay [:noriszfay] from comment #49)
Backed out for causing OSX bc failures on browser_ext_windows_create_params.js
Given that the patch changes how a window is closed, it's odd that the test failure has to do with a window that's just opened. Could the failure be spurious?
Updated•1 year ago
|
Assignee | ||
Comment 51•1 year ago
•
|
||
I haven't figured out the test failure yet, but I haven't given up on this Bug yet, either. This Bug remains a priority due to its sec-moderate rating and topcrash status.
Assignee | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 52•1 year ago
|
||
The crash happens on macOS 15.x as well; not fixed in the new version
bp-50ce7b58-ce50-44d4-971f-aac180240729
bp-9c62d11e-065a-4ec8-b68f-a98f10240725
Ignoring the 1% null crashes, there are two distinct addresses that correspond to the CPU type
amd64 crashes at 0x000065e5e5e5e5f8
arm64 crashes at 0x0065e5e5e5e5e5f0
The arm64 crashes I looked at have the 0xe5e5e5e5e5e5e5e5 poison address in x14, and then 0x0065e5e5e5e5e5e0 in registers x15 and x16. The crashing address is that + 0x10.
amd64 doesn't have as many registers. I always see 0x000065e5e5e5e5e0 in r10 and the instruction is and r11d, dword [r10 + 0x18]
(explaining the "f8" at the end). In 2 of the 5 crashes I looked at 0xe5e5e5e5e5e5e5e5 was in rax. Why would that be inconsistent?
I don't know what kind of operation would stomp a consistent 00006 or 006 prefix on top of a value that gets used as a pointer to an object. I don't know arm64 chips and if particular registers are used for consistent things, but the intel registers that are typically used as base pointers (rbp, rsp) aren't anywhere near that value.
Comment 53•1 year ago
|
||
The bug is marked as tracked for firefox130 (nightly). However, the bug still has low severity.
:gcp, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 54•1 year ago
•
|
||
(In reply to Daniel Veditz [:dveditz] from comment #52)
I don't know what kind of operation would stomp a consistent 00006 or 006 prefix on top of a value that gets used as a pointer to an object. I don't know arm64 chips and if particular registers are used for consistent things, but the intel registers that are typically used as base pointers (rbp, rsp) aren't anywhere near that value.
It happens in objc_msgSend(id self, SEL op, ...)
and friends.
All Objective-C objects have Class isa
as their first member:
OBJC_ROOT_CLASS
@interface Object {
Class isa;
}
@end
All these calls derefence self->isa
. But beforehand they mask it with ISA_MASK
(0x007ffffffffffff8ULL
on Apple Silicon and 0x00007ffffffffff8ULL
on Intel). When self
has been deallocated, and its memory poisoned, the value of self->isa
before masking is 0xe5e5e5e5e5e5e5e5
. But after masking it's 0x65e5e5e5e5e5e0
on Apple Silicon and 0x65e5e5e5e5e0
on Intel.
Comment 55•1 year ago
|
||
No, this Apple bug isn't fixed in macOS 14.6 (just released), or the latest macOS 15 beta (Beta 4, build 24A5298h).
Updated•1 year ago
|
Assignee | ||
Comment 56•1 year ago
|
||
I'm going to try something new. This crash has something to do with macOS invoking tracking area methods on dead objects. Our code presently creates and re-creates lots of tracking areas for a window, with one destruction and one creation every time the window is resized. With a different structure, we could instead use only one tracking area that persists for the lifetime of the window. I'm going to provide a patch that does this, and I hope that it will reduce or eliminate this crash. I have no evidence for this, because I can't reproduce the crash, but the patch I think is a strict improvement over current code.
Assignee | ||
Comment 57•1 year ago
|
||
macOS offers an enum, NSTrackingInVisibleRect, which keeps the tracking
area up-to-date with the visible area of the window. This patch adds
this enum to our tracking area at window creation time, and removes the
code that was recreating the tracking area with every window resize.
Assignee | ||
Comment 58•1 year ago
|
||
And if needed, I could put this patch on a new, non-security Bug if that helps mask what we are trying to fix here. This might be particularly helpful if the patch causes problems and needs to be backed out.
Assignee | ||
Comment 59•1 year ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #58)
And if needed, I could put this patch on a new, non-security Bug if that helps mask what we are trying to fix here. This might be particularly helpful if the patch causes problems and needs to be backed out.
I have done this, and opened Bug 1912949 to handle this fixup.
Comment 60•1 year ago
|
||
Comment on attachment 9418935 [details]
Bug 1880582: Let macOS window manager calculate visible rect of the mouse tracking area.
Revision D219087 was moved to bug 1912949. Setting attachment 9418935 [details] to obsolete.
Assignee | ||
Comment 61•1 year ago
|
||
And that patch in Bug 1912949 is also preventing the tracking area from being manipulated during dealloc
, which may be the real issue.
Updated•1 year ago
|
Comment 63•1 year ago
•
|
||
The patch for bug 1912949 doesn't fix this bug's crashes. Build id 20240824092823 (on the 131.0a1 branch) is the first with this patch. Over the last two days, there've already been two crashes in builds containing it:
Comment 64•1 year ago
|
||
We don't know what causes these crashes. Until we do, it's best to stick to the one fact we have -- that the patch for bug 1865372 triggered them. Notice I say "triggered" and not "caused" -- this bug is almost certainly an Apple bug, deep in Apple code. Until we do understand the cause, guesses based on refactoring Mozilla code are just shots in the dark.
After letting it lie fallow for about a month, I'm back to using HookCase hook libraries to find the cause(s). Still no luck, but I have some new ideas. I'll keep trying.
Until I've had better luck, I think it's best to pursue the patch Brad landed in comment #48. That means figuring out the test failure.
Assignee | ||
Comment 65•1 year ago
|
||
Still working on figuring out the test failures with attachment 9411823 [details] -- they don't occur for me locally.
Meanwhile, new theory! I saw in the gtk code a note that BaseWidget::OnDestroy()
may delete the widget itself. In such a case, a nsCocoaWindow
might see this problem because the deallocator is called and then the rest of nsCocoaWindow::Destroy()
would execute -- including the likely-triggering call to DestroyNativeWindow()
. Experimenting with this while I try to work out the issue with the other patch.
Assignee | ||
Comment 66•1 year ago
|
||
Okay, interestingly, moving the call to OnDestroy()
allowed the patch to run without causing errors. Moderate evidence! In order to land this, the patch will contain both changes. Presuming that it works, I think the next step will be to just resume the call to DestroyNativeWindow()
as it is -- no need to delay it.
Comment 67•1 year ago
|
||
Updated•1 year ago
|
Comment 68•1 year ago
•
|
||
Backed out for causing mochitest failures on test_fullscreen-api-race.html
This failure could also be caused by the push. Log here
Assignee | ||
Comment 69•1 year ago
|
||
I'm struggling with this. My attempts to fixup the patch as-is, removing the call to DestroyNativeWindow()
result in failures for various fullscreen tests, which are notoriously finicky. Whereas if I keep the call in, and make some of the other changes that conceptually seem relevant (don't destroy the widget until Destroy()
has finished), tests pass fine. I'm not delighted to make another speculative patch, but that seems like the best move to me. I am going to update the patch and land it after the soft freeze period is over, and we'll see what that does to the crash volume.
To be clear, our fullscreen tests are "finicky" because the DOM does not define an observable event fired when the fullscreen transition is complete. Thus the tests that rapidly progress steps after a fullscreen transition can end up in unexpected states. When we moved to macOS native fullscreen, many of the tests had to be modified to do spin-checks on browser state in order to pass at all. Two of the failing tests, browser_fullscreen-sizemode.js
and test_fullscreen-api-race.html
have not been modified and I don't want to try to fix them up as part of this Bug.
Updated•1 year ago
|
Comment 70•1 year ago
•
|
||
I'm finally making progress finding the true cause of these crashes (the Apple bug). I've got a hook library that crashes mozilla-central nightlies, with this bug's signature, on macOS 14 but not on macOS 13. I still don't understand why the patch for bug 1865372 triggered them, but that may come with time.
Comment 71•1 year ago
|
||
Assignee | ||
Updated•1 year ago
|
Comment 72•1 year ago
•
|
||
I've got a very simple patch that fixes the crashes triggered by my hook library:
https://treeherder.mozilla.org/jobs?repo=try&revision=439559d8301cad4168bf8fc26ff8388b273203a6
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EYgoFPu3Q5mynPrHrtPN9w/runs/0/artifacts/public/build/target.dmg
Let's see if Brad's patch works. If not, my patch is clearly the next step to take.
Edit: I did a new tryserver build with a slightly better patch and better comments.
Comment 73•1 year ago
•
|
||
Brad's autoland build is just finishing up. Once that happens I'll try my hook library on it.
Comment 74•1 year ago
•
|
||
My hook library doesn't crash Safari. But interestingly it does crash Chrome (the release version). As best I can tell, though, Google makes it impossible for me to see the crash report, so I can't get a crash stack.
Several minidumps are present in ~/Library/Application Support/Google/Chrome/Crashpad/completed
, but I haven't figured out how to symbolicate them.
Comment 75•1 year ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #73)
Brad's autoland build is just finishing up. Once that happens I'll try my hook library on it.
The autoland build made from Brad's latest patch crashes with my hook library. So it appears not to fix this bug's crashes.
![]() |
||
Comment 76•1 year ago
|
||
Comment 77•1 year ago
•
|
||
Today's mozilla-central nightly, the first build with Brad's new patch, also crashes with my hook library:
bp-f4c685e7-b21a-4982-9115-342d10240904 (but note the odd crash address: 0x2a032a052a33aa23
)
Use the following search link to keep track of crashes without my hook library:
Already got one: bp-d91022a2-c181-4dd3-93b3-b88410240904, without the weird crash address. There will be more.
Here are two:
bp-02640901-22ce-4980-84ea-0e3510240904
bp-104a3dd1-1744-4b4a-bc73-870b90240906
Comment 78•1 year ago
|
||
My tryserver build from comment #72 doesn't regress bug 1865372.
Comment 79•1 year ago
•
|
||
For what it's worth, here's an Apple crash report for a Chromium crash triggered by my hook library. I was only able to get this after building "ungoogled" Chromium as per ungoogled-chromium-macos. An excellent project, by the way. Before I build it, I set symbol-level
to 2
in flags.macos.gn
. This stops symbol-stripping, and allows crash reports to be properly symbolicated.
The crash report is distinctly odd, though clearly recognizable as caused by the same bug that causes this bug's crashes.
Comment 80•1 year ago
•
|
||
OK, I'm going to make a stab at describing the Apple "bug" that causes this bug's crashes. It's actually a design flaw, and is built deeply into macOS (and probably most/all other modern OSes). The problem is that run loops can be nested. And if things happen just right (or wrong), a nested run loop can destroy resources being used in the run loop it's nested on top of, leading to UAF crashes in the "lower" run loop.
As best I can tell there's no general fix. The problem needs to be addressed on a case-by-case basis, whenever it shows up. You need to find the resource that's triggering the crashes, and retain it before nesting a run loop that might destroy it. (It can be released after the nested run loop finishes.)
You might say that you should avoid nesting run loops. But that's not feasible. It's benefits far outweigh its disadvantages. For example, a modal run loop (using in dragging, let's say) might freeze up your app if you can't periodically nest other run loops on top of it.
In this bug's case, the way to address the problem is my patch from comment #72. The resource that's being destroyed in a "rogue" nested run loop is the (native) main window. The workaround is to hold a reference to this native window until its Mozilla counterpart (and owner) is destroyed -- until the nsCocoaWindow destructor is called.
It's just luck that Safari doesn't crash. It's not that Apple was aware of these crashes and secretly worked around them. It just happens to hold a reference to the main window long enough to prevent them from happening.
Chromium (and Google Chrome) crashes on accessing not the main window itself, but one of its NSView components (WebContentsViewCocoa
). It, too, already holds a reference to the main window. But not, apparently, to all of its components.
Mozilla's apps (Firefox and Thunderbird) do crash on macOS 13 (and presumably also on lower versions), but with much lower frequency than on macOS 14 and 15. I'm not sure why. Chrome, as best I can tell, crashes with equal frequency on macOS 14 and 13. Google, of course, doesn't publish its crash stats. But my hook library "works" on Chrome just as easily on macOS 13 as on macOS 14.
My hook library deliberately nests a "rogue" run loop -- one that closes and destroys the main window. I'm still working on it. But I'll post it when I'm finished.
Updated•1 year ago
|
Assignee | ||
Comment 82•1 year ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #80)
OK, I'm going to make a stab at describing the Apple "bug" that causes this bug's crashes. It's actually a design flaw, and is built deeply into macOS (and probably most/all other modern OSes). The problem is that run loops can be nested. And if things happen just right (or wrong), a nested run loop can destroy resources being used in the run loop it's nested on top of, leading to UAF crashes in the "lower" run loop.
Wonderful analysis, thank you. Seems very plausible that some nested run loop might call DestroyNativeWindow
and that such a thing would cause a problem. I'm not certain that your patch is sufficient and correct because it will retain mWindow
forever if HideWindowChrome
is called. And syntactically it doesn't feel great to have an alloc-then-retain pattern. I will try to build and alternate patch based on your insights -- either something that varies behavior based on which run loop we are in or something that ensures that DestroyNativeWindow
is only called from the main run loop.
Assignee | ||
Comment 83•1 year ago
|
||
Comment 84•1 year ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #80)
And if things happen just right (or wrong), a nested run loop can destroy resources being used in the run loop it's nested on top of, leading to UAF crashes in the "lower" run loop.
That's a bit fuzzy. Can you give a more specific example? Who drops the last reference? And who is holding on to a pointer that is now invalid?
In this bug's case, the way to address the problem is my patch from comment #72. The resource that's being destroyed in a "rogue" nested run loop is the (native) main window. The workaround is to hold a reference to this native window until its Mozilla counterpart (and owner) is destroyed -- until the nsCocoaWindow destructor is called.
If the window is destroyed in the native event loop, that must mean that nobody is holding a strong reference to it anymore. Are you saying that, somewhere higher up on the stack, we still have a pointer to it anyways? Do you know who might have such a pointer? And how does this cause crashes in the tracking area code?
Comment 85•1 year ago
•
|
||
(In reply to Brad Werth [:bradwerth] from comment #82)
I'm not certain that your patch is sufficient and correct because it will retain
mWindow
forever ifHideWindowChrome
is called.
Oops, I forgot about that. I'll revise my patch to deal with it.
And syntactically it doesn't feel great to have an alloc-then-retain pattern.
I don't really see anything wrong with it. Feel free to come up with a new patch. But the basic problem is very simple, and so should the solution be. Don't encrust it with other stuff that may actually cause it to miss its goal.
I'll answer your questions in another comment. (Actually they were Markus' questions.)
Comment 86•1 year ago
•
|
||
(In reply to Markus Stange [:mstange] from comment #84)
(In reply to Steven Michaud [:smichaud] (Retired) from comment #80)
And if things happen just right (or wrong), a nested run loop can destroy resources being used in the run loop it's nested on top of, leading to UAF crashes in the "lower" run loop.
That's a bit fuzzy. Can you give a more specific example? Who drops the last reference? And who is holding on to a pointer that is now invalid?
I don't really know. That's what happens with non-reproducible crashes. At best you'll come up with a plausible way to reproduce them, which isn't necessarily exactly how they happen.
In this bug's case, the way to address the problem is my patch from comment #72. The resource that's being destroyed in a "rogue" nested run loop is the (native) main window. The workaround is to hold a reference to this native window until its Mozilla counterpart (and owner) is destroyed -- until the nsCocoaWindow destructor is called.
If the window is destroyed in the native event loop, that must mean that nobody is holding a strong reference to it anymore. Are you saying that, somewhere higher up on the stack, we still have a pointer to it anyways?
No.
And how does this cause crashes in the tracking area code?
There are two places in -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
where we crash with any frequency. One is on a call to -[NSWindow windowNumber]
on the main window. The other is a call to -[NSView trackingArea]
on an NSView
object associated with the main window. In trying to figure out this bug, I looked for ways to reproduce exactly these crashes. Mostly I failed. But I finally found one. More details in my next comment.
Comment 87•1 year ago
|
||
Don't forget: These crashes aren't a Mozilla bug. We weren't doing anything wrong. Or at least if we were, it didn't cause these crashes.
So the solution to them isn't to "fix" Mozilla code. It's to find the least objectionable workaround.
Comment 88•1 year ago
•
|
||
(Following up comment #86)
The external call nearest to these two crash locations (of any complexity) is one to-[NSArray enumerateObjectsWithOptions:usingBlock:]
whose block->invoke
method is ___collectTrackingAreasForTargetAndWinLoc_block_invoke(block_literal *block, id obj, NSUInteger idx, BOOL *stop)
. This NSArray
method calls out to a block. So why not, after the block has returned, call out to another block that closes the main window? I did this, and it "worked". Including the results from both Firefox and Chromium, it reproduces this bug's crashes exactly. In Firefox the crashes happen on the call to -[NSWindow windowNumber]
. In Chromium they happen on the call to -[NSView trackingArea]
.
I won't go into detail about how I pulled this off. You can look at the source code for my hook library after I post it (which should be pretty soon now).
This told me what I already suspected -- that the solution is to postpone the main window's deallocation. If you look at my hook library's logging, you see that nsCocoaWindow's destructor is called quite "late". And in any case that's the last chance we have to intervene. So why not deallocate it there? And when I do that, my hook library's evil, NSWindow
eating block no longer crashes Firefox.
I have to deal with the HideWindowChrome
complication. But presuming I can, this still seems the best solution.
Comment 89•1 year ago
|
||
We haven't reported this to Apple yet because we didn't have a clear root cause / explanation. Do we have that now? Is there a particular nested run loop occurrence in Apple code we could point to? Thanks for the continued debugging, Steven.
Comment 90•1 year ago
•
|
||
(In reply to Haik Aftandilian [:haik] from comment #89)
We haven't reported this to Apple yet because we didn't have a clear root cause / explanation. Do we have that now? Is there a particular nested run loop occurrence in Apple code we could point to? Thanks for the continued debugging, Steven.
I asked you to report this to Apple when I thought it was an Apple bug. Now I think this kind of crash is something inherent to run loop nesting, and that there's nothing Apple can (or should) do about it. I can't prove it, but I suspect it's impossible to come up with a general/universal solution. App developers need to seek solutions on a case-by-case basis, which will be different according to the different designs of their apps' event handling.
So I'd vote against opening a bug on this with Apple.
Comment 91•1 year ago
•
|
||
I've started to play around with the "HideWindowChrome()
complication", but I can't find any way to trigger a call to HideWindowChrome()
.
Help! Can anyone tell me?
Edit: Even the "hidechrome" tests don't trigger calls to HideWindowChrome()
. I searched in https://searchfox.org on "hidechrome".
Comment 92•1 year ago
|
||
I've revised my patch do deal with HideWindowChrome()
:
https://treeherder.mozilla.org/jobs?repo=try&revision=931b7eca3a561be240a95ffeeb0de2a2596bb61a
https://hg.mozilla.org/try/rev/4faee9c7f7bdbe8200776921a46f3c285e3aa3a3
I still haven't figured out an easy way to trigger calls to HideWindowChrome()
. But I have noticed that it's called when you run reftests -- in fact while the reftest harness is initialized. (For example by doing ./mach reftest layout/tools/reftest/reftest.xhtml
.) This happens when a BorderlessWindow
is created -- an NSWindow
subclass used for all WindowType::Invisible
and WindowType::Child
windows, and some WindowType::Popup
ones. I'm not sure if these can be created using ordinary, unprivileged JavaScript, but I'll keep trying.
In the meantime I'll continue testing with ./mach reftest layout/tools/reftest/reftest.xhtml
.
I'm still working on my hook library. But I should be able to post it soon --possibly later today.
Updated•1 year ago
|
Comment 93•1 year ago
|
||
Here's the hook library I've been testing with, as a patch on https://github.com/steven-michaud/HookCase/blob/v8.0.1/HookLibraryTemplate/hook.mm.
Comment 94•1 year ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #90)
(In reply to Haik Aftandilian [:haik] from comment #89)
We haven't reported this to Apple yet because we didn't have a clear root cause / explanation. Do we have that now? Is there a particular nested run loop occurrence in Apple code we could point to? Thanks for the continued debugging, Steven.
I asked you to report this to Apple when I thought it was an Apple bug. Now I think this kind of crash is something inherent to run loop nesting, and that there's nothing Apple can (or should) do about it. I can't prove it, but I suspect it's impossible to come up with a general/universal solution. App developers need to seek solutions on a case-by-case basis, which will be different according to the different designs of their apps' event handling.
So I'd vote against opening a bug on this with Apple.
I'd like to back off from this, a bit. I still doubt it's possible to build a faultless run loop nesting system from first principles. But there may be an Apple bug here, after all. Or at least something for Apple to do.
These crashes aren't generic -- as you'd expect them to be if the problem was just difficulties with run loop nesting. They happen in a very specific place -- after the call to -[NSArray enumerateObjectsWithOptions:usingBlock:]
from -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:]
, whose block's invoke method is ___collectTrackingAreasForTargetAndWinLoc_block_invoke(block_literal *block, id obj, NSUInteger idx, BOOL *stop)
. Also, the crashes only happen when _updateActiveTrackingAreasForWindowLocation
is called from __58-[_NSTrackingAreaAKManager _activeTrackingAreasNeedUpdate]_block_invoke(block_literal *block)
. Normally it isn't called from a block invoke method.
All I'm certain about is this and the fact that these crashes involve a window being closed in a nested run loop. I can't get much further without being able to reproduce the crashes (which will never happen). Apple probably won't be able to reproduce them, either. But they have access to the source code for where the crashes take place, which may include interesting developer comments. They also have access to their internal bug base and crash report base. They may be able to find relevant bugs in their code, or at least a way to work around these crashes specifically.
So yes, do report these crashes to Apple if you think that's appropriate. You'll need to quote liberally from my comments. If you want, I can actually write the report myself and post it here, for you to send to Apple.
By the way, I already showed (in comment #47 above) that these crashes aren't caused by deleting a window on a secondary thread.
Comment 95•1 year ago
•
|
||
(Following up comment #44)
I thought to try this patch again, in a tryserver build. It does fix this bug's crashes (it doesn't crash with my hook library's test crash). But it regresses bug 1865372.
Updated•11 months ago
|
Updated•11 months ago
|
Updated•11 months ago
|
Assignee | ||
Comment 96•11 months ago
|
||
If either of these asserts are triggered, it may be more stable to defer
the call to [mWindow close]
to a later run of the event loop.
Comment 97•11 months ago
|
||
I predict neither of these asserts will ever be triggered.
Assignee | ||
Comment 98•11 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #97)
I predict neither of these asserts will ever be triggered.
I thought that part of your working theory is that the [window close]
call was happening inside a nested run loop. If that's incorrect, what is the current theory?
Comment 99•11 months ago
•
|
||
I assumed both runloops (base and nested) would be the main runloop. I'll test with my hook library and report back.
Comment 100•11 months ago
•
|
||
(Following up comment #99)
In limited testing it turns out that it's always the main run loop that's nested, and never some other run loop. This is true for Firefox (today's mozilla-central nightly), Safari and Chrome. I tested on macOS 14.7.
Here's the hook library I tested with, as a patch on https://github.com/steven-michaud/HookCase/blob/v9.0.0/HookLibraryTemplate/hook.mm.
Comment 101•11 months ago
•
|
||
Here are a few examples of nesting that I found in Firefox (today's mozilla-central nightly):
(Fri Oct 11 09:55:49 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[5213] MainThread[0x119380240] Hook.mm: CFRunLoopRunSpecific(2): mode __kCFPasteboardPrivateMode, seconds 60.000000, returnAfterSourceHandled 1, returning 0x4
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) Hooked_CFRunLoopRunSpecific(__CFRunLoop*, __CFString const*, double, unsigned char) + 0xe6
(CoreFoundation) ___onqueue_CFPasteboardRequestDataFromDaemon_block_invoke.409 + 0x5c
(CoreFoundation) CFPasteboardCopyData + 0x21d
(CarbonCore) CacheFMMapData + 0x1a7
(CarbonCore) IntlFCOpenComponentData + 0x9c
(HIToolbox) InitializeInputSourceCache + 0x128
(HIToolbox) GetInputSourceCacheHeader + 0x6b
(HIToolbox) islcGetInputMethodIndexFromBundleID + 0x60
(HIToolbox) ValidateEnabledInputSourceIDs + 0x1eb
(HIToolbox) CopyInputSourceEnabledPrefs + 0xb7
(HIToolbox) GetInputSourceEnabledPrefsIncludingThirdParty + 0x72
(HIToolbox) isPrefsCreateCacheFromEnabledAndDefaultInputSources + 0x49
(HIToolbox) islGetInputSourceListWithAdditions + 0x10f
(HIToolbox) _CreateKeyboardInputSourcesArray_Mutable + 0x17f
(HIToolbox) _CreateKeyboardInputSourcesArray + 0x11
(HIToolbox) UpdateSourceIndicatorMode + 0x46
(HIToolbox) InitTSMFirstEventTime + 0x2be
(HIToolbox) TISCopyCurrentKeyboardInputSource + 0x12
(XUL) mozilla::widget::TISInputSourceWrapper::InitByCurrentInputSource() + 0x26
(XUL) mozilla::widget::IMEInputHandler::OnCurrentTextInputSourceChange(__CFNotificationCenter*, void*, __CFString const*, void const*, __CFDictionary const*) + 0x49
(XUL) mozilla::widget::TextInputHandler::TextInputHandler(nsChildView*, NSView<mozView>*) + 0x1c1
(XUL) nsChildView::Create(nsIWidget*, mozilla::gfx::IntRectTyped<mozilla::LayoutDevicePixel> const&, mozilla::widget::InitData*) + 0x25f
(XUL) nsBaseWidget::CreateChild(mozilla::gfx::IntRectTyped<mozilla::LayoutDevicePixel> const&, mozilla::widget::InitData&) + 0x52
(XUL) nsView::CreateWidget(nsIWidget*, bool, bool) + 0x132
(XUL) nsDocumentViewer::MakeWindow(nsSize const&, nsView*) + 0x1fb
(XUL) nsDocumentViewer::InitInternal(nsIWidget*, nsISupports*, mozilla::dom::WindowGlobalChild*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, bool, bool, bool) + 0x16d
(XUL) nsDocumentViewer::Init(nsIWidget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::dom::WindowGlobalChild*) + 0x1b
(XUL) nsDocShell::Embed(nsIDocumentViewer*, mozilla::dom::WindowGlobalChild*, bool, bool, nsIRequest*, nsIURI*) + 0x317
(XUL) nsDocShell::CreateAboutBlankDocumentViewer(nsIPrincipal*, nsIPrincipal*, nsIContentSecurityPolicy*, nsIURI*, bool, mozilla::Maybe<nsILoadInfo::CrossOriginEmbedderPolicy> const&, bool, bool, mozilla::dom::WindowGlobalChild*) + 0x364
(XUL) nsAppShellService::JustCreateTopWindow(nsIAppWindow*, nsIURI*, unsigned int, int, int, bool, mozilla::AppWindow**) + 0x362
(XUL) nsAppShellService::CreateHiddenWindow() + 0x17d
(XUL) nsAppStartup::CreateHiddenWindow() + 0x4d
(XUL) XREMain::XRE_mainRun() + 0x4c3
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x775
(Fri Oct 11 09:56:11 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[5213] MainThread[0x119380240] Hook.mm: CFRunLoopRunSpecific(2): mode kCFRunLoopDefaultMode, seconds 0.000000, returnAfterSourceHandled 1, returning 0x4
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) Hooked_CFRunLoopRunSpecific(__CFRunLoop*, __CFString const*, double, unsigned char) + 0xe6
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(HIToolbox) ReceiveNextEventCommon + 0xc9
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x42
(AppKit) _DPSNextEvent + 0x370
(AppKit) -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4f9
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(XUL) nsAppShell::ProcessNextNativeEvent(bool) + 0x29c
(XUL) nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) + 0x190
(XUL) NS_ProcessNextEvent(nsIThread*, bool) + 0xa45
(XUL) nsThreadManager::SpinEventLoopUntilInternal(nsTSubstring<char> const&, nsINestedEventLoopCondition*, mozilla::ShutdownPhase) + 0x3d8
(XUL) NS_InvokeByIndex + 0x8e
(XUL) XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) + 0x816
(XUL) XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) + 0x1bf
(XUL) js::Interpret(JSContext*, js::RunState&) + 0xb957
(XUL) js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) + 0xa52
(XUL) JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) + 0x326
(XUL) nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) + 0x8fb
(XUL) PrepareAndDispatch + 0x320
(XUL) SharedStub + 0x5b
(XUL) NS_InvokeByIndex + 0x8e
(XUL) XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) + 0x816
(XUL) XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) + 0x1bf
(XUL) js::Interpret(JSContext*, js::RunState&) + 0xb957
(XUL) js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) + 0xa52
(XUL) JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) + 0x326
(XUL) nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) + 0x8fb
(XUL) PrepareAndDispatch + 0x320
(XUL) SharedStub + 0x5b
(XUL) nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) + 0x1d1
(XUL) NS_InvokeByIndex + 0x8e
(XUL) XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) + 0x816
(XUL) XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) + 0x1bf
(XUL) js::Interpret(JSContext*, js::RunState&) + 0xb957
(XUL) js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) + 0xa52
(XUL) JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) + 0x24c
(XUL) mozilla::dom::EventListener::HandleEvent(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::Event&, mozilla::ErrorResult&) + 0x431
(XUL) mozilla::EventTargetChainItem::HandleEventTargetChain(nsTArray<mozilla::EventTargetChainItem>&, mozilla::EventChainPostVisitor&, mozilla::EventDispatchingCallback*, mozilla::ELMCreationDetector&) + 0x5b13
(XUL) mozilla::EventDispatcher::Dispatch(mozilla::dom::EventTarget*, nsPresContext*, mozilla::WidgetEvent*, mozilla::dom::Event*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*) + 0x28d5
(XUL) nsINode::DispatchEvent(mozilla::dom::Event&, mozilla::dom::CallerType, mozilla::ErrorResult&) + 0x10c
(XUL) mozilla::dom::EventTarget::DispatchEvent(mozilla::dom::Event&, mozilla::ErrorResult&) + 0x2f
(XUL) nsContentUtils::DispatchXULCommand(nsIContent*, bool, mozilla::dom::Event*, mozilla::PresShell*, bool, bool, bool, bool, unsigned short, short) + 0x185
(XUL) nsXULElement::DispatchXULCommand(mozilla::EventChainVisitor const&, nsTAutoStringN<char16_t, 64ul>&) + 0x174
(XUL) nsXULElement::PreHandleEvent(mozilla::EventChainVisitor&) + 0xa8
(XUL) mozilla::EventDispatcher::Dispatch(mozilla::dom::EventTarget*, nsPresContext*, mozilla::WidgetEvent*, mozilla::dom::Event*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*) + 0x3422
(XUL) nsINode::DispatchEvent(mozilla::dom::Event&, mozilla::dom::CallerType, mozilla::ErrorResult&) + 0x10c
(XUL) mozilla::dom::EventTarget::DispatchEvent(mozilla::dom::Event&) + 0x2f
(XUL) nsMenuUtilsX::DispatchCommandTo(nsIContent*, unsigned long, short) + 0x139
(AppKit) -[NSApplication(NSResponder) sendAction:to:from:] + 0x151
(AppKit) -[NSMenuItem _corePerformAction] + 0x1c7
(AppKit) _NSMenuPerformActionWithHighlighting + 0xb9
(AppKit) -[NSMenu _performActionForItem:atIndex:fromEvent:] + 0xc5
(AppKit) -[NSMenu performKeyEquivalent:] + 0x1b4
(XUL) -[GeckoNSMenu performSuperKeyEquivalent:] + 0x37
(XUL) nsMenuBarX::PerformKeyEquivalent(NSEvent*) + 0x18
(XUL) nsChildView::SendEventToNativeMenuSystem(NSEvent*) + 0x2b
(XUL) nsChildView::PostHandleKeyEvent(mozilla::WidgetKeyboardEvent*) + 0xfb
(XUL) mozilla::EventStateManager::PostHandleKeyboardEvent(mozilla::WidgetKeyboardEvent*, nsIFrame*, nsEventStatus&) + 0x111
(XUL) mozilla::EventStateManager::PostHandleEvent(nsPresContext*, mozilla::WidgetEvent*, nsIFrame*, nsEventStatus*, nsIContent*) + 0x54e
(XUL) mozilla::PresShell::EventHandler::DispatchEvent(mozilla::EventStateManager*, mozilla::WidgetEvent*, bool, nsEventStatus*, nsIContent*) + 0x12c
(XUL) mozilla::PresShell::EventHandler::HandleEventWithCurrentEventInfo(mozilla::WidgetEvent*, nsEventStatus*, bool, nsIContent*) + 0x21c
(XUL) mozilla::PresShell::EventHandler::HandleEventAtFocusedContent(mozilla::WidgetGUIEvent*, nsEventStatus*) + 0xc2
(XUL) mozilla::PresShell::HandleEvent(nsIFrame*, mozilla::WidgetGUIEvent*, bool, nsEventStatus*) + 0x151
(XUL) nsViewManager::DispatchEvent(mozilla::WidgetGUIEvent*, nsView*, nsEventStatus*) + 0x11c
(XUL) nsView::HandleEvent(mozilla::WidgetGUIEvent*, bool) + 0x91
(XUL) nsChildView::DispatchEvent(mozilla::WidgetGUIEvent*, nsEventStatus&) + 0x66
(XUL) nsBaseWidget::ProcessUntransformedAPZEvent(mozilla::WidgetInputEvent*, mozilla::layers::APZEventResult const&) + 0x87
(XUL) nsBaseWidget::DispatchInputEvent(mozilla::WidgetInputEvent*) + 0xbd
(XUL) mozilla::widget::TextEventDispatcher::DispatchInputEvent(nsIWidget*, mozilla::WidgetInputEvent&, nsEventStatus&) + 0x44
(XUL) mozilla::widget::TextEventDispatcher::DispatchKeyboardEventInternal(mozilla::EventMessage, mozilla::WidgetKeyboardEvent const&, nsEventStatus&, void*, unsigned int, bool) + 0x5c7
(XUL) mozilla::widget::TextEventDispatcher::MaybeDispatchKeypressEvents(mozilla::WidgetKeyboardEvent const&, nsEventStatus&, void*, bool) + 0x94
(XUL) mozilla::widget::TextInputHandler::HandleKeyDownEvent(NSEvent*, unsigned int) + 0x68e
(XUL) -[ChildView keyDown:] + 0x16a
(AppKit) -[NSWindow(NSEventRouting) _reallySendEvent:isDelayedEvent:] + 0x1ad
(AppKit) -[NSWindow(NSEventRouting) sendEvent:] + 0x159
(XUL) -[ToolbarWindow sendEvent:] + 0x5c
(AppKit) -[NSApplication(NSEventRouting) sendEvent:] + 0x5b0
(XUL) -[GeckoNSApplication sendEvent:] + 0x5e
(AppKit) -[NSApplication _handleEvent:] + 0x41
(AppKit) -[NSApplication run] + 0x280
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x72e
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x775
(Fri Oct 11 09:55:57 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[5213] MainThread[0x119380240] Hook.mm: CFRunLoopRunSpecific(2): mode com.apple.run-loop-mode.view-bridge.blocks, seconds 5.999978, returnAfterSourceHandled 1, returning 0x4
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) Hooked_CFRunLoopRunSpecific(__CFRunLoop*, __CFString const*, double, unsigned char) + 0xe6
(HIToolbox) -[IMKInputSessionXPCInvocation runPrivateRunloopWithTimeLimit_0_innerRunLoop0:] + 0x66
(HIToolbox) -[IMKInputSessionXPCInvocation invocationAwaitXPCReply] + 0xeb
(HIToolbox) -[IMKInputSession activate] + 0x787
(HIToolbox) -[IMKInputSession activateAfterServerConnection] + 0x7d
(HIToolbox) -[IMKClient fulfillServerDependentWork] + 0x2d8
(HIToolbox) __87+[IMKInputSession IMKXPCPerformBlockOnMainThreadInMode:performHow:callerCmd:workBlock:]_block_invoke + 0x67
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 0xc
(CoreFoundation) __CFRunLoopDoBlocks + 0x18e
(CoreFoundation) __CFRunLoopRun + 0x3ac
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(hook.dylib) Hooked_CFRunLoopRunSpecific(__CFRunLoop*, __CFString const*, double, unsigned char) + 0x87
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(HIToolbox) ReceiveNextEventCommon + 0xc9
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x42
(AppKit) _DPSNextEvent + 0x370
(AppKit) -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4f9
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) -[NSApplication run] + 0x25b
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x72e
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x775
Updated•11 months ago
|
Assignee | ||
Comment 102•11 months ago
|
||
...and release it in DestroyNativeWindow. This separates
responsibilities in the way we intend, where Destroy is responsible for
clsoing the window and DestroyNativeWindow is responsible for more
general memory teardown.
Comment 103•11 months ago
|
||
Brad, can you do a tryserver build with your patch? I'd like to test it with my hook library from comment #93.
Assignee | ||
Comment 104•11 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #103)
Brad, can you do a tryserver build with your patch? I'd like to test it with my hook library from comment #93.
Try build in progress at https://treeherder.mozilla.org/jobs?repo=try&revision=e6abb831026c02bfc47c1217c3b37ced7309d737. I'm hoping this will not affect the browser chrome tests, avoiding the problem we see when trying to remove the call to DestroyNativeWindow()
from Destroy()
.
Comment 105•10 months ago
•
|
||
According to my tests, Brad's latest patch (from comment #102) is likely to fix (i.e. work around) this bug's crashes. I was unable to use my hook library from comment #93 to crash the tryserver build from comment #104. It doesn't leak the window being destroyed (either its XUL nsCocoaWindow object or its native ToolbarWindow object).
I tested on macOS 14.7.
My hook library does still crash today's mozilla-central nightly (bp-fdf67909-c2f8-4fcb-b186-4b9950241013).
Edit: Brad's tryserver build also doesn't regress bug 1865372.
Assignee | ||
Comment 106•10 months ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #105)
According to my tests, Brad's latest patch (from comment #102) is likely to fix (i.e. work around) this bug's crashes.
Given that, a minimal fix is likely just adding [[mWindow retain] autorelease]
before [mWindow close]
in DestroyNativeWindow()
. That's the working part of this patch. I'm going to try to revise D225423 to work better because I think this is the right time to clarify the relative roles of Destroy()
and DestroyNativeWindow()
.
Updated•10 months ago
|
Assignee | ||
Comment 107•10 months ago
|
||
Comment 108•10 months ago
|
||
![]() |
||
Comment 109•10 months ago
|
||
Updated•10 months ago
|
Updated•10 months ago
|
Comment 110•10 months ago
•
|
||
Bad news :-( Today's mozilla-central nightly still crashes using my hook library from comment #93 (bp-190ac722-3730-4be2-b7a7-6144f0241017). I'm going to try to figure out why.
Keep track of crashes with the patch from comment #107 by using the following search (suitably updated). So far there aren't any not triggered by my hook library. But that doesn't mean too much -- mozilla-central has very few users.
This bug will probably need to be reopened.
Comment 111•10 months ago
•
|
||
Here's Brad's patch from comment #107. The problem is that [[mWindow retain] autorelease]
doesn't keep mWindow
alive "long enough" after mWindow
is closed. (A nested run loop can release autoreleased objects "early".) One way around this is my patch from comment #92. Yes, I know you guys don't like it. But to fix this bug (to work around its crashes) you're going to need to do something equivalent.
# HG changeset patch
# User Brad Werth <bwerth@mozilla.com>
# Date 1729118503 0
# Node ID 884289b29239b94fb06f257548502fcb195b5471
# Parent b8a61a1c09b8dcfa001d3e0d2f1279bd78a2ba71
Bug 1880582: Hold an autorelease reference to mWindow while releasing it. r=mstange
Differential Revision: https://phabricator.services.mozilla.com/D225554
diff --git a/widget/cocoa/nsCocoaWindow.mm b/widget/cocoa/nsCocoaWindow.mm
--- a/widget/cocoa/nsCocoaWindow.mm
+++ b/widget/cocoa/nsCocoaWindow.mm
@@ -164,17 +164,26 @@ void nsCocoaWindow::DestroyNativeWindow(
Show(false);
[mWindow removeTrackingArea];
[mWindow releaseJSObjects];
// We want to unhook the delegate here because we don't want events
// sent to it after this object has been destroyed.
mWindow.delegate = nil;
+
+ // We might be in an embedded run loop which is still using mWindow.
+ // Hold an extra reference as an autorelease to cover these cases.
+ [[mWindow retain] autorelease];
+
+ // Closing the window will also release it, though our extra retain
+ // above will keep it alive through the event loop.
+ MOZ_ASSERT(mWindow.releasedWhenClosed);
[mWindow close];
+
mWindow = nil;
[mDelegate autorelease];
NS_OBJC_END_TRY_IGNORE_BLOCK;
}
nsCocoaWindow::~nsCocoaWindow() {
NS_OBJC_BEGIN_TRY_IGNORE_BLOCK;
Comment 112•10 months ago
•
|
||
(Following up comment #111)
A nested run loop can release autoreleased objects "early".
Here's an example of this from my hook library's log of one of these crashes, with a different object (not an NSWindow object):
(Thu Oct 17 11:42:54 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[77846] MainThread[0x1078cc240] Hook.mm: [NSObject dealloc]: self NSView(0x13889e400)
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) -[NSObject(MethodSwizzling) NSObject_dealloc] + 0x8c
(AppKit) -[NSResponder dealloc] + 0x1b1
(AppKit) -[NSView dealloc] + 0x95
(AppKit) -[NSViewController dealloc] + 0xc5
(AppKit) -[NSViewController release] + 0x79
(libobjc.A.dylib) AutoreleasePoolPage::releaseUntil(objc_object**) + 0xb9
(libobjc.A.dylib) objc_autoreleasePoolPop + 0xeb
(CoreFoundation) _CFAutoreleasePoolPop + 0x16
(CoreFoundation) __CFRunLoopPerCalloutARPEnd + 0x27
(CoreFoundation) __CFRunLoopDoBlocks + 0x1ac
(CoreFoundation) __CFRunLoopRun + 0x385
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(CoreFoundation) CFRunLoopRun + 0x28
(hook.dylib) after_enumerateWithOptions(objc_object*, unsigned long, void (objc_object*, unsigned long, signed char*) block_pointer) + 0x42b
(hook.dylib) -[NSObject(NSArraySwizzling) __NSArrayM_enumerateObjectsWithOptions:usingBlock:] + 0x5d
(AppKit) -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0x6f3
(hook.dylib) -[NSObject(_NSTrackingAreaAKManagerSwizzling) _NSTrackingAreaAKManager__updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0xba
(AppKit) __58-[_NSTrackingAreaAKManager _activeTrackingAreasNeedUpdate]_block_invoke + 0x13e
(hook.dylib) Hooked_activeTrackingAreasNeedUpdate_block_invoke(_activeTrackingAreasNeedUpdate_literal*) + 0xba
(AppKit) ___NSMainRunLoopPerformBlockInModes_block_invoke + 0x19
(hook.dylib) Hooked____NSMainRunLoopPerformBlockInModes_block_invoke(_perform_block_literal*) + 0x74
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 0xc
(CoreFoundation) __CFRunLoopDoBlocks + 0x18e
(CoreFoundation) __CFRunLoopRun + 0x385
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(HIToolbox) ReceiveNextEventCommon + 0xc9
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x42
(AppKit) _DPSNextEvent + 0x370
(AppKit) -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4f9
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) -[NSApplication run] + 0x25b
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x72e
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x775
Comment 113•10 months ago
•
|
||
(Following up comment #111)
And here's the NSWindow object itself being deleted "early", after the block in which it was closed has finished running:
(Thu Oct 17 11:42:54 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[77846] MainThread[0x1078cc240] Hook.mm: [NSObject dealloc]: self ToolbarWindow(0x138c7c100)
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) -[NSObject(MethodSwizzling) NSObject_dealloc] + 0x8c
(AppKit) -[NSResponder dealloc] + 0x1b1
(AppKit) -[NSWindow dealloc] + 0xb4c
(XUL) -[BaseWindow dealloc] + 0x52
(XUL) -[ToolbarWindow dealloc] + 0x7a
(Foundation) _NSKVOPerformWithDeallocatingObservable + 0x97
(Foundation) NSKVODeallocate + 0x96
(hook.dylib) __destroy_helper_block_e8_32o40r48r + 0x46
(libsystem_blocks.dylib) _call_dispose_helpers_excp + 0x25
(libsystem_blocks.dylib) _Block_release + 0xe8
(CoreFoundation) __CFRunLoopDoBlocks + 0x1b4
(CoreFoundation) __CFRunLoopRun + 0x385
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(CoreFoundation) CFRunLoopRun + 0x28
(hook.dylib) after_enumerateWithOptions(objc_object*, unsigned long, void (objc_object*, unsigned long, signed char*) block_pointer) + 0x42b
(hook.dylib) -[NSObject(NSArraySwizzling) __NSArrayM_enumerateObjectsWithOptions:usingBlock:] + 0x5d
(AppKit) -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0x6f3
(hook.dylib) -[NSObject(_NSTrackingAreaAKManagerSwizzling) _NSTrackingAreaAKManager__updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0xba
(AppKit) __58-[_NSTrackingAreaAKManager _activeTrackingAreasNeedUpdate]_block_invoke + 0x13e
(hook.dylib) Hooked_activeTrackingAreasNeedUpdate_block_invoke(_activeTrackingAreasNeedUpdate_literal*) + 0xba
(AppKit) ___NSMainRunLoopPerformBlockInModes_block_invoke + 0x19
(hook.dylib) Hooked____NSMainRunLoopPerformBlockInModes_block_invoke(_perform_block_literal*) + 0x74
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 0xc
(CoreFoundation) __CFRunLoopDoBlocks + 0x18e
(CoreFoundation) __CFRunLoopRun + 0x385
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(HIToolbox) ReceiveNextEventCommon + 0xc9
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x42
(AppKit) _DPSNextEvent + 0x370
(AppKit) -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4f9
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) -[NSApplication run] + 0x25b
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x72e
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x775
Comment 114•10 months ago
•
|
||
(Following up comment #111)
Here's the NSWindow object being closed (the one that, after it's deleted, triggers the crash on a call to -[NSWindow windowNumber]
). This happens before the events in comment #112 and comment #113.
(Thu Oct 17 11:42:54 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[77846] MainThread[0x1078cc240] Hook.mm: [NSWindow close]: self <ToolbarWindow: 0x138c7c100>
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) -[NSWindow(MethodSwizzling) NSWindow_close] + 0x8f
(XUL) nsCocoaWindow::DestroyNativeWindow() + 0x95
(XUL) nsCocoaWindow::Destroy() + 0x89
(XUL) mozilla::AppWindow::Destroy() + 0x1e5
(XUL) mozilla::AppWindow::RequestWindowClose(nsIWidget*) + 0x1ec
(XUL) mozilla::AppWindow::WidgetListenerDelegate::RequestWindowClose(nsIWidget*) + 0x1c
(XUL) -[WindowDelegate windowShouldClose:] + 0x29
(AppKit) __19-[NSWindow __close]_block_invoke + 0xc6
(AppKit) -[NSWindow __close] + 0x1ad
(AppKit) -[NSApplication(NSResponder) sendAction:to:from:] + 0x151
(AppKit) -[NSControl sendAction:to:] + 0x56
(AppKit) __26-[NSCell _sendActionFrom:]_block_invoke + 0x83
(AppKit) -[NSCell _sendActionFrom:] + 0xab
(AppKit) -[NSButtonCell _sendActionFrom:] + 0x60
(AppKit) __29-[NSButtonCell performClick:]_block_invoke + 0x40
(AppKit) -[NSButtonCell performClick:] + 0x24d
(hook.dylib) invocation function for block in after_enumerateWithOptions(objc_object*, unsigned long, void (objc_object*, unsigned long, signed char*) block_pointer) + 0x5f
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 0xc
(CoreFoundation) __CFRunLoopDoBlocks + 0x18e
(CoreFoundation) __CFRunLoopRun + 0x385
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(CoreFoundation) CFRunLoopRun + 0x28
(hook.dylib) after_enumerateWithOptions(objc_object*, unsigned long, void (objc_object*, unsigned long, signed char*) block_pointer) + 0x42b
(hook.dylib) -[NSObject(NSArraySwizzling) __NSArrayM_enumerateObjectsWithOptions:usingBlock:] + 0x5d
(AppKit) -[_NSTrackingAreaAKManager _updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0x6f3
(hook.dylib) -[NSObject(_NSTrackingAreaAKManagerSwizzling) _NSTrackingAreaAKManager__updateActiveTrackingAreasForWindowLocation:modifierFlags:] + 0xba
(AppKit) __58-[_NSTrackingAreaAKManager _activeTrackingAreasNeedUpdate]_block_invoke + 0x13e
(hook.dylib) Hooked_activeTrackingAreasNeedUpdate_block_invoke(_activeTrackingAreasNeedUpdate_literal*) + 0xba
(AppKit) ___NSMainRunLoopPerformBlockInModes_block_invoke + 0x19
(hook.dylib) Hooked____NSMainRunLoopPerformBlockInModes_block_invoke(_perform_block_literal*) + 0x74
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 0xc
(CoreFoundation) __CFRunLoopDoBlocks + 0x18e
(CoreFoundation) __CFRunLoopRun + 0x385
(CoreFoundation) CFRunLoopRunSpecific + 0x22d
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(HIToolbox) ReceiveNextEventCommon + 0xc9
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x42
(AppKit) _DPSNextEvent + 0x370
(AppKit) -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4f9
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) -[NSApplication run] + 0x25b
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x72e
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x775
Assignee | ||
Comment 115•10 months ago
|
||
Tricky, tricky, keep trying for a fix that doesn't leak memory.
Updated•10 months ago
|
Comment 116•10 months ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #115)
Tricky, tricky, keep trying for a fix that doesn't leak memory.
For what it's worth, my patch from comment #92 doesn't leak memory.
Assignee | ||
Comment 117•10 months ago
•
|
||
This is a simplified version of a fix authored by Steven Michaud. This
creates a death grip in DestroyNativeWindow() and holds it until the
nsCocoaWindow is destroyed. This seems to satisfy the various run loops
in macOS which might invoke DestroyNativeWindow() without holding a
reference to the window itself.
Comment 118•10 months ago
|
||
Brad, have you started a tryserver build for your new patch? I'd like to test it with my hook library from comment #93.
Assignee | ||
Comment 119•10 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #118)
Brad, have you started a tryserver build for your new patch? I'd like to test it with my hook library from comment #93.
Try run in progress at https://treeherder.mozilla.org/jobs?repo=try&revision=a8380edf34b8f6117919500f39bd5926b9389e16.
Comment 120•10 months ago
|
||
No crashes with my hook library from comment #93. And neither of the relevant ToolbarWindow
and nsCocoaWindow
objects are leaked. Hooray!
I haven't yet tested HideWindowChrome()
(using ./mach reftest layout/tools/reftest/reftest.xhtml
). I'll do that later, once Brad's patch from comment #117 has landed on mozilla-central.
Comment 121•10 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #120)
I haven't yet tested
HideWindowChrome()
(using./mach reftest layout/tools/reftest/reftest.xhtml
). I'll do that later, once Brad's patch from comment #117 has landed on mozilla-central.
I just did, after having added Brad's patch from comment #117 to current trunk code.
I don't see any crashes. The NSWindow
object created by the last call to HideWindowChrome()
does get leaked (running ./mach reftest layout/tools/reftest/reftest.xhtml
). This is because the nsCocoaWindow
object on which HideWindowChrome()
is called is itself leaked, so its destructor never runs. The nsCocoaWindow
object leakage also happens without the patch from comment #117, and with or without the patch from comment #107. So this isn't a big deal.
Comment 122•10 months ago
|
||
![]() |
||
Comment 123•10 months ago
|
||
Updated•10 months ago
|
Comment 124•10 months ago
•
|
||
I just tested today's mozilla-central nightly with my hook library -- no crashes! (It's the first nightly with this bug's latest patch.)
I suggest uplifting the patch to beta. Otherwise it's going to take a while to find out if the patch really does get rid of this bug's crashes.
Here's how to search for crashes in mozilla-central nightlies that have this bug's patch. Later, if the patch is uplifted, I'll update the search to include betas that have this bug's patch. (I updated it just now, per comment #125 below.)
Comment 125•10 months ago
|
||
133 rides to Beta next week.
Comment 126•10 months ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #125)
133 rides to Beta next week.
OK, that should be good enough.
Updated•10 months ago
|
Updated•10 months ago
|
Comment 127•10 months ago
•
|
||
Brad's patch from comment #117 has been on the 133 branch for about ten days, including six for which it was on beta. There have been no crashes at all in builds containing this patch. Given the high crash volume on earlier branches, I think we can say the fix has been verified.
The patch has also fixed (or probably fixed) several other bugs -- bug 1927157, bug 1884624, bug 1765391, bug 1904850 and bug 1899823. This shows that all of these (along with this bug) were triggered by run loop nesting forcing native (Cocoa) objects to be released "too early".
For a while I hoped the patch might fix all Mozilla UAF bugs on macOS. Unfortunately it doesn't. "Accessibility" UAFs still happen:
But it certainly fixes a lot of them, all of which were (probably) intractable before my insight about run loop nesting.
Updated•10 months ago
|
Comment 128•10 months ago
|
||
:bradwerth could you look at uplift requests for esr128 when you have a moment?
Assignee | ||
Comment 129•10 months ago
|
||
Comment on attachment 9431899 [details]
Bug 1880582: Make nsCocoaWindow hold a death grip on its native window until its destructor.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Seems to fix a variety of macOS crashes during tab closing.
- User impact if declined: Users will continue to experience occasional parent process crashes when closing a tab.
- Fix Landed on Version: 133
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Holds a memory reference a bit longer than normal, and releases the memory when the owning object is destructed. Seems quite safe.
Comment 130•10 months ago
|
||
Turns out bug 1899823 isn't fixed (see bug 1899823 comment #10).
Comment 131•10 months ago
|
||
And, very unfortunately, bug 1765391 isn't fixed either. See bug 1765391 comment #34.
Comment 132•9 months ago
|
||
Comment on attachment 9431899 [details]
Bug 1880582: Make nsCocoaWindow hold a death grip on its native window until its destructor.
Approved for 128.5esr.
Comment 133•9 months ago
|
||
uplift |
Updated•9 months ago
|
Comment 134•9 months ago
|
||
Bug 1927157 isn't fixed by the latest patch for this bug. See bug 1927157 comment #9.
Updated•9 months ago
|
Updated•9 months ago
|
Updated•4 months ago
|
Assignee | ||
Comment 135•4 months ago
|
||
Glad we finally got this sorted out. Thanks, all, for being patient.
Description
•