Closed Bug 1736312 Opened 3 years ago Closed 3 years ago

macOS Crash in [@ nsBaseWidget::StoreWindowClipRegion]

Categories

(Core :: Widget: Cocoa, defect, P3)

Unspecified
macOS
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox93 --- unaffected
firefox94 --- unaffected
firefox95 --- affected

People

(Reporter: aryx, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Over the weekend a few new 95.0a1 Nightly crash signatures got reported for macOS:

Could these have been started by the changes in bug 1721601?

There are also these 2 new signatures:

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/256f8bef-f05f-4f7b-b6a7-b07ec0211018

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Mac crash info:

{
  "num_records": 1,
  "records": [
    {
      "message": "Performing @selector(_setNeedsZoom:) from sender _NSThemeZoomWidget 0x1230c2300",
      "module": "/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit"
    }
  ]
}

This is similar to bug 1695591 comment 8.

Top 10 frames of crashing thread:

0 XUL nsBaseWidget::StoreWindowClipRegion widget/nsBaseWidget.cpp:761
1 AppKit -[NSView _startLiveResizeCacheOK:] 
2 AppKit -[NSView _startLiveResizeCacheOK:] 
3 AppKit -[NSView _startLiveResizeCacheOK:] 
4 AppKit -[NSView _startLiveResize] 
5 AppKit -[NSWindow _startLiveResize] 
6 AppKit __82-[_NSWindowEnterFullScreenTransitionController setupWindowForAfterFullScreenEnter]_block_invoke 
7 AppKit NSPerformVisuallyAtomicChange 
8 AppKit -[_NSWindowEnterFullScreenTransitionController setupWindowForAfterFullScreenEnter] 
9 AppKit -[_NSEnterFullScreenTransitionController start] 
Flags: needinfo?(tnikkel)
Crash Signature: [@ nsBaseWidget::StoreWindowClipRegion] [@ nsBaseWidget::SynthesizeNativeMouseMove] → [@ nsBaseWidget::StoreWindowClipRegion] [@ nsBaseWidget::SynthesizeNativeMouseMove ]

I can't really see how bug 1721601 might cause any of those crashes.

For the ones that have "enter full screen" on the stack maybe there are full screen related changes that landed?

Flags: needinfo?(tnikkel)

(In reply to Timothy Nikkel (:tnikkel) from comment #1)

For the ones that have "enter full screen" on the stack maybe there are full screen related changes that landed?

Flags: needinfo?(bwerth)
Crash Signature: [@ nsBaseWidget::StoreWindowClipRegion] [@ nsBaseWidget::SynthesizeNativeMouseMove ] → [@ mozilla::dom::DOMLocalization::WrapObject ] [@ nsBaseWidget::StoreWindowClipRegion] [@ nsBaseWidget::SynthesizeNativeMouseMove ]

Today's new crash signature is @ LogEntry (buildID 20211017213754). It references session history / doc loader like some of the previous signatures but no references to "zoom" in the mac crash info.

Gabriele, could you take a look at this shifting crash signature?

Flags: needinfo?(gsvelto)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #3)

It references session history / doc loader like some of the previous signatures but no references to "zoom" in the mac crash info.

Oh, that's why you thought it might be my double tap patch? I'm not sure what kind of zoom the stack was referring to, but it has nothing to do with double tap zoom.

I can't find anything that points to fullscreen changes I've made, most of which landed in late September. What's more, one of the crash signatures from a build a few days ago is referencing nsBaseWidget::StoreWindowClipRegion that I can't find anymore. Perhaps some code removal just landed? Emilio would know about changes to nsBaseWidget.

Flags: needinfo?(bwerth) → needinfo?(emilio)

Yeah, StoreWindowClipRegion was effectively dead code which was there only for plugins. I removed it in https://phabricator.services.mozilla.com/D128863. That means probably that the top of the stack we see here is somehow busted. Gankra, do you know what might be going on with those reports.

Flags: needinfo?(emilio) → needinfo?(a.beingessner)

Looking at https://crash-stats.mozilla.org/report/index/256f8bef-f05f-4f7b-b6a7-b07ec0211018:

The build is from 2 days before you removed StoreWindowClipRegion, so it is legitimately still in the code, and its entry in the symbol files looks reasonable:

FUNC 4323d00 9e 0 nsBaseWidget::StoreWindowClipRegion(nsTArray<mozilla::gfx::IntRectTyped<mozilla::LayoutDevicePixel> > const&)
4323d00 14 761 8357
4323d14 d 762 8357
4323d21 12 763 8357
4323d33 3 762 8357
4323d36 25 763 8357
4323d5b c 764 8357
4323d67 7 765 8357
4323d6e b 766 8357
4323d79 16 765 8357
4323d8f f 768 8357

Because it shows up in the context frame there's nothing really that the crash reporter could be doing to hallucinate this frame. We just grab the context frame from the minidump and report it verbatim. Unless we've messed up the macos minidump generator or dump_syms, that is indeed where %rip was pointing. (And breakpad and rust-minidump agree on this backtrace, fwiw.)

That said I agree it's almost certainly an incorrect frame. The line it's pointing at is in the POD function arguments. In fact most of the line numbers in this back trace seem pretty nonsensical. I'm not familiar enough with our mac crash reports to say if our symbols are just that unreliable or if this is particularly strange.

Because the crash is EXC_BAD_ACCESS/KERN_INVALID_ADDRESS and has this mac_crash_info:

"message": "Performing @selector(_setNeedsZoom:) from sender _NSThemeZoomWidget 0x1230c2300",
"module": "/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit"

And the only symbol we have in AppKit that contains setNeedsZoom: is

PUBLIC 3bb64c 0 -[NSWindow _setNeedsZoom:]

It suggests to me that maybe some _NSThemeZoomWidget is trying to access a null NSWindow? That certainly jives with their being a ton of AppKit window stuff in the stack. It would also be very reasonable to be vaguely in this region of graphics code if we're handling window resizes. Unfortunately I don't have a good sense for if macos is genuinely using a different source of information or if the vague agreement between mac_crash_info and the backtrace is just because they're derived from the same (bad) information.

I wonder if we the crash reporter can use the ABI of objc_messageSend to refine analysis...

Flags: needinfo?(a.beingessner)

I see similar "impossible" crashes on the macOS monterrey beta. This is a startup crash I got after wakeup from suspend: https://crash-stats.mozilla.org/report/index/9aeb8e73-a167-4704-9692-e6c790211021#tab-details

That crash signature affects more macOS versions than macOS 12 but all have been reported with 95.0a1 20211019215100. There are more new macOS crash signatures - fallout from bug 1736373?

Hmm, what's weird about the stack in https://crash-stats.mozilla.org/report/index/256f8bef-f05f-4f7b-b6a7-b07ec0211018#tab-modules is that all the AppKit symbols look perfectly reasonable, and all the XUL symbols look awful. I think stackwalking here might have succeeded splendidly, but we may have symbolicated with the wrong XUL library? Maybe we were in the middle of an update, and we were running the old version but storing the breakpadId of the new version?

See Also: → 1737201

(In reply to Markus Stange [:mstange] from comment #10)

Hmm, what's weird about the stack in https://crash-stats.mozilla.org/report/index/256f8bef-f05f-4f7b-b6a7-b07ec0211018#tab-modules is that all the AppKit symbols look perfectly reasonable, and all the XUL symbols look awful. I think stackwalking here might have succeeded splendidly, but we may have symbolicated with the wrong XUL library? Maybe we were in the middle of an update, and we were running the old version but storing the breakpadId of the new version?

If the library couldn't be found in memory then the minidump generator will have taken it's ID from the file stored on disk (see here). If the loaded library and the one on disk didn't match the resulting minidump will be impossible to symbolicate correctly.

Flags: needinfo?(gsvelto)

Does the crash tell us whether the ID came from the disk or from memory? If not, could we add that piece of information?

(In reply to Markus Stange [:mstange] from comment #12)

Does the crash tell us whether the ID came from the disk or from memory?

Unfortunately no.

If not, could we add that piece of information?

Yes, we could use one of the reserved fields in the module info entries for that or pick the "almost official" way to do it. Every entry in the minidump's module list has a version info structure. This structure is left largely empty when we generate macOS builds but it doesn't have to be so. Specifically for entries we generate by reading the file from disk (instead of from memory) we could set the VS_FF_INFOINFERRED flag in the dwFileFlags field which means:

The file's version structure was created dynamically; therefore, some of the members in this structure may be empty or incorrect.

Not 100% what we want but close enough. Adding something like this to the minidump writer is trivial but we'd still need to surface it in minidump_dump and minidump-stackwalk.

Priority: -- → P3

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.