Open Bug 1737201 Opened 3 years ago Updated 1 year ago

macOS crashes with two or more calls to -[NSView _startLiveResizeCacheOK:] in proto signature

Categories

(Core :: Widget: Cocoa, defect, P3)

Unspecified
macOS
defect

People

(Reporter: smichaud, Unassigned)

Crash Data

https://crash-stats.mozilla.org/search/?proto_signature=~_startLiveResizeCacheOK&platform=Mac%20OS%20X&date=%3E%3D2021-04-22T03%3A14%3A00.000Z&date=%3C2021-10-22T03%3A14%3A00.000Z&_facets=signature&_facets=proto_signature&_facets=platform_version&_facets=mac_crash_info&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-platform_version

These have been around for a long time, but may have increased recently. They happen on many different versions of macOS, but there are disproportionately many on macOS 11 and (oddly) OS X 10.11. There are many different "signatures", but the most common is [@ nsBaseWidget::NotifyLiveResizeStarted ].

These crash stacks make no sense. Among other things, there is no way -[NSView _startLiveResizeCacheOK:] can call any of these crashes' "signatures". It actually seems to be calling itself recursively (using objc_msgSend()). So either something very strange is going on, or these crash stacks are badly corrupt. Or both.

Sounds a lot like bug 1736312.

See Also: → 1736312

A few of these crashes have "mac crash info" sections like the following:

    {
      "num_records": 1,
      "records": [
        {
          "message": "Performing @selector(_setNeedsZoom:) from sender _NSThemeZoomWidget 0x121f41200",
          "module": "/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit"
        }
      ]
    }

Something similar is reported at bug 1579807 comment #2.

And yes, similar crash stacks are reported at bug 1736540 and bug 1736312.

Lots of crashes with the signature [@ aom_smooth_v_predictor_32x8_c ] have recently been reported on the 95 branch (currently the trunk).

Given the nature of this bug, it doesn't make much sense to record its (supposed) signatures. But here are the two most common ones.

Crash Signature: [@ nsBaseWidget::NotifyLiveResizeStarted ] [@ aom_smooth_v_predictor_32x8_c ]
See Also: → 1579807
Summary: macOS crashes with two or more calls to -[NSView _startLiveResizeCacheOK:] in proto_signature → macOS crashes with two or more calls to -[NSView _startLiveResizeCacheOK:] in proto signature

I'm beginning to suspect that the weirdness of the top lines of these crash reports is due to the instruction pointer (RIP on AMD64, PC on ARM64) having been set (more or less) to random values. If so, this is a security sensitive bug.

This is probably just a mismatch between the loaded libXUL and the one on disk, see bug 1736312 comment 11.

Another common signature (at least lately).

Crash Signature: [@ nsBaseWidget::NotifyLiveResizeStarted ] [@ aom_smooth_v_predictor_32x8_c ] → [@ nsBaseWidget::NotifyLiveResizeStarted ] [@ mozilla::dom::(anonymous namespace)::QuotaClient::InitOrigin ] [@ aom_smooth_v_predictor_32x8_c ]

(In reply to comment #7, quoting bug 1736312 comment 11)

If the library couldn't be found in memory, then the minidump generator will have taken its ID from the file stored on disk (see here). If the loaded library and the one on disk didn't match, the resulting minidump will be impossible to symbolicate correctly.

Interesting. But I don't see how such a mismatch could happen.

Edit: Unless it has something to do with updates.
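For illustration only: one way to check for the mismatch described in the quoted comment is to compare the image UUID of the library on disk with that of the copy loaded in memory. The sketch below assumes that a module's identifier comes from its Mach-O LC_UUID load command (our understanding of how Breakpad's macOS tooling derives it). It only handles a thin, little-endian 64-bit Mach-O image, so a universal (fat) XUL would first need the right slice extracted. The helpers `macho_uuid`, `images_match` and `fake_image` are hypothetical, and `fake_image` builds a synthetic header purely for demonstration.

```python
import struct
import uuid
from typing import Optional

MH_MAGIC_64 = 0xFEEDFACF  # magic of a thin 64-bit Mach-O, read little-endian
LC_UUID = 0x1B            # load command that carries the 128-bit image UUID


def macho_uuid(data: bytes) -> Optional[uuid.UUID]:
    """Return the LC_UUID of a thin little-endian 64-bit Mach-O image, or None if absent."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    # mach_header_64 is eight uint32 fields; ncmds is the fifth (byte offset 16).
    (ncmds,) = struct.unpack_from("<I", data, 16)
    offset = 32  # load commands start right after the 32-byte header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_UUID:
            # uuid_command layout: cmd, cmdsize, then 16 raw UUID bytes
            return uuid.UUID(bytes=data[offset + 8 : offset + 24])
        offset += cmdsize
    return None


def images_match(loaded: bytes, on_disk: bytes) -> bool:
    """True only if both images carry an LC_UUID and the two UUIDs are equal."""
    loaded_id = macho_uuid(loaded)
    return loaded_id is not None and loaded_id == macho_uuid(on_disk)


def fake_image(image_uuid: uuid.UUID) -> bytes:
    """Synthetic 64-bit Mach-O header plus one LC_UUID command (demonstration only)."""
    header = struct.pack("<8I", MH_MAGIC_64, 0, 0, 0, 1, 24, 0, 0)
    return header + struct.pack("<II", LC_UUID, 24) + image_uuid.bytes


# Simulate "updated on disk, but the old build is still running".
old_build = fake_image(uuid.UUID(int=1))
new_build = fake_image(uuid.UUID(int=2))
print(images_match(old_build, new_build))  # False
print(images_match(old_build, old_build))  # True
```

For a real file on disk, `macho_uuid(open(path, "rb").read())` would return its UUID; getting at the bytes of the image as it is loaded in a running process is not shown here.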

What's particularly weird is that (as best I can tell) there's no way for -[NSView _startLiveResizeCacheOK:] to call into XUL (directly or even indirectly). If this is true, my "random instruction pointer" theory seems more likely than the "XUL mismatch" theory.

On the other hand, it's odd that a (presumed) low level macOS bug would affect both AMD64 and ARM64 hardware.

Edit: It's also a bit odd that the (presumed) random addresses are all in XUL. But XUL is very big (far bigger than the rest of Firefox or Thunderbird). And most system libraries are in the dyld shared cache, which is loaded as a single block at a special address in memory. And the "random" addresses may not be entirely random, which could explain why none of them fall inside the dyld shared cache.

The reason I think it's a "XUL mismatch" is that the other XUL frames further up the call stack are also wrong, but they're at positions in the stack where XUL code is expected (just with different function names), so I think stackwalking itself worked fine.

What's particularly weird is that (as best I can tell) there's no way for -[NSView _startLiveResizeCacheOK:] to call into XUL (directly or even indirectly).

Oops, it does call -[NSView viewWillStartLiveResize], which the Cocoa widget code implements for its ChildView objects, here. That seems the best place to start trying to firm up the "XUL mismatch" theory.

And yes, a "XUL mismatch" should cause all the XUL frames to be off, at least a little bit.
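A toy sketch of why a mismatch would shift every XUL frame: the stackwalker can still recover correct module-relative offsets, but if they are resolved against the symbol table of a different build, each one lands in whatever function happens to occupy that offset in the other build. The symbol tables, offsets and the `symbolize` helper below are all invented for illustration; only the function names are borrowed from this bug.

```python
import bisect
from typing import List, Tuple

Symbols = List[Tuple[int, str]]  # (start offset, function name), sorted by offset


def symbolize(offset: int, symbols: Symbols) -> str:
    """Resolve a module-relative offset to the nearest preceding symbol, as name+0xdelta."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return "<unknown>"
    start, name = symbols[i]
    return f"{name}+0x{offset - start:x}"


# Hypothetical symbol tables for two XUL builds; every offset here is made up.
LOADED_BUILD: Symbols = [
    (0x1000, "-[ChildView viewWillStartLiveResize]"),
    (0x1400, "nsBaseWidget::NotifyLiveResizeStarted()"),
    (0x2000, "some_other_function"),
]
ON_DISK_BUILD: Symbols = [
    (0x0F00, "mozilla::dom::(anonymous namespace)::QuotaClient::InitOrigin"),
    (0x1380, "aom_smooth_v_predictor_32x8_c"),
    (0x2100, "yet_another_function"),
]

# Module-relative offsets recovered by the stackwalker (correct for LOADED_BUILD).
frames = [0x1420, 0x1010]

for off in frames:
    print(f"{symbolize(off, LOADED_BUILD):45} -> {symbolize(off, ON_DISK_BUILD)}")
```

Every frame keeps its place in the stack, but each one is reported in an unrelated function when the other build's symbols are used: a plausible stack shape with implausible function names.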

If the "XUL mismatch" theory is correct, there are presumably some crash stacks without mismatches. In that case it's odd that I can't find any crashes with -[ChildView viewWillStartLiveResize] in the proto signature.

https://crash-stats.mozilla.org/search/?proto_signature=~viewWillStartLiveResize&platform=Mac%20OS%20X&date=%3E%3D2021-04-22T20%3A30%3A00.000Z&date=%3C2021-10-22T20%3A30%3A00.000Z&_facets=signature&page=1&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform

Crash Signature: [@ nsBaseWidget::NotifyLiveResizeStarted ] [@ mozilla::dom::(anonymous namespace)::QuotaClient::InitOrigin ] [@ aom_smooth_v_predictor_32x8_c ] → [@ nsBaseWidget::NotifyLiveResizeStarted ] [@ mozilla::dom::(anonymous namespace)::QuotaClient::InitOrigin ] [@ aom_smooth_v_predictor_32x8_c ] [@ XUL@0x435411a | -[NSView _startLiveResizeCacheOK:] ]

(Following up comment 13)

-[ChildView viewWillStartLiveResize] calls nsBaseWidget::NotifyLiveResizeStarted(). So that seems likely to be this bug's "legitimate" crash signature. I don't know why -[ChildView viewWillStartLiveResize] doesn't show up in those crash stacks. But that seems a minor glitch compared to the other problems that show up in this bug.

Severity: -- → S2
Priority: -- → P3

The most obvious way I can think of to trigger a "XUL mismatch" is to upgrade Firefox without restarting, then crash. For as long as Firefox (or Thunderbird) is still running without having restarted, the image in memory doesn't match the one on disk. But when I tried this (using about:crashparent to trigger a crash), I saw no mismatches.

I tested with the 2021-10-26-09-03-55-mozilla-central nightly. First I upgraded it (in "About Firefox") to the current trunk nightly (2021-11-15-09-39-17-mozilla-central). Then I visited "about:crashparent". Here's the crash that resulted:

bp-1d3ec0a3-4854-4751-85ff-51c8b0211115

Then I restarted Firefox (now fully updated) and visited about:crashparent again to trigger another crash:

bp-8f1efc8a-da52-4bf2-862b-8dac30211115

The same thing happened using "about:crashcontent" (which crashes one of the content processes).

Before:

bp-2dddb67a-1661-40a5-8307-3bd850211115

After:

bp-03063880-0322-4b24-a541-739970211115

See Also: → 1736373
Severity: S2 → S3