Closed Bug 1658432 Opened 4 years ago Closed 1 year ago

Crash in [@ objc_msgSend | CA::Layer::remove_sublayer]

Categories

(Core :: Graphics: Layers, defect, P3)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
121 Branch
Tracking Status
thunderbird_esr91 --- unaffected
firefox-esr115 120+ fixed
firefox119 --- wontfix
firefox120 + fixed
firefox121 + fixed

People

(Reporter: gsvelto, Assigned: bradwerth)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [tbird crash][adv-main120+r][adv-esr115.5+r])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug is for crash report bp-868f44e3-557d-418f-8a7a-911e10200808.

Top 10 frames of crashing thread:

0 libobjc.A.dylib objc_msgSend 
1 QuartzCore CA::Layer::remove_sublayer 
2 QuartzCore CA::Layer::remove_from_superlayer 
3 AppKit -[NSView _removeLayerFromSuperlayer] 
4 AppKit -[NSView _setSuperview:] 
5 AppKit -[NSView removeFromSuperview] 
6 AppKit -[NSView removeFromSuperviewWithoutNeedingDisplay] 
7 AppKit -[NSView _finalize] 
8 AppKit -[NSView dealloc] 
9 AppKit -[NSNextStepFrame dealloc] 

Given this is happening deep into Apple code I'm not sure if it's actionable, but it's an UAF so it's better to have it on file. Apparently we have an object in the autorelease pool that must have been already dead at some point because it contains a slice of the poison pattern.

I'm going to tentatively move this over to Graphics since this seems to be related to CoreAnimation.

Component: Widget: Cocoa → Graphics: Layers

ni? Markus in case it rings a bell, the stack doesn't look easily actionable indeed.

Blocks: wr-mac
Severity: -- → S3
Priority: -- → P3

UAFs should be filed as sec issues.....

Group: core-security

#1 crash for Mac Thunderbird 78.2.2 bp-487ce722-4b06-42ca-93ef-00bd10200921 (oddly)

Whiteboard: [tbird topcrash]
Blocks: gfx-triage
Group: core-security → gfx-core-security

Not actionable at this point.

Keywords: stalled
No longer blocks: gfx-triage

Actually needinfo'ing myself.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(jmathies)
Summary: Crash in [@ objc_msgSend | CA::Layer::remove_sublayer] → (mostly Thunderbird) Crash in [@ objc_msgSend | CA::Layer::remove_sublayer]

Actually, according to the graph this moved at least in part to Bug 1718911 but in far less numbers - Crash in [@ objc_msgSend | CA::Layer::begin_change] (at least for Thunderbird).

But still rare for version 91.

See Also: → 1718911

We've been doing a lot of mac work. What might have improved things is hard to say.

Flags: needinfo?(jmathies)

(In reply to Jim Mathies [:jimm] from comment #9)

We've been doing a lot of mac work. What might have improved things is hard to say.

For Firefox, the crash rate of the past 6 months have been dead flat https://crash-stats.mozilla.org/signature/?product=Firefox&signature=objc_msgSend%20%7C%20CA%3A%3ALayer%3A%3Aremove_sublayer&date=%3E%3D2021-08-20T15%3A21%3A00.000Z&date=%3C2022-02-20T15%3A21%3A00.000Z#graphs

As far as Thunderbird is concerned, this bug can be closed - just one (macos 10.14) version 91 crash prior to Nov 9 when crash-stats stopped accepting Thunderbird - https://crash-stats.mozilla.org/report/index/2a410474-c088-4db5-b584-e87bd0211107
0 libobjc.A.dylib objc_msgSend context
1 QuartzCore CA::Layer::remove_sublayer(CA::Transaction*, CALayer*) cfi
2 QuartzCore CA::Layer::remove_from_superlayer() cfi
3 AppKit -[NSView _removeLayerFromSuperlayer] cfi
4 AppKit -[NSView _setSuperview:] cfi
5 AppKit -[NSView removeFromSuperview] cfi
6 AppKit -[NSView removeFromSuperviewWithoutNeedingDisplay] cfi
7 AppKit -[NSView _finalize] cfi
8 AppKit -[NSView dealloc] cfi
9 AppKit -[NSNextStepFrame dealloc] cfi
10 libobjc.A.dylib (anonymous namespace)::AutoreleasePoolPage::pop(void*) cfi
11 CoreFoundation _CFAutoreleasePoolPop cfi
12 Foundation -[NSAutoreleasePool drain] cfi
13 QuickLook QLTryCatchAndCrash(void () block_pointer) cfi

prior to Nov 9 when crash-stats stopped accepting Thunderbird

Is this something we should worry about?

Edit: Now I've seen this for myself in crash-stats. Could someone tell my why this was done?

Thunderbird crash stats are now going to Backtrace instead. (MoCo crash-stats didn't want to host Thunderbird - bug 1608971.)

And Backtrace is? :-)

https://backtrace.io/ - a 3rd party service for crash report handling

https://backtrace.io/ - a 3rd party service for crash report handling

Thanks for the info, but that site doesn't seem to have any information on Thunderbird -- at least any public information.

Having done a 'thunderbird site:backtrace.io' search (which found nothing), I notice that backtrace.io pages seem to have the pattern 'blah.blah.blah.backtrace.io' (where blah.blah.blah seems to indicate a specific subject). I tried 'thunderbird.backtrace.io' and 'backtrace.io/thunderbird' and got 404 errors. Is it just that I don't know the correct site name?

Edit: Note that https://backtrace.io/ seems to be specifically for games.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #16)

Is the site I'm looking for https://crash-stats.thunderbird.net?

Yes, that's the public facing part of current Thunderbird crash stats. It allows viewing individual reports. For further details and other trends etc. an account is needed (they charge rather heavily per such account).

Whiteboard: [tbird topcrash] → [tbird crash]

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:bhood, could you consider increasing the severity of this security bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Flags: needinfo?(bhood)

Status is still the same as comment 18 and comment 19 -- about 600 crashes in the last three months on macOS 10.14 and 10.15, with two crashes in newer versions. Very few of the crashes are in Thunderbird anymore: only about 3%.

The objc_msgSend | CA::Layer::remove_sublayer signature only exists up through ESR-115. The 'objc_msgSend | CA::Layer::begin_change` signature has crashes up through a recent 119 Beta. Presumably it exists in Nightly as well, but most develpers and other folks running nightly are less likely to be using such an old OS version.

Keywords: sec-vector
Summary: (mostly Thunderbird) Crash in [@ objc_msgSend | CA::Layer::remove_sublayer] → Crash in [@ objc_msgSend | CA::Layer::remove_sublayer]

I have a theory. There's not many places where we autorelease NSView objects. One of the places is in HideWindowChrome where we call contentViewContents to create copies of the NSView objects and autorelease the array retaining them. Since HideWindowChrome is called in response to DOM changes, the array is autoreleased at the end of the event loop. It seems possible that before the autorelease pool is drained, the superview that those views are added to is destroyed, and during the drain the views are trying to remove themselves (or their layers?) from that destroyed superview. Not completely thought out, but from other crash reports I've seen that HideWindowChrome is invoked in Thunderbird, so there's some circumstantial evidence.

Anyway, if this is relevant code, then the possible fix would be to drain the NSArray before leaving HideWindowChrome, which could be done if it was a mutable array. I'll build a patch that does that.

Assignee: nobody → bwerth
Flags: needinfo?(mstange.moz)

This ensures that when the NSMutableArray returned by
contentViewContents is released, there's nothing in it that will also be
released.

(In reply to Brad Werth [:bradwerth] from comment #22)

It seems possible that before the autorelease pool is drained, the superview that those views are added to is destroyed

The superview would have removed its subviews from itself when it got destroyed. So the subviews would no longer have a pointer to it.

Looking at callstack https://crash-stats.mozilla.org/report/index/b99c5baa-2411-4c1e-a426-26cc30231022 it's more clear to me that the autorelease is of a NSWindow (probably fine) and somewhere within the content of that window an NSView is trying to remove its CALayer from its superlayer, but the CALayer was already released. So we probably have an over-releasing of CALayers somewhere in our code, and it's not harming us until the window is closed. That crash stack is from a video site, and NativeLayerCA creates several CALayers to handle video (and recreates them as needed) so that's a possible source of this kind of trouble. I'll do a review of layer memory management in that class and see if I find anything.

Attachment #9358936 - Attachment is obsolete: true

This change lets NativeLayerRootCA::Representation::Commit be the only
point where sublayers are added and removed from the root CALayer.

Comment on attachment 9359865 [details]
Bug 1658432: Don't remove wrapping CALayer from its superview outside of Commit.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Very difficult. No known reproduction case, and the changes in the patch are innocuous and not obviously security-related.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: Very unlikely this will cause any problems, and no special testing needed.
  • Is Android affected?: No
Attachment #9359865 - Flags: sec-approval?

Comment on attachment 9359865 [details]
Bug 1658432: Don't remove wrapping CALayer from its superview outside of Commit.

Approved to land and uplift

Attachment #9359865 - Flags: sec-approval? → sec-approval+
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/89d70b1c4147 Don't remove wrapping CALayer from its superview outside of Commit. r=mac-reviewers,mstange
Group: gfx-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled

The patch landed in nightly and beta is affected.
:bradwerth, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox120 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(bwerth)

Comment on attachment 9359865 [details]
Bug 1658432: Don't remove wrapping CALayer from its superview outside of Commit.

Beta/Release Uplift Approval Request

  • User impact if declined: Users on macOS 10.15 may crash when watching video under some unknown, rare condition, possibly when transitioning from windowed mode to fullscreen mode.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This removes a too-early and redundant clearing of the video player display layer. There are remaining untouched parts of code that clear the video player display layer at the intended time and they continue to work as intended.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(bwerth)
Attachment #9359865 - Flags: approval-mozilla-beta?

Comment on attachment 9359865 [details]
Bug 1658432: Don't remove wrapping CALayer from its superview outside of Commit.

Approved for 120.0b3

Attachment #9359865 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9359865 [details]
Bug 1658432: Don't remove wrapping CALayer from its superview outside of Commit.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: ESR users are somewhat more likely to be running on 10.14 or 10.15, which is the only OS version that exhibits this crash Bug.
  • User impact if declined: Occasional crashes on closing a window after viewing a video and transitioning to fullscreen.
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is a very small code change that prevents a layer change from happening too early. The layer is still updated correctly at its expected time.
Attachment #9359865 - Flags: approval-mozilla-esr115?
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-

Comment on attachment 9359865 [details]
Bug 1658432: Don't remove wrapping CALayer from its superview outside of Commit.

Approved for 115.5esr

Attachment #9359865 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Whiteboard: [tbird crash] → [tbird crash][adv-main120+r][adv-esr115.5+r]

The patch in attachment 9359865 [details] does not seem to have resolved this. I'm also seeing crash stacks on URLs that are not video sites, including on "about:newtab", so the Bug may not be related to video layers fix I had envisioned. I'll keep thinking about it.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Can we please move that to a new bug?

Flags: needinfo?(bwerth)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #40)

Can we please move that to a new bug?

Is that useful? The crash signature would still update this Bug, right? I don't have any new ideas on what to do about this Bug other than fix it, so I'm not sure how to title a new Bug.

Flags: needinfo?(bwerth)

Cloning this bug is fine. But given that we've already had multiple patches landed and uplifted around in this bug, it'll make tracking future work a pain.

Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED

Bulk-unhiding security bugs fixed in Firefox 119-121 (Fall 2023). Use "moo-doctrine-subsidy" to filter

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: