Closed Bug 1911317 Opened 1 year ago Closed 1 year ago

macOS Crash in [@ safeShift10BitBy6] with netflix.com

Categories

(Core :: Audio/Video: GMP, defect)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
132 Branch
Tracking Status
firefox-esr115 131+ fixed
firefox-esr128 131+ fixed
firefox129 + wontfix
firefox130 + wontfix
firefox131 + fixed
firefox132 + fixed

People

(Reporter: aryx, Assigned: aosmond)

References

Details

(4 keywords, Whiteboard: [adv-main131+r][adv-esr128.3+r][adv-esr115.16+r])

Crash Data

Attachments

(2 files)

[Tracking Requested - why for this release]:

First crash report from 2024-07-19, ~25 crashes/day on macOS over the last week. If an url is shared, it's a https://www.netflix.com page.

Crash report: https://crash-stats.mozilla.org/report/index/132d64f1-5c54-47dd-b623-153790240802

Reason: EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE

Top 10 frames:

0  XUL  safeShift10BitBy6(unsigned short const&)  gfx/layers/MacIOSurfaceImage.cpp:58
0  XUL  mozilla::layers::MacIOSurfaceImage::SetData(mozilla::layers::ImageContainer*,...  gfx/layers/MacIOSurfaceImage.cpp:200
1  XUL  mozilla::VideoData::CreateAndCopyData(mozilla::VideoInfo const&, mozilla::lay...  dom/media/MediaData.cpp:360
2  XUL  mozilla::gmp::ChromiumCDMParent::CreateVideoFrame(mozilla::gmp::CDMVideoFrame...  dom/media/gmp/ChromiumCDMParent.cpp:1020
3  XUL  mozilla::gmp::ChromiumCDMParent::RecvDecodedData(mozilla::gmp::CDMVideoFrame ...  dom/media/gmp/ChromiumCDMParent.cpp:841
4  XUL  mozilla::gmp::PChromiumCDMParent::OnMessageReceived(IPC::Message const&)  ipc/ipdl/PChromiumCDMParent.cpp:1492
5  XUL  mozilla::gmp::PGMPContentParent::OnMessageReceived(IPC::Message const&)  ipc/ipdl/PGMPContentParent.cpp:408
6  XUL  mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecyc...  ipc/glue/MessageChannel.cpp:1820
6  XUL  mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecyclePro...  ipc/glue/MessageChannel.cpp:1739
6  XUL  mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, ...  ipc/glue/MessageChannel.cpp:1530

:jimm could this be triaged and prioritized for investigation?

Flags: needinfo?(jmathies)

The bug is marked as tracked for firefox129 (release), tracked for firefox130 (beta) and tracked for firefox131 (nightly). However, the bug still isn't assigned.

:jimm, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(jmathies)

Looks like something they may have shipped and then possibly backed out? Andrew any thoughts?
Strange that this only impacted 128 and 129b4 but not 129.0 or 129.0.1.

Flags: needinfo?(jmathies)
Flags: needinfo?(aosmond)
Severity: -- → S2

This may be works for me. The crash hasn't happened in the last 10 days, regardless of release. It stated July 19th, and went away Aug 7th. From the crash it appears format related.

This is a reminder regarding comment #2!

The bug is marked as tracked for firefox129 (release), tracked for firefox130 (beta) and tracked for firefox131 (nightly). We have limited time to fix this, the soft freeze is in 14 days. However, the bug still isn't assigned.

This is a reminder regarding comment #2!

The bug is marked as tracked for firefox129 (release), tracked for firefox130 (beta) and tracked for firefox131 (nightly). We have limited time to fix this, the soft freeze is in 8 days. However, the bug still isn't assigned.

Crashes are still happening AFAICT, but the volume is pretty low.

Assignee: nobody → aosmond
Group: media-core-security
Flags: needinfo?(aosmond)

I think this is actually a security bug. We verify the bounds of any buffers we receive from the GMP process if it is a GMP plugin (i.e. OpenH264) before passing it into the GMPVideoDecoder implementation:
https://searchfox.org/mozilla-central/rev/e968519d806b140c402c3b3932cd5f6cd7cc42ac/dom/media/gmp/GMPVideoDecoderParent.cpp#301

However we do not appear to have any similar checks for the GMP process if it is a Chromium-style plugin (i.e. Widevine):
https://searchfox.org/mozilla-central/rev/e968519d806b140c402c3b3932cd5f6cd7cc42ac/dom/media/gmp/ChromiumCDMParent.cpp#991

This is called directly by ChromiumCDMParent::RecvDecodedData and ChromiumCDMParent::RecvDecodedShmem. Given this is where the process boundary is, we need to do similar bounds checks to ensure that plane offset + plane stride * plane width does not exceed the bounds of the provided data buffer. My best guess is that we are actually getting a buffer that violates that, hence the BAD_ACCESS crash on the pointer.

I suspect this is because of a bug in Widevine that Netflix is sometimes hitting. We should just drop the buffer in this case.

Status: NEW → ASSIGNED

Possibly related is bug 1906527 which substantially alters the OSX pathway and there do not appear any crashes on nightly with this signature.

There do not appear to be any signature morphs for nightly, but the volume is probably too low to be certain that bug 1906527 fixed/changed things.

Attached file Bug 1911317.

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release
  • Top 10 content process crashes on release
  • Top 5 desktop browser crashes on Mac on release

For more information, please visit BugBot documentation.

Keywords: topcrash
Attachment #9420880 - Flags: approval-mozilla-esr115?

Comment on attachment 9420419 [details]
Bug 1911317.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: I'm not sure. They need to get the GMP process / Widevine library to provide a badly structured buffer, but the Widevine library is a blackbox. It is immediately obvious from the patch we are worried about a buffer overflow. Removing the comments/log entries won't make a difference.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes
  • Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: All
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?: Only esr115 required an updated patch.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely, it just adds bounds checking for something that should never happen.
  • Is the patch ready to land after security approval is given?: Yes
  • Is Android affected?: No
Attachment #9420419 - Flags: sec-approval?
Attachment #9420880 - Flags: approval-mozilla-esr115?

We are still getting crash reports for users with the updated Widevine plugin, so it does not appear that will fix this issue.

e.g. https://crash-stats.mozilla.org/report/index/dbbc703b-9193-4829-8a22-011550240903 has gmp-widevinecdm":{"version":"4.10.2830.0" in the crash report telemetry.

Comment on attachment 9420419 [details]
Bug 1911317.

Approved to land and uplift

Attachment #9420419 - Flags: sec-approval? → sec-approval+
Group: media-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch

The patch landed in nightly and beta is affected.
:aosmond, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox131 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(aosmond)

Comment on attachment 9420419 [details]
Bug 1911317.

Beta/Release Uplift Approval Request

  • User impact if declined: Sec issue, related crashes in the wild
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just adds additional bounds checking.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(aosmond)
Attachment #9420419 - Flags: approval-mozilla-beta?
Attachment #9420419 - Flags: approval-mozilla-esr128?
Attachment #9420880 - Flags: approval-mozilla-esr115?

Comment on attachment 9420419 [details]
Bug 1911317.

Approved for 131.0b3

Attachment #9420419 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-

Comment on attachment 9420419 [details]
Bug 1911317.

Approved for 128.3esr

Attachment #9420419 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+

:aosmond can you take a look at the crashes in nightly/beta that occurred post change?

Flags: needinfo?(aosmond)

Reopening since the crashes are still happening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 132 Branch → ---

I am hoping bug 1918778 will resolve it, should have the fix in hand EOD/this weekend, as there are several problems with how we deal with the surface recycling that can explain the crash.

Flags: needinfo?(aosmond)

:aosmond in will both patches (this and bug 1918778) need to be uplifted to esr115?
I have requested nomination to esr128 on the other patch just to be proactive, but I will wait until b8 has a little more crash-free traffic

Flags: needinfo?(aosmond)

(In reply to Dianna Smith [:diannaS] from comment #29)

:aosmond in will both patches (this and bug 1918778) need to be uplifted to esr115?
I have requested nomination to esr128 on the other patch just to be proactive, but I will wait until b8 has a little more crash-free traffic

This bug did not resolve the topcrash, but we should still uplift it for the security implications, since we did not verify the sanity of the output from the plugin in the content process.

As for bug 1918778, in order to uplift, we will need to uplift bug 1906527 as well, or I will need to see if I can write a minimalist version of the patch suitable for uplift. Let me see what I can do.

Depends on: 1918778

Let's re-close this bug then for the sake of tracking. Bug 1918778 is looking good for having cleaned up the remaining crashes under this signature.

Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Flags: needinfo?(aosmond)
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch
Attachment #9420880 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+

esr115 Uplift Approval Request

  • User impact if declined: Sec issue, related crashes in the wild
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: No
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Just adds additional bounds checking.
  • String changes made/needed: None
  • Is Android affected?: no
Whiteboard: [adv-main131+r]
Whiteboard: [adv-main131+r] → [adv-main131+r][adv-esr128.3+r]
Whiteboard: [adv-main131+r][adv-esr128.3+r] → [adv-main131+r][adv-esr128.3+r][adv-esr115.16+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: