Closed Bug 1882783 Opened 1 year ago Closed 1 year ago

Diagnostic crash in [@ mozilla::MediaTrackGraphImpl::UpdateGraph]

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox123 --- unaffected
firefox124 + fixed
firefox125 + fixed

People

(Reporter: jimm, Assigned: pehrsons)

References

(Regression)

Details

(4 keywords)

Crash Data

Attachments

(6 files)

Crash report: https://crash-stats.mozilla.org/report/index/e8129be1-e9a4-476f-b38f-8173d0240229

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(false) (A non-ended SourceMediaTrack wasn't fed enough data by NotifyPull)

Top 10 frames of crashing thread:

0  XUL  mozilla::MediaTrackGraphImpl::UpdateGraph  dom/media/MediaTrackGraph.cpp:1364
0  XUL  mozilla::MediaTrackGraphImpl::OneIterationImpl  dom/media/MediaTrackGraph.cpp:1626
1  XUL  mozilla::GraphRunner::Run  dom/media/GraphRunner.cpp:141
2  XUL  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:1193
3  XUL  NS_ProcessNextEvent  xpcom/threads/nsThreadUtils.cpp:480
3  XUL  mozilla::ipc::MessagePumpForNonMainThreads::Run  ipc/glue/MessagePump.cpp:300
4  XUL  MessageLoop::RunInternal  ipc/chromium/src/base/message_loop.cc:370
4  XUL  MessageLoop::RunHandler  ipc/chromium/src/base/message_loop.cc:363
4  XUL  MessageLoop::Run  ipc/chromium/src/base/message_loop.cc:345
5  XUL  nsThread::ThreadFunc  xpcom/threads/nsThread.cpp:370

STR:

I had a set of external headphones plugged into my mac. In Meet however when joining a meeting the service defaulted to the internal speaker and mic for some unknown reason. In Meet settings, I switched mic and speaker to the external, hit the test speaker button, and noticed that the audio came out of the mac speaker. So while still in the meeting, I unplugged the headphones from the audio jack, which triggered this crash.

Flags: needinfo?(apehrson)
Keywords: regression
Regressed by: 1878380

Set release status flags based on info from the regressing bug 1878380

:padenot i know :apehrsons is out until next week, do you think you could take a look?

Flags: needinfo?(padenot)

The bug is marked as tracked for firefox124 (beta). We have limited time to fix this, the soft freeze is in 13 days. However, the bug still isn't assigned.

:jimm, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(jmathies)

Jim, I'm marking this an S3, but please update if you disagree.

Severity: -- → S3

Failing this assert is not ideal (it's a canary for a specific kind of audio distortion issues) but should be harmless in release.

Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Flags: needinfo?(padenot)
Flags: needinfo?(jmathies)
Flags: needinfo?(apehrson)
Priority: -- → P2

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

:pehrsons, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(apehrson)
Keywords: topcrash

I cannot reproduce this crash. But I'll note we only use the type of track that can fail this assertion for fake audio tracks and remote audio tracks (i.e. audio received through RTCPeerConnection). When running meet the only such tracks in play are remote audio tracks.

I'll also note that all crash reports I looked at here have one GraphRunner thread (indicating there is one MediaTrackGraph present) and two MediaTrackGrph threads (indicating there are two SystemClockDrivers driving a MediaTrackGraph). There should never be two drivers active simultaneously for a single graph, and I think these reports indicate that is the case. P1/S2 because of this.

This seems reasonably likely related to the mThreadState == ThreadState::Wait assertion failure in bug 1882172. Marking this as a sec bug because of that. And I'll note that having two drivers run a single graph could also have security implications, though GraphRunner should mitigate some modes.

Given that there is a low rate of this signature across platforms prior to bug 1878380, it seems like the regression is an increase in the triggering of an existing issue. The root cause of the issue is unclear at this time.

Group: media-core-security
Severity: S3 → S2
Flags: needinfo?(apehrson)
Priority: P2 → P1

Trying to reproduce by unplugging headphones under various conditions within a Meet meeting. No luck reproducing so far.

Auditing the code I do see one way, introduced by bug 1878380, of having two system clock drivers drive the same graph, if the graph is on an audio driver with active fallback and switches driver, then before the next driver asynchronously stops the audio driver with fallback, a devicechange event comes in and starts a new fallback driver, disregarding that the fallback state is Stopped.

Note this mode affects only macOS, the latent issue resulting in this signature would have a different root cause. If it is similar enough maybe we can catch it with an assert, otherwise it will continue to occur.

I'll try to repro this mode in a gtest tomorrow.

OS: Unspecified → macOS

Comment on attachment 9389358 [details]
Bug 1882783 - Don't start a fallback driver after being asked to stop. r?padenot!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not easily. Patches make it pretty clear what the issue is, but it requires both user interaction (changing default audio output device) and timing, and only applies to mac.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes
  • Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: beta, yes
  • If not all supported branches, which bug introduced the flaw?: Bug 1878380
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: trivial
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely to cause further regressions, but it's speculative and we are not sure it fixes the root cause of this bug.
  • Is the patch ready to land after security approval is given?: Yes
  • Is Android affected?: No
Attachment #9389358 - Flags: sec-approval?
Attachment #9389356 - Flags: sec-approval?
Attachment #9389357 - Flags: sec-approval?

Note on the sec-approval: I marked this as a sec bug because of bug 1882172, which is sec-low. That is however based on it being a shutdown assertion race. The case I am fixing here is not shutdown specific, and if it is indeed the root cause of this bug then we have a situation where a graph is driven by more than one graph driver. The security implications given that are hard to overview but I'm pretty sure we could deduce a path leading to a UAF somewhere in the graph, probably when switching driver or shutting it down.

See Also: → 1882172
Attachment #9389356 - Flags: sec-approval? → sec-approval+
Attachment #9389357 - Flags: sec-approval? → sec-approval+

Comment on attachment 9389358 [details]
Bug 1882783 - Don't start a fallback driver after being asked to stop. r?padenot!

Based on the limited blast radius (mac) and high barrier for triggering (unusual user interaction) I'm find with this landing with the test and not making it into 124 if relman is not confident in its stability and would like more bake time.

Attachment #9389358 - Flags: sec-approval? → sec-approval+
Attachment #9389603 - Flags: approval-mozilla-beta?
Attachment #9389604 - Flags: approval-mozilla-beta?
Attachment #9389605 - Flags: approval-mozilla-beta?

Uplift Approval Request

  • String changes made/needed: None
  • Is Android affected?: no
  • Fix verified in Nightly: no
  • Steps to reproduce for manual QE testing: (haven't been able to reproduce on demand)
  • User impact if declined: We suspect there could be a sec issue in the non-diagnostic-assert variant of this bug
  • Code covered by automated testing: yes
  • Risk associated with taking this patch: Low
  • Needs manual QE test: no
  • Explanation of risk level: This patch itself is low risk, bug 1878380 I deemed low-medium risk and I'd say there's still some risk whether any regression remains with this fix.
Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/d261da754cf2 In MockGraphInterface, allow passing in a SwitchedDriver runnable. r=padenot https://hg.mozilla.org/integration/autoland/rev/1f0beb4ce26d Add gtest for an audio driver receiving a DeviceChanged event after the fallback driver has been told to stop. r=padenot https://hg.mozilla.org/integration/autoland/rev/352c6538f922 Don't start a fallback driver after being asked to stop. r=padenot
Group: media-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch
Attachment #9389605 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9389604 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9389603 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Blocks: 1882172
See Also: 1882172
Regressions: 1888433
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: