Closed Bug 1660368 Opened 4 years ago Closed 1 year ago

Crash in [@ mozilla::H264::DecodeNALUnit]

Categories

(Core :: Audio/Video: Playback, defect, P2)

Unspecified
macOS
defect

Tracking

RESOLVED WORKSFORME
106 Branch

People

(Reporter: mccr8, Unassigned)

References

Details

(4 keywords)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/933a0c9b-6c7f-4940-b326-3d9660200819

Top 10 frames of crashing thread:

0 XUL mozilla::H264::DecodeNALUnit dom/media/platforms/agnostic/bytestreams/H264.cpp:495
1 XUL mozilla::H264::GetFrameType dom/media/platforms/agnostic/bytestreams/H264.cpp:988
2 XUL rlz_lib::BytesToString::kHex 
3 XUL mozilla::MP4TrackDemuxer::GetNextSample dom/media/mp4/MP4Demuxer.cpp:391
4 XUL mozilla::MP4TrackDemuxer::GetSamples dom/media/mp4/MP4Demuxer.cpp:459
5 XUL mozilla::TrackBuffersManager::DoDemuxVideo dom/media/mediasource/TrackBuffersManager.cpp:1501
6 XUL mozilla::MozPromise<bool, mozilla::MediaResult, true>::MozPromise xpcom/threads/MozPromise.h:237
7 XUL mozilla::TrackBuffersManager::CodedFrameProcessing dom/media/mediasource/TrackBuffersManager.cpp:1469
8 XUL rlz_lib::BytesToString::kHex 
9 XUL mozilla::TrackBuffersManager::SegmentParserLoop dom/media/mediasource/TrackBuffersManager.cpp:917
  • Big ESR uptick starting in July, likely tied to ESR 78.0 releasing at the end of June. The uptick is likely due to macOS users being migrated from release to ESR, and to ESR reports not being throttled (release is throttled to 10%). So the increase in numbers is likely due to inflated reporting.
  • macOS represents the majority of crashes. This MAY be due to the reporting difference described above.
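
To make the throttling point concrete: when only a fraction of a channel's reports are accepted, raw counts need to be scaled before channels are compared. A rough sketch follows. The helper name and the counts in the note below are made up for illustration; only the 10% release rate comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any crash-stats tooling): estimates the
// underlying crash count from the number of accepted reports, given the
// fraction of reports that the channel accepts (1.0 means unthrottled).
double EstimateActualCrashes(std::uint64_t reportedCount,
                             double acceptedFraction) {
  assert(acceptedFraction > 0.0 && acceptedFraction <= 1.0);
  return static_cast<double>(reportedCount) / acceptedFraction;
}
```

Under these assumptions, 50 reports on release (throttled to 10%) stand for roughly 500 crashes, while 50 reports on an unthrottled ESR channel stand for 50, so equal raw counts do not mean equal crash volume.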

The crash appears to happen when we append to rbsp [0]. It's not clear to me how this can happen, given that we create that buffer/array in the same function, quite close to the failure site.

[0] https://searchfox.org/mozilla-central/rev/73e4e809df771baeb61635b25f53dfb44e930904/dom/media/platforms/agnostic/bytestreams/H264.cpp#495
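
For readers unfamiliar with this step, here is a minimal sketch of what turning a NAL unit into an RBSP generally involves under the H.264 spec: copying bytes into a fresh buffer while dropping emulation prevention bytes (the 0x03 in a 0x00 0x00 0x03 sequence). This is not the code in H264.cpp; the function name and the use of std::vector in place of a MediaByteBuffer are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only, not the Firefox implementation. Copies `size` bytes from
// `data` into a newly created local buffer, skipping each 0x03 byte that
// directly follows two or more 0x00 bytes (emulation prevention).
std::vector<std::uint8_t> UnescapeNalPayload(const std::uint8_t* data,
                                             std::size_t size) {
  assert(data != nullptr || size == 0);
  std::vector<std::uint8_t> rbsp;  // local output buffer, created here
  rbsp.reserve(size);
  std::size_t zeroRun = 0;  // consecutive 0x00 bytes seen so far
  for (std::size_t i = 0; i < size; ++i) {
    const std::uint8_t byte = data[i];
    if (zeroRun >= 2 && byte == 0x03) {
      zeroRun = 0;  // drop the escape byte and restart detection
      continue;
    }
    rbsp.push_back(byte);  // the append step
    zeroRun = (byte == 0x00) ? zeroRun + 1 : 0;
  }
  return rbsp;
}
```

In a loop of this shape the output buffer is only ever touched by the function that created it, which is why a crash on the append itself is surprising.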

Severity: -- → S3
Priority: -- → P3
Assignee: nobody → bvandyk
Group: core-security
Severity: S3 → S2
Priority: P3 → P2
Group: core-security → media-core-security
See Also: → 1665411

Hey Bryce, based on comment 1 -- and just like bug 1584956 and bug 1659938, this looks to be stalled. I'm NI'ing you for a sanity-check that I'm not missing something and to ask: Are all of these bugs due to bug 1665411? (Could these 3 bugs be dups of bug 1665411?) Or do we not have enough info? Thanks!

Flags: needinfo?(bvandyk)
Keywords: stalled

(In reply to Maire Reavy [:mreavy] from comment #2)

> Hey Bryce, based on comment 1 -- and just like bug 1584956 and bug 1659938, this looks to be stalled. I'm NI'ing you for a sanity-check that I'm not missing something and to ask: Are all of these bugs due to bug 1665411? (Could these 3 bugs be dups of bug 1665411?) Or do we not have enough info? Thanks!

I think that's a fair assessment. Until we figure out the root cause of bug 1665411 I don't think I can action this. I think these bugs are due to bug 1665411 and my hope is once it's fixed these will evaporate. I see some value in keeping these separate in that we can track the different signatures with more granularity and be sure they all disappear as expected (at which point we can dupe them).

Flags: needinfo?(bvandyk)
Depends on: 1676343
Assignee: brycebugemail → nobody

Largely an older ESR crash, but still happens in small numbers in release.

Severity: S2 → S3

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:jimm, could you consider increasing the severity of this security bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)

I've looked into the fuzzing coverage and there is a large gap right before the crashing location:

https://fuzzmanager.fuzzing.mozilla.org/covmanager/collections/8268/browse/#rc=1&p=dom/media/platforms/agnostic/bytestreams/H264.cpp&s=455&hl=455

From the symptoms of the bug (rbsp is a local MediaByteBuffer and we crash while appending to it), it is highly plausible that something went out of bounds somewhere. We should try to cover that missing branch in fuzzing. From the if condition, it isn't clear to me why libFuzzer isn't hitting it on its own anyway. Maybe we can find a testcase that way.

Tyson, can you look into this and figure out why we are not hitting that branch at all? Maybe we can manually modify an existing testcase that gets close so that it hits this branch.

Flags: needinfo?(twsmith)

There's been no more crashes here past version 106 which is when I landed bug 1784018. This largely confirms my hunch that there was a problem with locking on older macOS versions. I'm closing this as fixed since all remaining crashes are coming from old versions of Firefox.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Flags: needinfo?(twsmith)

(In reply to Gabriele Svelto [:gsvelto] from comment #7)

> There's been no more crashes here past version 106 which is when I landed bug 1784018. This largely confirms my hunch that there was a problem with locking on older macOS versions. I'm closing this as fixed since all remaining crashes are coming from old versions of Firefox.

But that was a Mac fix, and all the Mac crashes here appear to be in really old versions (mostly ESR-78, and 84.x).

There was definitely a burst of Windows crashes with this signature that regressed in 104.0.2 (no .0.0 or .0.1 crashes!) and also affected 105.0.1, 105.0.2, and 105.0.3. As you say, this was fixed in 106, but not by a Mac patch. Ignoring the long-ago-fixed Mac bugs, the Windows regressor was in this small set:
https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=FIREFOX_104_0_1_RELEASE&tochange=FIREFOX_104_0_2_RELEASE

That includes media-related fixes for bug 1781759 and bug 1781063. Those bugs weren't landed on ESR until 102.3, and the earliest ESR-102 version that has any of these regression crashes on Windows is 102.3. But there are only 4 ESR crashes in the last 6 months so it's weak evidence. Also, those two bugs dealt with principals and cross-origin checks; would they have anything to do with a crash in the H264 decoder? It's stream related, so maybe?

The crashes went away in 106, but there's no "regression" bug linked to either of the above that would account for it. It might have been one of these 17 fixes:
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20cf_status_firefox106%3Afixed%2Cverified%20-cf_status_firefox105%3Afixed%2Cverified%20prod%3Dcore%20comp%3AAudio%2FVideo%20creation_ts%3E2022-08-22

All of the ones marked "regression" were clearly identified as coming from other bugs (and most were nightly 106 regressions). The other fixes don't seem relevant. Maybe the thing that fixed it wasn't filed as a media bug? I must have made a wrong guess somewhere.

There's bug 1788558 that dealt with media decoders, but that was filed looking at code introduced way back in Firefox 72. But maybe?

It's more accurate to call this WFM if we don't know what the fix might have been.

Group: media-core-security → core-security-release
Keywords: regression
Resolution: FIXED → WORKSFORME
Target Milestone: --- → 106 Branch
Group: core-security-release