Closed Bug 1731085 Opened 3 years ago Closed 3 years ago

Crash in [@ mozilla::MP4Demuxer::Init]

Categories

(Core :: Audio/Video: Playback, defect)

ARM
Android
defect

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- wontfix
firefox93 --- fixed
firefox94 --- fixed

People

(Reporter: RyanVM, Assigned: jrmuizel)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/5fd6bcff-73fc-4c9a-b2b2-c275c0210916

#5 overall topcrash in Fenix 92.1.1. Appears to be nearly all ARM32 and definitely appears to have started in the 92 timeframe. Gerald, can you help find an owner for this?

Reason: SIGBUS / BUS_ADRALN

Top 10 frames of crashing thread:

0 libxul.so mozilla::MP4Demuxer::Init dom/media/mp4/MP4Demuxer.cpp:108
1 libxul.so mozilla::TrackBuffersManager::SegmentParserLoop dom/media/mediasource/TrackBuffersManager.cpp:860
2 libxul.so mozilla::detail::RunnableMethodImpl<RefPtr<nsUrlClassifierDBServiceWorker> const, nsresult  xpcom/threads/nsThreadUtils.h:1201
3 libxul.so mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:208
4 libxul.so nsThreadPool::Run xpcom/threads/nsThreadPool.cpp:303
5 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1142
6 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:466
7 libxul.so mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:306
9 libxul.so nsThread::ThreadFunc xpcom/threads/nsThread.cpp:390
Flags: needinfo?(gsquelart)

Alastor, could you please have a look, or redirect NI as appropriate? Thank you.

Flags: needinfo?(gsquelart) → needinfo?(alwu)

We haven't modified MP4Demuxer and MSE codes for a while, and I didn't notice there is any recent change we made on Android. Do you guys have any idea about this?
Thank you.

Flags: needinfo?(jolin)
Flags: needinfo?(bvandyk)
Flags: needinfo?(alwu)

This could be Rust mp4 parsing code. If the stacks are to be believed we're crashing here. I don't know a lot about Android crash reports, but if we have stack frames that aren't being reported (due to inlining or whatever), we could be in Rust code. I'm continuing to look, though would appreciate other eyes on this too.

Crashes start in 92beta2 as far as I can tell. Nothing jumps out to me between beta 1 and 2. Maybe there's something not obvious in there. Alternatively, we didn't catch this during beta 1, but the bug was still there. I don't see anything obvious introduced since the start of the beta cycle. At this point, my working theory is our base branch point for 92 contains the bug.

Bug 1723247 landed in 92, which fits the time frame for this.

Restricting until we have a better idea what's going on.

The pointers we're accessing are not aligned, hence the SIGBUS / BUS_ADRALN. The addresses appear wild to me. That makes me nervous. If the addresses are coming from somewhere in the mp4 stream, then I am even more nervous.

When aggregating crashes on 'Android model' I see a lot of tablets. Not sure what to make of that, if anything.

Group: core-security
Group: core-security → media-core-security

(In reply to Bryce Seager van Dyk (:bryce) from comment #4)

Bug 1723247 landed in 92, which fits the time frame for this.

Looking at the changes in that update, there's rather little that would affect the MP4Demuxer code path. Instead, pretty much all those changes were AVIF-specific. However, I do see one or two things that could be relevant:

  1. be_u32_with_limit and read_buf were removed in favor of readers which use fallible allocation instead of a fixed limit. The fallible allocation should fail gracefully, but I suppose it could be admitting large uses of memory which succeed and cause crashes down the line which wouldn't have otherwise happened because valid inputs which exhausted so much memory were previously rejected.

  2. read_hdlr is a bit stricter on compliance, but none of that should result in any crashes; they're all gracefully-handled errors

This appears specific to arm(32) cpus. The one arm64 crash is on a different line. I.e. this is a 32bit arch bug.

r5 reliably holds the crashing address in reports I've looked at.

I'm going to put this down for now. My plan, when I have time, is to disasseble the xul from a failing build, and trace the source and usage of r5 to try and better identify what is responsible for the bogus value. Happy for someone else to grab this and do so if they have time. Also open to other suggestions to figure out the source.

Previously we ran into crashes here on platforms that
don't support unaligned accesses.

Assignee: nobody → jmuizelaar
Status: NEW → ASSIGNED

For reference the actual stack for this from cargo install --examples addr2line && addr2line -pCfi -e libxul.so.dbg 0x127ec1c is:

unsigned int mozilla::BufferReader::ReadType<unsigned int>() at /builds/worker/workspace/obj-build/dist/include/BufferReader.h:273
 (inlined by) mozilla::CryptoFile::DoUpdate(unsigned char const*, unsigned int) at /builds/worker/checkouts/gecko/dom/media/mp4/DecoderData.cpp:34
 (inlined by) mozilla::CryptoFile::Update(unsigned char const*, unsigned int) at /builds/worker/workspace/obj-build/dist/include/DecoderData.h:42
 (inlined by) mozilla::MP4Metadata::UpdateCrypto() at /builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp:127
 (inlined by) mozilla::MP4Metadata::Parse() at /builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp:112
 (inlined by) mozilla::MP4Demuxer::Init() at /builds/worker/checkouts/gecko/dom/media/mp4/MP4Demuxer.cpp:108

Adding to comment 9, https://bugzilla.mozilla.org/show_bug.cgi?id=1729698#c2 contains info on how to get the debug symbols for different kinds of builds.

Thanks for the help on this, Jeff!

No longer blocks: media-triage

For those who might be interested, here is the Matrix conversation that leads to the solution -- https://matrix.to/#/!PqXdkcAVuFgoyoDtBY:mozilla.org/$yzjd2LpyzdKzuyvh-1uEK-_TIEBDV5Jl-X26nXa7-x0?via=mozilla.org

Flags: needinfo?(jolin)

Does this need to be a security bug? If we're just going to always crash.

Flags: needinfo?(jmuizelaar)

(In reply to Andrew McCreight [:mccr8] from comment #12)

Does this need to be a security bug? If we're just going to always crash.

Jeff may have more insight, but I think this is not a sec bug now that we know the cause. We're doing a misaligned read, but it's into a buffer we expect to read from. My initial concern was this was a wild read, but that does not appear to be the case.

Group: media-core-security
Flags: needinfo?(jmuizelaar)

Yeah, I agree. This doesn't need to be security bug.

Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9c367207f4e1
Properly handle unaligned accesses. r=bryce
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch

Comment on attachment 9242541 [details]
Bug 1731085. Properly handle unaligned accesses.

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes on 32 bit Android when viewing some videos
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change avoids some undefined behaviour but should not have any visible change on non-32bit Android.
  • String changes made/needed:
Attachment #9242541 - Flags: approval-mozilla-beta?

Comment on attachment 9242541 [details]
Bug 1731085. Properly handle unaligned accesses.

Approved for landing on the beta branch before the first merge into mozilla-release, thanks.

Attachment #9242541 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: