Crash in [@ mp4parse_get_indice_table]
Categories
(Core :: Audio/Video: Playback, defect, P2)
Tracking
()
People
(Reporter: gsvelto, Unassigned)
References
Details
(Keywords: crash, csectype-uaf, sec-high)
Crash Data
This bug is for crash report bp-60ae907e-2bbe-4f46-b44d-176980190929.
Top 10 frames of crashing thread:
0 XUL gkrust_shared::panic_hook mfbt/Assertions.h:332
1 XUL core::ops::function::Fn::call src/libcore/ops/function.rs:69
2 XUL std::panicking::rust_panic_with_hook src/libstd/panicking.rs:481
3 XUL std::panicking::continue_panic_fmt src/libstd/panicking.rs:384
4 XUL rust_begin_unwind src/libstd/panicking.rs:311
5 XUL core::panicking::panic_fmt src/libcore/panicking.rs:85
6 XUL core::panicking::panic_bounds_check src/libcore/panicking.rs:61
7 XUL mp4parse_get_indice_table src/libcore/slice/mod.rs:2681
8 XUL mozilla::MP4Metadata::GetTrackIndice dom/media/mp4/MP4Metadata.cpp:444
9 XUL mozilla::MP4Demuxer::Init dom/media/mp4/MP4Demuxer.cpp:181
Not a new crash since reports go back all the way to version 63.0. Definitely mac-specific as I couldn't find crashes with this signature on other platforms. The raw crash reason points to an out-of-bounds array access in Rust.
Updated•5 years ago
|
Comment 1•4 years ago
|
||
I came across this thread on Reddit and looked up the crash: https://old.reddit.com/r/firefox/comments/iiuofl/why_does_firefox_crash_every_time_i_try_to_open_a/
The frequency of crashes increased significantly 2 months ago, around the time of the 78 release. The top cause of the crash is ESR78:
https://crash-stats.mozilla.org/signature/?product=Firefox&signature=mp4parse_get_indice_table&date=%3E%3D2019-12-09T21%3A53%3A00.000Z&date=%3C2020-09-09T21%3A53%3A00.000Z#aggregations
I skimmed through the reports, and the MozCrashReason
seems to always be like
Index out of bounds: the len is [some 5-6 digit number] but the index is 16565899579919558117
The "index" is 0xe5e5e5e5e5e5e5e5
, which looks like a jemalloc's poison value.
This suggests that the crash is caused by a use-after-free, through mp4parse_get_indice_table
in MP4Metadata::GetTrackIndice
.
Bryce, could you check if this bug report is actionable given this extra information?
Thanks for the analysis!
Recapping:
- This crash is largely MacOS specific.
Large spike for ESR78, and the majority of crashes are on ESR78.This spike is likely due to us moving old release users to ESR and release having throttling on reporting while ESR doesn't. So the numbers are inflated while the rate may not be increasing.- The MacOS crashes are all (or a vast majority) crashing due to using a value that has been poisoned.
I'm going to make this secure while I investigate. As noted above, touching poison is of concern.
I don't see any crashes yet in 82. It's possible this is due to small sample size, but another consideration is we've been landing fixes to issues found via fuzzing. I do not see anything that would clearly fix this, I will continue investigating.
NI :jbauman, does this look like anything fuzzing has caught?
Updated•4 years ago
|
Updated•4 years ago
|
Comment 4•4 years ago
|
||
I've landed a number of fuzzing fixes in mp4parse-rust, but nothing in response to a crash quite like this. Here are the most relevant changes:
- Return error for overflow in create_sample_table
- Return an error if there are multiple decoder-specific descriptors
- If the iloc size is corrupt, return an Err rather than asserting
- Add a CheckedInteger type to ensure checked arithmetic
However, if it's a use after free with mp4parse_get_indice_table
, I'd be inclined to think it's more likely an issue in the C++ code calling mp4parse_free
and then mp4parse_get_indice_table
, or possibly calling mp4parse_get_indice_table
when mp4parse_new
wasn't successful. The only place mp4parse_free
is referenced is here, so I'll poke around a bit and see if I see anything obvious.
Comment 5•4 years ago
|
||
Based on the stack, we must be here in MP4Demuxer::Init
. We can assume that mp4parse_new
was successful, because if it wasn't we'd have returned an "Parse MP4 metadata failed" error earlier in Init
. Furthermore, the only way mp4parse_free
gets called is via the FreeMP4Parser
implementation which is a destructor template argument to UniquePtr
(which I assume is correct). Given where the metadata
local variable (which contains the parser instance) is declared and lacking any calls that would destruct it before the call to metadata.GetTrackIndice()
, I'm having a hard time seeing how there would be a use-after-free here.
Does that analysis seem right to you, Bryce? Do you see another way this could be happening?
Reporter | ||
Comment 6•4 years ago
|
||
FYI I poked a few minidumps and found something odd but maybe worthy of note: the stack walker often resorts to stack scanning after the mozilla::MP4Metadata::GetTrackIndice(unsigned int)
frame. That's usually a sign that the stack has been corrupted as we can't find the frame-pointer for the previous call. I found the exact same thing in bug 1654335 so there's a chance both issues may have a common (or at least similar) cause.
Comment 7•4 years ago
|
||
Corrupt stack or not, the clues point pretty strongly at a use-after-free, but I'm having a hard time seeing how that would occur with the code the way it is.
What do I need to do to get access to view bug 1654335?
Comment 8•4 years ago
|
||
What do I need to do to get access to view bug 1654335?
I just CCed you on the bug.
I've learnt that crash reporting in release and ESR works differently and that release only reports 10% of crashes, while ESR reports them all. Since we migrated a large number of users from old MacOS release builds to ESR, the uptick we see in crash stats is likely inflated. I've updated comment 2 to reflect this.
After looking I've added what appears to be a related sig that is catching Windows crashes. Some of the crashes there look like the MacOS ones with us panicing due to bad lengths, but there appear to be other issues, so I'm not sure they're all the same root cause. I'm focusing on looking at the ones that seem like bad indexing for now.
If MacOS dumps can be debugged locally, I don't know how, but I can work with a Windows dump. Stack of a crash from https://crash-stats.mozilla.org/report/index/d3fc0715-27b9-463c-9d86-4e7100200908 (assuming correct stack walking + symbolization):
0:051> k
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 (Inline Function) --------`-------- xul!MOZ_Crash+0xa [/builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h @ 332]
01 00000075`dc43e7e0 00000271`261369a5 xul!RustMozCrash+0x10 [/builds/worker/checkouts/gecko/mozglue/static/rust/wrappers.cpp @ 16]
02 00000075`dc43e810 00000271`2613692c xul!mozglue_static::panic_hook+0x75 [/builds/worker/checkouts/gecko/mozglue/static/rust/lib.rs @ 89]
03 (Inline Function) --------`-------- xul!core::ops::function::FnOnce::call_once+0x5 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 232]
04 00000075`dc43ec60 00000271`2628dcb1 xul!core::ops::function::FnOnce::call_once<fn(core::panic::PanicInfo*),(core::panic::PanicInfo*)>+0xc [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 72]
05 00000075`dc43ec90 00000271`26293b2a xul!std::panicking::rust_panic_with_hook+0x1a1 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 477]
06 00000075`dc43ed90 00000271`25f2df10 xul!std::panicking::begin_panic_handler+0x4a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 378]
07 00000075`dc43ede0 00000271`25f2ddaa xul!std::panicking::begin_panic_fmt+0x30 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 85]
08 00000075`dc43ee30 00000271`26158f60 xul!core::panicking::panic_bounds_check+0x7a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 63]
09 (Inline Function) --------`-------- xul!mp4parse_capi::create_sample_table+0x1a2e [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1450]
0a (Inline Function) --------`-------- xul!mp4parse_capi::get_indice_table+0x1b98 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1139]
0b 00000075`dc43eec0 00000271`246e51c6 xul!mp4parse_capi::mp4parse_get_indice_table+0x1bd0 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1096]
0c 00000075`dc43f060 00000271`246e32c2 xul!mozilla::MP4Metadata::GetTrackIndice+0x76 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp @ 446]
0d 00000075`dc43f150 00000271`244da32f xul!mozilla::MP4Demuxer::Init+0xb52 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Demuxer.cpp @ 181]
0e (Inline Function) --------`-------- xul!mozilla::MediaFormatReader::DemuxerProxy::Init::<unnamed-tag>::operator()+0x62 [/builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp @ 728]
0f 00000075`dc43f3b0 00000271`22901390 xul!mozilla::detail::ProxyFunctionRunnable<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp:723:22',mozilla::MozPromise<mozilla::MediaResult,mozilla::MediaResult,1> >::Run+0x7f [/builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h @ 1565]
10 00000075`dc43f410 00000271`2290c846 xul!mozilla::TaskQueue::Runner::Run+0x180 [/builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp @ 165]
11 00000075`dc43f4b0 00000271`21fd3dde xul!nsThreadPool::Run+0x686 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp @ 301]
12 00000075`dc43f650 00000271`21fd2c21 xul!nsThread::ProcessNextEvent+0xcae [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1239]
13 (Inline Function) --------`-------- xul!NS_ProcessNextEvent+0x20 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 513]
14 00000075`dc43f840 00000271`22dbdc78 xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x181 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 332]
15 (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0xf [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 334]
16 00000075`dc43f8c0 00000271`21fab871 xul!MessageLoop::RunHandler+0x28 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 328]
17 00000075`dc43f910 00000271`2290785d xul!MessageLoop::Run+0x51 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 310]
18 00000075`dc43f960 00007ff9`065af854 xul!nsThread::ThreadFunc+0x12d [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 449]
19 00000075`dc43fb20 00007ff9`0659fdba nss3!_PR_NativeRunThread+0x144 [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 421]
1a 00000075`dc43fba0 00007ff9`3e8a1542 nss3!pr_root+0xa [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140]
1b 00000075`dc43fbd0 00000271`5ec3b9f0 ucrtbase+0x21542
1c 00000075`dc43fbd8 00000000`00000000 0x00000271`5ec3b9f0
i.e. something is going wrong with us indexing into the sample table[0]. It makes sense peek_index
is our bad index and that something is going wrong earlier to cause this. This suggests sort_table
contains invalid data. Could something be going wrong with the TryVec
such that we end up with it being freed and then poisoned or full of garbage that leads to bad indexing?
My current two areas I plan to look more at:
- Our fallible allocation. We've already switched out what we're doing upstream, I wonder if there's any other dangers in the old code that could lead to this.
- The data we're getting from the mp4 and how it's being used here. We're using some data here to set sizes of containers. Is there anywhere that data could cause problems by being malformed/malicious (overflows, etc)?
Comment 10•4 years ago
|
||
Could something be going wrong with the
TryVec
such that we end up with it being freed and then poisoned or full of garbage that leads to bad indexing?
It's certainly possible, since there is some unsafe code involved. Nearly all of the unsafe code involved is in this one function if you'd like to take a look. Though this crash is happening with much older code, so I don't think that's likely relevant at all. Based on the last update of mp4parse-rust code before this bug was filed, create_sample_table
would've been using a stdlib Vec
with vec_push
for fallible allocation coming from the 0.0.1 version of our mp4parse_fallible
crate. In that case, the code to check is here.
It does seem like it would be very instructive to know whether we still see this crash on code after the switch to broader use of fallible allocation or after we switched from our home-grown implementation based on malloc/realloc
to the fallible_collections
implementation based on rust stdlib alloc API.
- This update on May 9 marks the landing of the former, and should show up in 78
- This update on Sep 2 marks the landing of the latter, and should show up in 82
Comment 11•4 years ago
•
|
||
It does seem like it would be very instructive to know whether we still see this crash on code after the switch to broader use of fallible allocation or after we switched from our home-grown implementation based on malloc/realloc to the fallible_collections implementation based on rust stdlib alloc API.
This update on May 9 marks the landing of the former, and should show up in 78
Apparently we do for at least this one update:
This is still true even if we filter out all 0x0
crash addresses:
Comment 12•4 years ago
|
||
This is still true even if we filter out all 0x0 crash addresses:
But note that crash addresses > 128TB are changed to 0x0 on the Mac and Linux (and to 0xffffffffffffffff on Windows). This includes addresses that have been poisoned.
Assigning to me so this has an owner. I don't think I can action this at this stage, but will keep eyes on it.
Comment 14•4 years ago
|
||
(In reply to Bryce Seager van Dyk (:bryce) from comment #13)
Assigning to me so this has an owner. I don't think I can action this at this stage, but will keep eyes on it.
Hey Bryce, Based on your comment #13, this bug appears stalled to me. (So I'm marking it as such.) I'm needinfo'ing you for a "sanity-check" that marking it stalled makes sense to you. Thanks!
(In reply to Maire Reavy [:mreavy] from comment #14)
(In reply to Bryce Seager van Dyk (:bryce) from comment #13)
Assigning to me so this has an owner. I don't think I can action this at this stage, but will keep eyes on it.
Hey Bryce, Based on your comment #13, this bug appears stalled to me. (So I'm marking it as such.) I'm needinfo'ing you for a "sanity-check" that marking it stalled makes sense to you. Thanks!
I think that's a fair assessment. Until we figure out the root cause of bug 1665411 I don't think I can action this.
Comment 16•2 years ago
|
||
The bug assignee didn't login in Bugzilla in the last months and this bug has priority 'P2'/severity 'critical'.
:jimm, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 17•2 years ago
|
||
https://crash-stats.mozilla.org/signature/?signature=mp4parse_get_indice_table
still present, but only on very old versions of osx. pretty high volume, so it'd be great to get it fixed. stalled at this point though until we find an owner who has the cycles.
Reporter | ||
Comment 18•2 years ago
|
||
The volume is artificially amplified by the crashes happening on the ESR channel which isn't throttled. We're seeing all crashes and that's ~200 users per week which isn't too bad.
Comment 19•2 years ago
|
||
The severity field for this bug is set to S4. However, the bug is flagged with the sec-high
keyword.
:jimm, could you consider increasing the severity of this security bug?
For more information, please visit auto_nag documentation.
Updated•2 years ago
|
Comment 20•2 years ago
|
||
No recent reports from any modern versions.
Comment 21•2 years ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.
Updated•1 year ago
|
Description
•