I've learnt that crash reporting in release and ESR works differently and that release only reports 10% of crashes, while ESR reports them all. Since we migrated a large number of users from old MacOS release builds to ESR, the uptick we see in crash stats is likely inflated. I've updated comment 2 to reflect this. --- After looking I've added what appears to be a related sig that is catching Windows crashes. Some of the crashes there look like the MacOS ones with us panicing due to bad lengths, but there appear to be other issues, so I'm not sure they're all the same root cause. I'm focusing on looking at the ones that seem like bad indexing for now. If MacOS dumps can be debugged locally, I don't know how, but I can work with a Windows dump. Stack of a crash from https://crash-stats.mozilla.org/report/index/d3fc0715-27b9-463c-9d86-4e7100200908 (assuming correct stack walking + symbolization): ``` 0:051> k *** Stack trace for last set context - .thread/.cxr resets it # Child-SP RetAddr Call Site 00 (Inline Function) --------`-------- xul!MOZ_Crash+0xa [/builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h @ 332] 01 00000075`dc43e7e0 00000271`261369a5 xul!RustMozCrash+0x10 [/builds/worker/checkouts/gecko/mozglue/static/rust/wrappers.cpp @ 16] 02 00000075`dc43e810 00000271`2613692c xul!mozglue_static::panic_hook+0x75 [/builds/worker/checkouts/gecko/mozglue/static/rust/lib.rs @ 89] 03 (Inline Function) --------`-------- xul!core::ops::function::FnOnce::call_once+0x5 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 232] 04 00000075`dc43ec60 00000271`2628dcb1 xul!core::ops::function::FnOnce::call_once<fn(core::panic::PanicInfo*),(core::panic::PanicInfo*)>+0xc [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 72] 05 00000075`dc43ec90 00000271`26293b2a xul!std::panicking::rust_panic_with_hook+0x1a1 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 477] 06 00000075`dc43ed90 00000271`25f2df10 xul!std::panicking::begin_panic_handler+0x4a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 378] 07 00000075`dc43ede0 00000271`25f2ddaa xul!std::panicking::begin_panic_fmt+0x30 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 85] 08 00000075`dc43ee30 00000271`26158f60 xul!core::panicking::panic_bounds_check+0x7a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 63] 09 (Inline Function) --------`-------- xul!mp4parse_capi::create_sample_table+0x1a2e [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1450] 0a (Inline Function) --------`-------- xul!mp4parse_capi::get_indice_table+0x1b98 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1139] 0b 00000075`dc43eec0 00000271`246e51c6 xul!mp4parse_capi::mp4parse_get_indice_table+0x1bd0 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1096] 0c 00000075`dc43f060 00000271`246e32c2 xul!mozilla::MP4Metadata::GetTrackIndice+0x76 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp @ 446] 0d 00000075`dc43f150 00000271`244da32f xul!mozilla::MP4Demuxer::Init+0xb52 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Demuxer.cpp @ 181] 0e (Inline Function) --------`-------- xul!mozilla::MediaFormatReader::DemuxerProxy::Init::<unnamed-tag>::operator()+0x62 [/builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp @ 728] 0f 00000075`dc43f3b0 00000271`22901390 xul!mozilla::detail::ProxyFunctionRunnable<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp:723:22',mozilla::MozPromise<mozilla::MediaResult,mozilla::MediaResult,1> >::Run+0x7f [/builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h @ 1565] 10 00000075`dc43f410 00000271`2290c846 xul!mozilla::TaskQueue::Runner::Run+0x180 [/builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp @ 165] 11 00000075`dc43f4b0 00000271`21fd3dde xul!nsThreadPool::Run+0x686 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp @ 301] 12 00000075`dc43f650 00000271`21fd2c21 xul!nsThread::ProcessNextEvent+0xcae [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1239] 13 (Inline Function) --------`-------- xul!NS_ProcessNextEvent+0x20 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 513] 14 00000075`dc43f840 00000271`22dbdc78 xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x181 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 332] 15 (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0xf [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 334] 16 00000075`dc43f8c0 00000271`21fab871 xul!MessageLoop::RunHandler+0x28 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 328] 17 00000075`dc43f910 00000271`2290785d xul!MessageLoop::Run+0x51 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 310] 18 00000075`dc43f960 00007ff9`065af854 xul!nsThread::ThreadFunc+0x12d [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 449] 19 00000075`dc43fb20 00007ff9`0659fdba nss3!_PR_NativeRunThread+0x144 [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 421] 1a 00000075`dc43fba0 00007ff9`3e8a1542 nss3!pr_root+0xa [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140] 1b 00000075`dc43fbd0 00000271`5ec3b9f0 ucrtbase+0x21542 1c 00000075`dc43fbd8 00000000`00000000 0x00000271`5ec3b9f0 ``` i.e. something is going wrong with us indexing into the sample table[0]. It makes sense `peek_index` is our bad index and that something is going wrong earlier to cause this. This suggests `sort_table` contains invalid data. Could something be going wrong with the `TryVec` such that we end up with it being freed and then poisoned or full of garbage that leads to bad indexing? My current two areas I plan to look more at: - Our failable allocation. We've already switched out what we're doing upstream, I wonder if there's any other dangers in the old code that could lead to this. - The data we're getting from the mp4 and how it's being used here. We're using some data here to set sizes of containers. Is there anywhere that data could cause problems by being malformed/malicious (overflows, etc)? [0] https://searchfox.org/mozilla-central/rev/eb9d5c97927aea75f0c8e38bbc5b5d288099e687/third_party/rust/mp4parse_capi/src/lib.rs#1611
Bug 1584956 Comment 9 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
I've learnt that crash reporting in release and ESR works differently and that release only reports 10% of crashes, while ESR reports them all. Since we migrated a large number of users from old MacOS release builds to ESR, the uptick we see in crash stats is likely inflated. I've updated comment 2 to reflect this. --- After looking I've added what appears to be a related sig that is catching Windows crashes. Some of the crashes there look like the MacOS ones with us panicing due to bad lengths, but there appear to be other issues, so I'm not sure they're all the same root cause. I'm focusing on looking at the ones that seem like bad indexing for now. If MacOS dumps can be debugged locally, I don't know how, but I can work with a Windows dump. Stack of a crash from https://crash-stats.mozilla.org/report/index/d3fc0715-27b9-463c-9d86-4e7100200908 (assuming correct stack walking + symbolization): ``` 0:051> k *** Stack trace for last set context - .thread/.cxr resets it # Child-SP RetAddr Call Site 00 (Inline Function) --------`-------- xul!MOZ_Crash+0xa [/builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h @ 332] 01 00000075`dc43e7e0 00000271`261369a5 xul!RustMozCrash+0x10 [/builds/worker/checkouts/gecko/mozglue/static/rust/wrappers.cpp @ 16] 02 00000075`dc43e810 00000271`2613692c xul!mozglue_static::panic_hook+0x75 [/builds/worker/checkouts/gecko/mozglue/static/rust/lib.rs @ 89] 03 (Inline Function) --------`-------- xul!core::ops::function::FnOnce::call_once+0x5 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 232] 04 00000075`dc43ec60 00000271`2628dcb1 xul!core::ops::function::FnOnce::call_once<fn(core::panic::PanicInfo*),(core::panic::PanicInfo*)>+0xc [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 72] 05 00000075`dc43ec90 00000271`26293b2a xul!std::panicking::rust_panic_with_hook+0x1a1 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 477] 06 00000075`dc43ed90 00000271`25f2df10 xul!std::panicking::begin_panic_handler+0x4a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 378] 07 00000075`dc43ede0 00000271`25f2ddaa xul!std::panicking::begin_panic_fmt+0x30 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 85] 08 00000075`dc43ee30 00000271`26158f60 xul!core::panicking::panic_bounds_check+0x7a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 63] 09 (Inline Function) --------`-------- xul!mp4parse_capi::create_sample_table+0x1a2e [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1450] 0a (Inline Function) --------`-------- xul!mp4parse_capi::get_indice_table+0x1b98 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1139] 0b 00000075`dc43eec0 00000271`246e51c6 xul!mp4parse_capi::mp4parse_get_indice_table+0x1bd0 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1096] 0c 00000075`dc43f060 00000271`246e32c2 xul!mozilla::MP4Metadata::GetTrackIndice+0x76 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp @ 446] 0d 00000075`dc43f150 00000271`244da32f xul!mozilla::MP4Demuxer::Init+0xb52 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Demuxer.cpp @ 181] 0e (Inline Function) --------`-------- xul!mozilla::MediaFormatReader::DemuxerProxy::Init::<unnamed-tag>::operator()+0x62 [/builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp @ 728] 0f 00000075`dc43f3b0 00000271`22901390 xul!mozilla::detail::ProxyFunctionRunnable<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp:723:22',mozilla::MozPromise<mozilla::MediaResult,mozilla::MediaResult,1> >::Run+0x7f [/builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h @ 1565] 10 00000075`dc43f410 00000271`2290c846 xul!mozilla::TaskQueue::Runner::Run+0x180 [/builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp @ 165] 11 00000075`dc43f4b0 00000271`21fd3dde xul!nsThreadPool::Run+0x686 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp @ 301] 12 00000075`dc43f650 00000271`21fd2c21 xul!nsThread::ProcessNextEvent+0xcae [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1239] 13 (Inline Function) --------`-------- xul!NS_ProcessNextEvent+0x20 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 513] 14 00000075`dc43f840 00000271`22dbdc78 xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x181 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 332] 15 (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0xf [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 334] 16 00000075`dc43f8c0 00000271`21fab871 xul!MessageLoop::RunHandler+0x28 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 328] 17 00000075`dc43f910 00000271`2290785d xul!MessageLoop::Run+0x51 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 310] 18 00000075`dc43f960 00007ff9`065af854 xul!nsThread::ThreadFunc+0x12d [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 449] 19 00000075`dc43fb20 00007ff9`0659fdba nss3!_PR_NativeRunThread+0x144 [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 421] 1a 00000075`dc43fba0 00007ff9`3e8a1542 nss3!pr_root+0xa [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140] 1b 00000075`dc43fbd0 00000271`5ec3b9f0 ucrtbase+0x21542 1c 00000075`dc43fbd8 00000000`00000000 0x00000271`5ec3b9f0 ``` i.e. something is going wrong with us indexing into the sample table[0]. It makes sense `peek_index` is our bad index and that something is going wrong earlier to cause this. This suggests `sort_table` contains invalid data. Could something be going wrong with the `TryVec` such that we end up with it being freed and then poisoned or full of garbage that leads to bad indexing? My current two areas I plan to look more at: - Our fallible allocation. We've already switched out what we're doing upstream, I wonder if there's any other dangers in the old code that could lead to this. - The data we're getting from the mp4 and how it's being used here. We're using some data here to set sizes of containers. Is there anywhere that data could cause problems by being malformed/malicious (overflows, etc)? [0] https://searchfox.org/mozilla-central/rev/eb9d5c97927aea75f0c8e38bbc5b5d288099e687/third_party/rust/mp4parse_capi/src/lib.rs#1611