Bug 1584956 Comment 9 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

I've learnt that crash reporting in release and ESR works differently and that release only reports 10% of crashes, while ESR reports them all. Since we migrated a large number of users from old MacOS release builds to ESR, the uptick we see in crash stats is likely inflated. I've updated comment 2 to reflect this.

---

After looking I've added what appears to be a related sig that is catching Windows crashes. Some of the crashes there look like the MacOS ones with us panicing due to bad lengths, but there appear to be other issues, so I'm not sure they're all the same root cause. I'm focusing on looking at the ones that seem like bad indexing for now.

If MacOS dumps can be debugged locally, I don't know how, but I can work with a Windows dump. Stack of a crash from https://crash-stats.mozilla.org/report/index/d3fc0715-27b9-463c-9d86-4e7100200908 (assuming correct stack walking + symbolization):

```
0:051> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     xul!MOZ_Crash+0xa [/builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h @ 332] 
01 00000075`dc43e7e0 00000271`261369a5     xul!RustMozCrash+0x10 [/builds/worker/checkouts/gecko/mozglue/static/rust/wrappers.cpp @ 16] 
02 00000075`dc43e810 00000271`2613692c     xul!mozglue_static::panic_hook+0x75 [/builds/worker/checkouts/gecko/mozglue/static/rust/lib.rs @ 89] 
03 (Inline Function) --------`--------     xul!core::ops::function::FnOnce::call_once+0x5 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 232] 
04 00000075`dc43ec60 00000271`2628dcb1     xul!core::ops::function::FnOnce::call_once<fn(core::panic::PanicInfo*),(core::panic::PanicInfo*)>+0xc [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 72] 
05 00000075`dc43ec90 00000271`26293b2a     xul!std::panicking::rust_panic_with_hook+0x1a1 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 477] 
06 00000075`dc43ed90 00000271`25f2df10     xul!std::panicking::begin_panic_handler+0x4a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 378] 
07 00000075`dc43ede0 00000271`25f2ddaa     xul!std::panicking::begin_panic_fmt+0x30 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 85] 
08 00000075`dc43ee30 00000271`26158f60     xul!core::panicking::panic_bounds_check+0x7a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 63] 
09 (Inline Function) --------`--------     xul!mp4parse_capi::create_sample_table+0x1a2e [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1450] 
0a (Inline Function) --------`--------     xul!mp4parse_capi::get_indice_table+0x1b98 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1139] 
0b 00000075`dc43eec0 00000271`246e51c6     xul!mp4parse_capi::mp4parse_get_indice_table+0x1bd0 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1096] 
0c 00000075`dc43f060 00000271`246e32c2     xul!mozilla::MP4Metadata::GetTrackIndice+0x76 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp @ 446] 
0d 00000075`dc43f150 00000271`244da32f     xul!mozilla::MP4Demuxer::Init+0xb52 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Demuxer.cpp @ 181] 
0e (Inline Function) --------`--------     xul!mozilla::MediaFormatReader::DemuxerProxy::Init::<unnamed-tag>::operator()+0x62 [/builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp @ 728] 
0f 00000075`dc43f3b0 00000271`22901390     xul!mozilla::detail::ProxyFunctionRunnable<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp:723:22',mozilla::MozPromise<mozilla::MediaResult,mozilla::MediaResult,1> >::Run+0x7f [/builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h @ 1565] 
10 00000075`dc43f410 00000271`2290c846     xul!mozilla::TaskQueue::Runner::Run+0x180 [/builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp @ 165] 
11 00000075`dc43f4b0 00000271`21fd3dde     xul!nsThreadPool::Run+0x686 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp @ 301] 
12 00000075`dc43f650 00000271`21fd2c21     xul!nsThread::ProcessNextEvent+0xcae [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1239] 
13 (Inline Function) --------`--------     xul!NS_ProcessNextEvent+0x20 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 513] 
14 00000075`dc43f840 00000271`22dbdc78     xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x181 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 332] 
15 (Inline Function) --------`--------     xul!MessageLoop::RunInternal+0xf [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 334] 
16 00000075`dc43f8c0 00000271`21fab871     xul!MessageLoop::RunHandler+0x28 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 328] 
17 00000075`dc43f910 00000271`2290785d     xul!MessageLoop::Run+0x51 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 310] 
18 00000075`dc43f960 00007ff9`065af854     xul!nsThread::ThreadFunc+0x12d [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 449] 
19 00000075`dc43fb20 00007ff9`0659fdba     nss3!_PR_NativeRunThread+0x144 [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 421] 
1a 00000075`dc43fba0 00007ff9`3e8a1542     nss3!pr_root+0xa [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140] 
1b 00000075`dc43fbd0 00000271`5ec3b9f0     ucrtbase+0x21542
1c 00000075`dc43fbd8 00000000`00000000     0x00000271`5ec3b9f0
```

i.e. something is going wrong with us indexing into the sample table[0]. It makes sense `peek_index` is our bad index and that something is going wrong earlier to cause this. This suggests `sort_table` contains invalid data. Could something be going wrong with the `TryVec` such that we end up with it being freed and then poisoned or full of garbage that leads to bad indexing?

My current two areas I plan to look more at:
- Our failable allocation. We've already switched out what we're doing upstream, I wonder if there's any other dangers in the old code that could lead to this.
- The data we're getting from the mp4 and how it's being used here. We're using some data here to set sizes of containers. Is there anywhere that data could cause problems by being malformed/malicious (overflows, etc)?

[0] https://searchfox.org/mozilla-central/rev/eb9d5c97927aea75f0c8e38bbc5b5d288099e687/third_party/rust/mp4parse_capi/src/lib.rs#1611
I've learnt that crash reporting in release and ESR works differently and that release only reports 10% of crashes, while ESR reports them all. Since we migrated a large number of users from old MacOS release builds to ESR, the uptick we see in crash stats is likely inflated. I've updated comment 2 to reflect this.

---

After looking I've added what appears to be a related sig that is catching Windows crashes. Some of the crashes there look like the MacOS ones with us panicing due to bad lengths, but there appear to be other issues, so I'm not sure they're all the same root cause. I'm focusing on looking at the ones that seem like bad indexing for now.

If MacOS dumps can be debugged locally, I don't know how, but I can work with a Windows dump. Stack of a crash from https://crash-stats.mozilla.org/report/index/d3fc0715-27b9-463c-9d86-4e7100200908 (assuming correct stack walking + symbolization):

```
0:051> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     xul!MOZ_Crash+0xa [/builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h @ 332] 
01 00000075`dc43e7e0 00000271`261369a5     xul!RustMozCrash+0x10 [/builds/worker/checkouts/gecko/mozglue/static/rust/wrappers.cpp @ 16] 
02 00000075`dc43e810 00000271`2613692c     xul!mozglue_static::panic_hook+0x75 [/builds/worker/checkouts/gecko/mozglue/static/rust/lib.rs @ 89] 
03 (Inline Function) --------`--------     xul!core::ops::function::FnOnce::call_once+0x5 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 232] 
04 00000075`dc43ec60 00000271`2628dcb1     xul!core::ops::function::FnOnce::call_once<fn(core::panic::PanicInfo*),(core::panic::PanicInfo*)>+0xc [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\src\libcore\ops\function.rs @ 72] 
05 00000075`dc43ec90 00000271`26293b2a     xul!std::panicking::rust_panic_with_hook+0x1a1 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 477] 
06 00000075`dc43ed90 00000271`25f2df10     xul!std::panicking::begin_panic_handler+0x4a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libstd\panicking.rs @ 378] 
07 00000075`dc43ede0 00000271`25f2ddaa     xul!std::panicking::begin_panic_fmt+0x30 [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 85] 
08 00000075`dc43ee30 00000271`26158f60     xul!core::panicking::panic_bounds_check+0x7a [/rustc/4fb7144ed159f94491249e86d5bbd033b5d60550\/src\libcore\panicking.rs @ 63] 
09 (Inline Function) --------`--------     xul!mp4parse_capi::create_sample_table+0x1a2e [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1450] 
0a (Inline Function) --------`--------     xul!mp4parse_capi::get_indice_table+0x1b98 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1139] 
0b 00000075`dc43eec0 00000271`246e51c6     xul!mp4parse_capi::mp4parse_get_indice_table+0x1bd0 [/builds/worker/checkouts/gecko/third_party/rust/mp4parse_capi/src/lib.rs @ 1096] 
0c 00000075`dc43f060 00000271`246e32c2     xul!mozilla::MP4Metadata::GetTrackIndice+0x76 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Metadata.cpp @ 446] 
0d 00000075`dc43f150 00000271`244da32f     xul!mozilla::MP4Demuxer::Init+0xb52 [/builds/worker/checkouts/gecko/dom/media/mp4/MP4Demuxer.cpp @ 181] 
0e (Inline Function) --------`--------     xul!mozilla::MediaFormatReader::DemuxerProxy::Init::<unnamed-tag>::operator()+0x62 [/builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp @ 728] 
0f 00000075`dc43f3b0 00000271`22901390     xul!mozilla::detail::ProxyFunctionRunnable<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp:723:22',mozilla::MozPromise<mozilla::MediaResult,mozilla::MediaResult,1> >::Run+0x7f [/builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h @ 1565] 
10 00000075`dc43f410 00000271`2290c846     xul!mozilla::TaskQueue::Runner::Run+0x180 [/builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp @ 165] 
11 00000075`dc43f4b0 00000271`21fd3dde     xul!nsThreadPool::Run+0x686 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp @ 301] 
12 00000075`dc43f650 00000271`21fd2c21     xul!nsThread::ProcessNextEvent+0xcae [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1239] 
13 (Inline Function) --------`--------     xul!NS_ProcessNextEvent+0x20 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 513] 
14 00000075`dc43f840 00000271`22dbdc78     xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x181 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 332] 
15 (Inline Function) --------`--------     xul!MessageLoop::RunInternal+0xf [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 334] 
16 00000075`dc43f8c0 00000271`21fab871     xul!MessageLoop::RunHandler+0x28 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 328] 
17 00000075`dc43f910 00000271`2290785d     xul!MessageLoop::Run+0x51 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 310] 
18 00000075`dc43f960 00007ff9`065af854     xul!nsThread::ThreadFunc+0x12d [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 449] 
19 00000075`dc43fb20 00007ff9`0659fdba     nss3!_PR_NativeRunThread+0x144 [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 421] 
1a 00000075`dc43fba0 00007ff9`3e8a1542     nss3!pr_root+0xa [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140] 
1b 00000075`dc43fbd0 00000271`5ec3b9f0     ucrtbase+0x21542
1c 00000075`dc43fbd8 00000000`00000000     0x00000271`5ec3b9f0
```

i.e. something is going wrong with us indexing into the sample table[0]. It makes sense `peek_index` is our bad index and that something is going wrong earlier to cause this. This suggests `sort_table` contains invalid data. Could something be going wrong with the `TryVec` such that we end up with it being freed and then poisoned or full of garbage that leads to bad indexing?

My current two areas I plan to look more at:
- Our fallible allocation. We've already switched out what we're doing upstream, I wonder if there's any other dangers in the old code that could lead to this.
- The data we're getting from the mp4 and how it's being used here. We're using some data here to set sizes of containers. Is there anywhere that data could cause problems by being malformed/malicious (overflows, etc)?

[0] https://searchfox.org/mozilla-central/rev/eb9d5c97927aea75f0c8e38bbc5b5d288099e687/third_party/rust/mp4parse_capi/src/lib.rs#1611

Back to Bug 1584956 Comment 9