Crash in [@ mozilla::CheckedInt<T>::value] via MediaSourceTrackDemuxer::DoSeek
Categories
(Core :: Audio/Video: Playback, defect, P2)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr60 | --- | wontfix |
| firefox66 | --- | wontfix |
| firefox67 | --- | wontfix |
| firefox68 | --- | fix-optional |
People
(Reporter: mccr8, Assigned: alwu)
References
Details
(Keywords: crash, regression, Whiteboard: [geckoview:fenix:p3][gvtv:p2][media-q2])
Crash Data
Attachments
(2 obsolete files)
This bug is for crash report bp-fe0e3eef-20c0-4198-88ed-e03e40190401.
Top 10 frames of crashing thread:
0 libxul.so mozilla::CheckedInt<long long>::value const mfbt/CheckedInt.h:535
1 libxul.so mozilla::media::TimeUnit::operator>= const dom/media/TimeUnits.h:119
2 libxul.so mozilla::media::TimeUnit::operator< const dom/media/TimeUnits.h:126
3 libxul.so mozilla::media::TimeUnit const& std::__ndk1::max<mozilla::media::TimeUnit, std::__ndk1::__less<mozilla::media::TimeUnit, mozilla::media::TimeUnit> > android-ndk/sources/cxx-stl/llvm-libc++/include/algorithm:719
4 libxul.so mozilla::MediaSourceTrackDemuxer::DoSeek android-ndk/sources/cxx-stl/llvm-libc++/include/algorithm:2719
5 libxul.so mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::media::TimeUnit, mozilla::MediaResult, true>, RefPtr<mozilla::MozPromise<mozilla::media::TimeUnit, mozilla::MediaResult, true> > xpcom/threads/nsThreadUtils.h:1122
6 libxul.so mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:199
7 libxul.so nsThreadPool::Run xpcom/threads/nsThreadPool.cpp:244
8 libxul.so non-virtual thunk to nsThreadPool::Run xpcom/threads/nsThreadPool.cpp
9 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1180
Alex made integer overflow checks into release asserts, and this is one of the crashes. It is Android-only.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 1•6 years ago
|
||
This CheckedInt assertion failure is a top crash for Fennec 68 Nightly.
| Assignee | ||
Comment 2•6 years ago
•
|
||
I think the possible root cause is that we have super large negative start time, and then have a overflow when substract a value [1].
Is it common having a negative start time? Should we do the overflow check here or prevent start time to have a super large negative value?
Comment 3•6 years ago
|
||
Tentatively assigning to Alastor because he's investigating the other CheckedInt assertion failures on Fire TV.
| Assignee | ||
Comment 4•6 years ago
|
||
I added some assertions and pushed them on the try server to see if it's possible to catch any crash, because it seems to me that we should find why we would have a incorrect seek time.
If we can't catch the crash on try, I'm going to handle the invalid computation result and reject the seeking promise with error.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85d47c1667e1f1b953bbbca40435bfdcd0f91338
https://treeherder.mozilla.org/#/jobs?repo=try&revision=01fa6a716ecdddcaa86342b4987dfaa40dcc9a92
| Assignee | ||
Comment 5•6 years ago
|
||
If the seek target time has been overflow after subtracting the pre roll time, we should reject seek promise and return the error.
Comment 6•6 years ago
|
||
(In reply to Alastor Wu [:alwu] from comment #5)
Created attachment 9057621 [details]
Bug 1540744 - ensure we have valid seek target time.If the seek target time has been overflow after subtracting the pre roll time, we should reject seek promise and return the error.
This is not a possible explanation, MSE guarantees that the buffered data starts at 0; it should be up to to the caller to ensure than Seek isn't called with nonsensical arguments and reject that seek right from the start, otherwise you need to modify every single demuxer and every single piece code touching a Time Unit.
| Assignee | ||
Comment 7•6 years ago
•
|
||
As we've ensured that the seek time is non-negative and invalid in MediaDecoder [1], so the time in seek target should be always valid. However, sometime we would adjust [2] the time in seek target, which only ensures that the new time should be valid but doesn't ensure the time is non-negative.
The crash seems happening in [3], which might be caused if (1) aTime is invalid (2) aTime is valid but it's a super small negative value (3) mManager->HighestStartTime(mType) is invaild (4) mManager->HighestStartTime(mType) is valid but it's a super small negative value. In addition, Jya mentioned in [4] that "mManager->HighestStartTime(mType) can't overflow and can't be negative". So the possibilities might be (1) and (2).
However, I didn't find any possible place where aTime is invalid for now, as I mentioned above, MediaDecoder has ensured that the time in seek target is always valid and non-negative. So now I'm looking for the possibility that MediaFormatReader uses the improper value for seeking.
For (2), we could add a check in SeekTarget::SetTime() to ensure that new seek time should always be non-negative.
[1] https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/dom/media/MediaDecoder.cpp#592-594
[2] https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/dom/media/SeekTarget.h#46-49
[3] https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/dom/media/mediasource/MediaSourceDemuxer.cpp#391-397
[4] https://phabricator.services.mozilla.com/D27136#798322
Updated•6 years ago
|
| Comment hidden (obsolete) |
Updated•6 years ago
|
| Assignee | ||
Comment 9•6 years ago
|
||
tldr; I've checked the all possibilities again, and I believe that the possilbe reason is that the aTime is valid but it's a super small negative value, which causes integer overflow after substracting the mPreload [1].
--
In comment7, I mentioned 4 possible reasons, the following is my analysis for them.
(1) aTime is invalid
aTime comes from seek target, and we've ensured that the time in seek target is not possible invalid from [2]. We did the check on both MediaDecoder and SeekTarget itself, so even we changed its start time later after we've created the seek target, the assertion would still ensure that we can't set the invalid start time for seek target.
(2) aTime is valid but it's a super small negative value
If (1)(3)(4) are not possible, so that this becomes the most possible reason for the crash which happened in MediaSourceTrackDemuxer::DoSeek(). As seek time could be negative, so we can't add assertion in seek target to avoid the time being negative. So I think we should still need to have a check in MediaSourceTrackDemuxer to avoid the seekTime being invalid after the calculation.
(3) mManager->HighestStartTime(mType) is invaild
(4) mManager->HighestStartTime(mType) is valid but it's a super small negative value.
The highest start time would only be updated when the sample time is larger than itself, and it was initial by 0. So this value is always positive, which makes these two assumeption impossible.
As we've done the comparision to make sure this time is highest, it means that the highest start time is always valid (so we can get its value).
The highest start time is always positive, so (4) is also impossible.
[2]
https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/dom/media/MediaDecoder.cpp#592-594
https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/dom/media/SeekTarget.h#46-49
| Assignee | ||
Comment 10•6 years ago
|
||
Hi, Jya,
What do you think for the comment9?
Thank you.
Updated•6 years ago
|
| Assignee | ||
Comment 11•6 years ago
|
||
In MDSM, there are some places that we might set the invalid time to the SeekTarget without having an assertion [1]. Therefore, for this bug, I'm thinking about adding an assertion in the constructor of SeekTarget to see whether we indeed sets an incorrect time to SeekTarget. Although it can't really fix the crash, it can help us to clarify the possible root cause.
I've filed bug1546506 for doing that. We could see whether this crash still occurs after landing bug1546506.
[1]
https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/dom/media/MediaDecoderStateMachine.cpp#378
https://searchfox.org/mozilla-central/rev/ec489aa170b6486891cf3625717d6fa12bcd11c1/dom/media/MediaDecoderStateMachine.cpp#2131
Comment 12•6 years ago
|
||
(In reply to Alastor Wu [:alwu] from comment #2)
I think the possible root cause is that we have super large negative start time, and then have a overflow when substract a value [1].
Is it common having a negative start time? Should we do the overflow check here or prevent start time to have a super large negative value?
MSE in theory force the sample time to always be >= 0, in practice we had to add some leeway to prevent some files to not play.
Otherwise, it's rather common for plain mp4 to be negative, in particular with edit list used to align audio & video.
MFR / Demuxer / Decoder use the real sample time. The MDSM use the adjusted sample time which is offset by mStartTime.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 13•6 years ago
|
||
There are still a few MediaSourceTrackDemuxer::DoSeek crashes from Fennec and Focus+GV users:
The crash volume isn't bad, but this assertion failure points to a real logic bug. This MOZ_RELEASE_ASSERT was changed to a MOZ_DIAGNOSTIC_ASSERT (in bug 1548409) so it won't blow up in Beta or Release builds.
| Assignee | ||
Comment 14•6 years ago
|
||
From the crash report, the last time crash happened on the build 20190502095227, and then it never happen again on other later builds. So I guess this crash has been fixed after the build 20190502095227, and the crash rate should get lower and lower.
Comment 15•6 years ago
|
||
From bug 1548409:
"I suggest we do that until we're actually ready and have covered all the required changes."
Jean-Yves what are these required changes? Do we need to fix things here or can simply close this with the assumption this was caused by broken content in combination with a too aggressive assertion?
Updated•6 years ago
|
Comment 16•6 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #15)
From bug 1548409:
"I suggest we do that until we're actually ready and have covered all the required changes."
Jean-Yves what are these required changes? Do we need to fix things here or can simply close this with the assumption this was caused by broken content in combination with a too aggressive assertion?
We have lowered the severity of the assertion already, it is still happening however so we haven't completed all the work to turn it back on yet.
Comment 17•3 years ago
|
||
Closing because no crashes reported for 12 weeks.
Description
•