Crash in [@ mozilla::CheckedInt<T>::value] via MediaDecoderStateMachine::GetDecodedAudioDuration
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
People
(Reporter: mccr8, Assigned: alwu)
References
Details
(Keywords: crash, regression, topcrash, Whiteboard: [geckoview:fenix:m5][gvtv:p1][media-q2])
Crash Data
Attachments
(3 files, 1 obsolete file)
This bug is for crash report bp-2a830582-a8d1-4c28-97f2-1cabf0190401.
Top 10 frames of crashing thread:
0 libxul.so mozilla::CheckedInt<long long>::value const mfbt/CheckedInt.h:535
1 libxul.so mozilla::MediaDecoderStateMachine::GetDecodedAudioDuration dom/media/TimeUnits.h:88
2 libxul.so mozilla::MediaDecoderStateMachine::HaveEnoughDecodedAudio dom/media/MediaDecoderStateMachine.cpp:2731
3 libxul.so mozilla::MediaDecoderStateMachine::DecodingState::DispatchDecodeTasksIfNeeded dom/media/MediaDecoderStateMachine.cpp:2352
4 libxul.so mozilla::MediaDecoderStateMachine::DecodingState::HandleAudioDecoded dom/media/MediaDecoderStateMachine.cpp:569
5 libxul.so mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, true>::ThenValue<mozilla::MediaDecoderStateMachine::RequestAudioData dom/media/MediaDecoderStateMachine.cpp:3078
6 libxul.so mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, true>::ThenValueBase::ResolveOrRejectRunnable::Run
7 libxul.so mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run xpcom/threads/TaskDispatcher.h:197
8 libxul.so mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:199
9 libxul.so nsThreadPool::Run xpcom/threads/nsThreadPool.cpp:244
Alex made integer overflow checks into release asserts, and this is one of the crashes. It is Android-only.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 1•6 years ago
|
||
This CheckedInt assertion failure is a top crash for Fennec 68 Nightly.
Comment 3•6 years ago
|
||
Alastor can you please check if bugs 1540746, 1540744 and 1540740 can all be solved via TimeUnits.h or if each needs an individual patch a level above TimeUnits.h.
| Assignee | ||
Comment 4•6 years ago
|
||
It seems to me that we have to submit individual patches for those bugs.
| Assignee | ||
Comment 5•6 years ago
|
||
This one might be fixed by bug1534993, because it looks like we getting overflow in AudioSink::GetPosition().
| Assignee | ||
Comment 6•6 years ago
|
||
I just landed the bug1534993, we can see whether it can fix this crash as well.
| Assignee | ||
Comment 7•6 years ago
|
||
This crash still happens [1] after landing bug1534993, will have a patch for it later.
[1] https://crash-stats.mozilla.com/report/index/4566e0f5-53e5-439a-b875-c7b8b0190414
| Assignee | ||
Comment 8•6 years ago
•
|
||
As the crash happened in [1], the possible root cause is that we have improper endTime, which is super large positive value, and the startTime is negative (it's possible). As we've ensured that endTime is always larger than startTime, so a positive value substracting a negative value is the only possible situation to cause a overflow.
Updated•6 years ago
|
| Assignee | ||
Comment 9•6 years ago
•
|
||
The description in comment8 was not quite right. As we would do the time adjustment in ReaderProxy [1], the audio sample's mTime should be adjusted to be larger than 0, unless the first sample's start time is less than the start time we got from metadata. So usually the start and end time of audio sample are both larger than zero.
If this crash were caused by negative start time (which might cause overflow if we substract it from a super large positive end time). We should force that all sample's start time in MediaQueue should be larger than zero.
| Assignee | ||
Comment 10•6 years ago
|
||
The purpose of substracting sample's time by the start time, which we got from the metadata, is to ensure that the start time of the first sample can start from 0 and align all following samples to the correct position.
However, if the first sample's time is earlier than the metadata's start time, the calculated result would be negative, which is not we want.
In this case, we should update the start time and manually adjust first sample's time to 0.
Comment 11•6 years ago
|
||
(In reply to Alastor Wu [:alwu] from comment #8)
As the crash happened in [1], the possible root cause is that we have improper
endTime, which is super large positive value, and thestartTimeis negative (it's possible). As we've ensured thatendTimeis always larger thanstartTime, so a positive value substracting a negative value is the only possible situation to cause a overflow.
There's another issue that will occur now however.
the buffered range is calculated by the demuxer, so now the buffered range reported will be negative. If we were to adjust the start time offset, really we need to keep two start time, one to use with the buffered range, and one used for the time values.
| Assignee | ||
Comment 12•6 years ago
•
|
||
FYI, I found that all these crashes [1] happened on Android API version 19, that is Android 4.4, or earlier. I guess that maybe some new APIs didn't work well on the old version which resulted in Android decoder returning incorrect sample time to us.
[1]
https://crash-stats.mozilla.com/report/index/21e263a9-e26a-45ab-81a6-f61550190418
https://crash-stats.mozilla.com/report/index/59ace521-6d8a-4ac3-94bc-f01fc0190418
https://crash-stats.mozilla.com/report/index/93dd23ba-05db-418c-8832-e046e0190418
https://crash-stats.mozilla.com/report/index/47829ac1-0400-40cd-ae2e-0626e0190418
https://crash-stats.mozilla.com/report/index/2a830582-a8d1-4c28-97f2-1cabf0190401
| Assignee | ||
Comment 13•6 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #11)
the buffered range is calculated by the demuxer, so now the buffered range reported will be negative. If we were to adjust the start time offset, really we need to keep two start time, one to use with the buffered range, and one used for the time values.
Sorry, I didn't understand this part, did we adjust the buffered range? or you mean we should adjust buffered range as well?
| Assignee | ||
Comment 14•6 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #11)
There's another issue that will occur now however.
the buffered range is calculated by the demuxer, so now the buffered range reported will be negative. If we were to adjust the start time offset, really we need to keep two start time, one to use with the buffered range, and one used for the time values.
I guess that the adjustment for buffered range you mentioned is here [1]?
[1] https://searchfox.org/mozilla-central/rev/966590c35c19d3b7850c5444a31bf0fd68a7d0c4/dom/media/MediaFormatReader.cpp#2947
Updated•6 years ago
|
| Assignee | ||
Comment 15•6 years ago
|
||
As the purpose of adjusting time is to make the start of the playback to align to zero, so the result should not be negative.
Updated•6 years ago
|
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
| Assignee | ||
Comment 18•6 years ago
|
||
It's possible that the time of the first demuxed sample is different from the time of the first decoded sample.
This would happen when the first video frame is B frame, which might have larger PTS than other frames.
That means it's not the actually first frame for presentation, so we should update the start time after we get the time from both the first decoded audio and video sample to ensure the we won't hit the assertion in patch2.
| Assignee | ||
Comment 19•6 years ago
•
|
||
We hit the assertion in patch2 when running the crash test 1389304.html [1], because the first demuxed video frame of the video in that test is not the first frame for presentation, which causes the result of adjustment being negative. Therefore, we have to adjust our start time manually after we get the first decoded sample from both audio and video.
Will ask for a review if there is no any problem on the try result [2].
[1] https://searchfox.org/mozilla-central/source/dom/media/test/crashtests/1389304.html
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a51efb3c7bde8102944da0877a293772f2f6697e
| Assignee | ||
Comment 20•6 years ago
|
||
I found that the crash test (1389304.html) has different decoded result in different platforms. On OSX, the first frame decoder returned has smallest timestamp, but it hasn't on Linux and Windows. So even I added patch3, which is used to adjust the start time by checking the first decoded sample, we would still hit the assertion in media data.
So it seems to me that we can't always guarantee the first decoded frame has smallest timestamp, so the assertion I added in patch2 actually is not a good assumption, because it's not about logic, it depends on file's content.
After discussed with jolin, I am going to remove the assertion in patch2, and print warning instead. Hoping patch1 is enough for fixing the crash.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 21•6 years ago
|
||
Comment 22•6 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/8c9fc50e48c2
https://hg.mozilla.org/mozilla-central/rev/57b12599afe2
Updated•6 years ago
|
Comment 23•6 years ago
|
||
Backed out for causing more crashes (Bug 1547604). https://hg.mozilla.org/mozilla-central/rev/420e18a75314b8123b515d8a93cbacd145ecb03c
Updated•6 years ago
|
| Assignee | ||
Comment 24•6 years ago
|
||
It seems to me that Android decoder would return output after the flush() is called, where mFirstDemuxedSampleTime would be reset.
From Android medio codec document [1],
Note, however, that there may still be outstanding onOutputBufferAvailable callbacks that were not handled prior to calling flush. The indices returned via these callbacks also become invalid upon calling flush and should be discarded.
It's possible that Android decoder returns sample in this situation, so we should discard sample if mFirstDemuxedSampleTime contains nothing.
[1] https://developer.android.com/reference/android/media/MediaCodec#flush()
Updated•6 years ago
|
| Assignee | ||
Comment 25•6 years ago
|
||
There functions won't change any internal variables, so they should be marked as const.
Comment 26•6 years ago
|
||
Comment 27•6 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/8a834fa8a29f
https://hg.mozilla.org/mozilla-central/rev/3835a514150a
https://hg.mozilla.org/mozilla-central/rev/7a04d9e59754
Description
•