Closed Bug 1402584 Opened 7 years ago Closed 7 years ago

Crash in mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek

Categories

(Core :: Audio/Video: Playback, defect, P2)

57 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- fixed
firefox58 --- fixed

People

(Reporter: philipp, Assigned: jwwang)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files, 2 obsolete files)

This bug was filed from the Socorro interface and is report bp-2d4e43f5-a7fa-4905-9113-4b0dd0170922. ============================================================= Crashing Thread (71), Name: MediaPlayback #9 Frame Module Signature Source 0 xul.dll mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek(mozilla::SeekTarget) dom/media/MediaDecoderStateMachine.cpp:352 1 xul.dll mozilla::MediaDecoderStateMachine::Seek(mozilla::SeekTarget const&) dom/media/MediaDecoderStateMachine.cpp:3082 2 xul.dll mozilla::detail::ProxyRunnable<mozilla::MozPromise<bool, bool, 1>, RefPtr<mozilla::MozPromise<bool, bool, 1> > ( mozilla::MediaDecoderStateMachine::*)(mozilla::SeekTarget const&), mozilla::MediaDecoderStateMachine, StoreCopyPassByRRef<mozilla::SeekTarget> >::Run() xpcom/threads/MozPromise.h:1395 3 xul.dll mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run() xpcom/threads/TaskDispatcher.h:209 4 xul.dll mozilla::TaskQueue::Runner::Run() xpcom/threads/TaskQueue.cpp:246 5 xul.dll nsThreadPool::Run() xpcom/threads/nsThreadPool.cpp:226 6 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1039 7 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:521 8 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338 9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 11 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:427 12 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 13 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 14 ucrtbase.dll o__realloc_base 15 kernel32.dll BaseThreadInitThunk 16 ntdll.dll RtlUserThreadStart this crash signature for content process crashes is newly showing up since 57.0a1. so far they are only from windows machines and all come with 'MOZ_RELEASE_ASSERT(false) (Can't seek while decoding metadata.)' that got added in bug 1308603.
Kaku, Can you help check this bug?
Flags: needinfo?(kaku)
Priority: -- → P2
Hmm......, the crash report does not talk much, and the only thing we know is that a seek operation is invoked before Metadata is read. It's HTMLMediaElement::Seek()'s responsibility to filter out seek requests that are not valid, and it's this check which rejects seek request before Metadata loaded: http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/dom/html/HTMLMediaElement.cpp#2765-2769 For non-MSE cases, MediaDecoder::GetSeekable() calls MediaDecoder::GetDuration() to check if the duration is known or not, and the duration "should" ever be set only after Metadata is loaded. However, bug 1385699 refactored how MDSM calculating duration and I think it's the patch 2 (https://reviewboard.mozilla.org/r/164224/diff/1#index_header) might have something to do with this bug. Bug 1385699 patch 2 removed MDSM::mObservedDuration which was assgined in two places: (1) MDSM::UpdatePlaybackPositionInternal() and MDSM::BufferedRangeUpdated(). MDSM::UpdatePlaybackPositionInternal() is only ever be called after Metadata loaded, but that's not the case for MDSM::BufferedRangeUpdated(). In the old mechanism, setting MDSM::mObservedDuration also invokes MDSM::RecomputeDuration() which filters out the case that MDSM::mObservedDuration is set before Metadata is loaded. However, the criterion is not kept in the new mechanism. http://searchfox.org/mozilla-central/rev/b2911ee198ac26f9ce7d6fef8cd42f5dbd022a21/dom/media/MediaDecoderStateMachine.cpp#3107-3111 Let's have a patch for this and see if it works.
Assignee: nobody → kaku
Status: NEW → ASSIGNED
Flags: needinfo?(kaku)
Blocks: 1385699
(In reply to Tzuhao Kuo [:kaku] (PTO 9/27) from comment #2) > For non-MSE cases, MediaDecoder::GetSeekable() calls > MediaDecoder::GetDuration() to check if the duration is known or not, and > the duration "should" ever be set only after Metadata is loaded. However, > bug 1385699 refactored how MDSM calculating duration and I think it's the > patch 2 (https://reviewboard.mozilla.org/r/164224/diff/1#index_header) might > have something to do with this bug. I don't think bug 1385699 is responsible. http://searchfox.org/mozilla-central/rev/20e41d4a61a8f5e34c9cf357304b78b3e9bced8a/dom/html/HTMLMediaElement.cpp#2751-2755 The |mReadyState == nsIDOMHTMLMediaElement::HAVE_NOTHING| test ensures mDecoder->Seek is called only after the metadata is loaded. I think there is something wrong to HTMLMediaElement::mReadyState which becomes HAVE_METADATA or above before metadata loaded is reported by mDecoder.
Comment on attachment 8913088 [details] Bug 1402584 - only set MDSM::mDuration after metadata is loaded in MDSM::BufferedRangeUpdated(); https://reviewboard.mozilla.org/r/184504/#review190012 Per comment 5, I don't think this is the right fix.
Attachment #8913088 - Flags: review?(jwwang)
Assignee: kaku → nobody
Status: ASSIGNED → NEW
Assignee: nobody → jwwang
Can we land this?
Flags: needinfo?(jwwang)
I am not sure if I follow. There is no patch to land so far.
Flags: needinfo?(jwwang)
Attachment #8915425 - Flags: review?(gsquelart)
Comment on attachment 8915425 [details] Bug 1402584 - add logs for debugging. See comment 5 for the detail. https://reviewboard.mozilla.org/r/186616/#review191680 LGTM, but please give it a good Try, to make sure you don't obviously add crashes for normal situations ::: dom/html/HTMLMediaElement.cpp:2840 (Diff revision 1) > + MOZ_CRASH_UNSAFE_PRINTF( > + "Metadata not loaded! readyState=%d networkState=%d", > + mReadyState.Ref(), > + mNetworkState); These variables are uint16_t, so "%d" is probably incorrect. Please use PRIu16, or cast to int. Same with the other printf below.
Attachment #8915425 - Flags: review?(gsquelart) → review+
Thanks!
Pushed by jwwang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/442e8020e2ac add logs for debugging. See comment 5 for the detail. r=gerald
Keywords: leave-open
There are 5 crashes with signature 'mozilla::dom::HTMLMediaElement::ChangeReadyState'. The crash reasons are 'Metadata not loaded! readyState=4 networkState=2' and 'Metadata not loaded! readyState=2 networkState=2'.
Crash Signature: [@ mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek] → [@ mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek] [@ mozilla::dom::HTMLMediaElement::ChangeReadyState]
Flags: needinfo?(jwwang)
Per comment 5, the assertion (Nightly and early Beta only) is added for debugging. However, seeing new crash reports, I still have no idea how this could be. I will more assertions to debug the issue.
Flags: needinfo?(jwwang)
Attachment #8915425 - Attachment is obsolete: true
Attachment #8915801 - Flags: review?(gsquelart)
Attachment #8915801 - Flags: review?(gsquelart)
I think I found the root cause. 1. the media element loads the URI from a source child and starts playback. 2. a decode error is encountered (assume readyState is HAVE_ENOUGH_DATA). 3. the media element continues to load the next source child without resetting the readyState. 4. now readyState is HAVE_ENOUGH_DATA before metadata is loaded for the next source child.
Attachment #8915801 - Attachment is obsolete: true
Attachment #8916205 - Flags: review?(gsquelart)
Attachment #8916206 - Flags: review?(gsquelart)
Attachment #8916205 - Flags: review?(gsquelart) → review+
Comment on attachment 8916206 [details] Bug 1402584. P2 - reset readyState to HAVE_NOTHING before loading next source child. See comment 18 for the root cause. https://reviewboard.mozilla.org/r/187448/#review192534 ::: dom/html/HTMLMediaElement.cpp:1917 (Diff revision 1) > if (!mIsLoadingFromSourceChildren || mShuttingDown) { > return; > } > > + if (mDecoder) { > + // Reset readyState to HAVE_NOTHING since we're gonna load a new decoder. "gonna" -> "going to"
Attachment #8916206 - Flags: review?(gsquelart) → review+
Thanks!
Keywords: leave-open
Pushed by jwwang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e17425d9f940 P1 - Backed out changeset 442e8020e2ac. r=gerald https://hg.mozilla.org/integration/autoland/rev/e80105b6353d P2 - reset readyState to HAVE_NOTHING before loading next source child. See comment 18 for the root cause. r=gerald
Backed out for frequent mda failures on Windows 10, e.g. in dom/media/test/test_seek-2.html: https://hg.mozilla.org/integration/autoland/rev/db5526c855d07beb52b3c0f4b5b1ec769bb89fce https://hg.mozilla.org/integration/autoland/rev/53548fbb1e5fd6ce9f973a5ed9089750dcb81296 Related to the disabled DRM notification bar? E.g. https://treeherder.mozilla.org/logviewer.html#?job_id=135520104&repo=autoland 15:42:08 INFO - 1061 INFO TEST-PASS | dom/media/test/test_replay_metadata.html | small-shot-mp3.mp4 isEncrypted should be true if we have decryption keys 15:42:08 INFO - 1062 INFO TEST-PASS | dom/media/test/test_replay_metadata.html | No more than 1 onloadeddata event for small-shot-mp3.mp4 15:42:08 INFO - Buffered messages finished 15:42:08 ERROR - 1063 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_replay_metadata.html | Test timed out! 15:42:08 INFO - onTimeout@dom/media/test/manifest.js:1693:7 15:42:08 INFO - setTimeout handler*MediaTestManager/this.started@dom/media/test/manifest.js:1699:26 15:42:08 INFO - startTest@dom/media/test/test_replay_metadata.html:94:3 15:42:08 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1748:7 15:42:08 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1727:7 15:42:08 INFO - playbackEnded@dom/media/test/test_replay_metadata.html:86:5 15:42:08 INFO - EventListener.handleEvent*startTest@dom/media/test/test_replay_metadata.html:107:3 15:42:08 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1748:7 15:42:08 INFO - MediaTestManager/this.runTests/<@dom/media/test/manifest.js:1668:7 15:42:08 INFO - 1064 INFO small-shot.mp3-2 timed out! See also https://treeherder.mozilla.org/logviewer.html#?job_id=135528693&repo=autoland
Flags: needinfo?(jwwang)
Both test_replay_metadata.html and test_playback_rate.html don't run the code added by my patches at all. I don't think my patches are responsible for the timeouts.
Flags: needinfo?(jwwang)
Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/autoland/rev/f7b9b00dfe17 P1 - Backed out changeset 442e8020e2ac. r=gerald https://hg.mozilla.org/integration/autoland/rev/a4466933d251 P2 - reset readyState to HAVE_NOTHING before loading next source child. See comment 18 for the root cause. r=gerald on a CLOSED TREE
Please request Beta approval on this ASAP.
Flags: needinfo?(jwwang)
Approval Request Comment [Feature/Bug causing the regression]:1385699 [User impact if declined]:crash in beta builds, but not release build. [Is this code covered by automated tests?]:no [Has the fix been verified in Nightly?]:yes [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]:none [Is the change risky?]:low [Why is the change risky/not risky?]:a simple change. [String changes made/needed]:none
Flags: needinfo?(jwwang)
Attachment #8918155 - Flags: review+
Attachment #8918155 - Flags: approval-mozilla-beta?
Comment on attachment 8918155 [details] [diff] [review] 1402584_fix_beta_57.patch Crash fix, Beta57+
Attachment #8918155 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #32) > [Needs manual test from QE? If yes, steps to reproduce]: no Per JW's assessment on manual testing needs, marking this issue as qe-verify-.
Flags: qe-verify-
See Also: → 1846396
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: