Crash in mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek

RESOLVED FIXED in Firefox 57

Status

()

defect
P2
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: philipp, Assigned: jwwang)

Tracking

({crash, regression})

57 Branch
mozilla58
All
Windows
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55 unaffected, firefox56 unaffected, firefox57 fixed, firefox58 fixed)

Details

(crash signature)

Attachments

(4 attachments, 2 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-2d4e43f5-a7fa-4905-9113-4b0dd0170922.
=============================================================
Crashing Thread (71), Name: MediaPlayback #9
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek(mozilla::SeekTarget) 	dom/media/MediaDecoderStateMachine.cpp:352
1 	xul.dll 	mozilla::MediaDecoderStateMachine::Seek(mozilla::SeekTarget const&) 	dom/media/MediaDecoderStateMachine.cpp:3082
2 	xul.dll 	mozilla::detail::ProxyRunnable<mozilla::MozPromise<bool, bool, 1>, RefPtr<mozilla::MozPromise<bool, bool, 1> > ( mozilla::MediaDecoderStateMachine::*)(mozilla::SeekTarget const&), mozilla::MediaDecoderStateMachine, StoreCopyPassByRRef<mozilla::SeekTarget> >::Run() 	xpcom/threads/MozPromise.h:1395
3 	xul.dll 	mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run() 	xpcom/threads/TaskDispatcher.h:209
4 	xul.dll 	mozilla::TaskQueue::Runner::Run() 	xpcom/threads/TaskQueue.cpp:246
5 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:226
6 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1039
7 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:521
8 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:338
9 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
10 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
11 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:427
12 	nss3.dll 	PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
13 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
14 	ucrtbase.dll 	o__realloc_base 	
15 	kernel32.dll 	BaseThreadInitThunk 	
16 	ntdll.dll 	RtlUserThreadStart

this crash signature for content process crashes is newly showing up since 57.0a1. so far they are only from windows machines and all come with 'MOZ_RELEASE_ASSERT(false) (Can't seek while decoding metadata.)' that got added in bug 1308603.
Kaku,
Can you help check this bug?
Flags: needinfo?(kaku)
Priority: -- → P2
Hmm......, the crash report does not talk much, and the only thing we know is that a seek operation is invoked before Metadata is read.

It's HTMLMediaElement::Seek()'s responsibility to filter out seek requests that are not valid, and it's this check which rejects seek request before Metadata loaded:
http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/dom/html/HTMLMediaElement.cpp#2765-2769

For non-MSE cases, MediaDecoder::GetSeekable() calls MediaDecoder::GetDuration() to check if the duration is known or not, and the duration "should" ever be set only after Metadata is loaded. However, bug 1385699 refactored how MDSM calculating duration and I think it's the patch 2 (https://reviewboard.mozilla.org/r/164224/diff/1#index_header) might have something to do with this bug.

Bug 1385699  patch 2 removed MDSM::mObservedDuration which was assgined in two places: (1) MDSM::UpdatePlaybackPositionInternal() and MDSM::BufferedRangeUpdated(). MDSM::UpdatePlaybackPositionInternal() is only ever be called after Metadata loaded, but that's not the case for MDSM::BufferedRangeUpdated().

In the old mechanism, setting MDSM::mObservedDuration also invokes MDSM::RecomputeDuration() which filters out the case that MDSM::mObservedDuration is set before Metadata is loaded. However, the criterion is not kept in the new mechanism. 
http://searchfox.org/mozilla-central/rev/b2911ee198ac26f9ce7d6fef8cd42f5dbd022a21/dom/media/MediaDecoderStateMachine.cpp#3107-3111

Let's have a patch for this and see if it works.
Assignee: nobody → kaku
Status: NEW → ASSIGNED
Flags: needinfo?(kaku)
Blocks: 1385699
(In reply to Tzuhao Kuo [:kaku] (PTO 9/27) from comment #2)
> For non-MSE cases, MediaDecoder::GetSeekable() calls
> MediaDecoder::GetDuration() to check if the duration is known or not, and
> the duration "should" ever be set only after Metadata is loaded. However,
> bug 1385699 refactored how MDSM calculating duration and I think it's the
> patch 2 (https://reviewboard.mozilla.org/r/164224/diff/1#index_header) might
> have something to do with this bug.

I don't think bug 1385699 is responsible.

http://searchfox.org/mozilla-central/rev/20e41d4a61a8f5e34c9cf357304b78b3e9bced8a/dom/html/HTMLMediaElement.cpp#2751-2755

The |mReadyState == nsIDOMHTMLMediaElement::HAVE_NOTHING| test ensures mDecoder->Seek is called only after the metadata is loaded.

I think there is something wrong to HTMLMediaElement::mReadyState which becomes HAVE_METADATA or above before metadata loaded is reported by mDecoder.
Comment on attachment 8913088 [details]
Bug 1402584 - only set MDSM::mDuration after metadata is loaded in MDSM::BufferedRangeUpdated();

https://reviewboard.mozilla.org/r/184504/#review190012

Per comment 5, I don't think this is the right fix.
Attachment #8913088 - Flags: review?(jwwang)
Assignee: kaku → nobody
Status: ASSIGNED → NEW
Assignee: nobody → jwwang
Can we land this?
Flags: needinfo?(jwwang)
I am not sure if I follow. There is no patch to land so far.
Flags: needinfo?(jwwang)
Attachment #8915425 - Flags: review?(gsquelart)
Comment on attachment 8915425 [details]
Bug 1402584 - add logs for debugging. See comment 5 for the detail.

https://reviewboard.mozilla.org/r/186616/#review191680

LGTM, but please give it a good Try, to make sure you don't obviously add crashes for normal situations

::: dom/html/HTMLMediaElement.cpp:2840
(Diff revision 1)
> +    MOZ_CRASH_UNSAFE_PRINTF(
> +      "Metadata not loaded! readyState=%d networkState=%d",
> +      mReadyState.Ref(),
> +      mNetworkState);

These variables are uint16_t, so "%d" is probably incorrect. Please use PRIu16, or cast to int.

Same with the other printf below.
Attachment #8915425 - Flags: review?(gsquelart) → review+
Thanks!
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/442e8020e2ac
add logs for debugging. See comment 5 for the detail. r=gerald
Keywords: leave-open
There are 5 crashes with signature 'mozilla::dom::HTMLMediaElement::ChangeReadyState'.
The crash reasons are 'Metadata not loaded! readyState=4 networkState=2' and 'Metadata not loaded! readyState=2 networkState=2'.
Crash Signature: [@ mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek] → [@ mozilla::MediaDecoderStateMachine::DecodeMetadataState::HandleSeek] [@ mozilla::dom::HTMLMediaElement::ChangeReadyState]
Flags: needinfo?(jwwang)
Per comment 5, the assertion (Nightly and early Beta only) is added for debugging. However, seeing new crash reports, I still have no idea how this could be. I will more assertions to debug the issue.
Flags: needinfo?(jwwang)
Attachment #8915425 - Attachment is obsolete: true
Attachment #8915801 - Flags: review?(gsquelart)
Attachment #8915801 - Flags: review?(gsquelart)
I think I found the root cause.

1. the media element loads the URI from a source child and starts playback.
2. a decode error is encountered (assume readyState is HAVE_ENOUGH_DATA).
3. the media element continues to load the next source child without resetting the readyState.
4. now readyState is HAVE_ENOUGH_DATA before metadata is loaded for the next source child.
Attachment #8915801 - Attachment is obsolete: true
Attachment #8916205 - Flags: review?(gsquelart)
Attachment #8916206 - Flags: review?(gsquelart)
Comment on attachment 8916205 [details]
Bug 1402584. P1 - Backed out changeset 442e8020e2ac.

https://reviewboard.mozilla.org/r/187446/#review192532
Attachment #8916205 - Flags: review?(gsquelart) → review+
Comment on attachment 8916206 [details]
Bug 1402584. P2 - reset readyState to HAVE_NOTHING before loading next source child. See comment 18 for the root cause.

https://reviewboard.mozilla.org/r/187448/#review192534

::: dom/html/HTMLMediaElement.cpp:1917
(Diff revision 1)
>    if (!mIsLoadingFromSourceChildren || mShuttingDown) {
>      return;
>    }
>  
> +  if (mDecoder) {
> +    // Reset readyState to HAVE_NOTHING since we're gonna load a new decoder.

"gonna" -> "going to"
Attachment #8916206 - Flags: review?(gsquelart) → review+
Thanks!
Keywords: leave-open
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e17425d9f940
P1 - Backed out changeset 442e8020e2ac. r=gerald
https://hg.mozilla.org/integration/autoland/rev/e80105b6353d
P2 - reset readyState to HAVE_NOTHING before loading next source child. See comment 18 for the root cause. r=gerald
Backed out for frequent mda failures on Windows 10, e.g. in dom/media/test/test_seek-2.html:

https://hg.mozilla.org/integration/autoland/rev/db5526c855d07beb52b3c0f4b5b1ec769bb89fce
https://hg.mozilla.org/integration/autoland/rev/53548fbb1e5fd6ce9f973a5ed9089750dcb81296

Related to the disabled DRM notification bar?

E.g. https://treeherder.mozilla.org/logviewer.html#?job_id=135520104&repo=autoland

15:42:08     INFO -  1061 INFO TEST-PASS | dom/media/test/test_replay_metadata.html | small-shot-mp3.mp4 isEncrypted should be true if we have decryption keys
15:42:08     INFO -  1062 INFO TEST-PASS | dom/media/test/test_replay_metadata.html | No more than 1 onloadeddata event for small-shot-mp3.mp4
15:42:08     INFO -  Buffered messages finished
15:42:08    ERROR -  1063 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_replay_metadata.html | Test timed out!
15:42:08     INFO -      onTimeout@dom/media/test/manifest.js:1693:7
15:42:08     INFO -      setTimeout handler*MediaTestManager/this.started@dom/media/test/manifest.js:1699:26
15:42:08     INFO -      startTest@dom/media/test/test_replay_metadata.html:94:3
15:42:08     INFO -      MediaTestManager/this.nextTest@dom/media/test/manifest.js:1748:7
15:42:08     INFO -      MediaTestManager/this.finished@dom/media/test/manifest.js:1727:7
15:42:08     INFO -      playbackEnded@dom/media/test/test_replay_metadata.html:86:5
15:42:08     INFO -      EventListener.handleEvent*startTest@dom/media/test/test_replay_metadata.html:107:3
15:42:08     INFO -      MediaTestManager/this.nextTest@dom/media/test/manifest.js:1748:7
15:42:08     INFO -      MediaTestManager/this.runTests/<@dom/media/test/manifest.js:1668:7
15:42:08     INFO -  1064 INFO small-shot.mp3-2 timed out!

See also https://treeherder.mozilla.org/logviewer.html#?job_id=135528693&repo=autoland
Flags: needinfo?(jwwang)
Both test_replay_metadata.html and test_playback_rate.html don't run the code added by my patches at all. I don't think my patches are responsible for the timeouts.
Flags: needinfo?(jwwang)
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/f7b9b00dfe17
P1 - Backed out changeset 442e8020e2ac. r=gerald
https://hg.mozilla.org/integration/autoland/rev/a4466933d251
P2 - reset readyState to HAVE_NOTHING before loading next source child. See comment 18 for the root cause. r=gerald on a CLOSED TREE
https://hg.mozilla.org/mozilla-central/rev/f7b9b00dfe17
https://hg.mozilla.org/mozilla-central/rev/a4466933d251
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Please request Beta approval on this ASAP.
Flags: needinfo?(jwwang)
Approval Request Comment
[Feature/Bug causing the regression]:1385699
[User impact if declined]:crash in beta builds, but not release build.
[Is this code covered by automated tests?]:no
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:none
[Is the change risky?]:low
[Why is the change risky/not risky?]:a simple change.
[String changes made/needed]:none
Flags: needinfo?(jwwang)
Attachment #8918155 - Flags: review+
Attachment #8918155 - Flags: approval-mozilla-beta?
Comment on attachment 8918155 [details] [diff] [review]
1402584_fix_beta_57.patch

Crash fix, Beta57+
Attachment #8918155 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #32)
> [Needs manual test from QE? If yes, steps to reproduce]: no

Per JW's assessment on manual testing needs, marking this issue as qe-verify-.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.