Closed Bug 1270607 Opened 5 years ago Closed 5 years ago

Widevine crash in mozilla::SamplesWaitingForKey::WaitIfKeyNotUsable when using DASH.js player or Amazon

Categories

(Core :: Audio/Video: Playback, defect, P1)

Unspecified
All
defect

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox46 --- unaffected
firefox47 --- fixed
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: cpeterson, Assigned: gerald)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-21904d1c-73aa-4ceb-b8f0-38dd82160505.
=============================================================

I hit this crash once when trying to repro DASH.js crash bug 1268694, but there are 15 crash reports. 13 are from Amazon, 1 from imdb.com, and my DASH.js crash.

Curiously, of the 13 Amazon crash reports: 8 are from amazon.de, 3 from amazon.co.uk, and only 1 is from amazon.com. They don't appear to be the same users because each crash report has a different Firefox install time. Is there something different about Amazon's video player outside the US? I don't think this is an artifact of crash report processing for easterly time zones because most of the crash reports are many days old.

List of crash reports including URLs and install times:
https://is.gd/SefnKs


0 	XUL 	mozilla::SamplesWaitingForKey::WaitIfKeyNotUsable(mozilla::MediaRawData*) 	mfbt/RefPtr.h:261
1 	XUL 	mozilla::EMEDecryptor::Input(mozilla::MediaRawData*) 	dom/media/platforms/agnostic/eme/EMEDecoderModule.cpp:48
2 	XUL 	nsRunnableMethodImpl<nsresult (mozilla::MediaDataDecoder::*)(mozilla::MediaRawData*), true, RefPtr<mozilla::MediaRawData> >::Run 	xpcom/glue/nsThreadUtils.h:676
3 	XUL 	mozilla::TaskQueue::Runner::Run() 	xpcom/threads/TaskQueue.cpp:170
4 	XUL 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:227
5 	XUL 	non-virtual thunk to nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:154
6 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:994
7 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:297
8 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:355
9 	XUL 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:234
10 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:396
11 	libnss3.dylib 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:216
STR from bug 1268694:
1. Load http://dashif.org/reference/players/javascript/v2.1.1/samples/dash-if-reference-player/index.html
2. Load manifest http://html5.cablelabs.com:8100/cenc/wv/dash.mpd (also accessible from the DASH player's menu: Sample Streams > CableLabs Test Content > CableLabs Cenc Widevine) on DASH.js player menu.
3. Quickly hit the "Load" button 2–3 more times. If the crash happens, it always seems to happen within the first 1–2 button clicks when you first play the video after the browser starts.
See Also: → 1268694
See Also: → 1270689
cpearce, how should we prioritize this intermittent crash? I reproduced it once or twice with the DASH.js player, but we also have a dozen crash reports from Amazon users.
Flags: needinfo?(cpearce)
cpearce said he is going to investigate this crash. Anthony suggested trying to reproduce on Linux using rr.
Assignee: nobody → cpearce
P1 because this crash affects Amazon.
Priority: -- → P1
Gerald volunteered. :)
Assignee: cpearce → gsquelart
Flags: needinfo?(cpearce)
I could not reproduce the issue.
Chris, is there some magic between steps 2 and 3 in your comment 2? I.e., should I wait between pressing "Load" the first time, and the subsequent 2-3 times?

In any case, the crash reports point at EMEDecryptor::mSamplesWaitingForKey being null.
But this should only be the case if Shutdown() happened before (as it nulls mSamplesWaitingForKey), and the 'MOZ_ASSERT(!mIsShutdown) in Input() indicates that this was not considered possible.

However based on the crash report, and other MediaDataDecoder's (e.g. WMFVideoMFTManager::Input() handles the already-shutdown case as a possible occurrence), I will just change the MOZ_ASSERT into a simple test-and-bail, unless or until we find a better explanation.
Flags: needinfo?(cpeterson)
Based on crash reports, it seems Input() is actually called after Shutdown(),
which causes a nullptr deref when trying to access mSamplesWaitingForKey's
mProxy in mSamplesWaitingForKey->WaitIfKeyNotUsable().

So like other MediaDataDecoder's we should just gracefully handle this case.

Keeping a warning like in Decrypted(), to catch this situation in future
debugging sessions.

Review commit: https://reviewboard.mozilla.org/r/52361/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52361/
Comment on attachment 8752008 [details]
MozReview Request: Bug 1270607 - Handle already-shutdown state in EMEDecryptor::Input - c?cpearce

https://reviewboard.mozilla.org/r/52361/#review49303
Attachment #8752008 - Flags: review+
Comment on attachment 8752008 [details]
MozReview Request: Bug 1270607 - Handle already-shutdown state in EMEDecryptor::Input - c?cpearce

Approval Request Comment
[Feature/regressing bug #]: Widevine.

[User impact if declined]: A few crashes per day on Beta.

[Describe test coverage new/current, TreeHerder]: Media tests in aurora try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=531002b0404ba2bd9c7b364bdd19b78af5eff11a and beta try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=03c7dabbe6f43c14cbf68d3e9b7099b78dec09f5

[Risks and why]: No risk that I can see, as the patch is only adding a test and returning early, as also done in that method for other reasons, and in the same method in sibling classes.

[String/UUID change made/needed]: None.
Attachment #8752008 - Flags: approval-mozilla-beta?
Attachment #8752008 - Flags: approval-mozilla-aurora?
Attached patch 1270607.patchSplinter Review
Same patch, with fixed commit description 'c?cpearce' -> 'r=cpearce'.
Attachment #8752008 - Attachment is obsolete: true
Attachment #8752008 - Flags: approval-mozilla-beta?
Attachment #8752008 - Flags: approval-mozilla-aurora?
Attachment #8752032 - Flags: review+
Comment on attachment 8752032 [details] [diff] [review]
1270607.patch

Approval Request Comment
[Feature/regressing bug #]: Widevine.

[User impact if declined]: A few crashes per day on Beta.

[Describe test coverage new/current, TreeHerder]: Media tests in aurora try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=531002b0404ba2bd9c7b364bdd19b78af5eff11a and beta try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=03c7dabbe6f43c14cbf68d3e9b7099b78dec09f5

[Risks and why]: No risk that I can see, as the patch is only adding a test and returning early, as also done in that method for other reasons, and in the same method in sibling classes.

[String/UUID change made/needed]: None.
Attachment #8752032 - Flags: approval-mozilla-beta?
Attachment #8752032 - Flags: approval-mozilla-aurora?
(In reply to Gerald Squelart [:gerald] from comment #7)
> I could not reproduce the issue.
> Chris, is there some magic between steps 2 and 3 in your comment 2? I.e.,
> should I wait between pressing "Load" the first time, and the subsequent 2-3
> times?

Sorry. I don't have better STR. I was only able to reproduce the crash with this particular stack trace once, though I hit a couple other crashes (some related to Widevine shutdown and some random IPC crashes) using the same STR.
Flags: needinfo?(cpeterson)
Alright thanks.
Let us know if it happens again after this lands.
https://hg.mozilla.org/mozilla-central/rev/45bf3c63c81f
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment on attachment 8752032 [details] [diff] [review]
1270607.patch

Widevine crash fix, Aurora48+, Beta47+
Attachment #8752032 - Flags: approval-mozilla-beta?
Attachment #8752032 - Flags: approval-mozilla-beta+
Attachment #8752032 - Flags: approval-mozilla-aurora?
Attachment #8752032 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.