Closed Bug 1154133 Opened 5 years ago Closed 5 years ago

[EME] EMEDecryptor::Flush() sometimes hangs

Categories

(Core :: Audio/Video, defect)

x86_64
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
firefox-esr38 --- wontfix

People

(Reporter: cpearce, Assigned: cpearce)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I'm common hangs on MacOSX in the EME mochitests (for example, bug 1090523), with call stacks in waiting forever in EMEDecryptor::Flush(), for example:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1428001916/mozilla-inbound_yosemite_test-mochitest-3-bm107-tests1-macosx-build362.txt.gz

14:33:17     INFO -   4  libnss3.dylib!PR_WaitCondVar [ptsynch.c:9a931cd178a7 : 385 + 0x7]
14:33:17     INFO -   5  XUL!mozilla::MediaTaskQueue::SyncDispatch(mozilla::TemporaryRef<nsIRunnable>) [CondVar.h:9a931cd178a7 : 79 + 0xd]
14:33:17     INFO -   6  XUL!mozilla::EMEDecryptor::Flush() [EMEDecoderModule.cpp:9a931cd178a7 : 112 + 0x7]
14:33:17     INFO -   7  XUL!mozilla::MP4Reader::Flush(mp4_demuxer::TrackType) [MP4Reader.cpp:9a931cd178a7 : 976 + 0x5]
14:33:17     INFO -   8  XUL!mozilla::MP4Reader::ResetDecode() [MP4Reader.cpp:9a931cd178a7 : 880 + 0x4]
14:33:17     INFO -   9  XUL!mozilla::MediaSourceReader::ResetDecode() [MediaSourceReader.cpp:9a931cd178a7 : 872 + 0x5]
14:33:17     INFO -  10  XUL!nsRunnableMethodImpl<nsresult (mozilla::MediaDecoderReader::*)(), true, >::Run [nsThreadUtils.h:9a931cd178a7 : 574 + 0xd]
14:33:17     INFO -  11  XUL!mozilla::MediaTaskQueue::Runner::Run() [MediaTaskQueue.cpp:9a931cd178a7 : 226 + 0x9]

Usually there are two or more EMEDecryptors blocked in Flush() waiting for it to complete. Since MediaSourceReader::ResetDecode() holds the decoder monitor while Flush() runs, we end up deadlocking.

Often the main thread is blocked trying to take the decoder monitor, and we hang.

I don't understand fully how EMEDecryptor::Flush() can end up waiting forever, but I think that since EMEDecryptor::Flush() does a sync dispatch, if we happen to flush the task queue while waiting for the sync dispatch to complete we could flush the task that we're waiting to complete, and so it will never complete, and thus we wait forever.
* Deprecate MediaTaskQueue::SyncDispatch(), and stop using it in EMEDecryptor.
* Specify in CDMProxy::Decrypt() what task queue to run the DecryptionClient on; simplifies the code in EMEDecryptor.
* Make EMEDecryptor call its wrapped MediaDataDecoder on the decode task queue, rather than creating a new sub-task-queue and sometimes sync dispatching to it. We used to need this when the MP4Reader was synchronous, but we no longer do since it's now async. This means we don't need to do the sync dispatch in EMEDecryptor anymore either. Happy days.
Attachment #8592042 - Flags: review?(edwin)
Blocks: 1094871
Comment on attachment 8592042 [details] [diff] [review]
Patch: Don't use sync dispatch in EMEDecryptor

Review of attachment 8592042 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaTaskQueue.h
@@ +83,5 @@
>    // This should only be used for things that absolutely can't afford to be
>    // flushed. Normal operations should use Dispatch.
>    nsresult ForceDispatch(TemporaryRef<nsIRunnable> aRunnable);
>  
> +  // DEPRECATED! Do not us, if a flush happens at the same time, this function

nit: use

@@ +84,5 @@
>    // flushed. Normal operations should use Dispatch.
>    nsresult ForceDispatch(TemporaryRef<nsIRunnable> aRunnable);
>  
> +  // DEPRECATED! Do not us, if a flush happens at the same time, this function
> +  // can hang and block forever! 

nit: trailing whitespace

::: dom/media/eme/CDMProxy.cpp
@@ +694,5 @@
> +  if (GMP_SUCCEEDED(aResult)) {
> +    nsAutoPtr<MediaRawDataWriter> writer(mSample->CreateWriter());
> +    PodCopy(writer->mData,
> +      aDecryptedData.Elements(),
> +      std::min<size_t>(aDecryptedData.Length(), mSample->mSize));

nit: indent style.

@@ +706,5 @@
> +    NS_WARNING(str.get());
> +    mSample = nullptr;
> +  }
> +  mTaskQueue->Dispatch(RefPtr<nsIRunnable>(this).forget());
> +  return;

return is superfluous here.

@@ +707,5 @@
> +    mSample = nullptr;
> +  }
> +  mTaskQueue->Dispatch(RefPtr<nsIRunnable>(this).forget());
> +  return;
> +

nit: extra line break.

::: dom/media/eme/CDMProxy.h
@@ +261,3 @@
>      nsAutoPtr<DecryptionClient> mClient;
> +    GMPErr mResult;
> +    ~DecryptJob() {}

nit: odd place to put ~DecryptJob()...
Attachment #8592042 - Flags: review?(edwin) → review+
(In reply to Chris Pearce (:cpearce) from comment #1)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=00541c13f880

Looking distinctly green! :)
Keywords: checkin-needed
(In reply to Chris Pearce (:cpearce) from comment #5)
> (In reply to Chris Pearce (:cpearce) from comment #1)
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=00541c13f880
> 
> Looking distinctly green! :)

heh i guess you meant https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5966ab62a36 :)
this failed to apply:

renamed 1154133 -> eme-decryptor-non-sync-dispatch.patch
applying eme-decryptor-non-sync-dispatch.patch
patching file dom/media/fmp4/eme/EMEDecoderModule.cpp
Hunk #1 FAILED at 16
Hunk #2 FAILED at 265
Hunk #3 FAILED at 295
3 out of 3 hunks FAILED -- saving rejects to file dom/media/fmp4/eme/EMEDecoderModule.cpp.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh eme-decryptor-non-sync-dispatch.patch

could you take a look ? thanks!
Flags: needinfo?(cpearce)
Keywords: checkin-needed
(In reply to Carsten Book [:Tomcat] from comment #7)
> this failed to apply:
> 
> renamed 1154133 -> eme-decryptor-non-sync-dispatch.patch
> applying eme-decryptor-non-sync-dispatch.patch
> patching file dom/media/fmp4/eme/EMEDecoderModule.cpp
> Hunk #1 FAILED at 16
> Hunk #2 FAILED at 265
> Hunk #3 FAILED at 295
> 3 out of 3 hunks FAILED -- saving rejects to file
> dom/media/fmp4/eme/EMEDecoderModule.cpp.rej
> patch failed, unable to continue (try -v)
> patch failed, rejects left in working dir
> errors during apply, please fix and refresh
> eme-decryptor-non-sync-dispatch.patch
> 
> could you take a look ? thanks!

With pleasure!

https://hg.mozilla.org/integration/mozilla-inbound/rev/6e449c897499
Flags: needinfo?(cpearce)
https://hg.mozilla.org/mozilla-central/rev/6e449c897499
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment on attachment 8592042 [details] [diff] [review]
Patch: Don't use sync dispatch in EMEDecryptor

Approval Request Comment
[Feature/regressing bug #]: EME
[User impact if declined]: Negligible. This patch makes ClearKey EME more reliable, and makes our tests not randomly fail on MacOSX and Linux. Windows is not affected by this patch. ClearKey is a toy DRM, not used anywhere for anything serious, so uplifting this would just make our tests more reliable, our sheriffs happier, and would have not much impact to our user population as there are no known users of ClearKey in production at this stage.
[Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests.
[Risks and why]: I've spammed our EME mochitests hard with this patch on TryServer, so I think this is low risk. 
[String/UUID change made/needed]: None.
Attachment #8592042 - Flags: approval-mozilla-beta?
Attachment #8592042 - Flags: approval-mozilla-aurora?
Comment on attachment 8592042 [details] [diff] [review]
Patch: Don't use sync dispatch in EMEDecryptor

We want to simplify our life. Taking it. Should be in 38 beta 5.
Attachment #8592042 - Flags: approval-mozilla-beta?
Attachment #8592042 - Flags: approval-mozilla-beta+
Attachment #8592042 - Flags: approval-mozilla-aurora?
Attachment #8592042 - Flags: approval-mozilla-aurora+
Needs rebasing or dependent patches nominated and approved for uplift.
Flags: needinfo?(cpearce)
Blocks: 1120741
Blocks: 1147732
Comment on attachment 8592042 [details] [diff] [review]
Patch: Don't use sync dispatch in EMEDecryptor

[Triage Comment]
We merged m-b into m-r for 38.0
Attachment #8592042 - Flags: approval-mozilla-beta+ → approval-mozilla-release+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #12)
> Needs rebasing or dependent patches nominated and approved for uplift.

The patch here requires stuff not uplifted, so I am trying to rework the patch to work without it, and I keep being distracted by things blocking EME shipping in 38. Will get to it...
How much time is there get a fix landed for 38 at this point? I've seen you turning down a lot of approvals at this point due to lack of time left in the release cycle.
Flags: needinfo?(sledru)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #15)
> How much time is there get a fix landed for 38 at this point? I've seen you
> turning down a lot of approvals at this point due to lack of time left in
> the release cycle.

I would be fine with this missing 38; it's only useful in that it makes our tests more reliable, it doesn't help with any functionality we expect to be used in the wild.
OK, no need to rush then.
Flags: needinfo?(sledru)
This has missed 38 at this point. Oh well, I can disable the affected tests on esr38 since EME is disabled on that branch anyway, so lost test coverage doesn't really matter.
Attachment #8592042 - Flags: approval-mozilla-release+
Comment on attachment 8592042 [details] [diff] [review]
Patch: Don't use sync dispatch in EMEDecryptor

Removing the Aurora approval since 39 is on Beta now. When there's a branch patch, we can nominate that for Beta directly.
Attachment #8592042 - Flags: approval-mozilla-aurora+
Giving up on uplifting this further. Going to invest my time in user facing stuff.
Flags: needinfo?(cpearce)
You need to log in before you can comment on or make changes to this bug.