Closed Bug 1207198 Opened 4 years ago Closed 4 years ago

Vine site crash at MozPromise by "MOZ_DIAGNOSTIC_ASSERT(!IsExclusive || !mHaveRequest)" failure

Categories

(Core :: Audio/Video: Playback, defect, P1)

Unspecified
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla44
blocking-b2g 2.5+
Tracking Status
firefox44 --- fixed

People

(Reporter: sotaro, Assigned: jya)

References

Details

(Whiteboard: [ft:conndevices][partner-blocker][platform])

Attachments

(5 files, 9 obsolete files)

2.92 KB, text/plain
Details
10.12 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
3.41 KB, patch
Details | Diff | Splinter Review
3.03 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
1.33 KB, patch
jwwang
: review+
Details | Diff | Splinter Review
During scrolling in vine web site on flame-kk and aries(KK), browser process was crashed because of MozPromise's MOZ_DIAGNOSTIC_ASSERT failure.

logcat log had the following output.
> MOZ_Assert: Assertion failure: !IsExclusive || !mHaveRequest, at ../../dist/include/mozilla/MozPromise.h:600

STR
[1] Start browser app
[2] Move to https://vine.co/tags/firefoxos
[3] Scroll the site up and down until crash happens.
Attached file stack of the crash
When the crash happened, MediaFormatReader's functions are called like the following order.

> MediaFormatReader::AsyncReadMetadata()
> MediaFormatReader::OnDemuxerInitDone()
> MediaFormatReader::ResetDecode()
> MediaFormatReader::ReleaseMediaResources()
> MediaFormatReader::AsyncReadMetadata()
  -> crash

MediaFormatReader::mMetadataPromise seems not cleared between AsyncReadMetadata() calls.
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.5?
[Blocking Requested - why for this release]: Crash happens easily on the link. And vine is popular service.
OS: Unspecified → Gonk (Firefox OS)
Summary: Vine site crash by MozPromise by "MOZ_DIAGNOSTIC_ASSERT(!IsExclusive || !mHaveRequest)" failure → Vine site crash at MozPromise by "MOZ_DIAGNOSTIC_ASSERT(!IsExclusive || !mHaveRequest)" failure
Assignee: nobody → sotaro.ikeda.g
Blocks: 1207214
Attachment #8664279 - Flags: review?(jyavenard)
Comment on attachment 8664279 [details] [diff] [review]
patch - Add mMetadataPromise reject

Review of attachment 8664279 [details] [diff] [review]:
-----------------------------------------------------------------

This is an error with the MDSM, it shouldn't attempt to use the reader before the metadata promise has been resolved. If it does, you can be certain it will crash in many other places and not just AsyncReadMetadata. 

So there's appear to be a logic error in the MDSM likely related to dormant mode.

Simply rejecting the promise in the MediaFormatReader isn't sufficient anyway: you'd have to also disconnect pending promises etc. 

Better to have the MDSM do it properly. 

It is an error to call ResetDecode before the reader has completed its initialisation.
Attachment #8664279 - Flags: review?(jyavenard) → review-
blocking-b2g: 2.5? → 2.5+
Priority: -- → P1
jya, can you explain how MDSM can cancel AsyncReadMetadata promise in MediaFormatReader? If MDSM needs enter dormant state and AsyncReadMetadata is not completed yet, MDSM needs to cancel AsyncReadMetadata promise. MDSM already disconnect mMetadataRequest in MediaDecoderStateMachine::Reset().
Flags: needinfo?(jyavenard)
That's the tricky part of promise chaining. When you disconnect the promise, you disconnect the chained promise instead of MediaFormatReader::mMetadataPromise. That is also why I don't like the idea to expose the internal thread model to the client code.
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> jya, can you explain how MDSM can cancel AsyncReadMetadata promise in
> MediaFormatReader? If MDSM needs enter dormant state and AsyncReadMetadata
> is not completed yet, MDSM needs to cancel AsyncReadMetadata promise. MDSM
> already disconnect mMetadataRequest in MediaDecoderStateMachine::Reset().

Well, that's the thing. As JW mentioned ; with promises, there should be no such thing as cancelling. You just disconnect from it.

For better or worse, we have been treated the call to AsyncReadMetadata() as an initialization event.
Starting initialization, only to cancel it and resume shortly after makes no sense.

With everything the MediaFormatReader is doing being async, attempting the cancel the initialization will very likely leave one or more component in an unusable state. I wouldn't be surprised if you enabled that change, that suddenly you find crashes elsewhere.
"cancelling" (as opposed to ignoring) the metadata promise imply that we will re-initialize the reader later ; in which case we should move back the reader to be in a non-initialized state. Just as well destroy it and create a new one and start from scratch.

As I have mentioned in the past; going into dormant mode *before* returning from AsyncMetadata (but after calling it) makes no sense to me.
Either you go dormant before AsyncReadMetadatada is called ; or after but not before it returns.

If the aim is to minimize the use of the hardware decoder ; we can simply stop initializing the decoders during the AsyncReadMetadata step, and instead perform the initialization of the MediaDataDecoder lazily when we have to decode a sample. Alfredo has already done all the hard work there, and everything is there to implement it easily.

In which case, there's absolutely zero need to cancel the initialization.

I just can't validate your change sorry, I believe it fundamentally hide a more serious architectural issue with dormant mode.
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #9)
> 
> I just can't validate your change sorry, I believe it fundamentally hide a
> more serious architectural issue with dormant mode.

Thanks for the detailed explanation. I agree the above.
(In reply to Jean-Yves Avenard [:jya] from comment #9)
> (In reply to Sotaro Ikeda [:sotaro] from comment #7)
> > jya, can you explain how MDSM can cancel AsyncReadMetadata promise in
> > MediaFormatReader? If MDSM needs enter dormant state and AsyncReadMetadata
> > is not completed yet, MDSM needs to cancel AsyncReadMetadata promise. MDSM
> > already disconnect mMetadataRequest in MediaDecoderStateMachine::Reset().
> 
> Well, that's the thing. As JW mentioned ; with promises, there should be no
> such thing as cancelling. You just disconnect from it.
> 
> For better or worse, we have been treated the call to AsyncReadMetadata() as
> an initialization event.
> Starting initialization, only to cancel it and resume shortly after makes no
> sense.
> 
> With everything the MediaFormatReader is doing being async, attempting the
> cancel the initialization will very likely leave one or more component in an
> unusable state. I wouldn't be surprised if you enabled that change, that
> suddenly you find crashes elsewhere.
> "cancelling" (as opposed to ignoring) the metadata promise imply that we
> will re-initialize the reader later ; in which case we should move back the
> reader to be in a non-initialized state. Just as well destroy it and create
> a new one and start from scratch.
> 
> As I have mentioned in the past; going into dormant mode *before* returning
> from AsyncMetadata (but after calling it) makes no sense to me.
> Either you go dormant before AsyncReadMetadatada is called ; or after but
> not before it returns.
> 
> If the aim is to minimize the use of the hardware decoder ; we can simply
> stop initializing the decoders during the AsyncReadMetadata step, and
> instead perform the initialization of the MediaDataDecoder lazily when we
> have to decode a sample. Alfredo has already done all the hard work there,
> and everything is there to implement it easily.

There could be a case that MediaDataDecoder exists already, but AsyncReadMetadata is not completed yet. It seems necessary to handle it.
Depends on: 1210045
Hi Sotaro,
This is now TV P1 bug and we need it fix asap. Thanks!
Flags: needinfo?(sotaro.ikeda.g)
Whiteboard: [ft:conndevices][partner-blocker]
Blocks: TV_P1
Whiteboard: [ft:conndevices][partner-blocker] → [ft:conndevices][partner-blocker][platform]
(In reply to Josh Cheng [:josh] from comment #12)
> Hi Sotaro,
> This is now TV P1 bug and we need it fix asap. Thanks!

Thanks for the notification!
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8664279 - Attachment is obsolete: true
Update comments.
Attachment #8676057 - Attachment is obsolete: true
Fix mInitPromise handling.
Attachment #8676061 - Attachment is obsolete: true
Attachment #8676088 - Flags: review?(jyavenard)
I still strongly believe that this is the wrong approach, and that the MDSM shouldn't go dormant during metadata. 

It makes the MediaFormatReader code almost impossible to read, with all that hacking around something that shouldn't happen in the first place.
jya: how can we release video decoder if it needs to release during reading metadata? If we do not release it, there could be a case that the MediaFormatReader holds video decoder very long time depending on the network condition, even when a document of the MediaFormatReader is already in background and hidden.
Flags: needinfo?(jyavenard)
(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> jya: how can we release video decoder if it needs to release during reading
> metadata? If we do not release it, there could be a case that the
> MediaFormatReader holds video decoder very long time depending on the
> network condition, even when a document of the MediaFormatReader is already
> in background and hidden.

This seems not a problem on MediaFormatReader.
Flags: needinfo?(jyavenard)
Attachment #8676088 - Attachment is obsolete: true
Attachment #8676088 - Flags: review?(jyavenard)
Attachment #8676390 - Flags: review?(jwwang)
(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> jya: how can we release video decoder if it needs to release during reading
> metadata? If we do not release it, there could be a case that the
> MediaFormatReader holds video decoder very long time depending on the
> network condition, even when a document of the MediaFormatReader is already
> in background and hidden.

What I suggested before was to simply not allocate a decoder during the call to AsyncMetadata ; the MediaFormatReader now allocates the decoder on demand, when it actually needs them.

Allocating the decoders allows to know earlier if that works or not ; but from the user point of view the end result would be identical: he gets an error.

Alfredo has put everything in place to do that and it's been even more perfected since; so we just have to remove the calls to EnsureDecodersCreated / EnsureDecodersInitialized in MediaFormatReader::OnDemuxerInitDone

it's a 3 lines change and you'll achieve exactly what you want to do: it's much simpler and I believe far more elegant.
I have a patch for you, it actually makes everything much cleaner: e.g. ReadMetadata does what its name suggest: read the metadata and just that. Everything else is lazily constructed.
And it allows to remove a lot of unnecessary code.


Will update shortly.
Ok thanks.
Attachment #8676390 - Flags: review?(jwwang)
Decoders are created and initialized automatically when needed.
Also fix issues where we could have a pending initialization promise pending when going into dormant mode.

Another proof that the simpler the better; removed heaps of code thanks to this.
Attachment #8676591 - Flags: review?(cpearce)
Sotaro, can you try this patch and see how that works for you?

Quiet please on how much clean up it allowed to do
Flags: needinfo?(sotaro.ikeda.g)
Thanks for the patch! I am going to check it.
You still need to ensure that the MDSM doesn't call AsyncReadMetadata before the previous one completed.

We still have the issue that you don't want to attempt to initialize the reader before the previous init completed.

With the remaining code two things can happen when we enter dormant:

1- Demuxer initialisation has completed, just waiting to resolve the metadata promise. Simply rejecting the promise in ReleaseMediaResourcesInternal will be sufficient as a new call to AsyncReadMetadata will have no negative effect.

2- Demuxer hasn't completed its initialization. So mInitDone will still be false.. We can't simply disconnect from the demuxer initialization promise and restart from zero. The next call to AsyncReadMedata will attempt to initialise things once again and creating objects that have already been created. We can't simply destroyed the demuxer and recreate a new one; especially with MediaSource: shutting down the MediaSource demuxer would cause all the source buffers to be released and the MediaSource object to be invalid.

So really, we MUST NOT call AsyncReadMetadata before the previous AsyncReadMetadata has completed, this is not an interruptible event (not without major work). That means, we don't even want the MDSM to disconnect from the MetadataPromise as it would have then no way of knowing the MediaFormatReader initialization has completed.

JW/Sotaro, this is probably a task for you.

I will add to the patch above   MOZ_ASSERT(mMetadataPromise.IsEmpty());
to AsyncReadMetadataPromise to ensure this case can not happen.
Decoders are created and initialized automatically when needed.
Also fix issues where we could have a pending initialization promise pending when going into dormant mode.

Add MOZ_DIAGNOSTIC_ASSERT(mMetadataPromise.IsEmpty());

as it is paramount that we do not attempt to initialize the MediaFormatReader while another one is pending ; it's up to the MDSM to ensure AsyncReadMetadata isn't called before the previous one return.

The MediaFormatReader will not attempt to initialise a decoder during this time, so there's no possibility on making HW resources unavailable during this time.
Attachment #8676600 - Flags: review?(cpearce)
Attachment #8676591 - Attachment is obsolete: true
Attachment #8676591 - Flags: review?(cpearce)
:jya, the patch worked. Can you take the bug, since you created the patch?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(jyavenard)
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> :jya, the patch worked. Can you take the bug, since you created the patch?

Well, the solution is still incomplete as I described in comment 28.

As it is now, the MDSM can still call AsyncReadMetadata , go into dormant mode and immediately call AsyncReadMetadata again: it's a racy situation and it will cause nasty things to happen
Flags: needinfo?(jyavenard)
jya: can's we prevent calling AsyncReadMetadata problem by attachment 8676390 [details] [diff] [review]?
Flags: needinfo?(jyavenard)
Comment on attachment 8676390 [details] [diff] [review]
patch - Defer dormant request during reading metadata in MDSM

Review of attachment 8676390 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.cpp
@@ +1231,5 @@
>  
>    DECODER_LOG("SetDormant=%d", aDormant);
>  
> +  if (mState == DECODER_STATE_DECODING_METADATA && !aForce) {
> +    // Do not enter dormant during reading metadata

I would use the existence of the mMetadataRequest existing instead; in which case you don't even need the aForce boolean as mMetadataRequest is cleared upon the metadata being complete.

so instead you would do:
if (mMetadataRequest.Exists()) {
...
}

and it all looks overly complex to me:
I would rename mPendingDormantDuringMetadataRead to mPendingDormant

so this ends up being simply:

if (mMetadataRequest.Exists()) {
  mPendingDormant = aDormant;
  return;
}
mPendingDormat = false;

@@ +1897,5 @@
>  
> +  if (mPendingDormantDuringMetadataRead) {
> +    SetDormant(true /* aDormant */, true /* aForce */);
> +    mPendingDormantDuringMetadataRead = false;
> +    return;

with the changes recommended above you would just do:

if (mPendingDormant) {
  SetDormant(true);
  return;
}

@@ +1978,5 @@
>  {
>    MOZ_ASSERT(OnTaskQueue());
>    MOZ_ASSERT(mState == DECODER_STATE_DECODING_METADATA);
>    mMetadataRequest.Complete();
> +  mPendingDormantDuringMetadataRead = false;

this shouldn't be required ; we're entering in error mode which is non-recoverable.
Thanks, yea, the patch is too complex than necessary.
Apply the comments.
Attachment #8676390 - Attachment is obsolete: true
Attachment #8676600 - Flags: review?(cpearce) → review+
Attachment #8676688 - Flags: review+
Assignee: sotaro.ikeda.g → jyavenard
Flags: needinfo?(jyavenard)
Comment on attachment 8676688 [details] [diff] [review]
patch - Defer dormant request during reading metadata in MDSM

Review of attachment 8676688 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.cpp
@@ +1914,5 @@
>  
> +  if (mPendingDormant) {
> +    SetDormant(true);
> +    return;
> +  }

just thinking now.

If SetDormant(false) was called while AsyncReadMetadata was running (e.g. we are resuming from dormant)

SetDormant() will do nothing and simply reset mPendingDormant to false.

When the MetadataPromise completes, OnMetadataRead will do nothing and we will not come out of dormant.

It may be preferable to set mPendingDormant to a Maybe<bool> and replace this with:
if (mPendingDormant) {
  SetDormant(mPendingDormant.ref());
  return;
}

and in SetDormant; the mPendingDormant = false is to be replaced with mPendingDormant.reset()
Attachment #8676688 - Flags: review+
Attachment #8676732 - Flags: review?(sotaro.ikeda.g)
Attachment #8676732 - Flags: review?(sotaro.ikeda.g) → review+
Comment on attachment 8676732 [details] [diff] [review]
P2.  Defer dormant request while ReadMetadata is pending in MDSM.

Sorry, I want to recheck the patch again.
Attachment #8676732 - Flags: review+
> OnMetadataRead will do nothing and we will not come out of dormant

There seems no use case that MDSM exits from dormant in OnMetadataRead(). When MDSM is in DECODER_STATE_DORMANT, mMetadataRequest always does not exist, because in MediaDecoderStateMachine::SetDormant(), Reset() is called when entering into DECODER_STATE_DORMANT.

And calling in SetDormant(false) seems to cut MDSM's task chain of loading a media content.
jya: how do you think about Comment 38?
Flags: needinfo?(jyavenard)
It's a racy situation.. You're right, it probably will never happen that we will come in and out of dormant fast enough that AsyncReadMetadata wouldn't have completed.

However there's no impact in handling this case anyway: and handling this case means that you don't have to comment and explain on why it's not going to happen anyway. 

Ultimately, it does nothing bad compare to the original version and achieve the same purpose for the cases that *do* happen.

So it's better in every way :)
Flags: needinfo?(jyavenard)
So what's going on? are you happy with this patch or not, or is there anything else you want to do?
Flags: needinfo?(sotaro.ikeda.g)
attachment 8676732 [details] [diff] [review] seems to cause the problem when SetDormant(true) and SetDormant(false) is called during mMetadataRequest.Exists() is true. In this use case, MediaDecoderStateMachine::OnMetadataRead() should not call SetDormant(false). Just pending dormant request is cancelled in the use case.
Flags: needinfo?(sotaro.ikeda.g)
I don't really understand what you're saying here nor what the problem is...

SetDormant will be called in OnMetadataRead with the exact same arguments it was requested earlier but got delayed due to AsyncReadMetadata still running.

So if SetDormant(false) was called earlier, its action is just delayed.

Or are you saying that if there was already a pending SetDormant(true) and a new SetDormant(false) is queued, then nothing should happen at all and the two requests just cancel each other.

that would make sense and something easy to implement.
(In reply to Jean-Yves Avenard [:jya] from comment #43)
> I don't really understand what you're saying here nor what the problem is...
> 
> SetDormant will be called in OnMetadataRead with the exact same arguments it
> was requested earlier but got delayed due to AsyncReadMetadata still running.
> 
> So if SetDormant(false) was called earlier, its action is just delayed.
> 
> Or are you saying that if there was already a pending SetDormant(true) and a
> new SetDormant(false) is queued, then nothing should happen at all and the
> two requests just cancel each other.

Sorry, my explanation was not clear. It is what I wrote.

> 
> that would make sense and something easy to implement.
SetDormant(true) followed by SetDormant(false) while AsyncReadMetadata is pending will simply be cancelled.
Otherwise, only the last SetDormant action will be actioned upon
Attachment #8677171 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8677171 [details] [diff] [review]
P2.  Defer dormant request while ReadMetadata is pending in MDSM.

Review of attachment 8677171 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8677171 - Flags: review?(sotaro.ikeda.g) → review+
This causes failure in dom/media/test/test_invalid_reject.html mochitest.

The reason for this is that the test expect that decoding the metadata will fail ; however we won't fail until we attempt to decode the first frame (we can only know the file is invalid when we attempt to create a decoder).

The definition for the loadedmetadata event is "The user agent has just determined the duration and dimensions of the media resource and the text tracks are ready." ; as such firing loadedmetadata appears valid to me; and instead we should be failing if the "loadeddata" event is fired (loadeddata: "The user agent can render the media data at the current playback position for the first time. ").

Now, that exposes what I believe is another bug in our HTMLMediaElement implementation.

the event "loadeddata", "canplay" and "canplaythrough" are fired even before the first frame is decoded (which will error).

JW stated on IRC that this was due to the file being small and the media cache is disabled on that file. As the file is entirely loaded in memory, it causes the element's readyState to be set to HAVE_ENOUGH_DATA.

I believe this behaviour to be wrong and the changed in P1 exposes a bug in our handling of the readyState.

JW, do you concur?
Flags: needinfo?(jwwang)
https://hg.mozilla.org/mozilla-central/file/f029ccdee154/dom/html/HTMLMediaElement.cpp#l3783

It looks like we should check NextFrameStatus() before checking mDownloadSuspendedByCache. We should not transition to HAVE_ENOUGH_DATA when NextFrameStatus() is not NEXT_FRAME_AVAILABLE.
Flags: needinfo?(jwwang)
loadedmetadata is to be fired when "The user agent has just determined the duration and dimensions of the media resource and the text tracks are ready. " .
The invalid-preskip.webm has valid metadata however the codec data is invalid, meaning we will fail decoding the first frame.

As such firing loadedmetadata for this file appears correct (it's an audio only file and we have determined its duration)
Attachment #8677255 - Flags: review?(jwwang)
We must have at least a decoded frame available to transition to HAVE_ENOUGH_DATA, as otherwise canplay/canplaythrough will always be fired even if the data was invalid so long that it was small.
Attachment #8677256 - Flags: review?(jwwang)
Blocks: 1217304
Do we still need P3 and P4 after bug 1217304 is landed?
(In reply to JW Wang [:jwwang] from comment #51)
> Do we still need P3 and P4 after bug 1217304 is landed?

Yes.
One is because loadedmetadata will be fired for the broken webm/opus file and that is considered to be a failure.

The other is to ensure canplay/canplaythrough isn't fired (as if the file is small enough we automatically go into readyState = HAVE_ENOUGH_DATA.

Another way to do P4 if bug 1217304 lands before, would be to test on the mFirstFrameLoaded boolean instead of checking NextFrameStatus(). The end result would be the same.
Attachment #8677255 - Flags: review?(jwwang) → review+
(In reply to Jean-Yves Avenard [:jya] from comment #52)
> Another way to do P4 if bug 1217304 lands before, would be to test on the
> mFirstFrameLoaded boolean instead of checking NextFrameStatus(). The end
> result would be the same.
I would prefer this approach.
We must have at least a decoded frame available to transition to HAVE_ENOUGH_DATA, as otherwise canplay/canplaythrough will always be fired even if the data was invalid so long that it was small.

Use value of mFirstFrameLoaded ; also apply the same logic and do no fire canplaythrough until the first frame has been decoded.
Attachment #8677889 - Flags: review?(jwwang)
Attachment #8677256 - Attachment is obsolete: true
Attachment #8677256 - Flags: review?(jwwang)
Comment on attachment 8677889 [details] [diff] [review]
P4. Do not transition to HAVE_ENOUGH_DATA readyState until we do have data.

will move this to bug 1217304, makes more sense there
Attachment #8677889 - Attachment is obsolete: true
Attachment #8677889 - Flags: review?(jwwang)
Attachment #8676732 - Attachment is obsolete: true
Depends on: 1270154
Depends on: 1272182
You need to log in before you can comment on or make changes to this bug.