Closed Bug 1245463 Opened 4 years ago Closed 4 years ago

crash in mp4_demuxer::MP4Metadata::GetNumberTracks, abort should let appendBuffer finish

Categories

(Core :: Audio/Video: Playback, defect, P1, critical)

42 Branch
x86
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox44 --- wontfix
firefox45 - fixed
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: gerald, Assigned: jya)

References

Details

(Keywords: crash)

Crash Data

Attachments

(4 files, 5 obsolete files)

58 bytes, text/x-review-board-request
gerald
: review+
Details
58 bytes, text/x-review-board-request
gerald
: review+
Details
58 bytes, text/x-review-board-request
gerald
: review+
Details
58 bytes, text/x-review-board-request
gerald
: review+
Details
Follow-up to bug 1239983. The patch there just adds a few diagnostic checks, which should help narrow down the root cause, or disprove some guesses (e.g. bug 1239983 comment 2).

NI:myself to regularly monitor new crashes that may happen in FF47 over the next few days.
Crash signatures may be a bit different, the following TrackBuffersManager methods should be looked for: ShutdownDemuxers, OnDemuxerResetDone, OnDemuxerInitDone.

[Tracking Requested - why for this release]: This seems like a common enough crash. We are investigating.
Flags: needinfo?(gsquelart)
I believe the issue is a race between when the demuxer init promise gets resolved (and queue a task) and when the ResetParserState task is queued.

If the ResetParserState task is queued prior the demuxer resolving, then the input buffer will be cleared and the demuxer destroyed ; resulting in the crash reported.

I will make sourcebuffer.abort() block until the SegmentParserLoop has completed its run.
Assignee: gsquelart → jyavenard
Flags: needinfo?(gsquelart)
Keeping NI:me as reminder to monitor crash reports, just in case there is useful information there.
Flags: needinfo?(gsquelart)
The changes that follow may cause the active sourcebuffer list to be modified; which will trigger an assertion if the mediasource object is closed.

Review commit: https://reviewboard.mozilla.org/r/34311/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34311/
Attachment #8717783 - Flags: review?(gsquelart)
The W3C spec indicates that while everything in MSE is asynchronous, the abort() command is to interrupt the current segment parser loop and have the reset parser loop synchronously completes the frames present in the input buffer.
This causes a fundamental issue that abort() will never result in a deterministic outcome as the segment parser loop may be in different condition.

We used to really attempt to abort the current operation, however there could have been a race in the order in which tasks were queued. As such, we now simply wait for the current appendBuffer to complete.

This also simplifies the code greatly, as we don't need to worry about pending concurrent appendBuffer.

The actually happens to be similar to the Chromium behavior.

Review commit: https://reviewboard.mozilla.org/r/34313/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34313/
Attachment #8717784 - Flags: review?(gsquelart)
This is similar to bug 1239983, we strongly assert should a segment parser loop be running when it must have completed.

Review commit: https://reviewboard.mozilla.org/r/34315/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34315/
Attachment #8717785 - Flags: review?(gsquelart)
Attachment #8717783 - Flags: review?(gsquelart) → review+
Comment on attachment 8717783 [details]
MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r?gerald

https://reviewboard.mozilla.org/r/34311/#review31005
Comment on attachment 8717784 [details]
MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald

https://reviewboard.mozilla.org/r/34313/#review31007

r+ with nits:

::: dom/media/mediasource/SourceBuffer.cpp:217
(Diff revision 1)
> +    // mAborting will become false once the AppendPromise will be resolved.

"once the AppendPromise is resolved."

::: dom/media/mediasource/TrackBuffersManager.cpp:160
(Diff revision 1)
>    MOZ_ASSERT(!mAppendRunning, "AbortAppendData must have been called");

'AbortAppendData' is dead, so the assert comment is now incorrect.
Not sure about the assertion itself, I think it is still needed.
Attachment #8717784 - Flags: review?(gsquelart) → review+
Comment on attachment 8717785 [details]
MozReview Request: Bug 1245463: [MSE] P3. Diags to ensure the Segment Parser Loop isn't running. r?gerald

https://reviewboard.mozilla.org/r/34315/#review31009

::: dom/media/mediasource/TrackBuffersManager.cpp:160
(Diff revision 1)
> -  MOZ_ASSERT(!mAppendRunning, "AbortAppendData must have been called");
> +  MOZ_RELEASE_ASSERT(!mAppendRunning, "Append is running, abort must have been called");

Ignore my nit in P2. :-)
Attachment #8717785 - Flags: review?(gsquelart) → review+
Attachment #8717786 - Flags: review?(gsquelart) → review+
Comment on attachment 8717786 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove dead code. r?gerald

https://reviewboard.mozilla.org/r/34317/#review31013
Comment on attachment 8717784 [details]
MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34313/diff/1-2/
Attachment #8717783 - Attachment is obsolete: true
Attachment #8717785 - Attachment is obsolete: true
Attachment #8717786 - Attachment is obsolete: true
Attachment #8717787 - Attachment is obsolete: true
Attachment #8717787 - Flags: review?(gsquelart)
The changes that follow may cause the active sourcebuffer list to be modified; which will trigger an assertion if the mediasource object is closed.

Review commit: https://reviewboard.mozilla.org/r/34325/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34325/
Attachment #8717791 - Flags: review?(gsquelart)
This is similar to bug 1239983, we strongly assert should a segment parser loop be running when it must have completed.

Review commit: https://reviewboard.mozilla.org/r/34327/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34327/
Attachment #8717792 - Flags: review?(gsquelart)
Flags: needinfo?(gsquelart)
Summary: crash in mp4_demuxer::MP4Metadata::GetNumberTracks, follow-up → crash in mp4_demuxer::MP4Metadata::GetNumberTracks, abort should let appendBuffer finish
Comment on attachment 8717791 [details]
MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r=gerald

https://reviewboard.mozilla.org/r/34325/#review31017
Attachment #8717791 - Flags: review?(gsquelart) → review+
Comment on attachment 8717792 [details]
MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald

https://reviewboard.mozilla.org/r/34327/#review31019
Attachment #8717792 - Flags: review?(gsquelart) → review+
Comment on attachment 8717793 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove dead code. r?gerald

https://reviewboard.mozilla.org/r/34329/#review31021
Attachment #8717793 - Flags: review?(gsquelart) → review+
Comment on attachment 8717794 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald

https://reviewboard.mozilla.org/r/34331/#review31023
Attachment #8717794 - Flags: review?(gsquelart) → review+
Comment on attachment 8717791 [details]
MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34325/diff/1-2/
Comment on attachment 8717784 [details]
MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34313/diff/2-3/
Comment on attachment 8717792 [details]
MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34327/diff/1-2/
Comment on attachment 8717793 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove dead code. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34329/diff/1-2/
Comment on attachment 8717794 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34331/diff/1-2/
Comment on attachment 8717784 [details]
MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34313/diff/3-4/
Comment on attachment 8717792 [details]
MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34327/diff/2-3/
Comment on attachment 8717793 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove dead code. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34329/diff/2-3/
Comment on attachment 8717794 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34331/diff/2-3/
sorry backout for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=21417727&repo=mozilla-inbound
Flags: needinfo?(jyavenard)
Flags: needinfo?(jyavenard)
Comment on attachment 8717791 [details]
MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34325/diff/2-3/
Attachment #8717791 - Attachment description: MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r?gerald → MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r=gerald
Comment on attachment 8717784 [details]
MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34313/diff/4-5/
Attachment #8717784 - Attachment description: MozReview Request: Bug 1245463: [MSE] P2. When abort() is called, wait until the current appendBuffer completes. r?gerald → MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald
Comment on attachment 8717792 [details]
MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34327/diff/3-4/
Attachment #8717792 - Attachment description: MozReview Request: Bug 1245463: [MSE] P3. Diags to ensure the Segment Parser Loop isn't running. r?gerald → MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r=gerald
Comment on attachment 8717794 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34331/diff/3-4/
Attachment #8717794 - Attachment description: MozReview Request: Bug 1245463: [MSE] P5. Remove no longer working Dump() commands. r?gerald → MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald
Attachment #8717793 - Attachment is obsolete: true
https://reviewboard.mozilla.org/r/34327/#review31283

::: dom/media/mediasource/TrackBuffersManager.cpp:611
(Diff revision 4)
> +  mAppendRunning = true;

Is this one needed?
(In reply to Gerald Squelart [:gerald] from comment #39)
> https://reviewboard.mozilla.org/r/34327/#review31283
> 
> ::: dom/media/mediasource/TrackBuffersManager.cpp:611
> (Diff revision 4)
> > +  mAppendRunning = true;
> 
> Is this one needed?

no...

bad rebase.
Comment on attachment 8717792 [details]
MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34327/diff/4-5/
Comment on attachment 8717794 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34331/diff/4-5/
Comment on attachment 8717784 [details]
MozReview Request: Bug 1245463: [MSE] P2. Remove MediaSource's duration mirror. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34313/diff/5-6/
Comment on attachment 8717792 [details]
MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34327/diff/5-6/
Attachment #8717792 - Attachment description: MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r=gerald → MozReview Request: Bug 1245463: [MSE] P3. When abort() is called, wait until the current appendBuffer completes. r?gerald
Comment on attachment 8717794 [details]
MozReview Request: Bug 1245463: [MSE] P4. Remove no longer working Dump() commands. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34331/diff/5-6/
Duplicate of this bug: 1247875
Comment on attachment 8717791 [details]
MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r=gerald

Approval is for all patches.

Approval Request Comment
[Feature/regressing bug #]: 1245463
[User impact if declined]: Crashes, YouTube stalls
[Describe test coverage new/current, TreeHerder]: Manual test, lots of mochitest and regression tests (and hence why this was backed out a few times until it was perfect)
[Risks and why]: Medium. We introduce a wait on the main thread which has historically been a problem in the past resulting in deadlocks (though the code deadlocking was always wrong). However, we are implementing the specs to the letter (which also requires a synchronous wait). Great care was taken to ensure that this wouldn't happen. We have added lots of release assert that will immediately be triggered should this situation happen, allowing for noticing very quickly and should come to worse: back it out or fix it quickly.
[String/UUID change made/needed]: None
Attachment #8717791 - Flags: approval-mozilla-beta?
Attachment #8717791 - Flags: approval-mozilla-aurora?
Comment on attachment 8717791 [details]
MozReview Request: Bug 1245463: [MSE] P1. Do not attempt to retrieve the buffered range if the mediasource is in closed state. r=gerald

This crash already impacts the release and the number of crashes on all channels is low. 
As the patches are pretty big and the risk medium, I prefer that this rides the train from aurora.
Attachment #8717791 - Flags: approval-mozilla-beta?
Attachment #8717791 - Flags: approval-mozilla-beta+
Attachment #8717791 - Flags: approval-mozilla-aurora?
Attachment #8717791 - Flags: approval-mozilla-aurora+
Attachment #8717784 - Flags: approval-mozilla-aurora+
Attachment #8717792 - Flags: approval-mozilla-aurora+
Attachment #8717794 - Flags: approval-mozilla-aurora+
I was about to ask that only P1 be beta+ then, but I see that you've approved it for this one which is great
(bug 1247875).

This one is for a new feature only in 45 and later, and my guess is that once 45 becomes release, it's severity will increase greatly.
Sorry, I wasn't planning to accept in beta but it seems that my mistake was in fact correct :p
Good news, everyone!

A crash has been induced by the patch in bug 1239983, as hoped:
https://crash-stats.mozilla.com/report/index/b16687f2-026c-439d-a408-aa4fc2160213
It shows that an unexpected CompleteResetParserState() happens while an init/reset Promise is in flight.

Hopefully Jean-Yves' patch should rectify this situation.
I'll keep monitoring crashes, if all goes well we might see a few from builds that contain bug 1239983 (> 2016-02-03), and then no more for builds containing patches from this bug (> 2016-02-14).
Crash Signature: [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] → [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ]
Sylvestre, looks like uplift has been too eager and the lot has been uplift d to beta..
Rather than revert the lot, what about we let it bake for a week and see how it goes?
Flags: needinfo?(sledru)
Oups :(
Well, we built beta 5 with it. Looks like we don't have much choice anyway :p
Flags: needinfo?(sledru)
Duplicate of this bug: 1248832
Crash Signature: [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ] → [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ] [@ mozilla::dom::MediaSource::Duration]
Depends on: 1248909
this is marked fixed in 45, but the mozilla::dom::MediaSource::Duration signature is one of the top signatures in 45.0.1 & making up 3% of all crashes there atm...

should we reopen this bug/file a new issue?
Flags: needinfo?(jyavenard)
I don't believe the two are related
(mozilla::dom::MediaSource::Duration) has nothing to do with the null deref in mp4_demuxer::
Flags: needinfo?(jyavenard)
Crash Signature: [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ] [@ mozilla::dom::MediaSource::Duration] → [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ]
You need to log in before you can comment on or make changes to this bug.