Closed Bug 1258410 Opened 4 years ago Closed 4 years ago

crash in mp4_demuxer::MP4Metadata::GetNumberTracks

Categories

(Core :: Audio/Video: Playback, defect, P1, critical)

45 Branch
x86
Windows
defect

Tracking

()

VERIFIED FIXED
mozilla48
Tracking Status
firefox45 - wontfix
firefox46 + verified
firefox47 --- verified
firefox48 --- verified
firefox-esr45 52+ fixed

People

(Reporter: calixte, Assigned: jya)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1245463 +++

This crash is spiking in 45.0.1 and is one of the top crasher for this version.
Flags: needinfo?(jyavenard)
Flags: needinfo?(ajones)
[Tracking Requested - why for this release]:
Crashes numbers has increased by 2989%.
Keywords: crashtopcrash
Keywords: crash
Flags: needinfo?(jyavenard)
Summary: crash in mp4_demuxer::MP4Metadata::GetNumberTracks, abort should let appendBuffer finish → crash in mp4_demuxer::MP4Metadata::GetNumberTracks
[@ mozilla::dom::MediaSource::Duration ] has nothing to do with the mp4 demuxer null deref
Crash Signature: [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ] [@ mozilla::dom::MediaSource::Duration] → [@ mp4_demuxer::MP4Metadata::GetNumberTracks ] [@ mp4_demuxer::MP4Metadata::GetNumberTracks const ] [@ mozilla::TrackBuffersManager::ShutdownDemuxers ]
Blocks: MSE
Assignee: nobody → gsquelart
Note that it spiked when we turned on 45 updates (we did that when we released 45.0.1).
The crash actually spiking is bug 1258562. The crash signatures was incorrectly pointing to this bug. I have updated the signatures but the crash reports still points to here.

bug 1258562 is fixed and uplift requests are pending
Flags: needinfo?(sledru)
The crash reports point at 'CompleteResetParserState' unexpectedly happening between 'InitializationSegmentReceived' (which calls mInputDemuxer->Init()) and 'OnDemuxerInitDone', but a lot of checks should prevent that situation from ever happening. I haven't been successful in finding where things actually go wrong.

Given the critical level, Jean-Yves and I think we should start with some wall-papering to at least reduce crashes, by dealing with these anomalous situation gracefully (e.g., changing some crashing asserts into warnings, checking pointers before using them, etc.)
We'll also open a follow-up bug to keep trying to find the real cause.

Reassigning to Jean-Yves, as he knows the code better.
Assignee: gsquelart → jyavenard
Comment on attachment 8733705 [details]
MozReview Request: Bug 1258410: [MSE] P2. Disconnect init promise if any pending. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41873/diff/1-2/
Comment on attachment 8733704 [details]
MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald

https://reviewboard.mozilla.org/r/41871/#review38399
Attachment #8733704 - Flags: review?(gsquelart) → review+
Attachment #8733705 - Flags: review?(gsquelart) → review+
Comment on attachment 8733705 [details]
MozReview Request: Bug 1258410: [MSE] P2. Disconnect init promise if any pending. r=gerald

https://reviewboard.mozilla.org/r/41873/#review38403

The MOZ_DIAGNOSTIC_ASSERT's mean these strange situations will still cause crashes in Nightly&Aurora -- which I think is good because it will keep the issue on the radar, hence r+.
However one downside is that the new code (that should gracefully handles these situations) will not be tested until it lands in Beta.

I can see two options to go ahead:

1. Keep patches as-is, and request Beta uplift ASAP, so the new code is exercized quickly;
   Monitor crash reports, and be ready to respond to beta issues due to the new code, if any.

2. Remove MOZ_DIAGNOSTIC_ASSERT's for now (maybe replace them with warnings?), which should prove the new code in Nightly first;
   Uplift to Aurora and Beta;
   Then re-introduce MOZ_DIAGNOSTIC_ASSERT's so that these bad situations become visible on Nightly&Aurora again.
The diagnostic assert is already there in CreateMimeType. 

If I was to remove the new assert, the one in CreateMimeType would now never be called. 

I was thinking we could add diagnostic with telemetry. Reports whenever something like this is occurring. 

Having said that, I have a full rework on the handling of tasks pending. Which would eliminate those issues completely (having said that, what we're seeing here is in effect not possible!)
Comment on attachment 8733704 [details]
MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41871/diff/1-2/
Attachment #8733704 - Attachment description: MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r?gerald → MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald
Comment on attachment 8733705 [details]
MozReview Request: Bug 1258410: [MSE] P2. Disconnect init promise if any pending. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41873/diff/2-3/
Attachment #8733705 - Attachment description: MozReview Request: Bug 1258410: [MSE] P2. Disconnect init promise if any pending. r?gerald → MozReview Request: Bug 1258410: [MSE] P2. Disconnect init promise if any pending. r=gerald
Comment on attachment 8733705 [details]
MozReview Request: Bug 1258410: [MSE] P2. Disconnect init promise if any pending. r=gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41873/diff/3-4/
https://hg.mozilla.org/mozilla-central/rev/8573ffb50d47
https://hg.mozilla.org/mozilla-central/rev/3b23fe0cdd44
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8733704 [details]
MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald

Request for both patch

Approval Request Comment
[Feature/regressing bug #]:1258410
[User impact if declined]:crashes
[Describe test coverage new/current, TreeHerder]: in central, we had a code that would just diagnostic_assert, it now softly fails
[Risks and why]: low, crash rate didn't increase in central and aurora despite the release_assert. Will continue to monitor things
[String/UUID change made/needed]:none
Attachment #8733704 - Flags: approval-mozilla-beta?
Attachment #8733704 - Flags: approval-mozilla-aurora?
Blocks: 1259274
Comment on attachment 8733704 [details]
MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald

Possible fix for topcrash. Should add diagnostic info. 
Let's try landing this for beta 5.
Attachment #8733704 - Flags: approval-mozilla-beta?
Attachment #8733704 - Flags: approval-mozilla-beta+
Attachment #8733704 - Flags: approval-mozilla-aurora?
Attachment #8733704 - Flags: approval-mozilla-aurora+
Tracking belatedly since it was a topcrash in beta.
We should verify this fix once beta 6 has been out for a few days (I don't think this made it into beta 5)
Flags: qe-verify+
Assuming ni? related to finding someone to fix the issue
Flags: needinfo?(ajones)
(In reply to Jean-Yves Avenard [:jya] from comment #6)
> bug 1258562 is fixed and uplift requests are pending
OK, thanks. As you saw, I am going to take in a potential 45.0.2
Flags: needinfo?(sledru)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #24)
> We should verify this fix once beta 6 has been out for a few days (I don't
> think this made it into beta 5)

there's been no crash in 46.0b6 nor any in the past 7 days in builds containing the fixes.
(In reply to Jean-Yves Avenard [:jya] from comment #27)
> there's been no crash in 46.0b6 nor any in the past 7 days in builds
> containing the fixes.

If we take into account the ADI for 46.0b6, I think the number of crashes for this signature is almost meaningless:
date          2016-03-29  2016-03-30  2016-03-31
adi_count     322         3710        7353

For 46.0b5:
adi_count     1530971     1574075     1551512
crashes_count 122         126         91
There are no crashes in the last 28 days on versions > 46b5, 48.0a1 (2016-03-23), for all the 3 signatures.
Marking as verified fixed.
Status: RESOLVED → VERIFIED
No new crashes with this signatures present in Socorro for 47 branch. Marking accordingly.
Flags: qe-verify+
[Tracking Requested - why for this release]: Any plans to update ESR branch? As it's also affected , see for example:( http://forums.mozillazine.org/viewtopic.php?f=9&t=3026098 )
Comment on attachment 8733704 [details]
MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Crashes, including when using YouTube
Fix Landed on Version: 46
Risk to taking this patch (and alternatives if risky): Can't think of any, has been in central for almost 1 year.
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(jyavenard)
Attachment #8733704 - Flags: approval-mozilla-esr45?
Duplicate of this bug: 1325802
Comment on attachment 8733704 [details]
MozReview Request: Bug 1258410: [MSE] P1. Abort if mInputDemuxer has been reset. r=gerald

This doesn't match the esr criteria (sec high and critical only) but, because the crash volume is huge, taking it to esr.
Attachment #8733704 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
You need to log in before you can comment on or make changes to this bug.