Closed Bug 1112822 Opened 10 years ago Closed 9 years ago

crash in mozilla::MediaTaskQueue::Dispatch(mozilla::TemporaryRef<nsIRunnable>)

Categories

(Core :: Audio/Video, defect, P1)

37 Branch
All
Windows 7
defect

Tracking


VERIFIED FIXED
mozilla38
Tracking Status
firefox36 + verified
firefox37 --- verified
firefox38 --- verified
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: marcia, Assigned: cpearce)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, reproducible)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-c39012c0-8b1c-4864-8230-5b5492141217.
=============================================================

Application Basics
------------------

Name: Firefox
Version: 37.0a1
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:37.0) Gecko/20100101 Firefox/37.0
Multiprocess Windows: 1/1

Crash Reports for the Last 3 Days
---------------------------------

Report ID: bp-6192743e-18da-4c69-99c8-c7c982141217
Submitted: 2 minutes ago

Report ID: bp-c39012c0-8b1c-4864-8230-5b5492141217
Submitted: 51 minutes ago

All Crash Reports

Extensions
----------

Graphics
--------

Adapter Description: Mobile Intel(R) HD Graphics
Adapter Drivers: igdumd64 igd10umd64 igd10umd64 igdumdx32 igd10umd32 igd10umd32
Adapter RAM: Unknown
Device ID: 0x0126
Direct2D Enabled: Blocked for your graphics driver version.
DirectWrite Enabled: false (6.2.9200.16571)
Driver Date: 5-19-2011
Driver Version: 8.830.7.0
GPU #2 Active: false
GPU Accelerated Windows: 0/1 Basic (OMTC) Blocked for your graphics driver version.
Subsys ID: 00000000
Vendor ID: 0x8086
WebGL Renderer: Blocked for your graphics driver version.
windowLayerManagerRemote: true
AzureCanvasBackend: skia
AzureContentBackend: cairo
AzureFallbackCanvasBackend: cairo
AzureSkiaAccelerated: 0

STR:
1. https://www.youtube.com/watch?v=-pRHO1psAYY
2. Switch to 2160p
3. The other active tab was playing https://www.youtube.com/watch?v=EOlkWAwbos4

Shortly after this I also crashed with a different stack - https://crash-stats.mozilla.com/report/index/571706c5-7236-4bc7-9b51-f66652141217.

Adding bholley and cpearce, since it looks like they touched code related to the crash stack.
The first crash looks like GetTaskQueue() is returning nullptr; the second looks like decoder.mDecoder is nullptr.

Can't reproduce either of these on OSX though.
(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> Can't reproduce either of these on OSX though.

Did you turn on mp4 for osx?
MP4 is the default for OSX now.
Keywords: reproducible
Jean-Yves - can you reproduce this issue?
Flags: needinfo?(jyavenard)
Isn't this a Windows issue? Can't reproduce it on Mac...
Flags: needinfo?(jyavenard)
I tried reproducing this on a Mac machine running Yosemite with no luck, but I was also able to reproduce on a different Windows 7 Lenovo laptop.
Chris - any thoughts? Are we mishandling some kind of failure? Could the stack size make a difference?
Flags: needinfo?(cpearce)
Blocks: MSE
Priority: -- → P1
Assignee: nobody → ajones
Status: NEW → ASSIGNED
Flags: needinfo?(cpearce)
We encountered this crash while hitting 'Run All' from https://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/tip.html?test_type=conformance-test&timestamp=1420443875533 using latest Aurora on Windows 7 64-bit. 

Using the same steps on Mac OS X 10.9.5 I got a new crash signature only in Aurora:

bp-2da6c285-85ad-4123-baa7-261c62150105 
bp-54aeedeb-e1d3-4c12-a3cc-541f12150105

Are these two crashes related, or should I log a new bug for them?

FYI, I was unable to reproduce this crash with the steps from comment 0, even using 5-6 different Nightly 36.0a1 builds.
Flags: needinfo?(cpearce)
Flags: needinfo?(ajones)
I think those crashes are caused by the MediaDecoderReader::mDecoder reference becoming stale after the MediaDecoder is destroyed but MediaDecoderReader::mDecoder isn't cleared. Probably the same cause, we can track them here I guess.
Flags: needinfo?(cpearce)
Recent crashes from yesterday - 2015010403 - indicate that this is still around in low volume.
Flags: needinfo?(ajones) → needinfo?(cpearce)
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #10)
> We encountered this crash while hitting 'Run All' from
> https://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/tip.
> html?test_type=conformance-test&timestamp=1420443875533 using latest Aurora
> on Windows 7 64-bit. 

I was able to repro a crash in Aurora/DevEdition using these STR, but the same STR don't crash Nightly.
I can reproduce the crash with a locally built Aurora. If I import 96b86345fe30 from bug 1108707 and copy dom/media/MediaPromise.* into my aurora tree and rebuild, I cannot reproduce the crash. (96b86345fe30 does not build without an updated MediaPromise.h)

Thus I conclude that this was fixed by bholley's async shutdown patch from bug 1108707, along with all the other fixes he's done to MediaPromise.

We should uplift those changesets to Aurora. We need some or all of the non-backed-out changesets from central's `hg log dom/media/MediaPromise.h`, as well as 96b86345fe30...

Ralph: What do you think?

Alternatively, we could just write a single patch that updates MediaPromise.* to central's. That would limit the impact of other changesets being pulled in... That could get messy if we later decide to uplift more changesets...
Flags: needinfo?(cpearce)
(forgot to ni? ...)

Ralph: What do you think about my previous comment; uplifting the MediaPromise fixes and 96b86345fe30/bug 1108707?
Flags: needinfo?(giles)
Yeah IMO we should aggressively uplift all the MediaPromise stuff to aurora - these have had a lot of time to bake on m-c, and we'd benefit from having the async interaction patterns be the same on nightly/aurora (and soon, beta).
Yep, I agree.
Flags: needinfo?(giles)
I've requested uplift for bug 1108707 and all the changes to MediaPromise.* that weren't already on Aurora. Please retest after those have landed, probably in the January 8 Developer Edition build.
[Tracking Requested - why for this release]:

This is #4 with 3.3% of all crashes on DevEd 36 in the last 3 days.
After the push from bug 1108707, I did not encounter any crash while running Conformance tests (as in comment 10) with Aurora 36.0a2 (Build ID: 20150108004006) on Windows 7 64-bit and Mac OS X 10.9.5. Based on https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3AMediaTaskQueue%3A%3ADispatch(mozilla%3A%3ATemporaryRef%3CnsIRunnable%3E)&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=3 there are still 2 crashes with Aurora from 2015-01-08.

Marcia, is this crash still reproducible for you?
Flags: needinfo?(mozillamarcia.knous)
Blocks: ytb37
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
This crash is still hanging around on Nightly, but in low volume. Latest build ID is 20150112030201 which is today's build. I am not able to reproduce this on the machine where I originally reproduced it when this bug was filed.

https://crash-stats.mozilla.com/report/index/135c44f5-ea42-416d-8216-f30022150112 was one report I dug up in crash stats.
Status: RESOLVED → REOPENED
Flags: needinfo?(mozillamarcia.knous)
Resolution: WORKSFORME → ---
Chris - can you look into this?
Assignee: ajones → cpearce
Flags: needinfo?(cpearce)
https://crash-stats.mozilla.com/report/index/dfa0e6b4-6340-4cb0-b34e-af4f52150112 was a recent Windows 7 crash, which is what I originally reported this crash on. The crash in Comment 22 is Windows 8.
I just checked crash stats again, and in early 38 data this is #14 in top crashes: https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/38.0a1/date_range_type/build?days=7 - although the crash volume is certainly not significant.
I don't fully understand how this can happen, but my theory is that TrackBuffer::EvictData() or EvictBefore() runs on the main thread and calls RemoveBuffer(), which dispatches an event to the reader's task queue to shut down the reader. When that event runs, we'll dispatch an event to the main thread to run Reader::BreakCycles(), which clears the reader's task queue reference. If that reader is the one we're decoding with, we may try to use it from the MSReader when it has a null task queue.

The fix here is to check in MP4Reader::Request*Data() whether MediaDecoderReader::Shutdown() has run, and reject the promise with CANCELED if so. Even if my theory is wrong, we should still be safe with this patch...
Attachment #8540463 - Attachment is obsolete: true
Flags: needinfo?(cpearce)
Attachment #8549324 - Flags: review?(matt.woodrow)
That sounds the same as https://bugzilla.mozilla.org/show_bug.cgi?id=1121288#c5, which might even cause exactly this crash in an opt build (since we won't hit the MOZ_ASSERT in RequestVideoData).

No real harm in returning early though.
Comment on attachment 8549324 [details] [diff] [review]
Patch: Don't let MP4Reader decode when shutdown

Review of attachment 8549324 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/fmp4/MP4Reader.cpp
@@ +521,5 @@
>  nsRefPtr<MediaDecoderReader::VideoDataPromise>
>  MP4Reader::RequestVideoData(bool aSkipToNextKeyframe,
>                              int64_t aTimeThreshold)
>  {
>    MOZ_ASSERT(GetTaskQueue()->IsCurrentThreadIn());

fwiw, we'll probably still crash on this line in a debug build. Not that it matters much.
Attachment #8549324 - Flags: review?(matt.woodrow) → review+
Will need uplift...
Flags: needinfo?(giles)
https://hg.mozilla.org/mozilla-central/rev/876b33985130
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Flags: needinfo?(giles)
Comment on attachment 8549324 [details] [diff] [review]
Patch: Don't let MP4Reader decode when shutdown

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: More crashes playing videos.
[Describe test coverage new/current, TBPL]: Landed on m-c.
[Risks and why]: Low. Change is straightforward and isolated.
[String/UUID change made/needed]: None.
Attachment #8549324 - Flags: approval-mozilla-beta?
Attachment #8549324 - Flags: approval-mozilla-aurora?
Attachment #8549324 - Flags: approval-mozilla-beta?
Attachment #8549324 - Flags: approval-mozilla-beta+
Attachment #8549324 - Flags: approval-mozilla-aurora?
Attachment #8549324 - Flags: approval-mozilla-aurora+
Flags: qe-verify+
I did not encounter any crash (with the steps from comment 0 and comment 10) using Firefox 36 Beta 7 (Build ID: 20150205114429) on Windows 7 64-bit and Mac OS X 10.9.5. 

However, over the last week we have 2 crashes on Socorro [*]: one on Firefox 36.0b5 (build ID: 20150129200438) and one on Firefox 36.0b6 (20150202183609).

Any thoughts about this? 

[*] - https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3AMediaTaskQueue%3A%3ADispatch%28mozilla%3A%3ATemporaryRef%3CnsIRunnable%3E%29#tab-reports
Depends on: 1129877
Socorro [1] stats over the past 4 weeks show:
* Firefox 36 - 1 crash in Beta 5 + 3 crashes in Beta 6
* Firefox 37 - 1 crash in builds after January 15th
* Firefox 38 - 0 crashes in builds after January 15th

[1] - https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=mozilla%3A%3AMediaTaskQueue%3A%3ADispatch%28mozilla%3A%3ATemporaryRef%3CnsIRunnable%3E%29#tab-reports

Given the very low volume and the fact that this could not be reproduced while testing manually, I'm marking this as verified.
Forgot to mention that there are also no more crashes with the signature reported in comment 10.
Status: RESOLVED → VERIFIED