Open Bug 1277626 Opened 4 years ago Updated 2 years ago

Crashes in CreateDecodeDeviceLH

Categories

(Core :: Audio/Video: Playback, defect, P3, critical)

Unspecified
Windows
defect

Tracking

()

REOPENED
mozilla50
Tracking Status
firefox48 - affected
firefox49 --- affected
firefox50 --- affected
firefox51 --- affected
firefox52 --- wontfix
firefox53 --- affected

People

(Reporter: milan, Assigned: mattwoodrow)

References

Details

(Keywords: crash, stale-bug)

Crash Data

Attachments

(4 files)

These crashes represent ~43% of all Intel driver crashes.  The top two signatures ~17% of all Intel driver crashes are in this group.

They also account for 42% of all ATI crashes in the said DLL, and 9% of all the Nvidia crashes.

I'd say we take a hard look at these ones.  Let me know how graphics can help.
Flags: needinfo?(bugs)
Flags: needinfo?(ajones)
Matt, can we wrap try/catch around the mDecoderService->CreateVideoDecoder call in here 
https://dxr.mozilla.org/mozilla-central/source/dom/media/platforms/wmf/DXVA2Manager.cpp#223 and would it catch the crash and help us recover?
Flags: needinfo?(matt.woodrow)
I can randomly see some of these are D3D9 DXVA, but I can't tell if it's all of them.
Attached image Crashes graph
From http://ashughes1.github.io/metrics-graphics-gfx/, put CreateDecodeDeviceLH in the input box, press Graph if nothing shows up.

Is the spike because of the release of 44?
(In reply to Milan Sreckovic [:milan] from comment #2)
> Matt, can we wrap try/catch around the mDecoderService->CreateVideoDecoder
> call in here 
> https://dxr.mozilla.org/mozilla-central/source/dom/media/platforms/wmf/
> DXVA2Manager.cpp#223 and would it catch the crash and help us recover?

This callsite is outside of outside of our startup crash guards (only the ::Init function is covered).

I think we should definitely fix that, so that these crashes only happen once, rather than repeatedly.

I'm a little scared about try/catch, since we don't know if any memory has been corrupted, so we might just crash later on in places that make it harder to track.

Ultimately, I think the solution is to get accelerated video decoding running out of process (we'll be discussing this more in London), and to have crash guards so that we don't repeatedly crash.

How about we start with the crash guard, and then consider adding try/catch or blacklisting if we decide we want to prevent the one-off crash too.
Flags: needinfo?(matt.woodrow)
Crash guard would be awesome, with the annotations, maybe even with telemetry?
(In reply to Milan Sreckovic [:milan] from comment #0)
> With Intel graphics:
> 
> https://crash-stats.mozilla.com/search/?product=Firefox&signature=~igdumd32.
> dll&proto_signature=~CreateDecodeDeviceLH&_facets=signature&_columns=date&_co
> lumns=signature&_columns=product&_columns=version&_columns=build_id&_columns=
> platform#facet-signature
> 
> With AMD graphics:
> 
> https://crash-stats.mozilla.com/search/
> ?product=Firefox&proto_signature=~CreateDecodeDeviceLH&signature=~atiumdva&_f
> acets=signature&_columns=date&_columns=signature&_columns=product&_columns=ve
> rsion&_columns=build_id&_columns=platform#facet-signature
> 
> With Nvidia graphics:
> 
> https://crash-stats.mozilla.com/search/
> ?product=Firefox&proto_signature=~CreateDecodeDeviceLH&signature=~nvd3dum.
> dll&_facets=signature&_columns=date&_columns=signature&_columns=product&_colu
> mns=version&_columns=build_id&_columns=platform#facet-signature

Why do these queries not return crashes in 48 and 49? Did we blacklist some of these drivers in 48/49? We blacklisted a bunch of ATI drivers for D3D9 in 49 in bug 1265281, and igdumd64.dll in bug 1271525. But those are only in 49 thus far; they've not been uplift yet.
David, can you add a crash guard for this call site too please?
Flags: needinfo?(dvander)
(In reply to Chris Pearce (:cpearce) from comment #7)
> ...
> Why do these queries not return crashes in 48 and 49? Did we blacklist some
> of these drivers in 48/49? We blacklisted a bunch of ATI drivers for D3D9 in
> 49 in bug 1265281, and igdumd64.dll in bug 1271525. But those are only in 49
> thus far; they've not been uplift yet.

Good question.  I don't know enough about video changes that went into 48, but it's also possible something else (e.g. in graphics) changed that moved or masked these crashes.

I would suggest we add telemetry for how many actually get through these crash guards.  We probably don't want to end up in a situation where we block everything.
OS: Unspecified → Windows
(In reply to Matt Woodrow (:mattwoodrow) from comment #8)
> David, can you add a crash guard for this call site too please?

This may be a good first project for Ryan?
Chris Pearce is aware of this so I'll remove the needinfo.
Component: Audio/Video → Audio/Video: Playback
Flags: needinfo?(ajones)
Priority: -- → P1
Assignee: nobody → rhunt
Status: NEW → ASSIGNED
Flags: needinfo?(dvander)
Comment on attachment 8762189 [details] [diff] [review]
Added crash guards in SupportsConfig, telemetry in D3D*VideoCrashGuard.

Review of attachment 8762189 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/src/DriverCrashGuard.cpp
@@ +506,5 @@
>  
> +void
> +D3D9VideoCrashGuard::Initialize()
> +{
> +  if (!XRE_IsParentProcess()) {

We basically only run videos in the child process, so this is going to stop this being effective for most cases.
Attachment #8762189 - Flags: review?(matt.woodrow) → review-
Attached patch bug1277626.patchSplinter Review
Attachment #8763269 - Flags: review?(matt.woodrow)
Comment on attachment 8763269 [details] [diff] [review]
bug1277626.patch

Talked about this offline, but: this still won't work because NS_IsMainThread() will be false. The easiest (or maybe only sane way) to fix this would be to have a new PDriverCrashGuard top-level protocol that PContentParent opens on the main thread and PContentChild bridges on the video thread.

The other solution would be doing some hack to block the video thread and post a message to the main event loop, and have that event wake up the video thread when we've received a crash guard response from the parent process.

With either of these solutions we'd have to make sure we're not going to run into deadlocks (i.e. cpow to Parent -> Child -> Video and guard Video -> Parent).

Milan, is this crash severe enough that we want a fix in the next release or two, or is it something that can wait for out-of-process video?
Flags: needinfo?(milan)
Attachment #8763269 - Flags: review?(matt.woodrow)
More than 4000 crashes in a week, and this is with the blocklisting we are doing on a number of configurations, so I would like to pursue this more, and see what we can come up with that doesn't need OOP video.
Flags: needinfo?(milan)
I think the best approach here would be to defer CreateDecoder calls until the driver crash guard has been confirmed. The way it would work is like this:

1. Decoder thread dispatches to the main thread: "initiate a driver crash guard with the parent process".
2. Decoder thread then goes idle.
3. Main thread notifies the decoder thread whether or not it is okay to proceed.
4. If okay, decoder thread calls CreateDecoder.
5. If the call does not crash, decoder dispatches to the main thread "tell the parent process that the decoder did not crash".

Media folks, how hard would it be to factor out this initialization step so it can be interrupted and resumed?
Assignee: rhunt → dvander
Flags: needinfo?(cpearce)
Flags: needinfo?(jyavenard)
(In reply to David Anderson [:dvander] from comment #17)
> Media folks, how hard would it be to factor out this initialization step so
> it can be interrupted and resumed?

I think jya is the best person to answer (and implement) this.

I expect it would involve changing MediaFormatReader::EnsureDecoderCreated() to return a MozPromise, and then changing all PlatformDecoderModules' CreateXXDecoder() functions to also return MozPromises. jya may be able to come up with a better solution here.
Flags: needinfo?(cpearce)
we should simply move the WMF creation to be performed during the initialisation stage rather than in the constructor like it does now.

All other platforms (linux, mac, android) are already doing it that way. Unfortunately not on WMF.

One issue remain however is how to determine if the platform supports the codec or not. Right now this is done by creating a decoder and simply checking if the creation succeeded. It can't be promise based.
Assignee: dvander → jyavenard
Flags: needinfo?(jyavenard)
[Tracking Requested - why for this release]: 3700+ crashes in the past week.
Crash Signature: [@ @0x0 | igdumd32.dll | atiu9pag.dll | CreateDecodeDeviceLH ] [@ atiumdva.dll | atiu9pag.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumdim32.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumdim32.dll | CreateDecodeDeviceLH ]
Crash Signature: [@ @0x0 | igdumd32.dll | atiu9pag.dll | CreateDecodeDeviceLH ] [@ atiumdva.dll | atiu9pag.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumdim32.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumdim32.dll | CreateDecodeDeviceLH ] → [@ @0x0 | igdumd32.dll | atiu9pag.dll | CreateDecodeDeviceLH ] [@ atiumdva.dll | atiu9pag.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumdim32.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumdim32.dll | CreateDecodeDeviceLH ] [@ nvd3dum.dll | CreateDecodeDev…
Track this as there is high volume of crashes in 48.
Hi Blake,
Can you help on this?
Flags: needinfo?(bwu)
(In reply to Gerry Chang [:gchang] from comment #22)
> Hi Blake,
> Can you help on this?
Anthony has talked with jya about this via IRC, jya will have a go at it on Monday.
Flags: needinfo?(bwu)
Flags: needinfo?(jyavenard)
Was those crashes present in 47 and earlier?

There's been no much changes in the way decoders are created and initialised since 41. because if it's something spiking in 48 it may be something else... (or maybe just dxva11 crashing quicker)
Flags: needinfo?(jyavenard)
Flags: needinfo?(jyavenard)
Assignee: jyavenard → matt.woodrow
Flags: needinfo?(jyavenard)
Flags: needinfo?(bugs)
Attachment #8771836 - Flags: review?(jyavenard)
Attachment #8771836 - Flags: review?(dvander)
Comment on attachment 8771836 [details] [diff] [review]
main-thread-crash-guard-video

Review of attachment 8771836 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
@@ +508,5 @@
> +    , mSupportsConfig(false)
> +  {}
> +
> +  NS_IMETHOD Run() {
> +    NS_ASSERTION(NS_IsMainThread(), "Must be on main thread.");

Strong assert would be better no?

@@ +553,5 @@
> +    new SupportsConfigEvent(mDXVA2Manager, aType, framerate);
> +
> +  if (NS_IsMainThread()) {
> +    event->Run();
> +  }

} else {
Attachment #8771836 - Flags: review?(jyavenard) → review+
Comment on attachment 8771836 [details] [diff] [review]
main-thread-crash-guard-video

Review of attachment 8771836 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
@@ +555,5 @@
> +  if (NS_IsMainThread()) {
> +    event->Run();
> +  }
> +  else {
> +    NS_DispatchToMainThread(event, NS_DISPATCH_SYNC);

And this won't deadlock, right?
Attachment #8771836 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #27)
> Comment on attachment 8771836 [details] [diff] [review]
> main-thread-crash-guard-video
> 
> Review of attachment 8771836 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/platforms/wmf/WMFVideoMFTManager.cpp
> @@ +555,5 @@
> > +  if (NS_IsMainThread()) {
> > +    event->Run();
> > +  }
> > +  else {
> > +    NS_DispatchToMainThread(event, NS_DISPATCH_SYNC);
> 
> And this won't deadlock, right?

I *believe* so, but it's hard to be entirely sure. We do the same thing in Init (from the same thread) without issues, so that's a good sign.
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6047635058ed
Create test DXVA decoders on the main thread so that we can protect them with a crash guard. r=jya,dvander
https://hg.mozilla.org/mozilla-central/rev/6047635058ed
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Crash volume for signature '@0x0 | igdumd32.dll | atiu9pag.dll | CreateDecodeDeviceLH':
 - Ni.  (version 51): 0 crashes from 2016-08-01.
 - Aur. (version 50): 0 crashes from 2016-08-01.
 - Beta (version 49): 8 crashes from 2016-08-02.
 - Rel. (version 48): 2 crashes from 2016-07-25.
 - Esr  (version 45): 0 crashes from 2016-04-20.

Crash volume on the last weeks (Week N is from 08-08 to 08-14):
              W. N-1  W. N-2
 - Beta (49)       4
 - Rel. (48)       0       0

Affected platform: Windows

Crash volume for signature 'igdumd32.dll | RtlConsoleMultiByteToUnicodeN | igdumd32.dll | CreateDecodeDeviceLH':
 - Ni.  (version 51): 0 crashes from 2016-08-01.
 - Aur. (version 50): 0 crashes from 2016-08-01.
 - Beta (version 49): 46 crashes from 2016-08-02.
 - Rel. (version 48): 30 crashes from 2016-07-25.
 - Esr  (version 45): 3 crashes from 2016-04-20.

Crash volume on the last weeks (Week N is from 08-08 to 08-14):
              W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - Beta (49)      18
 - Rel. (48)      15       0
 - Esr  (45)       0       0       0       2       0       0       0

Affected platform: Windows

Crash volume for signature 'nvd3dum.dll | CreateDecodeDeviceLH':
 - Ni.  (version 51): 0 crashes from 2016-08-01.
 - Aur. (version 50): 0 crashes from 2016-08-01.
 - Beta (version 49): 40 crashes from 2016-08-02.
 - Rel. (version 48): 51 crashes from 2016-07-25.
 - Esr  (version 45): 98 crashes from 2016-04-20.

Crash volume on the last weeks (Week N is from 08-08 to 08-14):
              W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - Beta (49)      11
 - Rel. (48)      12       0
 - Esr  (45)      12       9       5      10      17      17       4

Affected platform: Windows

Crash volume for signature 'atiumdva.dll | atiu9pag.dll | CreateDecodeDeviceLH':
 - Ni.  (version 51): 0 crashes from 2016-08-01.
 - Aur. (version 50): 0 crashes from 2016-08-01.
 - Beta (version 49): 104 crashes from 2016-08-02.
 - Rel. (version 48): 173 crashes from 2016-07-25.
 - Esr  (version 45): 61 crashes from 2016-04-20.

Crash volume on the last weeks (Week N is from 08-08 to 08-14):
              W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - Beta (49)      47
 - Rel. (48)      74       0
 - Esr  (45)       5       9      11      12       6       9       2

Affected platform: Windows

Crash volume for signature '@0x0 | igdumdim32.dll | CreateDecodeDeviceLH':
 - Ni.  (version 51): 0 crashes from 2016-08-01.
 - Aur. (version 50): 2 crashes from 2016-08-01.
 - Beta (version 49): 181 crashes from 2016-08-02.
 - Rel. (version 48): 55 crashes from 2016-07-25.
 - Esr  (version 45): 436 crashes from 2016-04-20.

Crash volume on the last weeks (Week N is from 08-08 to 08-14):
              W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - Aur. (50)       1
 - Beta (49)      57
 - Rel. (48)      26       0
 - Esr  (45)      80      74      55      69      51      41      17

Affected platform: Windows
According to comment #31, the volume is not that big.
However, Liz would be probably happy to take a patch
Flags: needinfo?(matt.woodrow)
Keywords: crash
Comment on attachment 8771836 [details] [diff] [review]
main-thread-crash-guard-video

Approval Request Comment
[Feature/regressing bug #]: Unknown, long standing bug
[User impact if declined]: Buggy graphics drivers can repeatedly crash firefox/content process. This stops us from attempting the buggy functionality more than once.
[Describe test coverage new/current, TreeHerder]: Minimal.
[Risks and why]: Very low risk, expands the existing crash guard mechanism to another callsite.
[String/UUID change made/needed]: None.
Flags: needinfo?(matt.woodrow)
Attachment #8771836 - Flags: approval-mozilla-beta?
Comment on attachment 8771836 [details] [diff] [review]
main-thread-crash-guard-video

Stopping repeated crashes sounds good, let's take this for beta 6 (or 7 depending when we can build today)
Attachment #8771836 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Doesn't look like this is fixed, based on the crash reports?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
signatures like this are still appearing in current nightly builds. on 51.0b [@ @0x0 | igdumd32.dll | CreateDecodeDeviceLH] is accounting for 0.3% of browser crashes...
Crash Signature: CreateDecodeDeviceLH ] [@ igdumd32.dll | RtlConsoleMultiByteToUnicodeN | igdumd32.dll | CreateDecodeDeviceLH ] → CreateDecodeDeviceLH ] [@ igdumd32.dll | RtlConsoleMultiByteToUnicodeN | igdumd32.dll | CreateDecodeDeviceLH ] [@ @0x0 | igdumd32.dll | CreateDecodeDeviceLH ] [@ igdumd32.dll | CreateDecodeDeviceLH ]
Mass wontfix for bugs affecting firefox 52.
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
Severity: normal → critical
You need to log in before you can comment on or make changes to this bug.