Closed Bug 1048099 Opened 5 years ago Closed 5 years ago

crash in mozilla::layers::DXGITextureHostD3D9::OpenSharedHandle()

Categories

(Core :: Graphics, defect, critical)

32 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla34
Tracking Status
firefox33 + verified
firefox34 --- verified

People

(Reporter: dmajor, Assigned: nical)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-a9a24db6-98de-47d1-a0f9-521562140731.
=============================================================

First seen on Nightly 7/30 and Aurora 7/31. Currently #7 crash on Nightly 34 and #12 on Aurora 33.

95% Intel gfx.

The crash addresses are strangely consistent. On x86, the device from GetDevice() always has a vtable pointer of 0x8b55ff8b. On x64, it's 0x90c1834890418b48 (crash-stats shows it as 0xff... due to a bug).
Flags: needinfo?(nical.bugzilla)
[Tracking Requested - why for this release]:
Interestingly, googling "0x8b55ff8b" shows a lot of crash reports in different programs, always on Windows. There is often mention of an uncaught C++ exception.
I see that DeviceManagerD3D9::DestroyDevice doesn't actually clear the DeviceManagerD3D9's mDevice member. This means that event if the deivce was reset for some reason, DeviceManagerD3D9::device() will happily return mDevice. I don't know that code at all but (from far away) it looks to me like the DeviceManager should null out mDevice in DestroyDevice or not return it in device() if mDeviceWasRemoved is equal to true, or even be recreated entirely.

This would explain why the regression is caused by bug 1041416, since the only difference with how the device is accessed between before and after the bug, is that before we would ask the DeviceManager from the gfxWindowPlatform singleton. The latter is notified when a device is lost and will recreate a new device manager.

Bas, is what I am saying making sense?
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla) → needinfo?(bas)
(In reply to Nicolas Silva [:nical] from comment #3)
> I see that DeviceManagerD3D9::DestroyDevice doesn't actually clear the
> DeviceManagerD3D9's mDevice member. This means that event if the deivce was
> reset for some reason, DeviceManagerD3D9::device() will happily return
> mDevice. I don't know that code at all but (from far away) it looks to me
> like the DeviceManager should null out mDevice in DestroyDevice or not
> return it in device() if mDeviceWasRemoved is equal to true, or even be
> recreated entirely.
> 
> This would explain why the regression is caused by bug 1041416, since the
> only difference with how the device is accessed between before and after the
> bug, is that before we would ask the DeviceManager from the
> gfxWindowPlatform singleton. The latter is notified when a device is lost
> and will recreate a new device manager.
> 
> Bas, is what I am saying making sense?

It does.
Flags: needinfo?(bas)
CompositorD3D9 actually has a mechanism to check the validity of the device, but it is run before we start to draw a frame (see CompositorD3D9::Ready). if the device is lost, a TextureD3D9 coming asynchronously from async-video may call device() before we handle recreating the device. Recreating the device involves invalidating the layer tree and a bunch of other things so I'd rather not have it triggered by async-video. So let's just make sure to not return a device in this case. This will delay the deserialization of the DXGI handle to the next time we lock the texture with a valid device.

The alternative is to have the DXGI TextureHost always use gfxWindowsPlatform's device as it used to, but I am not found of the idea of a texture and the compositor's device being out of sync. And it might complicate things if at some point we decide to have compositors use different devices.
Attachment #8467095 - Flags: review?(bas)
Comment on attachment 8467095 [details] [diff] [review]
Check that the d3d9 wasn't lost before providing access to it

Review of attachment 8467095 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d9/CompositorD3D9.h
@@ +91,5 @@
>    {
> +    // If the reset counts don't match it means the device was lost and we are
> +    // in the process of recreating a new one or will be soon.
> +    // cf. comment in EnsureSwapChain.
> +    return mDeviceManager && mDeviceResetCount == mDeviceManager->GetDeviceResetCount()

This is a little confusing, if it works, it's fine, but we need to make this all a little bit more understandable long term.
Attachment #8467095 - Flags: review?(bas) → review+
Crash, tracking.
Noting this is the #3 topcrash on Nightly right now with 956/14248 crashes in the last 7 days.
Keywords: topcrash
https://hg.mozilla.org/mozilla-central/rev/8deaf054ca33
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
nical, this needs and uplift to aurora, can you please request approval?
Flags: needinfo?(nical.bugzilla)
QA Whiteboard: [qa+]
Comment on attachment 8467095 [details] [diff] [review]
Check that the d3d9 wasn't lost before providing access to it

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Windows OMTC blocker
[Describe test coverage new/current, TBPL]:
[Risks and why]: low, very small change, has been on m-c for a week or so.
[String/UUID change made/needed]:
Attachment #8467095 - Flags: approval-mozilla-aurora?
Flags: needinfo?(nical.bugzilla)
Attachment #8467095 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
There are no reports of this crash with builds since the fix was checked in.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.