Handle device resets in the GPU process

RESOLVED FIXED in Firefox 52

Status

()

defect
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: dvander, Assigned: rhunt)

Tracking

unspecified
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(2 attachments, 1 obsolete attachment)

The final Present call in CompositorD3D11 can indicate that the device was reset, in which case the compositor needs to be restarted and content repainted. Normally we detect this before painting a frame in the UI, and propagate it via nsBaseWidget/CompositorSession [1] [2]. It's a little complicated but it gets the job done.

With the GPU process we have an opportunity to change this, or we could just keep it as-is for now, but we do need to make sure something works. Possibilities:
 (1) Send a message back to the GPUChild, asking it to terminate the GPU process.
 (2) Send a message back to the CompositorWidget, telling it to call [1].
 (3) Send a message back to the GPUChild, asking it to throw away CompositorSessions *without* throwing away the GPU process. This would re-use the machinery from GPU process restarts - just rebuilding the IPDL bridges and repainting.

I suspect (1) or (2) are the easiest ways forward for now. Possible ways to test are doing a driver upgrade/downgrade while Firefox is running, or disabling/re-enabling the drivers in Device Manager. Bas also has some external tool to trigger resets.

[1] http://searchfox.org/mozilla-central/rev/8cf1367dd89cc36ef8f025dfc6af6d5c086838a7/widget/nsBaseWidget.cpp#316
[2] http://searchfox.org/mozilla-central/rev/8cf1367dd89cc36ef8f025dfc6af6d5c086838a7/widget/windows/nsWindowGfx.cpp#180
Assignee: dvander → rhunt
Whiteboard: [gfx-noted]
Priority: -- → P2
Assignee

Comment 1

3 years ago
I've gotten a few different methods working with pros and cons. Option (1) works well and is easy to implement but has the drawback that we lose the GPU process and fall back into software. Option (2) works very well and is also relatively easy to implement and has the advantage that we keep the GPU process. The only extra work is in resetting the dx devices so we keep acceleration. Option (3) does not seem to be feasible at this time. I ran into a few different shutdown issues going down this route.

Looking into it more, I identified a potential Option (4), which is when we detect a device reset in the compositor, we post a task to initiate resets locally, and don't go through the UI process as in Option (2). Although we need to initiate a repaint and invalidate layers after the reset, and so a message to the UI process still needs to be sent just like in Option (2). We could reuse this message here [1]. The advantage of this approach is that it makes the most sense in a GPU process only world. The problem is that I can't figure out a way to make it work well when we don't have the GPU process.

So I think going with Option (2) is best as it is a simple change and works well.

[1] http://searchfox.org/mozilla-central/source/gfx/layers/ipc/CompositorBridgeChild.cpp#328
Assignee

Comment 2

3 years ago
Add a ResetCompositor IPDL message and make it so that CompositorSessions can be Reset. LocalCompositorSessions can use the existing mechanism, while RemoteCompositorSessions can use IPDL.
Attachment #8808006 - Flags: review?(dvander)
Assignee

Comment 3

3 years ago
This patch detects when a device reset has happened and syncs the information to the UI process where it hooks into the existing code in nsBaseWidget for device resets. It also handles the DX11 device reset.
Attachment #8808007 - Flags: review?(dvander)
Attachment #8808006 - Flags: review?(dvander) → review+
Comment on attachment 8808007 [details] [diff] [review]
Sync device resets to the UI process

Review of attachment 8808007 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/ipc/GPUParent.cpp
@@ +128,5 @@
> +#ifdef XP_WIN
> +  auto deviceManager = DeviceManagerDx::Get();
> +
> +  DeviceResetReason reason;
> +  if (!deviceManager->GetAnyDeviceRemovedReason(&reason)) {

Suggesting a small change here: don't return if it's false. We probably always want to reset the device if the compositor found a reset. However, if it returns true, we should collect telemetry like here:

http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/gfx/thebes/gfxWindowsPlatform.cpp#443

Maybe there is a way to common this out to DeviceManagerDx, but if not, I'm not worried about it.

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +996,5 @@
>      *aRenderBoundsOut = IntRect();
> +
> +    // If we are in the GPU process then the main process doesn't
> +    // know that a device reset has happened and needs to be informed
> +    if (GPUParent::GetSingleton() && !mNotifiedParentDeviceReset) {

You can use XRE_IsGPUProcess() here.

@@ +998,5 @@
> +    // If we are in the GPU process then the main process doesn't
> +    // know that a device reset has happened and needs to be informed
> +    if (GPUParent::GetSingleton() && !mNotifiedParentDeviceReset) {
> +      mNotifiedParentDeviceReset = true;
> +      GPUParent::GetSingleton()->NotifyDeviceReset();

mNotifiedParentDeviceReset is a little odd, since it's per-compositor but all the compositors share one device. Maybe it's better to just defer to NotifyDeviceReset, and handle duplicate messages there? Otherwise we could reset multiple times by accident.

Cheesy way to do it would be for compositors to pass the ID3D11Device in a RefPtr to NotifyDeviceReset. Then GPUParent would only do the reset if the device ptr matched what's in DeviceManagerDx.

One thing I'm worried about is that we'll have two ID3D11Devices in flight while all the compositors are busy getting reset notifications from the UI process. But I think that behavior is pre-existing, and at least now if we crash, we have a fallback.
Attachment #8808007 - Flags: review?(dvander)
Also: very happy that this approach worked out! After this we should have the best device-reset handling we've ever had.
Assignee

Comment 6

3 years ago
(In reply to David Anderson [:dvander] from comment #4)
> Comment on attachment 8808007 [details] [diff] [review]
> Sync device resets to the UI process
> 
> Review of attachment 8808007 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/ipc/GPUParent.cpp
> @@ +128,5 @@
> > +#ifdef XP_WIN
> > +  auto deviceManager = DeviceManagerDx::Get();
> > +
> > +  DeviceResetReason reason;
> > +  if (!deviceManager->GetAnyDeviceRemovedReason(&reason)) {
> 
> Suggesting a small change here: don't return if it's false. We probably
> always want to reset the device if the compositor found a reset. However, if
> it returns true, we should collect telemetry like here:
> 
> http://searchfox.org/mozilla-central/rev/
> f5c9e9a249637c9abd88754c8963ecb3838475cb/gfx/thebes/gfxWindowsPlatform.
> cpp#443
> 
> Maybe there is a way to common this out to DeviceManagerDx, but if not, I'm
> not worried about it.
> 

I can't see a good way to factor this out to DeviceManagerDx without cluttering the code there. Most code like this seems to go in gfxWindowsPlatform, which we can't use here.

I'm not the happiest about having the code in GPUParent, but the initialization code for DxDeviceManager is in GPUParent too. So at least it's all in one place.

> @@ +998,5 @@
> > +    // If we are in the GPU process then the main process doesn't
> > +    // know that a device reset has happened and needs to be informed
> > +    if (GPUParent::GetSingleton() && !mNotifiedParentDeviceReset) {
> > +      mNotifiedParentDeviceReset = true;
> > +      GPUParent::GetSingleton()->NotifyDeviceReset();
> 
> mNotifiedParentDeviceReset is a little odd, since it's per-compositor but
> all the compositors share one device. Maybe it's better to just defer to
> NotifyDeviceReset, and handle duplicate messages there? Otherwise we could
> reset multiple times by accident.
> 
> Cheesy way to do it would be for compositors to pass the ID3D11Device in a
> RefPtr to NotifyDeviceReset. Then GPUParent would only do the reset if the
> device ptr matched what's in DeviceManagerDx.
> 

Hmm yeah I agree. Passing the device in a refptr would work. But that would force the method to be wrapped in #ifdef XP_WIN which isn't the greatest.

My attempt at this problem was to use GetAnyDeviceRemovedReason() to see if we'd already sent the reset. Because after we have run through NotifyDeviceReset(), GetAnyDeviceRemovedReason() should be false. Even if we don't manage to reinit a new compositordevice we'd still be left with null from Reset().

Would this be a solution?

> One thing I'm worried about is that we'll have two ID3D11Devices in flight
> while all the compositors are busy getting reset notifications from the UI
> process. But I think that behavior is pre-existing, and at least now if we
> crash, we have a fallback.

Yeah I can see that being a problem too. I think the fix might be to batch all the ResetCompositor messages and reset everything first then do the new compositor construction. The fix for this would also apply to the non-gpu process system too.

But I think the current behavior is good for the first attempt.
(In reply to Ryan Hunt [:rhunt] from comment #6)
> (In reply to David Anderson [:dvander] from comment #4)
> > Comment on attachment 8808007 [details] [diff] [review]
> > Sync device resets to the UI process
> > 
> I can't see a good way to factor this out to DeviceManagerDx without
> cluttering the code there. Most code like this seems to go in
> gfxWindowsPlatform, which we can't use here.
> 

Maybe add a function called MaybeResetAndReacquireDevices() or something, that returns true if a reset happened? That way all the device stuff (and the telemetry) is encapsulated in DeviceManagerDx, even if gfxWindowsPlatform can't share it.

> My attempt at this problem was to use GetAnyDeviceRemovedReason() to see if
> we'd already sent the reset. Because after we have run through
> NotifyDeviceReset(), GetAnyDeviceRemovedReason() should be false. Even if we
> don't manage to reinit a new compositordevice we'd still be left with null
> from Reset().

Yup, okay, I think that works. We can just remove the per-compositor notification bit then.

> Yeah I can see that being a problem too. I think the fix might be to batch
> all the ResetCompositor messages and reset everything first then do the new
> compositor construction. The fix for this would also apply to the non-gpu
> process system too.
> 
> But I think the current behavior is good for the first attempt.

Yeah, we can leave that for another day. It's hard because you have to basically shut down all the layer managers and compositors before you swap devices.
Assignee

Comment 8

3 years ago
Fixes + remove per compositor flag
Attachment #8808007 - Attachment is obsolete: true
Attachment #8808786 - Flags: review?(dvander)
Comment on attachment 8808786 [details] [diff] [review]
Sync device resets to the UI process

Review of attachment 8808786 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +995,5 @@
>      *aRenderBoundsOut = IntRect();
> +
> +    // If we are in the GPU process then the main process doesn't
> +    // know that a device reset has happened and needs to be informed
> +    if (XRE_GetProcessType() == GeckoProcessType_GPU) {

Can use XRE_IsGPUProcess() here.
Attachment #8808786 - Flags: review?(dvander) → review+

Comment 11

3 years ago
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cf317263531
Allow sending reset compositor messages over IPDl. r=dvander
https://hg.mozilla.org/integration/mozilla-inbound/rev/31fe465d3932
Sync a device reset from GPU process to main process. r=dvander

Comment 12

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0cf317263531
https://hg.mozilla.org/mozilla-central/rev/31fe465d3932
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.