Force a repaint after TDRs

RESOLVED FIXED in Firefox 46

Status

()

RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: dvander, Assigned: dvander)

Tracking

(Blocks: 1 bug)

unspecified
mozilla46
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox46 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

When I get a TDR, Firefox just paints black. It looks like the only place that calls UpdateRenderMode() is nsWindow::OnPaint, which isn't necessarily going to be called anytime soon.
Created attachment 8639550 [details] [diff] [review]
fix

Not sure if this is a good approach. I wasn't sure if nsRefreshDriver should just directly clear compositors, or if we should RedrawWindow() as soon as we detect a TDR, or what.
Attachment #8639550 - Flags: review?(bas)
Hrm, a TDR -should- be triggering a WM_PAINT event after it has occurred, is it perhaps due to the order of events that something here is going wrong? I'm not against doing this, but I'd rather not if it's unneeded.
Do you happen ton know what's going on here/are able to reproduce this?
Flags: needinfo?(dvander)

Comment 4

3 years ago
I've been having frequent TDRs on my machine and seeing this behavior around 50% of the time (solid black non-repainting FF window after TDR). Is there something I can do to help you get data to answer your questions, Bas?
Comment on attachment 8639550 [details] [diff] [review]
fix

Review of attachment 8639550 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1204,5 @@
> +  if (!DidRenderingDeviceReset()) {
> +    return false;
> +  }
> +
> +  // Trigger an ::OnPaint for each window.

This may be worth a telemetry counter?
(In reply to Bas Schouten (:bas.schouten) from comment #2)
> Hrm, a TDR -should- be triggering a WM_PAINT event after it has occurred, is
> it perhaps due to the order of events that something here is going wrong?
> I'm not against doing this, but I'd rather not if it's unneeded.

I don't see evidence of this in the MSDN docs, outside of DirectComposition. I did a synthetic test for this bug, but I'll try to do a real one as well.
It's pretty easy to trigger device resets by disabling the GPU drivers in Device Manager. I'd say about 50% of the time when I do this, Firefox becomes unresponsive. It will still respond to input events, but painting is completely broken. Getting to the URL or getting the tab to reload sometimes fixes it.

When a TDR does happen, in a debug build, there's pretty much an endless amount of assertions that trigger. I think we would need an extension to gfxCriticalError to address this. I'm still trying to narrow down why painting stops.
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #7)
> It's pretty easy to trigger device resets by disabling the GPU drivers in
> Device Manager. I'd say about 50% of the time when I do this, Firefox
> becomes unresponsive. It will still respond to input events, but painting is
> completely broken. Getting to the URL or getting the tab to reload sometimes
> fixes it.
> 
> When a TDR does happen, in a debug build, there's pretty much an endless
> amount of assertions that trigger. I think we would need an extension to
> gfxCriticalError to address this. I'm still trying to narrow down why
> painting stops.

I think I've changed my mind and think we should do this. I feel we should fire this off with a 1 or 2 second delay though to make sure it arrives after the device reset has completed.

Fwiw any TDR I trigger locally on purpose doesn't cause any issues. My intel drivers seem to be crashy those and some of those -do-. They do seem to trigger a WM_PAINT, but that seems to arrive too 'early' (i.e. the paint is run before we seem to have reset the device), hence my suggestion of the delay just to be 100% sure this works.
I'm seeing similar results. My test is, I disable the Intel GPU, then re-enable it.

The initial disable causes a WM_PAINT, TDR, and then another WM_PAINT. When we re-acquire the device, there's a VendorID mismatch: nsIGfxInfo reports Intel (0x8086), but DXGI reports the Microsoft Basic Render Driver (0x1414). Firefox still seems to work fine.

When I re-enable the Intel device, nothing happens. Firefox continues painting, but none of the new pixels ever reach the screen, so it appears completely unresponsive.

If I Ctrl+L into the URL bar and type (which creates a child window), one of two things happens. About 50% of the time we immediately crash in CopySubresourceRegion deep in Intel drivers (this is probably what you saw). Sometimes we just get the WM_PAINT+TDR sandwich and resume painting fine. In the crash case, the DXGI adapter has the real vendor ID so probably we did get a TDR and the drivers are just buggy.

Anyway, that's all pretty mysterious. I'll test the delayed WM_PAINT tomorrow and see how that changes things.
(In reply to David Anderson [:dvander] from comment #7)

> When a TDR does happen, in a debug build, there's pretty much an endless amount of assertions that trigger. I think we would need an extension to gfxCriticalError to address this...

What type of enhancement?  gfxCriticalNote is something that only prints the first time it's hit, but that's probably not what we want - something more like "temporarily pause gfxCriticalError, we're cleaning things up"?  What did you have in mind?
(In reply to Milan Sreckovic [:milan] from comment #10)
> (In reply to David Anderson [:dvander] from comment #7)
> 
> > When a TDR does happen, in a debug build, there's pretty much an endless amount of assertions that trigger. I think we would need an extension to gfxCriticalError to address this...
> 
> What type of enhancement?  gfxCriticalNote is something that only prints the
> first time it's hit, but that's probably not what we want - something more
> like "temporarily pause gfxCriticalError, we're cleaning things up"?  What
> did you have in mind?

I was thinking of a version that takes an HRESULT and skips MOZ_ASSERT if it's device-removed related.
(In reply to Milan Sreckovic [:milan] from comment #10)
> (In reply to David Anderson [:dvander] from comment #7)
> 
> > When a TDR does happen, in a debug build, there's pretty much an endless amount of assertions that trigger. I think we would need an extension to gfxCriticalError to address this...
> 
> What type of enhancement?  gfxCriticalNote is something that only prints the
> first time it's hit, but that's probably not what we want - something more
> like "temporarily pause gfxCriticalError, we're cleaning things up"?  What
> did you have in mind?

Files where this would help:

  DrawTargetD2D1.cpp
  SourceSurfaceD2D1.cpp
  TextureD3D11.cpp
Created attachment 8699216 [details] [diff] [review]
fix rebased

Bas, it seems like the original fix, which was tested against synthetic resets, works for real TDRs as well. With this patch applied I can't reproduce either the painting or crash problems.

Here's a rebased version - I'd be curious if you could test as well.

I'm not sure if a timer is needed - the fact that we sometimes crash if we wait too long to flush layers is a point in favor of sending a WM_PAINT as soon as we detect the reset.
Attachment #8639550 - Attachment is obsolete: true
Attachment #8639550 - Flags: review?(bas)
Attachment #8699216 - Flags: review?(bas)
Comment on attachment 8699216 [details] [diff] [review]
fix rebased

Review of attachment 8699216 [details] [diff] [review]:
-----------------------------------------------------------------

Drive by.

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1205,5 @@
> +{
> +  if (!DidRenderingDeviceReset()) {
> +    return false;
> +  }
> +

Is it worth dropping some kind of a message in here, gfxCriticalNoteOnce perhaps, so that we can tell if there has been a handled reset at all?  Or maybe a telemetry probe, counting these?

::: layout/base/nsRefreshDriver.cpp
@@ +1637,5 @@
>  
>    AutoRestore<TimeStamp> restoreTickStart(mTickStart);
>    mTickStart = TimeStamp::Now();
>  
> +  gfxPlatform::GetPlatform()->UpdateForDeviceReset();

How expensive can this call get, when there has been no reset?
(In reply to Milan Sreckovic [:milan] from comment #14)
> Comment on attachment 8699216 [details] [diff] [review]
> fix rebased
> 
> Review of attachment 8699216 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Drive by.
> 
> ::: gfx/thebes/gfxWindowsPlatform.cpp
> @@ +1205,5 @@
> > +{
> > +  if (!DidRenderingDeviceReset()) {
> > +    return false;
> > +  }
> > +
> 
> Is it worth dropping some kind of a message in here, gfxCriticalNoteOnce
> perhaps, so that we can tell if there has been a handled reset at all?  Or
> maybe a telemetry probe, counting these?

Do you mean a reset caught specifically by the refresh driver, versus WM_PAINT? (DidRenderingDeviceReset already has telemetry, but with this patch it'll be lumped in with the previous TDR-catching callsites). Either way a critical note seems fine.

> ::: layout/base/nsRefreshDriver.cpp
> @@ +1637,5 @@
> >  
> >    AutoRestore<TimeStamp> restoreTickStart(mTickStart);
> >    mTickStart = TimeStamp::Now();
> >  
> > +  gfxPlatform::GetPlatform()->UpdateForDeviceReset();
> 
> How expensive can this call get, when there has been no reset?

Bas would know better than I, but since it's just probing the device for a flag I'm hoping it'll be quite fast.
Comment on attachment 8699216 [details] [diff] [review]
fix rebased

Review of attachment 8699216 [details] [diff] [review]:
-----------------------------------------------------------------

At the very least this can do no harm, we can always send -another- one a second later, it's not like WM_PAINT messages are that expensive.

::: layout/base/nsRefreshDriver.cpp
@@ +1637,5 @@
>  
>    AutoRestore<TimeStamp> restoreTickStart(mTickStart);
>    mTickStart = TimeStamp::Now();
>  
> +  gfxPlatform::GetPlatform()->UpdateForDeviceReset();

Basically free, all it does is call DidRenderingDeviceReset.
Attachment #8699216 - Flags: review?(bas) → review+

Comment 19

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c71782d6e6a1
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox46: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.