Closed Bug 1232042 Opened 9 years ago Closed 8 years ago

repainting problems with combinations of maximizing, attaching/detaching (docking) Microsoft Surface Book, and restarting Firefox

Categories

(Core :: Graphics, defect)

x86
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox44 - wontfix
firefox45 + wontfix
firefox46 + wontfix
firefox47 + fixed

People

(Reporter: dbaron, Assigned: bas.schouten)

References

(Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

Attachments

(4 files, 2 obsolete files)

This is a bug that I've observed a few times on a new Microsoft Surface Book running Windows 10.

However, I haven't figured out steps to reproduce the bug.  However, I've determined that the minimum steps that are needed after rebooting the machine are some combination of:
 1. maximizing and unmaximizing Firefox
 2. starting and exiting Firefox (closing the window with the [X]), possibly while maximized
 3. detaching and re-attaching the tablet half of the machine from the laptop

After doing some of these things, we get into a situation where the first repaint after the size change resulting from maximizing / unmaximizing doesn't actually repaint, but then later repaints work.

(That said, I've also seen some graphics card failures (reset by Windows or whatever it is) and one kernel crash during attempts to reproduce this bug.

I showed it to :Bas in action, and he's reasonably convinced that there's something failing in Graphics code.


The machine has two graphics cards:
 1. Intel(R) HD Graphics 520 used when the tablet is detached (or sometimes, for some reasons, while it's attached too)
 2. an NVIDIA card (can't get the details right now since it's not currently being used for some reason), which is used much of the time when the tablet is attached to the keyboard half of the machine
Attached image screenshot #2
(not sure if rotation was involved here)
Attached image screenshot #3
This one definitely didn't involve rotation of the tablet (since the most recent reboot), and also didn't involve going to about:support.
It seems we're running into this with SurfaceBook often enough that we need to figure out what it is.  Bas, do you have access to one of those?
David, any error messages in graphics section of about:support when this happens?
Assignee: nobody → bas
Flags: needinfo?(bas)
Whiteboard: [gfx-noted]
Andrew, this about matches what you see?
Flags: needinfo?(overholt)
Yes, but i haven't pinned down an str yet. I see graphics crashes reported by the OS each time it happens and they often (always?) note the Intel driver.
Flags: needinfo?(overholt)
I've seen the graphics crash popup from the OS once or twice, but far from every time.
(In reply to Milan Sreckovic [:milan] from comment #4)
> It seems we're running into this with SurfaceBook often enough that we need
> to figure out what it is.  Bas, do you have access to one of those?
> David, any error messages in graphics section of about:support when this
> happens?

I do not have access to one of these. But I have heard the stories, and yes, we need to fix this.

I've also seen a lot of instability in the intel driver recently fwiw, but I'm not convinced there's a strong relation between the two. I get the intel driver instability with Edge as well.
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #4)
> David, any error messages in graphics section of about:support when this
> happens?

Not sure what I should be looking for there (or where), but I don't think so.

I just saw this without any docking/undocking.  I'd been using flightradar24 in fullscreen, and tried to exit fullscreen, but hit Ctrl+PgDn first (which switched tabs to google calendar), and then F11 to un-fullscreen (the difference between the two is just Ctrl vs. Fn), and then the UI above the tab strip failed to paint on that one repaint (but then repainted later as I took further actions).
[Tracking Requested - why for this release]: We now have a standalone application that reproduces this problem, and we've sent the bug report to Microsoft.  In the meantime, Bas is testing a workaround.  We may want to consider it for full uplifts, especially since last weeks Windows update seems to have made this somewhat worse (although the reports are sketchy.)
Attachment #8712616 - Flags: review?(jmuizelaar)
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

https://reviewboard.mozilla.org/r/32589/#review29451

::: gfx/ipc/GfxMessageUtils.h:791
(Diff revision 1)
> +    WriteParam(aMsg, (void*)aParam.mSwapChain);

Can you specialize ParamTraits for mSwapChain instead of casting to void*?

It also seems like this could cause a use after free because a reference count is not being sent over IPC.

Can we explicitly hold a reference in both places instead of relying on some implicit ownership scheme?
Tracking for 45+, since this sounds like a bad issue for Windows users and we may want to uplift a fix.
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32589/diff/1-2/
Attachment #8712616 - Flags: review?(jmuizelaar)
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

https://reviewboard.mozilla.org/r/32589/#review29781

::: gfx/ipc/GfxMessageUtils.h:790
(Diff revisions 1 - 2)
> +    MOZ_ASSERT(XRE_IsParentProcess());

I believe these should take a reference to aParam and then move it into aResult.

::: gfx/ipc/GfxMessageUtils.h:796
(Diff revisions 1 - 2)
> +    uint32_t supportedBlendModes = 0;

I don't think supportedBlendModes is supposed to be here and result should probably default to false.

::: gfx/layers/Compositor.cpp:30
(Diff revision 2)
> +TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther)

This should be:
TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther) = default;

same for the copy constructor.
Attachment #8712616 - Flags: review?(jmuizelaar)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> Comment on attachment 8712616 [details]
> MozReview Request: Bug 1232042: Workaround Windows presentation bug by
> executing a present call on the main thread during a WM_PAINT event.
> r=jrmuizel
> 
> https://reviewboard.mozilla.org/r/32589/#review29781
> 
> ::: gfx/ipc/GfxMessageUtils.h:790
> (Diff revisions 1 - 2)
> > +    MOZ_ASSERT(XRE_IsParentProcess());
> 
> I believe these should take a reference to aParam and then move it into
> aResult.

They can't, this is also sent to the child process, which then can't release the reference. There's no way to check from here whether we're sending to another thread or another process as far as I could tell. I could add a check to only send this if we're in a sync call, which is something we -can- check from here.

> ::: gfx/ipc/GfxMessageUtils.h:796
> (Diff revisions 1 - 2)
> > +    uint32_t supportedBlendModes = 0;
> 
> I don't think supportedBlendModes is supposed to be here and result should
> probably default to false.

Copy paste fail. No, it shouldn't, if we're not in the main process, i.e. a content process that's perfectly fine, we just don't want the content process to have a dangling pointer here. (A content process main thread will never have an nsWindow so it will never need this pointer)

> ::: gfx/layers/Compositor.cpp:30
> (Diff revision 2)
> > +TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther)
> 
> This should be:
> TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther)
> = default;
> 
> same for the copy constructor.

Hrm, doesn't seem to make any difference to the compiled code in VS2015. But sure :).
Flags: needinfo?(jmuizelaar)
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32589/diff/2-3/
Attachment #8712616 - Flags: review?(jmuizelaar)
Attachment #8712616 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

https://reviewboard.mozilla.org/r/32589/#review29987

:(

::: gfx/layers/Compositor.cpp:42
(Diff revision 2)
> +}



::: gfx/layers/CompositorTypes.h:175
(Diff revision 3)
> +#endif

Add a comment here about this member being needed for passing the swap chain to the main thread so that we can work around bugXXXXX
And one assert about "D3D11 swap chain preset failed 0x887a0005" in https://treeherder.mozilla.org/logviewer.html#?job_id=21212984&repo=mozilla-inbound
Flags: needinfo?(jmuizelaar)
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32589/diff/3-4/
Let's try this. This is the only idea I've got.
Valid but not critical enough to warrant (inclusion in) a dot release. Wontfix for Fx44.
Blocks: 1245972
https://hg.mozilla.org/mozilla-central/rev/3dce92c9562d
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Bas, seems like this is something that we might just let ride the train with 47. Wontfixing for 45 and 46.  

I thought I'd check with you and dbaron though; please let me know if it's something you would like to try uplifting.
Flags: needinfo?(dbaron)
Flags: needinfo?(bas)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #28)
> Bas, seems like this is something that we might just let ride the train with
> 47. Wontfixing for 45 and 46.  
> 
> I thought I'd check with you and dbaron though; please let me know if it's
> something you would like to try uplifting.

Tricky. It seems this might fix some other issues people have been having, it might be worth uplifting it to Aurora at least.
Flags: needinfo?(bas)
Missed seeing it the first time, but both times this has apparently caused... something, like a driver bluescreen maybe, which causes the Win7 slaves to disconnect (that being the only visible symptom) and stop working until it is rebooted - the blue things in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&filter-searchStr=ccd5c97a295aafb6dd47e1855ac4b66d587c26ca&tochange=70b70f6fc6a7&fromchange=78fe845914c0 are all slaves that I then had to reboot to get working again.
This apparently is causing Windows 7 machines that run it to break and become unusable.

Backed out in https://hg.mozilla.org/mozilla-central/rev/0629918a09ae
Status: RESOLVED → REOPENED
Flags: needinfo?(bas)
Resolution: FIXED → ---
I don't think I have anything to add here (or any relevant expertise).
Flags: needinfo?(dbaron)
(In reply to Phil Ringnalda (:philor) from comment #30)
> Missed seeing it the first time, but both times this has apparently
> caused... something, like a driver bluescreen maybe, which causes the Win7
> slaves to disconnect (that being the only visible symptom) and stop working
> until it is rebooted - the blue things in
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&group_state=expanded&filter-
> searchStr=ccd5c97a295aafb6dd47e1855ac4b66d587c26ca&tochange=70b70f6fc6a7&from
> change=78fe845914c0 are all slaves that I then had to reboot to get working
> again.

Do we have any report of this occurring outside of our test machines?
Flags: needinfo?(bas)
My backout of this patch apparently caused bug 1249512.
Blocks: 1249512
Comment on attachment 8721482 [details] [diff] [review]
Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event.

Review of attachment 8721482 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +1145,5 @@
> +    if (::GetTickCount() > waitStart + 10000) {
> +      gfxDevCrash(LogReason::PresentNeverOccured);
> +      break;
> +    }
> +  }

Change this to avoid calling ::GetTickCount() in the first place if mSyncState is ::Free?

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +2409,5 @@
>  
> +    RefPtr<ID3D10Multithread> multi;
> +    HRESULT hr = mD3D11Device->QueryInterface(__uuidof(ID3D10Multithread), getter_AddRefs(multi));
> +    if (SUCCEEDED(hr) && multi) {
> +      multi->SetMultithreadProtected(TRUE);

I'm assuming this is on purpose and not part of a debugging session - can we put it behind a pref, just in case?

::: widget/windows/nsWindowGfx.cpp
@@ +518,5 @@
>        case LayersBackend::LAYERS_CLIENT:
> +        {
> +          ClientLayerManager* clientLM =
> +            static_cast<ClientLayerManager*>(GetLayerManager());
> +          clientLM->PrepareForPresentedComposite();

ClientLayerManager is guaranteed to exist?
Attachment #8721482 - Flags: review?(milan)
Attachment #8721482 - Flags: review?(jmuizelaar)
Attachment #8721482 - Flags: feedback+
Attachment #8721482 - Attachment is obsolete: true
Attachment #8721482 - Flags: review?(jmuizelaar)
Attachment #8722620 - Flags: review?(jmuizelaar)
Comment on attachment 8722620 [details] [diff] [review]
Execute an additional present on the compositor thread when a WM_PAINT event returns

Review of attachment 8722620 [details] [diff] [review]:
-----------------------------------------------------------------

Please add some more comments about why we need this. CompositorD3D11.h and nsWindowGfx are probably the best places to put this.
Attachment #8722620 - Flags: review?(jmuizelaar) → review+
https://hg.mozilla.org/mozilla-central/rev/246e0ff966fc
https://hg.mozilla.org/mozilla-central/rev/68295c584858
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Depends on: 1251778
Depends on: 1256086
Depends on: 1256547
Depends on: 1256728
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: