repainting problems with combinations of maximizing, attaching/detaching (docking) Microsoft Surface Book, and restarting Firefox

RESOLVED FIXED in Firefox 47

Status

()

Core
Graphics
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: dbaron, Assigned: bas)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

unspecified
mozilla47
x86
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44- wontfix, firefox45+ wontfix, firefox46+ wontfix, firefox47+ fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

2 years ago
This is a bug that I've observed a few times on a new Microsoft Surface Book running Windows 10.

However, I haven't figured out steps to reproduce the bug.  However, I've determined that the minimum steps that are needed after rebooting the machine are some combination of:
 1. maximizing and unmaximizing Firefox
 2. starting and exiting Firefox (closing the window with the [X]), possibly while maximized
 3. detaching and re-attaching the tablet half of the machine from the laptop

After doing some of these things, we get into a situation where the first repaint after the size change resulting from maximizing / unmaximizing doesn't actually repaint, but then later repaints work.

(That said, I've also seen some graphics card failures (reset by Windows or whatever it is) and one kernel crash during attempts to reproduce this bug.

I showed it to :Bas in action, and he's reasonably convinced that there's something failing in Graphics code.


The machine has two graphics cards:
 1. Intel(R) HD Graphics 520 used when the tablet is detached (or sometimes, for some reasons, while it's attached too)
 2. an NVIDIA card (can't get the details right now since it's not currently being used for some reason), which is used much of the time when the tablet is attached to the keyboard half of the machine
(Reporter)

Comment 1

2 years ago
Created attachment 8697686 [details]
screenshot #1 (involving rotation as well)
(Reporter)

Comment 2

2 years ago
Created attachment 8697687 [details]
screenshot #2

(not sure if rotation was involved here)
(Reporter)

Comment 3

2 years ago
Created attachment 8697688 [details]
screenshot #3

This one definitely didn't involve rotation of the tablet (since the most recent reboot), and also didn't involve going to about:support.
It seems we're running into this with SurfaceBook often enough that we need to figure out what it is.  Bas, do you have access to one of those?
David, any error messages in graphics section of about:support when this happens?
Assignee: nobody → bas
Flags: needinfo?(bas)
Whiteboard: [gfx-noted]
Andrew, this about matches what you see?
Flags: needinfo?(overholt)
Yes, but i haven't pinned down an str yet. I see graphics crashes reported by the OS each time it happens and they often (always?) note the Intel driver.
Flags: needinfo?(overholt)
(Reporter)

Comment 7

2 years ago
I've seen the graphics crash popup from the OS once or twice, but far from every time.
(Assignee)

Comment 8

2 years ago
(In reply to Milan Sreckovic [:milan] from comment #4)
> It seems we're running into this with SurfaceBook often enough that we need
> to figure out what it is.  Bas, do you have access to one of those?
> David, any error messages in graphics section of about:support when this
> happens?

I do not have access to one of these. But I have heard the stories, and yes, we need to fix this.

I've also seen a lot of instability in the intel driver recently fwiw, but I'm not convinced there's a strong relation between the two. I get the intel driver instability with Edge as well.
Flags: needinfo?(bas)
(Reporter)

Comment 9

a year ago
(In reply to Milan Sreckovic [:milan] from comment #4)
> David, any error messages in graphics section of about:support when this
> happens?

Not sure what I should be looking for there (or where), but I don't think so.

I just saw this without any docking/undocking.  I'd been using flightradar24 in fullscreen, and tried to exit fullscreen, but hit Ctrl+PgDn first (which switched tabs to google calendar), and then F11 to un-fullscreen (the difference between the two is just Ctrl vs. Fn), and then the UI above the tab strip failed to paint on that one repaint (but then repainted later as I took further actions).
Blocks: 1242622
[Tracking Requested - why for this release]: We now have a standalone application that reproduces this problem, and we've sent the bug report to Microsoft.  In the meantime, Bas is testing a workaround.  We may want to consider it for full uplifts, especially since last weeks Windows update seems to have made this somewhat worse (although the reports are sketchy.)
tracking-firefox44: --- → ?
tracking-firefox45: --- → ?
tracking-firefox46: --- → ?
tracking-firefox47: --- → ?
(Assignee)

Comment 11

a year ago
Created attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review commit: https://reviewboard.mozilla.org/r/32589/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/32589/
Attachment #8712616 - Flags: review?(jmuizelaar)
Attachment #8712616 - Flags: review?(jmuizelaar)
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

https://reviewboard.mozilla.org/r/32589/#review29451

::: gfx/ipc/GfxMessageUtils.h:791
(Diff revision 1)
> +    WriteParam(aMsg, (void*)aParam.mSwapChain);

Can you specialize ParamTraits for mSwapChain instead of casting to void*?

It also seems like this could cause a use after free because a reference count is not being sent over IPC.

Can we explicitly hold a reference in both places instead of relying on some implicit ownership scheme?

Updated

a year ago
status-firefox44: --- → affected
status-firefox45: --- → affected
status-firefox46: --- → affected
status-firefox47: --- → affected
Tracking for 45+, since this sounds like a bad issue for Windows users and we may want to uplift a fix.
tracking-firefox45: ? → +
tracking-firefox46: ? → +
tracking-firefox47: ? → +
(Assignee)

Comment 14

a year ago
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32589/diff/1-2/
Attachment #8712616 - Flags: review?(jmuizelaar)
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

https://reviewboard.mozilla.org/r/32589/#review29781

::: gfx/ipc/GfxMessageUtils.h:790
(Diff revisions 1 - 2)
> +    MOZ_ASSERT(XRE_IsParentProcess());

I believe these should take a reference to aParam and then move it into aResult.

::: gfx/ipc/GfxMessageUtils.h:796
(Diff revisions 1 - 2)
> +    uint32_t supportedBlendModes = 0;

I don't think supportedBlendModes is supposed to be here and result should probably default to false.

::: gfx/layers/Compositor.cpp:30
(Diff revision 2)
> +TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther)

This should be:
TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther) = default;

same for the copy constructor.
Attachment #8712616 - Flags: review?(jmuizelaar)
(Assignee)

Comment 16

a year ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> Comment on attachment 8712616 [details]
> MozReview Request: Bug 1232042: Workaround Windows presentation bug by
> executing a present call on the main thread during a WM_PAINT event.
> r=jrmuizel
> 
> https://reviewboard.mozilla.org/r/32589/#review29781
> 
> ::: gfx/ipc/GfxMessageUtils.h:790
> (Diff revisions 1 - 2)
> > +    MOZ_ASSERT(XRE_IsParentProcess());
> 
> I believe these should take a reference to aParam and then move it into
> aResult.

They can't, this is also sent to the child process, which then can't release the reference. There's no way to check from here whether we're sending to another thread or another process as far as I could tell. I could add a check to only send this if we're in a sync call, which is something we -can- check from here.

> ::: gfx/ipc/GfxMessageUtils.h:796
> (Diff revisions 1 - 2)
> > +    uint32_t supportedBlendModes = 0;
> 
> I don't think supportedBlendModes is supposed to be here and result should
> probably default to false.

Copy paste fail. No, it shouldn't, if we're not in the main process, i.e. a content process that's perfectly fine, we just don't want the content process to have a dangling pointer here. (A content process main thread will never have an nsWindow so it will never need this pointer)

> ::: gfx/layers/Compositor.cpp:30
> (Diff revision 2)
> > +TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther)
> 
> This should be:
> TextureFactoryIdentifier::operator=(const TextureFactoryIdentifier& aOther)
> = default;
> 
> same for the copy constructor.

Hrm, doesn't seem to make any difference to the compiled code in VS2015. But sure :).
Flags: needinfo?(jmuizelaar)
(Assignee)

Comment 17

a year ago
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32589/diff/2-3/
Attachment #8712616 - Flags: review?(jmuizelaar)
Attachment #8712616 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

https://reviewboard.mozilla.org/r/32589/#review29987

:(

::: gfx/layers/Compositor.cpp:42
(Diff revision 2)
> +}



::: gfx/layers/CompositorTypes.h:175
(Diff revision 3)
> +#endif

Add a comment here about this member being needed for passing the swap chain to the main thread so that we can work around bugXXXXX

Comment 19

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/aeceae3e0e96
Curiouser and curiouser. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/31373f8aaa98 for giving Win8 around a 10% chance of crashing during the addon manager browser-chrome tests:

https://treeherder.mozilla.org/logviewer.html#?job_id=21209334&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=21199549&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=21204017&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=21204528&repo=mozilla-inbound
And one assert about "D3D11 swap chain preset failed 0x887a0005" in https://treeherder.mozilla.org/logviewer.html#?job_id=21212984&repo=mozilla-inbound
Flags: needinfo?(jmuizelaar)
Duplicate of this bug: 1236152
(Assignee)

Comment 23

a year ago
Comment on attachment 8712616 [details]
MozReview Request: Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event. r=jrmuizel

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32589/diff/3-4/
(Assignee)

Comment 24

a year ago
Let's try this. This is the only idea I've got.

Comment 25

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/3dce92c9562d
Valid but not critical enough to warrant (inclusion in) a dot release. Wontfix for Fx44.
status-firefox44: affected → wontfix
tracking-firefox44: ? → -

Updated

a year ago
Blocks: 1245972

Comment 27

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3dce92c9562d
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox47: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Bas, seems like this is something that we might just let ride the train with 47. Wontfixing for 45 and 46.  

I thought I'd check with you and dbaron though; please let me know if it's something you would like to try uplifting.
status-firefox45: affected → wontfix
status-firefox46: affected → wontfix
Flags: needinfo?(dbaron)
Flags: needinfo?(bas)
(Assignee)

Comment 29

a year ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #28)
> Bas, seems like this is something that we might just let ride the train with
> 47. Wontfixing for 45 and 46.  
> 
> I thought I'd check with you and dbaron though; please let me know if it's
> something you would like to try uplifting.

Tricky. It seems this might fix some other issues people have been having, it might be worth uplifting it to Aurora at least.
Flags: needinfo?(bas)
Missed seeing it the first time, but both times this has apparently caused... something, like a driver bluescreen maybe, which causes the Win7 slaves to disconnect (that being the only visible symptom) and stop working until it is rebooted - the blue things in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&filter-searchStr=ccd5c97a295aafb6dd47e1855ac4b66d587c26ca&tochange=70b70f6fc6a7&fromchange=78fe845914c0 are all slaves that I then had to reboot to get working again.
This apparently is causing Windows 7 machines that run it to break and become unusable.

Backed out in https://hg.mozilla.org/mozilla-central/rev/0629918a09ae
Status: RESOLVED → REOPENED
Flags: needinfo?(bas)
Resolution: FIXED → ---
(Reporter)

Comment 32

a year ago
I don't think I have anything to add here (or any relevant expertise).
Flags: needinfo?(dbaron)
(Assignee)

Comment 33

a year ago
(In reply to Phil Ringnalda (:philor) from comment #30)
> Missed seeing it the first time, but both times this has apparently
> caused... something, like a driver bluescreen maybe, which causes the Win7
> slaves to disconnect (that being the only visible symptom) and stop working
> until it is rebooted - the blue things in
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&group_state=expanded&filter-
> searchStr=ccd5c97a295aafb6dd47e1855ac4b66d587c26ca&tochange=70b70f6fc6a7&from
> change=78fe845914c0 are all slaves that I then had to reboot to get working
> again.

Do we have any report of this occurring outside of our test machines?
Flags: needinfo?(bas)
This regressed tresize on windows:

https://treeherder.allizom.org/perf.html#/graphs?series=[mozilla-inbound,4ac681a39a4caefb56468c5bc86fa23b8cee4c4f,1]&zoom=1455204874546.0342,1455252525044.0476,11.619847807106572,12.372231838139383&selected=[mozilla-inbound,4ac681a39a4caefb56468c5bc86fa23b8cee4c4f,19235,18942494]

We didn't get an alert when it happened because there was some noise when it was committed, but we did get notified of the improvement when it was backed out:

https://treeherder.allizom.org/perf.html#/graphs?series=[mozilla-inbound,4ac681a39a4caefb56468c5bc86fa23b8cee4c4f,1]&zoom=1455624591937.6392,1455765760801.7817,11.567843809446881,12.506505519484055&selected=[mozilla-inbound,4ac681a39a4caefb56468c5bc86fa23b8cee4c4f,19468,19244849]

It's about a 2% regression overall. Not huge but it would be good to mitigate this in any updated patch if possible.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8316f76d9e26
My backout of this patch apparently caused bug 1249512.
Blocks: 1249512
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a4583eedadd
(Assignee)

Comment 38

a year ago
Created attachment 8721482 [details] [diff] [review]
Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event.
Attachment #8712616 - Attachment is obsolete: true
Attachment #8721482 - Flags: review?(milan)
Comment on attachment 8721482 [details] [diff] [review]
Bug 1232042: Workaround Windows presentation bug by executing a present call on the main thread during a WM_PAINT event.

Review of attachment 8721482 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +1145,5 @@
> +    if (::GetTickCount() > waitStart + 10000) {
> +      gfxDevCrash(LogReason::PresentNeverOccured);
> +      break;
> +    }
> +  }

Change this to avoid calling ::GetTickCount() in the first place if mSyncState is ::Free?

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +2409,5 @@
>  
> +    RefPtr<ID3D10Multithread> multi;
> +    HRESULT hr = mD3D11Device->QueryInterface(__uuidof(ID3D10Multithread), getter_AddRefs(multi));
> +    if (SUCCEEDED(hr) && multi) {
> +      multi->SetMultithreadProtected(TRUE);

I'm assuming this is on purpose and not part of a debugging session - can we put it behind a pref, just in case?

::: widget/windows/nsWindowGfx.cpp
@@ +518,5 @@
>        case LayersBackend::LAYERS_CLIENT:
> +        {
> +          ClientLayerManager* clientLM =
> +            static_cast<ClientLayerManager*>(GetLayerManager());
> +          clientLM->PrepareForPresentedComposite();

ClientLayerManager is guaranteed to exist?
Attachment #8721482 - Flags: review?(milan)
Attachment #8721482 - Flags: review?(jmuizelaar)
Attachment #8721482 - Flags: feedback+
(Assignee)

Comment 40

a year ago
Created attachment 8722620 [details] [diff] [review]
Execute an additional present on the compositor thread when a WM_PAINT event returns
Attachment #8721482 - Attachment is obsolete: true
Attachment #8721482 - Flags: review?(jmuizelaar)
Attachment #8722620 - Flags: review?(jmuizelaar)
Comment on attachment 8722620 [details] [diff] [review]
Execute an additional present on the compositor thread when a WM_PAINT event returns

Review of attachment 8722620 [details] [diff] [review]:
-----------------------------------------------------------------

Please add some more comments about why we need this. CompositorD3D11.h and nsWindowGfx are probably the best places to put this.
Attachment #8722620 - Flags: review?(jmuizelaar) → review+

Comment 42

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/6a24b31747be
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/640eb13b730c - opt: https://treeherder.mozilla.org/logviewer.html#?job_id=22247086&repo=mozilla-inbound, debug: https://treeherder.mozilla.org/logviewer.html#?job_id=22246872&repo=mozilla-inbound

Comment 44

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/246e0ff966fc

Comment 45

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/68295c584858

Comment 46

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/246e0ff966fc
https://hg.mozilla.org/mozilla-central/rev/68295c584858
Status: REOPENED → RESOLVED
Last Resolved: a year agoa year ago
Resolution: --- → FIXED
Depends on: 1251778

Updated

a year ago
Depends on: 1256086

Updated

a year ago
Depends on: 1256547

Updated

a year ago
Depends on: 1256728
You need to log in before you can comment on or make changes to this bug.