Closed Bug 1573710 Opened 3 months ago Closed 11 days ago

Download dialog is blank (GeForce RTX 2080 SUPER)

Categories

(Core :: Graphics: WebRender, defect, P3)

68 Branch
x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- disabled
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fixed
firefox72 --- fixed

People

(Reporter: yoasif, Assigned: sotaro)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(8 files, 3 obsolete files)

Attached image aw7Jkiw.png

Reported on reddit: https://www.reddit.com/r/firefox/comments/cpztp1/download_confirmation_window_is_whiteblank/

User says:

Just set up a new computer and have noticed issues with Firefox where, more often than not, the download window will appear blank and I have to try a second time in order for it to work.

This is on 68.0.1 (64-bit) with Windows 10 64-bit Pro.

User confirms that disabling WebRender resolves the issue for them.

Attached file about:support
OS: Unspecified → Windows 10

Could not reproduce with 68.0.1/Win10/GTX1060. (but bug 1517472 is still reproducible)

Download source from screenshot:
https://bdcraft.net/downloads/purebdcraft-minecraft/ > "Accept Cookies" > "Click here to accept the Terms of Use" > "MC1.12" > "256x" > "Download 256x" > "Download with Ad".

from comment 1:

Launcher Process: Disabled due to failure

bitdefender.com

Keywords: regression
Hardware: Unspecified → x86_64
Summary: Download dialog is blank → Download dialog is blank (GeForce RTX 2080 SUPER)

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

Might be hard to track down without more specifics. Asif - can you found out from the person who reported this bug what apps they were trying to download when this happened?

Blocks: wr-70
Flags: needinfo?(yoasif)
Priority: -- → P3

I think darkspirit provided that in comment 2.

Flags: needinfo?(yoasif)

More than one to test would be nice! Reporter seemed to indicate it happened in more than one place

Followed up in the reddit thread -- unfortunately, people sometimes ignore requests for a follow up. Hope we can get traction on this.

User says they saw the issue on downloads for these pages:

https://www.msi.com/page/afterburner
https://www.techpowerup.com/gpuz/
https://www.cpuid.com/softwares/cpu-z.html

They also say that they are not seeing the issue on Nightly.

It's hard to reproduce and it occurs rarely but still there.

Though this is not 100% reproducible,

  1. Go to https://archive.mozilla.org/pub/firefox/nightly/2019/08/2019-08-17-21-50-27-mozilla-central/
  2. Click .exe or .tar.gz or .mar or .zip or .bz2
  3. Cancel if dialog displayed normally
  4. Repeat step.2-3 (maybe 10-50 times)
Attached image (screenshot)
Attached file about:support

Happy to take a patch for 70 or beyond.
Since we are getting close to the end of the 69 beta cycle and this is set to P3, I'm marking it fix-optional for 69 and 70 to remove it from weekly triage.

Hey Nical - can you try and repro this?

Flags: needinfo?(nical.bugzilla)

I am unable to rerpoduce this on the older nvidia GPU I have handy. Is there a GeForce RTX 2080 available in the Toronto office?

Flags: needinfo?(nical.bugzilla)

Per IRC, seems like it is also happening here: https://www.fosshub.com/HWiNFO.html

I'm also seeing this happen when clicking the "Create a New Profile" button on the about:profiles page. At least for me, its happening about 1 in every 20 clicks of the button on that page. So, this may not be limited to just the download dialog. The button is currently broken, but should work again in the next nightly with Bug 1583114.

Is there anything I can do to capture some information about the blank window that would be helpful? about:support doesn't report any errors.

(In reply to Trevor Rowbotham [:rowbot] from comment #16)

Is there anything I can do to capture some information about the blank window that would be helpful? about:support doesn't report any errors.

What GPU do you have?

Flags: needinfo?(trevor.rowbotham)

RTX 2080, though its not a SUPER model. I'll see if I can repo the create profile button thing on my GTX 770.

Flags: needinfo?(trevor.rowbotham)
Attached image blank_window.png

I can repro the blank window with the create profile button on my GTX 770 as well. It looks like things are actually being drawn, but at the wrong coordinates based on the attached picture.

Flags: needinfo?(aosmond)

Tried the steps out in comment 9 https://bugzilla.mozilla.org/show_bug.cgi?id=1573710#c9 - was not able to get it to reproduce. Also tried the 'Create a new profile' button on about:profiles. Clicked it around 20 times, dialogue showed up correctly each time.

Computer I was trying on had same driver as https://bugzilla.mozilla.org/show_bug.cgi?id=1573710#c1

Can you share the driver version you are using?

Flags: needinfo?(trevor.rowbotham)

My driver version is 436.60 (the DCH version if it matters), which is the latest.

Flags: needinfo?(trevor.rowbotham)

I've been able to reproduce this on our windows rig with a GT 730 (investigating now).

Flags: needinfo?(aosmond) → needinfo?(a.beingessner)
Attached file about:support
It also happens for me. It also happens after reinstalling Windows 10 (because of other reasons...).

Because this happens in little adhoc windows it's a bit harder to debug; gonna take a bit to find the exact culprit.

Assignee: nobody → a.beingessner
Flags: needinfo?(a.beingessner)
See Also: → 1587223

Some notes from today's investigation using Visual Studio's Spy++ tool:

  • All of gecko's/webrender's state seems fully coherent. Everyone knows sizes and positions perfectly well.
  • The download dialog is actually two WNDs (HANDLEs)
    • The title bar is the actual nsWindow (eWindowType_dialog)
    • The content is a WinCompositorWidget (MozillaCompositorWindowClass)
      • When the glitch occurs, this window has a size of 1x1 (which is what it is initialized to)
      • However when I print-debug UpdateCompositorWindowSizeIfNecessary the OS SetWindowPos call succeeded with the proper (non-1x1) values!!

This is as far as I got, but it indicates something very strange is happening. Perhaps we're hitting a straight-up OS bug. Perhaps we're racing with another syscall on this WND? Not sure, I'll look into this tomorrow, unless Sotaro knows about well-known issue that could cause this.

What threads are the window interactions happening on? I wonder if we could be racing? It reminds me a bit of bug 1157784 where we had a race between Present and SetWindowLong(mWnd, GWL_STYLE, style): https://bug1157784.bmoattachments.org/attachment.cgi?id=8605355

See Also: → 1586130
See Also: → 1588690

Hey sotaro, it seems like all this code is your domain of expertese, would you be able to figure out what the right fix is for this? Here's my final investigation notes, including a very hacky fix that basically tries to work around the bug. Unfortunately, the bug is fundamentally non-deterministic so I can't ever be certain my hacky fix works.


When we open a download dialog, it sometimes shows up as blank. However this happens in two different ways:

  1. totally blank, mCompositorWnd client-rect == (0,0) 1x1
  2. global position, mCompoistorWnd client-rect ~= (0,0) 400x150

I believe both are fundamentally caused by a race between the CompositorChild calling SetParent(mCompostorWnd, mWnd) and the RendererThread calling SetPosition(mCompositorWnd, newClientRect). (See diagram below for terms and methods)

In Case 1, the window is completely messed up. It still has the dummy bounds we initialized it with, suggesting that even though SetPosition succeeded, it was "undone". We fail to detect this however, because we only query mWnd's bounds for if an update needs to occur. We do not expect mCompositorWnd's size to change without our explicit request (seemingly reasonable).

In Case 2, mCompositorWindow has the right size, but it has been positioned at absolute (0,0) instead of relative (0,0). Or, to put it another way, it's relative to mInitialWnd instead of mWnd. However the delta between mWnd and mInitialWnd is "remembered" by the OS, so when we move the download dialog the contents remain at this fixed offset (up and to the left). If you get the download dialog to spawn near the top-left corner you can easily see this happening, as the window's contents will be peeking out of the top-left corner (because they have only a small negative offset).

Case 2 I believe to be some sort of issue with SWP_NOMOVE, where reparenting from mInitialParent at (0,0) to mWnd at (x,y) is kind of a move?

I appear to be able to fix Case 1 by checking that GetParent(mCompositorWnd) == mWnd before calling SetPosition. Indeed, based on my logging we very frequently reach SetPosition while this check fails (although it doesn't break every time). However this does no fix Case 2. To do that, I also had to remove SWP_NOMOVE from SetPosition.

These are of course hacks: ideally we would just properly synchronize access to avoid the issue. Although it's unclear Case 2 can actually be fixed by synchronization?


Here's a basic diagram of the code flow and the threads involved, as I understand it. As you can see, there seems to be a race between SetParent and SetPosition. (Might want to copy into a text editor, very wide)


                      mWnd = Real Window
            mInitialParent = Dummy Parent of mCompositorWnd
            mCompositorWnd = Window's Contents (compositor surface)
                    widget = All 3




                        CompositorParent                                    |                   WinCompositorWindowThread                               |           CompositorChild                 |       
----------------------------------------------------------------------------+---------------------------------------------------------------------------+-------------------------------------------+
    AllocPWebRenderBridgeParent                                             |                                                                           |                                           |
        EnsureCompositorWindow                                              |                                                                           |                                           |
            CreateCompositorWindow                                          |                                                                           |                                           |
                WinCompositorWindowThread::CreateCompositorWindow  ----- runnable ---> mIntialParent = !::CreateWindowEx(null, 0, 0, 1, 1)              |                                           |
                (blocking)                                                  |          mCompositorWnd = !::CreateWindowEx(mInitialParent, 0, 0, 1, 1)   |                                           |
                                                                    <----- done -------                                                                 |                                           |
                                                                            |                                                                           |                                           |
            UpdateCompositorWnd(mCompositorWnd, mWnd)    ------- async -----|---------------------------------------------------------------------------|---> RecvUpdateCompositorWnd               |
        WebRenderAPI::Create(widget) ~~~>                                   |                                                                           |       !::SetParent(mCompositorWnd, mWnd)  |
            (Sync-Creates the Renderer(widget) on the RendererThread)       |                                                                           |                                           |
        AsyncPipelineManager(widget) ~~~>                                   |                                                                           |                                           |
        WebRenderBridgeParent(widget) ~~~>                                  |                                                                           |                                           |
                                                                            |                                                                           |                                           |





                                                     |                      RendererThread                        |
                                                     +------------------------------------------------------------+
                                              ??? ~~~|~> UpdateAndRender                                          | 
                                                     |      RendererOGL::UpdateAndRender                          |
                                                     |          RendererCompositorANGLE::BeginFrame               |
                                                     |             UpdateCompositorWndSizeIfNecessary             |
                                                     |                              <------------------------------------ fix #1 - add: if (GetParent(mCompositorWnd) != mWnd) { return; }
                                                     |                  rect = !::GetClientRect(mWnd)             |
                                                     |                  if rect != prevRect:                      |
                                                     |                      !::SetPosition(mCompositorWnd, rect) <------- fix #2 - remove: SWP_NOMOVE (because we do move relative to parent)
                                                     |                      prevRect = rect                       |
                                                     |                                                            |





Flags: needinfo?(sotaro.ikeda.g)
Attached image race-diagram.png

Here's that ascii diagram as an image so it's a bit easier to resize

Hi :Gankra. Great investigation!
Yea, it seems that there is a race condition exists between SetParent() in UI process and SetPosition() on RenderThread in GPU process. I could reproduce the problem on my one low-end Win10 PC and investigated the problem with printf_stderr().

Both on [1] and [2] cases, SetPosition() was succeeded. Further SetParent() and SetPosition() seemed to happen concurrently.

When [1] happened,the sequence was like the following
-(1) SetParent() called in UI process
-(2) Entered WinCompositorWidget::UpdateCompositorWndSizeIfNecessary()
-(3) Parent of compositor window was still initial parent window
-(4) SetWindowPos() with (0, 0, 432, 165) succeeded
-(5) WM_NCCALCSIZE with client rect(0, 0, 432, 165) event happened on WinCompositorWindowThread
-(6) WM_NCCALCSIZE with client rect(0, 0, 1, 1) event happened on WinCompositorWindowThread
-(7) After SetWindowPos(), GetClientRect() with compositor window returned rect(0, 0, 1, 1)
-(8) Exited SetParent() in UI process

When [2] happened,the sequence was like the following. It was hard to reproduce [2] .
-(1) SetParent() called in UI process
-(2) Entered WinCompositorWidget::UpdateCompositorWndSizeIfNecessary()
-(3) Parent of compositor window was still initial parent window
-(4) SetWindowPos() with (0, 0, 432, 313) succeeded
-(5) WM_NCCALCSIZE with client rect(0, 0, 1, 1) event happened on WinCompositorWindowThread
-(6) WM_NCCALCSIZE with client rect(-404, -221, 28, 92) event happened on WinCompositorWindowThread
-(7) After SetWindowPos(), ::GetClientRect() with compositor window returned rect(0, 0, 432, 313)
-(8) Exited SetParent() in UI process
-(9) WM_NCCALCSIZE with client rect(-404, -221, 28, 67) event happened on WinCompositorWindowThread

From the above, it seems that we could suppress the problem by "GetParent(mCompositorWnd) == mWnd" as you mentioned.
And from the [2] sequence, there is a case that client rect of compositor window becomes strange by OS.
Therefore removing SWP_NOMOVE seems reasonable. And It seems better to remove WinCompositorWidget::mLastCompositorWndSize usage.
It assumes that a window position by SetWindowPos() does not change. But size of client rect is changed by OS at unexpected timing.

And as an option, we might want to trigger WR rendering when SetParent() was completed in UI process.

Flags: needinfo?(sotaro.ikeda.g)

Here's the "jank" fix, if you want to go forward with it. Otherwise, should I just reassign the bug to you?

(In reply to Alexis Beingessner [:Gankra] from comment #34)

Here's the "jank" fix, if you want to go forward with it. Otherwise, should I just reassign the bug to you?

It looks good as a workaround :) If we needs to do more, we could handle it in a new bug.

(In reply to Alexis Beingessner [:Gankra] from comment #35)

Also here's a try build in case anyone wants to test it:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac8ea7e4b32d4641b88f1301fbaa63b34e039651

The problem was addressed for me.

Pushed by abeingessner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f6ca531ca5b2
make window resizing more robust to races. r=sotaro
Status: NEW → RESOLVED
Closed: 25 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
Regressions: 1589875
Regressions: 1589872
Regressions: 1589902

Backed out changeset f6ca531ca5b2 (bug 1573710) requested by alice0775. a=backout

Backout link: https://hg.mozilla.org/mozilla-central/rev/11d13005c8767500d6a20858e9db82603e8fb808

Status: RESOLVED → REOPENED
Flags: needinfo?(alice0775)
Resolution: FIXED → ---
Target Milestone: mozilla71 → ---

Changing the NI request.

Flags: needinfo?(alice0775) → needinfo?(a.beingessner)
Assignee: a.beingessner → sotaro.ikeda.g

I take the bug.

Flags: needinfo?(a.beingessner)

Tested Bug 1589875 and Bug 1589902 with the patch. From it, compositor rendering was not triggered since SetParent().

Attachment #9102906 - Attachment description: Bug 1573710 - Trigger composite after SetParent() and make window resizing more robust to races → Bug 1573710 - Trigger composite after SetParent()
Blocks: wr-71
No longer blocks: wr-70
Duplicate of this bug: 1591572
Pushed by sikeda.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7a4cf3982659
Trigger composite after SetParent() r=Gankro
Attachment #9102859 - Attachment is obsolete: true
Flags: needinfo?(sotaro.ikeda.g)
Attachment #9101800 - Attachment is obsolete: true
Pushed by sikeda.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eb6461a904aa
Trigger composite after SetParent() r=Gankro
Status: REOPENED → RESOLVED
Closed: 25 days ago11 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

Attachment #9101933 - Attachment is obsolete: true
Blocks: 1580222

Comment on attachment 9102906 [details]
Bug 1573710 - Trigger composite after SetParent()

Beta/Release Uplift Approval Request

  • User impact if declined: Dialog window could be blank with WebRender or D3D Compositor with double buffering. It depends on the timing.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): The patch changed how to trigger composition around SetParent() call . But it is relatively simple.
  • String changes made/needed: None
Attachment #9102906 - Flags: approval-mozilla-beta?

Sotaro, is this bug happening with graphic cards and drivers we officially support on the release channel or does it happen for users that enabled webrender manually in about:config?

Flags: needinfo?(sotaro.ikeda.g)

This is caused by Windows OS problem that happens when SetWindowPos() and SetParent() are called at the same time. It depends on the timing. It does not related to graphics card nor driver. The problem could happen when WebRender is enabled. WebRender is already enabled on some PCs on release.

And the problem could also happens with D3D Compositor with pref gfx.direct3d11.use-double-buffering:true. It is enabled until beta by Bug 1580222 .

Flags: needinfo?(sotaro.ikeda.g)

Comment on attachment 9102906 [details]
Bug 1573710 - Trigger composite after SetParent()

P3 but covered by tests and may be a more visible problem on release with bug 1580222 riding the trains, uplift approved for 71 beta 8, thanks.

Attachment #9102906 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: 1574746
You need to log in before you can comment on or make changes to this bug.