Download dialog is blank (GeForce RTX 2080 SUPER)
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
People
(Reporter: yoasif, Assigned: sotaro)
References
Details
(Keywords: regression)
Attachments
(8 files, 3 obsolete files)
Reported on reddit: https://www.reddit.com/r/firefox/comments/cpztp1/download_confirmation_window_is_whiteblank/
User says:
Just set up a new computer and have noticed issues with Firefox where, more often than not, the download window will appear blank and I have to try a second time in order for it to work.
This is on 68.0.1 (64-bit) with Windows 10 64-bit Pro.
User confirms that disabling WebRender resolves the issue for them.
Reporter | ||
Comment 1•6 years ago
|
||
Reporter | ||
Updated•6 years ago
|
Comment 2•6 years ago
|
||
Could not reproduce with 68.0.1/Win10/GTX1060. (but bug 1517472 is still reproducible)
Download source from screenshot:
https://bdcraft.net/downloads/purebdcraft-minecraft/ > "Accept Cookies" > "Click here to accept the Terms of Use" > "MC1.12" > "256x" > "Download 256x" > "Download with Ad".
from comment 1:
Launcher Process: Disabled due to failure
bitdefender.com
Comment 3•6 years ago
|
||
Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.
Updated•6 years ago
|
Comment 4•6 years ago
|
||
Might be hard to track down without more specifics. Asif - can you found out from the person who reported this bug what apps they were trying to download when this happened?
Reporter | ||
Comment 5•6 years ago
|
||
I think darkspirit provided that in comment 2.
Comment 6•6 years ago
|
||
More than one to test would be nice! Reporter seemed to indicate it happened in more than one place
Reporter | ||
Comment 7•6 years ago
|
||
Followed up in the reddit thread -- unfortunately, people sometimes ignore requests for a follow up. Hope we can get traction on this.
Reporter | ||
Comment 8•6 years ago
|
||
User says they saw the issue on downloads for these pages:
https://www.msi.com/page/afterburner
https://www.techpowerup.com/gpuz/
https://www.cpuid.com/softwares/cpu-z.html
They also say that they are not seeing the issue on Nightly.
It's hard to reproduce and it occurs rarely but still there.
Though this is not 100% reproducible,
- Go to https://archive.mozilla.org/pub/firefox/nightly/2019/08/2019-08-17-21-50-27-mozilla-central/
- Click .exe or .tar.gz or .mar or .zip or .bz2
- Cancel if dialog displayed normally
- Repeat step.2-3 (maybe 10-50 times)
Comment 10•6 years ago
|
||
Comment 11•6 years ago
|
||
Updated•6 years ago
|
Comment 12•6 years ago
|
||
Happy to take a patch for 70 or beyond.
Since we are getting close to the end of the 69 beta cycle and this is set to P3, I'm marking it fix-optional for 69 and 70 to remove it from weekly triage.
Comment 14•5 years ago
|
||
I am unable to rerpoduce this on the older nvidia GPU I have handy. Is there a GeForce RTX 2080 available in the Toronto office?
Comment 15•5 years ago
|
||
Per IRC, seems like it is also happening here: https://www.fosshub.com/HWiNFO.html
Comment 16•5 years ago
|
||
I'm also seeing this happen when clicking the "Create a New Profile" button on the about:profiles page. At least for me, its happening about 1 in every 20 clicks of the button on that page. So, this may not be limited to just the download dialog. The button is currently broken, but should work again in the next nightly with Bug 1583114.
Is there anything I can do to capture some information about the blank window that would be helpful? about:support doesn't report any errors.
Comment 17•5 years ago
|
||
(In reply to Trevor Rowbotham [:rowbot] from comment #16)
Is there anything I can do to capture some information about the blank window that would be helpful? about:support doesn't report any errors.
What GPU do you have?
Comment 18•5 years ago
|
||
RTX 2080, though its not a SUPER model. I'll see if I can repo the create profile button thing on my GTX 770.
Comment 19•5 years ago
|
||
I can repro the blank window with the create profile button on my GTX 770 as well. It looks like things are actually being drawn, but at the wrong coordinates based on the attached picture.
Updated•5 years ago
|
Comment 20•5 years ago
•
|
||
Tried the steps out in comment 9 https://bugzilla.mozilla.org/show_bug.cgi?id=1573710#c9 - was not able to get it to reproduce. Also tried the 'Create a new profile' button on about:profiles. Clicked it around 20 times, dialogue showed up correctly each time.
Computer I was trying on had same driver as https://bugzilla.mozilla.org/show_bug.cgi?id=1573710#c1
Comment 21•5 years ago
|
||
Can you share the driver version you are using?
Comment 22•5 years ago
|
||
My driver version is 436.60 (the DCH version if it matters), which is the latest.
Comment 23•5 years ago
|
||
I've been able to reproduce this on our windows rig with a GT 730 (investigating now).
Updated•5 years ago
|
Comment 24•5 years ago
|
||
Comment 25•5 years ago
|
||
Because this happens in little adhoc windows it's a bit harder to debug; gonna take a bit to find the exact culprit.
Comment 26•5 years ago
|
||
Some notes from today's investigation using Visual Studio's Spy++ tool:
- All of gecko's/webrender's state seems fully coherent. Everyone knows sizes and positions perfectly well.
- The download dialog is actually two WNDs (HANDLEs)
- The title bar is the actual nsWindow (eWindowType_dialog)
- The content is a WinCompositorWidget (MozillaCompositorWindowClass)
- When the glitch occurs, this window has a size of 1x1 (which is what it is initialized to)
- However when I print-debug UpdateCompositorWindowSizeIfNecessary the OS SetWindowPos call succeeded with the proper (non-1x1) values!!
This is as far as I got, but it indicates something very strange is happening. Perhaps we're hitting a straight-up OS bug. Perhaps we're racing with another syscall on this WND? Not sure, I'll look into this tomorrow, unless Sotaro knows about well-known issue that could cause this.
Comment 27•5 years ago
|
||
What threads are the window interactions happening on? I wonder if we could be racing? It reminds me a bit of bug 1157784 where we had a race between Present and SetWindowLong(mWnd, GWL_STYLE, style): https://bug1157784.bmoattachments.org/attachment.cgi?id=8605355
Comment 28•5 years ago
|
||
Hey sotaro, it seems like all this code is your domain of expertese, would you be able to figure out what the right fix is for this? Here's my final investigation notes, including a very hacky fix that basically tries to work around the bug. Unfortunately, the bug is fundamentally non-deterministic so I can't ever be certain my hacky fix works.
When we open a download dialog, it sometimes shows up as blank. However this happens in two different ways:
- totally blank, mCompositorWnd client-rect == (0,0) 1x1
- global position, mCompoistorWnd client-rect ~= (0,0) 400x150
I believe both are fundamentally caused by a race between the CompositorChild calling SetParent(mCompostorWnd, mWnd)
and the RendererThread calling SetPosition(mCompositorWnd, newClientRect)
. (See diagram below for terms and methods)
In Case 1, the window is completely messed up. It still has the dummy bounds we initialized it with, suggesting that even though SetPosition succeeded, it was "undone". We fail to detect this however, because we only query mWnd's bounds for if an update needs to occur. We do not expect mCompositorWnd's size to change without our explicit request (seemingly reasonable).
In Case 2, mCompositorWindow has the right size, but it has been positioned at absolute (0,0) instead of relative (0,0). Or, to put it another way, it's relative to mInitialWnd instead of mWnd. However the delta between mWnd and mInitialWnd is "remembered" by the OS, so when we move the download dialog the contents remain at this fixed offset (up and to the left). If you get the download dialog to spawn near the top-left corner you can easily see this happening, as the window's contents will be peeking out of the top-left corner (because they have only a small negative offset).
Case 2 I believe to be some sort of issue with SWP_NOMOVE, where reparenting from mInitialParent at (0,0) to mWnd at (x,y) is kind of a move?
I appear to be able to fix Case 1 by checking that GetParent(mCompositorWnd) == mWnd
before calling SetPosition. Indeed, based on my logging we very frequently reach SetPosition while this check fails (although it doesn't break every time). However this does no fix Case 2. To do that, I also had to remove SWP_NOMOVE from SetPosition.
These are of course hacks: ideally we would just properly synchronize access to avoid the issue. Although it's unclear Case 2 can actually be fixed by synchronization?
Here's a basic diagram of the code flow and the threads involved, as I understand it. As you can see, there seems to be a race between SetParent and SetPosition. (Might want to copy into a text editor, very wide)
mWnd = Real Window
mInitialParent = Dummy Parent of mCompositorWnd
mCompositorWnd = Window's Contents (compositor surface)
widget = All 3
CompositorParent | WinCompositorWindowThread | CompositorChild |
----------------------------------------------------------------------------+---------------------------------------------------------------------------+-------------------------------------------+
AllocPWebRenderBridgeParent | | |
EnsureCompositorWindow | | |
CreateCompositorWindow | | |
WinCompositorWindowThread::CreateCompositorWindow ----- runnable ---> mIntialParent = !::CreateWindowEx(null, 0, 0, 1, 1) | |
(blocking) | mCompositorWnd = !::CreateWindowEx(mInitialParent, 0, 0, 1, 1) | |
<----- done ------- | |
| | |
UpdateCompositorWnd(mCompositorWnd, mWnd) ------- async -----|---------------------------------------------------------------------------|---> RecvUpdateCompositorWnd |
WebRenderAPI::Create(widget) ~~~> | | !::SetParent(mCompositorWnd, mWnd) |
(Sync-Creates the Renderer(widget) on the RendererThread) | | |
AsyncPipelineManager(widget) ~~~> | | |
WebRenderBridgeParent(widget) ~~~> | | |
| | |
| RendererThread |
+------------------------------------------------------------+
??? ~~~|~> UpdateAndRender |
| RendererOGL::UpdateAndRender |
| RendererCompositorANGLE::BeginFrame |
| UpdateCompositorWndSizeIfNecessary |
| <------------------------------------ fix #1 - add: if (GetParent(mCompositorWnd) != mWnd) { return; }
| rect = !::GetClientRect(mWnd) |
| if rect != prevRect: |
| !::SetPosition(mCompositorWnd, rect) <------- fix #2 - remove: SWP_NOMOVE (because we do move relative to parent)
| prevRect = rect |
| |
Comment 29•5 years ago
|
||
Here's that ascii diagram as an image so it's a bit easier to resize
Assignee | ||
Comment 30•5 years ago
•
|
||
Hi :Gankra. Great investigation!
Yea, it seems that there is a race condition exists between SetParent() in UI process and SetPosition() on RenderThread in GPU process. I could reproduce the problem on my one low-end Win10 PC and investigated the problem with printf_stderr().
Both on [1] and [2] cases, SetPosition() was succeeded. Further SetParent() and SetPosition() seemed to happen concurrently.
When [1] happened,the sequence was like the following
-(1) SetParent() called in UI process
-(2) Entered WinCompositorWidget::UpdateCompositorWndSizeIfNecessary()
-(3) Parent of compositor window was still initial parent window
-(4) SetWindowPos() with (0, 0, 432, 165) succeeded
-(5) WM_NCCALCSIZE with client rect(0, 0, 432, 165) event happened on WinCompositorWindowThread
-(6) WM_NCCALCSIZE with client rect(0, 0, 1, 1) event happened on WinCompositorWindowThread
-(7) After SetWindowPos(), GetClientRect() with compositor window returned rect(0, 0, 1, 1)
-(8) Exited SetParent() in UI process
When [2] happened,the sequence was like the following. It was hard to reproduce [2] .
-(1) SetParent() called in UI process
-(2) Entered WinCompositorWidget::UpdateCompositorWndSizeIfNecessary()
-(3) Parent of compositor window was still initial parent window
-(4) SetWindowPos() with (0, 0, 432, 313) succeeded
-(5) WM_NCCALCSIZE with client rect(0, 0, 1, 1) event happened on WinCompositorWindowThread
-(6) WM_NCCALCSIZE with client rect(-404, -221, 28, 92) event happened on WinCompositorWindowThread
-(7) After SetWindowPos(), ::GetClientRect() with compositor window returned rect(0, 0, 432, 313)
-(8) Exited SetParent() in UI process
-(9) WM_NCCALCSIZE with client rect(-404, -221, 28, 67) event happened on WinCompositorWindowThread
From the above, it seems that we could suppress the problem by "GetParent(mCompositorWnd) == mWnd" as you mentioned.
And from the [2] sequence, there is a case that client rect of compositor window becomes strange by OS.
Therefore removing SWP_NOMOVE seems reasonable. And It seems better to remove WinCompositorWidget::mLastCompositorWndSize usage.
It assumes that a window position by SetWindowPos() does not change. But size of client rect is changed by OS at unexpected timing.
And as an option, we might want to trigger WR rendering when SetParent() was completed in UI process.
Assignee | ||
Comment 31•5 years ago
|
||
Assignee | ||
Comment 32•5 years ago
|
||
Attachment 9101800 [details] [diff] was used for Comment 30 .
Comment 33•5 years ago
|
||
Comment 34•5 years ago
|
||
Here's the "jank" fix, if you want to go forward with it. Otherwise, should I just reassign the bug to you?
Comment 35•5 years ago
|
||
Also here's a try build in case anyone wants to test it:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac8ea7e4b32d4641b88f1301fbaa63b34e039651
Assignee | ||
Comment 36•5 years ago
|
||
(In reply to Alexis Beingessner [:Gankra] from comment #34)
Here's the "jank" fix, if you want to go forward with it. Otherwise, should I just reassign the bug to you?
It looks good as a workaround :) If we needs to do more, we could handle it in a new bug.
Assignee | ||
Comment 37•5 years ago
|
||
(In reply to Alexis Beingessner [:Gankra] from comment #35)
Also here's a try build in case anyone wants to test it:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac8ea7e4b32d4641b88f1301fbaa63b34e039651
The problem was addressed for me.
Comment 38•5 years ago
|
||
Comment 39•5 years ago
|
||
bugherder |
Comment 40•5 years ago
|
||
Backed out changeset f6ca531ca5b2 (bug 1573710) requested by alice0775. a=backout
Backout link: https://hg.mozilla.org/mozilla-central/rev/11d13005c8767500d6a20858e9db82603e8fb808
Comment 41•5 years ago
|
||
Changing the NI request.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 43•5 years ago
|
||
Tested Bug 1589875 and Bug 1589902 with the patch. From it, compositor rendering was not triggered since SetParent().
Assignee | ||
Comment 44•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 45•5 years ago
|
||
Assignee | ||
Comment 46•5 years ago
|
||
Comment 48•5 years ago
|
||
Comment 49•5 years ago
|
||
Backed out for mda failures on test_getUserMedia_basicScreenshare.
Backout link: https://hg.mozilla.org/integration/autoland/rev/72d2c01f3bbfbfed901a99923d5a091f8bd37f17
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=273421760&repo=autoland&lineNumber=22453
There were also WdH Tier2 failures: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=273425286&repo=autoland&lineNumber=856
Comment 50•5 years ago
|
||
There were also xpcshell failures : https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=273433181&repo=autoland&lineNumber=5833
Comment 51•5 years ago
•
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 52•5 years ago
•
|
||
Comment 53•5 years ago
|
||
Comment 54•5 years ago
|
||
bugherder |
Comment 55•5 years ago
|
||
Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 56•5 years ago
|
||
Comment on attachment 9102906 [details]
Bug 1573710 - Trigger composite after SetParent()
Beta/Release Uplift Approval Request
- User impact if declined: Dialog window could be blank with WebRender or D3D Compositor with double buffering. It depends on the timing.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): The patch changed how to trigger composition around SetParent() call . But it is relatively simple.
- String changes made/needed: None
Comment 57•5 years ago
|
||
Sotaro, is this bug happening with graphic cards and drivers we officially support on the release channel or does it happen for users that enabled webrender manually in about:config?
Assignee | ||
Comment 58•5 years ago
|
||
This is caused by Windows OS problem that happens when SetWindowPos() and SetParent() are called at the same time. It depends on the timing. It does not related to graphics card nor driver. The problem could happen when WebRender is enabled. WebRender is already enabled on some PCs on release.
And the problem could also happens with D3D Compositor with pref gfx.direct3d11.use-double-buffering:true. It is enabled until beta by Bug 1580222 .
Updated•5 years ago
|
Comment 59•5 years ago
|
||
Comment on attachment 9102906 [details]
Bug 1573710 - Trigger composite after SetParent()
P3 but covered by tests and may be a more visible problem on release with bug 1580222 riding the trains, uplift approved for 71 beta 8, thanks.
Comment 60•5 years ago
|
||
bugherder uplift |
Description
•