Closed Bug 1632357 Opened 5 years ago Closed 4 years ago

Completely blank window in Nightly starting 2020-04-07

Categories

(Core :: Graphics, defect, P3)

77 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox-esr68 --- unaffected
firefox75 --- unaffected
firefox76 --- unaffected
firefox77 --- unaffected
firefox78 --- fixed

People

(Reporter: pokechu022, Assigned: sotaro)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Attached file about:support contents

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0

Steps to reproduce:

Start Firefox Nightly without holding shift. Firefox 75 (my normal browser) is not affected.

My machine is very old; it's an HP laptop with CPU AMD A6-3420M APU with Radeon(tm) HD Graphics, 1500 Mhz, 4 Core(s), 4 Logical Processor(s) and GPU AMD Radeon HD 6520G.

Using mozregression, I eventually bisected to this:

8:15.88 INFO: Running autoland build built on 2020-04-07 01:14:27.697000, revision 707b309f
8:20.30 INFO: Launching c:\Users\Pokechu22\AppData\Local\Temp\tmp2gmfkt\firefox\firefox.exe
8:20.30 INFO: Application command: c:\Users\Pokechu22\AppData\Local\Temp\tmp2gmfkt\firefox\firefox.exe --allow-downgrade --wait-for-browser -profile c:\users\pokech~1\appdata\local\temp\tmpdl9yhn.mozrunner
8:20.42 INFO: application_buildid: 20200406233834
8:20.42 INFO: application_changeset: 707b309fb85e305b0dd42cc6701b02aaa3de3273
8:20.43 INFO: application_name: Firefox
8:20.43 INFO: application_repository: https://hg.mozilla.org/integration/autoland
8:20.44 INFO: application_version: 77.0a1
Was this integration build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry', 'back' or 'exit' and press Enter): bad
8:44.49 INFO: Narrowed integration regression window from [bcff183d, 21633ffc] (3 builds) to [bcff183d, 707b309f] (2 builds) (~1 steps left)
8:44.50 INFO: No more integration revisions, bisection finished.
8:44.50 INFO: Last good revision: bcff183d3130db1765d505ba1e75e1a9b986d679
8:44.50 INFO: First bad revision: 707b309fb85e305b0dd42cc6701b02aaa3de3273
8:44.51 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=bcff183d3130db1765d505ba1e75e1a9b986d679&tochange=707b309fb85e305b0dd42cc6701b02aaa3de3273

Actual results:

The window was completely white/blank. However, I am able to open new tabs (with Control+T), and the cursor changes appropriately when I hover over where text or an input field should be. Dragging a tab does show a small preview of it correctly. Attempting to close Firefox with multiple tabs open brings up a white window with a proper title (see https://i.imgur.com/qcuXPBN.png) which does cause Firefox to exit when the right place is clicked.

Starting Nightly in safe mode by holding shift on startup does prevent the issue from occurring; as such, I was able to access about:support. (Refreshing Nightly didn't help.)

Expected results:

The window should have been visible. It's kinda hard to use a browser when you can't see...

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Graphics
Product: Firefox → Core
Has Regression Range: --- → yes
Regressed by: 1627505

Same issue with that build; all windows are blank unless safe mode is used (I get https://i.imgur.com/q52mWE6.png if I launch it with firefox.exe -P Nightly and https://i.imgur.com/uxuxfev.png with firefox.exe -P Nightly -safe-mode; adding --allow-downgrade gives a blank main window and --allow-downgrade plus safe mode gives a visible main window).

Flags: needinfo?(pokechu022)
Whiteboard: apz-planning
Flags: needinfo?(tnikkel)
Whiteboard: apz-planning

Do you use Chrome? Does it have the same problem or does it work fine? The reason I ask is because they use the same flag that caused the issue for you in Firefox.

If Chrome works fine for you, could you download winspy++ ( http://www.catch22.net/software/winspy ) and then in winspy++ uncheck "minimize winspy" and then drag the finder tool from winspy++ over the top part of the Chrome window (the url bar etc) and see if you get Intermediate D3D Window in the winspy window.

Flags: needinfo?(tnikkel) → needinfo?(pokechu022)

After installing Chrome, it seems to also be affected (producing a white window, though it turns black on resizing while Nightly doesn't). Edge, however, is not affected (but I'm not sure if I'm using the Chromium-based Edge or not; the version is Microsoft Edge 44.19041.1.0, Microsoft EdgeHTML 18.19041).

Flags: needinfo?(pokechu022)

Same issue with that build too.

Flags: needinfo?(pokechu022)

Thanks for testing. I'll, at the very least, give you a pref to flip so you can continue using Firefox. Hopefully we can do better.

On older machines the WS_EX_LAYERED style creates a blank window, and we only need it for direct manipulation (which hasn't landed yet and will be preffed off by default when it lands).
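
A minimal sketch of what a pref-gated style could look like, assuming a plain struct in place of the real pref lookup. The `Prefs` struct and `CompositorWindowExStyle` are hypothetical names for illustration and not the landed patch; only the `WS_EX_LAYERED` value (0x00080000, from `<winuser.h>`) is taken from the Windows headers. The shape mirrors the `if (!StaticPrefs::...())` check that shows up in the build log quoted further down.

```cpp
#include <cassert>
#include <cstdint>

// Value of WS_EX_LAYERED from <winuser.h>, repeated here so the sketch
// compiles without the Windows headers.
constexpr std::uint32_t kWsExLayered = 0x00080000;

// Hypothetical stand-in for the pref; this is not the real StaticPrefs API.
struct Prefs {
  bool forceDisableDirectManipulation;
};

// Returns the extended style to create the compositor window with.
// baseExStyle is whatever the window would otherwise use; WS_EX_LAYERED is
// added on top unless the pref has been flipped.
inline std::uint32_t CompositorWindowExStyle(std::uint32_t baseExStyle,
                                             const Prefs& prefs) {
  // The caller is expected to leave the layering decision to this function.
  assert((baseExStyle & kWsExLayered) == 0);
  if (prefs.forceDisableDirectManipulation) {
    return baseExStyle;
  }
  return baseExStyle | kWsExLayered;
}
```

With the pref at its default, a base style of 0x4 becomes 0x00080004; with the pref flipped, it stays 0x4, which is the escape hatch for affected machines.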

Assignee: nobody → tnikkel
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/428d3ada2cb2 Add a pref to disable adding the WS_EX_LAYERED style to the compositor window. r=sotaro
Backout by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8b50748f444b Backed out 1 changesets for build bustages. CLOSED TREE

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=428d3ada2cb2f7d65e30d5c51f4c8633fdd6d38a&selectedJob=299185705

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=299185705&repo=autoland

Backout link: https://hg.mozilla.org/integration/autoland/rev/8b50748f444b41f9ad14c367c098b9cd93344c35

[task 2020-04-24T08:14:41.535Z] 08:14:41     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/layout/inspector'
[task 2020-04-24T08:14:41.536Z] 08:14:41     INFO -  make[4]: Entering directory '/builds/worker/workspace/obj-build/layout/inspector'
[task 2020-04-24T08:14:41.536Z] 08:14:41     INFO -  layout/inspector/Unified_cpp_layout_inspector0.obj
[task 2020-04-24T08:14:41.536Z] 08:14:41     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/layout/inspector'
[task 2020-04-24T08:14:42.644Z] 08:14:42     INFO -  make[4]: Entering directory '/builds/worker/workspace/obj-build/widget/windows'
[task 2020-04-24T08:14:42.644Z] 08:14:42     INFO -  /builds/worker/fetches/sccache/sccache /builds/worker/fetches/clang/bin/clang-cl -Xclang -std=c++17 -m32 -FoUnified_cpp_widget_windows1.obj -c  -I/builds/worker/workspace/obj-build/dist/stl_wrappers -guard:cf -DNDEBUG=1 -DTRIMMED=1 -DUNICODE -D_UNICODE -D_CRT_RAND_S -DCERT_CHAIN_PARA_HAS_EXTRA_FIELDS -D_SECURE_ATL -DCHROMIUM_BUILD -DU_STATIC_IMPLEMENTATION -DOS_WIN=1 -DWIN32 -D_WIN32 -D_WINDOWS -DWIN32_LEAN_AND_MEAN -DCOMPILER_MSVC -DMOZ_UNICODE -DWINAPI_NO_BUNDLED_LIBRARIES -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/widget/windows -I/builds/worker/workspace/obj-build/widget/windows -I/builds/worker/workspace/obj-build/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/checkouts/gecko/ipc/glue -I/builds/worker/checkouts/gecko/layout/forms -I/builds/worker/checkouts/gecko/layout/generic -I/builds/worker/checkouts/gecko/layout/xul -I/builds/worker/checkouts/gecko/toolkit/xre -I/builds/worker/checkouts/gecko/widget -I/builds/worker/checkouts/gecko/widget/headless -I/builds/worker/checkouts/gecko/xpcom/base -I/builds/worker/workspace/obj-build/dist/include -I/builds/worker/workspace/obj-build/dist/include/nspr -I/builds/worker/workspace/obj-build/dist/include/nss -MD -FI /builds/worker/workspace/obj-build/mozilla-config.h -DMOZILLA_CLIENT -Qunused-arguments -Qunused-arguments -fcrash-diagnostics-dir=/builds/worker/artifacts -TP -Zc:sizedDealloc- -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -arch:SSE2 -Gw -Wno-inline-new-delete -Wno-invalid-offsetof -Wno-microsoft-enum-value -Wno-microsoft-include -Wno-unknown-pragmas -Wno-ignored-pragmas -Wno-deprecated-declarations -Wno-invalid-noreturn -Wno-inconsistent-missing-override -Wno-implicit-exception-spec-mismatch -Wno-microsoft-exception-spec -Wno-unused-local-typedef -Wno-ignored-attributes -Wno-used-but-marked-unused -D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING -GR- 
-Z7 -Xclang -load -Xclang /builds/worker/workspace/obj-build/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -Oy- -Werror -I/builds/worker/workspace/obj-build/dist/include/cairo -Xclang -fexperimental-new-pass-manager  -Xclang -MP -Xclang -dependency-file -Xclang .deps/Unified_cpp_widget_windows1.obj.pp -Xclang -MT -Xclang Unified_cpp_widget_windows1.obj   Unified_cpp_widget_windows1.cpp
[task 2020-04-24T08:14:42.644Z] 08:14:42     INFO -  In file included from Unified_cpp_widget_windows1.cpp:11:
[task 2020-04-24T08:14:42.644Z] 08:14:42     INFO -  /builds/worker/checkouts/gecko/widget/windows/WinCompositorWindowThread.cpp(163,27): error: no member named 'apz_windows_force_disable_direct_manipulation' in namespace 'mozilla::StaticPrefs'
[task 2020-04-24T08:14:42.644Z] 08:14:42     INFO -          if (!StaticPrefs::apz_windows_force_disable_direct_manipulation()) {
[task 2020-04-24T08:14:42.644Z] 08:14:42     INFO -               ~~~~~~~~~~~~~^
[task 2020-04-24T08:14:42.645Z] 08:14:42     INFO -  1 error generated.
[task 2020-04-24T08:14:42.645Z] 08:14:42     INFO -  /builds/worker/checkouts/gecko/config/rules.mk:750: recipe for target 'Unified_cpp_widget_windows1.obj' failed
[task 2020-04-24T08:14:42.645Z] 08:14:42    ERROR -  make[4]: *** [Unified_cpp_widget_windows1.obj] Error 1
[task 2020-04-24T08:14:42.645Z] 08:14:42     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/widget/windows'
[task 2020-04-24T08:14:42.645Z] 08:14:42     INFO -  /builds/worker/checkouts/gecko/config/recurse.mk:74: recipe for target 'widget/windows/target-objects' failed
[task 2020-04-24T08:14:42.646Z] 08:14:42    ERROR -  make[3]: *** [widget/windows/target-objects] Error 2
[task 2020-04-24T08:14:42.646Z] 08:14:42     INFO -  make[3]: *** Waiting for unfinished jobs....
Flags: needinfo?(tnikkel)
Attachment #9142935 - Attachment description: Bug 1632357. Add a pref to disable adding the WS_EX_LAYERED style to the compositor window. r?sotaro → Bug 1632357. Add a pref to disable adding the WS_EX_LAYERED style to the compositor window. r=sotaro
Flags: needinfo?(tnikkel)
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3cbc071d09c7 Add a pref to disable adding the WS_EX_LAYERED style to the compositor window. r=sotaro
Status: UNCONFIRMED → NEW
Ever confirmed: true

Jeff, do you have thoughts on what to do here?

Direct Manipulation sends DM_POINTERHITTEST to windows to ask them whether they want to start a dmanip session. The compositor window seems to get in the way of this event: there seems to be a hit-testing fast path in Windows that finds our compositor window unless we use the WS_EX_LAYERED style on it. We know of this workaround because MS gave it to Chrome to work around a different issue, where messages sent to their compositor window delayed events to their main window.

The problem is that this causes a completely blank window for some users on very old hardware, and it seems like Chrome has the same problem on that hardware. We've had two reports, both on machines from ~2011 (details below): one Intel/Nvidia with two GPUs, and one ATI with only one GPU. So it seems like the issue might be hard to pin down with a blocklist approach.

  1. Don't do anything. We have a force disable dmanip pref so these users can get a working Firefox. But finding the pref seems pretty impossible unless they interact with us via bugzilla and the bug gets seen by us.
  2. Blocklist specific machines for dmanip as bugs are filed so that we don't use the WS_EX_LAYERED flag for them.
  3. We only need dmanip when there is a precision touchpad. We can probably (haven't looked into this yet) detect if there is a precision touchpad on the system and only use the WS_EX_LAYERED flag if there is one present. This means precision touchpads that are plugged in while Firefox is running don't get dmanip (yes, such a thing exists, but probably pretty rare).
  4. Try to come up with a more general rule to target "old" machines that are more likely to hit this bug and less likely to have precision touchpads (they were first introduced with Windows 8.1 in 2013). Old enough driver date? A database mapping cpuid to release date? A database mapping (vendorid,deviceid) to release date?
  5. Somehow detect the completely blank window and auto-block dmanip? Gfx sanity test maybe?
  6. Something I haven't thought of?
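
Option 3 could be sketched roughly as below, assuming the list of HID devices has already been enumerated (on Windows the raw input device info exposes a usage page and usage per HID device; that enumeration is left out here). `HidDevice`, `HasPrecisionTouchpad` and `ShouldUseLayeredStyle` are hypothetical names. The constants are the HID Digitizers usage page (0x0D) and its Touch Pad usage (0x05).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Usage page and usage of a HID top-level collection.
struct HidDevice {
  std::uint16_t usagePage;
  std::uint16_t usage;
};

// HID Digitizers page and its Touch Pad usage.
constexpr std::uint16_t kHidUsagePageDigitizers = 0x0D;
constexpr std::uint16_t kHidUsageTouchPad = 0x05;

// True if any enumerated device reports itself as a touchpad.
inline bool HasPrecisionTouchpad(const std::vector<HidDevice>& devices) {
  return std::any_of(devices.begin(), devices.end(), [](const HidDevice& d) {
    return d.usagePage == kHidUsagePageDigitizers &&
           d.usage == kHidUsageTouchPad;
  });
}

// Option 3: only opt in to WS_EX_LAYERED when dmanip could actually be used.
inline bool ShouldUseLayeredStyle(const std::vector<HidDevice>& devices) {
  return HasPrecisionTouchpad(devices);
}

// Usage: a mouse alone does not qualify, a mouse plus a touchpad does.
inline void ExampleUsage() {
  const HidDevice mouse{0x01, 0x02};     // Generic Desktop page, Mouse usage
  const HidDevice touchpad{0x0D, 0x05};  // Digitizers page, Touch Pad usage
  assert(!ShouldUseLayeredStyle({mouse}));
  assert(ShouldUseLayeredStyle({mouse, touchpad}));
}
```

Because the check runs once on a snapshot of devices, a touchpad plugged in later would be missed, which is the limitation already noted in option 3.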

Machines affected so far:

HP laptop with CPU AMD A6-3420M APU with Radeon(tm) HD Graphics, 1500 Mhz, 4 Core(s), 4 Logical Processor(s) and GPU AMD Radeon HD 6520G

GPU #1
Active: Yes
Description: (unreadable)
Vendor ID: 0x1002
Device ID: 0x9647
Driver Version: 15.200.1012.2
Driver Date: 3-11-2015
Drivers: aticfx64 aticfx64 aticfx64 aticfx32 aticfx32 aticfx32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64 amdxc32 amdxc64
Subsys ID: 358b103c
RAM: 512
GPU #2
Active: No
RAM: 0

Think Pad 520 on Windows 10 with Nvidia Optimus (switchable graphics - see below). When using integrated graphics (Intel(R) HD Graphics 3000) firefox works normally. When discrete one is used (NVIDIA Quadro 2000M)

GPU #1
Active: Yes
Description: Intel(R) HD Graphics 3000
Vendor ID: 0x8086
Device ID: 0x0126
Driver Version: 9.17.10.4459
Driver Date: 5-19-2016
Drivers: igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32
Subsys ID: 21d117aa
RAM: Unknown
GPU #2
Active: No
Description: NVIDIA Quadro 2000M
Vendor ID: 0x10de
Device ID: 0x0dda
Driver Version: 21.21.13.7748
Driver Date: 6-8-2017
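
For illustration, option 2 (a per-device blocklist) could be keyed on the vendor and device IDs listed above. This is only a sketch: `GpuId`, `kDmanipBlocklist` and `IsDmanipBlocked` are hypothetical names, and which IDs would really belong on such a list is not settled in this bug; the two entries are simply the AMD and NVIDIA IDs from the reports.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct GpuId {
  std::uint16_t vendorId;
  std::uint16_t deviceId;
};

// Illustrative entries only, copied from the two reports above.
const std::array<GpuId, 2> kDmanipBlocklist = {{
    {0x1002, 0x9647},  // AMD GPU from the HP laptop report
    {0x10de, 0x0dda},  // NVIDIA Quadro 2000M from the ThinkPad report
}};

// True if WS_EX_LAYERED (and with it dmanip) should be skipped for this GPU.
inline bool IsDmanipBlocked(const GpuId& gpu) {
  for (const GpuId& entry : kDmanipBlocklist) {
    if (entry.vendorId == gpu.vendorId && entry.deviceId == gpu.deviceId) {
      return true;
    }
  }
  return false;
}

// Usage: the reported AMD part is blocked, the Intel HD Graphics 3000 is not.
inline void ExampleUsage() {
  assert(IsDmanipBlocked(GpuId{0x1002, 0x9647}));
  assert(!IsDmanipBlocked(GpuId{0x8086, 0x0126}));
}
```

A list like this only grows as bugs are filed, which is why the options above also consider more general rules.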

Flags: needinfo?(jmuizelaar)

pokechu022, can you file an issue in the Chrome bug tracker: https://crbug.com/wizard and link to it from here?

Flags: needinfo?(pokechu022)
Flags: needinfo?(pokechu022)

Hey Jessie, would you mind setting a priority on this one?

Flags: needinfo?(jbonisteel)
Flags: needinfo?(jbonisteel)
Priority: -- → P3

We talked to Microsoft and they confirmed that DManip is behaving as expected. It does a "speed hittest" to determine which window to start a dmanip session with, and WS_EX_LAYERED is how you get the speed hittest to find our main window instead of the compositor window. This is the expected behaviour from their perspective.
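
As a toy model of the behaviour described here (and not of how Windows actually implements it), the "speed hittest" can be pictured as picking the front-most child window unless that child is layered. Everything in this sketch is an assumption for illustration: the `ChildWindow` and `TopLevelWindow` types, the `SpeedHitTest` function, and the rule that layered children are skipped. Only the `WS_EX_LAYERED` value comes from `<winuser.h>`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint32_t kWsExLayered = 0x00080000;  // WS_EX_LAYERED

struct ChildWindow {
  std::string name;
  std::uint32_t exStyle;
};

struct TopLevelWindow {
  std::string name;
  std::vector<ChildWindow> children;  // front-most first
};

// Toy "speed hittest": return the front-most child that is not layered,
// otherwise the top-level window itself.
inline std::string SpeedHitTest(const TopLevelWindow& top) {
  for (const ChildWindow& child : top.children) {
    if ((child.exStyle & kWsExLayered) == 0) {
      return child.name;
    }
  }
  return top.name;
}

// Usage: without the style the compositor window is found, with it the
// main window is found.
inline void ExampleUsage() {
  const TopLevelWindow plain{"main", {ChildWindow{"compositor", 0}}};
  const TopLevelWindow layered{"main",
                               {ChildWindow{"compositor", kWsExLayered}}};
  assert(SpeedHitTest(plain) == "compositor");
  assert(SpeedHitTest(layered) == "main");
}
```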

See Also: → 1544074

I wonder if the parent window might be blocking the compositor window's content from being shown, as in Bug 1570879.

See Also: → 1570879

Because this bug's Severity has not been changed from the default since it was filed, and its Priority is P3 (Backlog), indicating it has been triaged, the bug's Severity is being updated to S3 (normal).

Assignee: tnikkel → nobody
Severity: normal → S3

(In reply to pokechu022 from comment #17)

Done: https://bugs.chromium.org/p/chromium/issues/detail?id=1074582

From the issue, it seems that IDXGIFactory2::CreateSwapChainForComposition() failed on the PC. Chrome then falls back to disabling hardware acceleration. As a result, child window usage is disabled.

On Firefox, I am not sure whether it works. Firefox has more fallback routes. CreateSwapChainForComposition() is used with WebRender. When CreateSwapChainForComposition() fails, current Gecko falls back to IDXGIFactory2::CreateSwapChainForHwnd(). In this case, we might need to stop using the compositor window. But there is a remaining problem with WebRender: "WebRender + native compositor" (the current default setting) does not use CreateSwapChainForComposition(). Instead, IDCompositionDesktopDevice::CreateTargetForHwnd() is used. We do not know yet whether CreateTargetForHwnd() fails on the PC.

And if CompositorD3D11 or MLGPU is used, the compositor window is also used when gfxVars::UseDoubleBufferingWithCompositor() is true on Win10. In this case, IDXGIFactory2::CreateSwapChainForHwnd() with DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL is used. We also do not know yet whether that combination fails on the PC. By the way, gfxVars::UseDoubleBufferingWithCompositor() is true only on Nightly.
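
The cases in this comment can be read as a small decision table. The sketch below is one such reading and not the actual patch: `Backend`, `Probe`, `Plan` and `ChoosePlan` are hypothetical names, dropping the compositor window after a failed CreateSwapChainForComposition() is the "might need to" from the comment rather than confirmed behaviour, and the native compositor path is assumed to always succeed because its failure mode is unknown here.

```cpp
#include <cassert>

enum class Backend { WebRender, WebRenderNativeCompositor, D3D11OrMLGPU };

enum class PresentPath {
  SwapChainForComposition,  // IDXGIFactory2::CreateSwapChainForComposition()
  SwapChainForHwnd,         // IDXGIFactory2::CreateSwapChainForHwnd()
  TargetForHwnd,  // IDCompositionDesktopDevice::CreateTargetForHwnd()
};

// What the machine turned out to support (hypothetical probe results).
struct Probe {
  bool compositionSwapChainWorks;
  bool doubleBufferingWithCompositor;
};

struct Plan {
  PresentPath path;
  bool useCompositorWindow;
};

inline Plan ChoosePlan(Backend backend, const Probe& probe) {
  switch (backend) {
    case Backend::WebRender:
      if (probe.compositionSwapChainWorks) {
        return {PresentPath::SwapChainForComposition, true};
      }
      // Fallback route: present to the HWND and give up the compositor window.
      return {PresentPath::SwapChainForHwnd, false};
    case Backend::WebRenderNativeCompositor:
      // Assumed to succeed; the thread does not know whether it can fail.
      return {PresentPath::TargetForHwnd, true};
    case Backend::D3D11OrMLGPU:
      // The compositor window is only used with double buffering enabled.
      return {PresentPath::SwapChainForHwnd,
              probe.doubleBufferingWithCompositor};
  }
  assert(false && "unhandled backend");
  return {PresentPath::SwapChainForHwnd, false};
}
```

On a machine where the composition swap chain cannot be created, `ChoosePlan(Backend::WebRender, {false, false})` yields the HWND swap chain without a compositor window, so the WS_EX_LAYERED style would never be applied.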

https://phabricator.services.mozilla.com/D76470 is a possible patch, though it might not work on the PC.

Thanks Sotaro, I'll push that to try so we can get it tested by someone who sees this bug.

That build appears to render properly, but Nightly 20200522094316 also appears to render properly, which is odd. This holds even after refreshing Nightly, so it's not related to any preference changes (and I don't remember changing preferences either).

After bisecting, it looks like it was fixed on 2020-04-28 in https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c1b1ba2c99b1268ba658158daa200a8c79b1b8e7&tochange=a5bdcb8018458a3bf0795b0d41adf6cc5651f058, which is slightly odd since AMD Radeon HD 6520G should be Northern Islands and not Evergreen (if I'm reading Wikipedia right), but makes some sense. It looks like the issue goes away when WebRender is enabled, but if I force MOZ_WEBRENDER=0, the issue comes back on current Nightly builds.

But, oddly, I'm not able to retroactively confirm this by setting MOZ_WEBRENDER=1 and starting the 2020-04-07 build. A second bisect with that variable set gives 2020-04-20 and https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e001cc8bb74f0fc4ffa5c40f72fee6d02f4f8c88&tochange=20aad8c2fc424a4ef056386d3c54e18ca0c741d7 (bug 1631312); that's the build that fixed it with WebRender enabled (or made WebRender properly enable in the first place).

I can also confirm that the test build renders properly with MOZ_WEBRENDER=0 forced -- so the test build does actually fix the issue as well.

Flags: needinfo?(pokechu022)

(In reply to pokechu022 from comment #26)

I can also confirm that the test build renders properly with MOZ_WEBRENDER=0 forced -- so the test build does actually fix the issue as well.

Hooray! Thanks for all that testing!

Sotaro, do you want to make whatever changes you want to make to your patch and request review and land it?

Flags: needinfo?(sotaro.ikeda.g)
Attachment #9151042 - Attachment description: Bug 1632357 - Add compositor window usage failure handling → Bug 1632357 - Use compositor window only when it is necessary
Flags: needinfo?(jmuizelaar)

Good! I updated the patch comment and am going to request review.

Flags: needinfo?(sotaro.ikeda.g)
Assignee: nobody → sotaro.ikeda.g
Status: NEW → ASSIGNED
Pushed by sikeda.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8c637aeb40e0 Use compositor window only when it is necessary r=nical
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78

@Pokechu22 Could you please take a look and confirm whether the issue is fixed? We lack the specified hardware needed to verify the fix.

Flags: needinfo?(pokechu022)

Yes, it is fixed as of Nightly 79.0a1 (2020-06-12), tested starting normally and also with MOZ_WEBRENDER=0 and MOZ_WEBRENDER=1.

Flags: needinfo?(pokechu022)