Firefox is unusable on Windows 11 install, opens small white box, no pages or menus can load
Categories
(Core :: Widget: Win32, defect, P2)
Tracking
()
People
(Reporter: marc, Assigned: rkraesig)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [win:stability])
Attachments
(2 files, 2 obsolete files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36
Steps to reproduce:
This problem crept up over time. Every so often (not sure of the pattern), opening Firefox would result in a small white box with no text being displayed first. Several seconds later it would relaunch and would start up as normal. Then, on February 3, 2022, it became unusable. The white box would show at launch, and the window that opened was unusable - I couldn't load any webpage or settings page.
Unfortunately, I don't have any easy steps to reproduce. I didn't change anything about my system from February 2nd (when it was working) to the 3rd (when it wasn't). The only significant recent change was I disabled WSL, but that was about a week earlier.
Actual results:
I've tried uninstalling Firefox and deleting all local caches/settings from AppData, then reinstalling, which didn't work. (See Video 1 linked below)
I also tried downloading and compiling Firefox locally, which did work. However, launching Firefox Nightly through File Explorer didn't work: double-clicking the Application file failed, but ./mach run in MozillaBuild worked. (See Video 2 linked below)
I uploaded some screen recordings to OneDrive to demonstrate the issue: https://1drv.ms/u/s!AjSZ2fG9P0AgyDbaY2xHQmh-pOfS?e=uhhpe8
Expected results:
Firefox should've been able to load webpages and settings menus.
Comment 1•2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Win32' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 3•2 years ago
In your video, Firefox displays a doorhanger with a button to "Refresh Firefox..." underneath the URL bar. When you click that button, do things start to work correctly again?
(In reply to Stephen A Pohl [:spohl] from comment #3)
In your video, Firefox displays a doorhanger with a button to "Refresh Firefox..." underneath the URL bar. When you click that button, do things start to work correctly again?
No, that didn't fix it. The Firefox processes don't seem to entirely end after clicking "Refresh Firefox" - or any time I go to quit Firefox. I have to manually end processes in Task Manager before I can relaunch it.
I added a third video to this shared folder to demonstrate this: https://1drv.ms/u/s!AjSZ2fG9P0AgyDbaY2xHQmh-pOfS?e=uhhpe8. I recorded this right after another fresh install. Between installs I've been deleting C:\Users\marc\AppData\Roaming\Mozilla.
I'm not sure if there are any diagnostic logs I can provide to help debug this? Or if so, where I would find them? Happy to help in any way to fix this.
Comment 5•2 years ago
Could you tell us if you're running any antivirus software, such as WebRoot or anything else?
It would be great if you could collect minidumps for the running Firefox processes when you encounter this bug. minidumps may contain personal data, so I suggest that you email me separately with the link to the files once you've been able to upload them somewhere such as your OneDrive. To collect the minidumps:
- Open Task Manager
- Expand Firefox in the list of processes
- For each process, right click and select "Create dump file"
- Upload each dump file
Thank you!
(In reply to Stephen A Pohl [:spohl] from comment #5)
Could you tell us if you're running any antivirus software, such as WebRoot or anything else?
It would be great if you could collect minidumps for the running Firefox processes when you encounter this bug. minidumps may contain personal data, so I suggest that you email me separately with the link to the files once you've been able to upload them somewhere such as your OneDrive. To collect the minidumps:
- Open Task Manager
- Expand Firefox in the list of processes
- For each process, right click and select "Create dump file"
- Upload each dump file
Thank you!
No antivirus programs other than the normal Windows Security/Windows Defender. I'll capture those minidumps and email you the link.
Comment 7•2 years ago
Thanks, Marc! Toshi, would you be able to take a look please? Thank you!
Comment 8•2 years ago
This is very similar to bug 1735495. What happens is tab processes failed to load one of the essential XPCOM modules such as nss3.dll or xul.dll while the browser process loaded all of them. That blank popup is "Couldn't load XPCOM" opened here. In bug 1735495, the reporter said the issue was gone for some reason and unfortunately we couldn't finish investigation.
Marc, thank you very much for your cooperation. Can you follow the steps of https://bugzilla.mozilla.org/show_bug.cgi?id=1735495#c26 to capture DLL loader's debug trace?
Comment 10•2 years ago (Reporter)
(In reply to Toshihito Kikuchi [:toshi] from comment #8)
This is very similar to bug 1735495. What happens is tab processes failed to load one of the essential XPCOM modules such as nss3.dll or xul.dll while the browser process loaded all of them. That blank popup is "Couldn't load XPCOM" opened here. In bug 1735495, the reporter said the issue was gone for some reason and unfortunately we couldn't finish investigation.
Marc, thank you very much for your cooperation. Can you follow the steps of https://bugzilla.mozilla.org/show_bug.cgi?id=1735495#c26 to capture DLL loader's debug trace?
I attached the DLL loader's log to this bug report! Let me know if there's anything else I can provide.
Comment 11•2 years ago
(In reply to Marc from comment #10)
I attached the DLL loader's log to this bug report! Let me know if there's anything else I can provide.
Thank you! We analyzed the log, but it didn't capture the child processes and we couldn't find any problem in the main process's log. A possible reason is there were Firefox processes already running.
Can you capture the same log again? Before running the command, please make sure no firefox process is running. You can terminate all firefox.exe instances by the command taskkill /im firefox.exe /f.
Comment 12•2 years ago (Reporter)
Comment 13•2 years ago (Reporter)
Comment 14•2 years ago (Reporter)
(In reply to Toshihito Kikuchi [:toshi] from comment #11)
(In reply to Marc from comment #10)
I attached the DLL loader's log to this bug report! Let me know if there's anything else I can provide.
Thank you! We analyzed the log, but it didn't capture the child processes and we couldn't find any problem in the main process's log. A possible reason is there were Firefox processes already running.
Can you capture the same log again? Before running the command, please make sure no firefox process is running. You can terminate all firefox.exe instances by the command taskkill /im firefox.exe /f.
Sorry about that! I think there were leftover firefox processes running at the time. I restarted my PC and ran that taskkill command before doing this two more times. I attached the new logs to the bug report.
Also, any time I went to load a webpage, e.g. Reddit, Wikipedia, etc., an additional firefox process would appear in Task Manager and the Windows alert sound played. Could this be related to those essential modules you mentioned not loading? In the logs I saw a line "ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017". Not too familiar with Windows/desktop development, but figured this might help!
Comment 15•2 years ago
(In reply to Marc from comment #14)
(In reply to Toshihito Kikuchi [:toshi] from comment #11)
Sorry about that! I think there were leftover firefox processes running at the time. I restarted my PC and ran that taskkill command before doing this two more times. I attached the new logs to the bug report.
Also, any time I went to load a webpage, e.g. Reddit, Wikipedia, etc., an additional firefox process would appear in Task Manager and the Windows alert sound played. Could this be related to those essential modules you mentioned not loading? In the logs I saw a line "ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017". Not too familiar with Windows/desktop development, but figured this might help!
Thank you so much for collecting logs quickly. We reached the same conclusion. The process somehow failed to load WSOCK32.dll, thus failing to load nss3.dll and showing the XPCOM error popup. The error code 0xc0000017 is STATUS_NO_MEMORY. It's unlikely, but just in case, can you check in Task Manager that system memory is not full when you hit this problem?
I'll try to find out what operation can cause LdrpComputeLazyDllPath to return 0xc0000017.
2ef4:04a8 @ 00179843 - LdrpFindKnownDll - ENTER: DLL name: WS2_32.dll
2ef4:4478 @ 00179843 - LdrpSearchPath - ENTER: DLL name: WSOCK32.dll
2ef4:04a8 @ 00179843 - LdrpFindKnownDll - RETURN: Status: 0x00000000
2ef4:04a8 @ 00179843 - LdrpMinimalMapModule - ENTER: DLL name: C:\WINDOWS\System32\WS2_32.dll
2ef4:4478 @ 00179843 - LdrpComputeLazyDllPath - ERROR: Lazy DLL search path computation failed with status: 0xc0000017.
2ef4:4478 @ 00179843 - LdrpSearchPath - RETURN: Status: 0xc0000017
2ef4:4478 @ 00179843 - LdrpProcessWork - ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017
ModLoad: 00007ffd`23a50000 00007ffd`23abf000 C:\WINDOWS\System32\WS2_32.dll
2ef4:04a8 @ 00179859 - LdrpMinimalMapModule - RETURN: Status: 0x00000000
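For anyone triaging a similar trace, the failing loads can be pulled out of a long loader log mechanically. The following is only a rough sketch: the line format is inferred from the excerpt above and may differ in other traces, and the function name and status table are made up for illustration.

```python
import re

# Pattern inferred from the excerpt above; real loader traces may vary.
LOAD_ERROR = re.compile(
    r'Unable to load DLL: "(?P<dll>[^"]+)", '
    r'Parent Module: "(?P<parent>[^"]+)", '
    r'Status: (?P<status>0x[0-9a-fA-F]+)'
)

# Only the status code discussed in this bug; not a complete NTSTATUS table.
KNOWN_STATUS = {0xC0000017: "STATUS_NO_MEMORY"}


def find_load_failures(log_text):
    """Return (dll, parent module, status) for each failed DLL load in the log."""
    failures = []
    for line in log_text.splitlines():
        match = LOAD_ERROR.search(line)
        if match is None:
            continue
        code = int(match.group("status"), 16)
        # Fall back to the raw hex value for codes not in the small table above.
        status = KNOWN_STATUS.get(code, hex(code))
        failures.append((match.group("dll"), match.group("parent"), status))
    return failures
```

Run against the trace above, this would report the single WSOCK32.dll failure under nss3.dll.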
Comment 16•2 years ago (Reporter)
(In reply to Toshihito Kikuchi [:toshi] from comment #15)
(In reply to Marc from comment #14)
(In reply to Toshihito Kikuchi [:toshi] from comment #11)
Sorry about that! I think there were leftover firefox processes running at the time. I restarted my PC and ran that taskkill command before doing this two more times. I attached the new logs to the bug report.
Also, any time I went to load a webpage, e.g. Reddit, Wikipedia, etc., an additional firefox process would appear in Task Manager and the Windows alert sound played. Could this be related to those essential modules you mentioned not loading? In the logs I saw a line "ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017". Not too familiar with Windows/desktop development, but figured this might help!
Thank you so much for collecting logs quickly. We reached the same conclusion. The process somehow failed to load WSOCK32.dll, thus failing to load nss3.dll and showing the XPCOM error popup. The error code 0xc0000017 is STATUS_NO_MEMORY. It's unlikely, but just in case, can you check in Task Manager that system memory is not full when you hit this problem?
I'll try to find out what operation can cause LdrpComputeLazyDllPath to return 0xc0000017.
2ef4:04a8 @ 00179843 - LdrpFindKnownDll - ENTER: DLL name: WS2_32.dll
2ef4:4478 @ 00179843 - LdrpSearchPath - ENTER: DLL name: WSOCK32.dll
2ef4:04a8 @ 00179843 - LdrpFindKnownDll - RETURN: Status: 0x00000000
2ef4:04a8 @ 00179843 - LdrpMinimalMapModule - ENTER: DLL name: C:\WINDOWS\System32\WS2_32.dll
2ef4:4478 @ 00179843 - LdrpComputeLazyDllPath - ERROR: Lazy DLL search path computation failed with status: 0xc0000017.
2ef4:4478 @ 00179843 - LdrpSearchPath - RETURN: Status: 0xc0000017
2ef4:4478 @ 00179843 - LdrpProcessWork - ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017
ModLoad: 00007ffd`23a50000 00007ffd`23abf000 C:\WINDOWS\System32\WS2_32.dll
2ef4:04a8 @ 00179859 - LdrpMinimalMapModule - RETURN: Status: 0x00000000
No, system memory is not full when running into this problem. My system has ~24GB of memory available while the firefox processes each use less than 200MB.
Comment 17•2 years ago
LdrpComputeLazyDllPath basically does string manipulation to build a DLL search path based on the current directory, the environment variable, etc.
Analyzing the provided dumps, I found the environment variable PATH was insanely long and included repeated entries, as shown below. I confirmed that if it's too long, the process fails to launch with 0xc0000017. Can you clean up the environment variable (removing duplicate entries) and try launching Firefox?
C:\Python310\Scripts\
C:\Python310\
C:\Program Files\Eclipse Adoptium\jdk-17.0.1.12-hotspot\bin
C:\Program Files\Eclipse Foundation\jdk-16.0.2.7-hotspot\bin
C:\Program Files\Common Files\Oracle\Java\javapath
C:\Program Files (x86)\Common Files\Oracle\Java\javapath
C:\ffmpeg\bin
C:\Program Files (x86)\VMware\VMware Player\bin\
C:\Program Files\Eclipse Foundation\jdk-16.0.2.7-hotspot\bin
C:\Program Files\Common Files\Oracle\Java\javapath
C:\Program Files (x86)\Common Files\Oracle\Java\javapath
C:\ffmpeg\bin
...
I also noticed the environment variable in a child process is longer than one in the main process in the dumps. I think that's why only child processes failed to load modules while the main process worked fine. There may be a bug in our code launching a child process.
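As a side note for anyone checking their own PATH for this kind of bloat, here is a minimal sketch that removes repeated entries from a PATH-style string. It only works on a copy of the value and changes nothing on the system. The function name is hypothetical, and treating entries as equal when they differ only in case or a trailing backslash is a simplifying assumption.

```python
import os


def dedupe_path(path_value, sep=";"):
    """Return path_value with repeated entries removed, keeping first occurrences."""
    seen = set()
    kept = []
    for entry in path_value.split(sep):
        entry = entry.strip()
        # Assumption: entries that differ only in case or a trailing
        # backslash refer to the same directory.
        key = entry.rstrip("\\").lower()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    cleaned = dedupe_path(current, os.pathsep)
    print(f"PATH length: {len(current)} -> {len(cleaned)} characters")
```

Note that this does not treat a literal "%PATH%" entry specially; it would simply be kept once.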
Comment 18•2 years ago (Reporter)
Wow, nice catch! That completely fixed the issue. I'm able to install without any problems and the browser works as expected. No child processes are left over in Task Manager when I close the main program either.
I had no idea my system PATH variable was that bloated with duplicate values. Either some program must've modified it incorrectly... or I did haha. There was an entry in the PATH with the value "%PATH%", so maybe I set it wrong a couple times!
I probably won't be able to test any fixes since my PATH is fixed now, but if there's anything more you need from me, let me know!
Comment 19•2 years ago
(In reply to Marc from comment #18)
Wow, nice catch! That completely fixed the issue. I'm able to install without any problems and the browser works as expected. No child processes are left over in Task Manager when I close the main program either.
I had no idea my system PATH variable was that bloated with duplicate values. Either some program must've modified it incorrectly... or I did haha. There was an entry in the PATH with the value "%PATH%", so maybe I set it wrong a couple times!
I probably won't be able to test any fixes since my PATH is fixed now, but if there's anything more you need from me, let me know!
Glad to hear the problem was solved! I'll dig into this behavior more closely, but I think we got everything we need now. Thank you for your patience and kindly providing the dumps and logs. Have a good weekend!
Comment 20•2 years ago
I found we explicitly expand environment variables in PATH in the process. So if the PATH includes "%PATH%" (it did in this case) and expanding it crosses the limit, it's quite possible that the browser process fails to start a child process. I think it's a bug we should fix. The fix would be to drop %PATH% in PATH when expanding it (or leave it as is).
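To illustrate why a self-referencing entry can blow up, here is a simplified model, not Firefox's actual code. It assumes each expansion pass replaces every literal %PATH% with the whole current value, and it uses 32767 characters, the documented maximum size of a Windows environment variable, as the limit. Function names are hypothetical.

```python
def expand_once(path_value):
    """One pass: replace every literal %PATH% with the current value."""
    return path_value.replace("%PATH%", path_value)


def passes_until_over(path_value, limit=32767, max_passes=32):
    """Count expansion passes until the value exceeds limit.

    Returns None if the limit is not exceeded within max_passes, which
    is always the case when the value has no %PATH% entry.
    """
    for count in range(1, max_passes + 1):
        path_value = expand_once(path_value)
        if len(path_value) > limit:
            return count
    return None
```

In this model a single %PATH% entry roughly doubles the value on every pass, and several entries multiply on each pass, so a handful of self-references can reach the limit after very few expansions.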
Comment 21•2 years ago (Reporter)
(In reply to Toshihito Kikuchi [:toshi] from comment #20)
I found we explicitly expand environment variables in PATH in the process. So if the PATH includes "%PATH%" (it did in this case) and expanding it crosses the limit, it's quite possible that the browser process fails to start a child process. I think it's a bug we should fix. The fix would be to drop %PATH% in PATH when expanding it (or leave it as is).
Just to confirm that finding, I added "%PATH%" to my PATH once and nothing "broke". But after it was in there four times, that must've been enough to cross that limit, and Firefox started showing that white box and not launching properly again. Removing those entries from PATH fixed Firefox completely - so there's an easily reproducible test case!
Comment 22•2 years ago
Handing this over to Ray.
(In reply to Marc from comment #21)
Just to confirm that finding, I added "%PATH%" to my PATH once and nothing "broke". But after it was in there four times, that must've been enough to cross that limit, and Firefox started showing that white box and not launching properly again. Removing those entries from PATH fixed Firefox completely - so there's an easily reproducible test case!
Thank you for confirming that. The function SanitizeEnvironmentVariables was introduced in 2011. We found a long-standing regression.
Comment 23•2 years ago (Assignee)
Unchecked, %PATH% may expand beyond the documented limits of Windows
environment variables, and in fact, to a long enough string that
processes inheriting this environment will fail to spawn due to
insufficient memory.
Avoid this by capping the length to that documented limit.
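One possible reading of that cap, sketched in Python rather than the C++ of the actual patch: expand %NAME% references once and keep the unexpanded value if the result would exceed the limit. This is an assumption-laden illustration; the function name, the fallback behavior and the uppercase-keyed environment dictionary are all hypothetical, and the landed change may differ.

```python
import re

MAX_ENV_CHARS = 32767  # documented maximum size of a Windows environment variable


def expand_capped(value, env, limit=MAX_ENV_CHARS):
    """Expand %NAME% references once; return value unchanged if the result is too long.

    env is assumed to map uppercase variable names to their values.
    """
    def lookup(match):
        # Unknown names are left exactly as written.
        return env.get(match.group(1).upper(), match.group(0))

    expanded = re.sub(r"%([^%;]+)%", lookup, value)
    return expanded if len(expanded) <= limit else value
```

Falling back to the original string is just one way to stay under the limit; truncating or dropping the self-reference are alternatives.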
Comment 24•2 years ago (Assignee)
It probably isn't going to help to expand PATH multiple times: if the
first pass didn't get them all, there's probably some recursion afoot.
Expand PATH only in the launcher process (including when the launcher
process is just the browser).
Depends on D139072
Comment 25•2 years ago (Assignee)
It probably isn't going to help to expand PATH multiple times: if the
first pass didn't get them all, there's likely some recursion afoot.
Expand PATH only in the topmost Firefox process in the process tree.
Depends on D139072
Comment 26•2 years ago
Pushed by rkraesig@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cd5cd2b632a6 avoid pathological PATH expansion r=tkikuchi
Comment 27•2 years ago
bugherder
Comment 28•2 years ago
The patch landed in nightly and beta is affected.
:rkraesig, is this bug important enough to require an uplift?
If not, please set status_beta to wontfix.
For more information, please visit auto_nag documentation.
Comment 29•2 years ago (Assignee)
No; this affects very few users and has been present in Firefox for quite a while.
Comment 30•2 years ago
Comment on attachment 9265070 [details]
Bug 1753910 - limit PATH expansion to launcher process r?tkikuchi
Revision D139432 was moved to bug 1757038. Setting attachment 9265070 [details] to obsolete.