Closed Bug 1753910 Opened 2 years ago Closed 2 years ago

Firefox is unusable on Windows 11 install, opens small white box, no pages or menus can load

Categories

(Core :: Widget: Win32, defect, P2)

Firefox 96
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox99 --- fixed

People

(Reporter: marc, Assigned: rkraesig)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [win:stability])

Attachments

(2 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36

Steps to reproduce:

This problem crept up over time. Every so often (not sure of the pattern), opening Firefox would first display a small white box with no text. Several seconds later it would relaunch and start up as normal. Then, on February 3, 2022, it became unusable. The white box would show at launch, and the window that opened was unusable: I couldn't load any webpage or settings page.

Unfortunately, I don't have any easy steps to reproduce. I didn't change anything about my system from February 2nd (when it was working) to the 3rd (when it wasn't). The only significant recent change was I disabled WSL, but that was about a week earlier.

Actual results:

I've tried uninstalling Firefox and deleting all local caches/settings from AppData, then reinstalling, which didn't work. (See Video 1 linked below)

I also tried downloading and compiling Firefox locally, which did work. However, launching that Firefox Nightly build through File Explorer didn't work: double-clicking the Application file failed, but ./mach run in MozillaBuild succeeded. (See Video 2 linked below)

I uploaded some screen recordings to OneDrive to demonstrate the issue: https://1drv.ms/u/s!AjSZ2fG9P0AgyDbaY2xHQmh-pOfS?e=uhhpe8

Expected results:

Firefox should've been able to load webpages and settings menus.

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Win32' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Widget: Win32
Product: Firefox → Core
Attached image screenshot.png

In your video, Firefox displays a doorhanger with a button to "Refresh Firefox..." underneath the URL bar. When you click that button, do things start to work correctly again?

Severity: -- → S2
Flags: needinfo?(marc)
Priority: -- → P3

(In reply to Stephen A Pohl [:spohl] from comment #3)

No, that didn't fix it. The Firefox processes don't seem to entirely end after clicking "Refresh Firefox" - or any time I go to quit Firefox. I have to manually end processes in Task Manager before I can relaunch it.

I added a third video to this shared folder to demonstrate this: https://1drv.ms/u/s!AjSZ2fG9P0AgyDbaY2xHQmh-pOfS?e=uhhpe8. I recorded this right after another fresh install. Between installs I've been deleting C:\Users\marc\AppData\Roaming\Mozilla.

Are there any diagnostic logs I can provide to help debug this? If so, where would I find them? Happy to help in any way to fix this.

Flags: needinfo?(marc)

Could you tell us if you're running any antivirus software, such as WebRoot or anything else?

It would be great if you could collect minidumps for the running Firefox processes when you encounter this bug. Minidumps may contain personal data, so I suggest that you email me separately with the link to the files once you've been able to upload them somewhere such as your OneDrive. To collect the minidumps:

  1. Open Task Manager
  2. Expand Firefox in the list of processes
  3. For each process, right click and select "Create dump file"
  4. Upload each dump file

Thank you!

Flags: needinfo?(marc)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: P3 → P2

(In reply to Stephen A Pohl [:spohl] from comment #5)

No antivirus programs other than the normal Windows Security/Windows Defender. I'll capture those minidumps and email you the link.

Flags: needinfo?(marc)

Thanks, Marc! Toshi, would you be able to take a look please? Thank you!

Flags: needinfo?(tkikuchi)

This is very similar to bug 1735495. What happens is that tab processes fail to load one of the essential XPCOM modules, such as nss3.dll or xul.dll, while the browser process loads all of them. That blank popup is the "Couldn't load XPCOM" dialog opened here. In bug 1735495, the reporter said the issue went away for some reason, and unfortunately we couldn't finish the investigation.

Marc, thank you very much for your cooperation. Can you follow the steps of https://bugzilla.mozilla.org/show_bug.cgi?id=1735495#c26 to capture DLL loader's debug trace?

Flags: needinfo?(tkikuchi) → needinfo?(marc)
See Also: → 1735495
Flags: needinfo?(marc)

(In reply to Toshihito Kikuchi [:toshi] from comment #8)

I attached the DLL loader's log to this bug report! Let me know if there's anything else I can provide.

(In reply to Marc from comment #10)

Thank you! We analyzed the log, but it didn't capture the child processes and we couldn't find any problem in the main process's log. A possible reason is there were Firefox processes already running.

Can you capture the same log again? Before running the command, please make sure no Firefox process is running. You can terminate all firefox.exe instances with the command taskkill /im firefox.exe /f.

Flags: needinfo?(marc)

(In reply to Toshihito Kikuchi [:toshi] from comment #11)

Sorry about that! I think there were leftover firefox processes running at the time. I restarted my PC and ran that taskkill command before doing this two more times. I attached the new logs to the bug report.

Also, any time I went to load a webpage (e.g. Reddit, Wikipedia), an additional firefox process would appear in Task Manager and the Windows alert sound played. Could that be related to those essential modules you mentioned not loading? In the logs I saw a line "ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017". I'm not too familiar with Windows/desktop development, but figured this might help!

Flags: needinfo?(marc)

(In reply to Marc from comment #14)

Thank you so much for collecting logs quickly. We reached the same conclusion. The process somehow failed to load WSOCK32.dll, thus failing to load nss3.dll and showing the XPCOM error popup. The error code 0xc0000017 is STATUS_NO_MEMORY. It's unlikely, but just in case, can you check in Task Manager that the system memory is not full when you hit this problem?

I'll try to find out what operation can cause LdrpComputeLazyDllPath to return 0xc0000017.

2ef4:04a8 @ 00179843 - LdrpFindKnownDll - ENTER: DLL name: WS2_32.dll
2ef4:4478 @ 00179843 - LdrpSearchPath - ENTER: DLL name: WSOCK32.dll
2ef4:04a8 @ 00179843 - LdrpFindKnownDll - RETURN: Status: 0x00000000
2ef4:04a8 @ 00179843 - LdrpMinimalMapModule - ENTER: DLL name: C:\WINDOWS\System32\WS2_32.dll
2ef4:4478 @ 00179843 - LdrpComputeLazyDllPath - ERROR: Lazy DLL search path computation failed with status: 0xc0000017.
2ef4:4478 @ 00179843 - LdrpSearchPath - RETURN: Status: 0xc0000017
2ef4:4478 @ 00179843 - LdrpProcessWork - ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017
ModLoad: 00007ffd`23a50000 00007ffd`23abf000   C:\WINDOWS\System32\WS2_32.dll
2ef4:04a8 @ 00179859 - LdrpMinimalMapModule - RETURN: Status: 0x00000000
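
For anyone scanning a similar trace, the failing load can be picked out mechanically. The following is only a rough sketch that assumes the line layout seen in the excerpt above (`pid:tid @ tick - function - ERROR: message`); the names `ERROR_LINE`, `LOAD_FAILURE`, and `failed_dll_loads` are made up for illustration and are not part of any Mozilla or Windows tooling.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "<pid>:<tid> @ <tick> - <function> - ERROR: <message>"
ERROR_LINE = re.compile(
    r"^(?P<pid>[0-9a-f]+):(?P<tid>[0-9a-f]+) @ (?P<tick>\d+) - "
    r"(?P<func>\w+) - ERROR: (?P<msg>.*)$"
)
# Extracts the DLL name, parent module, and status from an "Unable to load DLL" message.
LOAD_FAILURE = re.compile(
    r'Unable to load DLL: "(?P<dll>[^"]+)", '
    r'Parent Module: "(?P<parent>[^"]+)", Status: (?P<status>0x[0-9a-fA-F]+)'
)


def failed_dll_loads(lines):
    """Return (dll, parent_module, status) for every failed DLL load in the trace."""
    failures = []
    for line in lines:
        error = ERROR_LINE.match(line.strip())
        if not error:
            continue  # ENTER/RETURN/ModLoad lines are ignored
        failure = LOAD_FAILURE.search(error.group("msg"))
        if failure:
            failures.append(
                (failure.group("dll"), failure.group("parent"), int(failure.group("status"), 16))
            )
    return failures


# Three lines copied from the trace above.
sample = [
    r"2ef4:4478 @ 00179843 - LdrpComputeLazyDllPath - ERROR: Lazy DLL search path computation failed with status: 0xc0000017.",
    r"2ef4:4478 @ 00179843 - LdrpSearchPath - RETURN: Status: 0xc0000017",
    r'2ef4:4478 @ 00179843 - LdrpProcessWork - ERROR: Unable to load DLL: "WSOCK32.dll", Parent Module: "C:\Program Files\Mozilla Firefox\nss3.dll", Status: 0xc0000017',
]

for dll, parent, status in failed_dll_loads(sample):
    print(f"{dll} (needed by {parent}) failed with 0x{status:08x}")
```

Only the `LdrpProcessWork` line is reported; the `LdrpComputeLazyDllPath` error is skipped because it names no DLL, although it is the line that points at the search-path computation as the origin of the failure.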

(In reply to Toshihito Kikuchi [:toshi] from comment #15)

No, system memory is not full when running into this problem. My system has ~24GB of memory available while the firefox processes each use less than 200MB.

LdrpComputeLazyDllPath basically does string manipulation to build a DLL search path based on the current directory, the environment variable, etc.

Analyzing the provided dumps, I found that the environment variable PATH was extremely long and included repeated entries, as shown below. I confirmed that if PATH is too long, the process fails to launch with 0xc0000017. Can you clean up the environment variable (removing duplicate entries) and try launching Firefox?

C:\Python310\Scripts\
C:\Python310\
C:\Program Files\Eclipse Adoptium\jdk-17.0.1.12-hotspot\bin
C:\Program Files\Eclipse Foundation\jdk-16.0.2.7-hotspot\bin
C:\Program Files\Common Files\Oracle\Java\javapath
C:\Program Files (x86)\Common Files\Oracle\Java\javapath
C:\ffmpeg\bin
C:\Program Files (x86)\VMware\VMware Player\bin\
C:\Program Files\Eclipse Foundation\jdk-16.0.2.7-hotspot\bin
C:\Program Files\Common Files\Oracle\Java\javapath
C:\Program Files (x86)\Common Files\Oracle\Java\javapath
C:\ffmpeg\bin
...

I also noticed that the environment variable in a child process is longer than the one in the main process in the dumps. I think that's why only child processes failed to load modules while the main process worked fine. There may be a bug in our code launching a child process.
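
The cleanup requested above (removing duplicate PATH entries) can be sketched as follows. This is a minimal illustration, not a tool used in this bug: `dedupe_path` is a hypothetical helper, and it assumes entries are separated by `;` and that entries differing only in letter case refer to the same directory.

```python
def dedupe_path(path_value, sep=";"):
    """Drop empty and repeated entries from a PATH-style string.

    The first occurrence of each entry is kept, in its original position.
    """
    seen = set()
    kept = []
    for entry in path_value.split(sep):
        entry = entry.strip()
        if not entry:
            continue  # skip empty entries produced by ";;" or a trailing ";"
        key = entry.lower()  # assumption: compare entries case-insensitively
        if key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return sep.join(kept)


# Entries taken from the list above, with one repeated.
entries = [
    r"C:\ffmpeg\bin",
    r"C:\Program Files\Common Files\Oracle\Java\javapath",
    r"C:\ffmpeg\bin",
]
print(dedupe_path(";".join(entries)))
```

The function only computes a cleaned string; writing it back to the user or system PATH is a separate step and is deliberately left out here.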

Flags: needinfo?(marc)

Wow, nice catch! That completely fixed the issue. I'm able to install without any problems and the browser works as expected. No child processes are left over in Task Manager when I close the main program either.

I had no idea my system PATH variable was that bloated with duplicate values. Either some program must've modified it incorrectly... or I did haha. There was an entry in the PATH with the value "%PATH%", so maybe I set it wrong a couple times!

I probably won't be able to test any fixes since my PATH is fixed now, but if there's anything more you need from me, let me know!

Flags: needinfo?(marc)

(In reply to Marc from comment #18)

Glad to hear the problem was solved! I'll dig into this behavior more closely, but I think we got everything we need now. Thank you for your patience and kindly providing the dumps and logs. Have a good weekend!

I found we explicitly expand environment variables in PATH in the process. So if the PATH includes "%PATH%" (it did in this case) and expanding it crosses the limit, it's quite possible that the browser process fails to start a child process. I think it's a bug we should fix. The fix would be to drop %PATH% in PATH when expanding it (or leave it as is).
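
To illustrate the first option proposed above (dropping %PATH% while expanding PATH), here is a small sketch. It is not the Firefox code: `expand_path` and `VAR_REF` are hypothetical names, the environment is passed in as a plain dict with upper-case keys, and undefined variables are left unexpanded, which is how ExpandEnvironmentStrings treats them.

```python
import re

# %NAME% references as they appear in Windows environment strings.
VAR_REF = re.compile(r"%([^%;]+)%")


def expand_path(path_value, env):
    """Expand %NAME% references in a PATH value once, dropping %PATH% itself.

    `env` is assumed to map upper-case variable names to their values.
    """
    def substitute(match):
        name = match.group(1).upper()
        if name == "PATH":
            return ""  # self-reference: drop it instead of expanding it
        # Undefined variables are left as-is.
        return env.get(name, match.group(0))

    expanded = VAR_REF.sub(substitute, path_value)
    # Dropping a reference can leave empty entries behind; remove them.
    return ";".join(entry for entry in expanded.split(";") if entry)


env = {"SYSTEMROOT": r"C:\WINDOWS"}
print(expand_path(r"%SystemRoot%\System32;%PATH%;C:\ffmpeg\bin;%PATH%", env))
```

Because the self-reference is removed rather than substituted, the result cannot grow with the number of %PATH% entries, whatever the current PATH contains.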

See Also: → 552864

(In reply to Toshihito Kikuchi [:toshi] from comment #20)

Just to confirm that finding, I added "%PATH%" to my PATH once and nothing "broke". But after it was in there four times, that must've been enough to cross that limit, and Firefox started showing that white box and not launching properly again. Removing those entries from PATH fixed Firefox completely - so there's an easily reproducible test case!
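
Why one "%PATH%" entry can be harmless while four are not can be seen with a toy model. The sketch below assumes a single expansion pass in which every literal %PATH% token is replaced by the whole pre-expansion value; that is a simplification rather than a description of Firefox's code, the 7000-character base length is invented for illustration, and the exact length at which child processes start failing is not stated in this bug. The 32,767-character figure is the documented maximum size of a Windows environment variable and is used only as a reference point.

```python
TOKEN = "%PATH%"
LIMIT = 32767  # documented maximum size of an environment variable, in characters


def expanded_length(base_len, copies):
    """Length after one pass that swaps each %PATH% token for the whole raw value."""
    # Raw value: base_len characters of ordinary entries plus `copies` ";%PATH%" entries.
    raw_len = base_len + copies * (len(TOKEN) + 1)
    # Each 6-character token is replaced by raw_len characters.
    return raw_len + copies * (raw_len - len(TOKEN))


for copies in (1, 4):
    length = expanded_length(7000, copies)
    print(copies, length, "over limit" if length > LIMIT else "within limit")
```

In this model the expanded length grows roughly as (copies + 1) times the raw length, so a PATH that is comfortably short on its own can cross the limit after only a few self-references.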

Whiteboard: [win:stability]

Handing this over to Ray.

(In reply to Marc from comment #21)

Thank you for confirming that. The function SanitizeEnvironmentVariables was introduced in 2011. We found a long-standing regression.

Assignee: nobody → rkraesig
Has Regression Range: --- → yes
Keywords: regression
Regressed by: CVE-2011-2980
See Also: CVE-2011-2980

Unchecked, %PATH% may expand beyond the documented limits of Windows
environment variables, and in fact, to a long enough string that
processes inheriting this environment will fail to spawn due to
insufficient memory.

Avoid this by capping the length to that documented limit.
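
One possible reading of "capping the length" is sketched below. How the landed patch behaves when the limit is exceeded is not described in this bug, so the fallback chosen here (keep the unexpanded value) is an assumption, and `capped_expansion` and `MAX_ENV_CHARS` are hypothetical names. The expansion step is passed in as a callable so the sketch does not depend on any particular platform API.

```python
MAX_ENV_CHARS = 32767  # documented maximum size of a Windows environment variable


def capped_expansion(raw_value, expand, limit=MAX_ENV_CHARS):
    """Expand `raw_value` with `expand`, refusing results longer than `limit`.

    Assumption: when the expanded string would be too long, the unexpanded
    value is kept, so the child environment never exceeds the limit because
    of this step.
    """
    expanded = expand(raw_value)
    if len(expanded) > limit:
        return raw_value
    return expanded


# Toy expansion that doubles the string, standing in for real variable expansion.
print(capped_expansion("C:\\tools", lambda value: value + ";" + value))
```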

It probably isn't going to help to expand PATH multiple times: if the
first pass didn't get them all, there's probably some recursion afoot.

Expand PATH only in the launcher process (including when the launcher
process is just the browser).

Depends on D139072

Attachment #9264999 - Attachment is obsolete: true

It probably isn't going to help to expand PATH multiple times: if the
first pass didn't get them all, there's likely some recursion afoot.

Expand PATH only in the topmost Firefox process in the process tree.

Depends on D139072

See Also: → 1757038
Pushed by rkraesig@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cd5cd2b632a6
avoid pathological PATH expansion  r=tkikuchi
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Flags: in-testsuite+

The patch landed in nightly and beta is affected.
:rkraesig, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(rkraesig)

No; this affects very few users and has been present in Firefox for quite a while.

Flags: needinfo?(rkraesig)
OS: Unspecified → Windows

Comment on attachment 9265070 [details]
Bug 1753910 - limit PATH expansion to launcher process r?tkikuchi

Revision D139432 was moved to bug 1757038. Setting attachment 9265070 [details] to obsolete.

Attachment #9265070 - Attachment is obsolete: true