Closed Bug 1440103 Opened 7 years ago Closed 4 years ago

Windows: Firefox fails to release child process handles, leading to zombie processes

Categories

(Core :: IPC, defect, P3)

58 Branch
defect

Tracking

()

RESOLVED FIXED
92 Branch
Tracking Status
firefox90 --- wontfix
firefox91 --- wontfix
firefox92 --- verified

People

(Reporter: wcruppel, Assigned: jld)

References

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0 Build ID: 20180206200532 Steps to reproduce: 1) Open tabs, do some browsing, close some tabs, etc. 2) Run the FindZombieHandles tool and note that there are zombie processes being held onto by FireFox: C:\>FindZombieHandles.exe -verbose 9 total zombie processes. 9 zombies held by \Device\HarddiskVolume7\Program Files (x86)\Mozilla Firefox\firefox.exe(11420) 8 zombies of \Device\HarddiskVolume7\Program Files (x86)\Mozilla Firefox\firefox.exe 1 zombie of \Device\HarddiskVolume7\Program Files (x86)\Mozilla Firefox\plugin-container.exe Please see the following article: https://randomascii.wordpress.com/2018/02/11/zombie-processes-are-eating-your-memory/ Actual results: Firefox is preventing zombie processes from exiting, presumably due to not closing process handles (see the article referenced above). This leads to increased memory usage in Windows. Expected results: Firefox should not prevent zombie processes from exiting.
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0 I have tested this issue on Windows 10 x64 with the latest Firefox release (58.0.2) and the latest Nightly (60.0a1-20180225220119) and managed to reproduce it. After opening the browser, navigating to different webpages, opening and closing different tabs, closing and opening the browser, you can observe using the FindZombieHandles tool, that multiple "Firefox" zombie processes are not exiting. I do not have to much knowledge in this area, however, I am going to assign the "Core:DOM" component to this issue and maybe someone with more experience will have a look over this.
Component: Untriaged → DOM
Product: Firefox → Core
I have this issue on Windows 8.1 as well, the processes have command line arguments "-osint http://some-website-uri-i-visted-long-ago" and their size is always arount 2 MB.
Unclear if this belongs in DOM, but if it does, it's probably P2.
Priority: -- → P2
Component: DOM → DOM: Core & HTML

I can reproduce the issue on Windows 10 1909 and the latest firefox stable (75) and nightly (77.0a1 20200420095323).
After hours of surfing the web:

PS C:\Soft portable\FindZombieHandles> .\fzh -verbose
75 total zombie processes.
    75 zombies held by \Device\HarddiskVolume8\Soft\Firefox Stable\firefox.exe(3364)
        75 zombies of \Device\HarddiskVolume8\Soft\Firefox Stable\firefox.exe

Process Explorer and Process Hacker can also show handles to non-existent content processes. Probably DOM: Content Processes is the right component for this bug.

Attached image ProcessHacker

This is still the case, I took a full memory dump

150 zombies held by firefox.exe(13504)
140 zombies of firefox.exe - process handle count: 140 - thread handle count: 0
10 zombies of plugin-container.exe - process handle count: 10 - thread handle count: 0

Windows version 89.0.0

Do you need any more details?

I have a procdump of firefox, but unsure if you need a kernel dump to figure out where the leak is.

I assume it's either plugin container or firefox calling itself

Flags: needinfo?(jstutte)

I can confirm that there seem to be some leak of handles here (tested on Windows 10). It does not seem to me that there is a huge impact on most users, though.

It is unclear, if DOM Core & HTML is the right call here. From WindowsProcessLauncher I would start the search in IPC, though there might be other process handles, as well.

Severity: normal → S3
Component: DOM: Core & HTML → IPC
Flags: needinfo?(jstutte)
Priority: P2 → P3

Jens, the problem is not just "leaking handles"... The zombie processes are still in-memory, some of which may be considerably large. The leaking handles are the CAUSE of the zombie processes. All clients must close their process handle when done with the process.

Please review https://randomascii.wordpress.com/2018/02/11/zombie-processes-are-eating-your-memory/

I have an idea about where things might be going wrong. In this code, the handle from CreateProcess is used to derive another handle with more permissions, and then the original handle is used to resolve a promise… but I don't think any of the Then callbacks ever consume that handle.

Also, if I've correctly understood the blog post linked in comment #9, each dead process appears to consume 64k of (unswappable?) kernel memory; that's not immediately catastrophic but it would add up over time, especially with Fission. (See also bug 1719391; I suspect that some users have sessions where thousands of child processes have been started and stopped.)

Assignee: nobody → jld

We use the process handle returned from CreateProcess to derive
another handle with more permissions, but the original handle is never
closed. This bug appears to be fairly old: it existed before this code
was converted to use MozPromise.

Currently we provide the original handle to external consumers of the
launch promise; this patch resolves the promise with the privileged
handle instead and closes the original one. (One consumer uses the
handle only to obtain the pid, and the rest don't use it at all, so this
shouldn't change anything.)

Could I request a bit of code that logs CreateProcess for the pid and when it closes it?
So wherever it is called in the codebase the log will tell us who created it and didn't clean up after?

Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7079e9f454fa Fix handle leak in IPC process launching. r=handyman
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Comment on attachment 9231064 [details]
Bug 1440103 - Fix handle leak in IPC process launching.

Beta/Release Uplift Approval Request

  • User impact if declined: Memory leaks, which are significantly worse with Fission (e.g., opening cnn.com without an ad blocker leaks ~1MB of physical memory each time), but are also present without Fission.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: You'll need FindZombieHandles from here: https://github.com/randomascii/blogstuff/tree/main/FindZombieHandles/prebuilt (and maybe the other files in that directory)

Start Firefox, optionally enable Fission, open tabs to websites and close them, then run FindZombieHandles (ignore the warning about debug privileges). Expected: no zombies of firefox.exe.

I've verified the latest Nightly this way, and of course my own development builds, so I can check a Beta build myself if that would help.

  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): The code being modified is relatively complicated, and runs asynchronously on several different threads. However, the process launching code is heavily exercised by automated testing (and by users).
  • String changes made/needed: none
Attachment #9231064 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Comment on attachment 9231064 [details]
Bug 1440103 - Fix handle leak in IPC process launching.

This seems too risky to me to uplift into 91 sorry. This is a P3/S3 that we have had for years and the fix just landed on nightly, the risk/benefit ratio of an uplift mid-beta for our users doesn't seem favorable to me. I understand that this is a fix of interest for the Fission work but we are not shipping Fission by default in 91 and we have a short release cycle which means that this fix will be in beta 92 on 2 August 10. If there is an important data I missed in my evaluation, don't hesitate to reach out to me.

Attachment #9231064 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Reproduced with Fx 92.0a1 (12-07-2021) on Windows 10.
Verified fixed with Fx 92.0a1 (19-06-2021) on Windows 10.
As per comment 16 I will remove for now the qa+ flag.

Flags: qe-verify+
See Also: → 1734737
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: