Intermittent JSM shared global leaks when Fission is enabled
Categories
(Core :: XPConnect, task, P2)
Tracking
()
People
(Reporter: mccr8, Assigned: mccr8)
References
(Blocks 1 open bug)
Details
(Keywords: memory-leak, Whiteboard: [MemShrink:P2])
Attachments
(3 files)
4.16 KB,
patch
|
Details | Diff | Splinter Review | |
6.31 KB,
patch
|
Details | Diff | Splinter Review | |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
As seen in bug 1576498, in some Fission tests we frequently see intermittent leaks of a few XPConnect related objects. I've papered these over by setting a pref to reuse Fission processes (for same-origin). I suspect this is due to some kind of race, maybe where shutting down a content process is racing with starting it up, in some kind of way, because the tests that hit this leak most often seem to create a lot of child processes.
Investigation is difficult because I have not been able to reproduce it locally. I have managed to figure out that it appears to be the JSM shared global that is leaking.
Assignee | ||
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
I haven't spent a ton of time investigating this, because this is not a "real leak", as we expect the JSM global to always be held alive until shutdown. This is mostly a matter of avoiding TreeHerder failures, which are a hassle for people working on developing Firefox.
Assignee | ||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
These XPConnect objects that are leaking could potentially form a cycle, and thus require the cycle collector to destroy them. XPCJSRuntime::mLoaderGlobal is a reference to the loader global that is not cleared until nsXPConnect is being destroyed, which is after the cycle collector shuts down.
Maybe these objects are normally acyclic, but under whatever unknown race conditions they are cyclic. Thus maybe clearing mLoaderGlobal earlier would fix the leak, was one theory I had.
mozJSComponentLoader::mLoaderGlobal is another strong reference to the loader global, but this one is cleared before we do our final CC.
However, another weird thing is that XPCJSRuntime::mLoaderGlobal and mozJSComponentLoader::mLoaderGlobal are lazily created, and as far as I can see there is no guarantee that once we clear either of those references, they can't be created again if something asks for them. Another theory I had was that maybe something is calling the accessor methods for these fields late in shutdown, thus creating a new cyclic object that isn't freed.
To investigate that, I wrote the attached hacky patch. This adds a new state variable, gCCState, that records the current state of the main thread's cycle collector. In state 0, the CC hasn't been initialized. In state 1, the CC has been initialized (the normal state). In state 2, we've begun the shutdown CC, and in state 3, we've shut down the CC.
If we're in state 2 or later, and somebody calls XPCJSRuntime::LoaderGlobal() or mozJSComponentLoader::GetSharedGlobal, then we crash. Also, when we begin state 2, it clears XPCJSRuntime::mLoaderGlobal.
Anyways, with my patch, we don't hit either of those crashes (which is good, and maybe we could try enforcing that more in debug builds), but we still leak. I should figure out if I can get a CC log on try without disrupting whatever condition is causing the leak.
Assignee | ||
Comment 5•6 years ago
|
||
One possible startup/shutdown race related to the JSM loader is the ScriptPreloader, but I don't know how this could cause a leak. I see that it already watches for XPCOM shutdown and does some cleanup.
Assignee | ||
Comment 7•5 years ago
|
||
I wrote a patch to simplify XPConnect shutdown a tiny bit, so that XPConnect shouldn't be holding onto the JSM global when we start doing the shutdown CCs. Then I added in some hacky patches to record shutdown CCs on treeherder (which ended up recording them for way too many processes, but that's another story...) and managed to get a log where the global is leaking. Here's what is keeping alive the XPCWN for the BackstagePass:
0x7f7c8e21bf70 [JSStackFrame]
--[mStack]--> 0x32b570b78900 [JS Object (SavedFrame)]
--[group_global]--> 0x11eeef679100 [JS Object (BackstagePass)]
--[js::GetObjectPrivate(obj)]--> 0x7f7c9639d940 [XPCWrappedNative (BackstagePass)]
Root 0x7f7c8e21bf70 is a ref counted object with 1 unknown edge(s).
Assignee | ||
Comment 8•5 years ago
|
||
I could only find a few heap references to nsIStackFrame. ConsoleCallData looks pretty dodgy. It isn't cycle collected because it can be used either to hold worker stuff or main thread stuff.
Assignee | ||
Comment 9•5 years ago
|
||
I thought this was about Console, because we clear the Consoles for windows when the windows hit their destroy event, but Consoles are also available to JSMs, and we never clear that Console, which seemed mighty suspicious. However, clearing all Consoles at xpcom-shutdown didn't fix the leak.
However, I did find another heap reference to nsIStackFrame, CycleCollectedJSContext::mPendingException (well, the dom::Exception holds an mLocation). Clearing that before we do the shutdown CC seems to fix the leak. I'm not sure if that's the best fix, but at least I seem to have figured out where the leak is coming from.
Assignee | ||
Comment 10•5 years ago
|
||
Assignee | ||
Comment 11•5 years ago
|
||
Try looks green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e08188fbab98eaece26099199afe6f74c6fc4afe
I removed all of the uses of the shared content process pref. I'll land that separately if the intermitents on TreeHerder clear up.
Assignee | ||
Comment 12•5 years ago
|
||
This could be causing transient leaks in other places besides shutdown.
Comment 13•5 years ago
|
||
Assignee | ||
Comment 14•5 years ago
|
||
I forgot to update the patch summary when I updated the patch. The "top level" part shouldn't be there.
Comment 15•5 years ago
|
||
bugherder |
Assignee | ||
Comment 16•5 years ago
|
||
Comment on attachment 9138914 [details]
Bug 1612364 - Clear mPendingException when returning to the top level event loop.
Beta/Release Uplift Approval Request
- User impact if declined: This fixes an intermittent leak that shows up on TreeHerder. There should be no user impact, because it is more of a measurement issue than a real leak.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It just clears out a pointer while we're shutting down.
- String changes made/needed: none
Comment 17•5 years ago
|
||
Comment on attachment 9138914 [details]
Bug 1612364 - Clear mPendingException when returning to the top level event loop.
Fixes an intermittent leak seen in Treeherder. Approved for 76.0b3.
Comment 18•5 years ago
|
||
bugherder uplift |
Updated•5 years ago
|
Assignee | ||
Comment 19•5 years ago
|
||
This issue is likely extremely old (maybe dating back to 2004). We're only just noticing it with Fission because it creates so many processes.
Description
•