Closed Bug 1051995 Opened 5 years ago Closed 5 years ago

Leak of ContentParent when repeatedly killing the homescreen

Categories

(Core :: DOM: Content Processes, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla35
blocking-b2g 2.1+
Tracking Status
firefox33 --- wontfix
firefox34 --- fixed
firefox35 --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: mccr8, Assigned: mtseng)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 file, 7 obsolete files)

In testing bug 1051114, Fabrice found that repeatedly killing the homescreen, then waiting 4 seconds to let it restart leaks the homescreen.  We seem to leak the home screen every time.  Unfortunately, the CC logs are not very useful.  Each of the leaking ContentParents has a refcount of 1, with no known references in the CC log.
> then waiting 4 seconds to let it restart leaks the homescreen

Oops, that should say "leaks a ContentParent".
In IRC, khuey said the best place to start investigating these leaks is to look for "(orphan)" iframes, but there's only one in this case.  I did notice that there are a little over a hundred "XPCWrappedNative (ChromeMessageSender)" in there, which seem to be the wrapper for nsFrameMessageManager.  

Here is what is keeping alive two of these nsFrameMessageManagers:

0xa9c2e5e0 [FragmentOrElement (xhtml) iframe id='browser0' class='browser' app://system.gaiamobile.org/index.html]
    --[[via hash] mListenerManager]--> 0xad24d940 [EventListenerManager]
    --[mListeners event=onmozdocommand listenerType=3 [i]]--> 0xa819f4e0 [CallbackObject]
    --[mCallback]--> 0xa92646a0 [JS Object (Function)]
    --[**UNKNOWN SLOT 0**]--> 0xa8fc8640 [JS Object (Object)]
    --[_frameLoader]--> 0xafcba2e0 [JS Object (XPCWrappedNative_NoHelper)]
    --[js::GetObjectPrivate(obj)]--> 0xa817da60 [XPCWrappedNative]
    --[mIdentity]--> 0xa74b3100 [nsFrameLoader]
    --[mMessageManager]--> 0xa74b31c0 [nsFrameMessageManager]

0xa9c2e5e0 [FragmentOrElement (xhtml) iframe id='browser0' class='browser' app://system.gaiamobile.org/index.html]
    --[[via hash] mListenerManager]--> 0xad24d940 [EventListenerManager]
    --[mListeners event=onmozdocommand listenerType=3 [i]]--> 0xa8114aa0 [CallbackObject]
    --[mCallback]--> 0xa7789380 [JS Object (Function)]
    --[**UNKNOWN SLOT 0**]--> 0xaa488bb0 [JS Object (Object)]
    --[_mm]--> 0xb1d05c60 [JS Object (ChromeMessageSender)]
    --[js::GetObjectPrivate(obj)]--> 0xa790ef40 [XPCWrappedNative (ChromeMessageSender)]
    --[mIdentity]--> 0xa749f520 [nsFrameMessageManager]

The browser element and ELM are the same, which makes sense, but then the event is different.  So it looks like something is registering for the event onmozdocommand and forgetting to unregister.

khuey said the frame loading keeps alive the ContentParent, so I think this is the owning path that is causing the leak.
khuey pointed out this: http://mxr.mozilla.org/mozilla-central/source/dom/browser-element/BrowserElementParent.jsm#163 and that it seems to be something added in bug 987040.
Blocks: 987040
Hmm, mContentParent handling in nsFrameLoader looks leaky.
DestroyChild should probably set mContentParent to null.
Component: DOM → DOM: Content Processes
So I think we need to deal this in two steps.
Fix the ContentParent leak, but also fix frameloader/messagemanager leak.
Attached patch clear_content_parent.diff (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=5fd329a3417c

Fabrice, does this help with the ContentParent leak?
Flags: needinfo?(fabrice)
mccr8, what is keeping iframe element alive? Since that is supposed to keep
event listeners alive etc.

Fabrice, how do we handle homescreen dying? Do we remove the old iframe and create a new one, 
or do we just try to reload using the existing iframe?

(In other words, it is not clear to me whether that messagemanager part is a leak at all.)
(In reply to Olli Pettay [:smaug] from comment #6)
> Created attachment 8471590 [details] [diff] [review]
> clear_content_parent.diff
> 
> https://tbpl.mozilla.org/?tree=Try&rev=5fd329a3417c
> 
> Fabrice, does this help with the ContentParent leak?

I would rather not do this because it obscures the leak of the iframe/etc.
Well, those iframes aren't orphan, so they are just in document and should stay alive.
(I don't know whether it is a leak or not that they are in document.)
But the ContentParent objects certainly could be released.
Whiteboard: [MemShrink] → [MemShrink:P2]
Morris, can you take this?
Assignee: nobody → mtseng
OK, I'll take a look at this.
Attached patch bug 1051995 (obsolete) — Splinter Review
Remove event listener before browser destroyed.
Attachment #8476528 - Flags: review?(fabrice)
Comment on attachment 8471590 [details] [diff] [review]
clear_content_parent.diff

Assuming the other patch fixes the leak, we wouldn't need this.
Though, I still think we should take this.
Flags: needinfo?(fabrice)
We should not take the patch from comment 6, because making iframes entrain the content parent is the only way we find these problems.
That makes no sense. We can add some other logging to indicate we're leaking iframes, but leaking
possibly significant amounts memory isn't great.
Why do you think ContentParent is a significant amount of memory?
Hmm, looks like we do clear stuff when child process dies.
Ok, maybe it isn't too bad then, just a bit silly to require an explicit leak in order to detect
something going wrong. And in principle FrameLoader should release ContentParent once it doesn't need it anymore.
Attachment #8476528 - Flags: review?(fabrice) → review+
Are we ignoring the numerous test failures on that Try run?
Keywords: checkin-needed
Fix try fail.
Attachment #8476528 - Attachment is obsolete: true
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #19)
> Are we ignoring the numerous test failures on that Try run?
No, I missed those test failures. Sorry about it. I uploaded a new patch to resolve those failures.

new try run: https://tbpl.mozilla.org/?tree=Try&rev=68bc47445752
https://hg.mozilla.org/mozilla-central/rev/cb0615aada17
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
This didn't fix the issue.  The subject of ipc:browser-destroyed is the tab parent, not the frame loader, so we never execute the code this bug added since we compare the tab parent to the frame loader and that is always false.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
blocking because we leak a bunch of stuff in the system app.
blocking-b2g: --- → 2.1+
Attached patch wip (obsolete) — Splinter Review
That patch fixes the leak but I had to add a new observer notification which I don't like too much.

For some reason, the TabParent sent by ipc:browser-destroyed never matches this._frameLoader.tabParent in BEP.jsm. I still don't know why.
Attachment #8483885 - Flags: feedback?(mtseng)
Attachment #8483885 - Flags: feedback?(khuey)
Attachment #8483885 - Flags: feedback?(mtseng) → feedback+
Attached patch Patch (obsolete) — Splinter Review
This works for me.
Attachment #8490431 - Flags: review?(fabrice)
Attachment #8490431 - Flags: review?(fabrice) → review+
Attachment #8483885 - Attachment is obsolete: true
Attachment #8483885 - Flags: feedback?(khuey)
Attachment #8471590 - Attachment is obsolete: true
Despite the general inscrutability of that error message, this did find a real issue.  Not sure how to fix it yet.
Flags: needinfo?(khuey)
The bug that the test found is that this doesn't work for in process mozbrowsers because there's no TabParent to fire the ipc:browser-destroyed message.  That also means this approach is not going to work.
So, the way this event is implemented is fundamentally broken.  The relationship between BrowserElementParents/nsFrameLoaders and iframe mozbrowsers is not bijective.  If the iframe is removed from the document and readded we will create a new nsFrameLoader and BrowserElementParent for the iframe.  This means that machinery that is tied to the frame loader or BEP can't listen for events on the iframe, or you'll get problems like this.  And there's no obvious place to clean up, because the frameloader doesn't tell us when it goes away if it's in-process.  It's worth noting that none of the other events are hooked up this way.  It's also not obvious to me how to fix this.  We have JS data to pass around.  Maybe if we can JSON.stringify it we can pass it through the observer service.

Regardless, I've spent far too much time on this already.  Morris needs to pick this back up.
Status: REOPENED → NEW
Attached patch bug1051995 (obsolete) — Splinter Review
Using observer service to pass docommand message to avoid leak.

try run:  https://tbpl.mozilla.org/?tree=Try&rev=d8759fdc3ef5
Attachment #8492841 - Flags: review?(fabrice)
Attached patch bug1051995 v2 (obsolete) — Splinter Review
Fix mochitest failure.

try run: https://tbpl.mozilla.org/?tree=Try&rev=d2ec2f021897
Attachment #8492841 - Attachment is obsolete: true
Attachment #8492841 - Flags: review?(fabrice)
Attachment #8492957 - Flags: review?(fabrice)
Comment on attachment 8492957 [details] [diff] [review]
bug1051995 v2

Review of attachment 8492957 [details] [diff] [review]:
-----------------------------------------------------------------

Not a fan of adding another weak observer, but it's better than the current leak.

::: b2g/chrome/content/shell.js
@@ +737,5 @@
>    },
>  
>    handleEvent: function docommand_handleEvent(cmd) {
>      if (this._event) {
> +      Services.obs.notifyObservers({wrappedJSObject: this._event.target},

nit: { wrappedJSObject: this._event.target }
Attachment #8492957 - Flags: review?(fabrice) → review+
Fix nit.
Attachment #8492957 - Attachment is obsolete: true
Target Milestone: mozilla34 → ---
https://hg.mozilla.org/mozilla-central/rev/52dc4f4d5837
Status: NEW → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Please request Aurora approval on this patch when you get a chance.
Flags: needinfo?(mtseng)
Comment on attachment 8493562 [details] [diff] [review]
bug1051995 v3 (carry r+: fabrice)

Approval Request Comment
[Feature/regressing bug #]: Bug 1051995
[User impact if declined]: Memory leak when repeatly killing homescreen.
[Describe test coverage new/current, TBPL]:
[Risks and why]: low risk
[String/UUID change made/needed]: None
Attachment #8493562 - Flags: approval-mozilla-aurora?
Flags: needinfo?(mtseng)
Comment on attachment 8493562 [details] [diff] [review]
bug1051995 v3 (carry r+: fabrice)

Review of attachment 8493562 [details] [diff] [review]:
-----------------------------------------------------------------

a=me
Attachment #8493562 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.