TabChild leak during WindowTest app running

RESOLVED FIXED in Firefox 31

Status

()

defect
RESOLVED FIXED
5 years ago
3 months ago

People

(Reporter: sotaro, Assigned: bkelly)

Tracking

({regression})

30 Branch
mozilla32
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox30 unaffected, firefox31+ fixed, firefox32+ fixed, b2g-v1.3 unaffected, b2g-v1.3T unaffected, b2g-v1.4 unaffected, b2g-v2.0 fixed)

Details

(Whiteboard: [MemShrink])

Attachments

(3 attachments, 6 obsolete attachments)

Reporter

Description

5 years ago
+++ This bug was initially created as a clone of Bug #998504 +++

On the following STR, TabChild is created for each window.open(). But the TabChild's destructor is not called even after window.close() is called. I confirmed that TabChild::RecvDestroy() is called.

STR:
* Run the test app at bug 964386 attachment 8366455 [details] on a QRD device.

Confirmed device:
 master nexus-5
Reporter

Updated

5 years ago
Blocks: 998504
Whiteboard: [MemShrink]
I'll take this.  I probably won't be able to start looking until tomorrow or Monday, so if someone can look sooner feel free to steal it.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
The question will come up whether or not this bug needs to be a 1.4 blocker since it blocks bug 998504.  Ben and I discussed earlier how the fd leakage will likely be fixed by Sotaro's work in bug 1000525 and bug 1004191.

Does this contribute enough of a leak that we need to block 1.4 on a fix?  Sotaro, what do you think?
Flags: needinfo?(sotaro.ikeda.g)
Reporter

Comment 3

5 years ago
I created this bug as TabChild leak. I am not sure TabChild leak directly causes the window.open()/close() leak. I think that we need to investigate more about window.open()/close() leak happens other place than gfx layers.
Flags: needinfo?(sotaro.ikeda.g)
Reporter

Comment 4

5 years ago
The following are about:memory got after several hundreds window.open()/close() iteration.

-------------------------------------

WindowTest (pid 1069)
Explicit Allocations

93.72 MB (100.0%) -- explicit
├──79.75 MB (85.09%) -- js-non-window
│  ├──76.14 MB (81.24%) -- zones
│  │  ├──75.13 MB (80.17%) -- zone(0xb3c32c00)
│  │  │  ├──68.44 MB (73.03%) -- compartment([System Principal], outOfProcessTabChildGlobal)
│  │  │  │  ├──24.81 MB (26.47%) -- shapes
│  │  │  │  │  ├──14.41 MB (15.37%) -- gc-heap
│  │  │  │  │  │  ├───9.61 MB (10.25%) ── tree/global-parented [413]
│  │  │  │  │  │  ├───4.78 MB (05.10%) ── base [413]
│  │  │  │  │  │  └───0.02 MB (00.03%) ── dict [3]
│  │  │  │  │  └──10.40 MB (11.09%) ── malloc-heap/compartment-tables [413]
│  │  │  │  ├──23.67 MB (25.25%) -- objects
│  │  │  │  │  ├──17.05 MB (18.19%) -- gc-heap
│  │  │  │  │  │  ├──13.23 MB (14.12%) ── function [413]
│  │  │  │  │  │  └───3.82 MB (04.07%) ── ordinary [413]
│  │  │  │  │  ├───6.62 MB (07.06%) ── malloc-heap/slots [413]
│  │  │  │  │  └───0.00 MB (00.00%) ── non-heap/code/asm.js
│  │  │  │  ├──11.63 MB (12.41%) -- sundries
│  │  │  │  │  ├───7.46 MB (07.96%) ── malloc-heap [413]
│  │  │  │  │  └───4.17 MB (04.45%) ── gc-heap [413]
│  │  │  │  ├───8.32 MB (08.88%) ── scripts/gc-heap [413]
│  │  │  │  └───0.02 MB (00.02%) ── baseline/fallback-stubs [2]
│  │  │  ├───3.12 MB (03.33%) ── type-objects/gc-heap
│  │  │  ├───2.41 MB (02.57%) ++ (8 tiny)
│  │  │  └───1.16 MB (01.24%) ++ compartment([System Principal])
│  │  └───1.00 MB (01.07%) ++ (9 tiny)
│  ├───2.10 MB (02.24%) -- gc-heap
│  │   ├──1.10 MB (01.17%) ++ (2 tiny)
│  │   └──1.00 MB (01.07%) ── unused-chunks
│  └───1.51 MB (01.61%) ++ runtime
├───5.11 MB (05.45%) ── heap-unclassified
├───2.46 MB (02.62%) -- xpconnect
│   ├──1.75 MB (01.87%) ── scopes
│   └──0.71 MB (00.75%) ++ (3 tiny)
├───2.15 MB (02.30%) ++ (15 tiny)
├───2.13 MB (02.27%) -- heap-overhead
│   ├──1.21 MB (01.29%) ── waste
│   └──0.92 MB (00.98%) ++ (2 tiny)
└───2.13 MB (02.27%) -- window-objects
    ├──1.61 MB (01.72%) -- top(none)/detached
    │  ├──1.09 MB (01.17%) -- window(app://windowtest.gaiamobile.org/helloworld.html)
    │  │  ├──1.05 MB (01.12%) ++ js-compartment(app://windowtest.gaiamobile.org/helloworld.html, about:blank)
    │  │  └──0.04 MB (00.04%) ++ (3 tiny)
    │  └──0.52 MB (00.55%) ++ window(about:blank)
    └──0.52 MB (00.55%) ++ top(app://windowtest.gaiamobile.org/index.html, id=1)
Okay, thanks, Sotaro.  Let's *not* make this block 1.4 and see how testing goes once the patches in bug 1000525 and bug 1004191 have landed.
Reporter

Comment 6

5 years ago
memory-reports got during WindowTest app running on master nexus-5 with gfx layers fix.
Reporter

Comment 7

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> Created attachment 8416139 [details]

Oh, it is not the entire memory report. I am going to update soon.
Reporter

Comment 8

5 years ago
Update memory report by entire files.
Attachment #8416139 - Attachment is obsolete: true
Thanks Sotaro!  Seems to be leaking the windows:

429 (100.0%) -- js-main-runtime-compartments
├──417 (97.20%) -- system
│  ├──413 (96.27%) ── [System Principal], outOfProcessTabChildGlobal [413]
│  └────4 (00.93%) ++ (4 tiny)
└───12 (02.80%) -- user
    ├───7 (01.63%) ── app://windowtest.gaiamobile.org/helloworld.html, about:blank [7]
    └───5 (01.17%) ++ (3 tiny)

I'll have to get gc/cc logs to investigate.  Will do so once I finish this other bug...
I reproduced this on the buri with mozilla-central.  I'm going to work on that device for now to avoid the gfx fence issues Sotaro is working through.
Some possibly relevant things from my local about:memory diff.

In the WindowTest process:

98 (100.0%) -- js-main-runtime-compartments
├──98 (100.0%) ── system/[System Principal], outOfProcessTabChildGlobal [99]
└───0 (00.00%) ++ user

118 (100.0%) -- observer-service
└──118 (100.0%) -- referent
   ├──113 (95.76%) ── strong
   └────5 (04.24%) ── weak/alive

131 (100.0%) -- observer-service-suspect
└──131 (100.0%) ── referent(topic=xpcom-shutdown) [+]

30 (100.0%) -- preference-service
└──30 (100.0%) ── referent/strong

In the parent process:


211 (100.0%) -- observer-service
└──211 (100.0%) -- referent
   ├──195 (92.42%) -- weak
   │  ├──190 (90.05%) ── dead
   │  └────5 (02.37%) ── alive
   └───16 (07.58%) ── strong

211 (100.0%) -- observer-service-suspect
├──106 (50.24%) ── referent(topic=oop-frameloader-crashed) [+]
└──105 (49.76%) ── referent(topic=ask-children-to-exit-fullscreen) [+]
So I fixed a bunch of cases where observers, event listeners, and message listeners were not be removed in various files.  I'm not sure if these are really necessary, but they let me get down to the following state:

1) For each window.open()/window.close() cycle I see an |outOfProcessTabChildGlobal| compartment leaked.
2) For each window.open()/window.close() cycle I see an XPCWrappedNative flattened JS object for the ContentFrameMessageManager leaked.

I'm assuming if I can get (2) released, then the compartments will go away.  (Although I thought we combined compartments on b2g, so I'm a little confused why these system compartments are separate to begin with.)

This is the sort of thing I am seeing for the XPCWrappedNatives at the moment:

bkelly@lenir:/srv/b2g-hamachi-master/about-memory-9$ /srv/heapgraph/find_roots.py gc-edges.520.log 0x44fea820
Parsing gc-edges.520.log. Done loading graph. Reversing graph. Done.

via XPCWrappedNative::mFlatJSObject :
0x44fea820 [ContentFrameMessageManager 449b93d0]

Found and displayed 1 paths.
bkelly@lenir:/srv/b2g-hamachi-master/about-memory-9$ /srv/heapgraph/find_roots.py cc-edges.520.log 0x44fea820
Parsing cc-edges.520.log. Done loading graph. Reversing graph. Done.

0x449b93d0 [XPCWrappedNative (ContentFrameMessageManager)]
    --[mFlatJSObject]--> 0x44fea820 [JS Object (ContentFrameMessageManager)]

    Root 0x449b93d0 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x44fea820 [JS Object (ContentFrameMessageManager)] --[js::GetObjectPrivate(obj)]--> 0x449b93d0

(I'll zip and attach the memory report directory for this output.)

Right now I'm having difficulty tracking down where that extra edge is coming from.  Any advice on where to look would be appreciated.
Posted patch tabchild-leak-debug.patch (obsolete) — Splinter Review
This patch contains some debugging code and my current efforts to cleanup observers, event listeners, and message listeners.  I don't think all of these are really necessary, though, if I can get the event listener service to be cleaned up.
(In reply to Ben Kelly [:bkelly] from comment #12)
> Right now I'm having difficulty tracking down where that extra edge is
> coming from.  Any advice on where to look would be appreciated.

Ah, we store the wrapped native as mGlobal here:

  http://dxr.mozilla.org/mozilla-central/source/content/base/src/nsFrameMessageManager.cpp#1598

Releasing mTabChildGlobal and mGlobal in TabChild::RecvDestroy() seems to fix the leak.

I'm checking now to see if I need my other patches or if this was the root cause of all the leaking.
It seems this might be fallout from bug 999213 which recently switched TabChildBase from a weak reference scheme to a cycle-collected, strong reference approach.
Attachment #8416927 - Flags: review?(bobbyholley)
Blocks: 999213
It looked like there may have still been a leak, but I think the GC/CC just needed to kick in.  With attachment 8416927 [details] [diff] [review] I see no net, long term memory leaks.

We could more aggressively unregister listeners, observers, and message listeners in a number of files to save temporary memory usage, but it does not seem worth the complexity.

So "Part 1" is the only part.
Attachment #8416927 - Flags: feedback+
Also, I added some instrumentation to the TabChild destructor and I can confirm it is now getting called after some delay.  I assume the delay is the time until the next CC/GC.
Let's make sure we don't ship this leak.
Just to update, it appears that the TabChild class overrides the mRefCnt member declared by TabChildBase.  I'm working on new patches now.
Posted patch Make TabChild Cycle Collected (obsolete) — Splinter Review
Here is my current work-in-progress.  This seems to fix the memory leak by making TabChild a fully cycle collected object.  Of course, this doesn't make total sense since it doesn't have any direct members that need to be cycle collected.

Andrew tells me that I should be able to drop the NS_DECL_ISUPPORTS and declare a QI, but I have not found the correct set of macros to let that compile and link.  I'll look further tomorrow if no one figures this out tonight.

Thanks!
Attachment #8416927 - Attachment is obsolete: true
Attachment #8416927 - Flags: review?(bobbyholley)
Attachment #8417657 - Attachment is patch: true
If we want to go with that patch approach we should probably make a NS_IMPL_CYCLE_COLLECTION_INHERITED_0 macro since the variadic macros don't like zero length args.
Attachment #8416843 - Attachment is obsolete: true
blocking-b2g: --- → 1.4+
Based on Andrew's help, here is a new patch that makes TabChild inherit TabChildBase's mRefCnt without becoming a full cycle collected class itself.
Attachment #8417657 - Attachment is obsolete: true
Attachment #8418221 - Flags: review?(bugs)
Attachment #8418221 - Attachment is patch: true
Comment on attachment 8418221 [details] [diff] [review]
Make TabChild inherit mRefCnt from TabChildBase to fix CC leak.

>-NS_INTERFACE_MAP_BEGIN(TabChild)
>+NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(TabChild)
You want NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED here


>   NS_INTERFACE_MAP_ENTRY(nsISupportsWeakReference)
>   NS_INTERFACE_MAP_ENTRY(nsITooltipListener)
> NS_INTERFACE_MAP_END
And then NS_INTERFACE_MAP_END_INHERITING(TabChildBase)
Attachment #8418221 - Flags: review?(bugs) → review+
Actually, could you fix nsISupports inheritance to be saner.
So make TabChild to inherit first TabChildBase, then make TabChildBase's QI implementation to
support nsISupports, and remove nsISupports from TabChild's QI.

I could take a look at the new patch.
I think this does what you were suggesting in the last comment.  Look any better?

Thanks!
Attachment #8418221 - Attachment is obsolete: true
Attachment #8418337 - Flags: review?(bugs)
Comment on attachment 8418337 [details]
Make TabChild inherit mRefCnt from TabChildBase to fix CC leak. (v2)

>+NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(TabChild)
>   NS_INTERFACE_MAP_ENTRY(nsIWebBrowserChrome)
>   NS_INTERFACE_MAP_ENTRY(nsIWebBrowserChrome2)
>   NS_INTERFACE_MAP_ENTRY(nsIEmbeddingSiteWindow)
>   NS_INTERFACE_MAP_ENTRY(nsIWebBrowserChromeFocus)
>   NS_INTERFACE_MAP_ENTRY(nsIInterfaceRequestor)
>   NS_INTERFACE_MAP_ENTRY(nsIWindowProvider)
>   NS_INTERFACE_MAP_ENTRY(nsIDOMEventListener)
>   NS_INTERFACE_MAP_ENTRY(nsIWebProgressListener)
>   NS_INTERFACE_MAP_ENTRY(nsITabChild)
>   NS_INTERFACE_MAP_ENTRY(nsIObserver)
>   NS_INTERFACE_MAP_ENTRY(nsISupportsWeakReference)
>   NS_INTERFACE_MAP_ENTRY(nsITooltipListener)
> NS_INTERFACE_MAP_END
Should inherit TabChildBase
Attachment #8418337 - Flags: review?(bugs) → review+
Sorry about that!  I got too focused on your other comment.

I'll wait to see this green before pushing:

  https://tbpl.mozilla.org/?tree=Try&rev=53062294d509
Attachment #8418337 - Attachment is obsolete: true
The try looked reasonably green.  Pushed to mozilla-inbound:

  https://hg.mozilla.org/integration/mozilla-inbound/rev/89d8e5e16a74
Comment on attachment 8418360 [details] [diff] [review]
Make TabChild inherit mRefCnt from TabChildBase to fix CC leak. (v3)

Forgot to carry r+ forward when I added this attachment.
Attachment #8418360 - Flags: review+
Attachment #8418360 - Attachment is patch: true
Comment on attachment 8418360 [details] [diff] [review]
Make TabChild inherit mRefCnt from TabChildBase to fix CC leak. (v3)

Review of attachment 8418360 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/TabChild.cpp
@@ +975,5 @@
>          }
>      }
>  }
>  
> +NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED(TabChild)

I might be missing something, but why can't this just use NS_INTERFACE_MAP_BEGIN (given that TabChild seems to use TabChildBase's cycle collection participant)?
Indeed NS_INTERFACE_MAP_BEGIN could have been used too, for now. _INHERITED doesn't cause any harm here though.
(I expect TabChild to traverse/unlink stuff in the near future)
https://hg.mozilla.org/mozilla-central/rev/89d8e5e16a74
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
(In reply to Olli Pettay [:smaug] from comment #35)
> Indeed NS_INTERFACE_MAP_BEGIN could have been used too, for now. _INHERITED
> doesn't cause any harm here though.
> (I expect TabChild to traverse/unlink stuff in the near future)

So a previous incarnation of the patch used NS_INTERFACE_MAP_BEGIN, but the device was very unstable and the child process kept crashing.  Andrew suggested switching to NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED and it fixed this issue.
I don't believe this affects b2g v1.4 since bug 999213 only landed in gecko 31.  Dropping the dependency to bug 998504 since that is mainly about leaks in v1.4.

I'll nom the patch for aurora uplift.
No longer blocks: 998504
blocking-b2g: 1.4+ → ---
Comment on attachment 8418360 [details] [diff] [review]
Make TabChild inherit mRefCnt from TabChildBase to fix CC leak. (v3)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 999213
User impact if declined: Memory leak
Testing completed (on m-c, etc.): mozilla-central
Risk to taking this patch (and alternatives if risky): low risk
String or IDL/UUID changes made by this patch: none

Ollie, Andrew, please correct if you think the risk is higher.
Attachment #8418360 - Flags: approval-mozilla-aurora?
Attachment #8418360 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

5 years ago
Blocks: 998504
Flags: needinfo?(fabrice)
This is a Fx31 regression, so 1.3 should be unaffected.
(In reply to Andrew McCreight [:mccr8] from comment #41)
> This is a Fx31 regression, so 1.3 should be unaffected.

thanks for verifying!
Flags: needinfo?(fabrice)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.