Last Comment Bug 897684 - Calling navigator.mozApps.mgmt.getAll() leaks icon data url's via DOMApplication objects
: Calling navigator.mozApps.mgmt.getAll() leaks icon data url's via DOMApplicat...
Status: RESOLVED WORKSFORME
[MemShrink:P1]
:
Product: Core
Classification: Components
Component: DOM: Apps (show other bugs)
: unspecified
: All All
: -- normal with 1 vote (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 900221
Blocks: 886217
  Show dependency treegraph
 
Reported: 2013-07-24 14:25 PDT by Jonathan Griffin (:jgriffin)
Modified: 2014-06-10 17:12 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
test_leak.py (340 bytes, text/plain)
2013-07-24 14:26 PDT, Jonathan Griffin (:jgriffin)
no flags Details
about-memory.zip (4.43 MB, application/octet-stream)
2013-07-24 14:30 PDT, Jonathan Griffin (:jgriffin)
no flags Details
new about-memory.zip (5.24 MB, application/octet-stream)
2013-07-31 11:01 PDT, Jonathan Griffin (:jgriffin)
no flags Details

Description Jonathan Griffin (:jgriffin) 2013-07-24 14:25:09 PDT
Calling navigator.mozApps.mgmt.getAll() in a Marionette test leaks icon data url's for the Marketplace app and the HostStubTest app, on my inari using mozilla-b2g18/v1-train.

After a reboot, before running the test:

│   ├──1.49 MB (13.17%) -- string-chars
│   │  ├──1.05 MB (09.27%) ── non-huge
│   │  └──0.44 MB (03.90%) -- huge
│   │     ├──0.27 MB (02.38%) ++ (4 tiny)
│   │     └──0.17 MB (01.52%) ── string(length=44735, "...") [2]

After running the test:

│  ├───6.48 MB (27.15%) -- string-chars
│  │   ├──3.81 MB (15.97%) ── non-huge
│  │   └──2.67 MB (11.18%) -- huge
│  │      ├──1.19 MB (04.99%) ── string(length=9114, "...") [61]
│  │      ├──1.19 MB (04.99%) ── string(length=9646, "...") [61]
│  │      └──0.29 MB (01.19%) ++ (3 tiny)

The test only calls execute_script in a loop like so:

    for i in range(0, 50):
        print 'iteration', i
        m.execute_script("""
                let appsReq = navigator.mozApps.mgmt.getAll();
                let foo = window.document;
            """)

If you comment out the first line, the leak doesn't occur:

    for i in range(0, 50):
        print 'iteration', i
        m.execute_script("""
                //let appsReq = navigator.mozApps.mgmt.getAll();
                let foo = window.document;
            """)

After running the above script (with the commented-out line), about-memory shows string consumption comparable to the freshly booted case:

│  ├───1.49 MB (13.24%) -- string-chars
│  │   ├──1.05 MB (09.33%) ── non-huge
│  │   └──0.44 MB (03.92%) -- huge
│  │      ├──0.27 MB (02.39%) ++ (4 tiny)
│  │      └──0.17 MB (01.53%) ── string(length=44735, "...") [2]

I'll attach the test script and the relevant memory dumps.
Comment 1 Jonathan Griffin (:jgriffin) 2013-07-24 14:26:33 PDT
Created attachment 780593 [details]
test_leak.py

Test case to reproduce leak.
Comment 2 Jonathan Griffin (:jgriffin) 2013-07-24 14:30:17 PDT
Created attachment 780594 [details]
about-memory.zip

about-memory report after leak
Comment 3 Jason Smith [:jsmith] 2013-07-24 14:51:42 PDT
Does this need to block the release for leo given this being a key cause of bug 851626?
Comment 4 Justin Lebar (not reading bugmail) 2013-07-24 15:01:25 PDT
> Does this need to block the release for leo given this being a key cause of bug 851626?

I think that particular bug's  STR leak in this particular way only when run through a test framework.

The real question is whether there's a user-facing STR that tickles this bug.  I don't know, but I wouldn't be surprised.
Comment 5 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-30 18:26:33 PDT
Most of the copies of the strings are held alive by paths that look like this:

via nsXPCWrappedJS[nsIObserver,0x48e26580:0x48e019a0].mJSObj :
0x482056d0 [Object <no private>]
    --[_manifest]--> 0x4787f6b0 [Object <no private>]
    --[icons]--> 0x4787f6d0 [Object <no private>]
    --[6]--> 0x47881e10 [string <length 9646> ]

I'm not really sure what that means.

Unfortunately, this seems to have been run on a version of b2g18 before bug 890135 landed, so I'm not sure why the observer is alive there.
Comment 6 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-30 19:22:00 PDT
FWIW, this is how to generate this information (now that I have updated my scripts[1] a bit):
1. Run heapgraph/stringy.py to figure out what the big strings are:
$ python ~/mz/heapgraph/g/stringy.py gc-edges.112.log total chars :: num copies x strings511238 :: 53 x ''
483042 :: 53 x ''
...

2. Search in the GC edges file for the actual start of the string in the log (this is a little crude), which is the same in, except it has "<length 9646> " prefixed (length will vary, of course).

3. Run the find_roots script in "string mode":

$ python ~/mz/heapgraph/find_roots.py gc-edges.112.log '<length 9114> ' --string-mode --num-paths=1

This will print out all of the paths to all of the strings that start with that.  The "--num-paths=1" only prints out the first path it finds for each string.  This is important to do because otherwise it will print out thousands of paths for the string or two that looks like they are legitimately alive.  If you add "--simple-path", then you can do |uniq -c| and |sort| on the output to make it easier to see that almost all of the paths are the same.

[1] https://github.com/amccreight/heapgraph
Comment 7 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-30 19:26:19 PDT
The wrapped JS seem to also implement nsIMessageListener and mozIDOMApplication.  Beyond that, I think I'll need a log from a more up-to-date b2g18 tree to figure out anything else.
Comment 8 [:fabrice] Fabrice Desré 2013-07-30 19:32:20 PDT
(In reply to Andrew McCreight [:mccr8] from comment #7)
> The wrapped JS seem to also implement nsIMessageListener and
> mozIDOMApplication.  Beyond that, I think I'll need a log from a more
> up-to-date b2g18 tree to figure out anything else.

This means that this is a WebappsApplication object from https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#336

I don't understand how this could end up in an observer though, since getAll() doesn't trigger any observer notification.
Comment 9 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-30 19:36:50 PDT
It isn't necessarily actually registered as an observer.  I think it might just be showing up that way in the log because it implements the nsIObserver interface, and something in C++ is holding alive the XPCWrappedJS for it.
Comment 10 Justin Lebar (not reading bugmail) 2013-07-31 00:56:31 PDT
I bet this is reproducible on desktop.
Comment 11 Jonathan Griffin (:jgriffin) 2013-07-31 09:58:23 PDT
I'll reproduce using a current mozilla-b2g18 build.
Comment 12 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-31 10:40:25 PDT
Thanks!  I should figure out how to actually run b2g...
Comment 13 Jonathan Griffin (:jgriffin) 2013-07-31 11:01:28 PDT
Created attachment 783860 [details]
new about-memory.zip
Comment 14 Jonathan Griffin (:jgriffin) 2013-07-31 11:02:35 PDT
Running the procedure from comment #6 yields the same information, it seems:

Found and displayed 1 paths.
via nsXPCWrappedJS[nsIObserver,0x48ff6740:0x49239ac0].mJSObj :
0x491c95b0 [Object <no private>]
    --[_manifest]--> 0x4ca25950 [Object <no private>]
    --[icons]--> 0x4ca25970 [Object <no private>]
    --[6]--> 0x4ca28b20 [string <length 9646> ]
Comment 15 Justin Lebar (not reading bugmail) 2013-07-31 12:07:09 PDT
And, to be clear, these objects don't go away if you do get_about_memory.py --minimize?
Comment 16 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-31 12:19:32 PDT
Thanks for generating that!

(In reply to Jonathan Griffin (:jgriffin) from comment #14)
> Running the procedure from comment #6 yields the same information, it seems:

Yup, that's too be expected.  The difference here is that now when we search for that mJSObject in the CC graph, it will give us something useful:

$ python ~/mz/heapgraph/find_roots.py cc-edges.112.log 0x491c95b0
Parsing cc-edges.112.log. Done loading graph. Reversing graph. Done.

0x4493d640 [nsFrameMessageManager]
    --[mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener]--> 0x48ff6548 [nsXPCWrappedJS (nsIMessageListener)]
    --[root]--> 0x48ff6388 [nsXPCWrappedJS (mozIDOMApplication)]
    --[mJSObj]--> 0x491c95b0 [JS Object (Object)]

    Root 0x4493d640 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x43d99040 [XPCWrappedNative] --[mIdentity]--> 0x4493d640

0x48ff6748 [nsXPCWrappedJS (nsIObserver)]
    --[root]--> 0x48ff6388 [nsXPCWrappedJS (mozIDOMApplication)]
    --[mJSObj]--> 0x491c95b0 [JS Object (Object)]

    Root 0x48ff6748 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x48ff6748 [nsXPCWrappedJS (nsIObserver)] --[self]--> 0x48ff6748


What the first one means is that the JS object 0x491c95b0 is being added onto the nsFrameMessageManager 6 times (!).  The JS object in question (0x491c95b0) has fields wrappedJSObject (so somebody is wrapping it), _window, _manifest, _messages and _requests.

In fact, if you go and look at 0x4493d640, it has around 13,000 listeners registered on it, which is quite a few.  So it seems like something is going wrong there.

I'm not really sure what the nsIObserver thing is, or why we aren't seeing why it is alive.

Every one of these objects that is holding onto an image that I looked at is being kept alive in the CC graph in a similar way, on the same nsFrameMessageManager.
Comment 17 Jonathan Griffin (:jgriffin) 2013-07-31 12:23:01 PDT
(In reply to Justin Lebar [:jlebar] from comment #15)
> And, to be clear, these objects don't go away if you do get_about_memory.py
> --minimize?

They do not go away with --minimize.
Comment 18 Justin Lebar (not reading bugmail) 2013-07-31 12:28:27 PDT
> The JS object in question (0x491c95b0) has fields wrappedJSObject (so somebody is 
> wrapping it), _window, _manifest, _messages and _requests. 

I bet this object is DOMRequestHelper, as inherited by Webapps.js.

Maybe we're leaking DOMRequests again.
Comment 19 Jonathan Griffin (:jgriffin) 2013-07-31 12:30:44 PDT
Marionette doesn't use any observers, but SpecialPowers does; I wonder if this is another SpecialPowers problem?  I'll disable it and see if it still persists.
Comment 20 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2013-07-31 12:48:58 PDT
It looks like the definition of mgmt.getAll() sticks a message on the message manager, so that would explain why we have so many of them, I guess:
709   getAll: function() {
710     let request = this.createRequest();
711     cpmm.sendAsyncMessage("Webapps:GetAll", { oid: this._id,
712                                               requestID: this.getRequestId(request) });
713     return request;
714   },

What mechanism is supposed to clear those off?
Comment 21 Justin Lebar (not reading bugmail) 2013-07-31 13:05:07 PDT
sendAsyncMessage just sends the message; it shouldn't queue it (unless something weird is going on).

OTOH notice that this.createRequest() (defined in DOMRequestHelper) creates and stores the DOMRequest.  It's supposed to be released in takeRequest(), but if that isn't being called, or if there's a bug in it, we'll definitely leak.
Comment 22 Justin Lebar (not reading bugmail) 2013-07-31 13:05:59 PDT
Also, I don't see sendAsyncMessage touching the request object itself; it just sees the request's ID field.
Comment 23 [:fabrice] Fabrice Desré 2013-07-31 13:16:35 PDT
(In reply to Justin Lebar [:jlebar] from comment #21)
> sendAsyncMessage just sends the message; it shouldn't queue it (unless
> something weird is going on).
> 
> OTOH notice that this.createRequest() (defined in DOMRequestHelper) creates
> and stores the DOMRequest.  It's supposed to be released in takeRequest(),
> but if that isn't being called, or if there's a bug in it, we'll definitely
> leak.

We remove it at https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#786
The getAll() code in the parent is at https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.jsm#2902 and always sends back Webapps:GetAll:Return:OK, even for empty lists.
Comment 24 Jonathan Griffin (:jgriffin) 2013-07-31 13:23:01 PDT
(In reply to Jonathan Griffin (:jgriffin) from comment #19)
> Marionette doesn't use any observers, but SpecialPowers does; I wonder if
> this is another SpecialPowers problem?  I'll disable it and see if it still
> persists.

No, disabling special powers doesn't affect this problem.
Comment 25 Justin Lebar (not reading bugmail) 2013-07-31 13:28:48 PDT
> In fact, if you go and look at 0x4493d640, it has around 13,000 listeners registered on 
> it, which is quite a few.

This might not be mysterious.  We get one DOMRequestIpcMessageListener for every WebappsApplication object.  Even after the WebappsApplication object goes away, the DOMRequestIpcMessageListener stays alive until its window dies.

So we need to fix that.  But that does not explain why the WebappsApplication objects are alive.
Comment 26 Justin Lebar (not reading bugmail) 2013-07-31 13:32:03 PDT
Aha, we never took bug 889984 on branches.
Comment 27 Justin Lebar (not reading bugmail) 2013-07-31 13:38:43 PDT
This is my fault for not nom'ing bug 889984.  I'm sorry, everyone.
Comment 28 Justin Lebar (not reading bugmail) 2013-07-31 14:47:49 PDT
I'm pretty sure bug 889984 will fix this bug as stated in the bug summary, but let's leave it open until we get it ported to b2g18 and verify that it actually fixes things.  If everything goes as planned, we can dupe this bug.

I expect we'll still have the issue of the 10,000+ message listeners.  I think I understand why this is happening, and I'll try to write up a fix, in a separate bug.
Comment 29 Justin Lebar (not reading bugmail) 2013-08-08 13:58:29 PDT
The fix for this will be coming in bug 900221.
Comment 30 Nathan Froyd [:froydnj] 2014-02-28 09:00:59 PST
According to comment 28 and comment 29, it looks like this bug is fixed?
Comment 31 Jonathan Griffin (:jgriffin) 2014-02-28 12:48:39 PST
I think it will be hard for us to verify.  We've modified our automation not to use this API because of this problem, so we'll have to come up with some custom tests to test this.
Comment 32 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2014-06-10 17:12:23 PDT
It sounds like we think this is fixed, so I'm going to close it as works for me.  Please reopen or file a new bug if the symptoms reappear.

Note You need to log in before you can comment on or make changes to this bug.