Calling navigator.mozApps.mgmt.getAll() leaks icon data URLs via DOMApplication objects

RESOLVED WORKSFORME

Status


Product: Core
Component: DOM: Apps
Opened: 4 years ago
Last modified: 3 years ago

People

(Reporter: jgriffin, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(3 attachments)

(Reporter)

Description

4 years ago
Calling navigator.mozApps.mgmt.getAll() in a Marionette test leaks icon data URLs for the Marketplace app and the HostStubTest app, on my inari using mozilla-b2g18/v1-train.

After a reboot, before running the test:

│   ├──1.49 MB (13.17%) -- string-chars
│   │  ├──1.05 MB (09.27%) ── non-huge
│   │  └──0.44 MB (03.90%) -- huge
│   │     ├──0.27 MB (02.38%) ++ (4 tiny)
│   │     └──0.17 MB (01.52%) ── string(length=44735, "data:image/jpeg;base64,/9j/4AAQ...") [2]

After running the test:

│  ├───6.48 MB (27.15%) -- string-chars
│  │   ├──3.81 MB (15.97%) ── non-huge
│  │   └──2.67 MB (11.18%) -- huge
│  │      ├──1.19 MB (04.99%) ── string(length=9114, "data:image/png;base64,iVBORw0KG...") [61]
│  │      ├──1.19 MB (04.99%) ── string(length=9646, "data:image/png;base64,iVBORw0KG...") [61]
│  │      └──0.29 MB (01.19%) ++ (3 tiny)
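A side note on the numbers above: both huge-string entries report 1.19 MB even though their lengths differ (9114 vs 9646 chars, 61 copies each). One plausible explanation, purely an assumption on my part and not something the report states: with 2-byte characters and an allocator that rounds large buffers up to 4 KiB boundaries, both strings land in the same 20 KiB size class.

```python
# Back-of-the-envelope check. Both the 2-bytes-per-char storage and the
# 4 KiB allocator rounding are assumptions, not facts from the report.
def rounded_mb(length_chars, copies, bucket=4096):
    raw = length_chars * 2                 # assumed UTF-16 storage
    alloc = -(-raw // bucket) * bucket     # round up to the next bucket
    return alloc * copies / (1024 * 1024)

print(round(rounded_mb(9114, 61), 2))   # → 1.19
print(round(rounded_mb(9646, 61), 2))   # → 1.19
```

Under those assumptions, both buffers round up to 20480 bytes, which reproduces the identical 1.19 MB figures.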

The test only calls execute_script in a loop like so:

    for i in range(0, 50):
        print 'iteration', i
        m.execute_script("""
                let appsReq = navigator.mozApps.mgmt.getAll();
                let foo = window.document;
            """)

If you comment out the first line, the leak doesn't occur:

    for i in range(0, 50):
        print 'iteration', i
        m.execute_script("""
                //let appsReq = navigator.mozApps.mgmt.getAll();
                let foo = window.document;
            """)

After running the above script (with the commented-out line), about-memory shows string consumption comparable to the freshly booted case:

│  ├───1.49 MB (13.24%) -- string-chars
│  │   ├──1.05 MB (09.33%) ── non-huge
│  │   └──0.44 MB (03.92%) -- huge
│  │      ├──0.27 MB (02.39%) ++ (4 tiny)
│  │      └──0.17 MB (01.53%) ── string(length=44735, "data:image/jpeg;base64,/9j/4AAQ...") [2]

I'll attach the test script and the relevant memory dumps.
(Reporter)

Comment 1

4 years ago
Created attachment 780593 [details]
test_leak.py

Test case to reproduce leak.
(Reporter)

Comment 2

4 years ago
Created attachment 780594 [details]
about-memory.zip

about-memory report after leak
Does this need to block the release for leo given this being a key cause of bug 851626?
> Does this need to block the release for leo given this being a key cause of bug 851626?

I think that particular bug's STR leaks in this particular way only when run through a test framework.

The real question is whether there's a user-facing STR that tickles this bug.  I don't know, but I wouldn't be surprised.
Most of the copies of the strings are held alive by paths that look like this:

via nsXPCWrappedJS[nsIObserver,0x48e26580:0x48e019a0].mJSObj :
0x482056d0 [Object <no private>]
    --[_manifest]--> 0x4787f6b0 [Object <no private>]
    --[icons]--> 0x4787f6d0 [Object <no private>]
    --[6]--> 0x47881e10 [string <length 9646> data:image/png;base64,iVBORw0]

I'm not really sure what that means.

Unfortunately, this seems to have been run on a version of b2g18 before bug 890135 landed, so I'm not sure why the observer is alive there.
FWIW, this is how to generate this information (now that I have updated my scripts[1] a bit):
1. Run heapgraph/stringy.py to figure out what the big strings are:
$ python ~/mz/heapgraph/g/stringy.py gc-edges.112.log
total chars :: num copies x strings
511238 :: 53 x 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAIAAAAlC+aJAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK'
483042 :: 53 x 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADwAAAA8CAYAAAA6/NlyAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbW'
...

2. Search the GC edges file for the actual start of the string in the log (this is a little crude); it is the same string, except with "<length 9646> " prefixed (the length will vary, of course).

3. Run the find_roots script in "string mode":

$ python ~/mz/heapgraph/find_roots.py gc-edges.112.log '<length 9114> data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADwAAAA8CAYAAAA6/NlyAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbW' --string-mode --num-paths=1

This will print out all of the paths to all of the strings that start with that.  The "--num-paths=1" option only prints out the first path it finds for each string.  This is important, because otherwise it will print out thousands of paths for the string or two that look like they are legitimately alive.  If you add "--simple-path", then you can do |uniq -c| and |sort| on the output to make it easier to see that almost all of the paths are the same.

[1] https://github.com/amccreight/heapgraph
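The |sort| / |uniq -c| post-processing suggested above can be sketched in Python; the path strings here are made-up placeholders for illustration, not real find_roots output.

```python
# Count identical simple paths, as a |sort| + |uniq -c| pipeline would.
from collections import Counter

paths = [
    "nsFrameMessageManager -> nsXPCWrappedJS -> Object",  # hypothetical
    "nsFrameMessageManager -> nsXPCWrappedJS -> Object",  # duplicate path
    "Window -> Object",                                   # hypothetical
]
for path, count in Counter(paths).most_common():
    print(count, path)   # most frequent path first
```

When almost every leaked string shares one path, that dominant path is the one worth investigating.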
The wrapped JS seem to also implement nsIMessageListener and mozIDOMApplication.  Beyond that, I think I'll need a log from a more up-to-date b2g18 tree to figure out anything else.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> The wrapped JS seem to also implement nsIMessageListener and
> mozIDOMApplication.  Beyond that, I think I'll need a log from a more
> up-to-date b2g18 tree to figure out anything else.

This means that this is a WebappsApplication object from https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#336

I don't understand how this could end up in an observer though, since getAll() doesn't trigger any observer notification.
It isn't necessarily actually registered as an observer.  I think it might just be showing up that way in the log because it implements the nsIObserver interface, and something in C++ is holding alive the XPCWrappedJS for it.
I bet this is reproducible on desktop.
Summary: Calling navigator.mozApps.mgmt.getAll() leaks icon data url's → Calling navigator.mozApps.mgmt.getAll() leaks icon data url's via DOMApplication objects
(Reporter)

Comment 11

4 years ago
I'll reproduce using a current mozilla-b2g18 build.
Thanks!  I should figure out how to actually run b2g...
Assignee: nobody → continuation
(Reporter)

Comment 13

4 years ago
Created attachment 783860 [details]
new about-memory.zip
(Reporter)

Comment 14

4 years ago
Running the procedure from comment #6 yields the same information, it seems:

Found and displayed 1 paths.
via nsXPCWrappedJS[nsIObserver,0x48ff6740:0x49239ac0].mJSObj :
0x491c95b0 [Object <no private>]
    --[_manifest]--> 0x4ca25950 [Object <no private>]
    --[icons]--> 0x4ca25970 [Object <no private>]
    --[6]--> 0x4ca28b20 [string <length 9646> data:image/png;base64,iVBORw0]
And, to be clear, these objects don't go away if you do get_about_memory.py --minimize?
Thanks for generating that!

(In reply to Jonathan Griffin (:jgriffin) from comment #14)
> Running the procedure from comment #6 yields the same information, it seems:

Yup, that's to be expected.  The difference here is that now when we search for that mJSObj in the CC graph, it will give us something useful:

$ python ~/mz/heapgraph/find_roots.py cc-edges.112.log 0x491c95b0
Parsing cc-edges.112.log. Done loading graph. Reversing graph. Done.

0x4493d640 [nsFrameMessageManager]
    --[mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener, mListeners[i] mListener]--> 0x48ff6548 [nsXPCWrappedJS (nsIMessageListener)]
    --[root]--> 0x48ff6388 [nsXPCWrappedJS (mozIDOMApplication)]
    --[mJSObj]--> 0x491c95b0 [JS Object (Object)]

    Root 0x4493d640 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x43d99040 [XPCWrappedNative] --[mIdentity]--> 0x4493d640

0x48ff6748 [nsXPCWrappedJS (nsIObserver)]
    --[root]--> 0x48ff6388 [nsXPCWrappedJS (mozIDOMApplication)]
    --[mJSObj]--> 0x491c95b0 [JS Object (Object)]

    Root 0x48ff6748 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x48ff6748 [nsXPCWrappedJS (nsIObserver)] --[self]--> 0x48ff6748


What the first one means is that the JS object 0x491c95b0 is being added onto the nsFrameMessageManager 6 times (!).  The JS object in question (0x491c95b0) has fields wrappedJSObject (so somebody is wrapping it), _window, _manifest, _messages and _requests.

In fact, if you go and look at 0x4493d640, it has around 13,000 listeners registered on it, which is quite a few.  So it seems like something is going wrong there.

I'm not really sure what the nsIObserver thing is, or why we aren't seeing why it is alive.

Every one of these objects that is holding onto an image that I looked at is being kept alive in the CC graph in a similar way, on the same nsFrameMessageManager.
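For readers unfamiliar with find_roots, its core idea can be sketched with a toy graph (the node names below are invented for illustration, not real dump data): reverse the heap graph's edges, then walk from the leaked object back toward whatever is holding it.

```python
# Toy sketch of find_roots' approach: build the reverse graph, then
# walk upward from the leaked string until nothing holds us anymore.
edges = {  # hypothetical heap graph: holder -> list of held objects
    "nsFrameMessageManager": ["nsXPCWrappedJS"],
    "nsXPCWrappedJS": ["JSObject"],
    "JSObject": ["string"],
}

def path_to_root(target):
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    path, node = [target], target
    while node in reverse:
        node = reverse[node][0]   # first holder only, like --num-paths=1
        path.append(node)
    return list(reversed(path))   # root first, leaked object last

print(path_to_root("string"))
# → ['nsFrameMessageManager', 'nsXPCWrappedJS', 'JSObject', 'string']
```

The real tool does the same thing over millions of nodes, which is why restricting it to one path per string matters.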
(Reporter)

Comment 17

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #15)
> And, to be clear, these objects don't go away if you do get_about_memory.py
> --minimize?

They do not go away with --minimize.
> The JS object in question (0x491c95b0) has fields wrappedJSObject (so somebody is 
> wrapping it), _window, _manifest, _messages and _requests. 

I bet this object is DOMRequestHelper, as inherited by Webapps.js.

Maybe we're leaking DOMRequests again.
(Reporter)

Comment 19

4 years ago
Marionette doesn't use any observers, but SpecialPowers does; I wonder if this is another SpecialPowers problem?  I'll disable it and see if it still persists.
It looks like the definition of mgmt.getAll() sticks a message on the message manager, so that would explain why we have so many of them, I guess:
709   getAll: function() {
710     let request = this.createRequest();
711     cpmm.sendAsyncMessage("Webapps:GetAll", { oid: this._id,
712                                               requestID: this.getRequestId(request) });
713     return request;
714   },

What mechanism is supposed to clear those off?
Assignee: continuation → nobody
sendAsyncMessage just sends the message; it shouldn't queue it (unless something weird is going on).

OTOH notice that this.createRequest() (defined in DOMRequestHelper) creates and stores the DOMRequest.  It's supposed to be released in takeRequest(), but if that isn't being called, or if there's a bug in it, we'll definitely leak.
Also, I don't see sendAsyncMessage touching the request object itself; it just sends the request's ID field.
(In reply to Justin Lebar [:jlebar] from comment #21)
> sendAsyncMessage just sends the message; it shouldn't queue it (unless
> something weird is going on).
> 
> OTOH notice that this.createRequest() (defined in DOMRequestHelper) creates
> and stores the DOMRequest.  It's supposed to be released in takeRequest(),
> but if that isn't being called, or if there's a bug in it, we'll definitely
> leak.

We remove it at https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#786
The getAll() code in the parent is at https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.jsm#2902 and always sends back Webapps:GetAll:Return:OK, even for empty lists.
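The createRequest()/takeRequest() pairing discussed in the last few comments can be sketched as follows. This is a hypothetical minimal model, not the actual DOMRequestHelper code: requests accumulate in a map keyed by request ID, and any reply path that skips takeRequest() retains them indefinitely.

```python
# Minimal model of the DOMRequestHelper pattern (names are illustrative).
import itertools

class RequestHelper:
    def __init__(self):
        self._requests = {}           # requestID -> pending request
        self._ids = itertools.count()

    def create_request(self):
        req_id = next(self._ids)
        self._requests[req_id] = object()   # stand-in for a DOMRequest
        return req_id

    def take_request(self, req_id):
        # Releases the stored request; if this is never called on the
        # reply path, every request made is leaked.
        return self._requests.pop(req_id, None)

helper = RequestHelper()
ids = [helper.create_request() for _ in range(50)]
print(len(helper._requests))   # → 50 (all pending, as in the test loop)
for req_id in ids:
    helper.take_request(req_id)
print(len(helper._requests))   # → 0 (replies handled, nothing retained)
```

Under this model, the leak described here corresponds to the Webapps:GetAll:Return:OK handler never reaching the takeRequest() step for some requests.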
(Reporter)

Comment 24

4 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #19)
> Marionette doesn't use any observers, but SpecialPowers does; I wonder if
> this is another SpecialPowers problem?  I'll disable it and see if it still
> persists.

No, disabling special powers doesn't affect this problem.
> In fact, if you go and look at 0x4493d640, it has around 13,000 listeners registered on 
> it, which is quite a few.

This might not be mysterious.  We get one DOMRequestIpcMessageListener for every WebappsApplication object.  Even after the WebappsApplication object goes away, the DOMRequestIpcMessageListener stays alive until its window dies.

So we need to fix that.  But that does not explain why the WebappsApplication objects are alive.
Aha, we never took bug 889984 on branches.
Depends on: 889984
This is my fault for not nom'ing bug 889984.  I'm sorry, everyone.
I'm pretty sure bug 889984 will fix this bug as stated in the bug summary, but let's leave it open until we get it ported to b2g18 and verify that it actually fixes things.  If everything goes as planned, we can dupe this bug.

I expect we'll still have the issue of the 10,000+ message listeners.  I think I understand why this is happening, and I'll try to write up a fix, in a separate bug.
Whiteboard: [MemShrink] → [MemShrink:P1]
The fix for this will be coming in bug 900221.
Depends on: 900221
No longer depends on: 889984
No longer blocks: 851626
See Also: → bug 941789
According to comment 28 and comment 29, it looks like this bug is fixed?
(Reporter)

Comment 31

4 years ago
I think it will be hard for us to verify.  We've modified our automation not to use this API because of this problem, so we'll have to come up with some custom tests to verify it.
It sounds like we think this is fixed, so I'm going to close it as works for me.  Please reopen or file a new bug if the symptoms reappear.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WORKSFORME