Closed Bug 1315044 Opened 3 years ago Closed 3 years ago

about:debugging leaks memory in child process

Categories

(DevTools :: about:debugging, defect, P1)

45 Branch
defect

Tracking

(firefox49 wontfix, firefox50 wontfix, firefox51 wontfix, firefox52 wontfix, firefox-esr52 wontfix, firefox53 fixed)

RESOLVED FIXED
Firefox 53
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox53 --- fixed

People

(Reporter: bugzilla.mozilla.org, Assigned: ochameau)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P2])

Attachments

(2 files, 1 obsolete file)

Spun off from bug 1308332 comment 40 and following.

I'm seeing increased footprint of the SDK in child processes.

memory report from 29.10 (build date is probably several days older)

├───83.42 MB (15.42%) -- js-non-window
│   ├──51.91 MB (09.60%) -- zones
│   │  ├──37.21 MB (06.88%) -- zone(0x1e64d7a6000)
│   │  │  ├──30.82 MB (05.70%) -- (223 tiny)
│   │  │  │  ├───4.92 MB (00.91%) ++ compartment([System Principal], Addon-SDK (from: resource://gre/modules/commonjs/toolkit/loader.js:414))

this is from nightly 02-11-2016:


├────701.46 MB (53.11%) -- js-non-window
│    ├──670.12 MB (50.74%) -- zones
│    │  ├──660.08 MB (49.97%) -- zone(0x1f2b6b86000)
│    │  │  ├──285.96 MB (21.65%) -- compartment([System Principal], Addon-SDK (from: resource://gre/modules/commonjs/toolkit/loader.js:414))
│    │  │  │  ├──129.76 MB (09.82%) -- classes
│    │  │  │  │  ├───72.74 MB (05.51%) -- class(Function)/objects
│    │  │  │  │  │   ├──69.44 MB (05.26%) ── gc-heap [487]
│    │  │  │  │  │   └───3.30 MB (00.25%) ── malloc-heap/slots [487]
│    │  │  │  │  ├───33.47 MB (02.53%) -- class(Object)/objects
│    │  │  │  │  │   ├──23.88 MB (01.81%) ── gc-heap [487]
│    │  │  │  │  │   └───9.59 MB (00.73%) ++ malloc-heap
│    │  │  │  │  ├───23.46 MB (01.78%) -- class(<non-notable classes>)/objects
│    │  │  │  │  │   ├──19.55 MB (01.48%) ── gc-heap [487]
│    │  │  │  │  │   └───3.91 MB (00.30%) ++ malloc-heap
│    │  │  │  │  └────0.10 MB (00.01%) ++ (2 tiny)
│    │  │  │  ├──121.88 MB (09.23%) -- scripts
│    │  │  │  │  ├───76.28 MB (05.78%) ── gc-heap [487]
│    │  │  │  │  └───45.60 MB (03.45%) ── malloc-heap/data [487]
│    │  │  │  ├───17.94 MB (01.36%) ++ (2 tiny)
│    │  │  │  └───16.37 MB (01.24%) -- type-inference
│    │  │  │      ├──16.37 MB (01.24%) ── object-type-tables [485]
│    │  │  │      └───0.01 MB (00.00%) ── allocation-site-tables
│    │  │  ├───90.99 MB (06.89%) -- compartment([System Principal], resource://gre/modules/commonjs/toolkit/loader.js)
│    │  │  │   ├──78.97 MB (05.98%) -- classes
│    │  │  │   │  ├──56.64 MB (04.29%) -- class(Object)/objects
│    │  │  │   │  │  ├──37.46 MB (02.84%) ── malloc-heap/slots
│    │  │  │   │  │  └──19.19 MB (01.45%) ── gc-heap
│    │  │  │   │  └──22.33 MB (01.69%) ++ (7 tiny)
│    │  │  │   └──12.01 MB (00.91%) ++ (2 tiny)
│    │  │  ├───80.90 MB (06.13%) -- shapes
│    │  │  │   ├──53.61 MB (04.06%) -- gc-heap
│    │  │  │   │  ├──44.35 MB (03.36%) ── tree
│    │  │  │   │  └───9.26 MB (00.70%) ++ (2 tiny)
│    │  │  │   └──27.29 MB (02.07%) -- malloc-heap
│    │  │  │      ├──22.76 MB (01.72%) ── tree-tables
│    │  │  │      └───4.53 MB (00.34%) ++ (2 tiny)
│    │  │  ├───74.41 MB (05.63%) -- scopes
│    │  │  │   ├──41.16 MB (03.12%) ── malloc-heap
│    │  │  │   └──33.25 MB (02.52%) ── gc-heap
│    │  │  ├───58.86 MB (04.46%) ++ (205 tiny)
│    │  │  ├───33.36 MB (02.53%) ── type-pool
│    │  │  ├───21.39 MB (01.62%) -- object-groups
│    │  │  │   ├──21.18 MB (01.60%) ── gc-heap
│    │  │  │   └───0.21 MB (00.02%) ── malloc-heap
│    │  │  └───14.21 MB (01.08%) ++ compartment([System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:185))
│    │  └───10.05 MB (00.76%) ++ (15 tiny)

I'm also seeing increased CC pause times


CC(T+54785.3)[content] max pause: 3420ms, total time: 3838ms, slices: 38, suspected: 171, visited: 19669 RCed and 4615878 GCed, collected: 167 RCed and 912 GCed (912|0|0 waiting for GC)
ForgetSkippable 8 times before CC, min: 1 ms, max: 6 ms, avg: 1 ms, total: 15 ms, max sync: 0 ms, removed: 663


It does seem like the number of SDK compartments has gone up a lot:

old:

547 (100.0%) -- js-main-runtime-compartments
├──338 (61.79%) ++ user
└──209 (38.21%) -- system
   ├──182 (33.27%) ++ (176 tiny)
   └───27 (04.94%) ── [System Principal], outOfProcessTabChildGlobal [27]


new:

2,228 (100.0%) -- js-main-runtime-compartments
├──2,112 (94.79%) -- system
│  ├────487 (21.86%) ── [System Principal], Addon-SDK (from: resource://gre/modules/commonjs/toolkit/loader.js:414) [487]
│  ├────485 (21.77%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:185) [485]
│  ├────485 (21.77%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:209) [485]
│  ├────485 (21.77%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:24) [485]
│  └────170 (07.63%) ++ (159 tiny)
└────116 (05.21%) ++ user
The Addon SDK loader changes were uplifted to 50/51 as well. It would be good to know if they're also affected.
I don't see anything in telemetry to suggest that this is a wide scale regression. In particular, metrics like GC and CC pauses have been similar, or slightly improved:

https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F52&measure=CYCLE_COLLECTOR_MAX_PAUSE&min_channel_version=nightly%252F52&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0

It would be good to have more details about the add-ons you have installed, and the first build that you see this regression in.
> and the first build that you see this regression in.

I don't have a STR yet as I only noticed this after a day of uptime, so i can't narrow the range yet. I'll check if this happens immediately after startup or is a buildup over time during the next nightly update.

> In particular, metrics like GC and CC pauses have been similar, or slightly improved:

Can that be filtered by uptime and profiles that have sdk addons? If the issue accumulates then maybe it gets drowned out in aggregate stats that contain sessions not matching those criteria.

> It would be good to have more details about the add-ons you have installed,

As far as I can tell the SDK consumers are

* https://addons.mozilla.org/en-US/firefox/addon/ipfs-gateway-redirect/
* https://addons.mozilla.org/en-US/firefox/addon/new-tab-override/
* devtools

Disabling the addons + minimize memory usage does not free up the compartments.
(In reply to The 8472 from comment #3)
> > In particular, metrics like GC and CC pauses have been similar, or slightly improved:
>
> Can that be filtered by uptime and profiles that have sdk addons? If the
> issue accumulates then maybe it gets drowned out in aggregate stats that
> contain sessions not matching those criteria.

Not easily, but even if it were limited to long uptimes and profiles with SDK
add-ons, I'd expect to see it reflected in the 95th percentile of at least
some metrics.

> > It would be good to have more details about the add-ons you have installed,
>
> As far as I can tell the SDK consumers are
>
> * https://addons.mozilla.org/en-US/firefox/addon/ipfs-gateway-redirect/
> * https://addons.mozilla.org/en-US/firefox/addon/new-tab-override/
> * devtools

A full list of extensions and their versions would be helpful. Even if they're
not SDK add-ons, other add-ons may be involved in the issue. As much other
information that you can provide from about:support (or other preferences
you've changed that aren't reflected there) would also be helpful.

> Disabling the addons + minimize memory usage does not free up the
> compartments.

That's suspicious, but not entirely surprising. As far as I can tell, the
SDK never nukes the sandboxes for its child process loader when extensions are
disabled, so if they're leaking, there's a good chance they'd stay leaked.
It might be notable that the ipfs-gateway-redirect add-on is one of only about 4 or 5 add-ons on AMO that use remoteRequire directly.
Whiteboard: [MemShrink]
I have restarted my FF session and the # of compartments in the child processes is low at the moment. So it's probably something that accumulates over time. I'll keep observing it.

(In reply to Kris Maglione [:kmag] from comment #5)
> It might be notable that the ipfs-gateway-redirect add-on is one of only
> about 4 or 5 add-ons on AMO that use remoteRequire directly.

I wrote that part for e10s compatibility. There is something that might be a semi-leak in there, but not to the extent of the [485] SDK duplicates in comment 0, since for each leak the addon would need to be turned off and on again.
(In reply to The 8472 from comment #6)
> (In reply to Kris Maglione [:kmag] from comment #5)
> > It might be notable that the ipfs-gateway-redirect add-on is one of only
> > about 4 or 5 add-ons on AMO that use remoteRequire directly.
> 
> I wrote that part for e10s compatibility. There is something that might be a
> semi-leak in there, but not to the extent of the [485] SDK duplicates in
> comment 0, since for each leak the addon would need to be turned off and on
> again.

Oh, sorry, I didn't notice that you added compartment counts to this bug. Yeah, something's definitely wrong there. But I still suspect it may have something to do with that add-on calling remoteRequire directly, especially since the impact of these changes on that metric is pretty clearly in the other direction:

https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!mean!5th-percentile!25th-percentile!75th-percentile!95th-percentile&cumulative=0&end_date=2016-10-17&keys=!__none__!__none__&max_channel_version=nightly%252F52&measure=MEMORY_JS_COMPARTMENTS_SYSTEM&min_channel_version=nightly%252F52&processType=true&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-10-17&trim=1&use_submission_date=0

Either way, I can't reproduce it, and I still don't know nearly enough about your configuration to really try.
(In reply to The 8472 from comment #6)
> I have restarted my FF session and the # of compartments in the child
> processes is low at the moment. So it's probably something that accumulates
> over time. I'll keep observing it.

You could try opening and closing windows six or so times and see if anything goes up. That's a common pattern to trigger leaks that are slowly growing.
I think I have narrowed down the issue and it may not be related to installed addons at all.

1. open some tabs so child processes exist
2. open about:debugging
3. repeatedly switch back and forth between "add-ons" and "workers"
4. count sdk compartments in the child process via about:memory

On a profile with only the built-in addons it still causes the compartment count go up.

I guess I did leave about:debugging open over night and it occasionally refreshes, or something like that.
Yup, I can reproduce it with those steps. Thanks.
I just tested and was able to reproduce this in an older version of Aurora without the changes from bug 1308332, so it seems unrelated.
No longer blocks: 1308332
Component: General → Developer Tools: about:debugging
Product: Add-on SDK → Firefox
Summary: Increased SDK footprint in child processes → about:debugging leaks memory in child process
Thanks a ton for the STR!
It looks like the Worker tab in about:debugging creates tons of compartments.
Assignee: nobody → poirot.alex
Every single layer in the devtools onion miss some caching.
Here, we end up creating brand new SDK loader when opening the Worker panel.
The very precise issue is in content-server.jsm which create a new loader and a new debugger server.
This method somewhat assumes we ever try to create a ContentActor only once per process,
but no code is actually ensuring that it doesn't happen.

With these two patches, the situation is cleared on the backend side. I imagine there is also things to improve in the client API and surely in about:debugging. Also, it looks like about:debugging never closes its client, I think we should try to do that in order to ensure things get cleaned up when we close it.
Keywords: mlk
Version: unspecified → 45 Branch
Whiteboard: [MemShrink] → [MemShrink:P2]
Keywords: regression
See Also: → 1311422
Did you mean to request a review?  You posted to MozReview but no reviewer is set.  Just want to make sure it's not lost!
Flags: needinfo?(poirot.alex)
It is not lost. I want to have a test before proceeding with these fixes.
Flags: needinfo?(poirot.alex)
I only test client.getProcess caching. I would like to test that the loader/debugger server get cleaned up on client disconnection, but there is no observer service notification being sent? So it looks like it would require pulling content-server.jsm internal? Is that reasonable to do that in a test?
Status: NEW → ASSIGNED
Priority: -- → P1
Comment on attachment 8807556 [details]
Bug 1315044 - Cache ContentActor forms to prevent creating new one when calling RootActor.getProcess multiple times.

https://reviewboard.mozilla.org/r/90706/#review91766

Great, these patches combined seem to solve this particular issue!  However (as I think you noticed as well) it seems like we leave the server alive when closing about:debugging.  So if you close, reopen, close, reopen about:debugging, we keep stacking more compartments in the child.  Can you file a follow up for this?

::: devtools/server/tests/mochitest/test_getProcess.html:109
(Diff revision 2)
>      });
>    }
>  
> +  // Assert that calling client.getProcess against the same process id is
> +  // returning the same actor.
> +  function reGetProcess(firstActor, id) {

Maybe `getProcessAgain`? `reGet` reads a bit strangely...
Attachment #8807556 - Flags: review?(jryans) → review+
Comment on attachment 8807555 [details]
Bug 1315044 - Prevent loading multiple loaders and debugger servers when creating multiple ContentActors.

https://reviewboard.mozilla.org/r/90704/#review91662

Nit: ContentActor's -> ContentActors in commit message

Looks good, but would be even better with a test!  Can you use the SDK loader's unload observer notification "sdk:loader:destroy" to build a test for this?
Attachment #8807555 - Flags: review?(jryans) → review+
Also, should add a similar destruction path in the child.js frame script for tab debugging, maybe once all connections are closed, we would shut down the server in the child?
Last three revisions on mozreview:
* address first batch of comments
* rebase
* add the last changeset, to be merged into the first changeset. This adds a test for loader destruction.
Comment on attachment 8810451 [details]
Bug 1315044 - Test loader destruction.

https://reviewboard.mozilla.org/r/92774/#review92820

Thanks, looks like a good addition to the test!

::: devtools/server/tests/mochitest/test_getProcess.html:141
(Diff revision 1)
> +    Services.ppmm.loadProcessScript(script, true);
> +    client.close();
> +
> +    onLoaderDestroyed.then(function () {
> +      Services.ppmm.removeDelayedProcessScript(script);
> +      info("Loader destroyed in the content process\n");

Nit: don't need \n with `info`
Attachment #8810451 - Flags: review?(jryans) → review+
Attachment #8810451 - Attachment is obsolete: true
Pushed by apoirot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f57c3052af5a
Cache ContentActor forms to prevent creating new one when calling RootActor.getProcess multiple times. r=jryans
https://hg.mozilla.org/integration/autoland/rev/eca8f585871b
Prevent loading multiple loaders and debugger servers when creating multiple ContentActors. r=jryans
https://hg.mozilla.org/mozilla-central/rev/f57c3052af5a
https://hg.mozilla.org/mozilla-central/rev/eca8f585871b
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
I guess this is going to ride the train with 53.
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.