Closed Bug 1330018 Opened 7 years ago Closed 7 years ago

windows intermittently leaked until shutdown via Cpows with e10s-multi

Categories

(Core :: JavaScript Engine, defect)


Tracking


RESOLVED FIXED
mozilla53
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- unaffected
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: gkrizsanits, Assigned: mccr8)

References

Details

(Keywords: memory-leak, regression, Whiteboard: [e10s-multi:M1][MemShrink:P1])

Attachments

(4 files, 2 obsolete files)

For some reason, with multiple content processes enabled, I saw some tests leak. After changing one of the leaking tests a bit, it seems like the problem is with the leak detector.

This modified test creates a bunch of windows and closes them. I use BrowserTestUtils.closeWindow, which waits for the final session storage message from the children (in the case of this test, one for each window) plus the "domwindowclosed" event on the parent side. After this we start the GC on the parent side and send messages to the child processes to do the same.

The problem is that at this point there is no guarantee that the GC/CC will be able to collect the windows in the child process: we're in the middle of destroying the frame loader, and depending on how busy the CPU is, in debug mode it can take a while until we get to the point where it makes sense to run the GC/CC.

Multiple content processes make things worse here, but in debug mode this modified test leaks even with a single content process (executed with --repeat 5).

side notes:
- on the parent side we are quite aggressive with GC: http://searchfox.org/mozilla-central/source/testing/mochitest/browser-test.js#644
compared to the child side... should we do more on the child side too?

- seems like we don't wait for the child processes to finish their GC before launching the next test: http://searchfox.org/mozilla-central/source/testing/mochitest/browser-test.js#647 can that cause a problem?
Attached patch leaky test (obsolete) — Splinter Review
This issue is blocking me from landing multiple content processes on Nightly; could you please help me figure out what to do about it?
Flags: needinfo?(continuation)
Blocks: 1303113
Whiteboard: [e10s-multi:M1]
Sure, I can take a look. The basic way this works is that the parent process sends the async browser-test:collect-request message, which the child gets. Then the child runs some GCs synchronously, and prints out a message with its pid. Then the leak detector (which is a Python script that runs on the log) looks for that message, checking that windows created during the test in that process were destroyed before the message.

Here's the various pieces of that: http://searchfox.org/mozilla-central/search?q=Completed%20ShutdownLeaks%20collections%20in%20process&path=

I wasn't thinking at all about e10s multi when I was converting this test to work with e10s, so it is possible something doesn't work.

It is possible that running the same test repeatedly confuses the leak detector, which could explain why you'd see issues with --repeat.
How do I run mochitests locally with e10s multi enabled?
Assignee: nobody → continuation
Flags: needinfo?(continuation) → needinfo?(gkrizsanits)
I don't know the pref offhand, but you can pass in --setpref <pref>=<value> to the mach command. If that doesn't work for some reason, let me know and just add it to testing/profiles/prefs_general.js in the meantime.
(In reply to Andrew McCreight [:mccr8] from comment #3)
> Sure, I can take a look. The basic way this works is that the parent process

Thanks!

> sends the async browser-test:collect-request message, which the child gets.
> Then the child runs some GCs synchronously, and prints out a message with
> its pid. Then the leak detector (which is a Python script that runs on the
> log) looks for that message, checking that windows created during the test
> in that process were destroyed before the message.

Yeah, I think I've understood that part. I'm just not sure when the right time
to do a GC in the child is... if we do it too early, something might still hold a reference
to the window and we fail to collect, no? I fail to see why BrowserTestUtils' way of
waiting for those messages/events is a sure way to know that we can finish the test and start
the sync GC work. And I think in some cases it is too early.

> 
> I wasn't thinking at all about e10s multi when I was converting this test to
> work with e10s, so it is possible something doesn't work.
> 
> It is possible that running the same test repeatedly confuses the leak
> detector, which could explain why you'd see issues with --repeat.

Oh, I thought that since we have the window IDs / raw pointers, that should not be a problem,
but I have not looked into it. The original test fails intermittently if you set the process
count to 2 in all.js

Some more leaky tests are under bug 1303113.

(In reply to Andrew McCreight [:mccr8] from comment #4)
> How do I run mochitests locally with e10s multi enabled?
dom.ipc.processCount in all.js
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #6)
> Yeah, I think I've understood that part. I'm just not sure when is the right
> time to do a GC in the child... if we do it too early something might still hold
> a reference to the window and we fail to collect, no? I fail to see why BrowserTestUtils
> way of waiting for those messages/events is a sure way to know that we can finish
> the test and start the sync GC work. And I think in some cases it is too early.

I'm not too clear on how we decide to do this, but the current approach has worked for a few years, so I don't think it can be too terrible. The entire point of this test is that if you close all the tabs a test opens, and GC a bunch, the windows the test creates should go away. I think there's even some kind of observer service notification that code can listen for to drop stuff if it really needs to.

> Oh, I thought that since we have the window IDs / raw pointers, that should
> not be a problem, but I have not looked into it.

I can reproduce the issue with --repeat. For at least one window, I see the "Completed ShutdownLeaks collections in process" message for the process before the window is even created. Then it never happens again. So we may not be doing the full cleanup procedure in the --repeat case. I looked over the code in leaks.py and it does look like it should be able to handle repeated testing.

> The original test fails intermittently if you set the process count to 2 in all.js

I'm not able to reproduce a leak with the patch you attached here, and running it alone using
  ./mach mochitest dom/tests/browser/browser_test_toolbars_visibility.js

Does that consistently leak for you, or only intermittently? Also, if you have a log of a full run that fails I can look at that and try to analyze if something is going wrong at the test harness level.
I'll look around in the bugs blocking bug 1303113 and see if I can figure anything out.
Depends on: 1330085
No longer depends on: 1330085
Gabor, do you have any steps to reproduce this locally, without --repeat?
Flags: needinfo?(gkrizsanits)
Here's what a log from bug 1328396 looks like (with many intermediate lines removed):

11:58:39 INFO - TEST-START | toolkit/mozapps/extensions/test/browser/browser_updateid.js
11:58:39 INFO - ++DOMWINDOW == 3 (0x111d64800) [pid = 2536] [serial = 249] [outer = 0x0]
11:58:41 INFO - TEST-OK | toolkit/mozapps/extensions/test/browser/browser_updateid.js | took 2071ms
11:58:46 INFO - Completed ShutdownLeaks collections in process 2536
11:58:53 (some warnings from nsThread::Shutdown() in process 2536, indicating we're probably pretty late in shut down)
11:58:53 INFO - --DOMWINDOW == 0 (0x111d64800) [pid = 2536] [serial = 249] [outer = 0x0] [url = about:blank]

So, it looks like we are properly running the ShutdownLeaks collection for this process, and I only see it for this process once, so this isn't leaks.py getting confused. The test is in fact just keeping an inner/outer window pair alive until very late in shutdown. Note that the window isn't destroyed until 7 seconds after the shutdown collection, which seems like a long time. Though maybe it actually became garbage quickly, and that's just the next time we ran the GC.

I'm not sure how easy it will be to figure out without a way to reproduce it. Maybe rr is the best hope here.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> I'm not too clear on how we decide to do this, but the current approach has
> worked for a few years, so I don't think it can be too terrible. The entire
> point of this test is that if you close all the tabs a test opens, and GC a
> bunch, the windows the test creates should go away. I think there's even
> some kind of observer service thing code can listen for to drop stuff if it
> really needs to.

Yes. The "memory-pressure" thing. Speaking of which, I did some logging for another
test (browser_captivePortal_certErrorUI.js), and I think there
in the non-leaking case, it's not the GC initiated from the ShutdownLeakCollector that
picks up the window but the one that is requested from here from the parent side
http://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2360
and then handled here on child side:
http://searchfox.org/mozilla-central/source/dom/ipc/ContentChild.cpp#2355
and then starts somewhere an async CC...

> I can reproduce the issue with --repeat. For at least one window, I see the
> "Completed ShutdownLeaks collections in process" message for the process
> before the window is even created. Then it never happens again. So we may
> not be doing the full cleanup procedure in the --repeat case. I looked over
> the code in leaks.py and it does look like it should be able to handle
> repeated testing.

Let's forget about the --repeat case then. I created five different copies of the
test above and ran them in a suite (instead of using --repeat on the same test file),
and that did not report any leaks... So you're probably right about --repeat.

> Does that consistently leak for you, or only intermittently? Also, if you

Intermittently, in the most annoying way. Sometimes I can reproduce it several times in a row; sometimes I cannot reproduce it at all no matter how much I try. You could try browser_captivePortal_certErrorUI.js; it is pretty frequent with two content processes, especially, for some reason, right after a fresh build (./mach build && ./mach mochitest browser/base/content/test/captivePortal/browser_captivePortal_certErrorUI.js). I know it does not make much sense, but the annoying part is that if I can reproduce it then I can usually reproduce it consistently (and the other way around), so it is worth trying a couple of builds. You could also put some extra stress on the CPU while running, or use a VM with a single core.

> have a log of a full run that fails I can look at that and try to analyze if
> something is going wrong at the test harness level.

Yeah, I'll attach one. It probably does not help much, though, since if I run a single test, the window gets cleaned up only during process shutdown. The output seems alright; just something seems to hold a reference to the window _sometimes_. Btw, I tried using setTimeout before the GC (to see if it's just some event), and it helped locally (I wouldn't trust it because of the intermittent nature) but did not solve the problem on try.

(In reply to Andrew McCreight [:mccr8] from comment #10)
> Here's what a log from bug 1328396 looks like (with many intermediate lines
> removed):
> 
> 11:58:39 INFO - TEST-START |
> toolkit/mozapps/extensions/test/browser/browser_updateid.js
> 11:58:39 INFO - ++DOMWINDOW == 3 (0x111d64800) [pid = 2536] [serial = 249]
> [outer = 0x0]
> 11:58:41 INFO - TEST-OK |
> toolkit/mozapps/extensions/test/browser/browser_updateid.js | took 2071ms
> 11:58:46 INFO - Completed ShutdownLeaks collections in process 2536
> 11:58:53 (some warnings from nsThread::Shutdown() in process 2536,
> indicating we're probably pretty late in shut down)
> 11:58:53 INFO - --DOMWINDOW == 0 (0x111d64800) [pid = 2536] [serial = 249]
> [outer = 0x0] [url = about:blank]
> 
> So, it looks like we are properly running the ShutdownLeaks collection for
> this process, and I only see it for this process once, so this isn't
> leaks.py getting confused. The test is in fact just keeping an
> inner/outer window pair alive until very late in shutdown. Note that the
> window isn't destroyed until 7 seconds after the shutdown collection, which
> seems like a long time. Though maybe it actually became garbage quickly, and
> that's just the next time we ran the GC.

This means that we are able to GC the window before the final cleanup during the
process shutdown, right?

> 
> I'm not sure how easy it will be to figure out without a way to reproduce
> it. Maybe rr is the best hope here.

On the try server it might be easier to reproduce... My problem is that even if
I attached a debugger or rr, I don't think I would know what to do or look for. I mean,
I don't see what guarantees that after window.close() the destruction of the DOM
tree / JS scopes / docshell tree has gotten to the point where a GC/CC can free the
window, other than that it has worked so far with a single content process. If you
have some hints for me about where to look, I can give it a try.
Flags: needinfo?(gkrizsanits) → needinfo?(continuation)
Attached file leakoutput —
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> Yes. The "memory-pressure" thing.

I was thinking of the "shutdown-leaks-before-check" barrier thing in browser-test.js.

> can reproduce it then I can usually reproduce it consistently (and the other
> way around), so it is worth trying a couple of builds. You could also put
> some extra stress on the CPU while running, or use a VM with a single core.

That's quite odd!

> > have a log of a full run that fails I can look at that and try to analyze if
> > something is going wrong at the test harness level.
> 
> Yeah I'll attach one. It probably does not help much though since if I run a
> single test, then the window gets cleaned up only during the process
> shutdown. The output seems alright, just something seems to hold a reference
> to the window _sometimes_. Btw I tried using setTimeout before the GC (to
> see if it's just some event), and it helped locally (I wouldn't trust it
> because of the intermittent nature) but did not solve the problem on try.

Hmm strange.

> This means that we are able to GC the window before the final cleanup during
> the process shutdown right?

Yup.

> On the try server it might be easier to reproduce it... My problem is that
> even if I attached a debugger or rr, I don't think I would know what to do / look
> for.

The basic idea is to get a CC log at the point where the window is alive, but you wish it was cleaned up. This will show you that there's some missing reference to an object. Then you look at the releases of that object after that point, and compare it to the known references, to figure out what the missing reference was. Or I have another heap-scanning based approach that does something similar. It is not very easy to do, in a case like this where even reproducing it is hard.

> I mean, I don't see what guarantees that after the window.close() the destruction of
> the DOM tree / JS scopes / docshell tree got to the point where a GC/CC can free the
> window, other than that it's worked so far with a single content process. If you
> have some hints for me where to look, I can give it a try.

I'm not really sure what the guarantees are.

One question is, why is e10s multi different? One possibility is that browser-test.js sets dom.ipc.keepProcessesAlive. It might be worth seeing if setting that causes these leaks even in the regular e10s case, and if not setting it results in leaks in the e10s-multi case (assuming the timeouts are not too much of an issue). Maybe we discard things more aggressively if we're not keeping the process alive?

It might be interesting to try running with MOZ_IPC_MESSAGE_LOG=1 and see if there's any obvious difference in messages sent during shutdown. Maybe it is too noisy.

I've been looking over the various bugs blocking bug 1303113 to see if I can figure out any kind of pattern. One interesting log I found was in bug 1328374 comment 0. Here the test that creates the leaked window is actually the second-to-last one. The next test does not reuse the same process, so you'd think it would just be sitting around for a while, so weird races would be less of an issue.
Flags: needinfo?(continuation)
Whiteboard: [e10s-multi:M1] → [e10s-multi:M1][MemShrink]
Whiteboard: [e10s-multi:M1][MemShrink] → [e10s-multi:M1][MemShrink:P1]
(In reply to Andrew McCreight [:mccr8] from comment #13)
> The basic idea is to get a CC log at the point where the window is alive,
> but you wish it was cleaned up. This will show you that there's some missing
> reference to an object. Then you look at the releases of that object after
> that point, and compare it to the known references, to figure out what the
> missing reference was. Or I have another heap-scanning based approach that
> does something similar. It is not very easy to do, in a case like this where
> even reproducing it is hard.

Thanks, I'll try this approach.

> One question is, why is e10s multi different? One possibility is that
> browser-test.js sets dom.ipc.keepProcessesAlive. It might be worth seeing if
> setting that causes these leaks even in the regular e10s case, and if not
> setting it results in leaks in the e10s-multi case (assuming the timeouts
> are not too much of an issue). Maybe we discard things more aggressively if
> we're not keeping the process alive?

I suspected that as well, but I could not detect any difference when it's turned on
or off.

> 
> It might be interesting to try running with MOZ_IPC_MESSAGE_LOG=1 and see if
> there's any obvious difference in messages sent during shutdown. Maybe it is
> too noisy.
> 
> I've been looking over the various bugs blocking bug 1303113 to see if I can
> figure out any kind of pattern. One interesting log I found was in bug
> 1328374 comment 0. Here the test that creates the leaked window is actually
> the second-to-last one. The next test does not reuse the same process, so
> you'd think it would just be sitting around for a while, so weird races
> would be less of an issue.

Interesting. On a side note, I wonder if we should just nuke all the windows when
a content process is kept alive but the last tab/window it has been hosting is
closed.
Attached file cclogcrash.txt —
I'm trying to get the CC logger to work but it's crashing on me reliably after this scary assertion... Is there something I'm doing wrong? (more errors in the full output)

command:
---------
MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_PROCESS="all" MOZ_CC_LOG_DIRECTORY="~/Documents/log" MOZ_DISABLE_CONTENT_SANDBOX=1 ./mach mochitest browser/base/content/test/captivePortal/browser_captivePortal_certErrorUI.js


stack:
-------
Assertion failure: !DebugFileIDs.Contains(aHandle), at /Users/gkrizsanits/Documents/development/mozilla-central/xpcom/build/PoisonIOInterposerBase.cpp:231
#01: MozillaRegisterDebugHandle (PoisonIOInterposerBase.cpp:231, in XUL)
#02: MozillaRegisterDebugFD (PoisonIOInterposerBase.cpp:239, in XUL)
#03: MozillaRegisterDebugFILE (PoisonIOInterposerBase.cpp:249, in XUL)
#04: nsCycleCollectorLogSinkToFile::OpenLog(nsCycleCollectorLogSinkToFile::FileInfo*) (nsCycleCollector.cpp:1687, in XUL)
#05: nsCycleCollectorLogSinkToFile::Open(__sFILE**, __sFILE**) (nsCycleCollector.cpp:1579, in XUL)
#06: nsCycleCollectorLogger::Begin() (nsCycleCollector.cpp:1832, in XUL)
#07: nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (nsCycleCollector.cpp:3828, in XUL)
#08: nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (nsCycleCollector.cpp:3651, in XUL)
#09: nsCycleCollector_collect(nsICycleCollectorListener*) (nsCycleCollector.cpp:4143, in XUL)
#10: nsJSContext::CycleCollectNow(nsICycleCollectorListener*, int) (nsJSEnvironment.cpp:1439, in XUL)
#11: nsXPCComponents_Utils::ForceCC(nsICycleCollectorListener*) (XPCComponents.cpp:2582, in XUL)
Depends on: 1330712
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #15)
> I'm trying to get the CC logger to work but it's crashing on me reliably
> after this scary assertion... Is there something I'm doing wrong? (more
> errors in the full output)

What OS are you on? We have code that checks if you do IO late in shutdown. The fds used for CC/GC logging are supposed to be whitelisted, but I think bkelly or somebody hit the same issue as you on Windows. Maybe it is broken there?

I managed to reproduce the issue with a variation of what you posted, so thanks for putting that up:
MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_CC_LOG_THREAD=main MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_PROCESS=content MOZ_CC_LOG_DIRECTORY="~/logs/multileaks/1" ./mach mochitest -f browser browser/base/content/test/captivePortal/
Ok, so that wasn't useful, but by adding all traces I was able to come up with something more useful.

MOZ_CC_ALL_TRACES=all MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_CC_LOG_THREAD=main MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_PROCESS=content MOZ_CC_LOG_DIRECTORY="~/logs/multileaks/2" ./mach mochitest -f browser browser/base/content/test/captivePortal/

The INFO line after the UNEXPECTED-FAIL gives you the serial number for the leaking windows:
  windows(s) leaked: [pid = 13260] [serial = 6], [pid = 13260] [serial = 3]
You can use that to find out the pointer value of the window by looking earlier in the log:
  --DOMWINDOW == 1 (0x7f777b714000) [pid = 13260] [serial = 6]  ...
From this, you can use find_roots.py to see what is holding the window alive:
  python ~/tools/heapgraph/find_roots.py cc-edges.13260-3.log 0x7f777b714000
This just said that the window reflector for the window (0x7f777c861240) was holding it alive, because it was marked black. You then find out why that is alive in the GC graph (with the -bro option to ignore gray roots that don't result in black marking):
  python ~/tools/heapgraph/find_roots.py gc-edges.13260-3.log -bro 0x7f777c861240
The output from that is this:
  via ipc-object :
  0x7f777c872240 [Proxy <no private>]
      --[private]--> 0x7f777c861240 [Window <no private>]

This means that the window reflector is held alive via some proxy, and the proxy is held alive by a root named "ipc-object". If you look for that in Searchfox you get the method IdToObjectMap::trace(). I think this means that a CPOW is holding the window alive.

I don't know about the lifetime of cpows, unfortunately.
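For reference, the "ipc-object" root comes from the CPOW id-to-object table tracing its entries. Below is a minimal sketch of what IdToObjectMap::trace() in js/ipc/JavaScriptShared.cpp does, reconstructed from memory rather than copied from the tree, so the exact member names and tracing helper may differ:

    // Sketch: the CPOW table maps ids to JS objects and reports each entry
    // to the GC under the root name "ipc-object". While an entry is in the
    // table, the object it references (here, the proxy holding the window
    // reflector) cannot be collected.
    void
    IdToObjectMap::trace(JSTracer* trc)
    {
        for (Table::Range r(table_.all()); !r.empty(); r.popFront()) {
            // This edge is what find_roots.py reports as "via ipc-object".
            JS::TraceEdge(trc, &r.front().value(), "ipc-object");
        }
    }

So as long as the child-side table still has an entry for the CPOW, the window reflector stays marked black.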
Summary: Leak detector gives false positives → windows intermittently leaked until shutdown via Cpows with e10s-multi
Blake, maybe you have some idea of how cleanup of cpows is supposed to work here? I'll try reading the source code in the interim.
Flags: needinfo?(mrbkap)
Also, the GC log analysis tool is slightly flakey when it comes to deciding what a root is, so it is possible that this is a weak root and the tool just can't tell. If that's the case, we should be able to poke around in the logs for the real root.
Attached file GC log for 0x7f777c861240 —
Bug 1311212 made these references strong "until the next turn of the event loop". Maybe that's related?
If I'm reading the comments correctly, the strong references are only really needed when MOZ_DEBUG_DEAD_CPOWS is set, so maybe I can avoid the strong rooting unless that is set.
Yeah that seems to work. With my patch, I was unable to reproduce the leak in browser browser/base/content/test/captivePortal/ with around 10 repetitions, whereas it was reproducing around 1/3 of the time before.
Flags: needinfo?(mrbkap)
Attachment #8825442 - Attachment is obsolete: true
Component: Mochitest → JavaScript Engine
Product: Testing → Core
Version: Version 3 → Trunk
(In reply to Andrew McCreight [:mccr8] from comment #16)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #15)
> > I'm trying to get the CC logger to work but it's crashing on me reliably
> > after this scary assertion... Is there something I'm doing wrong? (more
> > errors in the full output)
> 
> What OS are you on? We have code that checks if you do IO late in shutdown.
> The fds used for CC/GC logging are supposed to be whitelisted, but I think
> bkelly or somebody hit the same issue as you on Windows. Maybe it is broken
> there?

This was on OSX.

By the way, I find the detailed steps you posted for the leak tracing super
useful; next time I won't have to bother you when I have to track down a leak :)
Is there any documentation for this? If not, we could ask Will to help us put
something together...

Anyway, thanks a lot for fixing this!
Attachment #8826394 - Attachment is obsolete: true
Attachment #8826394 - Flags: review?(mrbkap)
Attachment #8826395 - Attachment is obsolete: true
Attachment #8826395 - Flags: review?(mrbkap)
Attachment #8826394 - Attachment is obsolete: false
Attachment #8826395 - Attachment is obsolete: false
Ryan said that the original patch fixed a bunch of intermittent CPOW failures, so I guess I can't just revert it. I can probably add some CPOW kill switch that we fire off right before we do the ShutdownLeaks collection in the child.
> Is there any documentation for this?

Not really. I've been slowly working on writing some up, but I should really finish that off.

Anyways, I spent some more time looking into this, and talking it over with Bill, and what seems to be happening here is that under some circumstances the message manager gets a message, but then doesn't unwrap the CPOWs (like if nobody is listening for messages or there's no frame loader, which I could imagine happening after all tabs are removed maybe).

Without the unwrap, the parent is never aware that the CPOW was sent to it, so nextCPOWNumber_ never gets set, and we end up keeping the CPOW alive until another one is created. Bill said that in some other circumstances (I think when a CPOW is sent in the other direction) the CPOW is strongly held for real, so you'd actually end up leaking something forever.

We want to make sure the CPOWs are always unwrapped, even if we're not going to use them (I had a hacky version that just did the nextCPOWNumber_ thing, but Bill said due to the other problem we should just always unwrap them fully.)
Blocks: 1331647
I have two followups. First, in 6 places, such as ContentChild::RecvAsyncMessage, the CrossProcessCpowHolder is not created unless a message manager exists. Second, my patch only deals with the case where we send a CpowEntry, but there are many other ways we can send a RemoteObject, so I should audit those to see if a similar failure to unwrap can occur.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82eaf33b73f19bd4e7575ad4aa8befdd034d0456
I'm leaving those to followups because they don't seem to be needed to fix the intermittent leak issue with e10s-multi, which is my main priority. Moving the holders is simple, but I worry it may introduce weird null-pointer crashes, so I want to separate it. Auditing will require a bit of time. I looked at a few cases, and they seemed fine to me, so hopefully nothing will need to be done for the other cases.
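To make the first followup concrete, the pattern at those call sites looks roughly like the sketch below. The function name and signature are hypothetical simplifications, not the real ContentChild::RecvAsyncMessage; the point is only where the CrossProcessCpowHolder gets constructed:

    // Hypothetical, simplified receiver illustrating the followup; the real
    // call sites have more parameters and error handling.
    void ReceiveAsyncMessage(const nsString& aMsg, nsTArray<CpowEntry>&& aCpows)
    {
        RefPtr<nsFrameMessageManager> mm = GetMessageManager();
        if (mm) {
            // Today the holder is only created inside this branch, so when
            // there is no message manager the CpowEntries are never unwrapped
            // and the child-side "ipc-object" entries linger.
            CrossProcessCpowHolder cpows(GetCPOWManager(), aCpows);
            mm->ReceiveMessage(aMsg, &cpows);
        }
        // Followup idea: construct the holder before the null check, so its
        // destructor always runs and the CPOWs always get cleaned up.
    }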
This is a regression from bug 1311212. Bug 1331647 suggests this can result in leaks for users.
Keywords: mlk, regression
Attachment #8826395 - Attachment is obsolete: true
I'm asking for review from Bobby as XPConnect module owner to approve the additional use of the junk scope. The basic idea here is that we need to go through the steps of creating this JS object and destroying it to avoid a leak, but the object is then immediately garbage, and we may not have a handy global sitting around to allocate it in.
Blocks: 1331743
See Also: → 1331748
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

https://reviewboard.mozilla.org/r/104334/#review106088

::: js/ipc/JavaScriptShared.cpp:674
(Diff revision 2)
>          js_ = managerGetter->GetCPOWManager();
>  }
>  
> +CrossProcessCpowHolder::~CrossProcessCpowHolder()
> +{
> +    if (cpows_.Length() && !unwrapped_) {

Please comment that this should only happen if a message manager message gets ignored somehow. We just need to create the CPOW on our side so that it can be GCed on the other side.

::: js/ipc/JavaScriptShared.cpp:676
(Diff revision 2)
>  
> +CrossProcessCpowHolder::~CrossProcessCpowHolder()
> +{
> +    if (cpows_.Length() && !unwrapped_) {
> +        AutoJSAPI jsapi;
> +        if (!jsapi.Init(xpc::PrivilegedJunkScope()))

Perhaps comment that we're creating an object that will immediately become garbage, so the scope doesn't matter too much.
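Putting both review comments together, the destructor would read roughly like this. This is a sketch assembled from the diff fragments above; the final unwrap call is paraphrased from the holder's existing unwrap path and may not match the landed code exactly:

    CrossProcessCpowHolder::~CrossProcessCpowHolder()
    {
        if (cpows_.Length() && !unwrapped_) {
            // This should only happen if a message manager message gets
            // ignored somehow. We just need to create the CPOW on our side
            // so that it can be GCed on the other side.
            //
            // The object created here immediately becomes garbage, so the
            // scope it is created in doesn't matter too much.
            AutoJSAPI jsapi;
            if (!jsapi.Init(xpc::PrivilegedJunkScope()))
                return;
            JS::Rooted<JSObject*> cpowObj(jsapi.cx());
            js_->Unwrap(jsapi.cx(), cpows_, &cpowObj);
        }
    }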
Attachment #8826394 - Flags: review?(wmccloskey) → review+
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

https://reviewboard.mozilla.org/r/104334/#review106094

Fine by me. Thanks for flagging, though if there's anyone I trust to use the junk scopes responsibly, it's you and bill. :-)
Attachment #8826394 - Flags: review?(bobbyholley) → review+
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ca73fb60c36
Ensure we always unwrap CpowEntries. r=bholley,billm
https://hg.mozilla.org/mozilla-central/rev/9ca73fb60c36
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Blocks: 1320395
Blocks: 1311212
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1311212
[User impact if declined]: bad leaks, though maybe only rarely (bug 1331647 is an example of a user leak that looks like this)
[Is this code covered by automated tests?]: yes, though without e10s multi enabled I'm not sure how much it has been tested.
[Has the fix been verified in Nightly?]: it only landed on m-c earlier today
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not really
[Why is the change risky/not risky?]: it just creates and destroys some objects in rare cases, rather than leaking them, so the code that runs shouldn't be too different.
[String changes made/needed]: none
Attachment #8826394 - Flags: approval-mozilla-aurora?
I did a bunch of retriggers for the e10s-multi try run in comment 30 without any leaks, so I'm going to optimistically assume this fixes them all.
No longer blocks: 1331647
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

fix leaks with e10s-multi, aurora52+
Attachment #8826394 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+