Closed
Bug 1330018
Opened 7 years ago
Closed 7 years ago
windows intermittently leaked until shutdown via Cpows with e10s-multi
Categories
(Core :: JavaScript Engine, defect)
Tracking
RESOLVED
FIXED
mozilla53
Tracking | Status
---|---
firefox-esr45 | unaffected
firefox51 | unaffected
firefox52 | fixed
firefox53 | fixed
People
(Reporter: gkrizsanits, Assigned: mccr8)
References
Details
(Keywords: memory-leak, regression, Whiteboard: [e10s-multi:M1][MemShrink:P1])
Attachments
(4 files, 2 obsolete files)
For some reason, with multiple content processes enabled, I saw some tests leak. After changing one of the leaking tests a bit, it seems the problem is with the leak detector. The modified test creates a bunch of windows and closes them. I use BrowserTestUtils.closeWindow, which waits for the final session storage message from the children (in the case of this test, one per window) plus the "domwindowclosed" event on the parent side. After this we start the GC on the parent side and send messages to the child processes to do the same.

The problem is that at this point there is no guarantee that the GC/CC will be able to collect the windows in the child process: we're in the middle of destroying the frame loader, and depending on how busy the CPU is, in debug mode it can take a while until we get to the point where it makes sense to run GC/CC. Multiple content processes make things worse here, but the modified test in debug mode leaks even with a single content process (executed with --repeat 5).

Side notes:
- On the parent side we are quite aggressive with GC (http://searchfox.org/mozilla-central/source/testing/mochitest/browser-test.js#644) compared to the child side... should we do more on the child side too?
- It seems we don't wait for the child processes to finish their GC before launching the next test (http://searchfox.org/mozilla-central/source/testing/mochitest/browser-test.js#647). Can that cause a problem?
Reporter
Comment 1•7 years ago

Reporter
Comment 2•7 years ago
I'm blocked by this issue from landing multiple content processes on nightly. Could you please help me figure out what to do about it?
Flags: needinfo?(continuation)
Assignee
Comment 3•7 years ago
Sure, I can take a look. The basic way this works is that the parent process sends the async browser-test:collect-request message, which the child gets. Then the child runs some GCs synchronously and prints out a message with its pid. Then the leak detector (which is a Python script that runs on the log) looks for that message, checking that windows created during the test in that process were destroyed before the message. Here are the various pieces of that: http://searchfox.org/mozilla-central/search?q=Completed%20ShutdownLeaks%20collections%20in%20process&path=

I wasn't thinking at all about e10s-multi when I was converting this check to work with e10s, so it is possible something doesn't work. It is also possible that running the same test repeatedly confuses the leak detector, which could explain why you'd see issues with --repeat.
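The log-scanning step can be sketched roughly like this. This is a simplified stand-in for the real leaks.py logic, not a copy of it; the function name and the exact regexes are illustrative:

```python
import re

# Log-line patterns based on the ++DOMWINDOW / --DOMWINDOW lines that appear
# in mochitest logs, plus the per-process ShutdownLeaks marker.
CREATED = re.compile(r"\+\+DOMWINDOW == \d+ \(0x[0-9a-f]+\) \[pid = (\d+)\] \[serial = (\d+)\]")
DESTROYED = re.compile(r"--DOMWINDOW == \d+ \(0x[0-9a-f]+\) \[pid = (\d+)\] \[serial = (\d+)\]")
COLLECTED = re.compile(r"Completed ShutdownLeaks collections in process (\d+)")

def find_leaked_windows(log_lines):
    """Return (pid, serial) pairs for windows created in a process but not
    destroyed before that process printed its shutdown-collection marker."""
    live = {}       # (pid, serial) -> alive
    leaked = set()
    for line in log_lines:
        m = CREATED.search(line)
        if m:
            live[(m.group(1), m.group(2))] = True
            continue
        m = DESTROYED.search(line)
        if m:
            live.pop((m.group(1), m.group(2)), None)
            continue
        m = COLLECTED.search(line)
        if m:
            pid = m.group(1)
            # Anything still alive in this process at its marker is a leak.
            leaked |= {key for key in live if key[0] == pid}
    return leaked
```

A window destroyed only after the marker (as in the logs later in this bug) is still reported, which matches the harness behavior described above.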
Assignee
Comment 4•7 years ago
How do I run mochitests locally with e10s multi enabled?
Assignee: nobody → continuation
Flags: needinfo?(continuation) → needinfo?(gkrizsanits)
Comment 5•7 years ago
I don't know the pref offhand, but you can pass --setpref <pref>=<value> to the mach command. If that doesn't work for some reason, let me know, and just add it to testing/profiles/prefs_general.js in the meantime.
Reporter
Comment 6•7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #3)
> Sure, I can take a look.

Thanks!

> The basic way this works is that the parent process sends the async
> browser-test:collect-request message, which the child gets. Then the child
> runs some GCs synchronously, and prints out a message with its pid. Then the
> leak detector (which is a Python script that runs on the log) looks for that
> message, checking that windows created during the test in that process were
> destroyed before the message.

Yeah, I think I've understood that part. I'm just not sure when the right time is to do a GC in the child... if we do it too early, something might still hold a reference to the window and we fail to collect, no? I fail to see why the BrowserTestUtils way of waiting for those messages/events is a sure way to know that we can finish the test and start the sync GC work. And I think in some cases it is too early.

> I wasn't thinking at all about e10s multi when I was converting this test to
> work with e10s, so it is possible something doesn't work.
>
> It is possible that running the same test repeatedly confuses the leak
> detector, which could explain why you'd see issues with --repeat.

Oh, I thought that since we have the window IDs / raw pointers, that should not be a problem, but I have not looked into it. The original test fails intermittently if you set the process count to 2 in all.js. Some more leaky tests are under bug 1303113.

(In reply to Andrew McCreight [:mccr8] from comment #4)
> How do I run mochitests locally with e10s multi enabled?

dom.ipc.processCount in all.js
Flags: needinfo?(gkrizsanits)
Assignee
Comment 7•7 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #6)
> Yeah, I think I've understood that part. I'm just not sure when is the right
> time to do a GC in the child... if we do it too early something might still hold
> a reference to the window and we fail to collect, no?

I'm not too clear on how we decide to do this, but the current approach has worked for a few years, so I don't think it can be too terrible. The entire point of this check is that if you close all the tabs a test opens, and GC a bunch, the windows the test creates should go away. I think there's even some kind of observer service notification code can listen for to drop stuff if it really needs to.

> Oh, I thought since we have the window IDs / raw pointers that should not
> be a problem but I have not looked into that.

I can reproduce the issue with --repeat. For at least one window, I see the "Completed ShutdownLeaks collections in process" message for the process before the window is even created. Then it never happens again. So we may not be doing the full cleanup procedure in the --repeat case. I looked over the code in leaks.py, and it does look like it should be able to handle repeated testing.

> The original test fails intermittently if you set the process count to 2 in all.js

I'm not able to reproduce a leak with the patch you attached here, running it alone using:

./mach mochitest dom/tests/browser/browser_test_toolbars_visibility.js

Does that consistently leak for you, or only intermittently? Also, if you have a log of a full run that fails, I can look at it and try to analyze whether something is going wrong at the test harness level.
Assignee
Comment 8•7 years ago
I'll look around in the bugs blocking bug 1303113 and see if I can figure anything out.
Assignee
Comment 9•7 years ago
Gabor, do you have any steps to reproduce this locally, without --repeat?
Flags: needinfo?(gkrizsanits)
Assignee
Comment 10•7 years ago
Here's what a log from bug 1328396 looks like (with many intermediate lines removed):

11:58:39 INFO - TEST-START | toolkit/mozapps/extensions/test/browser/browser_updateid.js
11:58:39 INFO - ++DOMWINDOW == 3 (0x111d64800) [pid = 2536] [serial = 249] [outer = 0x0]
11:58:41 INFO - TEST-OK | toolkit/mozapps/extensions/test/browser/browser_updateid.js | took 2071ms
11:58:46 INFO - Completed ShutdownLeaks collections in process 2536
11:58:53 (some warnings from nsThread::Shutdown() in process 2536, indicating we're probably pretty late in shutdown)
11:58:53 INFO - --DOMWINDOW == 0 (0x111d64800) [pid = 2536] [serial = 249] [outer = 0x0] [url = about:blank]

So it looks like we are properly running the ShutdownLeaks collection for this process, and I only see it for this process once, so this isn't leaks.py getting confused. The test is in fact just keeping an inner/outer window pair alive until very late in shutdown. Note that the window isn't destroyed until 7 seconds after the shutdown collection, which seems like a long time. Though maybe it actually became garbage quickly, and that's just the next time we ran the GC.

I'm not sure how easy it will be to figure out without a way to reproduce it. Maybe rr is the best hope here.
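The 7-second gap noted above can be read straight off the leading HH:MM:SS timestamps in the log. A quick sketch, assuming both lines fall on the same day (the function name is illustrative):

```python
import re
from datetime import datetime

TS = re.compile(r"^(\d\d:\d\d:\d\d)")

def log_gap_seconds(line_a, line_b):
    """Seconds elapsed between two mochitest log lines, using the leading
    HH:MM:SS timestamps. Assumes line_b is later on the same day."""
    a = datetime.strptime(TS.match(line_a).group(1), "%H:%M:%S")
    b = datetime.strptime(TS.match(line_b).group(1), "%H:%M:%S")
    return (b - a).total_seconds()
```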
Reporter
Comment 11•7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #7)
> I'm not too clear on how we decide to do this, but the current approach has
> worked for a few years, so I don't think it can be too terrible. [...] I think
> there's even some kind of observer service thing code can listen for to drop
> stuff if it really needs to.

Yes, the "memory-pressure" thing. Speaking of which, I did some logging for another test (browser_captivePortal_certErrorUI.js), and I think that in the non-leaking case it's not the GC initiated from the ShutdownLeakCollector that picks up the window, but the one requested from the parent side here: http://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2360 which is then handled on the child side here: http://searchfox.org/mozilla-central/source/dom/ipc/ContentChild.cpp#2355 and then starts an async CC somewhere...

> I can reproduce the issue with --repeat. [...] I looked over
> the code in leaks.py and it does look like it should be able to handle
> repeated testing.

Let's forget about the --repeat case then. I created 5 different copies of the test above and ran them in a suite (instead of using --repeat on the same test file), and that did not report any leaks... So you're probably right about --repeat.

> Does that consistently leak for you, or only intermittently?

Intermittently, in the most annoying way. Sometimes I can reproduce it several times in a row; sometimes I cannot reproduce it at all no matter how much I try.

You could try browser_captivePortal_certErrorUI.js; it is pretty frequent with two content processes, especially, for some reason, right after a fresh build (./mach build && ./mach mochitest browser/base/content/test/captivePortal/browser_captivePortal_certErrorUI.js). I know it does not make much sense, but the annoying part is that if I can reproduce it then I can usually reproduce it consistently (and the other way around), so it is worth trying a couple of builds. You could also give some extra stress to the CPU while running, maybe, or use a VM with a single core.

> have a log of a full run that fails I can look at that and try to analyze if
> something is going wrong at the test harness level.

Yeah, I'll attach one. It probably does not help much, though, since if I run a single test, the window gets cleaned up only during process shutdown. The output seems alright; just something seems to hold a reference to the window _sometimes_. By the way, I tried using setTimeout before the GC (to see if it's just some event), and it helped locally (I wouldn't trust it because of the intermittent nature) but did not solve the problem on try.

(In reply to Andrew McCreight [:mccr8] from comment #10)
> So, it looks like we are properly running the ShutdownLeaks collection for
> this process [...] Though maybe it actually became garbage quickly, and
> that's just the next time we ran the GC.

This means that we are able to GC the window before the final cleanup during process shutdown, right?

> I'm not sure how easy it will be to figure out without a way to reproduce
> it. Maybe rr is the best hope here.

On the try server it might be easier to reproduce... My problem is that even if I attached a debugger or rr, I don't think I would know what to do or what to look for. I mean, I don't see what guarantees that after window.close() the destruction of the DOM tree / JS scopes / docshell tree gets to the point where a GC/CC can free the window, other than that it has worked so far with a single content process. If you have some hints for me on where to look, I can give it a try.
Flags: needinfo?(gkrizsanits) → needinfo?(continuation)
Reporter
Comment 12•7 years ago
Assignee
Comment 13•7 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> Yes. The "memory-pressure" thing.

I was thinking of the "shutdown-leaks-before-check" barrier thing in browser-test.js.

> ... if I can reproduce it then I can usually reproduce it consistently (and the other
> way around), so it is worth trying a couple of builds.

That's quite odd!

> Yeah I'll attach one. [...] it helped locally (I wouldn't trust it
> because of the intermittent nature) but did not solve the problem on try.

Hmm, strange.

> This means that we are able to GC the window before the final cleanup during
> the process shutdown right?

Yup.

> On the try server it might be easier to reproduce it... My problem is that
> even if I attached a debugger or rr, I don't think I would know what to do / look
> for.

The basic idea is to get a CC log at the point where the window is alive but you wish it were cleaned up. This will show you that there's some missing reference to an object. Then you look at the releases of that object after that point, and compare them to the known references, to figure out what the missing reference was. Or I have another heap-scanning-based approach that does something similar. It is not very easy to do in a case like this where even reproducing the problem is hard.

> I mean, I don't see what guarantees that after the window.close() the destruction of
> the DOM tree / JS scopes / docshell tree got to the point where a GC/CC can free the
> window, other than that it's worked so far with a single content process.

I'm not really sure what the guarantees are.

One question is: why is e10s-multi different? One possibility is that browser-test.js sets dom.ipc.keepProcessesAlive. It might be worth seeing if setting that causes these leaks even in the regular e10s case, and if not setting it results in leaks in the e10s-multi case (assuming the timeouts are not too much of an issue). Maybe we discard things more aggressively if we're not keeping the process alive?

It might be interesting to try running with MOZ_IPC_MESSAGE_LOG=1 and see if there's any obvious difference in the messages sent during shutdown. Maybe it is too noisy.

I've been looking over the various bugs blocking bug 1303113 to see if I can figure out any kind of pattern. One interesting log I found was in bug 1328374 comment 0. There, the test that creates the leaked window is actually the second-to-last one. The next test does not reuse the same process, so you'd think the window would just be sitting around for a while, making weird races less of an issue.
Flags: needinfo?(continuation)
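One way to tame the noise of a MOZ_IPC_MESSAGE_LOG comparison is to diff per-message-name counts between a leaking run and a non-leaking run. A rough sketch; the log-line format assumed by the regex is illustrative and would need adjusting to the real output:

```python
import re
from collections import Counter

# Assumed shape of an IPC log line, e.g.:
#   "[1234->5678] Sending Msg_AsyncMessage"
# The real MOZ_IPC_MESSAGE_LOG format may differ.
MSG = re.compile(r"(Sending|Received)\s+(Msg_\w+)")

def message_counts(log_lines):
    """Count occurrences of each IPC message name in a log."""
    counts = Counter()
    for line in log_lines:
        m = MSG.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts

def diff_counts(a, b):
    """Message names whose counts differ between two runs, with both counts."""
    return {name: (a.get(name, 0), b.get(name, 0))
            for name in set(a) | set(b) if a.get(name, 0) != b.get(name, 0)}
```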
Assignee
Updated•7 years ago
Whiteboard: [e10s-multi:M1] → [e10s-multi:M1][MemShrink]
Updated•7 years ago
Whiteboard: [e10s-multi:M1][MemShrink] → [e10s-multi:M1][MemShrink:P1]
Reporter
Comment 14•7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #13)
> The basic idea is to get a CC log at the point where the window is alive,
> but you wish it was cleaned up. [...] It is not very easy to do, in a case like this where
> even reproducing it is hard.

Thanks, I'll try this approach.

> One question is, why is e10s multi different? One possibility is that
> browser-test.js sets dom.ipc.keepProcessesAlive. [...] Maybe we discard things more aggressively if
> we're not keeping the process alive?

I suspected that as well, but I could not detect any difference whether it's turned on or off.

> I've been looking over the various bugs blocking bug 1303113 to see if I can
> figure out any kind of pattern. [...] The next test does not reuse the same process, so
> you'd think it would just be sitting around for a while, so weird races
> would be less of an issue.

Interesting. On a side note, I wonder if we should just nuke all the windows when a content process is kept alive but the last tab/window it has been hosting is closed.
Reporter
Comment 15•7 years ago
I'm trying to get the CC logger to work, but it's crashing on me reliably after this scary assertion... Is there something I'm doing wrong? (More errors in the full output.)

command:
MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_PROCESS="all" MOZ_CC_LOG_DIRECTORY="~/Documents/log" MOZ_DISABLE_CONTENT_SANDBOX=1 ./mach mochitest browser/base/content/test/captivePortal/browser_captivePortal_certErrorUI.js

stack:
Assertion failure: !DebugFileIDs.Contains(aHandle), at /Users/gkrizsanits/Documents/development/mozilla-central/xpcom/build/PoisonIOInterposerBase.cpp:231
#01: MozillaRegisterDebugHandle (PoisonIOInterposerBase.cpp:231, in XUL)
#02: MozillaRegisterDebugFD (PoisonIOInterposerBase.cpp:239, in XUL)
#03: MozillaRegisterDebugFILE (PoisonIOInterposerBase.cpp:249, in XUL)
#04: nsCycleCollectorLogSinkToFile::OpenLog(nsCycleCollectorLogSinkToFile::FileInfo*) (nsCycleCollector.cpp:1687, in XUL)
#05: nsCycleCollectorLogSinkToFile::Open(__sFILE**, __sFILE**) (nsCycleCollector.cpp:1579, in XUL)
#06: nsCycleCollectorLogger::Begin() (nsCycleCollector.cpp:1832, in XUL)
#07: nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (nsCycleCollector.cpp:3828, in XUL)
#08: nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (nsCycleCollector.cpp:3651, in XUL)
#09: nsCycleCollector_collect(nsICycleCollectorListener*) (nsCycleCollector.cpp:4143, in XUL)
#10: nsJSContext::CycleCollectNow(nsICycleCollectorListener*, int) (nsJSEnvironment.cpp:1439, in XUL)
#11: nsXPCComponents_Utils::ForceCC(nsICycleCollectorListener*) (XPCComponents.cpp:2582, in XUL)
Assignee
Comment 16•7 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #15)
> I'm trying to get the CC logger to work but it's crashing on me reliably
> after this scary assertion... Is there something I'm doing wrong?

What OS are you on? We have code that checks if you do IO late in shutdown. The fds used for CC/GC logging are supposed to be whitelisted, but I think bkelly or somebody hit the same issue as you on Windows. Maybe it is broken there?

I managed to reproduce the issue with a variation of what you posted, so thanks for putting that up:

MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_CC_LOG_THREAD=main MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_PROCESS=content MOZ_CC_LOG_DIRECTORY="~/logs/multileaks/1" ./mach mochitest -f browser browser/base/content/test/captivePortal/
Assignee
Comment 17•7 years ago
Ok, so that wasn't useful, but by adding all traces I was able to come up with something more useful:

MOZ_CC_ALL_TRACES=all MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_CC_LOG_THREAD=main MOZ_CC_LOG_ALL=1 MOZ_CC_LOG_PROCESS=content MOZ_CC_LOG_DIRECTORY="~/logs/multileaks/2" ./mach mochitest -f browser browser/base/content/test/captivePortal/

The INFO line after the UNEXPECTED-FAIL gives you the serial numbers of the leaking windows:

windows(s) leaked: [pid = 13260] [serial = 6], [pid = 13260] [serial = 3]

You can use that to find the pointer value of the window by looking earlier in the log:

--DOMWINDOW == 1 (0x7f777b714000) [pid = 13260] [serial = 6] ...

From this, you can use find_roots.py to see what is holding the window alive:

python ~/tools/heapgraph/find_roots.py cc-edges.13260-3.log 0x7f777b714000

This just said that the window reflector for the window (0x7f777c861240) was holding it alive, because it was marked black. You then find out why that is alive in the GC graph (with the -bro option to ignore gray roots that don't result in black marking):

python ~/tools/heapgraph/find_roots.py gc-edges.13260-3.log -bro 0x7f777c861240

The output from that is this:

via ipc-object : 0x7f777c872240 [Proxy <no private>] --[private]--> 0x7f777c861240 [Window <no private>]

This means that the window reflector is held alive via some proxy, and the proxy is held alive by a root named "ipc-object". If you look for that in Searchfox you get the method IdToObjectMap::trace(). I think this means that a CPOW is holding the window alive. I don't know about the lifetime of CPOWs, unfortunately.
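The core of what a tool like find_roots.py does can be sketched as a breadth-first search from roots to the target address. This is a toy version under the assumption that the heap-graph edges have already been parsed; the real tool also handles the CC/GC log file format:

```python
from collections import deque

def find_retaining_path(edges, roots, target):
    """edges: {addr: [(edge_name, child_addr), ...]}
    roots: {addr: root_name}
    Returns a path [(root_name, addr), (edge_name, addr), ...] from some root
    to target, or None if target is not reachable (i.e. it is garbage)."""
    for root, root_name in roots.items():
        parent = {root: None}           # BFS tree: child -> parent
        via = {root: root_name}         # label on the edge into each node
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if node == target:
                # Walk back up the BFS tree to reconstruct the path.
                path = []
                while node is not None:
                    path.append((via[node], node))
                    node = parent[node]
                return list(reversed(path))
            for name, child in edges.get(node, []):
                if child not in parent:
                    parent[child] = node
                    via[child] = name
                    queue.append(child)
    return None
```

On the graph from the log above, this reproduces the "via ipc-object ... --[private]--> Window" chain.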
Assignee
Updated•7 years ago
Summary: Leak detector gives false positives → windows intermittently leaked until shutdown via Cpows with e10s-multi
Assignee
Comment 18•7 years ago
Blake, maybe you have some idea of how cleanup of cpows is supposed to work here? I'll try reading the source code in the interim.
Flags: needinfo?(mrbkap)
Assignee
Comment 19•7 years ago
Also, the GC log analysis tool is slightly flaky when it comes to deciding what a root is, so it is possible that this is a weak root and the tool just can't tell. If that's the case, we should be able to poke around in the logs for the real root.
Assignee
Comment 20•7 years ago

Assignee
Comment 21•7 years ago
Bug 1311212 made these references strong "until the next turn of the event loop". Maybe that's related?
Assignee
Comment 22•7 years ago
If I'm reading the comments correctly, the strong references are only really needed when MOZ_DEBUG_DEAD_CPOWS is set, so maybe I can avoid the strong rooting unless that is set.
Assignee
Comment 23•7 years ago
Yeah, that seems to work. With my patch, I was unable to reproduce the leak in browser/base/content/test/captivePortal/ with around 10 repetitions, whereas it was reproducing around 1/3 of the time before.
Assignee
Updated•7 years ago
Flags: needinfo?(mrbkap)
Assignee
Updated•7 years ago

Attachment #8825442 - Attachment is obsolete: true
Assignee
Comment 24•7 years ago
try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=99a52be61191eb7cf57d910548d8f62869aed512
try run with e10s multi enabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=33c909d1348ae666e05d6fdb1d2098d3dbe212a4

I'll put my patches up for review even though the try runs have not finished.
Assignee
Updated•7 years ago
Component: Mochitest → JavaScript Engine
Product: Testing → Core
Version: Version 3 → Trunk
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Reporter
Comment 27•7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #16)
> What OS are you on? We have code that checks if you do IO late in shutdown.
> The fds used for CC/GC logging are supposed to be whitelisted, but I think
> bkelly or somebody hit the same issue as you on Windows. Maybe it is broken
> there?

This was on OSX.

By the way, I find the detailed steps you posted for the leak tracing super useful; next time I won't have to bother you if I have to track down a leak :) Is there any documentation for this? If not, we could ask Will to help us put something together... Anyway, thanks a lot for fixing this!
Assignee
Updated•7 years ago

Attachment #8826394 - Attachment is obsolete: true
Attachment #8826394 - Flags: review?(mrbkap)
Assignee
Updated•7 years ago

Attachment #8826395 - Attachment is obsolete: true
Attachment #8826395 - Flags: review?(mrbkap)
Assignee
Updated•7 years ago

Attachment #8826394 - Attachment is obsolete: false
Attachment #8826395 - Attachment is obsolete: false
Assignee
Comment 28•7 years ago
Ryan said that the original patch fixed a bunch of intermittent CPOW failures, so I guess I can't just revert it. I can probably add some CPOW kill switch that we fire off right before we do the ShutdownLeaks collection in the child.
Assignee
Comment 29•7 years ago
> Is there any documentation for this?
Not really. I've been slowly working on writing some up, but I should really finish that off.
Anyways, I spent some more time looking into this, and talking it over with Bill, and what seems to be happening here is that under some circumstances the message manager gets a message, but then doesn't unwrap the CPOWs (like if nobody is listening for messages or there's no frame loader, which I could imagine happening after all tabs are removed maybe).
Without the unwrap, the parent is never aware that the CPOW was sent to it, so nextCPOWNumber_ never gets set, so we end up keeping alive the CPOW until another one is created. Bill said that in some other circumstances (I think when a CPOW is sent in the other direction) the CPOW is strongly held for real, so you'd actually end up leaking something forever.
We want to make sure the CPOWs are always unwrapped, even if we're not going to use them (I had a hacky version that just did the nextCPOWNumber_ thing, but Bill said due to the other problem we should just always unwrap them fully.)
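The lifecycle described above can be illustrated with a toy model. All names here are hypothetical; the real bookkeeping is C++ in js/ipc/ (IdToObjectMap, nextCPOWNumber_). The point is only that the sender roots each CPOW in an id-to-object map, and it is the receiver's unwrap step that eventually lets those entries be released:

```python
# Toy model of the CPOW bookkeeping; names are illustrative, not the real API.
class CpowMap:
    def __init__(self):
        self.id_to_object = {}  # strongly roots objects (the "ipc-object" root)
        self.next_id = 0

    def send_cpow(self, obj):
        # Sender registers the object so the peer's proxy can reach it.
        self.next_id += 1
        self.id_to_object[self.next_id] = obj
        return self.next_id

    def on_peer_unwrap(self, up_to_id):
        # Only when the peer unwraps the message does the sender learn it can
        # drop entries up to that id. If the message is never unwrapped, the
        # entries stay rooted: that is the leak discussed in this bug.
        for cpow_id in [i for i in self.id_to_object if i <= up_to_id]:
            del self.id_to_object[cpow_id]
```

In the dropped-message scenario described above, on_peer_unwrap is simply never called, so the map (and every window a CPOW points at) stays alive.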
Assignee
Comment 30•7 years ago
I have two followups. First, in 6 places, such as ContentChild::RecvAsyncMessage, the CrossProcessCpowHolder is not created unless a message manager exists. Second, my patch only deals with the case where we send a CpowEntry, but there are many other ways we can send a RemoteObject, so I should audit those to see if a similar failure to unwrap can occur. try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82eaf33b73f19bd4e7575ad4aa8befdd034d0456
Assignee
Comment 31•7 years ago
I'm leaving those to followups because they don't seem to be needed to fix the intermittent leak issue with e10s-multi, which is my main priority. Moving the holders is simple, but I worry it may introduce weird null pointer crashes, so I want to separate it. Auditing will require a bit of time. I looked at a few cases and they seemed fine to me, so hopefully nothing will need to be done for the other cases.
Assignee
Comment 32•7 years ago
This is a regression from bug 1311212. Bug 1331647 suggests this can result in leaks for users.
status-firefox51: --- → unaffected
status-firefox52: --- → affected
status-firefox53: --- → affected
status-firefox-esr45: --- → unaffected
Assignee
Updated•7 years ago

Keywords: mlk, regression
Comment hidden (mozreview-request)
Assignee
Updated•7 years ago

Attachment #8826395 - Attachment is obsolete: true
Assignee
Comment 34•7 years ago
I'm asking for review from Bobby as XPConnect module owner to approve the additional use of the junk scope. The basic idea here is that we need to go through the steps of creating this JS object and destroying it to avoid a leak, but the object is then immediately garbage, and we may not have a handy global sitting around to allocate it in.
Comment 35•7 years ago
mozreview-review
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

https://reviewboard.mozilla.org/r/104334/#review106088

::: js/ipc/JavaScriptShared.cpp:674 (Diff revision 2)
> js_ = managerGetter->GetCPOWManager();
> }
>
> +CrossProcessCpowHolder::~CrossProcessCpowHolder()
> +{
> + if (cpows_.Length() && !unwrapped_) {

Please comment that this should only happen if a message manager message gets ignored somehow. We just need to create the CPOW on our side so that it can be GCed on the other side.

::: js/ipc/JavaScriptShared.cpp:676 (Diff revision 2)
>
> +CrossProcessCpowHolder::~CrossProcessCpowHolder()
> +{
> + if (cpows_.Length() && !unwrapped_) {
> + AutoJSAPI jsapi;
> + if (!jsapi.Init(xpc::PrivilegedJunkScope()))

Perhaps comment that we're creating an object that will immediately become garbage, so the scope doesn't matter too much.
Attachment #8826394 - Flags: review?(wmccloskey) → review+
Comment 36•7 years ago
mozreview-review
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

https://reviewboard.mozilla.org/r/104334/#review106094

Fine by me. Thanks for flagging, though if there's anyone I trust to use the junk scopes responsibly, it's you and bill. :-)
Attachment #8826394 - Flags: review?(bobbyholley) → review+
Comment hidden (mozreview-request)
Comment 38•7 years ago
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ca73fb60c36
Ensure we always unwrap CpowEntries. r=bholley,billm
Comment 39•7 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9ca73fb60c36
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Assignee
Comment 40•7 years ago
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1311212
[User impact if declined]: bad leaks, though maybe only rarely (bug 1331647 is an example of a user leak that looks like this)
[Is this code covered by automated tests?]: yes, though without e10s-multi enabled I'm not sure how much it has been tested.
[Has the fix been verified in Nightly?]: it only landed on m-c earlier today
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not really
[Why is the change risky/not risky?]: it just creates and destroys some objects in rare cases, rather than leaking them, so the code running shouldn't be too different.
[String changes made/needed]: none
Attachment #8826394 - Flags: approval-mozilla-aurora?
Assignee
Comment 41•7 years ago
I did a bunch of retriggers for the e10s-multi try run in comment 30 without any leaks, so I'm going to optimistically assume this fixes them all.
Comment 43•7 years ago
Comment on attachment 8826394 [details]
Bug 1330018 - Ensure we always unwrap CpowEntries.

fix leaks with e10s-multi, aurora52+
Attachment #8826394 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 44•7 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/5a6e8c89b306