Closed
Bug 1035454
Opened 10 years ago
Closed 10 years ago
Child DOMApplicationRegistry never sends "Webapps:UnregisterForMessages" to parent, causes parent leak and crashes debug content processes
Categories
(Core Graveyard :: DOM: Apps, defect)
Core Graveyard
DOM: Apps
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla36
People
(Reporter: bent.mozilla, Assigned: bent.mozilla)
References
Details
Attachments
(5 files)
880 bytes,
patch
|
fabrice
:
review+
|
Details | Diff | Splinter Review |
1.23 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
1.57 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
1.04 KB,
patch
|
jst
:
review+
|
Details | Diff | Splinter Review |
1.79 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
The DOMApplicationRegistry in AppsServiceChild.jsm tries to send "Webapps:UnregisterForMessages" in response to the "xpcom-shutdown" notification. This doesn't ever succeed for two reasons: 1. We never run XPCOM shutdown in non-DEBUG builds. See http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentChild.cpp#1367 2. In DEBUG builds the "xpcom-shutdown" notification happens after the connection to the parent process has already been severed. Any attempt to send a message to the parent will fail and trigger a child process crash. We need to make sure that any cleanup is listening to the proper notification in the parent process rather than relying on messages from the child. The child must be expected to die at any time.
Comment 1•10 years ago
|
||
Ben, we listen to child-process-shutdown that triggers a cleanup of all the listeners associated to this child from http://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.jsm#1140 So is that enough to just remove the useless code in AppsServiceChild.jsm ?
Assignee | ||
Comment 2•10 years ago
|
||
Probably!
Assignee | ||
Comment 3•10 years ago
|
||
Like this?
Assignee: fabrice → bent.mozilla
Status: NEW → ASSIGNED
Attachment #8468616 -
Flags: review?(fabrice)
Updated•10 years ago
|
Attachment #8468616 -
Flags: review?(fabrice) → review+
Assignee | ||
Comment 4•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/db69c702369d
Ben, I just happened to see this. I think the "leak" this got backed out for is pre-existing. Previously we always crashed before getting a chance to report leaks in the child. I think it probably makes sense to disable leak checking for child processes until we can get the leaks fixed.
Comment 6•10 years ago
|
||
Sorry, forgot to comment here about the backout yesterday. https://hg.mozilla.org/integration/mozilla-inbound/rev/e13073de5179 I was seeing a lot of B2G debug crashes as well, so please give this a |try: -b d -p all -u all -t none| run before re-landing.
Assignee | ||
Comment 7•10 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #5) > I think it probably makes sense to disable leak > checking for child processes until we can get the leaks fixed. Is there a bug/patch for this yet?
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #7) > (In reply to Bill McCloskey (:billm) from comment #5) > > I think it probably makes sense to disable leak > > checking for child processes until we can get the leaks fixed. > > Is there a bug/patch for this yet? Filed bug 1052224.
Comment 9•10 years ago
|
||
I have a patch to disable leak checking for child processes, but even with that, there is some kind of shutdown assertion failure in GMP in M3: https://tbpl.mozilla.org/?tree=Try&rev=0ff144b4e0d8
It looks like it's getting an XPCOM thread shutdown notification without getting a profile-change-teardown notification first. Maybe that doesn't happen in the child?
Comment 11•10 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #10) > It looks like it's getting an XPCOM thread shutdown notification without > getting a profile-change-teardown notification first. Maybe that doesn't > happen in the child? It looks like the profile-change-teardown notification does not go out, because mProfileNotified is false. That is set to "true" in nsXREDirProvider::DoStartup(), which seems to be only called once, maybe from XREMain::XRE_mainRun().
Comment 12•10 years ago
|
||
Should the GeckoMediaPluginService even be running in the child process?
(In reply to Andrew McCreight [:mccr8] from comment #12) > Should the GeckoMediaPluginService even be running in the child process? Ultimately we display the video in the child, so I would expect so. I wonder how this will interact with sandboxing of the child process though. The child isn't supposed to be able to create new processes--it will have to ask the parent to do that for it.
Randell, would you mind taking a look at this starting at comment 9 (or ni? the right person)?
Flags: needinfo?(rjesup)
Comment 15•10 years ago
|
||
Georg, Chris - this is about GMP shutdown in e10s (and also will be about the interaction with sandboxing, I presume). bsmedberg: should profile-change-teardown occur in e10s child processes? I presume so...
Flags: needinfo?(rjesup)
Flags: needinfo?(georg.fritzsche)
Flags: needinfo?(cpearce)
Flags: needinfo?(benjamin)
Comment 16•10 years ago
|
||
No, content processes don't "have" a profile and so they don't receive any profile-shutdown notifications. But content processes shouldn't be launching GMP children anyway: they should be children of the parent process and subsequently connected to whatever content process(es) need them.
Flags: needinfo?(benjamin)
Comment 17•10 years ago
|
||
OK, we'll need to route the GMPService usage in child processes through the master process before we can enable e10s. The GMPService needs shutdown notifications.
Flags: needinfo?(cpearce)
Comment 18•10 years ago
|
||
I filed bug 1057908 to track making the GeckoMediaPluginService work in e10s.
Updated•10 years ago
|
Flags: needinfo?(georg.fritzsche)
Comment 19•10 years ago
|
||
Green try run, with the assertion work around patch in bug 1058903: https://tbpl.mozilla.org/?tree=Try&rev=929b76bb71f5
No longer depends on: 1065139
Comment 20•10 years ago
|
||
Ben, are you okay with me landing this patch once I get enough stuff fixed up so that it will be green?
Flags: needinfo?(bent.mozilla)
Comment 23•10 years ago
|
||
Backed out for Android debug perma-fail. https://hg.mozilla.org/integration/mozilla-inbound/rev/37c6b40e9dcb https://tbpl.mozilla.org/php/getParsedLog.php?id=47846888&tree=Mozilla-Inbound
Comment 24•10 years ago
|
||
And it blew up B2G debug. https://tbpl.mozilla.org/php/getParsedLog.php?id=47851124&tree=Mozilla-Inbound
Comment 25•10 years ago
|
||
For the Android debug crash, the test it is reporting crashes ( dom/base/test/test_appname_override.html ) looks pretty innocuous and does not involve a child process as far as I can see. However, looking at /dev/log/main below, it seems that there is a child process being started up by dom/audiochannel/tests/test_telephonyPolicy.html, which sets dom.ipc.browser_frames.oop_by_default to true. I guess that's why it runs out of process. Then there's a crash in the crash reporter in CrashReporter::OnChildProcessDumpRequested, as it hits an assert where it looks like it is trying and failing to get a minidump for the child process. From asking in #developers, child content processes are not expected to work on Android, so I think the path forward is to disable this test on Android. I'll file a bug for that. I have no idea why this worked before. (Then there's a crash in ElfLoader, but I'll just assume for now that is due to the crash reporter crashing.)
Comment 26•10 years ago
|
||
minidump_stackwalk fails on the B2G emulator crashes, so I have no idea what is going on there. The next step there is probably to get a B2G emulator build going with some real stacks. Presumably something else is failing in a horrible way at shutdown. To get this working on desktop at least, we could add some kind of quick exit for B2G, though that would require figuring out precisely where to put it.
Comment 27•10 years ago
|
||
I have a patch to work around the mysterious B2G shutdown crash by calling exit(0) at the end of XPCOM shutdown on B2G. That works for B2G, but the patch seems to have a near permaorange on XP M3 debug: https://tbpl.mozilla.org/?tree=Try&rev=27546760d4a3 (The Android M5 orange is something that looks like a frequent problem on trunk, so hopefully the patch for that will fix this too.)
Comment 28•10 years ago
|
||
INFO - 291 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | Unexpected callback for 'INTERNAL_ERROR' with message = 'Cannot start media channels cause = OK' at ["PCW_setLocalDescription@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:1841:1","PCT_setLocalDescription@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:746:3","commandsPeerConnection<@http://mochi.test:8888/tests/dom/media/tests/mochitest/templates.js:305:1","_executeNext@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:103:9",""] - expected PASS Randell, do you know what this means?
Flags: needinfo?(rjesup)
Updated•10 years ago
|
Whiteboard: idempotent
Updated•10 years ago
|
Flags: needinfo?(rjesup)
Comment 30•10 years ago
|
||
The small amount we leak when a content process starts up a GMP process seems to be increasing a little bit, so we need the extra headroom. I mostly only care about much larger leaks (100kb+) that involve windows and documents so I think this is fine for right now.
Attachment #8502804 -
Flags: review?(khuey)
Attachment #8502804 -
Flags: review?(khuey) → review+
Comment 31•10 years ago
|
||
Comment 32•10 years ago
|
||
Comment on attachment 8502830 [details] [diff] [review] part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. I'm not sure what happened with this name. Anyways, this is what happens without this patch on B2G: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=78e727657e9b The context here is that right now we're not getting content process shutdown logs, because we crash in shutdown, so I'm trying to get things at least to the point where we shut down cleanly on Linux, where we currently have e10s tests enabled, so we can start detecting leak regressions. We've had one or two in the last few weeks.
Attachment #8502830 -
Attachment description: This patch adds an exit(0) in XPCOMShutdown() on B2G and Windows, in content processes. This is the latest point where we avoid the WebRTC intermittent failure. For B2G, it can be pushed all the way to the end of XPCOMShutdown(), but I combined the two fo → Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.
Attachment #8502830 -
Flags: review?(benjamin)
Comment 33•10 years ago
|
||
Comment on attachment 8502830 [details] [diff] [review] part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. How does this help? IIRC, if you exit(0) here you *still* won't get leak logs because the leak logs are produced from the final call to NS_LogTerm.
Flags: needinfo?(continuation)
Comment 34•10 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #33) > How does this help? IIRC, if you exit(0) here you *still* won't get leak > logs because the leak logs are produced from the final call to NS_LogTerm. Yeah, this is pretty weird. Maybe I should break it down a bit. Currently on trunk, in tab processes, we exit(0) during the xpcom-shutdown notification. bent's patch in this bug fixes that, which is enough for us to (sometimes) cleanly shut down child processes on Linux and Linux64, giving us leak logs. However, bent's patch can't land by itself, because with it we hit different failures on B2G and Windows from things that happen later in shutdown. This patch hacks around that by adding an exit(0) later than we currently run it, but before we hit whatever shutdown badness. This is gross, but in combination with bent's patch we'll be running strictly more shutdown code. So, with the patches here, we still will not get leak logs on Windows or B2G, but we will get them (sometimes) on Linux and Linux64. (Also, I meant to mention that the exit(0) for B2G can actually go at the very end of ShutdownXPCOM, as the failure we are hitting there seems to be later, when we're shutting down the Chromium IPC stuff. I just put it in at the place Windows needs for simplicity. But I can split it out if you think that is better.)
Flags: needinfo?(continuation)
Comment 35•10 years ago
|
||
Comment on attachment 8502830 [details] [diff] [review] part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. In the comment here, please indicate that this exit is intended to be temporary. That is the thing I think was missing when I read it the first time.
Attachment #8502830 -
Flags: review?(benjamin) → review+
Comment 36•10 years ago
|
||
For now, I'm just going to disable this test on e10s. I filed bug 1035454 for investigating it.
Attachment #8504341 -
Flags: review?(jst)
Updated•10 years ago
|
Attachment #8504341 -
Flags: review?(jst) → review+
Comment 37•10 years ago
|
||
The patch in bug 1081415 fixes a bunch of e10s M1 shutdown leaks.
Depends on: 1081415
Comment 38•10 years ago
|
||
Originally when I was testing, I had the exit(0) on B2G at the end of ShutdownXPCOM, but I decided to just move it up to where it needed to be for Windows for simplicity. However, for some reason this leads to test failures on some B2G tests: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9a84615c976d Specifically, we seem to hit an assert after or during the call to exit(0): 18:07:09 INFO - 10-14 00:57:12.125 I/Gecko ( 779): [Child 779] WARNING: Exiting child process early!: file ../../../gecko/xpcom/build/XPCOMInit.cpp, line 961 18:07:09 INFO - 10-14 00:57:12.135 F/MOZ_Assert( 779): Assertion failure: isEmpty(), at ../../dist/include/mozilla/LinkedList.h:305 There's no stack for that, so I don't know what is happening, but moving the exit(0) back to the very end of ShutdownXPCOM seems to fix it, so that's what this patch does: https://tbpl.mozilla.org/?tree=Try&rev=2af1506dd53e I'll land this folded into the other ShutdownXPCOM patch.
Attachment #8505511 -
Flags: review?(benjamin)
Updated•10 years ago
|
Attachment #8505511 -
Flags: review?(benjamin) → review+
Comment 39•10 years ago
|
||
all platform all tests try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9a84615c976d M-e10s failures are leaks fixed by bug 1081415: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1d69baf3ddd3 B2G emulator opt M3 failures are due to a frequent existing intermittent orange. B2G emulator debug failures are fixed by part 2b: https://tbpl.mozilla.org/?tree=Try&rev=2af1506dd53e I think that should be everything. I'm planning to land this when the tree reopens.
Updated•10 years ago
|
Attachment #8468616 -
Attachment description: Fix, v1 → part 4 - Don't send 'Webapps:UnregisterForMessages' at xpcom-shutdown
Updated•10 years ago
|
Attachment #8502804 -
Attachment description: Increase the leak threshold for tab processes. → part 1 - Increase the leak threshold for tab processes.
Updated•10 years ago
|
Attachment #8502830 -
Attachment description: Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. → part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.
Updated•10 years ago
|
Attachment #8504341 -
Attachment description: Disable test_bug345339.html on e10s for leaking. → part 3 - Disable test_bug345339.html on e10s for leaking.
Updated•10 years ago
|
Summary: Child DOMApplicationRegistry never sends "Webapps:UnregisterForMessages" to parent, causes parent leak → Child DOMApplicationRegistry never sends "Webapps:UnregisterForMessages" to parent, causes parent leak and crashes debug content processes
Comment 40•10 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/a65f0462a0f2 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/9134c098f0ee remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/5577b0525651 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/4431e4327a4f
Comment 41•10 years ago
|
||
If this has to get backed out for weird intermittent failures, probably just backing out out the last part should be enough.
Comment 42•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a65f0462a0f2 https://hg.mozilla.org/mozilla-central/rev/9134c098f0ee https://hg.mozilla.org/mozilla-central/rev/5577b0525651 https://hg.mozilla.org/mozilla-central/rev/4431e4327a4f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Comment 43•10 years ago
|
||
I'm pretty sure bug 1083664 was caused by this, which is a debug OSX 10.6-only intermittent assertion failure at shutdown. In the short term, I can probably work around this by adding another exit(0) somewhere, for OSX...
Comment 44•10 years ago
|
||
A few pushes after I landed this, leak checking in M3 stopped working intermittently. Some number of pushes after that, something started leaking. So, for now, I'm bumping up the leak threshold to 2MB so we can open the tree again... https://hg.mozilla.org/integration/mozilla-inbound/rev/a0c0f98c8482
Comment 45•10 years ago
|
||
Merge of the re-land: https://hg.mozilla.org/mozilla-central/rev/a0c0f98c8482
Updated•7 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•