Closed Bug 1035454 Opened 10 years ago Closed 10 years ago

Child DOMApplicationRegistry never sends "Webapps:UnregisterForMessages" to parent, causes parent leak and crashes debug content processes

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

mozilla36

People

(Reporter: bent.mozilla, Assigned: bent.mozilla)

References

Details

Attachments

(5 files)

part 4 - Don't send 'Webapps:UnregisterForMessages' at xpcom-shutdown 10 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 880 bytes, patch	fabrice : review+	Details \| Diff \| Splinter Review
part 1 - Increase the leak threshold for tab processes. 10 years ago Andrew McCreight [:mccr8] 1.23 KB, patch	khuey : review+	Details \| Diff \| Splinter Review
part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. 10 years ago Andrew McCreight [:mccr8] 1.57 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review
part 3 - Disable test_bug345339.html on e10s for leaking. 10 years ago Andrew McCreight [:mccr8] 1.04 KB, patch	jst : review+	Details \| Diff \| Splinter Review
part 2b - Call exit(0) at the end of XPCOM shut down instead of in the middle. 10 years ago Andrew McCreight [:mccr8] 1.79 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Description

•

10 years ago

The DOMApplicationRegistry in AppsServiceChild.jsm tries to send "Webapps:UnregisterForMessages" in response to the "xpcom-shutdown" notification. This doesn't ever succeed for two reasons:

1. We never run XPCOM shutdown in non-DEBUG builds. See http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentChild.cpp#1367

2. In DEBUG builds the "xpcom-shutdown" notification happens after the connection to the parent process has already been severed. Any attempt to send a message to the parent will fail and trigger a child process crash.

We need to make sure that any cleanup is listening to the proper notification in the parent process rather than relying on messages from the child. The child must be expected to die at any time.

[:fabrice] Fabrice Desré

Comment 1

•

10 years ago

Ben, we listen to child-process-shutdown that triggers a cleanup of all the listeners associated to this child from http://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.jsm#1140

So is that enough to just remove the useless code in AppsServiceChild.jsm ?

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 2

•

10 years ago

Probably!

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 3

•

10 years ago

Attached patch part 4 - Don't send 'Webapps:UnregisterForMessages' at xpcom-shutdown — Details — Splinter Review

Like this?

Assignee: fabrice → bent.mozilla

Status: NEW → ASSIGNED

Attachment #8468616 - Flags: review?(fabrice)

[:fabrice] Fabrice Desré

Updated

•

10 years ago

Attachment #8468616 - Flags: review?(fabrice) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 4

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/db69c702369d

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 5

•

10 years ago

Ben, I just happened to see this. I think the "leak" this got backed out for is pre-existing. Previously we always crashed before getting a chance to report leaks in the child. I think it probably makes sense to disable leak checking for child processes until we can get the leaks fixed.

Ryan VanderMeulen [:RyanVM]

Comment 6

•

10 years ago

Sorry, forgot to comment here about the backout yesterday.
https://hg.mozilla.org/integration/mozilla-inbound/rev/e13073de5179

I was seeing a lot of B2G debug crashes as well, so please give this a |try: -b d -p all -u all -t none| run before re-landing.

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 7

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #5)
> I think it probably makes sense to disable leak
> checking for child processes until we can get the leaks fixed.

Is there a bug/patch for this yet?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 8

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #7)
> (In reply to Bill McCloskey (:billm) from comment #5)
> > I think it probably makes sense to disable leak
> > checking for child processes until we can get the leaks fixed.
> 
> Is there a bug/patch for this yet?

Filed bug 1052224.

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Blocks: 1051230

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1052224

Andrew McCreight [:mccr8]

Comment 9

•

10 years ago

I have a patch to disable leak checking for child processes, but even with that, there is some kind of shutdown assertion failure in GMP in M3:
  https://tbpl.mozilla.org/?tree=Try&rev=0ff144b4e0d8

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 10

•

10 years ago

It looks like it's getting an XPCOM thread shutdown notification without getting a profile-change-teardown notification first. Maybe that doesn't happen in the child?

Andrew McCreight [:mccr8]

Comment 11

•

10 years ago

(In reply to Bill McCloskey (:billm) from comment #10)
> It looks like it's getting an XPCOM thread shutdown notification without
> getting a profile-change-teardown notification first. Maybe that doesn't
> happen in the child?

It looks like the profile-change-teardown notification does not go out, because mProfileNotified is false.  That is set to "true" in nsXREDirProvider::DoStartup(), which seems to be only called once, maybe from  XREMain::XRE_mainRun().

Andrew McCreight [:mccr8]

Comment 12

•

10 years ago

Should the GeckoMediaPluginService even be running in the child process?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 13

•

10 years ago

(In reply to Andrew McCreight [:mccr8] from comment #12)
> Should the GeckoMediaPluginService even be running in the child process?

Ultimately we display the video in the child, so I would expect so. I wonder how this will interact with sandboxing of the child process though. The child isn't supposed to be able to create new processes--it will have to ask the parent to do that for it.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 14

•

10 years ago

Randell, would you mind taking a look at this starting at comment 9 (or ni? the right person)?

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 15

•

10 years ago

Georg, Chris - this is about GMP shutdown in e10s (and also will be about the interaction with sandboxing, I presume).

bsmedberg: should profile-change-teardown occur in e10s child processes?  I presume so...

Flags: needinfo?(rjesup)

Flags: needinfo?(georg.fritzsche)

Flags: needinfo?(cpearce)

Flags: needinfo?(benjamin)

Benjamin Smedberg

Comment 16

•

10 years ago

No, content processes don't "have" a profile and so they don't receive any profile-shutdown notifications. But content processes shouldn't be launching GMP children anyway: they should be children of the parent process and subsequently connected to whatever content process(es) need them.

Flags: needinfo?(benjamin)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 17

•

10 years ago

OK, we'll need to route the GMPService usage in child processes through the master process before we can enable e10s. The GMPService needs shutdown notifications.

Flags: needinfo?(cpearce)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 18

•

10 years ago

I filed bug 1057908 to track making the GeckoMediaPluginService work in e10s.

Georg Fritzsche [:gfritzsche]

Updated

•

10 years ago

Flags: needinfo?(georg.fritzsche)

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1058903

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1065139

Andrew McCreight [:mccr8]

Comment 19

•

10 years ago

Green try run, with the assertion work around patch in bug 1058903:
  https://tbpl.mozilla.org/?tree=Try&rev=929b76bb71f5

No longer depends on: 1065139

Andrew McCreight [:mccr8]

Comment 20

•

10 years ago

Ben, are you okay with me landing this patch once I get enough stuff fixed up so that it will be green?

Flags: needinfo?(bent.mozilla)

atifrea

Updated

•

10 years ago

Blocks: 1015466

Ben Turner (not reading bugmail, use the needinfo flag!)

Assignee

Comment 21

•

10 years ago

Yep!

Flags: needinfo?(bent.mozilla)

Andrew McCreight [:mccr8]

Comment 22

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5a952abc6c2f

Ryan VanderMeulen [:RyanVM]

Comment 23

•

10 years ago

Backed out for Android debug perma-fail.
https://hg.mozilla.org/integration/mozilla-inbound/rev/37c6b40e9dcb

https://tbpl.mozilla.org/php/getParsedLog.php?id=47846888&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Comment 24

•

10 years ago

And it blew up B2G debug.
https://tbpl.mozilla.org/php/getParsedLog.php?id=47851124&tree=Mozilla-Inbound

Andrew McCreight [:mccr8]

Comment 25

•

10 years ago

For the Android debug crash, the test it is reporting crashes ( dom/base/test/test_appname_override.html ) looks pretty innocuous and does not involve a child process as far as I can see.  However, looking at /dev/log/main below, it seems that there is a child process being started up by dom/audiochannel/tests/test_telephonyPolicy.html, which sets dom.ipc.browser_frames.oop_by_default to true.  I guess that's why it runs out of process.

Then there's a crash in the crash reporter in CrashReporter::OnChildProcessDumpRequested, as it hits an assert where it looks like it is trying and failing to get a minidump for the child process.  From asking in #developers, child content processes are not expected to work on Android, so I think the path forward is to disable this test on Android.  I'll file a bug for that.  I have no idea why this worked before.  (Then there's a crash in ElfLoader, but I'll just assume for now that is due to the crash reporter crashing.)

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1066212

Andrew McCreight [:mccr8]

Comment 26

•

10 years ago

minidump_stackwalk fails on the B2G emulator crashes, so I have no idea what is going on there.  The next step there is probably to get a B2G emulator build going with some real stacks.  Presumably something else is failing in a horrible way at shutdown.  To get this working on desktop at least, we could add some kind of quick exit for B2G, though that would require figuring out precisely where to put it.

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1066318

Andrew McCreight [:mccr8]

Comment 27

•

10 years ago

I have a patch to work around the mysterious B2G shutdown crash by calling exit(0) at the end of XPCOM shutdown on B2G.  That works for B2G, but the patch seems to have a near permaorange on XP M3 debug:
  https://tbpl.mozilla.org/?tree=Try&rev=27546760d4a3

(The Android M5 orange is something that looks like a frequent problem on trunk, so hopefully the patch for that will fix this too.)

Andrew McCreight [:mccr8]

Comment 28

•

10 years ago

INFO -  291 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | Unexpected callback for 'INTERNAL_ERROR' with message = 'Cannot start media channels cause = OK' at ["PCW_setLocalDescription@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:1841:1","PCT_setLocalDescription@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:746:3","commandsPeerConnection<@http://mochi.test:8888/tests/dom/media/tests/mochitest/templates.js:305:1","_executeNext@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:103:9",""] - expected PASS

Randell, do you know what this means?

Flags: needinfo?(rjesup)

Andrew McCreight [:mccr8]

Updated

•

10 years ago

No longer depends on: 1066318

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1073310

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Updated

•

10 years ago

Whiteboard: idempotent

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 29

•

10 years ago

ugh, meant to hit the bugzilla search field. :-/

Whiteboard: idempotent

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1073597

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Flags: needinfo?(rjesup)

Andrew McCreight [:mccr8]

Comment 30

•

10 years ago

Attached patch part 1 - Increase the leak threshold for tab processes. — Details — Splinter Review

The small amount we leak when a content process starts up a GMP process seems to be increasing a little bit, so we need the extra headroom.  I mostly only care about much larger leaks (100kb+) that involve windows and documents so I think this is fine for right now.

Attachment #8502804 - Flags: review?(khuey)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

10 years ago

Attachment #8502804 - Flags: review?(khuey) → review+

Andrew McCreight [:mccr8]

Comment 31

•

10 years ago

Attached patch part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. — Details — Splinter Review

Andrew McCreight [:mccr8]

Comment 32

•

10 years ago

Comment on attachment 8502830 [details] [diff] [review]
part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.

I'm not sure what happened with this name.

Anyways, this is what happens without this patch on B2G:
  https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=78e727657e9b

The context here is that right now we're not getting content process shutdown logs, because we crash in shutdown, so I'm trying to get things at least to the point where we shut down cleanly on Linux, where we currently have e10s tests enabled, so we can start detecting leak regressions.  We've had one or two in the last few weeks.

Attachment #8502830 - Attachment description: This patch adds an exit(0) in XPCOMShutdown() on B2G and Windows, in content processes. This is the latest point where we avoid the WebRTC intermittent failure. For B2G, it can be pushed all the way to the end of XPCOMShutdown(), but I combined the two fo → Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.

Attachment #8502830 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 33

•

10 years ago

Comment on attachment 8502830 [details] [diff] [review]
part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.

How does this help? IIRC, if you exit(0) here you *still* won't get leak logs because the leak logs are produced from the final call to NS_LogTerm.

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 34

•

10 years ago

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #33)
> How does this help? IIRC, if you exit(0) here you *still* won't get leak
> logs because the leak logs are produced from the final call to NS_LogTerm.

Yeah, this is pretty weird.  Maybe I should break it down a bit.

Currently on trunk, in tab processes, we exit(0) during the xpcom-shutdown notification.  bent's patch in this bug fixes that, which is enough for us to (sometimes) cleanly shut down child processes on Linux and Linux64, giving us leak logs.

However, bent's patch can't land by itself, because with it we hit different failures on B2G and Windows from things that happen later in shutdown.  This patch hacks around that by adding an exit(0) later than we currently run it, but before we hit whatever shutdown badness.  This is gross, but in combination with bent's patch we'll be running strictly more shutdown code.

So, with the patches here, we still will not get leak logs on Windows or B2G, but we will get them (sometimes) on Linux and Linux64.

(Also, I meant to mention that the exit(0) for B2G can actually go at the very end of ShutdownXPCOM, as the failure we are hitting there seems to be later, when we're shutting down the Chromium IPC stuff.  I just put it in at the place Windows needs for simplicity. But I can split it out if you think that is better.)

Flags: needinfo?(continuation)

Benjamin Smedberg

Comment 35

•

10 years ago

Comment on attachment 8502830 [details] [diff] [review]
part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.

In the comment here, please indicate that this exit is intended to be temporary. That is the thing I think was missing when I read it the first time.

Attachment #8502830 - Flags: review?(benjamin) → review+

Andrew McCreight [:mccr8]

Comment 36

•

10 years ago

Attached patch part 3 - Disable test_bug345339.html on e10s for leaking. — Details — Splinter Review

For now, I'm just going to disable this test on e10s.  I filed bug 1035454 for investigating it.

Attachment #8504341 - Flags: review?(jst)

Johnny Stenback (:jst)

Updated

•

10 years ago

Attachment #8504341 - Flags: review?(jst) → review+

Andrew McCreight [:mccr8]

Comment 37

•

10 years ago

The patch in bug 1081415 fixes a bunch of e10s M1 shutdown leaks.

Depends on: 1081415

Andrew McCreight [:mccr8]

Comment 38

•

10 years ago

Attached patch part 2b - Call exit(0) at the end of XPCOM shut down instead of in the middle. — Details — Splinter Review

Originally when I was testing, I had the exit(0) on B2G at the end of ShutdownXPCOM, but I decided to just move it up to where it needed to be for Windows for simplicity.

However, for some reason this leads to test failures on some B2G tests:
  https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9a84615c976d
Specifically, we seem to hit an assert after or during the call to exit(0):
18:07:09 INFO - 10-14 00:57:12.125 I/Gecko ( 779): [Child 779] WARNING: Exiting child process early!: file ../../../gecko/xpcom/build/XPCOMInit.cpp, line 961
18:07:09 INFO - 10-14 00:57:12.135 F/MOZ_Assert( 779): Assertion failure: isEmpty(), at ../../dist/include/mozilla/LinkedList.h:305 

There's no stack for that, so I don't know what is happening, but moving the exit(0) back to the very end of ShutdownXPCOM seems to fix it, so that's what this patch does:
  https://tbpl.mozilla.org/?tree=Try&rev=2af1506dd53e

I'll land this folded into the other ShutdownXPCOM patch.

Attachment #8505511 - Flags: review?(benjamin)

Benjamin Smedberg

Updated

•

10 years ago

Attachment #8505511 - Flags: review?(benjamin) → review+

Andrew McCreight [:mccr8]

Comment 39

•

10 years ago

all platform all tests try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9a84615c976d

M-e10s failures are leaks fixed by bug 1081415:
  https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1d69baf3ddd3

B2G emulator opt M3 failures are due to a frequent existing intermittent orange.

B2G emulator debug failures are fixed by part 2b:
  https://tbpl.mozilla.org/?tree=Try&rev=2af1506dd53e

I think that should be everything.  I'm planning to land this when the tree reopens.

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Attachment #8468616 - Attachment description: Fix, v1 → part 4 - Don't send 'Webapps:UnregisterForMessages' at xpcom-shutdown

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Attachment #8502804 - Attachment description: Increase the leak threshold for tab processes. → part 1 - Increase the leak threshold for tab processes.

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Attachment #8502830 - Attachment description: Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early. → part 2 - Paper over late shutdown crashes in debug B2G content processes by exiting XPCOM early.

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Attachment #8504341 - Attachment description: Disable test_bug345339.html on e10s for leaking. → part 3 - Disable test_bug345339.html on e10s for leaking.

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Summary: Child DOMApplicationRegistry never sends "Webapps:UnregisterForMessages" to parent, causes parent leak → Child DOMApplicationRegistry never sends "Webapps:UnregisterForMessages" to parent, causes parent leak and crashes debug content processes

Andrew McCreight [:mccr8]

Updated

•

10 years ago

No longer depends on: 1073310

Andrew McCreight [:mccr8]

Comment 40

•

10 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a65f0462a0f2
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/9134c098f0ee
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/5577b0525651
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/4431e4327a4f

Andrew McCreight [:mccr8]

Comment 41

•

10 years ago

If this has to get backed out for weird intermittent failures, probably just backing out out the last part should be enough.

Carsten Book [:Tomcat]

Comment 42

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/a65f0462a0f2
https://hg.mozilla.org/mozilla-central/rev/9134c098f0ee
https://hg.mozilla.org/mozilla-central/rev/5577b0525651
https://hg.mozilla.org/mozilla-central/rev/4431e4327a4f

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla36

Andrew McCreight [:mccr8]

Updated

•

10 years ago

Depends on: 1083664

Andrew McCreight [:mccr8]

Comment 43

•

10 years ago

I'm pretty sure bug 1083664 was caused by this, which is a debug OSX 10.6-only intermittent assertion failure at shutdown.  In the short term, I can probably work around this by adding another exit(0) somewhere, for OSX...

Andrew McCreight [:mccr8]

Comment 44

•

10 years ago

A few pushes after I landed this, leak checking in M3 stopped working intermittently.  Some number of pushes after that, something started leaking.  So, for now, I'm bumping up the leak threshold to 2MB so we can open the tree again...

https://hg.mozilla.org/integration/mozilla-inbound/rev/a0c0f98c8482

Ryan VanderMeulen [:RyanVM]

Comment 45

•

10 years ago

Merge of the re-land:
https://hg.mozilla.org/mozilla-central/rev/a0c0f98c8482

Benoit Girard (:BenWa)

Updated

•

8 years ago

Blocks: 1242119

BMO Automation

Updated

•

7 years ago

Product: Core → Core Graveyard

You need to log in before you can comment on or make changes to this bug.