1188056 - ContentParent and child actors not being destroyed upon child process crash

Reporter

Description

•

9 years ago

We're regularly creating ContentParents, seemingly for live thumbnail drawing so even with e10s switched off. It appears that when the child process for these dies we never end up destroying some of the ContentParents (and their associated child actors). Even when we are destroying them some seem to be destroyed through P*Parent::OnChannelError() messages so it seems even when we are shutting down it's not going in the way we wanted to.

Over the course of a session this seems to cause us to accumulate a collection of ContentParent objects and for example also some ImageBridgeParent objects which in turn are responsible for bug 1127270. This is bad.

Any suggestions as to what's going on or for who else might be helpful in looking at this will be appreciated!

Mike Conley (:mconley) (:⚙️)

Updated

•

9 years ago

tracking-e10s: --- → ?

Benjamin Smedberg

Updated

•

9 years ago

Component: IPC → DOM: Content Processes

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 1

•

9 years ago

This is usually caused by a leak elsewhere.  A GC/CC log is the starting point to investigate that.

Bas Schouten (:bas.schouten)

Reporter

Comment 2

•

9 years ago

So it -appears- at the moment that this is occurring when my content process that's being created:

First-chance exception at 0x77140A31 (ntdll.dll) in plugin-container.exe: 0xC0000710: A threadpool worker thread is impersonating a client, after a callback to 0x%p(0x%p). This is unexpected, indicating that the callback is missing a call to revert the impersonation (parameters: 0x770D4200, 0x0124A158).
Unhandled exception at 0x77140A31 (ntdll.dll) in plugin-container.exe: 0xC0000710: A threadpool worker thread is impersonating a client, after a callback to 0x%p(0x%p). This is unexpected, indicating that the callback is missing a call to revert the impersonation (parameters: 0x770D4200, 0x0124A158).

The child process then dies somewhat spectacularly and on this crash it appears the protocols in the parent process never get cleaned up. This exception may very well somehow be debug build only, I'll try to verify this.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 3

•

9 years ago

I've seen cases where we fail to handle FailedConstructor correctly before.

Bas Schouten (:bas.schouten)

Reporter

Comment 4

•

9 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> I've seen cases where we fail to handle FailedConstructor correctly before.

Is there any information I can get for you here?

Jim Mathies [:jimm]

Comment 5

•

9 years ago

> First-chance exception at 0x77140A31 (ntdll.dll) in plugin-container.exe:
> 0xC0000710: A threadpool worker thread is impersonating a client, after a
> callback to 0x%p(0x%p). This is unexpected, indicating that the callback is
> missing a call to revert the impersonation (parameters: 0x770D4200,
> 0x0124A158).
> Unhandled exception at 0x77140A31 (ntdll.dll) in plugin-container.exe:


I ran into this on Win10 a couple weeks ago in a debug build, are you testing on 10? At the time I chalked it up to pre-release Windows bugs as I wasn't able to reproduce on win7. That said I think Arron added some thread pool code recently, he might be interested in this. Overall though I doubt it has anything to do with the compositor shutdown problems.

Bas Schouten (:bas.schouten)

Reporter

Comment 6

•

9 years ago

(In reply to Jim Mathies [:jimm] from comment #5)
> > First-chance exception at 0x77140A31 (ntdll.dll) in plugin-container.exe:
> > 0xC0000710: A threadpool worker thread is impersonating a client, after a
> > callback to 0x%p(0x%p). This is unexpected, indicating that the callback is
> > missing a call to revert the impersonation (parameters: 0x770D4200,
> > 0x0124A158).
> > Unhandled exception at 0x77140A31 (ntdll.dll) in plugin-container.exe:
> 
> 
> I ran into this on Win10 a couple weeks ago in a debug build, are you
> testing on 10? At the time I chalked it up to pre-release Windows bugs as I
> wasn't able to reproduce on win7. That said I think Arron added some thread
> pool code recently, he might be interested in this. Overall though I doubt
> it has anything to do with the compositor shutdown problems.

No, I'm running on Windows 8.1. I've seen this exception before as well, I feel I traced it back to something at some point. But I recall it not being very important for our users in the wild. I'm really more interested in us not cleaning up after a content process crash properly fairly consistently, since this would for one mean the shutdown will hang but it also means we'd be leaking resources.

Bas Schouten (:bas.schouten)

Reporter

Comment 7

•

9 years ago

I've also manually bypassed the exception, and I've added a crash in the content process manually during drawing to see what would happen if the crash occurred at another point in time, and I observed similar consequences, essentially the ContentParent and all associated objects did not get cleaned up.

Bas Schouten (:bas.schouten)

Reporter

Updated

•

9 years ago

Summary: ContentParent and child actors not being destroyed upon child process destruction → ContentParent and child actors not being destroyed upon child process crash

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 8

•

9 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #4)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> > I've seen cases where we fail to handle FailedConstructor correctly before.
> 
> Is there any information I can get for you here?

Comment 1 ...

Bas Schouten (:bas.schouten)

Reporter

Comment 9

•

9 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #8)
> (In reply to Bas Schouten (:bas.schouten) from comment #4)
> > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> > > I've seen cases where we fail to handle FailedConstructor correctly before.
> > 
> > Is there any information I can get for you here?
> 
> Comment 1 ...

Do you think this is actually caused by a leak rather than just the shutdown not running for some other reason (I don't know much about how IPDL shuts things down)? What's the best way of generating a log that has the information you think will be useful?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 10

•

9 years ago

This probably isn't an IPDL issue. All IPDL handles is calling ContentParent::ActorDestroy when the content process crashes. Its job is to actually clean up. However, since ContentParent is a refcounted class, other outstanding refs can stop that from happening.

As far as I know, TabParent has the main ref to ContentParent, so the first thing to check is probably that all those are getting destroyed properly. They also get torn down by IPDL, through here:
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/nsIContentParent.cpp#133

If there are other refs, then you'll probably need to do some kind of leak logging to figure it out.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 11

•

9 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #9)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #8)
> > (In reply to Bas Schouten (:bas.schouten) from comment #4)
> > > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> > > > I've seen cases where we fail to handle FailedConstructor correctly before.
> > > 
> > > Is there any information I can get for you here?
> > 
> > Comment 1 ...
> 
> Do you think this is actually caused by a leak rather than just the shutdown
> not running for some other reason (I don't know much about how IPDL shuts
> things down)? What's the best way of generating a log that has the
> information you think will be useful?

My experience with b2g is that these issues are virtually always caused by JS not properly cleaning up after mozbrowsererror is fired.  Go to about:memory and get the verbose GC/CC logs.

Bas Schouten (:bas.schouten)

Reporter

Comment 12

•

9 years ago

(In reply to Bill McCloskey (:billm) from comment #10)
> This probably isn't an IPDL issue. All IPDL handles is calling
> ContentParent::ActorDestroy when the content process crashes. Its job is to
> actually clean up. However, since ContentParent is a refcounted class, other
> outstanding refs can stop that from happening.
> 
> As far as I know, TabParent has the main ref to ContentParent, so the first
> thing to check is probably that all those are getting destroyed properly.
> They also get torn down by IPDL, through here:
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/nsIContentParent.
> cpp#133
> 
> If there are other refs, then you'll probably need to do some kind of leak
> logging to figure it out.

Are we using TabParent even when we're just creating thumbnails in the background for the newtab screen?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 13

•

9 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #12)
> Are we using TabParent even when we're just creating thumbnails in the
> background for the newtab screen?

Yes. There's one TabParent for every remote <browser> element, which is what we use for background thumbnail generation.

Brad Lassey [:blassey] (use needinfo?)

Updated

•

9 years ago

tracking-e10s: ? → +

Jim Mathies [:jimm]

Updated

•

8 years ago

Blocks: e10s-multi

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 14

•

8 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #11)
> My experience with b2g is that these issues are virtually always caused by
> JS not properly cleaning up after mozbrowsererror is fired.  Go to
> about:memory and get the verbose GC/CC logs.

khuey: Here's a verbose GC/CC log:

https://1drv.ms/u/s!AsfWgS0rNKaKyWpiqCXtBFb4U47n

I'd be most interested in getting this fixed, as I hit it every time I try to run e10s on my main Win10 work machine.

Flags: needinfo?(khuey)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 15

•

8 years ago

I was hitting this error if I have Notepad++ installed and set as the Windows shell default file type handler for .js files in our call to IApplicationAssociationRegistration::QueryCurrentDefault here:
https://dxr.mozilla.org/mozilla-central/source/uriloader/exthandler/win/nsOSHelperAppService.cpp#565

This is for assocType=".js".

I set the default file type handler for ".js" to (Windows' built in) Notepad, and the exception no longer throws. However, once I set the default file type handler back to Notepad++, the exception *still* no longer throws.

So at least I can work again. But now I can't verify whatever fix we do here helps. :(

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 16

•

8 years ago

(In reply to Chris Pearce (:cpearce) from comment #14)
> I'd be most interested in getting this fixed, as I hit it every time I try
> to run e10s on my main Win10 work machine.

Is there a particular reason this is a big deal for you?  It's generally just a small memory leak, and not terribly concerning.

Flags: needinfo?(khuey) → needinfo?(cpearce)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 17

•

8 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> Is there a particular reason this is a big deal for you?  It's generally
> just a small memory leak, and not terribly concerning.

I was getting the exception thrown dialog with the same message as listed in comment #2 and then a crash in the content process, which made it impossible to run a (locally built) e10s build. Now that I've unset and reset my OS default *.js editor, the exception doesn't throw, and everything works as expected. Which is weird.

Flags: needinfo?(cpearce)

Jim Mathies [:jimm]

Updated

•

7 years ago

Priority: -- → P3

Andrew McCreight [:mccr8]

Comment 18

•

2 years ago

Not clear what was going on here. If somebody is still having some issues, we could look at a new bug.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → INCOMPLETE

Bugzilla

Quick Search

ContentParent and child actors not being destroyed upon child process crash

Categories

(Core :: DOM: Content Processes, defect, P3)

Tracking

()

People

(Reporter: bas.schouten, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Updated

Updated

Comment 14

Comment 15

Comment 16

Comment 17

Updated

Comment 18