Closed Bug 1278443 Opened 8 years ago Closed 8 years ago

ChildImpl::OpenProtocolOnMainThread crashes when attempting to send a message during shutdown

Categories

(Core :: IPC, defect, P2)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
e10s + ---
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- wontfix
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: ting, Assigned: cyu)

References

Details

(Keywords: crash, topcrash-mac, Whiteboard: btpp-backlog)

Crash Data

Attachments

(3 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-c35a89a5-deb8-4c1e-be49-a46202160606.
=============================================================

#2 of Mac OS X Nightly 20160605030215, 4 crashes from 2 installations. It's a MOZ_CRASH from opening PBackground failure.
Is this a new regression? I don't think that 4 crashes would typically be enough to investigate a bug report, in case this is developers doing odd things.
Flags: needinfo?(janus926)
That's four crashes for a single build, on OSX, which is fairly high for OSX. I see 51 crashes in the last 7 days, mostly on 49.

These are all happening during shutdown, so it is likely more fallout from the "submit old crash reports" thing.

BackgroundChild does have a sShutdownHasStarted variable that tracks when shutdown has started, but it is waiting for NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID, which I think is partway through XPCOM shutdown, so it isn't catching when we trigger this in the earlier part of XPCOM shutdown, when we are presumably still not allowed to send IPC messages. That would be easy enough to fix to listen for whatever earlier point when we can't send IPC messages any more... but when we hit that condition we also crash, so it wouldn't really improve things! Maybe the behavior could be changed to fail more gracefully.
Summary: Crash in (anonymous namespace)::ChildImpl::OpenProtocolOnMainThread → ChildImpl::OpenProtocolOnMainThread crashes when attempting to send a message during shutdown
I don't know how much it matters, but in two of the crash reports I looked at, what is happening is that the child is receiving a destroy message, which causes it to trigger PageHide, which runs some script, which triggers what I think is a sync XHR that spins the event loop, which then tries to create this background child thing.
See Also: → 1277582
Ting-Yu, are you interested/able to investigate more here? At first glance this doesn't seem super-urgent.
Whiteboard: btpp-backlog
Bill has a patch in bug 1268559 that might help this.
Depends on: 1268559
Let's see if bug 1268559 could help this.
Flags: needinfo?(janus926)
Priority: -- → P2
Crash Signature: [@ (anonymous namespace)::ChildImpl::OpenProtocolOnMainThread] → [@ (anonymous namespace)::ChildImpl::OpenProtocolOnMainThread] [@ `anonymous namespace''::ChildImpl::OpenProtocolOnMainThread ]
See Also: → 1140115
Crash volume for signature '(anonymous namespace)::ChildImpl::OpenProtocolOnMainThread':
 - nightly (version 50): 35 crashes from 2016-06-06.
 - aurora  (version 49): 110 crashes from 2016-06-07.
 - beta    (version 48): 1 crash from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       6       3       1       2       0       7       9
 - aurora       11       7       6       8      12      29      29
 - beta          0       0       1       0       0       0       0
 - release       0       0       0       0       0       0       0
 - esr           0       0       0       0       0       0       0

Affected platform: Mac OS X
Crash volume for signature '`anonymous namespace''::ChildImpl::OpenProtocolOnMainThread':
 - nightly (version 51): 147 crashes from 2016-08-01.
 - aurora  (version 50): 280 crashes from 2016-08-01.
 - beta    (version 49): 185 crashes from 2016-08-02.
 - release (version 48): 6 crashes from 2016-07-25.
 - esr     (version 45): 5 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly      42      51      31
 - aurora      105      89      26
 - beta          5      85      78
 - release       3       0       1
 - esr           2       0       1

Affected platform: Windows

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #13
 - aurora  #16
 - beta    #803
 - release #4554
 - esr     #4043
bug 1277582 just landed, let's see if this will disappear too.
Hello Kan-Ru, I just checked the crash signature and I see crash reports from 09-12 aurora50 build. I don't think the fix from bug 1277582 address this signature. Will you be working on an alternate fix for it?
Flags: needinfo?(kchen)
I can investigate this.
Assignee: nobody → kchen
Flags: needinfo?(kchen)
Assignee: kchen → cyu
Crash volume for signature '`anonymous namespace''::ChildImpl::OpenProtocolOnMainThread':
 - nightly (version 52): 148 crashes from 2016-09-19.
 - aurora  (version 51): 56 crashes from 2016-09-19.
 - beta    (version 50): 2 crashes from 2016-09-20.
 - release (version 49): 10 crashes from 2016-09-05.
 - esr     (version 45): 21 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly      48     100
 - aurora       45      11
 - beta          2       0
 - release       4       6
 - esr           1       5

Affected platform: Windows

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #11
 - aurora  #21
 - beta    #4616
 - release #6227
 - esr     #4483
After applying the patch, just open and then close the browser. Then the content process crashes.
Attached patch WIP (obsolete) — Splinter Review
This patch fails ChildImpl::OpenProtocolOnMainThread() gently after PContentChild is closed.
It looks like CamerasChild can handle the failure in opening BackgroundChild.
The change in RuntimeService seems to work, but I am not sure if it's the right fix.
Attachment #8797584 - Flags: review?(wmccloskey)
Comment on attachment 8797584 [details] [diff] [review]
WIP

Review of attachment 8797584 [details] [diff] [review]:
-----------------------------------------------------------------

If we decide to make the worker change, Blake should probably review that. I know very little about workers. Otherwise this looks good to me.

::: dom/workers/RuntimeService.cpp
@@ +2547,5 @@
>  
>    // Too many idle threads, just shut this one down.
>    if (shutdownThread) {
>      MOZ_ALWAYS_SUCCEEDS(aThread->Shutdown());
>    } else if (scheduleTimer) {

Maybe combine the mIdleThreadTimer check with this one?

@@ -2787,5 @@
>    //       mWorkerPrivate->SetThread() in order to avoid accidentally consuming
>    //       worker messages here.
>    if (NS_WARN_IF(!BackgroundChild::SynchronouslyCreateForCurrentThread())) {
>      // XXX need to fire an error at parent.
> -    return NS_ERROR_UNEXPECTED;

I guess I don't see why we need to change this. If the worker fails to start up when we're shutting down, would that be bad? It seems pretty reasonable to me.

::: ipc/glue/BackgroundImpl.cpp
@@ +2049,5 @@
>  
>      return true;
>    }
>  
> +  if (sIPCShutdown) {

Can you just use
  if (XRE_IsContentProcess() && ContentChild::GetSingleton()->IsShuttingDown())
?

Then you could remove the observer.
Attachment #8797584 - Flags: review?(wmccloskey)
Attachment #8797584 - Attachment is obsolete: true
(In reply to Bill McCloskey (:billm) from comment #16)
> Comment on attachment 8797584 [details] [diff] [review]
> WIP
> 
> Review of attachment 8797584 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> If we decide to make the worker change, Blake should probably review that. I
> know very little about workers. Otherwise this looks good to me.
> 
> ::: dom/workers/RuntimeService.cpp
> @@ +2547,5 @@
> >  
> >    // Too many idle threads, just shut this one down.
> >    if (shutdownThread) {
> >      MOZ_ALWAYS_SUCCEEDS(aThread->Shutdown());
> >    } else if (scheduleTimer) {
> 
> Maybe combine the mIdleThreadTimer check with this one?
> 
This is related to the problem in test case design. Starting a worker in "xpcom-shutdow" phase confuses RuntimeService's cleanup sequence because RuntimeService also observes "xpcom-shutdow" to Shutdown() itself.

> @@ -2787,5 @@
> >    //       mWorkerPrivate->SetThread() in order to avoid accidentally consuming
> >    //       worker messages here.
> >    if (NS_WARN_IF(!BackgroundChild::SynchronouslyCreateForCurrentThread())) {
> >      // XXX need to fire an error at parent.
> > -    return NS_ERROR_UNEXPECTED;
> 
> I guess I don't see why we need to change this. If the worker fails to start
> up when we're shutting down, would that be bad? It seems pretty reasonable
> to me.
> 
It'd be pretty bad because the worker will not be unregistered. Then RuntimeService will wait forever for this leaked worker during shutdown.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #19)
> (In reply to Bill McCloskey (:billm) from comment #16)
> > Comment on attachment 8797584 [details] [diff] [review]
> > WIP
> > 
> > Review of attachment 8797584 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > If we decide to make the worker change, Blake should probably review that. I
> > know very little about workers. Otherwise this looks good to me.
> > 
> > ::: dom/workers/RuntimeService.cpp
> > @@ +2547,5 @@
> > >  
> > >    // Too many idle threads, just shut this one down.
> > >    if (shutdownThread) {
> > >      MOZ_ALWAYS_SUCCEEDS(aThread->Shutdown());
> > >    } else if (scheduleTimer) {
> > 
> > Maybe combine the mIdleThreadTimer check with this one?
> > 
> This is related to the problem in test case design. Starting a worker in
> "xpcom-shutdow" phase confuses RuntimeService's cleanup sequence because
> RuntimeService also observes "xpcom-shutdow" to Shutdown() itself.

I just meant it would be easier to write this:
  } else if (scheduleTimer && mIdleThreadTimer) {
rather than:
  } else if (scheduleTimer) {
    if (mIdleThreadTimer) {

> > @@ -2787,5 @@
> > >    //       mWorkerPrivate->SetThread() in order to avoid accidentally consuming
> > >    //       worker messages here.
> > >    if (NS_WARN_IF(!BackgroundChild::SynchronouslyCreateForCurrentThread())) {
> > >      // XXX need to fire an error at parent.
> > > -    return NS_ERROR_UNEXPECTED;
> > 
> > I guess I don't see why we need to change this. If the worker fails to start
> > up when we're shutting down, would that be bad? It seems pretty reasonable
> > to me.
> > 
> It'd be pretty bad because the worker will not be unregistered. Then
> RuntimeService will wait forever for this leaked worker during shutdown.

OK, I was afraid that might be the case. I wonder how hard it would be to actually notify the parent of the error.
Comment on attachment 8797962 [details]
Bug 1278443 - Part 1: Don't open PBackground actors after PContentChild is closed.

https://reviewboard.mozilla.org/r/83580/#review83282
Attachment #8797962 - Flags: review?(wmccloskey) → review+
Hey Cervantes, sorry for my delay in reviewing this. I'll make sure to finish this review either later today or tomorrow.
Comment on attachment 8797963 [details]
Bug 1278443 - Part 2: Continue to run the worker even if BackgroundChild fails to create.

https://reviewboard.mozilla.org/r/83582/#review84330

I'm not sure I fully understand all of the implications of this, but let's try it.
Attachment #8797963 - Flags: review?(mrbkap) → review+
Flags: needinfo?(cyu)
Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/53a144cdc973
Part 1: Don't open PBackground actors after PContentChild is closed. r=billm
https://hg.mozilla.org/integration/autoland/rev/d4858e65d890
Part 2: Continue to run the worker even if BackgroundChild fails to create. r=mrbkap
https://hg.mozilla.org/mozilla-central/rev/53a144cdc973
https://hg.mozilla.org/mozilla-central/rev/d4858e65d890
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Is this worth trying to backport?
Hi Kan-Ru,
Since :cyu is OOO, this might need your help to create uplift request. Thanks.
Flags: needinfo?(cyu) → needinfo?(kchen)
Comment on attachment 8797962 [details]
Bug 1278443 - Part 1: Don't open PBackground actors after PContentChild is closed.

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Fixed a shutdown crash
[Describe test coverage new/current, TreeHerder]: on m-c. Baked on Nightly.
[Risks and why]: Low. This patch disallows PBackground creation when shutdown; PBackground client should already handle connection failure gracefully.
[String/UUID change made/needed]:
Flags: needinfo?(kchen)
Attachment #8797962 - Flags: approval-mozilla-aurora?
Comment on attachment 8797962 [details]
Bug 1278443 - Part 1: Don't open PBackground actors after PContentChild is closed.

Fix a crash. Take it in 51 aurora.
Attachment #8797962 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
kanru, cervantes: does part 2 also need uplifting ?
Flags: needinfo?(cyu)
Bug 925645 seems a lot more likely.
Flags: needinfo?(cyu)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: