Closed Bug 1225121 Opened 4 years ago Closed 4 years ago

[B2G] All the rest of content processes crash if user tries to run app that uses Service Workers after restart

Categories

(Core :: DOM: Service Workers, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: azasypkin, Assigned: ferjm)

References

Details

Attachments

(2 files, 3 obsolete files)

STR:

1. Apply PR from bug 1210704 and do "make raptor" to have several _packaged_ test apps that use Service Workers;
2. Eg. run  "Empty Service Worker" or "Simple Fetch Service Worker" app;
3. Close app and restart device;
4. Once device is booted try to run any of this apps.

In logcat I see:

I/Gecko   (  205): [Parent 205] WARNING: waitpid failed pid:1160 errno:10: file /builds/slave/b2g_m-cen_flm-kk_eng_ntly-0000/build/gecko/ipc/chromium/src/base/process_util_posix.cc, line 267
I/Gecko   (  205): 
I/Gecko   (  205): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x9C0018,name=PNecko::Msg_PRemoteOpenFileConstructor) Value error: message was deserialized, but contained an illegal value
I/Gecko   (  205): 
I/Gecko   (  205): [Parent 205] WARNING: waitpid failed pid:952 errno:10: file /builds/slave/b2g_m-cen_flm-kk_eng_ntly-0000/build/gecko/ipc/chromium/src/base/process_util_posix.cc, line 267
I/Gecko   (  205): ###!!! [Parent][MessageChannel] Error: (msgtype=0xCE0003,name=PSharedBufferManager::Msg_DropGrallocBuffer) Channel error: cannot send/recv
.....

If I try to close app with "Home" button I see that Homescreen is restarted.
Assignee: nobody → ferjmoreno
Priority: -- → P1
Have you disabled the network.disable.ipc.security pref?  This is likely bug 1125961.
Flags: needinfo?(azasypkin)
(In reply to Ben Kelly [:bkelly] from comment #1)
> Have you disabled the network.disable.ipc.security pref?  This is likely bug
> 1125961.

Nope, let me check, thanks!
Also, can you link to the service worker that is running for the home screen?  It was not obvious from the PR in the other bug.
(In reply to Oleg Zasypkin [:azasypkin][⏰UTC+1] from comment #2)
> (In reply to Ben Kelly [:bkelly] from comment #1)
> > Have you disabled the network.disable.ipc.security pref?  This is likely bug
> > 1125961.
> 
> Nope, let me check, thanks!

Well, same thing with network.disable.ipc.security set to "true".

> Also, can you link to the service worker that is running for the home
> screen?  It was not obvious from the PR in the other bug.

Sorry for not being clear, there is no service worker in home screen, only inside apps that I run from home screen.

Both apps [1] and [2] are packaged (hence work via app:// protocol), "make raptor" with PR from bug 1210704 just preloads those packages and pushes them to the device. 

[1] https://github.com/fxos-performance/service-worker-empty
[2] https://github.com/fxos-performance/service-worker-simple-fetch
Flags: needinfo?(azasypkin)
Depends on: 1110010
Attached file logcat
This is what the logcat says with a debug build.

It seems that this is the same error that we are seeing on bug 1179191 and others and that should be fixed by the refactor from bug 1182117.
Depends on: 1182117
Doesn't the app open in a new content process?  Why is the service worker attempting to run the app's service worker?  Its possible we have a new b2g problem here.
Flags: needinfo?(ferjmoreno)
That last comment should read "Why is the homescreen attempting to run the app's service worker?"
Yes, the app is opened in a new content process (pid: 10863), but as you say, it seems that we are accessing the sw script cache from the homescreen process (pid: 10769). I am trying to figure out why.
Flags: needinfo?(ferjmoreno)
Is it running ServiceWorkerManager::SoftUpdate() in the homescreen?  It could be the update-on-navigation logic triggering a propagation of the update to the homescreen process.
Yes, that seems to be the case. And not only the homescreen is receiving the propagation:

Breakpoint 1, mozilla::dom::workers::ServiceWorkerRegisterJob::Start (this=0xad684920) at /Volumes/mozilladev/m-i/dom/workers/ServiceWorkerManager.cpp:978
978	    Update();
(gdb) bt
#0  mozilla::dom::workers::ServiceWorkerRegisterJob::Start (this=0xad684920) at /Volumes/mozilladev/m-i/dom/workers/ServiceWorkerManager.cpp:978
#1  0xb414207c in mozilla::dom::workers::ServiceWorkerJobQueue::Append (this=0xad1cf9c0, aJob=0xad684920) at /Volumes/mozilladev/m-i/dom/workers/ServiceWorkerManager.cpp:204
#2  0xb414d5dc in mozilla::dom::workers::ServiceWorkerManager::SoftUpdate (this=this@entry=0xaff385c0, aScopeKey=..., aScope=..., aCallback=aCallback@entry=0x0)
    at /Volumes/mozilladev/m-i/dom/workers/ServiceWorkerManager.cpp:3380
#3  0xb414d64a in mozilla::dom::workers::ServiceWorkerManager::SoftUpdate (this=this@entry=0xaff385c0, aOriginAttributes=..., aScope=..., aCallback=aCallback@entry=0x0)
    at /Volumes/mozilladev/m-i/dom/workers/ServiceWorkerManager.cpp:3332
#4  0xb415523a in mozilla::dom::workers::ServiceWorkerManagerChild::RecvNotifySoftUpdate (this=<optimized out>, aOriginAttributes=..., aScope=...)
    at /Volumes/mozilladev/m-i/dom/workers/ServiceWorkerManagerChild.cpp:45
#5  0xb345d9dc in mozilla::dom::PServiceWorkerManagerChild::OnMessageReceived (this=0xb04fa3a0, msg__=...) at PServiceWorkerManagerChild.cpp:341
#6  0xb339e5fe in mozilla::ipc::PBackgroundChild::OnMessageReceived (this=0xb0432c00, msg__=...) at PBackgroundChild.cpp:1555
Andrea, should we be running the PropagateSoftUpdate() mechanism when .update() is executed by script in a child process?  Or is it only intended to be run due to about:serviceworkers?

I wonder if we can make PropagateSoftUpdate() check to see if the given principal matches the appId of the current child process.
Flags: needinfo?(amarchesini)
Summary: [B2G] Homescreen crashes if user tries to run app that uses Service Workers after restart → [B2G] All the rest of content processes crash if user tries to run app that uses Service Workers after restart
This happens not only with apps but also with regular web pages with sw registered.
Fernando, is there a way to determine the appID of the current process?  Could you maybe try to work around this issue by doing the check I suggested in comment 11?  Basically make PropagateSoftUpdate() not do anything in processes that don't match the principal app ID?
Flags: needinfo?(ferjmoreno)
(In reply to Ben Kelly [:bkelly] from comment #13)
> Fernando, is there a way to determine the appID of the current process? 
> Could you maybe try to work around this issue by doing the check I suggested
> in comment 11?  Basically make PropagateSoftUpdate() not do anything in
> processes that don't match the principal app ID?

Wouldn't this work around break requesting a SW update from about:sw?
Flags: needinfo?(ferjmoreno)
(In reply to Ben Kelly [:bkelly] from comment #11)
> Andrea, should we be running the PropagateSoftUpdate() mechanism when
> .update() is executed by script in a child process?  Or is it only intended
> to be run due to about:serviceworkers?

In both cases we do it.

> I wonder if we can make PropagateSoftUpdate() check to see if the given
> principal matches the appId of the current child process.

We do it. I mean, PropagetSoftUpdate() uses the OriginAttribute and we use that to check if the current SWM of the current process has a valid registration. But yes, we should check if the current process is actually able to manage that update.
Flags: needinfo?(amarchesini)
> Wouldn't this work around break requesting a SW update from about:sw?

It should not. Do you mind if I take this bug?
(In reply to Andrea Marchesini (:baku) from comment #16)
> > Wouldn't this work around break requesting a SW update from about:sw?
> 
> It should not. Do you mind if I take this bug?

If you don't have a process running for that appID, nothing would happen.  Pressing the update button would be a no-op.  But it wouldn't crash either.  Sounds like an improvement to me.

Please do take it!  But its probably a lower priority than v1 stuff for this week if you have any of that on your plate.  Thanks!
Ok, thanks for the explanation. I agree that no-op is better than crash, but we may still want to disable the update and unregister buttons from the about:sw settings app panel. I spoke with Andrea and I am going to try the work around that you suggested. Thank you!
Attached patch WIP (obsolete) — Splinter Review
Attached patch WIP2 (obsolete) — Splinter Review
Comment on attachment 8690744 [details] [diff] [review]
WIP2

Review of attachment 8690744 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/ServiceWorkerManagerParent.cpp
@@ +162,5 @@
>  }
>  
> +already_AddRefed<ContentParent>
> +ServiceWorkerManagerParent::GetContentParent()
> +{

AssertIsOnBackgroundThread();

and write a comment saying that this object must be released on main-thread.

::: dom/workers/ServiceWorkerManagerParent.h
@@ +28,5 @@
>  
>  public:
> +  NS_INLINE_DECL_REFCOUNTING(ServiceWorkerManagerParent)
> +
> +  bool ActorDestroyed()

const

@@ +38,5 @@
>    {
>      return mID;
>    }
>  
> +  already_AddRefed<ContentParent> GetContentParent();

maybe can be const as well.

::: dom/workers/ServiceWorkerManagerService.cpp
@@ +8,5 @@
>  #include "ServiceWorkerManagerParent.h"
>  #include "ServiceWorkerRegistrar.h"
>  #include "mozilla/ipc/BackgroundParent.h"
>  #include "mozilla/unused.h"
> +#include "mozilla/dom/TabParent.h"

alphabetic order.

@@ +23,5 @@
>  ServiceWorkerManagerService* sInstance = nullptr;
>  
> +struct NotifySoftUpdateData
> +{
> +  NS_INLINE_DECL_REFCOUNTING(NotifySoftUpdateData)

why refcounted?

@@ +33,5 @@
> +                       already_AddRefed<ContentParent> aContentParent)
> +    : mParent(aParent)
> +    , mContentParent(aContentParent)
> +  {
> +  }

Add:

~NotifySoftUpdateData()
{
  MOZ_ASSERT(!mContentParent);
  MOZ_ASSERT(In which thread are you going to delete it?)
}

@@ +95,5 @@
> +          }
> +          (*mData)[i]->mParent = nullptr;
> +          (*mData)[i]->mContentParent = nullptr;
> +          (*mData)[i] = nullptr;
> +          (*mData).RemoveElementAt(i);

You should not need to nullify all the elements before removing it.

@@ +106,5 @@
> +    AssertIsOnBackgroundThread();
> +
> +    for (uint32_t i = 0; i < (*mData).Length(); ++i) {
> +      if (!(*mData)[i]->mParent->ActorDestroyed()) {
> +        Unused << (*mData)[i]->mParent->SendNotifySoftUpdate(mOriginAttributes,

MOZ_ASSERT(!mData[i]->mContentParent);

@@ +112,5 @@
> +      }
> +    }
> +
> +    RefPtr<CleanUpNotifySoftUpdateRunnable> runnable =
> +      new CleanUpNotifySoftUpdateRunnable(mData);

This is not needed. yes.

@@ +221,5 @@
>                                        const nsAString& aScope)
>  {
>    AssertIsOnBackgroundThread();
>  
> +  nsTArray<RefPtr<NotifySoftUpdateData> >* notifySoftUpdateDataArray =

no space between > >

@@ +238,5 @@
> +      Unused << parent->SendNotifySoftUpdate(aOriginAttributes, scope);
> +      continue;
> +    }
> +
> +    RefPtr<NotifySoftUpdateData> data =

I don't think we need to have it refcounted.
Just move the ownership of this array to the runnable with a SwapElements().

@@ +243,5 @@
> +      new NotifySoftUpdateData(parent.forget(), contentParent.forget());
> +    notifySoftUpdateDataArray->AppendElement(data.forget());
> +
> +    RefPtr<NotifySoftUpdateIfPrincipalOkRunnable> runnable =
> +      new NotifySoftUpdateIfPrincipalOkRunnable(notifySoftUpdateDataArray,

We want just 1 single runnable, right? This should be outside the for loop.
Attachment #8690744 - Flags: feedback+
Attached patch v1 (obsolete) — Splinter Review
Thank you for all your feedback, Andrea!
Attachment #8690001 - Attachment is obsolete: true
Attachment #8690744 - Attachment is obsolete: true
Attachment #8690983 - Flags: review?(amarchesini)
Comment on attachment 8690983 [details] [diff] [review]
v1

Review of attachment 8690983 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/ServiceWorkerManagerParent.h
@@ +38,5 @@
>    {
>      return mID;
>    }
>  
> +  already_AddRefed<ContentParent> GetContentParent();

const?

::: dom/workers/ServiceWorkerManagerService.cpp
@@ +45,5 @@
> +class NotifySoftUpdateIfPrincipalOkRunnable final : public nsRunnable
> +{
> +public:
> +  NotifySoftUpdateIfPrincipalOkRunnable(
> +      nsAutoPtr<nsTArray<nsAutoPtr<NotifySoftUpdateData>>> aData,

1. nsAutoPtr<...>& aData,
2. This should be a nsAutoPtr<nsTArray<NotifySoftUpdateData>>

@@ +47,5 @@
> +public:
> +  NotifySoftUpdateIfPrincipalOkRunnable(
> +      nsAutoPtr<nsTArray<nsAutoPtr<NotifySoftUpdateData>>> aData,
> +      const OriginAttributes& aOriginAttributes,
> +      const nsString& aScope)

const nsAString&

@@ +69,5 @@
> +          (*mData)[i]->mContentParent->GetManagedTabContext();
> +        for (uint32_t j = 0; j < contextArray.Length(); ++j) {
> +          if ((contextArray[j].OwnOrContainingAppId() == mOriginAttributes.mAppId) &&
> +              (contextArray[j].IsBrowserElement() == mOriginAttributes.mInBrowser)) {
> +            (*mData)[i]->mContentParent = nullptr;

I think you should move this immediately after:

nsTArray<TabContext> contextArray = ...
mData[i]->mContentParent = nullptr;

for (...

@@ +76,5 @@
> +          (*mData)[i]->mParent = nullptr;
> +          (*mData)[i]->mContentParent = nullptr;
> +        }
> +      }
> +      mBackgroundThread->Dispatch(this, NS_DISPATCH_NORMAL);

You don't need to dispatch it to the background thread if we don't have any parent to notify.

@@ +95,5 @@
> +    return NS_OK;
> +  }
> +
> +private:
> +  nsAutoPtr<nsTArray<nsAutoPtr<NotifySoftUpdateData>>> mData;

nsAutoPtr<nsTArray<NotifySoftUpdateData>> mData;

@@ +196,5 @@
>                                        const nsAString& aScope)
>  {
>    AssertIsOnBackgroundThread();
>  
> +  nsString scope(aScope);

This is not needed. Use aScope directly.

@@ +197,5 @@
>  {
>    AssertIsOnBackgroundThread();
>  
> +  nsString scope(aScope);
> +  nsAutoPtr<nsTArray<nsAutoPtr<NotifySoftUpdateData>>> notifySoftUpdateDataArray(

nsAutoPtr<nsTArray<NotifySoftUpdateData>>

@@ +218,5 @@
>        parentFound = true;
>      }
>  #endif
> +
> +    nsAutoPtr<NotifySoftUpdateData> data(new NotifySoftUpdateData(

notifySoftUpdateData* data = notifySoftUpdateDataArray->AppendElement();
data->mContentParent.swap(contentParent);
data->mParent.swap(parent);

and of course, mContentPArent and mParent must be RefPtr<>.
Otherwise we are leaking them. They are not refPtr, so when you set them to null, we are decreasing the refcounting.
Attachment #8690983 - Flags: review?(amarchesini) → review-
Attached patch v2Splinter Review
Attachment #8690983 - Attachment is obsolete: true
Attachment #8691375 - Flags: review?(amarchesini)
Comment on attachment 8691375 [details] [diff] [review]
v2

Review of attachment 8691375 [details] [diff] [review]:
-----------------------------------------------------------------

looks good with these changes.

::: dom/workers/ServiceWorkerManagerService.cpp
@@ +46,5 @@
> +    , mBackgroundThread(NS_GetCurrentThread())
> +  {
> +    AssertIsInMainProcess();
> +    AssertIsOnBackgroundThread();
> +

MOZ_ASSERT(mData && !aData);

@@ +51,5 @@
> +    MOZ_ASSERT(mBackgroundThread);
> +  }
> +
> +  NS_IMETHODIMP
> +  Run()

override

@@ +54,5 @@
> +  NS_IMETHODIMP
> +  Run()
> +  {
> +    if (NS_IsMainThread()) {
> +      for (uint32_t i = 0; i < mData->Length(); ++i) {

NotifySoftUpdateData& data = mData->ElementAt(i);
then use data instead (*mData[i])

@@ +57,5 @@
> +    if (NS_IsMainThread()) {
> +      for (uint32_t i = 0; i < mData->Length(); ++i) {
> +        nsTArray<TabContext> contextArray =
> +          (*mData)[i].mContentParent->GetManagedTabContext();
> +        (*mData)[i].mContentParent = nullptr;

write a comment here.

@@ +66,5 @@
> +          }
> +          (*mData)[i].mParent = nullptr;
> +        }
> +      }
> +      mBackgroundThread->Dispatch(this, NS_DISPATCH_NORMAL);

MOZ_ASSERT about the result of this call.

@@ +73,5 @@
> +
> +    AssertIsOnBackgroundThread();
> +
> +    for (uint32_t i = 0; i < mData->Length(); ++i) {
> +      MOZ_ASSERT(!((*mData)[i].mContentParent));

ditto.

@@ +75,5 @@
> +
> +    for (uint32_t i = 0; i < mData->Length(); ++i) {
> +      MOZ_ASSERT(!((*mData)[i].mContentParent));
> +      ServiceWorkerManagerParent* parent = (*mData)[i].mParent;
> +      if (parent && !(parent->ActorDestroyed())) {

!parent->ActorDestroyed()

@@ +79,5 @@
> +      if (parent && !(parent->ActorDestroyed())) {
> +        Unused << (*mData)[i].mParent->SendNotifySoftUpdate(mOriginAttributes,
> +                                                            mScope);
> +      }
> +      parent = nullptr;

this doesn't make any sense. You are nullifying a pointer.
What you want is: (*mData)[i].mParent = nullptr;
But again, SWMParent is thread safe, so you don't need to do it at all,
Just remove this line.

@@ +206,2 @@
>  #ifdef DEBUG
>      if (parent->ID() == aParentID) {

You want to move this at the beginning of the for loop, otherwise if we don't have contentParent, but this parentID == aPArentID we don't set parentFound to true.

@@ +215,3 @@
>    }
>  
> +  if (!notifySoftUpdateDataArray->Length()) {

if (notifySoftUpdateDataArray->IsEmpty())

@@ +219,5 @@
> +  }
> +
> +  RefPtr<NotifySoftUpdateIfPrincipalOkRunnable> runnable =
> +    new NotifySoftUpdateIfPrincipalOkRunnable(notifySoftUpdateDataArray,
> +                                              aOriginAttributes, aScope);

MOZ_ASSERT(!notifySoftUpdateDataArray);
Attachment #8691375 - Flags: review?(amarchesini) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/94ac8462f12422760091070377787553cf9a647d
Bug 1225121 - [B2G] All the rest of content processes crash if user tries to run app that uses Service Workers after restart. r=baku
Do you know how can I run wpt-e10s tests on try?
Flags: needinfo?(philringnalda)
I'd bet on web-platform-tests-e10s-4.

But there's no need to push again, when you already have a perfectly good build. "pip install mozci", and then "mozci-trigger -b "Ubuntu VM 12.04 try opt test web-platform-tests-e10s-4" -r 8f5cf54db384 --times 20" with the buildername coming by just finding someone else's wasteful -u all push and copying it from the self-serve "BuildAPI" link in the dropdown menu at the top right of the push on treeherder is far faster and easier and more efficient (and you don't need to do it now, since I already did it for you on https://treeherder.mozilla.org/#/jobs?repo=try&revision=40f0addd8ecd&selectedJob=14202849). You just have to be sure you're copying the buildername for the right platform, that being why you got a Linux64 build that I had to cancel, since mozci-trigger helpfully triggered a Linux64 build when I accidentally copied the buildername for 64 instead of 32.
Flags: needinfo?(philringnalda)
Thank you Phil!
Duplicate of this bug: 1179191
Depends on: 1227015
https://hg.mozilla.org/integration/mozilla-inbound/rev/184a432d20c5cb63d86d08bd7459a53bb81b2466
Bug 1225121 - [B2G] All the rest of content processes crash if user tries to run app that uses Service Workers after restart. r=baku
https://hg.mozilla.org/mozilla-central/rev/184a432d20c5
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.