Closed Bug 1013571 Opened 10 years ago Closed 10 years ago

support PBackground on workers

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla33

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

Attachments

(1 file, 25 obsolete files)

Part 1: Support detaching PBackground from current thread. (v0) 10 years ago Ben Kelly [:bkelly, not reviewing] 2.98 KB, patch	bent.mozilla : feedback+	Details \| Diff \| Splinter Review
Part 2: Enable PBackground in workers. (v0) 10 years ago Ben Kelly [:bkelly, not reviewing] 2.98 KB, patch		Details \| Diff \| Splinter Review
Part 1: Support detaching PBackground from current thread. (v1) 10 years ago Ben Kelly [:bkelly, not reviewing] 3.33 KB, patch		Details \| Diff \| Splinter Review
Part 2: Enable PBackground in workers. (v1) 10 years ago Ben Kelly [:bkelly, not reviewing] 8.22 KB, patch		Details \| Diff \| Splinter Review
Part 2: Enable PBackground in workers. (v2) 10 years ago Ben Kelly [:bkelly, not reviewing] 8.04 KB, patch		Details \| Diff \| Splinter Review
Part 3: Allow IPC DoWorkRunnables to be canceled. (v0) 10 years ago Ben Kelly [:bkelly, not reviewing] 3.25 KB, patch		Details \| Diff \| Splinter Review
Part 4: Add PBackgroundTest mode to workers (v0) 10 years ago Ben Kelly [:bkelly, not reviewing] 3.81 KB, patch		Details \| Diff \| Splinter Review
Part 3: Allow IPC DoWorkRunnables to be canceled. (v1) 10 years ago Ben Kelly [:bkelly, not reviewing] 3.20 KB, patch		Details \| Diff \| Splinter Review
Part 1: Support detaching PBackground from current thread. (v2) 10 years ago Ben Kelly [:bkelly, not reviewing] 3.57 KB, patch	bent.mozilla : review-	Details \| Diff \| Splinter Review
Part 2: Enable PBackground in workers. (v3) 10 years ago Ben Kelly [:bkelly, not reviewing] 7.98 KB, patch	bent.mozilla : review-	Details \| Diff \| Splinter Review
Part 4: Add PBackgroundTest mode to workers (v1) 10 years ago Ben Kelly [:bkelly, not reviewing] 3.99 KB, patch		Details \| Diff \| Splinter Review
Part 1: Support detaching PBackground from current thread. (v3) 10 years ago Ben Kelly [:bkelly, not reviewing] 4.47 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v0) 10 years ago Ben Kelly [:bkelly, not reviewing] 18.56 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v1) 10 years ago Ben Kelly [:bkelly, not reviewing] 18.72 KB, patch	bent.mozilla : review-	Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v2) 10 years ago Ben Kelly [:bkelly, not reviewing] 19.75 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v3) 10 years ago Ben Kelly [:bkelly, not reviewing] 19.75 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v4) 10 years ago Ben Kelly [:bkelly, not reviewing] 19.87 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v5) 10 years ago Ben Kelly [:bkelly, not reviewing] 23.07 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v6) 10 years ago Ben Kelly [:bkelly, not reviewing] 20.96 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v7) 10 years ago Ben Kelly [:bkelly, not reviewing] 22.06 KB, patch		Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v8) 10 years ago Ben Kelly [:bkelly, not reviewing] 21.13 KB, patch	bent.mozilla : review+	Details \| Diff \| Splinter Review
Support PBackground on workers. (merged v9) 10 years ago Ben Kelly [:bkelly, not reviewing] 21.25 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
Reland support for PBackground on workers. (v10) 10 years ago Ben Kelly [:bkelly, not reviewing] 21.95 KB, patch	bent.mozilla : review+	Details \| Diff \| Splinter Review
Reland support for PBackground on workers. (v11) 10 years ago Ben Kelly [:bkelly, not reviewing] 21.95 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
Reland support for PBackground on workers. (v12) 10 years ago Ben Kelly [:bkelly, not reviewing] 23.97 KB, patch		Details \| Diff \| Splinter Review
Reland support for PBackground on workers. (v13) 10 years ago Ben Kelly [:bkelly, not reviewing] 23.60 KB, patch	bent.mozilla : review+	Details \| Diff \| Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Description

•

10 years ago

In bug 940273 we'd like to use PBackground to implement the ServiceWorker cache interface between the child and parent processes.  Currently PBackground is not available on worker threads.

From talking with Ben Turner about this, it appears the main issue is that PBackground uses a thread-local to store some state.  Since workers reuse and share existing threads, this thread-local must be cleaned up to avoid having IPC messages routed to the wrong worker.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 1

•

10 years ago

It seems like RuntimeService::WorkerThread::SetWorker() may be a good place to teardown the previous PBackground child and setup the new PBackground child:

http://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2377

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 2

•

10 years ago

Attached patch Part 1: Support detaching PBackground from current thread. (v0) (obsolete) — Details — Splinter Review

Work-in-progress to allow the background child to be detached from the current thread.

Ben, is there anything I need to do other than null out the thread local and let the registered destructor run?

Attachment #8426549 - Flags: feedback?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 3

•

10 years ago

Attached patch Part 2: Enable PBackground in workers. (v0) (obsolete) — Details — Splinter Review

Work-in-progress to enable PBackground in workers.  I'm running into a fun link error on windows at the moment, though, so I'm not sure if this actually works yet or not.

It links on linux, but I don't have a convenient way to do adhoc testing on my headless linux VM.  Here is a try build to see if it completely blows up:

https://tbpl.mozilla.org/?tree=Try&rev=ccf1ecc1a7f1

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 4

•

10 years ago

Comment on attachment 8426549 [details] [diff] [review]
Part 1: Support detaching PBackground from current thread. (v0)

Review of attachment 8426549 [details] [diff] [review]:
-----------------------------------------------------------------

I believe that this is the basic approach that should work, yes.

However, it's unclear to me what happens when Close() is called on a channel when there are pending messages in the queue... This will need some testing because I expect that there will be problems.

Attachment #8426549 - Flags: feedback?(bent.mozilla) → feedback+

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 5

•

10 years ago

Comment on attachment 8426554 [details] [diff] [review]
Part 2: Enable PBackground in workers. (v0)

Review of attachment 8426554 [details] [diff] [review]:
-----------------------------------------------------------------

For workers I think that we shouldn't do a 'primer' pattern (like we do with the main thread of child processes) but instead explicitly wait for the actor to be connected before we run the worker's script. It should only be a slight startup delay but then the code that uses such actors can be much simpler by assuming that it always has already been connected.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 6

•

10 years ago

Attached patch Part 1: Support detaching PBackground from current thread. (v1) (obsolete) — Details — Splinter Review

Fix a worker shutdown crash by making ChildImpl use thread safe reference counting.

I still need to verify if this works when closing a channel with pending messages.

Attachment #8426549 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 7

•

10 years ago

Attached patch Part 2: Enable PBackground in workers. (v1) (obsolete) — Details — Splinter Review

Delay worker startup until after PBackground actor has been created.  This was necessary not just for the reasons discussed in comment 5, but also to avoid assertions about running non-worker runnables on the worker thread after start.

Note, I currently have an extra bounce to the main thread here to make initialization work with the current assertions.  See the big XXX comment in the patch for details.  I hope to tweak the assertions around with an extra state variable to avoid this.

Attachment #8426554 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 8

•

10 years ago

Attached patch Part 2: Enable PBackground in workers. (v2) (obsolete) — Details — Splinter Review

> Note, I currently have an extra bounce to the main thread here to make
> initialization work with the current assertions.  See the big XXX comment in
> the patch for details.  I hope to tweak the assertions around with an extra
> state variable to avoid this.

Fix this by treating |mWorkerPrivate == null && PR_CurrentThread() == mThread| as legal in WorkerThread::Dispatch().

Attachment #8428269 - Attachment is obsolete: true

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 9

•

10 years ago

Hrm, you shouldn't need to do any thread bouncing here I don't think...

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 10

•

10 years ago

So these are the current steps as I think I have them arranged:

1) RuntimeService::ScheduleWorker() schedules a WorkerThreadInitPBackgroundRunnable() to run on the selected WorkerThread.
2) WorkerThreadInitPBackgroundRunnable() calls BackgroundChild::GetOrCreateForCurrentThread() from the WorkerThread.  A WorkerBackgroundChildCallback() is registered with the WorkerThread as the target.
3) ChildImpl::GetOrCreateForCurrentThread() calls OpenProtocolOnMainThread() from the WorkerThread.
4) The actor protocol gets opened on the main thread.
5) WorkerBackgroundChildCallback is called on the WorkerThread.
6) WorkerBackgroundChildCallback dispatches a WorkerThreadPrimaryRunnable to the WorkerThread.
7) WorkerThreadPrimaryRunnable executes WorkerPrivate::DoRunLoop() for the WorkerThread.

The issue described in comment 7 was that I initially could not perform step (6) with hitting an assert.  This was due to WorkerThread::Dispatch() expecting mWorkerPrivate to be set before any runnable is dispatched from its own thread.  I worked around this with a main thread bounce.

In attachment 8429382 [details] [diff] [review] I loosened this restriction a bit so that the main thread bounce is not needed.

Does this all sound correct/ok?

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 11

•

10 years ago

Attached patch Part 3: Allow IPC DoWorkRunnables to be canceled. (v0) (obsolete) — Details — Splinter Review

Allow IPC DoWorkRunnable objects to be canceled.  This is a requirement for runnables executed on active worker threads.

Right now the Cancel() method is implemented in a somewhat lame manner.  I prevent the Run() method from executing if cancel was previously called, but I don't terminate nested execution within the MessageLoop.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 12

•

10 years ago

Attached patch Part 4: Add PBackgroundTest mode to workers (v0) (obsolete) — Details — Splinter Review

When ENABLE_TESTS is compiled in and pbackground.testing is true, then send PBackgroundTest messages at the beginning and end of worker threads.

Right now this uses a fixed test string, but I plan to make it random or unique per-thread in order to help verify that message routing works correctly.

Note that the test messages at the end of the worker thread do trigger these warnings on the console:

###!!! [Parent][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost

On b2g these are generally deemed harmless.  Is that the case here as well?

I can drain the MessageChannel on worker close if necessary, but it would be nice to avoid the added complexity.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 13

•

10 years ago

https://tbpl.mozilla.org/?tree=Try&rev=42c4242b07cb

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 14

•

10 years ago

Attached patch Part 3: Allow IPC DoWorkRunnables to be canceled. (v1) (obsolete) — Details — Splinter Review

Fix the build. Seems perhaps ./mach build binaries was not enough to catch that for some reason.

https://tbpl.mozilla.org/?tree=Try&rev=fd772b7a06f8

Attachment #8429578 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 15

•

10 years ago

Attached patch Part 1: Support detaching PBackground from current thread. (v2) (obsolete) — Details — Splinter Review

When detaching the PBackground thread local, explicitly flush the pending queue before closing the actor.  I'm not sure this really buys us much, though, as there will still be a race between the flush and the Close().

Attachment #8428267 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 16

•

10 years ago

Attached patch Part 2: Enable PBackground in workers. (v3) (obsolete) — Details — Splinter Review

Attachment #8429382 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 17

•

10 years ago

Attached patch Part 4: Add PBackgroundTest mode to workers (v1) (obsolete) — Details — Splinter Review

Randomize PBackgroundTest constructor values to help catch any cross-posting of messages.

Attachment #8429582 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 18

•

10 years ago

Ben, do you have any thoughts on how to further test PBackground on workers?

Currently I am re-using PBackgroundTest and simply sending a test message at worker startup and shutdown.

Anything else you think I am missing before flagging for review?

Flags: needinfo?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 19

•

10 years ago

Hmm, this try build suggests there might be a problem in Win7 debug M2 with the patches:

 https://tbpl.mozilla.org/?tree=Try&rev=67265addfa29

An assertion failure in:

 /tests/dom/datastore/tests/test_sync_worker.html

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 20

•

10 years ago

That try also shows OSX 10.6 M5 crashing in DeadlockDetector::CheckAquisition() which was called under MessageChannel::OnMaybeDequeueOne().

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 21

•

10 years ago

I'm having a hard time seeing how this error could be caused by PBackground changes since its in PContentParent's destructor:

  https://tbpl.mozilla.org/php/getParsedLog.php?id=40567673&tree=Try&full=1#error2

Should the IToplevelProtocol LinkedList be protected by a mutex?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 22

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #21)
> Should the IToplevelProtocol LinkedList be protected by a mutex?

This sounds a lot like bug 976479 ...

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 23

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #20)
> That try also shows OSX 10.6 M5 crashing in
> DeadlockDetector::CheckAquisition() which was called under
> MessageChannel::OnMaybeDequeueOne().

This looks like bug 966972, so probably unrelated to these changes.

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Comment 24

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #21)
> I'm having a hard time seeing how this error could be caused by PBackground
> changes since its in PContentParent's destructor:
> 
>  
> https://tbpl.mozilla.org/php/getParsedLog.
> php?id=40567673&tree=Try&full=1#error2
> 
> Should the IToplevelProtocol LinkedList be protected by a mutex?

As far as I can tell this is a case of a PContentParent being cycle collected before all of its dependent, open actors have been released.  This seems unrelated to the changes here so I opened bug 1017639 to track that issue.  Its very low frequency (hasn't reproduced after 20 pushes) but it would be worth fixing if we can.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 25

•

10 years ago

Since try looks relatively green I'm going to go ahead and ask for review here.

Flags: needinfo?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Attachment #8430088 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Attachment #8430103 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Attachment #8429674 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Attachment #8430104 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 26

•

10 years ago

Comment on attachment 8430088 [details] [diff] [review]
Part 1: Support detaching PBackground from current thread. (v2)

Review of attachment 8430088 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/BackgroundImpl.cpp
@@ +377,5 @@
>    {
>      AssertIsOnMainThread();
>    }
>  
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ChildImpl)

This shouldn't be necessary... How is this ever being touched on a different thread?

@@ +1662,5 @@
> +  auto threadLocalInfo =
> +    static_cast<ThreadLocalInfo*>(PR_GetThreadPrivate(sThreadLocalIndex));
> +  if (threadLocalInfo) {
> +    if (threadLocalInfo->mActor) {
> +      threadLocalInfo->mActor->FlushPendingInterruptQueue();

Splitting patches like this is hard to follow...

Attachment #8430088 - Flags: review?(bent.mozilla) → review-

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 27

•

10 years ago

Comment on attachment 8430088 [details] [diff] [review]
Part 1: Support detaching PBackground from current thread. (v2)

Review of attachment 8430088 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/BackgroundChild.h
@@ +57,5 @@
>    GetOrCreateForCurrentThread(nsIIPCBackgroundChildCreateCallback* aCallback);
>  
> +  // See above.
> +  static void
> +  DetachFromCurrentThread();

Also let's call this something like "CloseForCurrentThread"... Detaching implies something else.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 28

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #26)
> ::: ipc/glue/BackgroundImpl.cpp
> @@ +377,5 @@
> >    {
> >      AssertIsOnMainThread();
> >    }
> >  
> > +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ChildImpl)
> 
> This shouldn't be necessary... How is this ever being touched on a different
> thread?

I got consistent crashes when clearing the thread local without this.  I believe it was dying on this line:

  http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h#429

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 29

•

10 years ago

Comment on attachment 8430103 [details] [diff] [review]
Part 2: Enable PBackground in workers. (v3)

Review of attachment 8430103 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/RuntimeService.cpp
@@ +907,5 @@
>  private:
>    WorkerPrivate* mWorkerPrivate;
>  };
>  
> +class WorkerThreadInitPBackgroundRunnable MOZ_FINAL : public nsRunnable

Rather than splitting this process into two runnables I think it would be much simpler to have WorkerThreadPrimaryRunnable::Run() just use NS_ProcessNextEvent and spin a nested loop waiting for the actor to be created.

Something like:

  WorkerThreadPrimaryRunnable::Run()
  {
    // ...

    bool done;
    bool success;
    nsCOMPtr<nsIIPCBackgroundChildCreateCallback> callback =
      new WorkerBackgroundChildCallback(&done, &success);

    if (!BackgroundChild::GetOrCreateForCurrentThread(callback)) {
      // Handle failure.
    }

    while (!done) {
      if (NS_WARN_IF(!NS_ProcessNextEvent(mThread))) {
        // Handle failure.
      }
    }

    if (!success) {
      // Handle failure.
    }

    // ...
  }

Does that make sense?

@@ +2683,5 @@
>  
>    // It is no longer safe to touch mWorkerPrivate.
>    mWorkerPrivate = nullptr;
>  
> +  BackgroundChild::DetachFromCurrentThread();

Hrm, I think it might be safer to call this earlier... Too many messages may go to components that expect the runtime to exist. Maybe move this to right before we unregister the runtime from the sampler?

Attachment #8430103 - Flags: review?(bent.mozilla) → review-

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 30

•

10 years ago

(In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #28)
> I got consistent crashes when clearing the thread local without this.

Hrm, I guess we'll need stacks.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 31

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #30)
> (In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #28)
> > I got consistent crashes when clearing the thread local without this.
> 
> Hrm, I guess we'll need stacks.

I got this without the threadsafe refcounting on my mac closing a tab:

Hit MOZ_CRASH(ChildImpl not thread-safe) at /Users/bkelly/devel/hg/mozilla-central/ipc/glue/BackgroundImpl.cpp:381
PR_SetThreadPrivate+0x000000A6 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x002159A6]
Hit MOZ_CRASH(ChildImpl not thread-safe) at /Users/bkelly/devel/hg/mozilla-central/ipc/glue/BackgroundImpl.cpp:381
Hit MOZ_CRASH(ChildImpl not thread-safe) at /Users/bkelly/devel/hg/mozilla-central/ipc/glue/BackgroundImpl.cpp:381
mozilla::ipc::BackgroundChild::DetachFromCurrentThread()+0x00000035 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x003FE495]
PR_SetThreadPrivate+0x000000A6 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x002159A6]
PR_SetThreadPrivate+0x000000A6 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x002159A6]
(anonymous namespace)::WorkerThreadPrimaryRunnable::Run()+0x00000790 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x016930D0]
mozilla::ipc::BackgroundChild::DetachFromCurrentThread()+0x00000035 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x003FE495]
mozilla::ipc::BackgroundChild::DetachFromCurrentThread()+0x00000035 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x003FE495]
nsThread::ProcessNextEvent(bool, bool*)+0x0000050B [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x000B819B]
(anonymous namespace)::WorkerThreadPrimaryRunnable::Run()+0x00000790 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x016930D0]
(anonymous namespace)::WorkerThreadPrimaryRunnable::Run()+0x00000790 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x016930D0]
NS_ProcessNextEvent(nsIThread*, bool)+0x00000033 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.2.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00020293]

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 32

•

10 years ago

I think I understand why we encounter MOZ_CRASH() when using the non-threadsafe NS_INLINE_DECL_REFCOUNTING macro in ChildImpl.

1) GetOrCreateForCurrentThread() is called on the worker thread
2) CreateActorRunnable is dispatched to the main thread
3) ChildImpl is constructed on the main thread
4) ChildImpl's owning thread is set to the main thread
5) Any call to AddRef() or Release() from the worker thread will trigger MOZ_CRASH() via NS_ASSERT_OWNINGTHREAD()

Right now we can trigger a ChildImpl::AddRef() just by calling GetForCurrentThread().  It seems we can avoid this by using threadLocalInfo->mActor.get() instead of just returning mActor.

We then just have to handle the release side of things.  It seems we must do that by dispatching a runnable to the main thread to destroy the actor.

I'll see if I can get that working.  To be honest, though, I'm not sure what the advantage of this is over just using thread safe reference counting.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 33

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #32)
> Right now we can trigger a ChildImpl::AddRef() just by calling
> GetForCurrentThread().  It seems we can avoid this by using
> threadLocalInfo->mActor.get() instead of just returning mActor.

That isn't possible, we'd be leaking all over. Returning a nsRefPtr like that will convert it to the raw pointer type and will not add a reference.

> We then just have to handle the release side of things.  It seems we must do
> that by dispatching a runnable to the main thread to destroy the actor.

Yes, from my read this should be correct. In general it's always best to create and destroy things on the same thread.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 34

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #33)
> (In reply to Ben Kelly [:bkelly] from comment #32)
> > Right now we can trigger a ChildImpl::AddRef() just by calling
> > GetForCurrentThread().  It seems we can avoid this by using
> > threadLocalInfo->mActor.get() instead of just returning mActor.
> 
> That isn't possible, we'd be leaking all over. Returning a nsRefPtr like
> that will convert it to the raw pointer type and will not add a reference.

It seems that the conditional operator causes a temporary nsRefPtr object to be created here.  If I change the code to use an if-statement I can avoid the temporary.  Alternatively, using .get() avoids the AddRef() call.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 35

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #34)
> It seems that the conditional operator causes a temporary nsRefPtr object to
> be created here.

Oh, ick. That's subtle and horrible.

> If I change the code to use an if-statement I can avoid
> the temporary.

Yes, this one please.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 36

•

10 years ago

Attached patch Part 1: Support detaching PBackground from current thread. (v3) (obsolete) — Details — Splinter Review

Updated part 1 that reverts back to using non-threadsafe refcounts for ChildImpl.  Ben suggested the code to run the ChildImpl::Release on the main thread.

I'll address the remaining review feedback and submit a combined patch for review tomorrow.

Attachment #8430088 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 37

•

10 years ago

Attached patch Support PBackground on workers. (merged v0) (obsolete) — Details — Splinter Review

This addresses previous review feedback and combines the previous parts into a single patch.

https://tbpl.mozilla.org/?tree=Try&rev=fa1d4d2ff3ad

Attachment #8429674 - Attachment is obsolete: true

Attachment #8430103 - Attachment is obsolete: true

Attachment #8430104 - Attachment is obsolete: true

Attachment #8437232 - Attachment is obsolete: true

Attachment #8429674 - Flags: review?(bent.mozilla)

Attachment #8430104 - Flags: review?(bent.mozilla)

Attachment #8437955 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 38

•

10 years ago

It occurs to me I could possibly address the XXX comment in MessagePump.cpp by setting |loop->SetNestableTasksAllowed(false)| in DoWorkRunnable::Cancel().  That's assuming I can get the right loop without running on the thread.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 39

•

10 years ago

Attached patch Support PBackground on workers. (merged v1) (obsolete) — Details — Splinter Review

Updated with a few things I was thinking about last night:

1) Assert that ChildImpl::ActorDestroy() has been called in the thread local destructor in addition to ~ChildImpl.  This avoids racing with the main thread.

2) Don't MOZ_CRASH if we fail to create the PBackground actor.  Instead return NS_ERROR_FAILURE from the PrimaryWorkerRunnable::Run() similar to what is down if the context can't be created.

Attachment #8437955 - Attachment is obsolete: true

Attachment #8437955 - Flags: review?(bent.mozilla)

Attachment #8438678 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 40

•

10 years ago

Comment on attachment 8438678 [details] [diff] [review]
Support PBackground on workers. (merged v1)

Review of attachment 8438678 [details] [diff] [review]:
-----------------------------------------------------------------

This looks really good! The big thing I think that still needs to be worked on is the lengthy comment in BackgroundImpl.cpp

::: dom/workers/RuntimeService.cpp
@@ +62,5 @@
>  #include "SharedWorker.h"
>  #include "WorkerPrivate.h"
>  #include "WorkerRunnable.h"
> +#include "nsIIPCBackgroundChildCreateCallback.h"
> +#include "BackgroundChild.h"

Nit: Please alphabetize these

@@ +917,5 @@
>  
> +class WorkerBackgroundChildCallback MOZ_FINAL :
> +  public nsIIPCBackgroundChildCreateCallback
> +{
> +  bool& mDone;

Nit: I prefer pointers to refs in cases like this because it makes it obvious at the call site that you're modifying the value (e.g. |new WorkerBackgroundChildCallback(&done)|)

@@ +1094,5 @@
> +  {
> +    using mozilla::ipc::BackgroundChild;
> +    using mozilla::ipc::PBackgroundChild;
> +    if (gTestPBackground) {
> +      // Randomize value to validate workers are not cross-posting messages.

Maybe also append the value of PR_GetCurrentThread()?

@@ +1096,5 @@
> +    using mozilla::ipc::PBackgroundChild;
> +    if (gTestPBackground) {
> +      // Randomize value to validate workers are not cross-posting messages.
> +      uint32_t testValue;
> +      PR_GetRandomNoise(&testValue, sizeof(testValue));

Might as well MOZ_RELEASE_ASSERT that PR_GetRandomNoise returns the correct size.

@@ -1537,5 @@
>      return false;
>    }
>  
> -#ifdef DEBUG
> -  thread->SetAcceptingNonWorkerRunnables(false);

I'd actually prefer that we leave this here, and then set true (and then false again) in WorkerThreadPrimaryRunnable::Run() before and after calling SynchronouslyCreatePBackground().

That will reduce the amount of time that something could slip in without triggering the assertions.

@@ +2564,5 @@
>    mWorkerPrivate->AssertIsOnWorkerThread();
> +
> +  // If the PBackground child is not created yet, then we must permit
> +  // blocking event processing to support SynchronouslyCreatePBackground().
> +  MOZ_ASSERT(!aMayWait || !BackgroundChild::GetForCurrentThread());

Hrm, if we're waiting on PBackground then we shouldn't call into mWorkerPrivate here since it's not yet set up. That's probably fine today but I don't want that to change in the future. How about we change this to:

  if (aMayWait) {
    MOZ_ASSERT(aRecursionDepth == 2);
    MOZ_ASSERT(!BackgroundChild::GetForCurrentThread());
    return NS_OK;
  }

@@ +2640,5 @@
> +  if (NS_WARN_IF(NS_FAILED(rv))) {
> +    // XXX need to fire an error at parent.
> +    return rv;
> +  }
> +#ifdef ENABLE_TESTS

Nit: Please add an extra line before #ifdef

@@ +2705,5 @@
>  
> +#ifdef ENABLE_TESTS
> +  mThread->TestPBackground();
> +#endif
> +  BackgroundChild::CloseForCurrentThread();

I think we should do this sooner... Can it be moved to right after the block with DoRunLoop() but before we unregister from the profiler?

@@ +2729,5 @@
> +  bool done = false;
> +  nsCOMPtr<nsIIPCBackgroundChildCreateCallback> callback =
> +    new WorkerBackgroundChildCallback(done);
> +
> +  if (!BackgroundChild::GetOrCreateForCurrentThread(callback)) {

NS_WARN_IF here please, and in the other spots below where you return early due to failures.

@@ +2734,5 @@
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  while (!done) {
> +    if (!NS_ProcessNextEvent(mThread)) {

Nit: Since this matters earlier it would be clearer if this was:

  NS_ProcessNextEvent(mThread, true /* aMayWait */);

::: ipc/glue/BackgroundImpl.cpp
@@ +1612,5 @@
>      static_cast<ThreadLocalInfo*>(PR_GetThreadPrivate(sThreadLocalIndex));
>  
> +  if (threadLocalInfo) {
> +    return threadLocalInfo->mActor;
> +  } else {

Nit: No need for else after a return.

@@ +1678,5 @@
>  
>    return true;
>  }
>  
> +/* static */

Uber-nit: |// static| to match the others.

@@ +1687,5 @@
> +             "BackgroundChild::Startup() was never called!");
> +  auto threadLocalInfo =
> +    static_cast<ThreadLocalInfo*>(PR_GetThreadPrivate(sThreadLocalIndex));
> +  if (threadLocalInfo) {
> +    if (threadLocalInfo->mActor) {

I'm a little concerned about this... If we have a |threadLocalInfo| then someone called |GetOrCreateForCurrentThread()|. If we have an actor already then everything is fine.

If we don't have an actor then the actor is being created somewhere and it just hasn't been assigned yet. In this case we're not going to do anything here which seems wrong. We're basically allowing a race condition to cause different behavior.

I think we could do two things here:

1. MOZ_CRASH if we get into this state.
2. Set some flags and add extra code so that we close the actor once it finally gets created.

What do you think? I would prefer not to allow a racy API like this if it's not too difficult...

::: ipc/glue/MessagePump.cpp
@@ +220,5 @@
>  
>  NS_IMETHODIMP
>  DoWorkRunnable::Run()
>  {
> +  if (mCanceled) {

I actually think we can't bail out here... There's a 1:1 relationship between DoWorkRunnable and tasks in the chromium queue so if we skip one we'll leave something else unprocessed.

For now just have Cancel() call Run() directly and we can revisit if something breaks?

Attachment #8438678 - Flags: review?(bent.mozilla) → review-

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 41

•

10 years ago

Attached patch Support PBackground on workers. (merged v2) (obsolete) — Details — Splinter Review

> @@ +1687,5 @@
> I think we could do two things here:
> 
> 1. MOZ_CRASH if we get into this state.
> 2. Set some flags and add extra code so that we close the actor once it
> finally gets created.

I chose to MOZ_CRASH() for a few reasons:

1) This could also be a double close.
2) Lets not add complexity unless we find we need it.
3) I don't want to add something I can't test and writing good, racy test cases can be hard without a real use case that triggers it.

New try:

https://tbpl.mozilla.org/?tree=Try&rev=fca7c291b8cf

Attachment #8438678 - Attachment is obsolete: true

Attachment #8442596 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 42

•

10 years ago

Also, I assumed if the runnable Cancel() is called, then I can assume Run() won't be called.  Is that correct?  Or do I still need the flag to short-circuit Run() to avoid double execution?

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 43

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #42)
> Also, I assumed if the runnable Cancel() is called, then I can assume Run()
> won't be called.  Is that correct?

This is true for the worker run loop but I can't say for other things.

> Or do I still need the flag to short-circuit Run() to avoid double execution?

I would do that to be safe. Maybe assert too.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 44

•

10 years ago

Attached patch Support PBackground on workers. (merged v3) (obsolete) — Details — Splinter Review

Sorry, realized after posting the previous patch I needed to fix the CloseForCurrentThread() description in BackgroundChild.h.  It now indicates that the call may only be made once and only after the actor has been created.

Also fix another case of whitespace missing around an #ifdef block.

Attachment #8442596 - Attachment is obsolete: true

Attachment #8442596 - Flags: review?(bent.mozilla)

Attachment #8442601 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 45

•

10 years ago

Attached patch Support PBackground on workers. (merged v4) (obsolete) — Details — Splinter Review

Sigh.  Uploaded wrong patch.

Attachment #8442601 - Attachment is obsolete: true

Attachment #8442601 - Flags: review?(bent.mozilla)

Attachment #8442602 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 46

•

10 years ago

Hmm, try is unhappy.  It seems the modified code is allowing the callback from the main thread to occur while SetAcceptingNonWorkerRunnables(false) for some reason.  Investigating.

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1027772

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

No longer depends on: 1027772

Comment 47

•

10 years ago

Attached patch Support PBackground on workers. (merged v5) (obsolete) — Details — Splinter Review

So it turns out at least some, if not all, the try errors were fallout from implementing this:

> >  
> > -#ifdef DEBUG
> > -  thread->SetAcceptingNonWorkerRunnables(false);
> 
> I'd actually prefer that we leave this here, and then set true (and then
> false again) in WorkerThreadPrimaryRunnable::Run() before and after calling
> SynchronouslyCreatePBackground().
> 
> That will reduce the amount of time that something could slip in without
> triggering the assertions.

The problem is that this creates a race between setting SetAcceptingNonWorkerRunnables(false) in ScheduleWorker() and setting it true on the worker thread.

Usually it works, but every now and then we hit this condition:

1) Main thread calls thread->Dispatch() from ScheduleWorker()
2) Worker thread wakes up
3) Worker thread runs PrimaryWorkerRunnable
4) Worker thread calls SetAcceptingNonWorkerRunnables(true)
5) Worker thread blocks waiting for actor callback
6) Main thread wakes up
7) Main thread calls SetAcceptingNonWorkerRunnables(false)
8) Main thread creates actor
9) Main thread calls thread->Dispatch() for callback

At that point we hit the assert because mAcceptingNonWorkerRunnables is false.

I think the correct thing here is to make the final dispatch from the main thread in ScheduleWorker atomic with setting mAcceptingNonWorkerRunnable false.

This patch does this by adding a DispatchAndStopAcceptingNonWorkerRunnables() method.  This performs the Dispatch() and sets the boolean with a lock.

Note, I had to switch to using a ReentrantMonitor because Dispatch() immediately tries to lock again by calling IsAcceptingNonWorkerRunnables().

I attempted to make lock recursion only allowed in IsAcceptingNonWorkerRunnables() by checking mMonitor.AssertNotCurrentThreadIn() in the other methods.  It looks like this is not actually implemented yet, though.  I wrote bug 1027772 to fix that.

Anyway, what do you think of this solution?

Attachment #8442602 - Attachment is obsolete: true

Attachment #8442602 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 48

•

10 years ago

This fixed the errors locally for me.  Lets see how the new try does:

  https://tbpl.mozilla.org/?tree=Try&rev=a4180b4248a5

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 49

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #47)
> Anyway, what do you think of this solution?

This sounds like a lot of work just to tighten that assertion... Sorry to send you down that rabbit hole, but I think we can just live with the old assertion behavior.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 50

•

10 years ago

No problem.  That's what I was thinking too. :-)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 51

•

10 years ago

Attached patch Support PBackground on workers. (merged v6) (obsolete) — Details — Splinter Review

I believe this addresses all the review feedback to-date.  Also passed on past try runs, I expect this to be green:

  https://tbpl.mozilla.org/?tree=Try&rev=1ac3e93dcf0e

Attachment #8443002 - Attachment is obsolete: true

Attachment #8443214 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Comment 52

•

10 years ago

Attached patch Support PBackground on workers. (merged v7) (obsolete) — Details — Splinter Review

Using the debugger Ben confirmed that early delivery of worker messages is still safe with this patch because the WorkerPrivate is still in a pending state while spinning the event loop in SynchronouslyCreatePBackground().  This causes those early worker messages to be queued separately until its state moves to running.

This patch update adds asserts to SynchronouslyCreatePBackground() to verify that the worker private is indeed in its pending state.

https://tbpl.mozilla.org/?tree=Try&rev=f8affed3ca5d

Attachment #8443214 - Attachment is obsolete: true

Attachment #8443214 - Flags: review?(bent.mozilla)

Attachment #8447268 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 53

•

10 years ago

Attached patch Support PBackground on workers. (merged v8) (obsolete) — Details — Splinter Review

Talking with Ben, it sounds like we can guarantee this safely without the assert.  So remove the assert and just add a comment instead.

Attachment #8447268 - Attachment is obsolete: true

Attachment #8447268 - Flags: review?(bent.mozilla)

Attachment #8447315 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 54

•

10 years ago

Comment on attachment 8447315 [details] [diff] [review]
Support PBackground on workers. (merged v8)

Review of attachment 8447315 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great, thanks!

::: dom/workers/RuntimeService.cpp
@@ +1095,5 @@
> +  void
> +  TestPBackground()
> +  {
> +    using mozilla::ipc::BackgroundChild;
> +    using mozilla::ipc::PBackgroundChild;

Nit: using namespace mozilla::ipc;

@@ +1103,5 @@
> +      PRSize randomSize = PR_GetRandomNoise(&testValue, sizeof(testValue));
> +      MOZ_RELEASE_ASSERT(randomSize == sizeof(testValue));
> +      nsCString testStr;
> +      testStr.AppendInt(testValue);
> +      testStr.AppendInt(reinterpret_cast<uint64_t>(PR_GetCurrentThread()));

Nit: intptr_t here instead of uint64_t

@@ +2735,5 @@
>    return NS_OK;
>  }
>  
> +nsresult
> +WorkerThreadPrimaryRunnable::SynchronouslyCreatePBackground()

Please assert here that BackgroundChild::GetForCurrentThread returns null.

::: ipc/glue/BackgroundImpl.cpp
@@ +2009,5 @@
>    AssertIsOnBoundThread();
>  
>    BackgroundChildImpl::ActorDestroy(aWhy);
> +
> +  mActorDestroyed = true;

Nit: Let's move this to before you call BackgroundChildImpl::ActorDestroy.

::: ipc/glue/MessagePump.cpp
@@ +264,5 @@
> +  // here.  If we don't process all of these then we will leave something
> +  // unprocessed in the chromium queue.  Therefore, eagerly complete our work
> +  // instead by immediately calling Run().
> +  mCanceled = true;
> +  mCallingRunWhileCanceled = true;

Nit: Please use AutoRestore for this.

Attachment #8447315 - Flags: review?(bent.mozilla) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 55

•

10 years ago

Attached patch Support PBackground on workers. (merged v9) (obsolete) — Details — Splinter Review

Fixed review feedback, except for the intptr_t comment.  Turns out we don't have an intptr_t compatible AppendInt() method.  Updated commit log.

Attachment #8447315 - Attachment is obsolete: true

Attachment #8447398 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 56

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/303027a0da95

Phil Ringnalda (:philor)

Comment 57

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/303027a0da95

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

10 years ago

Depends on: 1032011

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1032605, 1032783

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 58

•

10 years ago

I pushed a backout of mozilla-inbound since fixing the oranges has been much trickier than I thought:

  https://hg.mozilla.org/integration/mozilla-inbound/rev/306fc8865aae

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 59

•

10 years ago

This landed on m-c as well:

  https://hg.mozilla.org/mozilla-central/rev/306fc8865aae

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Target Milestone: mozilla33 → ---

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 60

•

10 years ago

Attached patch Reland support for PBackground on workers. (v10) (obsolete) — Details — Splinter Review

After talking with Ben and much experimentation, we think the best approach is to make MessageChannel::Close() call ActorDestroy() synchronously in the case where the channel is still opening.

This patch does this along with a few other minor fixes we uncovered.  It works in the debugger, but needs a full try run to see if any of the oranges are still a problem.

https://tbpl.mozilla.org/?tree=Try&rev=d6dc74bd5e77

Attachment #8447398 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 61

•

10 years ago

Comment on attachment 8451878 [details] [diff] [review]
Reland support for PBackground on workers. (v10)

Try looks good, so flagging for review.  The splinter interdiff to v9 is mostly correct, although you have to ignore the first two hunks.

Attachment #8451878 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 62

•

10 years ago

And by try being good, I mean that there have been no failures in the previously orange tests.  B2G ICS Emulator Debug M12 was the worst before.  So far no failures after 20 pushes.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 63

•

10 years ago

Comment on attachment 8451878 [details] [diff] [review]
Reland support for PBackground on workers. (v10)

Review of attachment 8451878 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: dom/workers/RuntimeService.cpp
@@ +919,5 @@
> +public:
> +  WorkerBackgroundChildCallback(bool* aDone)
> +  : mDone(aDone)
> +  {
> +    MOZ_ASSERT(mDone);

Let's also MOZ_ASSERT(!NS_IsMainThread())

::: ipc/glue/BackgroundChild.h
@@ +38,5 @@
>  // time.
>  //
> +// CloseForCurrentThread() will close the current PBackground actor.  Subsequent
> +// calls to GetForCurrentThread will return null.  CloseForCurrentThread() may
> +// only be called exactly once per thread.  Currently it is illegal to call this

CloseForCurrentThread() may be called more than once, you just have to create another actor in between. This comment needs more specificity.

::: ipc/glue/BackgroundImpl.cpp
@@ +422,5 @@
>      if (threadLocalInfo) {
>        if (threadLocalInfo->mActor) {
>          threadLocalInfo->mActor->Close();
> +        threadLocalInfo->mActor->AssertActorDestroyed();
> +        // Since the actor is created on the main thread it must only

Nit: Please add a newline between the assert and the comment.

@@ +1697,5 @@
> +    MOZ_CRASH("Attempting to close a non-existent PBackground actor!");
> +  }
> +
> +  if (threadLocalInfo->mActor) {
> +    threadLocalInfo->mActor->FlushPendingInterruptQueue();

This really shouldn't be needed (it only does anything if we have rpc messages, which we shouldn't...) but I don't think it hurts either.

@@ +1699,5 @@
> +
> +  if (threadLocalInfo->mActor) {
> +    threadLocalInfo->mActor->FlushPendingInterruptQueue();
> +  }
> +  DebugOnly<PRStatus> status = PR_SetThreadPrivate(sThreadLocalIndex, nullptr);

Nit: Please add a newline before this one, and maybe a comment saying that clearing the threadlocal will close the actor synchronously.

@@ +2007,5 @@
>  ChildImpl::ActorDestroy(ActorDestroyReason aWhy)
>  {
>    AssertIsOnBoundThread();
>  
> +  mActorDestroyed = true;

Nit: Please also assert that mActorDestroyed is false before you set it here.

::: ipc/glue/MessageChannel.cpp
@@ +1662,5 @@
>          }
>  
>          if (ChannelOpening == mChannelState) {
> +            // SynchronouslyClose() waits for an ack from the other side, so
> +            // the opening process should complete before this returns.

Nit: s/process/sequence/ maybe? "process" is overloaded in this context.

::: ipc/glue/MessagePump.cpp
@@ +12,5 @@
>  #include "base/basictypes.h"
>  #include "base/logging.h"
>  #include "base/scoped_nsautorelease_pool.h"
>  #include "mozilla/Assertions.h"
> +#include "mozilla/AutoRestore.h"

Nit: This isn't needed any longer.

Attachment #8451878 - Flags: review?(bent.mozilla) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 64

•

10 years ago

Attached patch Reland support for PBackground on workers. (v11) (obsolete) — Details — Splinter Review

Updated with review feedback.  Retested on debugger for good measure due to that extra assert we added in ChildImpl::ActorDestroy.

Attachment #8451878 - Attachment is obsolete: true

Attachment #8452511 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 65

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/725b7db309c3

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 66

•

10 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/99c726eaa286 at bkelly's request.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 67

•

10 years ago

Lets see if we can reproduce these android ThreadLink crashes in this try:

  https://tbpl.mozilla.org/?tree=Try&rev=02dff17f5139

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 68

•

10 years ago

I got it to reproduce once in that try.  Took about 40 pushes of Android 2.3 Opt M6.  That's about a 2% or 3% reproduction rate.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 69

•

10 years ago

It appears this is easy to reproduce locally just by adding a printf in here:

  http://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageLink.cpp#189

This code looks very racy and could use more checking.

Also, whats the history of this comment:

  // :TODO: MonitorAutoLock lock(*mChan->mMonitor);

Ben, do you know?

Flags: needinfo?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 70

•

10 years ago

I think that TODO comment is about prevent messages from being sent on the channel, not about racing with the other ThreadLink destructor.  So probably not strictly relevant here.

Flags: needinfo?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 71

•

10 years ago

Attached patch Reland support for PBackground on workers. (v12) (obsolete) — Details — Splinter Review

This patch adds a static mutex to protect the operations in ~ThreadLink().  This code needs to run automically with respect to the other side's ~ThreadLink().

In theory a static mutex is a bit broad as we could have separate channels tearing down, but this does not seem too common a situation and probably won't contend much.  Also, it seems the most straightforward as its not clear where else to stick a mutex that would be common to both ThreadLink objects.  Taking both channel monitors seems like a recipe for deadlock.

Here's a new try:

  https://tbpl.mozilla.org/?tree=Try&rev=3b44eaef7693

Based on local debugging I'm fairly confident this will solve the problem, so flagging for review now.

Attachment #8452511 - Attachment is obsolete: true

Attachment #8453057 - Flags: review?(bent.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 72

•

10 years ago

I suppose to avoid that mutex being abused elsewhere, I could declare it as a static within the ~ThreadLink() method scope itself.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 73

•

10 years ago

Attached patch Reland support for PBackground on workers. (v13) — Details — Splinter Review

On IRC Ben pointed out that mMonitor is shared between the two MessageChannel objects, so we can just lock that.

I also dug into why it was commented out with a TODO.  It appears that it was an unintended bug from merging channels together into on class in bug 901789.

This patch restores the mMonitor lock in ~ThreadLink(), adds a comment about the monitor being shared, and adds some MOZ_ASSERTs for good measure.

https://tbpl.mozilla.org/?tree=Try&rev=2cb59580ab0c

Attachment #8453057 - Attachment is obsolete: true

Attachment #8453057 - Flags: review?(bent.mozilla)

Attachment #8453181 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

10 years ago

Attachment #8453181 - Flags: review?(bent.mozilla) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 74

•

10 years ago

Try looks good with over 40 triggers without a failure.  Back to mozilla-inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/fc4d6b5ee8ce

Carsten Book [:Tomcat]

Comment 75

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/fc4d6b5ee8ce

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

10 years ago

Depends on: 1036970

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1037294

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1040288

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1049552

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1049801

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Depends on: 1034309

You need to log in before you can comment on or make changes to this bug.