Intermittent fetch-event-redirect.https.html | application crashed [@ mozilla::dom::workers::WorkerPrivateParent<mozilla::dom::workers::WorkerPrivate>::MaybeWrapAsWorkerRunnable(already_AddRefed<nsIRunnable> &&)]

RESOLVED FIXED in Firefox 43

Status

()

defect
RESOLVED FIXED
4 years ago
6 days ago

People

(Reporter: RyanVM, Assigned: khuey)

Tracking

(Regressed 1 bug, {crash, intermittent-failure})

Trunk
mozilla43
Unspecified
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox43 fixed)

Details

Attachments

(1 attachment)

No description provided.
So, this one is fun.  What's happening is that code running on a worker thread is calling NS_AsyncCopy, which is sending the actual copy off to the stream transport service.

But if nsThreadPool decides to spawn a new thread to process the event, and races with another thread to do so, it will try to shut down the newly created thread from the thread that called Dispatch[0].  It dispatches a runnable to the current thread to avoid spinning the event loop if the caller doesn't expect it.  Dispatching that runnable fails, because workers expect all runnables they receive to implement nsICancelableRunnable.

The easiest way to fix this is just to implement nsICancelableRunnable on the event we dispatch, but I'm not convinced that's the correct approach.  Spinning the event loop doesn't work well with workers, which have a much more well-defined event loop and script execution behavior than the main thread.

I would like to add an AsyncShutdown method to nsIThread.  Instead of spinning the event loop, we would maintain a list of threads we're waiting for on the *calling* thread and simply continue processing events during shutdown until that list is empty.  Then nsThreadPool could use AsyncShutdown and side step this issue entirely.  There are probably other places in our code that call nsThread::Shutdown that really don't need the thread to shut down before unwinding the stack, and they could be changed to use this too.  Thoughts?

[0] http://hg.mozilla.org/mozilla-central/annotate/1b687fcb5213/xpcom/threads/nsThreadPool.cpp#l108
Component: DOM: Workers → XPCOM
Flags: needinfo?(nfroyd)
Flags: needinfo?(benjamin)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> So, this one is fun.  What's happening is that code running on a worker
> thread is calling NS_AsyncCopy, which is sending the actual copy off to the
> stream transport service.
> 
> But if nsThreadPool decides to spawn a new thread to process the event, and
> races with another thread to do so, it will try to shut down the newly
> created thread from the thread that called Dispatch[0].  It dispatches a
> runnable to the current thread to avoid spinning the event loop if the
> caller doesn't expect it.  Dispatching that runnable fails, because workers
> expect all runnables they receive to implement nsICancelableRunnable.

I love our thread pool implementation.

Just to make sure I understand, the sequence of steps here is something like:

[Worker] Calls NS_AsyncCopy
[Worker] Dispatches event to STS
[STS] Decides to spawn a new thread, T1, to handle worker's runnable
[STS] Loses timeslice before new thread can actually be made to handle work
[Other] Dispatches event to STS
[STS] Handles event from [Other], spawning a new thread, T2, to handle it
[STS] Inserts T2 into its list of threads, dispatches work to T2
[STS] Returns to handling T1, discovers that we now have too many threads.
[STS] Attempts to shut down T1 by dispatching event to [Worker]
[Worker] Hilarity ensues.

> The easiest way to fix this is just to implement nsICancelableRunnable on
> the event we dispatch, but I'm not convinced that's the correct approach. 
> Spinning the event loop doesn't work well with workers, which have a much
> more well-defined event loop and script execution behavior than the main
> thread.

Cancelable thread shutdown doesn't sound like the right thing here.

The second easiest thing seems like modifying the thread spawning behavior to make it more of an atomic operation.  Like extending the monitor lock to cover more ground, or ensuring there's a spot for the thread in mThreads here:

http://hg.mozilla.org/mozilla-central/annotate/1b687fcb5213/xpcom/threads/nsThreadPool.cpp#l96

thereby ensuring that when we spawn the thread later, we're not racing with another thread for it.  Doing that (without creating the thread there) does mean that we're changing how mThreads works, so maybe that's not so great.

> I would like to add an AsyncShutdown method to nsIThread.  Instead of
> spinning the event loop, we would maintain a list of threads we're waiting
> for on the *calling* thread and simply continue processing events during
> shutdown until that list is empty.  Then nsThreadPool could use
> AsyncShutdown and side step this issue entirely.  There are probably other
> places in our code that call nsThread::Shutdown that really don't need the
> thread to shut down before unwinding the stack, and they could be changed to
> use this too.  Thoughts?

I am all for using AsyncShutdown more, but I'm unclear how this helps the above scenario--we're placing the killed thread on a list of threads we're waiting for?  Can you elaborate?
Flags: needinfo?(nfroyd) → needinfo?(khuey)
(In reply to Nathan Froyd [:froydnj] from comment #5)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> > So, this one is fun.  What's happening is that code running on a worker
> > thread is calling NS_AsyncCopy, which is sending the actual copy off to the
> > stream transport service.
> > 
> > But if nsThreadPool decides to spawn a new thread to process the event, and
> > races with another thread to do so, it will try to shut down the newly
> > created thread from the thread that called Dispatch[0].  It dispatches a
> > runnable to the current thread to avoid spinning the event loop if the
> > caller doesn't expect it.  Dispatching that runnable fails, because workers
> > expect all runnables they receive to implement nsICancelableRunnable.
> 
> I love our thread pool implementation.
> 
> Just to make sure I understand, the sequence of steps here is something like:
> 
> [Worker] Calls NS_AsyncCopy
> [Worker] Dispatches event to STS
> [STS] Decides to spawn a new thread, T1, to handle worker's runnable
> [STS] Loses timeslice before new thread can actually be made to handle work
> [Other] Dispatches event to STS
> [STS] Handles event from [Other], spawning a new thread, T2, to handle it
> [STS] Inserts T2 into its list of threads, dispatches work to T2
> [STS] Returns to handling T1, discovers that we now have too many threads.
> [STS] Attempts to shut down T1 by dispatching event to [Worker]
> [Worker] Hilarity ensues.

The sequence of events is right, but you've got the threads things are running on confused.  What is really happens is:

[Worker] Calls NS_AsyncCopy
[Worker] Dispatches event to STS
[Worker] nsThreadPool::Dispatch decides to spawn a new thread, T1, to handle worker's runnable
[Worker] Loses timeslice before new thread can actually be made to handle work
[OtherThread] Dispatches event to STS
[OtherThread] nsThreadPool::Dispatch decides to spawn a new thread, T2, to handle this event
[OtherThread] Inserts T2 into its list of threads, dispatches work to T2
[Worker] Returns to handling T1, discovers that we now have too many threads.
[Worker] Attempts to shut down T1 by posting an event to the current thread.
[Worker] Hilarity ensues.

> > The easiest way to fix this is just to implement nsICancelableRunnable on
> > the event we dispatch, but I'm not convinced that's the correct approach. 
> > Spinning the event loop doesn't work well with workers, which have a much
> > more well-defined event loop and script execution behavior than the main
> > thread.
> 
> Cancelable thread shutdown doesn't sound like the right thing here.

To be clear, all XPCOM runnables sent to DOM worker threads implement nsICancelableRunnable.  Those that can be safely discarded simply have a nop Cancel, while those that need to run partially or fully do so from Cancel.  So we're not talking about actually having shutdown be cancelable.

> The second easiest thing seems like modifying the thread spawning behavior
> to make it more of an atomic operation.  Like extending the monitor lock to
> cover more ground, or ensuring there's a spot for the thread in mThreads
> here:
> 
> http://hg.mozilla.org/mozilla-central/annotate/1b687fcb5213/xpcom/threads/
> nsThreadPool.cpp#l96
> 
> thereby ensuring that when we spawn the thread later, we're not racing with
> another thread for it.  Doing that (without creating the thread there) does
> mean that we're changing how mThreads works, so maybe that's not so great.

It also means that the racing thread gets to a point where it says "ok, some other thread is creating the thread but hasn't done so yet"  What does that thread do then?  We could add another condvar or something I guess.

> > I would like to add an AsyncShutdown method to nsIThread.  Instead of
> > spinning the event loop, we would maintain a list of threads we're waiting
> > for on the *calling* thread and simply continue processing events during
> > shutdown until that list is empty.  Then nsThreadPool could use
> > AsyncShutdown and side step this issue entirely.  There are probably other
> > places in our code that call nsThread::Shutdown that really don't need the
> > thread to shut down before unwinding the stack, and they could be changed to
> > use this too.  Thoughts?
> 
> I am all for using AsyncShutdown more, but I'm unclear how this helps the
> above scenario--we're placing the killed thread on a list of threads we're
> waiting for?  Can you elaborate?

So in this scenario it helps because we can call this new AsyncShutdown immediately (since it doesn't spin the event loop its safe to call from nsThreadPool::Dispatch).
Flags: needinfo?(khuey)
Flags: needinfo?(benjamin)
Something like this.
Assignee: nobody → khuey
Status: NEW → ASSIGNED
Attachment #8657369 - Flags: review?(nfroyd)
Comment on attachment 8657369 [details] [diff] [review]
0009-async-nsThread-shutdown.patch

Review of attachment 8657369 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/tests/gtest/TestThreads.cpp
@@ +125,5 @@
>      }
>  }
>  
> +mozilla::Monitor* asyncShutdownReadyMonitor;
> +mozilla::Monitor* beginAsyncShutdownMonitor;

gAsyncShutdownReadyMonitor etc.?

@@ +127,5 @@
>  
> +mozilla::Monitor* asyncShutdownReadyMonitor;
> +mozilla::Monitor* beginAsyncShutdownMonitor;
> +
> +class nsAsyncShutdownThread1 : public nsIRunnable {

It would be much nicer if you gave these classes real names.  Even just removing "Thread" from their names would be an improvement.

::: xpcom/threads/nsIThread.idl
@@ +83,5 @@
>     */
>    boolean processNextEvent(in boolean mayWait);
> +
> +  /**
> +   * Shutdown the thread asynchronously.  This method immediately prvents

Typo: "prevents"

@@ +94,5 @@
> +   *
> +   * This method MAY NOT be executed from the thread itself.  Instead, it is
> +   * meant to be executed from another thread (usually the thread that created
> +   * this thread or the main application thread).  When this function returns,
> +   * it will no longer be possible to dispatch events to the thread, but the

Is this really true?  We can't actually ensure that, can we?

::: xpcom/threads/nsThread.cpp
@@ +248,5 @@
>  
>  // This event is responsible for notifying nsThread::Shutdown that it is time
>  // to call PR_JoinThread.
> +class nsThreadShutdownAckEvent : public nsRunnable,
> +                                 public nsICancelableRunnable

Can you add a comment for why we are implementing this interface?

@@ +757,5 @@
> +
> +  nsThreadShutdownContext* context = AsyncShutdownInternal();
> +  NS_ENSURE_TRUE(context, NS_ERROR_UNEXPECTED);
> +
> +  context->sync = true;

What do you say to passing syncness as a parameter into AsyncShutdownInternal, so we can ensure that AsyncShutdownInternal fully initializes the context before passing it to any consumers?
Attachment #8657369 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/6bb087e3ec08
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Regressions: 1573072
You need to log in before you can comment on or make changes to this bug.