nsThread::ProcessNextEvent can busy-wait when a nested event loop is running on a thread that is being shut down

RESOLVED FIXED in mozilla29

Status

()

Core
XPCOM
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: roc, Assigned: roc)

Tracking

Trunk
mozilla29
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [rr])

Attachments

(1 attachment)

This bug was found with rr.

I have a content/media/test run leading to a state with the following two thread stacks:

nsThread 0x79e88710: (MediaDecoderStateMachine thread) 
#6  0x56f4efe7 in nsEventQueue::PutEvent (this=0x79e88978, runnable=0x63e5b0c0)
    at /home/roc/mozilla-central/xpcom/threads/nsEventQueue.cpp:124
#7  0x56f4f0cb in PutEvent (event=0x63e5b0c0, this=<optimized out>)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.h:101
#8  nsThread::PutEvent (this=0x79e88940, event=0x63e5b0c0, target=0x0)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:414
#9  0x56f54531 in nsThread::Shutdown (this=0x79e88940) at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:523
#10 0x588b6e8a in mozilla::MediaDecoderStateMachine::StopDecodeThread (this=0x774b1400)
    at /home/roc/mozilla-central/content/media/MediaDecoderStateMachine.cpp:1708
#11 0x588c596f in mozilla::MediaDecoderStateMachine::RunStateMachine (this=0x774b1400)
    at /home/roc/mozilla-central/content/media/MediaDecoderStateMachine.cpp:2193
#12 0x588c659c in mozilla::MediaDecoderStateMachine::CallRunStateMachine (this=0x774b1400)
    at /home/roc/mozilla-central/content/media/MediaDecoderStateMachine.cpp:2789
#13 0x588c6777 in mozilla::MediaDecoderStateMachine::Run (this=0x774b1400)
    at /home/roc/mozilla-central/content/media/MediaDecoderStateMachine.cpp:2766
#14 0x56f53836 in nsThread::ProcessNextEvent (this=0x79e88710, mayWait=true, result=0x7a9001ff)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:680

nsThread 0x79e88940: (MediaDecoder thread)
#0  nsThread::ProcessNextEvent (this=0x79e88940, mayWait=true, result=0x8e500fdf)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:681
#1  0x56ebea65 in NS_ProcessNextEvent (thread=<optimized out>, mayWait=true)
    at /home/roc/mozilla-central/xpcom/glue/nsThreadUtils.cpp:263
#2  0x56f51e01 in nsThread::DispatchInternal (this=0x559d14e0, event=0x7b518860, flags=1, target=0x0)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:455
#3  0x56f51ef8 in nsThread::Dispatch (this=0x559d14e0, event=0x7b518860, flags=1)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:471
#4  0x56ebed72 in NS_DispatchToMainThread (event=0x7b518860, dispatchFlags=1)
    at /home/roc/mozilla-central/xpcom/glue/nsThreadUtils.cpp:182
#5  0x588b6298 in mozilla::MediaDecoderStateMachine::DecodeMetadata (this=0x774b1400)
    at /home/roc/mozilla-central/content/media/MediaDecoderStateMachine.cpp:1943
#6  0x588c7b26 in mozilla::MediaDecoderStateMachine::DecodeThreadRun (this=0x774b1400)
    at /home/roc/mozilla-central/content/media/MediaDecoderStateMachine.cpp:508
#7  0x588cb6a8 in nsRunnableMethodImpl<void (mozilla::MediaDecoderStateMachine::*)(), void, true>::Run (
    this=0x7b518600) at ../../dist/include/nsThreadUtils.h:383
#8  0x56f53836 in nsThread::ProcessNextEvent (this=0x79e88940, mayWait=false, result=0x8e5011ff)
    at /home/roc/mozilla-central/xpcom/threads/nsThread.cpp:680

You can see that the MediaDecoderStateMachine thread has called Shutdown on the MediaDecoder thread while the MediaDecoder thread is doing a sync dispatch to the main thread. The sync dispatch does an endless sequence of NS_ProcessNextEvent() calls until the reply is received from the main thread. The problem is that in this case NS_ProcessNextEvent() does not block even though there is no event in the queue. That's because nsThreadShutdownEvent has run, setting mShutdownContext on the MediaDecoder thread, so ShuttingDown() returns true. In NS_ProcessNextEvent we have this code:

    nsCOMPtr<nsIRunnable> event;
    mEvents->GetEvent(mayWait && !ShuttingDown(), getter_AddRefs(event));
    ...
    if (event) {
      ...
      event->Run();
    } else if (mayWait) {
      ...
      rv = NS_ERROR_UNEXPECTED;
    }

mayWait is true, but GetEvent is not allowed to block. We return NS_ERROR_UNEXPECTED, which is ignored.

This is not a very serious problem in practice since eventually we'll switch to the main thread, process its event, send the reply, and the MediaDecoder thread will break out of its loop. But it is a bit wasteful, for example if the main thread is busy for some reason we'll be burning twice as much CPU while we wait for it to complete.

I think the right fix is to ignore ShuttingDown() if we're in a nested event loop. Obviously if we're in the toplevel event loop ShuttingDown() means we should exit as soon as there are no more events, but if we're in a nested event loop that doesn't make sense since we must be waiting for some other state change. In the sync dispatch case, we could consider not waiting for the sync reply and just returning if we're supposed to shut down, but that sounds rather dangerous.
Created attachment 8368505 [details] [diff] [review]
fix
Assignee: nobody → roc
Attachment #8368505 - Flags: review?(benjamin)

Comment 2

4 years ago
jmathies and I were just talking about something like this in bug 951120 comment 47. I didn't understand what the ShuttingDown check was for, although I think I get it now: we want to empty the pending event queue before shutting down.

Comment 3

4 years ago
Comment on attachment 8368505 [details] [diff] [review]
fix

Please add a comment above the reallyWait variable explaining why we're doing this. This method is complicated enough that I forget why the conditions are all there.
Attachment #8368505 - Flags: review?(benjamin) → review+

Comment 4

4 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> jmathies and I were just talking about something like this in bug 951120
> comment 47. I didn't understand what the ShuttingDown check was for,
> although I think I get it now: we want to empty the pending event queue
> before shutting down.

Yeah we don't want to wait when shutting down, we just want to clear and pending events and get out of there. 

You have to be careful about nsThread::ProcessNextEvent for the main thread, it can get wrapped up in a lot of crazy code via app shell.
https://hg.mozilla.org/mozilla-central/rev/2c7dbe305a8a
https://hg.mozilla.org/mozilla-central/rev/935b9884d770
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
You need to log in before you can comment on or make changes to this bug.