Closed Bug 1082206 Opened 10 years ago Closed 10 years ago

Media no longer plays due to MDStateMachine blocked by old MediaSourceReader::Seek()

Categories

(Core :: Audio/Video, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: karlt, Assigned: karlt)

References

Details

Attachments

(1 file, 1 obsolete file)

There is one MDStateMachine thread, which is blocked in a MDStateMachine at
DECODER_STATE_SHUTDOWN waiting for FlushDecoding() to return.

FlushDecoding is waiting for decode tasks to complete.

A DecodeSeek() task is waiting in SourceBufferResource::Read() from
WebMReader::Seek().

The pages using MSE are no longer the current document.

Reproduced by

1. enabling these web platform tests annotated with current timeouts and
   failures.

  testing/web-platform/meta/media-source/mediasource-duration.html.ini
  testing/web-platform/meta/media-source/mediasource-play-then-seek-back.html.ini
  testing/web-platform/meta/media-source/mediasource-seek-beyond-duration.html.ini
  testing/web-platform/meta/media-source/mediasource-seek-during-pending-seek.html.ini

2. run

  ./mach web-platform-tests --include=media-source/ --include=old-tests/submission/Microsoft/sandbox/sandbox_002.htm --test-types testharness --pause-on-unexpected

Expected: sandbox_002.htm passes.
Actual: times out.
(gdb) thread 13
[Switching to thread 13 (Thread 0x7ffaefd4e700 (LWP 1097))]
#0  pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt 19
#0  pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffb3e353846 in PR_WaitCondVar (cvar=0x7ffb0a84ae80, 
    timeout=4294967295)
    at /mnt/ssd1/karl/moz/dev/nsprpub/pr/src/pthreads/ptsynch.c:385
#2  0x00007ffb35cef699 in mozilla::CondVar::Wait (this=0x7ffaea8f7630, 
    aInterval=4294967295)
    at /mnt/ssd1/karl/moz/dev/xpcom/glue/BlockingResourceBase.cpp:497
#3  0x00007ffb35cc25ee in mozilla::Monitor::Wait (this=0x7ffaea8f7610, 
    aInterval=4294967295) at ../../dist/include/mozilla/Monitor.h:40
#4  0x00007ffb381f5e6f in mozilla::MediaTaskQueue::AwaitIdleLocked (
    this=0x7ffaea8f7600)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaTaskQueue.cpp:116
#5  0x00007ffb381f5ff0 in mozilla::MediaTaskQueue::FlushAndDispatch (
    this=0x7ffaea8f7600, aRunnable=...)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaTaskQueue.cpp:138
#6  0x00007ffb381de4e6 in mozilla::MediaDecoderStateMachine::FlushDecoding (
    this=0x7ffae77c9000)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachine.cpp:2526
#7  0x00007ffb381dd657 in mozilla::MediaDecoderStateMachine::RunStateMachine (
    this=0x7ffae77c9000)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachine.cpp:2305
#8  0x00007ffb381e0407 in mozilla::MediaDecoderStateMachine::CallRunStateMachine (this=0x7ffae77c9000)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachine.cpp:3009
#9  0x00007ffb381e042a in mozilla::MediaDecoderStateMachine::TimeoutExpired (
    aClosure=0x7ffae77c9000)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachine.cpp:3015
#10 0x00007ffb381e1655 in mozilla::MediaDecoderStateMachineScheduler::TimeoutExpired (this=0x7ffb0a511380, aTimerId=11)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachineScheduler.cpp:160
#11 0x00007ffb381e0c51 in (anonymous namespace)::TimerEvent::Run (this=
    0x7ffaaf362130)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachineScheduler.cpp:23
#12 0x00007ffb35cbf743 in nsThreadPool::Run (this=0x7ffaefa9fd40)
    at /mnt/ssd1/karl/moz/dev/xpcom/threads/nsThreadPool.cpp:220
#13 0x00007ffb35cbcc20 in nsThread::ProcessNextEvent (this=0x7ffab99bed50, 
    aMayWait=false, aResult=0x7ffaefd4dd1f)
    at /mnt/ssd1/karl/moz/dev/xpcom/threads/nsThread.cpp:830
#14 0x00007ffb35cfe931 in NS_ProcessNextEvent (aThread=0x7ffab99bed50, 
    aMayWait=false) at /mnt/ssd1/karl/moz/dev/xpcom/glue/nsThreadUtils.cpp:265
#15 0x00007ffb361b0dd4 in mozilla::ipc::MessagePumpForNonMainThreads::Run (
    this=0x7ffad3b52280, aDelegate=0x7ffabef79720)
    at /mnt/ssd1/karl/moz/dev/ipc/glue/MessagePump.cpp:339
(More stack frames follow...)
(gdb) f 4
#4  0x00007ffb381f5e6f in mozilla::MediaTaskQueue::AwaitIdleLocked (
    this=0x7ffaea8f7600)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaTaskQueue.cpp:116
(gdb) p mRunningThread.mPtr->mThread->tid
$13 = 1095
(gdb) thread 14
[Switching to thread 14 (Thread 0x7ffaf455d700 (LWP 1095))]
#0  pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	in ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt 27
#0  pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffb3e354465 in PR_Wait (mon=0x7ffab37115b0, timeout=4294967295)
    at /mnt/ssd1/karl/moz/dev/nsprpub/pr/src/pthreads/ptsynch.c:691
#2  0x00007ffb35cef5dc in mozilla::ReentrantMonitor::Wait (
    this=0x7ffae7dcf378, aInterval=4294967295)
    at /mnt/ssd1/karl/moz/dev/xpcom/glue/BlockingResourceBase.cpp:466
#3  0x00007ffb35c8fbcd in mozilla::ReentrantMonitorAutoEnter::Wait (
    this=0x7ffaf455c5b0, aInterval=4294967295)
    at ../../dist/include/mozilla/ReentrantMonitor.h:190
#4  0x00007ffb38263de5 in mozilla::SourceBufferResource::Read (
    this=0x7ffae7dcf350, aBuffer=0x7ffaf455c69b "9\373\177", aCount=1, 
    aBytes=0x7ffaf455c60c)
    at /mnt/ssd1/karl/moz/dev/content/media/mediasource/SourceBufferResource.cpp:57
#5  0x00007ffb382e3f16 in mozilla::webm_read (aBuffer=0x7ffaf455c69b, 
    aLength=1, aUserData=0x7ffb0a8cb640)
    at /mnt/ssd1/karl/moz/dev/content/media/webm/WebMReader.cpp:78
#6  0x00007ffb391e4be9 in ne_io_read (io=0x7ffafd29bec0, 
    buffer=0x7ffaf455c69b, length=1)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:551
#7  0x00007ffb391e4d05 in ne_bare_read_vint (io=0x7ffafd29bec0, 
    value=0x7ffadc9894f8, length=0x0, maskflag=MASK_NONE)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:592
#8  0x00007ffb391e4e09 in ne_read_id (io=0x7ffafd29bec0, 
    value=0x7ffadc9894f8, length=0x0)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:624
#9  0x00007ffb391e5626 in ne_peek_element (ctx=0x7ffadc9894e0, 
    id=0x7ffaf455c758, size=0x7ffaf455c750)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:903
#10 0x00007ffb391e5bda in ne_parse (ctx=0x7ffadc9894e0, 
    top_level=0x7ffb3bd67ba0 <ne_cues_elements>, max_offset=-1)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:1050
#11 0x00007ffb391e77b5 in ne_init_cue_points (ctx=0x7ffadc9894e0, 
    max_offset=-1)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:1768
#12 0x00007ffb391e8483 in nestegg_track_seek (ctx=0x7ffadc9894e0, track=0, 
    tstamp=3021000000)
    at /mnt/ssd1/karl/moz/dev/media/libnestegg/src/nestegg.c:2140
#13 0x00007ffb382e7a23 in mozilla::WebMReader::Seek (this=0x7ffae77ab800, 
    aTarget=3021000, aStartTime=0, aEndTime=6042000, aCurrentTime=0)
    at /mnt/ssd1/karl/moz/dev/content/media/webm/WebMReader.cpp:1066
#14 0x00007ffb3825fc59 in mozilla::MediaSourceReader::Seek (
    this=0x7ffae77c8c00, aTime=3021000, aStartTime=0, aEndTime=6042000, 
    aCurrentTime=0)
    at /mnt/ssd1/karl/moz/dev/content/media/mediasource/MediaSourceReader.cpp:429
#15 0x00007ffb381dcd81 in mozilla::MediaDecoderStateMachine::DecodeSeek (
    this=0x7ffae77c9000)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaDecoderStateMachine.cpp:2136
#16 0x00007ffb3821bef0 in nsRunnableMethodImpl<void (mozilla::MediaDecoderStateMachine::*)(), void, true>::Run (this=0x7ffae7fac130)
    at ../../dist/include/nsThreadUtils.h:388
#17 0x00007ffb381f6316 in mozilla::MediaTaskQueue::Runner::Run (
    this=0x7ffb0a2741a0)
    at /mnt/ssd1/karl/moz/dev/content/media/MediaTaskQueue.cpp:194
#18 0x00007ffb35cbf743 in nsThreadPool::Run (this=0x7ffafd2527f0)
    at /mnt/ssd1/karl/moz/dev/xpcom/threads/nsThreadPool.cpp:220
#19 0x00007ffb35cbcc20 in nsThread::ProcessNextEvent (this=0x7ffab99bec60, 
    aMayWait=false, aResult=0x7ffaf455cd1f)
    at /mnt/ssd1/karl/moz/dev/xpcom/threads/nsThread.cpp:830
#20 0x00007ffb35cfe931 in NS_ProcessNextEvent (aThread=0x7ffab99bec60, 
    aMayWait=false) at /mnt/ssd1/karl/moz/dev/xpcom/glue/nsThreadUtils.cpp:265
#21 0x00007ffb361b0dd4 in mozilla::ipc::MessagePumpForNonMainThreads::Run (
    this=0x7ffad3b28c40, aDelegate=0x7ffabef793e0)
    at /mnt/ssd1/karl/moz/dev/ipc/glue/MessagePump.cpp:339
Could this be caused by bug 1056441?
The mReader->Shutdown() added to MediaSourceDecoder::Shutdown() in
https://bugzilla.mozilla.org/attachment.cgi?id=8496656&action=diff#a/content/media/mediasource/MediaSourceDecoder.cpp_sec2
works around this, but I'm not sure that the reader should be shut down from the main thread before decode tasks have completed.  (This is contrary to the MDStateMachine's shutdown dance.)

I'm wondering why Detach() hasn't set mEnded on the SourceBufferResource.
This bug happens when the MediaSource is destroyed before
MediaSourceDecoder::Shutdown() runs.

When HTMLMediaElement initiates MediaSourceDecoder Shutdown() in cycle
collection, it may no longer have access to the SourceBuffers to notify
through Detach.
Attachment #8505273 - Flags: review?(kinetik)
Comment on attachment 8505273 [details] [diff] [review]
detach TrackBuffer when unlinking SourceBuffer

Review of attachment 8505273 [details] [diff] [review]:
-----------------------------------------------------------------

If the TrackBuffer hasn't had Shutdown called on it (implied by this patch since Shutdown would've called DiscardDecoder), we're going to hit the assertion in ~MediaTaskQueue that it's Shutdown has not been called.

Also, it seems like it should be possible to call this from TrackBuffer's destructor rather than the guts of the CC, since the Detach isn't resolving any lifetime issues.
Attachment #8505273 - Flags: review?(kinetik) → review-
(Also, bug 865847 suggests NS_IMPL_CYCLE_COLLECTION_UNLINK_END_INHERITED is now preferred, but there are virtually no users in the tree compared to NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN_INHERITED...)
(In reply to Matthew Gregan [:kinetik] from comment #5)
> If the TrackBuffer hasn't had Shutdown called on it (implied by this patch
> since Shutdown would've called DiscardDecoder), we're going to hit the
> assertion in ~MediaTaskQueue that it's Shutdown has not been called.

TrackBuffer hasn't had Shutdown called on it, but Shutdown will be called
after MediaDecoderStateMachine::FlushDecoding() completes.

The fix for bug 1080958 did not make the DiscardDecoder() call early enough.
I've removed that call in this version, because TrackBuffer::Shutdown() is too
late.  The resource must be ended before FlushDecoding() can complete.

mCurrentDecoder is really a main thread field, but TrackBuffer::Shutdown()
occurs during reader shutdown on the decode queue.

> Also, it seems like it should be possible to call this from TrackBuffer's
> destructor rather than the guts of the CC, since the Detach isn't resolving
> any lifetime issues.

TrackBuffer's destructor runs later than TrackBuffer::Shutdown() and so is
also too late.

The way this usually works is that MediaSourceDecoder::Shutdown(), a
main-thread method, uses the MediaSource -> SourceBufferList ->
SourceBuffer::Detach() chain to end the resources.

When CC happens, MediaSourceDecoder::Shutdown is called from the
HTMLMediaElement to initiate decode queue shutdown.

However the HTMLMediaElement, SourceBufferLists, and SourceBuffers are all
part of a single strongly connected component at this point.  Their unlinks
and destruction can happen in any order and so the MediaSource -> SourceBuffer
chain does not necessarily exist.

SourceBuffer::Detach() is usually the method of notification when these
objects are disconnected, either through change of media element source or
removal of SourceBuffers.

In the case of CC, MediaSource cannot call SourceBuffer::Detach().  Instead
Unlink() is the notification that the detach is happening.

I guess this could be done from *Source*Buffer's destructor, but

1) It feels more intuitive to me to think of Unlink() a disconnect
notification, than the destructor.  This is only relevant during CC; a lonely
SourceBuffer does not have a TrackBuffer to notify.

2) The media state machine thread, shared by all MediaDecoders in the process,
is blocked on the resource ending when it gets notification from
MediaSourceDecoder::Shutdown().  I'd like to end the resource as soon as we
know that it should be ended.  I don't know what guarantees we have from the
CCer about when connected objects are deleted.  I could imagine it might want
to break destruction into chunks to limit blockage on the main thread.  It
separates unlink and delete into separate tasks at least.

(In reply to Matthew Gregan [:kinetik] from comment #6)
> (Also, bug 865847 suggests NS_IMPL_CYCLE_COLLECTION_UNLINK_END_INHERITED is
> now preferred, but there are virtually no users in the tree compared to
> NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN_INHERITED...)

Thanks.  I agree with the sentiments in that bug, so this version uses
UNLINK_END_INHERITED.
Attachment #8505808 - Flags: review?(kinetik)
Attachment #8505273 - Attachment is obsolete: true
Comment on attachment 8505808 [details] [diff] [review]
detach TrackBuffer when unlinking SourceBuffer v2

Review of attachment 8505808 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the analysis.
Attachment #8505808 - Flags: review?(kinetik) → review+
https://hg.mozilla.org/mozilla-central/rev/8e108e588348
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.