Closed Bug 1335512 Opened 5 years ago Closed 4 years ago

all video playback stops working after a while when plug or unplug external headphone.

Categories

(Core :: Audio/Video: cubeb, defect, P1)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox53 --- wontfix
firefox54 + wontfix
firefox55 --- affected

People

(Reporter: nika, Assigned: achronop)

References

Details

(Keywords: DevAdvocacy, stale-bug)

Attachments

(4 files)

After using Firefox on macOS sierra for an extended period of time, playback of videos ceases to work, showing only the first frame of playback, not allowing the ticker to progress, and allowing no other videos to play back. Restarting Firefox fixes the issue.

This occurs on all types of video elements as far as I can tell.
I've seen this many times, but it really only occurs after an extended period of usage, usually at least a day. When I was seeing bug 1307013, it might have been because of the same kind of error.
Do we have telemetry for how much errors like "DoDecode: VTDecompressionSessionDecodeFrame:ffffcd93" are encountered in the wild, and on what platform/version?
Component: Audio/Video → Audio/Video: Playback
This occurred again to me in nightly. I restored my computer, opened a tab, and loaded a youtube video in it. The video did not play.

I'm running a decently large session with probably ~50 tabs open across 2 windows.

I've attached a screenshot of what the progress bar looked like after waiting for a few minutes.
Attached file about:support
I've also attached my about:support. I don't _think_ session length is on there, but I'm guessing it's probably around 2.5 days?
Michael,

Thanks for keep updating this bug. We also noticed this issue in-house. However, it's quite hard to reproduce, so we haven't had a solid idea to it.

Could you please help to install the "about:media" add-on on your Nightly, and record the information of "about:media" while you encounter this issue next time? The "about"media" page is much more informative to the media-related issues.

Link to the about:media add-on: https://addons.mozilla.org/En-us/firefox/addon/about-media/?src=category
(In reply to Tzuhao Kuo [:kaku] from comment #5)
> Michael,
> 
> Thanks for keep updating this bug. We also noticed this issue in-house.
> However, it's quite hard to reproduce, so we haven't had a solid idea to it.

I find I reproduce it about once a week, I'll let you know the next time it happens for me. If you have any people in the Toronto office, I can try to keep the broken tab open and bring it to work the next day if someone wants to take a closer look at it.
 
> Could you please help to install the "about:media" add-on on your Nightly,
> and record the information of "about:media" while you encounter this issue
> next time? The "about"media" page is much more informative to the
> media-related issues.

I've installed it, and I'll post results next time the problem comes up.
I just reproduced it.

The failure happened in content process 62344. I was watching a video, and accidentally unplugged my headphones from the audio jack. When I reconnected them the video froze and the YouTube loading spinner appeared, despite the video having buffered data.

Reloading the page caused a freeze frame of the frame in the video which was stopped to appear. The video progress bar stayed in one place, and the video no longer played. Opening a new tab in a different process works, and video playback in that process works. Opening in the same tab as the broken video does not work.

about:media was completely empty. When I load videos in a new process the video runs as normal, but video in the broken process does not appear in about:media. I checked in the DOM and there are definitely <video> elements in the webpages.

The video was this one (if it matters): https://www.youtube.com/watch?v=kejQuaozBS8

I attached a debugger and collected a bunch of information, which I'm attaching to the bug. I noticed that a series of threads in the affected process are waiting on a mutex (in pthread_mutex_lock). I then took a look at what the stacks of these threads are (which will be in the zip file in the next comment), and did some poking around in the assembly in that code. I think that I have identified the locks which the code is currently waiting on in each of these threads:

  thread #40: tid = 0xac1c25, 0x00007fffa5ca0c22 libsystem_kernel.dylib`__psynch_mutexwait + 10, queue = 'com.apple.audio.CADispatchQueue.SerialQueue'
 -- is blocked on this mutex --
http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#704

  thread #41: tid = 0xac343a, 0x00007fffa5ca0c22 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'MediaPl~ack #10'
 -- is blocked on this mutex --
http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#2494

  thread #46: tid = 0xac3448, 0x00007fffa5ca0c22 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'MediaPl~ack #11'
 -- is blocked on this mutex --
http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#2640

  thread #47: tid = 0xac344f, 0x00007fffa5ca0c22 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'MediaPl~ack #12'
 -- is blocked on the same mutex in the same place as #41 --
http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#2494

I don't know which thread holds each of these locks. I tried rooting around to figure it out, but couldn't figure out how to extract the thread id from a pthread mutex object on darwin without any debug information.

I have no familiarity with this code, and did all of this by inspecting the assembly and making my best guess as to which mutex lock each of these threads is holding. I will attach the information I collected from lldb to the bug in another comment so people can check my work.

It looks to me like an audio thread got into a deadlock situation while holding a lock which all media playback threads grab early during their startup, so newly started threads are immediately deadlocking. I imagine I'd need more familiarity with our code and with libcubeb in order to be able to understand what these locks are for or how to fix it.
Flags: needinfo?(kaku)
Duplicate of this bug: 1350392
Awesome finding...

:kinetik, could you have a look or assign to someone who knows here? (seeing that :padenot is on PTO)
Flags: needinfo?(kinetik)
Component: Audio/Video: Playback → Audio/Video: cubeb
Assignee: nobody → achronop
A quick glance at the code, and there's possibility of a deadlock there:

http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#2452

audiounit_stream_init takes a lock there:
http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#2494 :  auto_lock context_lock(context->mutex);

and again:
takes auto_lock lock(stm->mutex); before calling audiounit_setup_stream()

which may call audiounit_property_listener_callback before auto_lock lock(stm->mutex); which will attempt to take the lock again.

I also note that:
audiounit_reinit_stream: http://searchfox.org/mozilla-central/rev/66d9eb3103e429127c85a7921e16c5a02458a127/media/libcubeb/src/cubeb_audiounit.cpp#609

doesn't appear to take stm->mutex before calling audiounit_setup_stream
I have solved a number of deadlocks during plug/unplug and they keep appearing. It is very hard to repro and I use debug patches which sleep one thread in order to make the other one deadlock thus I can never be sure if I have solve the reported bug or something similar. The problem with those bugs is that there is a deadlock somewhere deep down in the CoreAudio framework and all I can do is to avoid the problematic methods.

Last patch landed on m-c about it was on 2017-5-23 and it should have been in Nightly around 1 day after. Can you please check the build id of your Nightly. In addition is there any chance to have the output of `bt all` from the debugger.

Thanks for the analysis, it's very helpful.
Flags: needinfo?(kinetik)
Flags: needinfo?(kaku)
Flags: needinfo?(michael)
It looks as though at least thread 40 died since the last time I suspended my firefox content process. Might have been because of me debugging for too long? Not sure.

I've attached the result of `bt all` to the bug (after additional usage of the browser, but without restarting it). I'm running nightly 2017-05-28.
Flags: needinfo?(michael) → needinfo?(achronop)
Summary: Sierra all video playback stops working → all video playback stops working
OS: Unspecified → Mac OS X
Duplicate of this bug: 1356757
Duplicate of this bug: 1350141
Duplicate of this bug: 1347301
Summary: all video playback stops working → all video playback stops working after a while.
from comment 4 it sounds like 55 is affected
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1360880
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Duplicate of this bug: 1360880
Depends on: 1368910
Running with TSAN enabled clang on OS X, many warning are reported against audiounit backend.  We need to investigate if these are false positives because of atomic<> usage or valid.

:kinetik opened a tracking bug for enabling ASAN/TSAN on cubeb CI (https://github.com/kinetiknz/cubeb/issues/318) but CI infrastructure doesn't have real hardware so we don't get great coverage.
Duplicate of this bug: 1346586
Summary: all video playback stops working after a while. → all video playback stops working after a while when plug or unplug external headphone.
Keywords: DevAdvocacy
Track 54+ as the video playback is impacted.
Gerry, you have r- the uplift request fixing this bug... So this bug can't be fixed in 54..
Flags: needinfo?(gchang)
(In reply to Jean-Yves Avenard [:jya] from comment #25)
> Gerry, you have r- the uplift request fixing this bug... So this bug can't
> be fixed in 54..
I assume you mean Ritu r- in bug 1368910 comment 7.
(In reply to Jean-Yves Avenard [:jya] from comment #25)
> Gerry, you have r- the uplift request fixing this bug... So this bug can't
> be fixed in 54..

Hi :jya,
Right. Mark 54 won't fix.
Flags: needinfo?(gchang)
Hi,

Try switching to another application and switch the output device as a workaround. Haven't tested it much, though. Works fine till now.

I thought it might be because I was using Bongiovi DPS plugin for audio management, but playback on other browsers weren't affected.
Rank: 15
I will close it for now, feel free to reopen if you have it again (test Nightly please).
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
REOPENing per above comment. I am experiencing bug 1360880 comment 0 again. I guess "Stats for nerds" doesn't really help but I am keeping a record here:

Video ID:
jsVnuw5Bv0s
Dimensions:
640 x 480 * 2.4000000953674316
Resolution:
480 x 360@24
Optimal Resolution:
480 x 360@24
Volume:
100%
Relative Loudness:
-10.8 db
Stream Host:
r7---sn-ipoxu-un5z
Stream Type:
https
CPN:
kewvVlcAUikw-4A1
Mime Type:
video/mp4; codecs="avc1.4d4015"
DASH:
yes (134/251)
Connection Speed:
29204 Kbps
Buffer Health:
127.8 s
Network Activity:
0 KB
Dropped Frames:
-
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Can you please provide your version? The description of Bug 1360880 is different than this one. I'm not sure why it is marked as duplicate of this one.
(In reply to Alex Chronopoulos [:achronop] from comment #31)
> Can you please provide your version? 

Nightly 2017-07-11.

> The description of Bug 1360880 is
> different than this one. I'm not sure why it is marked as duplicate of this
> one.

That would be a question for Blake :)
Flags: needinfo?(bwu)
My bad! 
It looks like different bug.
Flags: needinfo?(bwu)
Duplicate of this bug: 1386189
This is an assigned P1 bug without activity in two weeks. 

If you intend to continue working on this bug for the current release/iteration/sprint, remove the 'stale-bug' keyword.

Otherwise we'll reset the priority of the bug back to '--' on Monday, August 28th.
Keywords: stale-bug
Closing this one since it is not the same with the intending reopen
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.