getUserMedia request with audio stalls the video track if no audio output available

RESOLVED FIXED in Firefox 60

Status

()

defect
P3
normal
Rank:
25
RESOLVED FIXED
2 years ago
Last year

People

(Reporter: xpeng, Assigned: pehrsons)

Tracking

(Blocks 1 bug)

56 Branch
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 wontfix, firefox57 wontfix, firefox58 wontfix, firefox59 wontfix, firefox60 fixed)

Details

Attachments

(8 attachments, 1 obsolete attachment)

9.58 MB, text/plain
Details
59 bytes, text/x-review-board-request
padenot
: review+
Details
59 bytes, text/x-review-board-request
padenot
: review+
Details
59 bytes, text/x-review-board-request
padenot
: review+
Details
59 bytes, text/x-review-board-request
padenot
: review+
Details
59 bytes, text/x-review-board-request
bwc
: review+
Details
59 bytes, text/x-review-board-request
padenot
: review+
Details
59 bytes, text/x-review-board-request
padenot
: review+
Details
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Build ID: 20170926190823

Steps to reproduce:

1. disable the speaker in audio settings of windows.(windows 10)
2. test the demo : https://webrtc.github.io/samples/src/content/devices/input-output/



Actual results:

there is no video display. if enable the speaker again and retest the demo the video will be normal


Expected results:

the video should be shown in the demo
Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core
I can reproduce also on our test page with the "Audio & Video" button:
https://mozilla.github.io/webrtc-landing/gum_test.html

This only happens when there's an audio input present and you request it, but no audio output (we use callbacks from the output to drive our tracks). That is a very narrow usecase.

Paul, is this something we want to cover? The easiest short-term fix I imagine would be to reject the gUM request.
Status: UNCONFIRMED → NEW
Rank: 25
Ever confirmed: true
Flags: needinfo?(padenot)
Priority: -- → P3
Summary: fail to get video from camera if disable speaker in ff56 → getUserMedia request with audio stalls the video track if no audio output available
Andreas, in theory we should see [0], and it should fall back to a system thread. What is important is to find why it's not the case. Now, how to reject a promise there or something.

The stack frames are roughly going to look like this:
- MediaManager.cpp: Start on the Engine [1]
- MediaEngineWebRTCAudio.cpp: Start on the AudioInputCubeb [2]
- SourceMediaStream OpenAudioInput [3], then the little ControlMessage dance to tell the MSG to open an input. This leads to setting a new AudioCallbackDriver, and asking for a driver switch.
- In GraphDriver.cpp, AudioCallbackDriver::Init (this is super asynchronous), something is going to fail, but it's not switching to a system thread as a fallback.
- We need to plumb in a closure (in the software engineering sense, not C++ sense) that contains a way to get back to the gUM call. This can be using a void* (so we don't dereference the promise off-main-thread), or it can be using the CoatCheck. I think it would be a tad cleaner.

None of this is of high priority, and the stack trace above (based on the current code) will be incorrect in a few weeks, since a bunch of stuff is being rewritten. However the same general idea will apply.

We should still try to understand why [0] is not being hit. It's a fallback path for when audio devices are not available.

[0]: http://searchfox.org/mozilla-central/source/dom/media/GraphDriver.cpp#693
[1]: http://searchfox.org/mozilla-central/source/dom/media/MediaManager.cpp#1229
[2]: http://searchfox.org/mozilla-central/source/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#525
[3]: http://searchfox.org/mozilla-central/source/dom/media/MediaStreamGraph.cpp#2733
[4]: http://searchfox.org/mozilla-central/source/dom/media/systemservices/MediaUtils.h#279
Flags: needinfo?(padenot) → needinfo?(apehrson)
I tested again, and if the speaker is disabled already when we go for the first gUM request, it will be rejected.

However, if we have already made a successful gUM request with audio and then disable the speaker, we just hang.

The simpler STR then seems to be:
Make sure your windows machine has a camera device, an audio input device and an audio output device
Go to https://mozilla.github.io/webrtc-landing/gum_test.html
Start an Audio & Video capture
Disable all audio output devices in the "Sound" control panel

Expected:
Video still flows

Actual:
Video freezes, "FPS now" goes down to 0.0.
Ok good. We need to implement IAudioSession::OnSessionDisconnected and it will be fixed. 

[0]: https://msdn.microsoft.com/fr-fr/library/windows/desktop/dd370797(v=vs.85).aspx
Here's a log file for the steps in comment 3.

When the output gets disabled we start flip-flopping between an audio driver (that fails to init and switches to clock) and a system clock driver (that seemingly cranks the MSG with duration 0 and then switches back to audio).

After the steps were complete, I also re-enabled the output device. At that point I saw one new frame from the camera, but it was still stalled. I don't see the frame reflected in the log, but I do see the audio driver working again, although the stream doesn't progress.

Paul, can you take a look at these bits?
Flags: needinfo?(apehrson) → needinfo?(padenot)
I don't have time right now. I'm keeping the NI.
I think I know why this happens. I hit something similar when stopping audio devices in bug 1299515.

There's no input callback from the audio layer so no samples get pushed into the audio track in the MediaStreamGraph. This blocks the stream that contains both this audio track and the video track that stalls. Since it's blocked, StreamTime doesn't advance and the `delta` in `NotifyPull()` doesn't become positive, [1].

I can put up a small upliftable patch to fix this, by not making the audio source block.


[1] https://searchfox.org/mozilla-central/rev/797c93d81fe446f78babf20894f0729f15f71ee6/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp#357
Assignee: nobody → apehrson
Flags: needinfo?(padenot)
Status: NEW → ASSIGNED
Comment on attachment 8930883 [details]
Bug 1408294 - Append null audio data when audio input underruns.

https://reviewboard.mozilla.org/r/201992/#review207896

This is the wrong way to fix this (but I'm sure it works). How about implementing comment 4? It's not very hard, I can tell you what to do.

I'd like to stop adding hacks to the code base.
Attachment #8930883 - Flags: review?(padenot) → review-
(In reply to Paul Adenot (:padenot) from comment #9)
> Comment on attachment 8930883 [details]
> Bug 1408294 - Append null audio data when audio input underruns.
> 
> https://reviewboard.mozilla.org/r/201992/#review207896
> 
> This is the wrong way to fix this (but I'm sure it works). How about
> implementing comment 4? It's not very hard, I can tell you what to do.
> 
> I'd like to stop adding hacks to the code base.

Sure. My primary reason for this patch is bug 1299515 however.

There I'm making it possible to stop the input device when all tracks it is feeding become disabled - and start it again when a track gets enabled. So the track doesn't actually end, so we have to keep feeding it data or the SourceMediaStream it is in becomes blocked and video stalls (because `delta` is non-positive at [1]).

Long term video would probably get fixed by the switch to pushing data instead of pulling - or going from streams to tracks as first class citizens in the MSG. Both of those are a bit far away at this point I think.

How would you prefer to solve it with this case in mind?


[1] https://searchfox.org/mozilla-central/rev/33c90c196bc405e628bc868a4f4ba29b992478c0/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp#355
Flags: needinfo?(padenot)
I think we ended up deciding to take this, but to try enforce the invariant that either NotifyPull or NotifyInput is called during an iteration.
Flags: needinfo?(padenot)
We said to signal explicitly that we're switching from NotifyInput to NotifyPull. That would work for bug 1299515 but not for this. I can look at enforcing this invariant instead, to try to solve this case too, but I'm not sure how easy that will be.
Have a look. If it looks very bad, then we can just take this instead.
I went with the invariant instead. It was quite easy in the end. It's not so strong though as I cannot enforce either NotifyPull or NotifyInput per iteration, but only NotifyPull or InsertInGraph per iteration. (the first NotifyInputData doesn't insert anything into the graph so a NotifyPull will still happen and is legit)
This is good yeah
Comment on attachment 8932848 [details]
Bug 1408294 - Set iteration state before calling NotifyInputData.

https://reviewboard.mozilla.org/r/203884/#review209378
Attachment #8932848 - Flags: review?(padenot) → review+
Comment on attachment 8930883 [details]
Bug 1408294 - Append null audio data when audio input underruns.

https://reviewboard.mozilla.org/r/201992/#review209382

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp:215
(Diff revision 2)
>    , mSampleFrequency(MediaEngine::DEFAULT_SAMPLE_RATE)
>    , mTotalFrames(0)
>    , mLastLogFrames(0)
>    , mPlayoutDelay(0)
> +#if DEBUG
> +  , mLastAppendTime(TimeStamp::Now())

Why use a timestamp and not the begin/end of iteration ?

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp:617
(Diff revision 2)
> +  MediaStreamGraphImpl* graph = static_cast<MediaStreamGraphImpl*>(aGraph);
> +  TimeStamp currentTime = graph->CurrentDriver()->GetCurrentTimeStamp();
> +  MOZ_ASSERT(currentTime > mLastAppendTime,
> +             "CurrentTime didn't advance since last append");
> +  mLastAppendTime = currentTime;
> +#endif

Factor this.

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp:700
(Diff revision 2)
>        mLastLogFrames = mTotalFrames;
>      }
>    }
>  
> +#ifdef DEBUG
> +  if (mSources.Length() >= 1) {

if (!mSources.IsEmpty())

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp:702
(Diff revision 2)
>    }
>  
> +#ifdef DEBUG
> +  if (mSources.Length() >= 1) {
> +    MediaStreamGraphImpl* graph = mSources[0]->GraphImpl();
> +    TimeStamp currentTime = graph->CurrentDriver()->GetCurrentTimeStamp();

With this.
Attachment #8930883 - Flags: review?(padenot)
Comment on attachment 8930883 [details]
Bug 1408294 - Append null audio data when audio input underruns.

https://reviewboard.mozilla.org/r/201992/#review209382

> Why use a timestamp and not the begin/end of iteration ?

There's also GraphImpl::IterationEnd(), but there's no guarantee that it's increasing: https://searchfox.org/mozilla-central/rev/9f3bd430c2b132c86c46126a0548661de876799a/dom/media/GraphDriver.cpp#979

GetCurrentTimeStamp() gives such a guarantee as it's the wall clock time of when the iteration started.
Attachment #8930883 - Attachment is obsolete: true
Comment on attachment 8940708 [details]
Bug 1408294 - Simplify MSG integration code for MediaPipelineReceive.

https://reviewboard.mozilla.org/r/210970/#review216704
Attachment #8940708 - Flags: review?(docfaraday) → review+
Comment on attachment 8940705 [details]
Bug 1408294 - Assert that NotifyPull produces data.

https://reviewboard.mozilla.org/r/210964/#review216752
Attachment #8940705 - Flags: review?(padenot) → review+
Comment on attachment 8940706 [details]
Bug 1408294 - Make the SourceMediaStream used with AudioCapture produce silence.

https://reviewboard.mozilla.org/r/210966/#review216754
Attachment #8940706 - Flags: review?(padenot) → review+
Comment on attachment 8940707 [details]
Bug 1408294 - Assert that we don't underrun globally.

https://reviewboard.mozilla.org/r/210968/#review216756
Attachment #8940707 - Flags: review?(padenot) → review+
Comment on attachment 8940710 [details]
Bug 1408294 - Don't assume there is always a listener feeding a SourceMediaStream.

https://reviewboard.mozilla.org/r/210974/#review216758
Attachment #8940710 - Flags: review?(padenot) → review+
Comment on attachment 8940709 [details]
Bug 1408294 - Append null audio data when audio input underruns.

https://reviewboard.mozilla.org/r/210972/#review216760
Attachment #8940709 - Flags: review?(padenot) → review+
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e34c65c76ba9
Set iteration state before calling NotifyInputData. r=padenot
https://hg.mozilla.org/integration/mozilla-inbound/rev/359484c0461e
Assert that NotifyPull produces data. r=padenot
https://hg.mozilla.org/integration/mozilla-inbound/rev/484ae7c12c96
Make the SourceMediaStream used with AudioCapture produce silence. r=padenot
https://hg.mozilla.org/integration/mozilla-inbound/rev/2cbf4ca9a05d
Assert that we don't underrun globally. r=padenot
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0172a1b5d25
Simplify MSG integration code for MediaPipelineReceive. r=bwc
https://hg.mozilla.org/integration/mozilla-inbound/rev/0fdd4ce7e10e
Append null audio data when audio input underruns. r=padenot
https://hg.mozilla.org/integration/mozilla-inbound/rev/c7bd57a9d7ef
Don't assume there is always a listener feeding a SourceMediaStream. r=padenot
You need to log in before you can comment on or make changes to this bug.