Closed Bug 803799 Opened 7 years ago Closed 7 years ago

Audio delay caused by blocking MediaGraph for too long at getUserMedia startup

Categories: Core :: WebRTC: Audio/Video (defect)

Status: VERIFIED FIXED
Target Milestone: mozilla19
Tracking Status: firefox18 --- unaffected

People: (Reporter: jesup, Assigned: jesup)

Whiteboard: [getUserMedia] [blocking-gum+]

Attachments: (1 file, 1 obsolete file)

When requesting audio+video via getUserMedia(), on Windows (and occasionally on Mac) the audio will be delayed by ~1 second.

The cause is that sometimes the camera takes a while to start, and VieBase->StartCapture() doesn't return quickly.  This means that we can't call VideoSource->Start() from the MediaGraph callback, so we need to proxy it to another thread.
Comment on attachment 673528 [details] [diff] [review]
Move WebRTC media start off MediaGraph; add MediaManager mutex

Initial cut at a solution (works on Linux, crashes on Windows)

Add a mutex to the MediaManager singleton (this should have always been there).
Move code to create the mMediaThread into MediaManager::GetThread()

Add a MediaOperation runnable that runs on mMediaThread, to proxy the consumption-changed notifications from MediaGraph (since starting a camera capture can take seconds to return).
Attachment #673528 - Flags: feedback?(roc)
Attachment #673528 - Flags: feedback?(anant)
Comment on attachment 673528 [details] [diff] [review]
Move WebRTC media start off MediaGraph; add MediaManager mutex

This makes MediaManager quite complicated! This problem wasn't occurring before because we turned on the camera during Allocate() - which is already called off the main thread - but we changed it to happen in Start() when fixing bug 802411.

I think that's a much more elegant solution. Though it does mean the camera will be turned on even if the stream is never used, I think it's well worth the trade-off here.
Attachment #673528 - Flags: feedback?(anant)
Alternatively, it might be better for MediaEngineWebRTC to start the camera off the main thread and return from Start() quickly; that would be a simpler solution than the current one because no locking is required.

I prefer the solution in comment #3, though.
I actually prefer to start the camera as soon as permission is given.  In some cases it will save you a 1-3 second camera start time, and it also reminds you that something still has permission to use the camera any time it wants (avoiding "snoop" apps that turn off the camera after, say, taking a picture, but hold a copy of the MediaStream to re-enable access whenever they want).  We'll *really* need to implement stop() though, and the other issues dealt with in this patch still apply.
FYI, it's not quite as simple a switch: currently the channels are allocated off the main thread *before* a main-thread runnable is created to create the InputStream and return success, so modifying the current ProcessGetUserMedia() call to do this will be tricky, and we can't do it on the main-thread runnable that creates the InputStream.  So either we call back off the main thread again after creating the InputStream to start the channels, or we start them first off the main thread and assign streams to them after the fact.  I'm tempted to minimize the swizzling by simply dispatching an off-main-thread runnable again once the InputStream has been created, which can basically do what was being done from the NotifyQueuedTrackChanges callback.
Sorry s/NotifyQueuedTrackChanges/NotifyConsumptionChanged/
Attachment #673528 - Attachment is obsolete: true
Attachment #673528 - Flags: feedback?(roc)
Comment on attachment 673690 [details] [diff] [review]
Start gUM streams in Success callback; add MediaManager mutex

No delay on Windows audio with this.  Note that you can't stop a stream with just this patch.
Attachment #673690 - Flags: review?(roc)
Attachment #673690 - Flags: review?(anant)
Comment on attachment 673690 [details] [diff] [review]
Start gUM streams in Success callback; add MediaManager mutex

Review of attachment 673690 [details] [diff] [review]:
-----------------------------------------------------------------

Added some commentary on why things are done the way they are

::: content/media/webrtc/MediaEngineWebRTCVideo.cpp
@@ +250,5 @@
>  
>    mSource->AddTrack(aID, USECS_PER_S, 0, new VideoSegment());
>    mSource->AdvanceKnownTracksTime(STREAM_TIME_MAX);
>    mLastEndTime = 0;
> +  mState = kStarted;

Note: this was important once we got Start() off the MediaGraphThread, since the NotifyPull() code doesn't push anything at the track (even null images) if it thinks we're not in the kStarted (aka running) state.

::: dom/media/MediaManager.h
@@ +94,5 @@
> +    : mType(aType)
> +    , mAudioSource(aAudioSource)
> +    , mVideoSource(aVideoSource)
> +    , mStream(nullptr)
> +    , mSourceStream(aStream)

We need this separate constructor and the raw SourceMediaStream argument because the Listener can't create a MediaOperation runnable and copy the refptr'd stream during a MediaGraph callback, since nsDOMMediaStream isn't thread-safe.

@@ +160,5 @@
> +    }
> +    if (mType != MEDIA_RELEASE) {
> +      // nsDOMMediaStreams aren't thread-safe... sigh.
> +      mType = MEDIA_RELEASE;
> +      NS_DispatchToMainThread(this);

We can't even create a "ReleaseMe" runnable and assign over the stream to it while off main-thread, so we have to re-use this runnable (which is actually easy).

@@ +200,5 @@
> +    // We can't take a chance on blocking here, so proxy this to another
> +    // thread.
> +    // XXX FIX! I'm cheating and passing a raw pointer to the sourcestream
> +    // which is valid as long as the mStream pointer here is.  Need a better solution.
> +    runnable = new MediaOperationRunnable(MEDIA_STOP, 

yeah, trailing space

@@ +278,5 @@
> +  }
> +  static nsIThread* GetThread() {
> +    MutexAutoLock lock(Get()->mMutex); // only need to call Get() once
> +    if (!sSingleton->mMediaThread) {
> +      NS_NewThread(getter_AddRefs(sSingleton->mMediaThread));

Little need to protect against failure here unless we want to log the failure or force a crash always, which nothing else I looked at does for failed thread creation.  This just effectively captures a repeated sequence done outside the class.  Calling code will get NULL back if it fails in any case, and it can assert or otherwise fail the operation.
I'm still not sure I understand why all this is necessary; wouldn't moving the GIPS StartCapture call from Start() to Allocate() in MediaEngineWebRTCVideo be sufficient?
We could.  We'd have to move the Render calls as well (unless we're certain they can't block), and do something similar for Audio.  It does lead to a confusing API for the Video/AudioSources, in that Allocate starts the camera/mic and Start enables insertion.  It means that if we want different behaviors for when the camera starts and stops, we'd need to refactor the A/VSource code again to do so.

And if we do it in Allocate, we'll block starting of streams by up to a second or more before we start up the MediaStream tracks.  In practice, this would delay audio start in an audio+video stream by up to a second, or more.  Right now audio is added and started first (and is fast), then video which can stall the mMediaThread in StartCapture.  This lets audio work while waiting for video to actually start.  And both are done *after* sending success back to mainthread, so the app can hook up the MediaStream to a video element.

If we really wanted to do this, I'd separate Allocate from StartDevice (which actually starts capturing and throwing away the data) from StartTrack (which adds the track and turns on insertion).  This allows the overlying code to decide when these things happen, instead of baking it all the way down to the lowest API level.  I think those decisions should remain at the MediaManager level.  And I'd keep the ordering.  We could even consider using a thread pool (if we're worried about blocking lots of operations on mMediaThread).  But I don't see a need for that yet, and it wouldn't be hard to implement given the current implementation.
The reason I mention this is that the original MediaEngine interface behaved that way, i.e. Allocate() would start the camera. This is why there is extra logic in MediaManager.cpp to make sure that is done off the main thread, and I was looking to avoid duplicating that again to create yet another runnable with locks.

It was then expected that Start() would be called on the main thread to make sure the track insertion can be done correctly: http://hg.mozilla.org/mozilla-central/file/b80e81edc9f1/content/media/webrtc/MediaEngineWebRTCVideo.cpp#l178

We only changed this behavior during the refactor, and in retrospect I don't think we should have. I don't think the delay is a problem, since Start isn't called until long after Allocate for typical use cases. Also, we don't send the success callback to the main thread until after Allocate returns (which will block, if we move StartCapture/Render there). It is also not possible to call Start until Allocate returns, on any thread.

It is definitely cleaner to separate StartDevice from Allocate. But I don't think we lose a whole lot either, since if Start() isn't called, the frames don't go anywhere, so the performance impact is minimal. Would there be a use case in the future where we'd want to Allocate a device but not start it? Or are you thinking more along the lines of us changing our minds about when the camera should turn on?

The smallest possible patch to solve this is to move those 10 lines from Start to Allocate, which is mainly why I prefer it :)
(In reply to Anant Narayanan [:anant] from comment #13)
> it? Or are you thinking more along the lines of us changing our minds about
> when the camera should turn on?

I think we should change our minds about when the camera turns on. I don't like the idea of a page asking for permission as soon as it loads, and then turning on the camera minutes or even hours later when it finally connects the stream to a consumer.
Yes, I think so too, so reverting back to the behavior we had before the refactor seems to be the way to go. Moving StartDevice to Allocate will do just that.
(In reply to Anant Narayanan [:anant] from comment #13)

> We only changed this behavior during the refactor, and in retrospect I don't
> think we should have. I don't think the delay is a problem, since Start
> isn't called until much after Allocate for typical use cases. Also, we don't
> send the success callback to the main thread until after Allocate returns
> (which will block, if we move StartCapture/Render there). It also not
> possible to call Start until Allocate returns, on any thread.
> 
> It is definitely cleaner to separate StartDevice from Allocate. But I don't
> think we lose a whole lot either, since if Start() isn't called, the frames
> don't go anywhere, so the performance impact is minimal. Would there be a
> use case in the future where we'd want to Allocate a device but not start
> it? Or are you thinking more along the lines of us changing our minds about
> when the camera should turn on?
> 
> The smallest possible patch to solve this is to move those 10 lines from
> Start to Allocate, which is mainly why I prefer it :)

The smallest possible patch is not always the right patch ;-)

If we simply move those lines to Allocate(), then we will delay the success callback until *all* the Allocate calls have succeeded, since we don't DispatchToMainThread the GetUserMediaStreamRunnable until the allocates are done.  This means if I ask for audio+video, we won't get success in the app until both (i.e. video) have started, however long that takes.  Clearly on Windows this can add 1 second to getting the stream (and in my testing it *always* adds 1 second on Windows), and might take more.

You do *NOT* want to hit "answer" in your video+voip app and have it take a second to go mic-hot.  This would be horribly disconcerting.  (For that matter, I don't want a second to get video, and will see if there's *some* way to reduce that later - but even on systems where the StartCapture returns immediately, it can take 1-3 seconds to actually start getting frames, perhaps more.)

The current patch *works*, and may provide a framework for making future blocking WebRTC operations asynchronous easily.

And the patch does turn on the camera as soon as getUserMedia returns.  (There are impacts of this I was discussing in the bug and with roc; see the sister bug about supporting .stop(), and we'll want to add code to stop streams when a MediaStream refcount goes to 0 for example).
Whiteboard: [getUserMedia] [blocking-gum+]
Comment on attachment 673690 [details] [diff] [review]
Start gUM streams in Success callback; add MediaManager mutex

Review of attachment 673690 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! Not a big fan of the locking but I think it's clear the alternative is worse.

::: dom/media/MediaManager.cpp
@@ +813,5 @@
> +    } else if (audiodevice || videodevice) {
> +      // Stream from provided device.
> +      gUMRunnable = new GetUserMediaRunnable(
> +        audio, video, picture, onSuccess.forget(), onError.forget(), listeners,
> +        windowID, 

Trailing space

@@ +822,5 @@
> +      // Stream from default device from WebRTC backend.
> +      gUMRunnable = new GetUserMediaRunnable(
> +        audio, video, picture, onSuccess.forget(), onError.forget(), listeners,
> +        windowID
> +                                             );

Indentation
Attachment #673690 - Flags: review?(anant) → review+
Comment on attachment 673690 [details] [diff] [review]
Start gUM streams in Success callback; add MediaManager mutex

Review of attachment 673690 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaManager.h
@@ +68,5 @@
> +typedef enum {
> +  MEDIA_START,
> +  MEDIA_STOP,
> +  MEDIA_RELEASE
> +} MediaOperation;

I still think this is a bit nasty and probably unnecessary. At least document why you're doing things this strange way.

@@ +72,5 @@
> +} MediaOperation;
> +
> +// Generic class for running long media operations off the main thread
> +// Because nsDOMMediaStreams aren't threadsafe, re-send ourselves to
> +// MainThread to release mStream.

This comment isn't very clear. It seems to me that START and STOP are going to happen off the main thread (but on which thread?), and RELEASE happens on the main thread. Say so.

@@ +103,5 @@
> +  {
> +    // No locking between these is required as all the callbacks for the
> +    // same MediaStream will occur on the same thread.
> +    if (mStream)
> +      mSourceStream = mStream->GetStream()->AsSourceStream();

{}
Attachment #673690 - Flags: review?(roc) → review+
Looks like we need to update /dom/tests/mochitest/general/test_interfaces.html to include LocalMediaStream as well.
https://hg.mozilla.org/mozilla-central/rev/d6a3f003d316
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Keywords: verifyme
Verified on 10/28 nightly build by running anant's test page for audio and audio+video, generating sound at different intervals and loudness, and verifying that the sound is heard immediately.
Status: RESOLVED → VERIFIED
Keywords: verifyme