Closed Bug 1521577 Opened 5 years ago Closed 5 years ago

1.5 seconds audio delay in appear.in

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1)

66 Branch
Unspecified
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
relnote-firefox --- 65+
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- verified
firefox66 --- verified
firefox67 --- verified

People

(Reporter: achronop, Assigned: pehrsons)

References

Details

(Keywords: regression)

Attachments

(2 files)

During 1-1 in appear.in, the far end reported that audio was behind about 1.5 secs comparing to video. I run mozregression with Dan as the remote party and the response was:

daaecb62f373 Andreas Pehrson — Bug 1509548 - Make MediaStreamGraph pull data per track instead of per stream. r=padenot
311cee86bc66 Andreas Pehrson — Bug 1509548 - Remove early PullNewData return. r=padenot
dfde7b2c53a6 Andreas Pehrson — Bug 1509548 - Clean up what appears to be wip-leftover gunk in DecodedStream. r=jya
8a064f0dbb2f Andreas Pehrson — Bug 1509548 - Rename track-end-time methods in StreamTracks. r=padenot
1d1b47e23537 Andreas Pehrson — Bug 1509548 - Remove the concept of a known tracks time from MediaStreamGraph. r=padenot

In order to reproduce it you need to enter in an appear.in room and observe the time lag of audio compering to video (built in devices). It's 100% reproducible on my Linux box. I haven't observed it on OSX.

Feel free to ping me if you need a remote party to reproduce it or even further debugging.

Flags: needinfo?(apehrson)

For what it's worth, I always use Linux for 1:1s and today's conversations with Alex were the first time I've noticed any substantial delay in audio. I wonder if there is something particular with his setup that is triggering this bug.

Just to confirm, this was on a build with bug 1513973?

Flags: needinfo?(apehrson) → needinfo?(achronop)

Yes I have it in latest Nightly. We can try it on a 1-1 call.

Flags: needinfo?(achronop)

It's very possible that my setup is triggering this, otherwise we would have noticed sooner, it's very obvious and it is happening every time. I hadn't done an 1-1 call in my Linux box for a long time, I noticed for first time last week.

An external camera, on the same setup, works fine. The issue is not reproducible there.

(In reply to Alex Chronopoulos [:achronop] from comment #5)

An external camera, on the same setup, works fine. The is issue is not reproducible there.

I use an external camera as well.

(In reply to Dan Minor [:dminor] from comment #6)

I use an external camera as well.

Hmm you could try removing it for your next 1-1 and ask the remote party to report any time lag.

I cannot reproduce. My MBP is a newer one with touchbar. Running Ubuntu 18.04, the integrated microphone and speakers are not working (I'm using a usb soundcard). The internal camera is connected via USB.

No latency noticed on https://mozilla.github.io/webrtc-landing/gum_test.html with Audio&Video being unmuted. Same on appear.in without a peer and with a same-machine peer. Can you confirm whether you can reproduce on gum_test? It was the case for bug 1513973 and if true it would certainly help debug this.

If you can repro on gum_test, knowing whether you can repro with just Audio and/or Audio&Video helps to narrow it down too.

Flags: needinfo?(achronop)

I can reproduce it locally in gum test page and on appear.in with the media element of the local media un-muted. Just Audio on gum test page does not show any delay, Audio&Video has a delay of 1.5-2 seconds. I have recorded an rr run.

I tried with cubeb remote off. I cannot reproduce it locally but the remote party is experiencing delay.

Next, after talking with Andreas on IRC, I'll backout bug 1513973, then bug 1509548 and will record under rr keeping cubeb remote on.

Flags: needinfo?(achronop)
Attached patch bug-1521577-testSplinter Review

This patch solves the error. I am not sure if it is something that we want to do, but at least it can help to move this forward.

Attachment #9038857 - Attachment is patch: true
Attachment #9038857 - Attachment mime type: application/octet-stream → text/plain

Too late for 65.0 at this point, but I'll leave it set to fix-optional in case there's a good dot release ride-along patch available down the road.

(In reply to Alex Chronopoulos [:achronop] from comment #10)

Created attachment 9038857 [details] [diff] [review]
bug-1521577-test

This patch solves the error. I am not sure if it is something that we want to do, but at least it can help to move this forward.

Thanks for this, it helps. It verifies where the problem is, and I've been successful in reproducing with the help of sleep().

Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Rank: 8
Priority: P2 → P1

This is similar to bug 1513973 in that one track in the stream makes the stream blocked.

Audio starts first, and enables pulling before we start video - this starts append silence to the audio track.
Since video hasn't started yet, and thus hasn't enabled pulling, the video track is empty.
In the MediaStream in the graph there is now an audio track with data (silence) and a video track with no data. Since one track has no data, the MediaStream is blocked, and neither track will be consumed.

This is in a way legacy from media element capture where both tracks are pushed and the graph rather than the producer is responsible for keeping them in sync.

But here both tracks are realtime so we want to consume whatever is available.

With the per-track pulling settings we have today it is clearly the intention
of the producer to start consuming a track that has pulling enabled, even if
there are other tracks where pulling is disabled.

This patch fixes that, so that when a stream has at least one pulled track it
will append null data to other tracks (commonly video) if they are lacking
data.

This means that we still block an entire stream if all tracks have pulling
disabled - to maintain the status quo for media element capture, which is
the only push-only producer of data.

I've verified this patch with an artificial delay of camera startup on linux (sleep()ing the mediamanager thread). Beware that with cubeb sandboxing turned on, you can get false positives when reproducing because pulseaudio can incur an output buffer delay due to the sandbox itself, so it's best to test this with the sandbox off. Especially on slow machines (or under rr).

I have verified the patch locally. The error does not repro with the patch on.

Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/a21454f433a1
Don't block a MediaStream where at least one track is pulled. r=achronop
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Please request Beta approval on this when you get a chance. Also keeping this on the radar for possible 65 dot release inclusion down the road.

Flags: needinfo?(apehrson)

Comment on attachment 9039049 [details]
Bug 1521577 - Don't block a MediaStream where at least one track is pulled. r?achronop

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1509548

User impact if declined

Delayed audio from microphone when using microphone and camera

Is this code covered by automated tests?

No

Has the fix been verified in Nightly?

No

Needs manual test from QE?

Yes

If yes, steps to reproduce

This is tricky to reproduce. It is triggered by a camera that is slow to start. We've seen it appear mostly on linux but other platforms are in theory affected as well. A slow and/or busy machine could also be making it worse. Try different combinations of cameras, machines and load on linux and you should be able to reproduce.
STR (linux)
Set the pref "media.cubeb.sandbox" to false (it could add further delay that makes it impossible to regression test)
Go to https://mozilla.github.io/webrtc-landing/gum_test.html
Click Audio&Video
Approve the request with the camera/mic combination you want to test
Uncheck "Muted"
Approximate the delay by tapping the microphone

If there's noticable delay (0.5s+) without the fix, it's worth testing the fix in the same environment

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

It makes a change to when we block a stream vs when we let it continue although one track doesn't have enough data (if done wrong this would make audio and video become out of sync). The context in which this change applies is very narrow, which is why it's low risk. I've done an audit and it only applies to streams from GetUserMedia where both audio and video were requested at the same time, i.e., the very case this is fixing. This case is entirely real-time so it's not possible to go out of sync with this fix.

String changes made/needed

Flags: needinfo?(apehrson)
Attachment #9039049 - Flags: approval-mozilla-beta?

Comment on attachment 9039049 [details]
Bug 1521577 - Don't block a MediaStream where at least one track is pulled. r?achronop

Let's give this a try for beta 4.
If we can verify this on beta it has a better chance of getting into 65.

Attachment #9039049 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1523817

I could reproduce this issue on

Version 66.0a1
Build ID 20190121113514
User Agent Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0

System specifications:
Ubuntu 18.04 LTS
Memory 4.8 GiB
Processor AMD A8-7410 apu with amd radeon r5 graphics x4
Graphice AMD Mullins
GNOME 3.28.1

I followed STR from comment 21 and observed noticeable audio delay.

I verified the bug fix on the current nightly 67.0a1 and CI beta 66.0b4 build and things are working as expected. There is no audio delay.

Version 67.0a1
Build ID 20190130094929
User Agent Mozilla/5.0 (X11; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Version 66.0b4
Build ID 20190131144541
User Agent Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Please request release approval on this when you get a chance.

Flags: needinfo?(apehrson)

Comment on attachment 9039049 [details]
Bug 1521577 - Don't block a MediaStream where at least one track is pulled. r?achronop

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1509548

User impact if declined

Super long delay when in some cases: it depends on the type of device and also on the OS it seems, but it's easy to reproduce when the right condition, that are not too rare, are met.

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

Done already, and verified in beta.

List of other uplifts needed

none

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Not risky, it has baked for a long time in beta and nightly

String changes made/needed

none

Attachment #9039049 - Flags: approval-mozilla-release?

Comment on attachment 9039049 [details]
Bug 1521577 - Don't block a MediaStream where at least one track is pulled. r?achronop

[Triage Comment]
Fixes lag over WebRTC calls in some circumstances. Approved for 65.0.1.

Attachment #9039049 - Flags: approval-mozilla-release? → approval-mozilla-release+
Flags: needinfo?(apehrson) → qe-verify+
Flags: needinfo?(kkumari)
Whiteboard: [qa-triaged]

I also reproduced this issue following the STR's provided in comment 21. I can confirm this issue is fixed on Fx 65.0.1 RC. I verified on Ubuntu 16.04 LTS.

Flags: qe-verify+
Flags: needinfo?(kkumari)
QA Whiteboard: [qa-triaged]
Whiteboard: [qa-triaged]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: