Closed Bug 1403714 Opened 7 years ago Closed 7 years ago

Firefox stops sending video in low-bandwidth conditions and doesn't recover

Categories

(Core :: WebRTC, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 + verified
firefox57 + verified
firefox58 --- verified

People

(Reporter: drno, Assigned: pehrsons)

References

Details

(Keywords: regression, site-compat)

Attachments

(1 file)

Something causes Firefox on appear.in in rooms with an SFU to stop sending video some time into the call.
Audio continues to work, but from a users perspective the remote video freezes. And on the network it was confirmed that Firefox no longer sends video packets.
Andreas, I just assigned this to you because I heard you are looking into this. Feel free to correct, if that was wrong information.
Flags: needinfo?(apehrson)
I am looking indeed.
Status: NEW → ASSIGNED
Flags: needinfo?(apehrson)
It's the VideoAdapter in VideoConduit that tells us to drop frames, from [1].

It only logs it every 90 frames though, which is probably why I missed it in earlier logs.

Here the first log in my last run;
> 2017-09-28 09:55:16.906098 UTC - [VideoFrameConverter #1]: D/webrtc_trace (videoadapter.cc:151): VAdapt Drop Frame: scaled 0 / out 3907 / in 3997 Changes: 0 Input: 640x480 timestamp: 0 Output: i0


[1] http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/media/webrtc/trunk/webrtc/media/base/videoadapter.cc#151
[Tracking Requested - why for this release]:
In a WebRTC call, typically with multiple participants, Firefox might stop sending video. Audio still works.

In a bit more detail:
When max-fs is *not* present in the sdp we represent it as having the value 0. If we then are under enough load, or perhaps a bandwidth constraint, that webrtc.org code (our encoder stack, more or less) considers it useful to scale the video stream we send down a notch, we tell it to use a maximum frame size of 0 pixels (from the max-fs value), effectively bringing sent video to a halt.

In full detail: see http://searchfox.org/mozilla-central/rev/3dbb47302e114219c53e99ebaf50c5cb727358ab/media/webrtc/signaling/src/media-conduit/VideoConduit.cpp#1913.
this might potentially affect P2P interop with chrome which does not set max-fr or max-fs in the SDP
A confirmed workaround is for the application to set max-fs in the sdp.
This seems easiest to trigger by being in a low-bandwidth environment.
Summary: Firefox stops sending video on appear.in SFU → Firefox stops sending video in low-bandwidth conditions and doesn't recover
Comment on attachment 8913301 [details]
Bug 1403714 - Only regard max_fs when set explicitly.

https://reviewboard.mozilla.org/r/184674/#review189836

Thank you for investigating and fixing this.

It would be nice to have a test for this, but as far as I know we have no way to simulate a low bandwidth situation from a Mochitest and the media conduit gtests are not in good shape.

::: media/webrtc/signaling/src/media-conduit/VideoConduit.cpp:1912
(Diff revision 1)
>      // limit sink wants based upon max-fs constraint
>      int max_fs = mCurSendCodecConfig->mEncodingConstraints.maxFs*(16*16);
>      rtc::Optional<int> max_pixel_count = wants.max_pixel_count;
>      rtc::Optional<int> max_pixel_count_step_up = wants.max_pixel_count_step_up;
>  
> -    if (max_pixel_count.value_or(max_fs) > max_fs) {
> +    if (max_fs > 0 && max_pixel_count.value_or(max_fs) > max_fs) {

I think this would read better with one
if(max_fs > 0)
and the existing if statements nested inside.
Attachment #8913301 - Flags: review?(dminor) → review+
I have confirmed this fix locally and can no longer reproduce. I'll file a followup for covering this with a unit test.

To reproduce I put one client in an appear.in premium room and loaded it with clients from other machines.

Appear.in might soon deploy the workaround from comment 6 to their sfu so don't bet on these steps working for long.


I imagine the easiest way to reproduce would be to do a 1-on-1 with a remote machine over a weak link, like wifi in a loaded office. Or use a network shaper to limit bandwidth and perhaps introduce packet loss.
To make sure the service is eligible for the bug, also ensure that the sdps in use don't contain max-fs on about:webrtc.
possibly adding b=TIAS lines in the tests could help. Depends on how long you need to get...
Blocks: 1404039
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/6314d54af8b8
Only regard max_fs when set explicitly. r=dminor
We'll want to get uplift requests for beta 57 and release 56 in place, and also manual QA verification if possible on a (Try) build of release with the patch, so we're in a position to consider it as a ride-along for any 56 point release.  It's a Very Safe patch.

And thanks!
Flags: needinfo?(apehrson)
Comment on attachment 8913301 [details]
Bug 1403714 - Only regard max_fs when set explicitly.

Approval Request Comment
[Feature/Bug causing the regression]: 1393687
[User impact if declined]: High chances that Firefox during a WebRTC stops sending video.
[Is this code covered by automated tests?]: The positive case from bug 1393687 yes, but the actual faulty behavior not yet.
[Has the fix been verified in Nightly?]: No
[Needs manual test from QE? If yes, steps to reproduce]: Yes please.
- Open https://jsfiddle.net/yxbLvjm6/14/
- Allow access to camera and microphone
- Verify that two videos of yourself appear in the lower right corner
- Leave the call running for 30 seconds
- Verify that the both videos are still showing movement and are not frozen
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: The change is only a one line fix which ensure that the maximum bandwidth never goes to zero.
[String changes made/needed]: N/A
Flags: needinfo?(apehrson)
Attachment #8913301 - Flags: approval-mozilla-beta?
Tracked for 57 since it's a recent regression. Liz FYI, potential dot release ride-along.
Flags: needinfo?(lhenry)
Comment on attachment 8913301 [details]
Bug 1403714 - Only regard max_fs when set explicitly.

Approval Request Comment
[Feature/Bug causing the regression]: 1393687
[User impact if declined]: High chances of Firefox stopping to send video if the machine gets under load.
[Is this code covered by automated tests?]: The patch from bug 1393687 has test coverage, but does not cover the case causing this bug here. Automated test will come in bug 1404039.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: Yes please.
- Open https://jsfiddle.net/yxbLvjm6/14/
- Allow access to camera and microphone
- Verify that two videos of yourself appear in the lower right corner
- Leave the call running for 30 seconds
- Verify that the both videos are still showing movement and are not frozen
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: The change is only a one line fix which ensure that the maximum bandwidth never goes to zero.
[String changes made/needed]: N/A
Attachment #8913301 - Flags: approval-mozilla-release?
(In reply to Nils Ohlmeier [:drno] from comment #18)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=09c519bd62b6eb2fe383ce594e722edfacb9fd82

Try build on Beta.
(In reply to Nils Ohlmeier [:drno] from comment #20)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=904910d851d2bab9d84d6a662a2027a9447ee0c6

Try run for release.
https://hg.mozilla.org/mozilla-central/rev/6314d54af8b8
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
We could consider this as a ridealong, looks like it just missed uplift for the RC2 build in bug 1393687. 
The fix there didn't sound urgent or like a new problem (see https://bugzilla.mozilla.org/show_bug.cgi?id=1393687#c13 and https://bugzilla.mozilla.org/show_bug.cgi?id=1393687#c49)  

Is this a new regression? Is it a widespread problem? Any different from the situation on 9/18?
Flags: needinfo?(lhenry) → needinfo?(apehrson)
Also am I understanding correctly that appear.in has a fix coming?
This is a new regression caused by the patch in bug 1393687 that made it into 56, see [1]. Thus this was present on 9/18, but not something we tested for as part of that bug. I would note though that what bug 1393687 fixed and got uplifted did indeed regress in 56. There was a second part that landed later, was not uplifted, and regressed in 52 according to reports.

It affects everyone sending video in WebRTC to other clients than Firefox, so I'd say widespread, yes. Appear.in has deployed the workaround in comment 6 for both their premium and regular rooms. Whether other services have worked around it I'm not sure, but plenty have been reached out to.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1393687#c35
Flags: needinfo?(apehrson) → needinfo?(lhenry)
OK, thanks for the further explanation (and the try run on release, btw!) 
Are there other services/sites we can test with?

Andrei, can your team verify the fix?  I know your team is overloaded with 57 testing, so I am trying to keep 56 requests to a minimum...
Flags: needinfo?(lhenry) → needinfo?(andrei.vaida)
Can the max-fs setting show up only in offer SDP? Or the answer SDP need to have this setting?
lizzard: appear.in was fixed for 56 and 57 in the p2p mode. If you can change the useragent (which can be done by setting in general.useragent.override apparently) to the FF58 one you should still be able to reproduce in 56 or 57 with the following steps:
1) go to https://appear.in/some-random-room-name-you-use-for-mozilla-qa?bandwidth=100 in Firefox
") go to https://appear.in/some-random-room-name-you-use-for-mozilla-qa?bandwidth=100 in Chrome, on the same machine
This froze pretty instantly for me.

ke shen: add max-fs to any setRemoteDescription call that does not have it, regardless of the type.
(In reply to Ke Shen from comment #27)
> Can the max-fs setting show up only in offer SDP? Or the answer SDP need to
> have this setting?

This is more a question for the IETF, but max-fs can be in the answer only - it says "don't send me anything larger than this please".  There's no negotiation around it; either side can specify a max-fs or not specify it.
(In reply to Randell Jesup [:jesup] from comment #29)
> (In reply to Ke Shen from comment #27)
> > Can the max-fs setting show up only in offer SDP? Or the answer SDP need to
> > have this setting?
> 
> This is more a question for the IETF, but max-fs can be in the answer only -
> it says "don't send me anything larger than this please".  There's no
> negotiation around it; either side can specify a max-fs or not specify it.

So to prevent this bug from affecting your service max-fs needs to be in the remote SDP. Depending on if Firefox is the offerer or answerer, its needs to be in the opposite one.

So safest is to put some value in all SDPs. The values can also simply be insanely high. As long as it's present and not 0 you should get hit by this bug.
I just tried verification with 58 build id 20170929100122 on https://jsfiddle.net/yxbLvjm6/14/, but it fails (it still freezes).
(In reply to Nils Ohlmeier [:drno] from comment #31)
> I just tried verification with 58 build id 20170929100122 on
> https://jsfiddle.net/yxbLvjm6/14/, but it fails (it still freezes).

Doubled checked. The latest build of Nightly does not have the fix yet.
I just verified with the fiddle that the problem is fixed in 58.0a1 (2017-09-29) with build id 20170929220356.
https://jsfiddle.net/o5subvcb/1/ shows how to munge the SDP to add max-fs (and max-fr) to both VP8 and VP9
Keywords: site-compat
Comment on attachment 8913301 [details]
Bug 1403714 - Only regard max_fs when set explicitly.

Fix a recent regression, taking it.
Should be in 57b5
Attachment #8913301 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I managed to reproduce the bug using 56.0 build 6 and beta 57.0b4 on Windows 10x64. I tested using the steps from comment 17 and comment 28. 

I retested everything using the latest Nightly 58.0a1 on Windows 10x64, Ubuntu 16.04x64 and Mac OS 10.11 and the bug is not reproducing anymore. But I have to mention that there is still some jerkiness in the received video.
(In reply to Oana Botisan from comment #36)
> I retested everything using the latest Nightly 58.0a1 on Windows 10x64,
> Ubuntu 16.04x64 and Mac OS 10.11 and the bug is not reproducing anymore. But
> I have to mention that there is still some jerkiness in the received video.

Thanks for checking. And yes jerkiness of the video in this special test case is expected as the test artificially limits the bandwidth to a super low value to trigger the bug. So we don't expect the video to ever look great. The important factor is if the second, remote video moves at all.
Let's keep this tracked in 56 and we can uplift it as a ridealong in case we have a dot release driver.
We have a planned dot release for 64-bit migration only, which will likely happen first.
Flags: needinfo?(andrei.vaida)
the max-fr does not work in nightly 58.0a1(2017-10-12)64bit  &  Firefox Quantum 57.0b7(64bit)- beta , we open camera with 1080P 30fps , and set remote sdp with max-fs = 920, max-fr = 15, we find that the framerate send is more than 15fps .(VP8/VP9)
the case with codec h264 should also be considered  which control the framerate with max-mbps & max-fs.
Andreas, should we reopen this for 58 and possibly 57 because of comment 40? Did something else change?
Flags: needinfo?(apehrson)
Or, if we should open a followup bug instead, please do that.
This appears to be a negative interaction between max-fr and max-fs when max-fs is set to a low value.

I did a bit of experimentation and with max-fr set to 15 and max-fs set to below 1300, the framerate is limited to around 16 - 18 frames per second rather than the 15 requested. If I don't set max-fr, the camera runs at 30. I'll open a new bug.

The parameters for h.264 is Bug 1376255.
Flags: needinfo?(apehrson)
Thanks Dan. Liz, I consider the max-fr issue in comment 40 to be of much less importance than this, and I think it's irrelevant to this bug.
I'm putting together a 56 dot release now (with 3 weeks to go in the release cycle). 
If you would like to include this fix, can you request uplift to m-r? Thanks!
Flags: needinfo?(apehrson)
Thanks. There's an uplift request in comment 17.
Flags: needinfo?(apehrson) → needinfo?(lhenry)
Comment on attachment 8913301 [details]
Bug 1403714 - Only regard max_fs when set explicitly.

Ridealong for 56.0.2
Flags: needinfo?(lhenry)
Attachment #8913301 - Flags: approval-mozilla-release? → approval-mozilla-release+
Flags: qe-verify+
The bug was verified using the steps from comment 17 on RC 56.0.2 - build 1 and beta 57.0b11 on Windows 10 x64, Windows 7 x64, Ubuntu 16.04 x64 and macOS 10.13 and the bug is not reproducing anymore. Even after a few minutes the second video didn't freeze.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: