Closed Bug 1690832 Opened 1 year ago Closed 7 months ago

Terrible video quality after negotiating video tracks 3 times

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

Firefox 87
Unspecified
All
defect

Tracking


RESOLVED DUPLICATE of bug 1654112
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- wontfix
firefox86 --- wontfix
firefox87 + wontfix
firefox88 + wontfix
firefox89 --- wontfix

People

(Reporter: nazar, Assigned: mjf)

References

(Regression)

Details

(Keywords: regression)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0

Steps to reproduce:

  1. Go to https://jsbin.com/huxibegado/edit?js,output
  2. Add video track
  3. [optional] Remove video track (optional because it doesn't impact the results with these steps)
  4. Add second video track
  5. [optional] Remove video track
  6. Add third video track

Actual results:

Regardless of the optional steps above, video quality and FPS go down dramatically; with active motion I saw the frame rate drop as low as 2 FPS.
From the testing I've done, audio (if captured) also starts crackling, not just the video.

Expected results:

Video quality should stay consistent no matter how many renegotiations happen (especially if video tracks are removed).

The Bugbug bot thinks this bug should belong to the 'Core::Audio/Video: Playback' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Component: Audio/Video: Playback → WebRTC: Audio/Video
OS: Unspecified → All

Reproducible on macOS 11.2 in Firefox Stable and on Linux in Firefox Nightly, which leads me to believe that it is not platform-specific and is not a regression.
It causes major issues for our users at Restream.io who are using our Studio product for live streaming from the browser.

[Tracking Requested - why for this release]: Serious video quality degradation in WebRTC web conferencing that can hit after other participants join or leave. Likely to affect sites that use the API in a Plan-B compatible manner.

Thanks Nazar for the jsbin! I've instrumented it in https://jsfiddle.net/jib1/caznsg28/3/ and was able to find the regression range.

Workaround

Uncommenting the following line in the fiddle near the bottom makes the problem go away for me:

//    pc1.getTransceivers().find(tc => tc.sender == sender).stop();

You'll also need this modification in your original jsbin to make the video still disappear when removed:

  event.track.onended =
    event.streams[0].onremovetrack = () => document.body.removeChild(video);

To explain the workaround: in Unified Plan, pc.removeTrack(sender) is insufficient to make senders and their receivers on the other end disappear entirely. Instead, it merely nulls sender.track, mutes receiver.track, and temporarily removes it from its stream(s).

To remove the sender and receiver entirely, and end the receiver.track, transceiver.stop() must be called.
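As a sketch of the workaround described above (the removeSenderCompletely name is mine; the pc and sender objects are whatever your app already holds, and the removeTrack/getTransceivers/stop calls are standard WebRTC API):

```javascript
// Sketch of the workaround: pc.removeTrack(sender) alone only nulls
// sender.track, so to make the sender/receiver pair disappear entirely
// we also stop the owning transceiver and then renegotiate.
function removeSenderCompletely(pc, sender) {
  pc.removeTrack(sender); // nulls sender.track, mutes the remote track
  const tc = pc.getTransceivers().find(t => t.sender === sender);
  if (tc) {
    tc.stop(); // ends receiver.track on the far end after renegotiation
  }
  return tc; // caller should renegotiate (createOffer/setLocalDescription)
}
```

The caller still has to kick off renegotiation afterwards, exactly as the fiddle does.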

Sorry about this inconvenience! We hope this workaround may help until we can fix this performance regression.

Mjf, what do we think is going on here? This is happening to regular non-simulcast video.

Severity: -- → S2
Status: UNCONFIRMED → NEW
Has Regression Range: --- → yes
Has STR: --- → yes
Ever confirmed: true
Flags: needinfo?(mfroman)
Keywords: regression
Priority: -- → P2
Regressed by: 1674463
Flags: needinfo?(docfaraday)

Quality is still very bad even with the suggested workaround and tracks/videos removed in your jsfiddle.
Just move from right to left in front of the camera and the FPS will drop significantly. That is the primary issue I'm concerned with.

Hi Nazar, note in JSFiddle you have to hit the ▶ Run button in the upper left after uncommenting the line, unlike JSBin.

Are you still seeing the frame rate drop to half with the workaround in https://jsfiddle.net/jib1/caznsg28/4/ ?

I'm waving my hand back and forth in front of my face, while I

  1. click add video track
  2. observe Frame rate:
  3. click remove video track
  4. goto 1

In comment 3, I'm seeing output: Frame rate: ~30 fps, ~30 fps, ~15 fps, ~15 fps.
In comment 5, I'm seeing output: Frame rate: ~30 fps, ~30 fps, ~30 fps, ~30 fps.

Just making sure we're seeing the same thing.
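For reference, the fiddle's "Frame rate:" readout can be reproduced with a small helper like this (the outboundVideoFps name is mine; framesPerSecond on outbound-rtp entries is part of the webrtc-stats spec, though older builds may not populate it):

```javascript
// Read the encoder frame rate for the sending video track out of an
// RTCStatsReport (as returned by pc.getStats()). Returns null if no
// outbound video stats entry is present yet.
function outboundVideoFps(report) {
  for (const stats of report.values()) {
    if (stats.type === "outbound-rtp" && stats.kind === "video") {
      return stats.framesPerSecond ?? null;
    }
  }
  return null;
}
```

Usage in the page would be something like `const fps = outboundVideoFps(await pc.getStats());`.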

Flags: needinfo?(nazar)

Too late for a fix in 86 as we are past betas.

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #3)

> Mjf, what do we think is going on here? This is happening to regular non-simulcast video.

My first guess is that since the mitigation for Bug 1674463 involved lowering the min bitrate for HD, a different (higher) resolution is picked. If stopping the transceiver makes the problem better, it may be that multiple higher-resolution video tracks are continuing to fight for available bandwidth in the non-stop case.

I'll try to add some ad-hoc logging on my end to see if this theory holds up.

Jan, I think it is potentially a different issue, but yes, the flow you describe does indeed work fine.
But if you add 4 videos, quality degrades permanently, even if you start removing them afterwards.

Flags: needinfo?(nazar)

Wow, you're right. Even with the comment 5 fiddle, pressing add video track 4 times used to produce 4 smooth videos before the regression:

Frame rate: 29.08
Frame rate: 28.85
Frame rate: 28.92
Frame rate: 28.92

...whereas now it's abysmal:

Frame rate: 9.02
Frame rate: 7.98
Frame rate: 8.98
Frame rate: 8.06

Interestingly, pressing remove video track 3 times does not recover the frame rate.

Clicking remove video track a fourth time, waiting a bit and then pressing add video track seems to recover it, but not reliably.

Thanks for confirming!

Most of our Firefox users hit this issue in our https://restream.io/studio product, due to the complex functionality we offer there.
If there is a workaround or a way to speed up a fix for this, I'd be very thankful!

(In reply to Michael Froman [:mjf] from comment #7)

> (In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #3)
>
> > Mjf, what do we think is going on here? This is happening to regular non-simulcast video.
>
> My first guess is that since the mitigation for Bug 1674463 involved lowering the min bitrate for HD, a different (higher) resolution is picked. If stopping the transceiver makes the problem better, it may be that multiple higher-resolution video tracks are continuing to fight for available bandwidth in the non-stop case.
>
> I'll try to add some ad-hoc logging on my end to see if this theory holds up.

Well, resolution was the wrong thing to key on here, but I suppose it makes sense that if our min bitrate is allowed to drop to 200kbps (whereas previously it was only allowed to drop to 600kbps), the frame rate will drop for a given resolution. I've run both cases (with and without the patch from Bug 1674463), and we're choosing the same line from kResolutionAndBitrateLimits[1]. Given that, I guess the question becomes: what is causing the available bandwidth to decrease?

[1] https://searchfox.org/mozilla-central/source/dom/media/webrtc/libwebrtcglue/VideoStreamFactory.cpp#34

Flags: needinfo?(mfroman)

To follow up, we can also consider raising the minimum bitrate on that line, at the cost of a potential degradation in our simulcast switching performance vs. Chrome when using Jitsi.

> what is causing the available bandwidth to decrease?

The bandwidth can't actually be decreasing, because before the patch we observe the same code sending 4 videos at 30 fps without problems.

Also, at the end of comment 9, we've called pc.removeTrack and transceiver.stop() on all but one transceiver, and renegotiated. All transceivers but one have transceiver.sender.track == null at this point, so there's no real competition for actual bandwidth, yet the remaining video is still transmitting at poor quality and low frame rate. There's also no simulcast being done, so why is simulcast logic kicking in?

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #13)

> > what is causing the available bandwidth to decrease?
>
> The bandwidth can't be decreasing because before the patch we observe the same code sending 4 videos at 30 fps without problem.

I guess I misspoke; something appears to be acting as if the bandwidth were lower. The previous minimum allowed bandwidth was set to 600k, which explains why the 4 videos continued to work at ~30fps without issues. The new lower minimum is allowing the framerate to drop too low.

> Also, at the end of comment 9, we've called pc.removeTrack and transceiver.stop() on all but one transceiver, and renegotiated. All transceivers but one have transceiver.sender.track == null at this point, so there's no real competition for actual bandwidth, yet the remaining video is still transmitting at poor quality and low frame rate. There's also no simulcast being done, so why is simulcast logic kicking in?

Since I don't have that much experience with the guts of simulcast, I'm not sure if this is distinctly simulcast logic or not. I thought the simulcast logic dealt with switching streams as the available bandwidth changed. Since we're not doing simulcast here, I don't know why the encoder would suddenly start using less bandwidth unless it is receiving input indicating that it should do so.

I'm going to try to get some info on what is driving the used bandwidth down.

Assignee: nobody → mfroman
Flags: needinfo?(docfaraday)

Progress report:
After a bit of debugging, I can verify that something is driving the reported bitrate down to the minimum allowed for the stream. VideoSender::SetChannelParameters receives a target_bitrate_bps, and that results in VP8EncoderImpl::SetRateAllocation being called with an updated BitrateAllocation. This happens on the 3rd click of "Add video track", whether or not other tracks have been removed. I never see the bitrate go above the minimum from that point on.

I'll keep digging.

Progress report:
The last call to SendSideCongestionController::MaybeTriggerOnNetworkChanged after the 2nd "Add video track" click shows bitrate: 4,852,111. Suspiciously, when we click "Add video track" the 3rd time, the next call to SendSideCongestionController::MaybeTriggerOnNetworkChanged shows bitrate: 2,426,055, exactly half the previous value. The values fall quickly from there to a number (10,000) far below the minimum allowed bitrate (200,000) for the stream.
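The collapse is consistent with repeated halving of the estimate. A quick back-of-the-envelope sketch (illustrative only, not libwebrtc code; the halving mirrors what the delayed-feedback path does to the delay-based estimate):

```javascript
// Illustrative only: starting from the healthy estimate, count how many
// halvings it takes for the bitrate to fall below the 200 kbps minimum
// allowed for the stream.
function halvingsUntilBelow(bitrateBps, floorBps) {
  let rounds = 0;
  while (bitrateBps >= floorBps) {
    bitrateBps = Math.floor(bitrateBps / 2); // each bad round halves it
    rounds++;
  }
  return { bitrateBps, rounds };
}
```

Only five halvings take 4,852,111 bps below the 200,000 bps floor, so a handful of spurious delayed-feedback events is enough to pin the encoder at its minimum.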

Progress report:
Upon adding the 3rd video track, the call to DelayBasedBwe::IncomingPacketFeedbackVector is triggering the delayed_feedback code path here[1], which then halves the estimated bandwidth here[2]. This pattern (5 calls in a row with delayed_feedback) continues permanently from this point forward.

[1] https://searchfox.org/mozilla-central/rev/e84baf5c730abd453be5d50dcb0aba1e743bac47/third_party/libwebrtc/webrtc/modules/congestion_controller/delay_based_bwe.cc#157-161
[2] https://searchfox.org/mozilla-central/rev/e84baf5c730abd453be5d50dcb0aba1e743bac47/third_party/libwebrtc/webrtc/modules/congestion_controller/delay_based_bwe.cc#176

On a lark, I allowed more (up to 20) consecutive delayed feedbacks here[1], and a very interesting pattern emerged. With each remove/add track cycle after adding the first track, the number of consecutive delayed feedbacks increases by 2.
Example:

click "Add video track"
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 1 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 1 consecutive_delayed_feedbacks_
<snip>
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 1 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 1 consecutive_delayed_feedbacks_
click "Remove video track"
click "Add video track"
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 3 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 3 consecutive_delayed_feedbacks_
<snip>
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 3 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 3 consecutive_delayed_feedbacks_
click "Remove video track"
click "Add video track"
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 5 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 5 consecutive_delayed_feedbacks_
<snip>
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 5 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 5 consecutive_delayed_feedbacks_
click "Remove video track"
click "Add video track"
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 7 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 7 consecutive_delayed_feedbacks_
<snip>
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 7 consecutive_delayed_feedbacks_
DelayBasedBwe::IncomingPacketFeedbackVector calling MaybeUpdateEstimate after 7 consecutive_delayed_feedbacks_

[1] https://searchfox.org/mozilla-central/rev/e84baf5c730abd453be5d50dcb0aba1e743bac47/third_party/libwebrtc/webrtc/modules/congestion_controller/delay_based_bwe.cc#158

With each remove/add track cycle we're making 2 more calls to TransportFeedbackAdapter::GetPacketFeedbackVector right on top of each other, i.e. with the same (or off-by-1) clock_->TimeInMilliseconds() value.

It appears that after each additional remove/add track cycle, we have an additional WebrtcVideoConduit object. However, each of the WebrtcVideoConduit objects is using the same Call object.

Each WebrtcVideoConduit's ReceivedRTCPPacket is called twice with the same packet (data ptr and len). We end up with 3 WebrtcVideoConduit objects, each calling DeliverPacket on the same Call object twice. One execution of DeliverPacket works correctly, and then each of the other 5 calls fails to look up the send time for the packets in the feedback vector, which registers as a delayed feedback. Our limit for consecutive_delayed_feedbacks_ is 5. This in turn trips the OnLongFeedbackDelay call, which halves the reported available bandwidth.

I'm not sure whether the 2 WebrtcVideoConduit objects that correspond to removed tracks should continue to receive RTCP packets, but this is definitely a combination of issues.
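The numbers in this comment line up with the earlier pattern: with N leftover conduits each delivering the same feedback packet twice, one delivery resolves the send times and the other 2N - 1 look delayed. A sketch of that arithmetic (illustrative only; the function name is mine):

```javascript
// With `conduits` WebrtcVideoConduit objects sharing one Call, and each
// delivering the same RTCP feedback packet twice, only the first
// DeliverPacket finds the send times; the rest register as delayed.
function apparentDelayedFeedbacks(conduits) {
  const deliveries = conduits * 2; // each conduit delivers the packet twice
  return deliveries - 1;           // one real delivery, the rest look delayed
}
```

This reproduces the observed 1, 3, 5, 7 progression as conduits accumulate, and with 3 conduits the 5 apparent delayed feedbacks exactly hit the consecutive_delayed_feedbacks_ limit, triggering the bandwidth halving.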

Is there a chance of having a fix for this in time for Firefox 87?
We had to add a large banner discouraging people from using Firefox in the meantime :(

I suspect that fixing this will require fixing bug 1654112. If that is the case, there's no way this will make it into 87. Even if we can fix this without bug 1654112, it will be pretty hard to do in 87, since we're coming up on the release date, and a beta uplift that close to release is hard to justify, especially for something that has already been broken on release for at least a couple of cycles.

fix-optional for 87 based on comment 24, but we'll keep tracking it for 88.

See Also: → 1694610

Is there a way to sponsor or otherwise accelerate fixes for this and some other WebRTC-related issues?

(In reply to Byron Campen [:bwc] from comment #24)

> I suspect that fixing this will require fixing bug 1654112. If that is the case, there's no way this will be in 87. If we can fix this without bug 1654112, it will be pretty hard to do this in 87, since we're coming up on the release date, and doing a beta uplift that close to release is hard to justify, especially if it is already broken on release for a couple of cycles at least.

Based on some added instrumentation, it appears this problem is fixed by the updates coming in bug 1654112 (at least in a Linux build). I'm in the process of trying to pull and build for macOS.

I've confirmed the fix on macOS as well.

If fixing this depends on the libwebrtc update, with no other short-term options in the meantime, we're still at least a cycle or two away from seeing a fix here.

Bug 1654112 has landed and I've verified this now works as expected on Nightly.

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1654112