Closed Bug 1757722 Opened 3 years ago Closed 2 years ago

Resolution adaptation of RTCRtpSenders within reproducible time frames every 5 days for around 12 hours

Categories

(Core :: WebRTC, defect, P3)

Firefox 97
defect


RESOLVED INCOMPLETE

People

(Reporter: fh, Unassigned)

References

Details

(Keywords: regression, regressionwindow-wanted)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

Summary

  • Our WebRTC application relies on the adaptation preference "maintain-resolution"
  • This works well in general, except that Firefox seems to switch to resolution adaptation within reproducible time frames, every 5 days for around 12 hours
  • The timed condition seems to depend on the client's (!) system time and can be reproduced by adjusting it into a time frame of occurrence
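For reference, a minimal sketch of how such a preference is applied to a sender via the standard RTCRtpParameters API (the `sender` object below is a placeholder for a real RTCRtpSender obtained from a peer connection; this is typical spec-compliant usage, not a quote of our application code):

```javascript
// Sketch, assuming a spec-compliant RTCRtpSender: clone the current send
// parameters, set the degradation preference, and write them back.
async function setDegradationPreference(sender, preference) {
  const params = sender.getParameters();       // current send parameters
  params.degradationPreference = preference;   // e.g. "maintain-resolution"
  await sender.setParameters(params);
  return params;
}
```

In the browser, `sender` would typically come from `pc.getSenders()` or `pc.addTrack(...)`.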

Observation

We observe a recurring issue with WebRTC video streams sent by Firefox browsers, versions 96, 97 and 98 beta. At least Windows and macOS are affected.
Firefox degrades the encoded resolution of a sending PeerConnection and reduces the framerate significantly. The codec used is H264.
The issue is seemingly triggered every 5 days for a duration of roughly 12 hours.
We observed it starting at about 23:12:00 UTC, in the nights from 9th-10th, 14th-15th, 19th-20th, 24th-25th of February and 1st-2nd of March.

Further debugging showed that Firefox triggers resolution downscaling during those time frames.

Please see a corresponding log excerpt below:

[Child 18660: WebrtcCallThread #1]: D/webrtc_trace (rtcp_receiver.cc:1125): Incoming PLI from SSRC 0
[Child 18660: WebrtcWorker #1]: D/webrtc_trace (video_stream_adapter.cc:475): Scaling down resolution, max pixels: 552960
[Child 18660: WebrtcWorker #1]: D/webrtc_trace (video_stream_encoder_resource_manager.cc:505): Downgrade counts: fps: {quality:0cpu:0}, resolution {quality:0cpu:1}
[Child 18660: WebrtcWorker #1]: D/webrtc_trace (video_stream_encoder.cc:1787): Updating sink restrictions from EncoderUsageResource to { max_pixels_per_frame=552960 }
[Child 18660: WebrtcWorker #1]: D/webrtc_trace (resource_adaptation_processor.cc:229): Resource "EncoderUsageResource" signalled kOveruse. Adapted down successfully. Unfiltered adaptations: { res=1 fps=0 }
[Child 18660: WebrtcCallThread #1]: D/webrtc_trace (video_source_sink_controller.cc:76): Pushing SourceSink restrictions: max_fps=30 max_pixel_count=552960 target_pixel_count=null
[Child 18660: WebrtcWorker #1]: D/webrtc_trace (overuse_frame_detector.cc:677):  Frame stats:  encode usage 223 overuse detections 1 rampup delay 40000

Reproduction

The issue can be reproduced by setting the machine's local clock to one of the above dates and starting a stream, e.g.:

  • set local system time to the equivalent of March 2nd 1:00am UTC
  • set system environment parameters in system control panel:
    • MOZ_LOG=timestamp,webrtc_trace:65535,MediaManager:5,MediaPipeline:65535,PlatformEncoderModule:65535
    • MOZ_LOG_FILE=c:\webrtc.log
  • open Firefox, go to about:webrtc -> Start Debug Mode

When starting to send a stream, the resolution of an RTCRtpSender will degrade after a short period of time, often within 30 seconds.
We triggered the issue successfully with different resolutions: 640x480, 720p, 1080p.

We tested on:

  • Intel MacBook, Monterey, Intel Iris plus graphics 655
  • Windows 10, GeForce GTX 1070
  • Windows 10, Intel HD Graphics

Actual results:

Results

Video resolution and framerate degrade/adapt shortly after streaming begins (30 seconds to a few minutes in) within reproducible time frames, every 5 days for around 12 hours.

Workaround

The issue can currently only be avoided by setting "media.webrtc.platformencoder" to "false" in "about:config".
To our understanding this disables GPU/hardware encoding, which makes it a suboptimal workaround.

Expected results:

Assumption

The dependency on client system time has moved our focus to Firefox itself.
Can you please confirm or rule out that the periodic changes in adaptation behavior are related to some sort of scheduled event (e.g. a beta test or field trial) matching the reported time frames and the 5-day interval of occurrence?

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Graphics
Product: Firefox → Core
Component: Graphics → WebRTC

Thanks Finn for the detailed report! — Could you do me a favor and test Firefox 95? We did a major libwebrtc update in 96, and if this is a regression in behavior it would significantly improve our chances of locating the problem and prioritizing a fix.

Our webrtc application is relying on adaption preference "maintain-resolution"

Firefox does not yet implement degradationPreference, so if this isn't a regression, this may be a duplicate of bug 1329847.

But I'd like to understand better what is going on, and answer your questions.

The timed condition seems to depend on client (!) system time and can be recreated by adjusting it into a time frame of occurrence

That certainly seems bizarre. What services are you able to reproduce this with? Is it using simulcast? Does it reproduce on a LAN or local loopback
connection? Have you replicated this on multiple machines?

Also, what does speedtest say your upload speed is? If it is limited then Firefox is sharing that with any other networking apps on the local system, or even cron jobs, and Firefox will reduce resolution if "estimated bandwidth" (observable in about:webrtc) is low due to these other activities. I'm not aware of any studies being run within Firefox itself, but I'm on the WebRTC team so I wouldn't know for sure.

set local system time to the equivalent of March 2nd 1:00am UTC

To follow these instructions, I tried March 1st 8PM since I'm Eastern US. Do you have reason to believe it is relative to UTC rather than client local time in the local timezone?

I wasn't immediately able to reproduce, but can try again after the weekend.

Flags: needinfo?(fh)

Hey Jan-Ivar,

thanks for your reply!

We run a global broadcasting service (the nanoStream cloud). A core component, the nanoStream Webcaster, depends on the H264 codec and a fixed sender resolution. It does not utilize simulcast and uses only one video sender and/or one audio sender.

We also checked Firefox 95, and it does not happen there. So we assume a correlation with the updates in Firefox 96, possibly libwebrtc, but we cannot say for sure. Apart from logging it with Firefox debug logs on the client side and seeing the effect (and logs) on the server side, we could not nail down the cause.

We could exclude certain things as we tested on different platforms at different locations:

  • not on Firefox 95
  • not a network bandwidth issue
  • not a system load issue, verified by checking system stats before and during time frame of occurrence
  • as the reported condition lasts ~12 h, it is not a system cron job or similar

We are currently collecting additional information regarding your questions and I will follow up further in the coming days.

Flags: needinfo?(fh)

Thanks for confirming it works in 95. While this points to a regression from bug 1654112, it would help tremendously if you'd be able to run the MozRegression tool to narrow down the regression range to confirm this, since I'm still unable to reproduce.

Flags: needinfo?(fh)
QA Whiteboard: [qa-regression-triage]
Severity: -- → S4
Priority: -- → P3

I would like to investigate this further, but I don't seem to be able to reproduce it with the information provided so far. If I set the date back (to March 2nd), most pages won't even load anymore.

"The issue is seemingly triggered every 5 days for a duration of roughly 12 hours.
We observed it starting at about 23:12:00 UTC, in the nights from 9th-10th, 14th-15th, 19th-20th, 24th-25th of February and 1st-2nd of March."
Q1. Should I schedule the test to coincide with the next future time range in which it is expected to reproduce?
Based on the pattern above, the next time ranges would be April 30th - May 1st (a weekend), then May 5th-6th.
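The reported pattern can be projected forward mechanically. A small sketch (the constants are taken from the reporter's observations: first start 2022-02-09 at 23:12 UTC, repeating every 5 days; the 12-hour duration is an approximation):

```javascript
// Constants derived from the report; the duration is approximate.
const FIRST_START_MS = Date.UTC(2022, 1, 9, 23, 12, 0); // Feb 9 2022, 23:12 UTC
const PERIOD_MS = 5 * 24 * 60 * 60 * 1000;              // every 5 days
const DURATION_MS = 12 * 60 * 60 * 1000;                // ~12 hours

// Returns the first window whose end is at or after `fromMs`
// (i.e. the currently active window, or the next upcoming one).
function nextWindow(fromMs) {
  const cycles = Math.max(
    0,
    Math.ceil((fromMs - FIRST_START_MS - DURATION_MS) / PERIOD_MS)
  );
  const startMs = FIRST_START_MS + cycles * PERIOD_MS;
  return { start: new Date(startMs), end: new Date(startMs + DURATION_MS) };
}
```

For example, queried from April 29th 2022 this yields a window starting April 30th at 23:12 UTC, matching the expectation above.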

Furthermore, I think there are a number of prerequisites to set up before testing, and I'm not sure how to configure them:
Q2. Which webrtc webapp should I be using to reproduce this issue consistently?
Q3. Do I have to force the use of the H264 codec? How?
Q4. How exactly do I compare video stats (resolution, frame rate, video quality) to determine whether the issue reproduces?
Q5. How do I set these and how do I revert the change after the test?
set system environment parameters in system control panel:
MOZ_LOG=timestamp,webrtc_trace:65535,MediaManager:5,MediaPipeline:65535,PlatformEncoderModule:65535
MOZ_LOG_FILE=c:\webrtc.log
Q6. Is there any other information that might help me (a rather less technical person) reproduce and investigate this bug?
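Regarding Q3, one hedged sketch: in browsers that support RTCRtpTransceiver.setCodecPreferences, the codec capability list can be filtered so that only H264 (plus its RTX/FEC helper formats) is negotiated. The `transceiver` and capability objects here are placeholders for the real browser API, and to my knowledge Firefox's support for this call was limited at the time, in which case reordering payload types in the SDP is the usual fallback:

```javascript
// Filter a codec capability list down to H264 plus retransmission/FEC
// helper formats, with H264 first so it wins negotiation.
function preferH264(codecs) {
  const isH264 = c => c.mimeType.toLowerCase() === "video/h264";
  const helpers = ["video/rtx", "video/red", "video/ulpfec"];
  return [
    ...codecs.filter(isH264),
    ...codecs.filter(c => helpers.includes(c.mimeType.toLowerCase())),
  ];
}

// In a browser this would be used roughly as:
//   const caps = RTCRtpSender.getCapabilities("video").codecs;
//   transceiver.setCodecPreferences(preferH264(caps));
```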

Thank you for your contribution, Finn!

Redirect a needinfo that is pending on an inactive user to the triage owner.
:mjf, since the bug has recent activity, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(fh) → needinfo?(mfroman)

I think we'll have to mark this as incomplete since we cannot reproduce locally and are still waiting for followup info from the reporter.

Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(mfroman)
Resolution: --- → INCOMPLETE