Increasing static noise makes WebRTC-based apps unusable in Firefox (CPU uptick 80%)

RESOLVED INVALID

Status

()

defect
P3
normal
Rank:
35
RESOLVED INVALID
2 years ago
2 years ago

People

(Reporter: avasilko, Unassigned)

Tracking

56 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.38 Safari/537.36

Steps to reproduce:

Reproducible with Firefox v57 and v56
Good wifi network was used for this test

1. Start a call between 2 clients (each shares audio and video) using any webrtc-based application (ex: https://appr.tc/)
2. Let the call happen for some time, keep checking audio and video quality





Actual results:

Audio is fine at first, then gets some static noise which increases over time. Eventually audio becomes unbearable, Firefox CPU consumption raises up to 70-80% and the whole system starts lagging.


Expected results:

Call continues with good quality of audio and video
Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core
Anna, do you see this across all platforms (linux/mac/windows) or is it limited to one? Could you give a hint to what "some time" means?

If you have the possibility, when you hear the bad quality, could you unmute *local* audio and see whether the quality of that is bad too, perhaps even recording it?

Thanks!

I'm making this a P1 until we know more.
Rank: 5
Flags: needinfo?(avasilko)
Priority: -- → P1
Nils, does this sound similar to reports you looked into in the past about high cpu use?
Flags: needinfo?(drno)
Hi Anna, thanks for filing! This only got triaged to the right team today. Sorry about that.

Can you do us a favor and try Firefox 55 to see if it happens there? Knowing if this is a regression would help us tremendously in prioritizing a rapid fix for this issue.

There's also a regression tool we use internally to find a regression range, http://mozilla.github.io/mozregression/ if you're up for it.

But just knowing whether it worked in 55 is the most important. Thanks!
How long does this take to build up?
What hardware?  What OS?  Do activity-monitoring tools show any increase in memory or cpu use?  (I presume from the 70-80% remark it does and that it's most likely Windows).  I presume these are two different machines, not two tabs on one machine?  Does it happen in a Firefox<->Chrome call?  If so, which end?

thanks!
Hi,

Here are some answers.
For now seeing it on several Mac machines, have not tried other OS yet.
The problem was reported by our customers just recently, but when we tried reproducing we were able to do so with Firefox v57, v56, v55. Surprised It did not come out before. It's 100% reproducible for us and our customers.

Participants of video call are on different machines. The problems happens only on Firefox end. Usually at the beginning of the call everything is fine, after ~3 minutes you start hearing some audio noise in Firefox, in 5 minutes from the beginning of the call audio becomes unrecognizable in Firefox. This seems to happen faster or slower depending on resolution of the media in the call. Mentioned timing is when remote participant was sending HD video to FF participant.

During the call I see spikes in CPU and energy consumption by Firefox.
Flags: needinfo?(avasilko)
Thanks Anna! We're going to try to reproduce. Were these calls from another browser (e.g. Chrome) to Firefox? Were they 1-1 calls, or large participant meetings? You mentioned https://appr.tc/ earlier, is that how you reproduced this in comment 6 as well?

Could you try it on 58 as well? http://nightly.mozilla.org/ There's a chance the CPU problem has improved there.

Thanks.
Flags: needinfo?(avasilko)
I tried to repro with Fx 56 on Mac and Chrome or Fx 58 in Win 10 going into one room on appr.tc. But to me the audio sounds just fine even after 15min or so.

I guess we are still missing some important information from the setup on how to repro.
Flags: needinfo?(drno)
Do you use a headset or other external camera or microphone? If yes can you mention the brand/model?
Hi,
It is reproducible with no headset or external camera/mic.

The easiest way to reproduce is with this demo app: http://test-rtc-demo-pp.herokuapp.com/p2p-room-prod
Simply do the 1 on 1 call between Chrome and Firefox on Mac when everyone shares video+audio and you will start hearing the noise in 3 minutes or so.

For some reason with https://appr.tc/ I hear some static noise occasionally but it does not get as bad. 

I'll try with v58 later on.
Flags: needinfo?(avasilko)
I've reproduced comment 10 using Chrome 61.0.3163.100 <-> Firefox 58.0a1 (2017-10-05) (64-bit) by letting it sit for ~8 minutes.

In trying to localize the problem I filed bug 1407088 containing instant symptoms. I'll try to match the regression range next.

Fixing this is of the highest priority.

Anna, thanks again for filing this! Is the symptom isolated to Macs for you?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(avasilko)
Comment 10 also repros for me in two FF tabs (Nightly) in 10 mins, though https://jsfiddle.net/jib1/80xcgwmc/ in two tabs didn't.
This one is older than bug 1407088 and reproduces back to at least 2017-03-01 but no earlier than 2016-11-01 (takes a while to determine range when repro steps take 10-15 mins).

I even saw the problem bleed over to a YouTube video which was running at half speed with audio stutter and 220% cpu. It magically corrected when I closed the two WebRTC tabs.
I can also reproduce like comment 10 in two Nightly tabs. One with a real camera and one with our internal fake devices. I will try to capture this in rr.
I see a lot of streams being added to MediaStreamGraph. That could lead to overloading the audio callback and thus audio artifacts and high cpu usage. I'll check where they come from.
Anna, in twilio-video.min.js loaded on http://test-rtc-demo-pp.herokuapp.com/p2p-room-prod I see that you have code like this:

> FirefoxRTCPeerConnection.prototype.getLocalStreams = function getLocalStreams() {
>   if (this._isClosed) {
>     return []
>   }
>   if (!this.getSenders) {
>     return this._peerConnection.getLocalStreams()
>   }
>   var stream = new MediaStream;
>   this.getSenders().forEach(function (sender) {
>     var track = sender.track;
>     if (track) {
>       stream.addTrack(track)
>     }
>   });
>   return [stream]
> };
> FirefoxRTCPeerConnection.prototype.getRemoteStreams = function getRemoteStreams() {
>   if (this._isClosed) {
>     return []
>   }
>   if (!this.getReceivers) {
>     return this._peerConnection.getRemoteStreams()
>   }
>   var stream = new MediaStream;
>   this.getReceivers().forEach(function (receiver) {
>     var track = receiver.track;
>     if (track) {
>       stream.addTrack(track)
>     }
>   });
>   return [stream]
> };

The file is really minified though so I am unable to hit breakpoints on these lines, but I see in Firefox that the MediaStream constructor is invoked *a lot* and this is the only presence of `new MediaStream` the js debugger could find.

Could you change this code to either
1 use a cached MediaStream instead of recreating one on demand in the getters, or
2 avoid polling getLocalStreams/getRemoteStreams if you are

If you could do this and can can still reproduce, please check that whether there is similar MediaStream activity to now by enabling logging; from command line:
> MOZ_LOG=timestamp,MediaStream:5 /path/to/firefox

If there is no log spam like the below, it would seem you have resolved the MediaStream churn. If you at that point can still reproduce I'm happy to look at an updated demo page. Thanks!


Example of logs that I see:
> 2017-10-10 09:50:38.590382 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Initiated with mInputStream=(nil), mOwnedStream=(nil), mPlaybackStream=0x7f12e9b670c0
> 2017-10-10 09:50:38.590620 UTC - [Main Thread]: I/MediaStream DOMMediaStream 0x7f12d8998960 Adding track 0x7f1301889000 (from stream 0x7f1301e6d370 with ID 2)
> 2017-10-10 09:50:38.590632 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 saw a track get added. Recomputing principal.
> 2017-10-10 09:50:38.590637 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Recomputing principal. Old principal was (nil).
> 2017-10-10 09:50:38.590641 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Taking live track 0x7f1301889000 with principal 0x7f1300f72af0 into account.
> 2017-10-10 09:50:38.590647 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 new principal is 0x7f1300f72af0.
> 2017-10-10 09:50:38.590651 UTC - [Main Thread]: I/MediaStream DOMMediaStream 0x7f12d8998960 Principal changed. Now: null=0, codebase=1, expanded=0, system=0
> 2017-10-10 09:50:38.590659 UTC - [Main Thread]: I/MediaStream DOMMediaStream 0x7f12d8998960 NotifyActive(). 
> 2017-10-10 09:50:38.590664 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Added track 0x7f1301889000
> 2017-10-10 09:50:38.590717 UTC - [Main Thread]: I/MediaStream DOMMediaStream 0x7f12d8998960 Adding track 0x7f1301889400 (from stream 0x7f1301e6d370 with ID 1)
> 2017-10-10 09:50:38.590728 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 saw a track get added. Recomputing principal.
> 2017-10-10 09:50:38.590733 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Recomputing principal. Old principal was 0x7f1300f72af0.
> 2017-10-10 09:50:38.590737 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Taking live track 0x7f1301889000 with principal 0x7f1300f72af0 into account.
> 2017-10-10 09:50:38.590742 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Taking live track 0x7f1301889400 with principal 0x7f1300f72af0 into account.
> 2017-10-10 09:50:38.590747 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 new principal is 0x7f1300f72af0.
> 2017-10-10 09:50:38.590752 UTC - [Main Thread]: I/MediaStream DOMMediaStream 0x7f12d8998960 Principal changed. Now: null=0, codebase=1, expanded=0, system=0
> 2017-10-10 09:50:38.590760 UTC - [Main Thread]: D/MediaStream DOMMediaStream 0x7f12d8998960 Added track 0x7f1301889400

Seeing one "MediaStream DOMMediaStream 0x\d+ Initiated" means one MediaStream was constructed. At the point of my debugging I saw ~10 MediaStreams constructed per second, but I cannot rule out that this number is increasing, since issues progressively get worse.
Andreas,
Thank you for your detailed analysis and suggestions! 
Our team will try cutting down on MediaStream creation. Will keep you posted with results of testing with this change.
Flags: needinfo?(avasilko)
Andreas,

Thanks very much for analyzing twilio-video.min.js. I'm still trying to isolate
the issue without using this library. There are some other places where
MediaStreams are constructed, besides in that shim. For example, the library
provides

1. API for detecting start of a MediaStreamTrack

   Implemented by setting the srcObject of an HTMLMediaElement to a new
   MediaStream containing only the MediaStreamTrack and then listening for the
   "canplay" event (HTMLMediaElement is not attached to DOM).

2. API for detecting dimension changes of a video MediaStreamTrack 

   Implemented by setting the srcObject of an HTMLVideoElement to a new
   MediaStream containing only the MediaStreamTrack and then listening for the
   "loadedmetadata" and "resize" events (HTMLVideoElement is not attached to
   DOM).

3. API for attaching a MediaStreamTrack to the DOM

   Implemented by setting or updating the srcObject of an HTMLMediaElement to a
   MediaStream containing the MediaStreamTrack. Distinct MediaStreamTracks can
   be attached to the same or different HTMLMediaElements (one MediaStream per
   HTMLMediaElement).

These APIs aren't reusing MediaStreams between each other. Also, the library
creates a new MediaStream per RTCPeerConnection it negotiates, and adds
MediaStreamTracks to that. So, in a 2-party video chat with an audio and video
MediaStreamTrack per participant, using all these APIs on local and remote
MediaStreamTracks, you could get up to 10 MediaStreams per participant (not
including those returned by getUserMedia and the "track" event from
RTCPeerConnection, and not including those constructed by the getLocalStreams
and getRemoteStreams shims, which are easily fixed).
Lowering priority as it appears to be not per se a Firefox issue (even if this results in us optimizing code in Firefox to handle something like this better in the future).
Rank: 5 → 35
Priority: P1 → P3
(In reply to mroberts from comment #18)
> These APIs aren't reusing MediaStreams between each other. Also, the library
> creates a new MediaStream per RTCPeerConnection it negotiates, and adds
> MediaStreamTracks to that. So, in a 2-party video chat with an audio and
> video
> MediaStreamTrack per participant, using all these APIs on local and remote
> MediaStreamTracks, you could get up to 10 MediaStreams per participant (not
> including those returned by getUserMedia and the "track" event from
> RTCPeerConnection, and not including those constructed by the getLocalStreams
> and getRemoteStreams shims, which are easily fixed).

This is fine. I ran into issues when they were in the thousands (really because they were ever-growing, together with bug 
1407542 - which has a nice fiddle for reproducing). On lower end machines I can imagine hitting issues already in the hundreds, but it's speculation on my part.
Also, mroberts, if you can confirm that fixing getLocalStreams/getRemoteStreams works, that'd be great. I'll then close this issue and we can focus on the other two.
Flags: needinfo?(mroberts)
Andreas, thanks again for the additional information. I can confirm that fixing `getLocalStreams`/`getRemoteStreams` worked. Thanks!
Flags: needinfo?(mroberts)
Thanks. I'll close as invalid as its not a Firefox bug as such. We'll keep addressing the two blocking bugs to help with extreme cases similar to this in the future.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.