Closed Bug 1309641 Opened 8 years ago Closed 8 years ago

Multiple pre-answer binding requests result in excessive memory usage

Categories

(Core :: WebRTC, defect, P1)

50 Branch
All
Unspecified
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed

People

(Reporter: adam, Assigned: drno)

References

Details

Attachments

(1 file)

Steps to reproduce: 

1) Go to https://output.jsbin.com/lekaco in IE 11
2) Download and install the OpenTok plugin for WebRTC
3) Allow access to your camera in IE 11
4) Go to https://output.jsbin.com/lekaco in Firefox 50

Result: Firefox browser locks up and you have to force quit it


This is happening with an older version of the OpenTok SDK (v2.8) which is using an older version of WebRTC in the IE plugin (version 45). It's possible that the same thing could happen with other clients using an older version of WebRTC, eg. Temasys plugin or iOS/Android SDKs.
Component: Untriaged → WebRTC
Product: Firefox → Core
Rank: 14
Priority: -- → P1
Adam do you know if this happens on a specific version of Windows, or universally across all supported Windows versions?

If nobody gets to this earlier, then I can probably take a look at this next week.
Rank: 14 → 19
I'm able to repro this. FF 50b10 continues to render the local preview video, but on my Mac I get the beach ball of death and FF is not responding any more. I'll investigate.
Assignee: nobody → drno
This is some bizarre ICE issue.
So comparing the behavior between FF 49 and 50 I see the following in our logs:

In case of 49:
- PC1 gets created
- PC1.createOffer() & sLD()
- PC1 gathers and emits all its candidates
- PC1 sets answer via sRD()
- PC1 establishes connection and starts sending RTP
- PC2 gets created
- PC2 sets an offer via sRD()
- PC2 creates answer and sets it via sLD()
- ...

In case of FF >= 50 the JS code on that page behaves quite different:
- PC1 gets created
- PC1.createOffer() & sLD()
- PC1 gathers and emits all its candidates
- PC2 gets created
- PC2 sets offer via sRD()
- at this point FF is eating 100% CPU, because the IE Plugin keeps hammering with STUN binding requests onto one of the ports of PC1
  - this results in FF eating up memory and to lock up

I'll be working on a fix for the memory leak in FF. That seems wrong to me what we do there.

But I think this is partially caused by the different behavior of the TB SDK behaving differently with FF >= 50, plus the fact that something is broken in OpenTok's ICE stack (libjingle or what ever it uses for STUN/ICE) which hammers in DoS way onto our STUN port. I'm not convinced that it is worth trying to build some kind of DoS protection into our ICE stack.
The attached patch fixes the memory leak in Firefox when the OpenTok IE plugin starts bombarding Fx with binding requests.
But the browser still locks up if it's running without e10s (a.k.a. Multi-Process support).

If you turn on Multi-Process support (users can manually turn that on and off only in Developer Edition and Nightly, but it's available in Beta and Release as well) then there is no lockup what so ever. With Multi-Process support turned on I think you can nicely see that this is actually some problem in the OpenTok SDK or the signaling, because about:webrtc in that case shows one PeerConnection which generated an offer and gathered all candidates, but never gets an answer or remote candidates. The reason Firefox does not lock up with Multi-Process support is that the binding requests from IE never make into our ICE stack (because we never received any remote candidates which would open the internal packet filter for these requests).
Comment on attachment 8805225 [details]
Bug 1309641: only store a single pre-answer request per 5 tuple.

https://reviewboard.mozilla.org/r/89002/#review88162

::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:1152
(Diff revision 1)
> +    int _status;
> +    nr_ice_pre_answer_request *r1, *r2;
> +    nr_transport_addr dst_addr;
> +
> +    if (r=nr_socket_getaddr(sock, &dst_addr))
> +      ABORT(r);

Seems like this error will get eaten. Maybe we should handle the enqueue in here and name it something else, so an actual error will get noticed.
Attachment #8805225 - Flags: review?(docfaraday) → review-
Found out that the ICE DoS attack is not only problem. Stay tuned...
Depends on: 1313527
So the real bug which causes the hang is bug 1313527, which got introduced in Fx 50 through bug 1275360.

Thanks Adam for the report before 50 gets into release!

But as usual the bad bugs are accumulation of multiple failures: the root cause here is that the OpenTok SDK actually generates broken SDP. Firefox receives an SDP answer from IE with the following line:

 'a=fmtp:111 minptime=10\n'

Note: it is missing the mandatory '\r\n' at the end and instead only has a '\n'. So would be helpful if you guys could fix that on your end as well.
Summary: Firefox 50 browser locks up when using WebRTC with IE 11 using OpenTok → Multiple pre-answer binding requests result in excessive memory usage
Comment on attachment 8805225 [details]
Bug 1309641: only store a single pre-answer request per 5 tuple.

https://reviewboard.mozilla.org/r/89002/#review88462
Attachment #8805225 - Flags: review?(docfaraday) → review+
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/bcccdc43232b
only store a single pre-answer request per 5 tuple. r=bwc
https://hg.mozilla.org/mozilla-central/rev/bcccdc43232b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Nils, is this something you might want to uplift to 51? Or did the patch for bug 1313527 fix this in 50/51 already? Thanks.
Flags: needinfo?(drno)
Thanks for checking Liz.
Bug 1313527 fixed the real issue of an endless loop on main thread. The likeliness of hitting the problem which is solved by this patch here is so low that I think we should let this one just ride the trains.
Thank you!
You need to log in before you can comment on or make changes to this bug.