Closed Bug 906384 Opened 6 years ago Closed 6 years ago

ICE pacing needs to be global

Categories

(Core :: WebRTC: Networking, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox26 --- wontfix
firefox27 --- fixed
firefox28 --- fixed
firefox-esr24 --- wontfix
b2g18 --- unaffected
b2g-v1.1hd --- unaffected
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.4 --- unaffected

People

(Reporter: ekr, Assigned: bwc)

Details

(Keywords: csectype-dos, sec-low, Whiteboard: [qa-][adv-main27-])

Attachments

(2 files, 6 obsolete files)

26.76 KB, text/plain
Details
2.67 KB, patch
bwc
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
Martin Thomson pointed out to me that the Firefox ICE implementation doesn't do global
pacing. This means that an attacker can cause Firefox to generate a lot of attack traffic to a victim by making a lot of peerconnections with a lot of ICE candidates pointing to the victim.

We need to make ICE pacing Firefox-global, thus limiting the total amount of
traffic Firefox generates in total.
I've written a draft on the subject, text version attached.  See also http://martinthomson.github.io/drafts/draft-thomson-mmusic-ice-webrtc.html
I asked Martin to remove it, since this is a security bug.
The fix for this issue will make the issue pretty obvious, and to complicate matters, there is a similar issue in Chrome. Who at Mozilla would be the best person to coordinate with Chrome security on coordinated fixing?
Flags: needinfo?(continuation)
Notes from today's call with Justin and Martin. Instead of doing global pacing, the proposal is to instead put a global "circuit breaker" cap on the maximum amount of STUN traffic you can generate and then (as needed) adjust pacing to the point where legitimate applications should stay under the cap.
Dan probably has a better idea than me.
Flags: needinfo?(continuation)
Flags: needinfo?(dveditz)
-> ekr for now; if implementation needs to go elsewhere we'll reassign
Assignee: nobody → ekr
We'd like to resolve this in the code within the next 2 weeks.  Adding an internal tracking deadline of Sept 6.
The problem isn't implementing it, it's when we can commit it without implicitly disclosing a problem in both ffox and Chrome.
(In reply to Eric Rescorla (:ekr) from comment #9)
> The problem isn't implementing it, it's when we can commit it without
> implicitly disclosing a problem in both ffox and Chrome.

I understand. I'm just asking everyone (everyone who needs to help make this happen) if we can target resolving this issue within the next 2 weeks.  I realize there are lots of dependencies.
> 
> I understand. I'm just asking everyone (everyone who needs to help make this
> happen) if we can target resolving this issue within the next 2 weeks.  I
> realize there are lots of dependencies.

I also realize that it won't be truly "resolved" until we release updates to the field.  So perhaps the better and more precise question to ask everyone is: can we target having a release plan with Chrome to resolve this issue within 2 weeks?
Flags: needinfo?(dveditz)
Per discussion with dveditz, sec-low
Keywords: sec-moderatesec-low
(In reply to Eric Rescorla (:ekr) from comment #9)
> The problem isn't implementing it, it's when we can commit it without
> implicitly disclosing a problem in both ffox and Chrome.

When asked about landing this the Google security team said "Please go right ahead. We never get very excited about DOS class issues."
Assignee: ekr → docfaraday
Status: NEW → ASSIGNED
Comment on attachment 827067 [details] [diff] [review]
(WIP) Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 1.8K/sec over 20 sec.

Review of attachment 827067 [details] [diff] [review]:
-----------------------------------------------------------------

Is this roughly what we're looking for?
Attachment #827067 - Flags: feedback?(ekr)
Add a comment that more clearly explains why the rate limiting is needed.
Attachment #827067 - Attachment is obsolete: true
Attachment #827067 - Flags: feedback?(ekr)
Attachment #827486 - Flags: feedback?(ekr)
Comment on attachment 827486 [details] [diff] [review]
(WIP) Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 1.8K/sec over 20 sec.

Review of attachment 827486 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_socket_prsock.cpp
@@ +395,5 @@
> +    // (see http://tools.ietf.org/html/draft-thomson-mmusic-ice-webrtc)
> +    // Tolerate rate of 8k/sec, for one second.
> +    static SimpleTokenBucket burst(8192*1, 8192);
> +    // Tolerate rate of 1.8k/sec over twenty seconds.
> +    static SimpleTokenBucket sustained(1843*20, 1843);

I think Martin may have changed his recommendation here to just be 100kbps overall.

Martin?

@@ +399,5 @@
> +    static SimpleTokenBucket sustained(1843*20, 1843);
> +    if (burst.getTokens(UINT32_MAX) < len ||
> +        sustained.getTokens(UINT32_MAX) < len) {
> +      // TODO(bcampen@mozilla.com): Better error code for this?
> +      ABORT(R_NOT_PERMITTED);

IIRC, R_NOT_PERMITTED causes an error higher up.

I recommend R_WOULDBLOCK or a silent drop.

@@ +403,5 @@
> +      ABORT(R_NOT_PERMITTED);
> +    }
> +
> +    burst.getTokens(len);
> +    sustained.getTokens(len);

You should comment this so it's clear what's going on.

Also, a note that this isn't thread safe would probably be useful.
Attachment #827486 - Flags: feedback?(ekr)
Byron has used the numbers that I recommended in the draft. However, I've talked to some people and a higher limit is probably better for the longer term limit to prevent some valid cases with multiple agents from bumping the cap. Doubling the rate for the slower token bucket would probably be ok.
(In reply to Eric Rescorla (:ekr) from comment #18)
> Comment on attachment 827486 [details] [diff] [review]
> (WIP) Very simple global rate-limiting based on SimpleTokenBucket. Will
> tolerate a maximum of 8K/sec over 1 sec, and 1.8K/sec over 20 sec.
> 
> Review of attachment 827486 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +399,5 @@
> > +    static SimpleTokenBucket sustained(1843*20, 1843);
> > +    if (burst.getTokens(UINT32_MAX) < len ||
> > +        sustained.getTokens(UINT32_MAX) < len) {
> > +      // TODO(bcampen@mozilla.com): Better error code for this?
> > +      ABORT(R_NOT_PERMITTED);
> 
> IIRC, R_NOT_PERMITTED causes an error higher up.
> 
> I recommend R_WOULDBLOCK or a silent drop.
> 

   I was worried that at some point R_WOULDBLOCK would prompt a retry. I guess it wouldn't be too awful if that happened.

> @@ +403,5 @@
> > +      ABORT(R_NOT_PERMITTED);
> > +    }
> > +
> > +    burst.getTokens(len);
> > +    sustained.getTokens(len);
> 
> You should comment this so it's clear what's going on.
> 
> Also, a note that this isn't thread safe would probably be useful.

Ok.
(In reply to Martin Thomson from comment #19)
> Byron has used the numbers that I recommended in the draft. However, I've
> talked to some people and a higher limit is probably better for the longer
> term limit to prevent some valid cases with multiple agents from bumping the
> cap. Doubling the rate for the slower token bucket would probably be ok.

I'll do that.
Attachment #827486 - Attachment is obsolete: true
Attachment #831077 - Flags: review?(ekr)
Comment on attachment 831077 [details] [diff] [review]
Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.

Review of attachment 831077 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm

::: media/mtransport/nr_socket_prsock.cpp
@@ +401,5 @@
> +
> +    // Check number of tokens in each bucket.
> +    if (burst.getTokens(UINT32_MAX) < len ||
> +        sustained.getTokens(UINT32_MAX) < len) {
> +      ABORT(R_WOULDBLOCK);

Suggest adding a high level (ERR) log here and an assert. That way if this happens
in the field we will know earlier.
Attachment #831077 - Flags: review?(ekr) → review+
(In reply to Eric Rescorla (:ekr) from comment #23)
> Comment on attachment 831077 [details] [diff] [review]
> Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a
> maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.
> 
> Review of attachment 831077 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> lgtm
> 
> ::: media/mtransport/nr_socket_prsock.cpp
> @@ +401,5 @@
> > +
> > +    // Check number of tokens in each bucket.
> > +    if (burst.getTokens(UINT32_MAX) < len ||
> > +        sustained.getTokens(UINT32_MAX) < len) {
> > +      ABORT(R_WOULDBLOCK);
> 
> Suggest adding a high level (ERR) log here and an assert. That way if this
> happens
> in the field we will know earlier.

Hmm. This would allow someone to crash my browser by spamming candidates; we don't require any user interaction to allow a PeerConnection to spin up (at least, not in the DataChannel-only case).
Attachment #831077 - Attachment is obsolete: true
Add some logging when we drop STUN requests due to the rate limit.
Attachment #832368 - Attachment is obsolete: true
Attachment #832384 - Attachment is obsolete: true
Comment on attachment 832497 [details] [diff] [review]
Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.

Review of attachment 832497 [details] [diff] [review]:
-----------------------------------------------------------------

This look about right to you?

https://bugzilla.mozilla.org/attachment.cgi?oldid=832368&action=interdiff&newid=832497&headers=1
Attachment #832497 - Flags: review?(ekr)
Comment on attachment 832497 [details] [diff] [review]
Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.

Review of attachment 832497 [details] [diff] [review]:
-----------------------------------------------------------------

Does MOZ_ASSERT generate a log message if it's not in debug mode?

Anyway, feel free to land as-is without rereview.
Attachment #832497 - Flags: review?(ekr) → review+
> Does MOZ_ASSERT generate a log message if it's not in debug mode?

No
Then let's add a separate log to go along with the MOZ_ASSERT.

Byron, please land.
Attachment #832497 - Attachment is obsolete: true
Attachment #8333497 - Flags: checkin?(adam)
Comment on attachment 8333497 [details] [diff] [review]
Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.

Review of attachment 8333497 [details] [diff] [review]:
-----------------------------------------------------------------

Carry forward r+ ekr.
Attachment #8333497 - Flags: review+
Comment on attachment 8333497 [details] [diff] [review]
Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a906959d7743
Attachment #8333497 - Flags: checkin?(adam) → checkin+
https://hg.mozilla.org/mozilla-central/rev/a906959d7743
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Does this need any testing from QA? If so, please advise.
Flags: needinfo?(docfaraday)
Hmm. Probably not. ekr?
Flags: needinfo?(docfaraday) → needinfo?(ekr)
I can provide a test case that someone should run, but I think that Byron is amply qualified to do so.  I don't know if you want to commit that test case to mochitest though.
How far back does this issue go?
Back to the original landing for WebRTC.
Flags: needinfo?(ekr)
This applies cleanly to beta and b2g26. Please request approval for uplift :)
Flags: needinfo?(docfaraday)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #41)
> This applies cleanly to beta and b2g26. Please request approval for uplift :)

What about aurora?
Flags: needinfo?(docfaraday)
Comment on attachment 8333497 [details] [diff] [review]
Very simple global rate-limiting based on SimpleTokenBucket. Will tolerate a maximum of 8K/sec over 1 sec, and 3.6K/sec over 20 sec.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 

   Bug 790517

User impact if declined: 
 
   A user's browser could be induced to perform a DoS attack.

Testing completed (on m-c, etc.):

   The usual CI testing, plus some manual testing.

Risk to taking this patch (and alternatives if risky):

   Fairly low.
 
String or IDL/UUID changes made by this patch:

   None.
Attachment #8333497 - Flags: approval-mozilla-beta?
Attachment #8333497 - Flags: approval-mozilla-b2g26?
It landed on trunk when it was version 28, so Aurora already has the fix.
Attachment #8333497 - Flags: approval-mozilla-beta?
Attachment #8333497 - Flags: approval-mozilla-beta+
Attachment #8333497 - Flags: approval-mozilla-b2g26?
Attachment #8333497 - Flags: approval-mozilla-b2g26+
(In reply to Martin Thomson [:mt] from comment #38)
> I can provide a test case that someone should run, but I think that Byron is
> amply qualified to do so.  I don't know if you want to commit that test case
> to mochitest though.

Eric, please advise.
Flags: needinfo?(ekr)
I think it's fine to commit it as long as it doesn't do something bad as a side effect.
Flags: needinfo?(ekr)
Whiteboard: [qa-]
Keywords: csectype-dos
Whiteboard: [qa-] → [qa-][adv-main27-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.