Closed Bug 1012999 Opened 6 years ago Closed 6 years ago

Ping telemetry when STUN global rate limit is exceeded

Categories

(Core :: WebRTC: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: bwc, Assigned: bwc)

References

Details

Attachments

(2 files, 6 obsolete files)

No description provided.
Blocks: 1013007
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Attachment #8425167 - Flags: review?(ekr)
Attached patch Patch for (obsolete) — Splinter Review
Assignee: docfaraday → ekr
Attachment #8425180 - Flags: review?(docfaraday)
Comment on attachment 8425167 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

Review of attachment 8425167 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is quite what we want, since it pings for each overage packet.

See my new patch.
Attachment #8425167 - Flags: review?(ekr) → review-
Comment on attachment 8425180 [details] [diff] [review]
Patch for

Review of attachment 8425180 [details] [diff] [review]:
-----------------------------------------------------------------

r=mt verbally
Attachment #8425180 - Flags: review?(docfaraday) → review+
Comment on attachment 8425180 [details] [diff] [review]
Patch for

Review of attachment 8425180 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/WebrtcGlobalInformation.cpp
@@ +349,5 @@
>      }
>    }
>  
> +  unsigned char bit_pattern = 0;
> +  if (mozilla::nr_socket_short_term_violations()) {

I don't think this is safe; we're accessing from main, but this is written on STS.
Attachment #8425180 - Flags: review+ → review?(docfaraday)
Good point. Make it std::Atomic<int>
Counter-counter proposal, based on setting a timestamp whenever a rate limit is hit.
Attachment #8425180 - Attachment is obsolete: true
Attachment #8425180 - Flags: review?(docfaraday)
Assignee: ekr → docfaraday
Put my name on the patch, although the waters are pretty muddy by now.
Attachment #8425224 - Attachment is obsolete: true
Comment on attachment 8425226 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

Review of attachment 8425226 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_socket_prsock.cpp
@@ +115,5 @@
>  #include "stun_hint.h"
>  }
>  #include "nr_socket_prsock.h"
>  #include "simpletokenbucket.h"
> +#include "nricectx.h"

I've put the timestamp setter/getter functions in NrIceCtx, to avoid using extern-like function declarations, or picking up nrappkit includes in WebrtcGlobalInformation.

::: media/mtransport/nricectx.h
@@ +49,5 @@
>  // This is a wrapper around the nICEr ICE stack
>  #ifndef nricectx_h__
>  #define nricectx_h__
>  
> +#include <string>

An iwyu fix.

@@ +63,5 @@
>  #include "nsAutoPtr.h"
>  #include "nsIEventTarget.h"
>  #include "nsITimer.h"
>  
> +#include "nricemediastream.h"

Another iwyu fix.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.h
@@ +176,5 @@
>      std::string error;
> +    // A timestamp to help with telemetry.
> +    mozilla::TimeStamp iceStartTime;
> +    // Just for convenience, maybe integrate into the report later
> +    bool failed;

Maybe I'm just lazy, but these seem generally useful.

::: media/webrtc/signaling/src/peerconnection/WebrtcGlobalInformation.cpp
@@ +362,5 @@
> +    rate_limit_bit_pattern |= 1;
> +  }
> +  if (mozilla::nr_socket_long_term_violation_time() >= query->iceStartTime) {
> +    rate_limit_bit_pattern |= 2;
> +  }

Note: Since this is based on timestamps, if multiple PeerConnections run at the same time, and the rate limit is hit, both will get dinged.
Attachment #8425226 - Flags: review?(rjesup)
Attachment #8425226 - Flags: review?(ekr)
Comment on attachment 8425226 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

Review of attachment 8425226 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nr_socket_prsock.cpp
@@ +115,5 @@
>  #include "stun_hint.h"
>  }
>  #include "nr_socket_prsock.h"
>  #include "simpletokenbucket.h"
> +#include "nricectx.h"

Please don't put references to NrIceCtx here. The ownership graph goes the other
way. Put the variables here and have them refed by NrIceCtx

::: media/mtransport/nricectx.cpp
@@ +92,5 @@
> +TimeStamp& nr_socket_short_term_violation_time() {
> +  // On first access, this is initted to the process creation time.
> +  // It can then be updated/accessed like any other static variable.
> +  static bool inconsistent;
> +  static TimeStamp result = TimeStamp::ProcessCreation(inconsistent);

This doesn't appear to be thread-safe either.
Attachment #8425226 - Flags: review?(ekr) → review-
(In reply to Eric Rescorla (:ekr) from comment #10)
> Comment on attachment 8425226 [details] [diff] [review]
> When STUN global rate limit is exceeded, record this in telemetry.
> 
> Review of attachment 8425226 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: media/mtransport/nr_socket_prsock.cpp
> @@ +115,5 @@
> >  #include "stun_hint.h"
> >  }
> >  #include "nr_socket_prsock.h"
> >  #include "simpletokenbucket.h"
> > +#include "nricectx.h"
> 
> Please don't put references to NrIceCtx here. The ownership graph goes the
> other
> way. Put the variables here and have them refed by NrIceCtx
> 

   I guess I can put a passthrough in NrIceCtx.

> ::: media/mtransport/nricectx.cpp
> @@ +92,5 @@
> > +TimeStamp& nr_socket_short_term_violation_time() {
> > +  // On first access, this is initted to the process creation time.
> > +  // It can then be updated/accessed like any other static variable.
> > +  static bool inconsistent;
> > +  static TimeStamp result = TimeStamp::ProcessCreation(inconsistent);
> 
> This doesn't appear to be thread-safe either.

Are you referring to ProcessCreation itself, or accesses to |result|? Looking closer with dxr, I'm not seeing anywhere that actually ends up calling ProcessCreation (it has callers, but those callers aren't called by anything, although I'll need to grep to make sure). It could be completely busted for all I know. I'll stop using it.
accesses to result.
Stop using TimeStamp::ProcessCreation, and shuffle the locations of symbols around.
Attachment #8425226 - Attachment is obsolete: true
Attachment #8425226 - Flags: review?(rjesup)
Ugh. This is still not quire right. The MOZ_ASSERT right before setting the timestamp was causing the code to be elided on debug builds, which meant that our unit-tests were able to build even though the symbols weren't there. New patch in a sec.
Wrap the timestamp setting code in #ifdef, since the prior MOZ_ASSERT was causing it to be elided in the DEBUG case, which happens to coincide with the non-MOZILLA_INTERNAL_API case.
Attachment #8425571 - Attachment is obsolete: true
I tested this out a little by tweaking the limits lower in my working copy, and a tab-to-tab apprtc or talky.io call would run over the limit with a 2K short term bucket pretty reliably. So, it seems plausible that we're seeing this limit reached in the field.
Attachment #8425602 - Flags: review?(rjesup)
Attachment #8425602 - Flags: review?(ekr)
Comment on attachment 8425602 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

Review of attachment 8425602 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm with two nits below. Please resubmit and we will get it landed.

::: media/mtransport/nr_socket_prsock.cpp
@@ +120,5 @@
>  // Implement the nsISupports ref counting
>  namespace mozilla {
>  
> +static TimeStamp nr_socket_short_term_violation_time_;
> +static TimeStamp nr_socket_long_term_violation_time_;

Since these are not members, please remove the _

::: media/mtransport/nricectx.h
@@ +63,5 @@
>  #include "nsAutoPtr.h"
>  #include "nsIEventTarget.h"
>  #include "nsITimer.h"
>  
> +#include "nricemediastream.h"

Not needed.
Attachment #8425602 - Flags: review?(ekr) → review+
Comment on attachment 8425602 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

Review of attachment 8425602 [details] [diff] [review]:
-----------------------------------------------------------------

Jesup's review not needed.
Attachment #8425602 - Flags: review?(rjesup)
Comment on attachment 8425602 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

LGTM too
Attachment #8425602 - Attachment is obsolete: true
Attachment #8425700 - Attachment is obsolete: true
Comment on attachment 8425702 [details] [diff] [review]
When STUN global rate limit is exceeded, record this in telemetry.

Review of attachment 8425702 [details] [diff] [review]:
-----------------------------------------------------------------

Carry forward r=ekr
Attachment #8425702 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/0253f2720564
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in before you can comment on or make changes to this bug.