Closed Bug 970734 Opened 6 years ago Closed 6 years ago

Facilitate reporting of WebRTC Connectivity Failures

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: abr, Assigned: bwc)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [p=3, ft:webrtc])

Attachments

(2 files, 7 obsolete files)

8.40 KB, patch
bwc
: review+
bwc
: checkin+
Details | Diff | Splinter Review
8.84 KB, patch
bwc
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
We need to make sure all the tools necessary to diagnose connectivity failures are exposed in an easy-to-access fashion (e.g., about:webrtc), and to ensure that the process for gathering and reporting this information in the wake of a connectivity failure is well documented (e.g., on a wiki page) in a way that developers and partners are aware of.
Blocks: 970426
I wonder if dumping "Go look at about:webrtc" into the JS logs when stuff goes wrong would have the effect of leading devs there.
Assignee: nobody → docfaraday
Blocks: 970732
so it looks like what we need here is a) an easy way to copy all the data for pasting into a report/etc (ala the Troubleshooting page) and a wiki page (or mdn page) documenting what to do (and how to read the data for diagnosing it yourself - perhaps an example with screenshots of a wireshark trace and that that maps to (totally open, and we can start simpler and then add more)
Ok, so the first thing I will do here is have PeerConnection.js dump "ICE failed, for more information see about:webrtc" into the JS console when ICE fails. That will catch the overwhelming majority of web developers who need to be pointed in that direction, m;uch better than a wiki page will. A follow up might include a wiki page explaining what some of the ICE terminology means, possibly linked from about:webrtc.
Can we also clone the "copy to clipboard" button from about:support?  (and include about:support info in there as well, or at least include OS, OS version, and Firefox version/data/buildid?)
(In reply to Randell Jesup [:jesup] from comment #4)
> Can we also clone the "copy to clipboard" button from about:support?  (and
> include about:support info in there as well, or at least include OS, OS
> version, and Firefox version/data/buildid?)

That seems reasonable; should we do this for the logging too? Are we likely to trip over limits if we do?
Attachment #8403594 - Flags: review?(jib)
Attachment #8403594 - Flags: review?(jib) → review+
Keywords: leave-open
Attachment #8403594 - Flags: checkin?
Whiteboard: [leave-open]
Make a side-effect explicit behavior, and make reportError side-effect free.
Attachment #8403594 - Attachment is obsolete: true
Comment on attachment 8404757 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.

Review of attachment 8404757 [details] [diff] [review]:
-----------------------------------------------------------------

Fixing this side-effect was a substantial enough change that it needs another look.
Attachment #8404757 - Flags: review?(jib)
Comment on attachment 8404757 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.

Review of attachment 8404757 [details] [diff] [review]:
-----------------------------------------------------------------

Did you test that this fixes the problem? I doubt it does.

::: dom/media/PeerConnection.js
@@ +496,5 @@
>      this.__DOM_IMPL__.dispatchEvent(event);
>    },
>  
>    // Log error message to web console and window.onerror, if present.
> +  reportErrorAndRaiseException: function(msg, file, line) {

This is a misnomer as it doesn't raise an exception in the traditional sense, rather it is our testing framework that yells "unhandled exception" when anything arrives on the default window.onerror handler. 

Since the underlying reportMsg here calls window.onerror explicitly for anything but warnings, changing this flag probably doesn't have any effect.

I would revert this and either:

A) call reportWarning instead if you're just wishing to inform the dev, or

B) Add a window.onerror handler to the test(s) like here:
   http://mxr.mozilla.org/mozilla-central/source/dom/media/tests/mochitest/test_peerConnection_throwInCallbacks.html?force=1#19

B has the benefit of actually testing the code, however, we should probably decide by which behavior is desirable in the field: Do we want window.onerror to fire or not?

We (it may have been just I) landed on yes for the other reportError case in this file - which is what test_peerConnection_throwInCallbacks.html tests - because it seemed reasonable given that the error originates from content (a throw in a callback). That made it well contained, which may not be the case here, since I suppose the network may go down at any moment on any test.

Since websites probably react poorly to window.onerror these days, and I presume I can listen for iceConnectionState === 'failed', I vote for A.
Attachment #8404757 - Flags: review?(jib) → review-
(In reply to Jan-Ivar Bruaroey [:jib] from comment #11)
> Comment on attachment 8404757 [details] [diff] [review]
> Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.
> 
> Review of attachment 8404757 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Did you test that this fixes the problem? I doubt it does.
> 

I have, and it does, unless there is platform-specific behavior here. I have a try push to test this last bit.

> ::: dom/media/PeerConnection.js
> @@ +496,5 @@
> >      this.__DOM_IMPL__.dispatchEvent(event);
> >    },
> >  
> >    // Log error message to web console and window.onerror, if present.
> > +  reportErrorAndRaiseException: function(msg, file, line) {
> 
> This is a misnomer as it doesn't raise an exception in the traditional
> sense, rather it is our testing framework that yells "unhandled exception"
> when anything arrives on the default window.onerror handler. 
> 
> Since the underlying reportMsg here calls window.onerror explicitly for
> anything but warnings, changing this flag probably doesn't have any effect.
> 
> I would revert this and either:
> 
> A) call reportWarning instead if you're just wishing to inform the dev, or

   I want this to show up without the dev needing to alter the log-level, so this is no good.

> 
> B) Add a window.onerror handler to the test(s) like here:
>   
> http://mxr.mozilla.org/mozilla-central/source/dom/media/tests/mochitest/
> test_peerConnection_throwInCallbacks.html?force=1#19
> 
> B has the benefit of actually testing the code, however, we should probably
> decide by which behavior is desirable in the field: Do we want
> window.onerror to fire or not?

   I know that I don't want my code to do anything other than cause a message to appear in the JS console, with an ERROR level. This is very simple, and if the logging functions make this complicated, the logging functions are wrong.

> 
> We (it may have been just I) landed on yes for the other reportError case in
> this file - which is what test_peerConnection_throwInCallbacks.html tests -
> because it seemed reasonable given that the error originates from content (a
> throw in a callback). That made it well contained, which may not be the case
> here, since I suppose the network may go down at any moment on any test.
> 
> Since websites probably react poorly to window.onerror these days, and I
> presume I can listen for iceConnectionState === 'failed', I vote for A.

   What we ought to do here is have simple logging functions that do exactly what they say they do, no more, no less. If we need more stuff to happen, we have a function that explicitly does this stuff and is named appropriately. I will refactor more to that end.
Convert more side-effects to something explicit, and impose better naming.
Attachment #8404757 - Attachment is obsolete: true
(In reply to Byron Campen [:bwc] from comment #12)
> I have, and it does, unless there is platform-specific behavior here. I have
> a try push to test this last bit.

That surprises me. I may have been wrong then. Do you have a try link?

> > A) call reportWarning instead if you're just wishing to inform the dev, or
> 
>    I want this to show up without the dev needing to alter the log-level, so
> this is no good.

Makes sense.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #14)
> (In reply to Byron Campen [:bwc] from comment #12)
> > I have, and it does, unless there is platform-specific behavior here. I have
> > a try push to test this last bit.
> 
> That surprises me. I may have been wrong then. Do you have a try link?

   At any rate, I have isolated the special behavior even further now. There is no chance that we'll try to invoke onerror when calling logError().
Comment on attachment 8404846 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.

Review of attachment 8404846 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! Nice refactor and I like the new names.
Attachment #8404846 - Flags: review+
Comment on attachment 8404846 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.

Review of attachment 8404846 [details] [diff] [review]:
-----------------------------------------------------------------

One thing worries me about this still; we're passing Ci.nsIScriptError.exceptionFlag and not Ci.nsIScriptError.errorFlag inside logErrorAndCallOnError(). I'm pretty sure that there is an exception-related side-effect that is not adequately described in the function name. Do you know what this flag does exactly?
Attachment #8404846 - Flags: review?(jib)
Nice, a mid-air driveby review!
(In reply to Byron Campen [:bwc] from comment #15)
> https://tbpl.mozilla.org/?tree=Try&rev=b6e67b42a2f4

Note that the backout log from comment 8 was in Mochitest 5 on B2G ICS Emulator Opt in  https://tbpl.mozilla.org/?rev=7699524771b1&tree=Mozilla-Inbound which is not in your try. I suspect it would have still failed since still called window.onerror. Your latest patch doesn't though and seems fine.
Hooray, platform-specific exception behavior. That sounds like another bug. Maybe we should ask someone?
We've worked around the B2G ICS failure, but the fact that we're getting JS exception related bustage on just that platform hints that there is a bug lurking deeper here, so it may be worth giving a look. Just a heads-up is all.
Flags: needinfo?(bugs)
Comment on attachment 8404846 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.

(In reply to Byron Campen [:bwc] from comment #18)
> One thing worries me about this still; we're passing
> Ci.nsIScriptError.exceptionFlag and not Ci.nsIScriptError.errorFlag inside
> logErrorAndCallOnError(). I'm pretty sure that there is an exception-related
> side-effect that is not adequately described in the function name. Do you
> know what this flag does exactly?

I forget if I ever knew, and https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIScriptError isn't helping, but it is unchanged by this patch.
Attachment #8404846 - Flags: review?(jib)
I'm seeing the following description: "An exception was thrown for this case - exception-aware hosts can ignore this." This sounds really weird to me, and we probably should not have been using it?
(In reply to Byron Campen [:bwc] from comment #22)
> We've worked around the B2G ICS failure, but the fact that we're getting JS
> exception related bustage on just that platform hints that there is a bug
> lurking deeper here, so it may be worth giving a look. Just a heads-up is
> all.

It's probably because that earlier patch called window.onerror. Perhaps the B2G tests are more sensitive in that regard?
(In reply to Byron Campen [:bwc] from comment #24)
> I'm seeing the following description: "An exception was thrown for this case
> - exception-aware hosts can ignore this." This sounds really weird to me,
> and we probably should not have been using it?

I remember trying both at the time, so I may have had a reason, but I don't recall.
Comment on attachment 8404846 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console.

>   // Log error message to web console and window.onerror, if present.
>-  reportError: function(msg, file, line) {
>-    this.reportMsg(msg, file, line, Ci.nsIScriptError.exceptionFlag);
>+  logErrorAndCallOnError: function(msg, file, line) {
>+    this.logMsg(msg, file, line, Ci.nsIScriptError.exceptionFlag);
>+
>+    // Safely call onerror directly if present (necessary for testing)
>+    try {
>+      if (typeof this._win.onerror === "function") {
>+        this._win.onerror(msg, file, line);
>+      }
What is this stuff? Calling onerror handler explicitly?
You probably want
this._win.dispatchEvent(new ErrorEvent("error", { message: msg,  filename: file, lineno: line}));
http://mxr.mozilla.org/mozilla-central/source/dom/webidl/ErrorEvent.webidl


It is not quite clear to me what the needinfo was for? Which exception? And on which platforms?
And do we have any small testcase for the inconsistency between platforms ?
Flags: needinfo?(bugs) → needinfo?(docfaraday)
So, in comment 20 there is a tbpl link that shows the test bustage we're talking about, but it only happens on B2G ICS. I certainly don't understand why it only failed on B2G ICS, but at any rate I've avoided it in my code by stripping away some unwanted side-effects in the logging code I'm using (these side-effects include using exceptionFlag instead of errorFlag, and calling window.onerror). Unfortunately, we have some pre-existing code that still is using these side effects, and we now have evidence that they don't quite work the same way on B2G ICS, hence my concern.
Flags: needinfo?(docfaraday)
(In reply to Byron Campen [:bwc] from comment #28)
> Unfortunately, we have some pre-existing code that
> still is using these side effects, and we now have evidence that they don't
> quite work the same way on B2G ICS, hence my concern.

FWIW I'm not concerned about the existing uses since they've had tests for almost a year (tests which specifically catch the window.onerror with a custom window.onerror handler). Your patch was effectively calling window.onerror without a custom handler. Why that only made B2G tests angry, I don't know, but I'm happy to leave it at that.
Ok, so I'll go ahead and request checkin, since my stuff no longer uses the stuff that is behaving weirdly.
Keywords: checkin-needed
Limits about copy-to-clipboard?  Probably not.  about:support has a lot already.

We don't need everything from about:support; just UA string, buildid, platform I think.  All easy to get
Copy-to-clipboard should include the ICE logs even if not visible.

Some cleanup for readability/collapsability

It'd be wonderful to get some level of live update on the RTP stats (I know refreshing the entire page caused some perf issues, I was thinking just those elements).  Not a blocker.
Doesn't apply cleanly to inbound. Please rebase.
Keywords: checkin-needed
Attachment #8404846 - Attachment is obsolete: true
Comment on attachment 8410418 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console. r=jib

Review of attachment 8410418 [details] [diff] [review]:
-----------------------------------------------------------------

Carry forward r=jib
Attachment #8410418 - Flags: review+
Summarize from dev-media thread about options for about:webrtc improvements:

I'm looking to compile ideas for improving about:webrtc.  Even stuff that's further out, though I'm most interested in what we can do for 32, and build on after that.

Thus far:
* View Last Call (shows final stats/logs from last call to end)
* Add a Copy To Clipboard button ala about:support
* Add OS, version, and Mozilla version/buildID or UA string
* live updating of some values (in-call stats)
** graphing values from live updates
* Human-readable summary of connection status (and maybe color-coding)
** Connected via TURN, media flowing.  Connected peer-to-peer, media never flowed.  Connection lost.
* color-coding stats values (have ranges for colors) such as packet loss, etc
* expand/collapse ala about:memory
* more stats values, including non-standard ones
* Highlighted warning about missing TURN servers (if no connection got established) 


* Advanced or debugging section
** View SDP
** ICE logs
** Ability to start/stop logging audio output (MOZ_DUMP_AUDIO) on the fly
** Ability to start/stop AEC inputs/outputs dump
** Ability to start/stop/save off the webrtc_trace logs
** (tricky perhaps) Start/stop various NSPR logs
*** some of these log changes might require taking affect in the next call
** datachannel stats
** Turn on/off AEC/AGC/NoiseReduction
*** already available in about:config
** change mode for AEC/AGC/NoiseReduction
*** already available in about:config

Oh yes:
* Ponies
* Rainbows
Priority: -- → P2
Whiteboard: [leave-open] → [leave-open, p=3, ft:webrtc]
Target Milestone: --- → mozilla32
Comment on attachment 8414080 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

Review of attachment 8414080 [details] [diff] [review]:
-----------------------------------------------------------------

There are a couple of things that I'm not sure I like about this patch, but would require a good chunk of code to fix:

1. We are distinguishing stats for open and closed PeerConnections by appending " (closed)" to the end of the id string, and always putting closed PeerConnections at the end of the list. It would be nice to be able to explicitly mark these with a boolean, so about:webrtc could display them in a unique way, but this would require webidl changes. (I'm kinda leaning toward the explicit indication, with a timestamp indicating when the PeerConnection was closed, since this would be useful information)

2. We are storing these final reports in PeerConnectionCtx, since it already is wired up to destroy on shutdown. We could avoid putting yet another thing in PeerConnectionCtx, but it would require much more code, and it seems like PeerConnectionCtx's reason for being is precisely this sort of thing.
Attachment #8414080 - Flags: feedback?(jib)
I should point out that a lot of these features are best implemented by someone who has some experience doing web stuff (eg; collapsing stuff). I can maybe try to fake it, but I cannot promise you'll like the result!
I've wanted the SDP on more than one occasion. I'll go after that next.
Depends on: 1002831
I am going to stop attaching patches to this bug, and instead treat it as a meta bug of sorts, since tackling that list is going to make this balloon even bigger than 906990.
My 2 cents: You've already landed a patch here for 31 which probably wont get tracked and tested until you close it, which may be a while if you make it a meta.
Comment on attachment 8414080 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

(In reply to Byron Campen [:bwc] from comment #39)
> 1. We are distinguishing stats for open and closed PeerConnections by
> appending " (closed)" to the end of the id string, and always putting closed
> PeerConnections at the end of the list. It would be nice to be able to
> explicitly mark these with a boolean, so about:webrtc could display them in
> a unique way, but this would require webidl changes. (I'm kinda leaning
> toward the explicit indication, with a timestamp indicating when the
> PeerConnection was closed, since this would be useful information)

I like being able to display them differently, especially once we get arrows to expand/collapse (soon?) ;-) then having the closed collapsed separately from the live ones by default would be nice I think.

I like the explicit timestamps idea (though you could search for the " (closed)" in JS too, no shame).

> 2. We are storing these final reports in PeerConnectionCtx, since it already
> is wired up to destroy on shutdown. We could avoid putting yet another thing
> in PeerConnectionCtx, but it would require much more code, and it seems like
> PeerConnectionCtx's reason for being is precisely this sort of thing.

Makes sense. I'm curious what shape the "closed reports" will take. Will it be an arbitrary snapshot right before close? Or averages somehow for the last 5 seconds/duration of the call? They'll obviously not have live updating values, since they're dead. What use-cases are they driving?

Are they perhaps a different "thing"? A last call-report?
Attachment #8414080 - Flags: feedback?(jib) → feedback+
(In reply to Jan-Ivar Bruaroey [:jib] from comment #44)
> Comment on attachment 8414080 [details] [diff] [review]
> Part 2: Record final ICE/media stats when PeerConnections are closed, so
> they show up in about:webrtc.
> 
> (In reply to Byron Campen [:bwc] from comment #39)
> > 2. We are storing these final reports in PeerConnectionCtx, since it already
> > is wired up to destroy on shutdown. We could avoid putting yet another thing
> > in PeerConnectionCtx, but it would require much more code, and it seems like
> > PeerConnectionCtx's reason for being is precisely this sort of thing.
> 
> Makes sense. I'm curious what shape the "closed reports" will take. Will it
> be an arbitrary snapshot right before close? Or averages somehow for the
> last 5 seconds/duration of the call? They'll obviously not have live
> updating values, since they're dead. What use-cases are they driving?
> 
> Are they perhaps a different "thing"? A last call-report?

   They are exactly the same thing as the "live" reports, taken as the PC is closing. If we want something like an average of some value, we could keep a rolling average and put it in all the reports. Ultimately, the use case is to be able to see them in cases where a call has already ended, so we can tell someone to look at about:webrtc a significant time after the failure (provided they have not restarted their browser).
Add |closed| and |timestamp| fields to RTCStatsReportInternal, instead of messing with the PC id directly.
Attachment #8414080 - Attachment is obsolete: true
Comment on attachment 8410418 [details] [diff] [review]
Part 1: When ICE fails, log a pointer to about:webrtc in the JS console. r=jib

This never got marked checked in.
Attachment #8410418 - Flags: checkin+
Attachment #8417478 - Attachment is obsolete: true
Attachment #8417490 - Flags: review?(jib)
Attachment #8410418 - Attachment description: Part 1: When ICE fails, log a pointer to about:webrtc in the JS console. → Part 1: When ICE fails, log a pointer to about:webrtc in the JS console. r=jib
Comment on attachment 8417490 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

Review of attachment 8417490 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.h
@@ +81,5 @@
>    friend class mozilla::dom::WebrtcGlobalInformation;
>  
> +#ifdef MOZILLA_INTERNAL_API
> +  // WebrtcGlobalInformation uses this; we put it here so we don't need to
> +  // create another shutdown observer class. 

whitespace
Attachment #8417490 - Flags: review?(jib) → review+
Attachment #8417490 - Attachment is obsolete: true
Comment on attachment 8418105 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

Carry forward r+ from jib.
Attachment #8418105 - Flags: review+
Needinfo self to check back on try push.

https://tbpl.mozilla.org/?tree=Try&rev=85c9e66c3d52
Flags: needinfo?(docfaraday)
Depends on: 1006636
Comment on attachment 8418105 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

Try looks good, so requesting checkin.
Attachment #8418105 - Flags: checkin?
Flags: needinfo?(docfaraday)
Comment on attachment 8418105 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

I take that back, this can't land with a DOM peer reviewing the WebIDL changes.
Attachment #8418105 - Flags: checkin+ → checkin-
Attachment #8418105 - Flags: review?(bugs)
Comment on attachment 8418105 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

>+static sipcc::PeerConnectionCtx* GetPeerConnectionCtx() {
Looks like a bit inconsistent coding style.
{ is in its own line in the method after this (as it should be per Mozilla C++ coding style)
Attachment #8418105 - Flags: review?(bugs) → review+
Attachment #8418105 - Attachment is obsolete: true
Comment on attachment 8418778 [details] [diff] [review]
Part 2: Record final ICE/media stats when PeerConnections are closed, so they show up in about:webrtc.

Carry forward r+ from jib and smaug, and requesting checkin. Should be landed before bug 1002831, or else the patches will not apply cleanly.
Attachment #8418778 - Flags: review+
Attachment #8418778 - Flags: checkin?
We should probably open a new meta bug for continued improvements? Or maybe re-open this one and make it meta? What's going to be the least trouble wrt tooling?
Flags: needinfo?(mreavy)
Let's open a new meta bug.  Thanks!
Flags: needinfo?(mreavy)
You need to log in before you can comment on or make changes to this bug.