Closed Bug 1063971 Opened 5 years ago Closed 5 years ago

setRemoteDescription call without success-callback freezes PeerConnection queue (FF32 regression)

Categories

(Core :: WebRTC, defect, major)

32 Branch
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla35
blocking-b2g 2.0+
Tracking Status
firefox32 + verified
firefox33 + verified
firefox34 + verified
firefox35 + verified
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: marco2punti, Assigned: jesup)

References

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

Attached file localPeerConnection.js
User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36

Steps to reproduce:

I'm developing a WebRTC application, but after update to Firefox 32 the peerConnection method createAnswer doesn't call anymore the callback function, so it's no more possible estabilish a peer to peer connection.
It works as it should using Firefox 31.


Actual results:

This code doesn't work anymore in Firefox 32, the function gotRemoteDescription is NEVER called and the SDP answer is NEVER created, so WebRTC is unuseful:

// Handler to be called when the 'local' SDP becomes available
function gotLocalDescription(description){
  // Add the local description to the local PeerConnection
  localPeerConnection.setLocalDescription(description);
  log("Offer from localPeerConnection: \n" + description.sdp);
  
  // ...do the same with the 'pseudo-remote' PeerConnection
  // Note well: this is the part that will have to be changed if you want the communicating peers to become
  // remote (which calls for the setup of a proper signaling channel)
  remotePeerConnection.setRemoteDescription(description);
  
  // Create the Answer to the received Offer based on the 'local' description
  remotePeerConnection.createAnswer(gotRemoteDescription, onSignalingError);
}

// Handler to be called when the 'remote' SDP becomes available
function gotRemoteDescription(description){
  // Set the 'remote' description as the local description of the remote PeerConnection
  remotePeerConnection.setLocalDescription(description);
  log("Answer from remotePeerConnection: \n" + description.sdp);
  // Conversely, set the 'remote' description as the remote description of the local PeerConnection 
  localPeerConnection.setRemoteDescription(description);
}



Expected results:

We need stable browser for WebRTC now and in next years, we are working to develop applications and business on WebRTC and Firefox is expected to be stable. Please fix this as soon as possible or report documentation if the WebRTC api is changed.
By the way there are clear official specifications for WebRTC: http://dev.w3.org/2011/webrtc/editor/webrtc.html
so please i'm talking to all Firefox developers, follow the specs as Chrome is doing!
It seems that Firefox is becoming the new Internet Explorer.
Severity: normal → critical
Priority: -- → P1
Severity: critical → normal
Component: Untriaged → WebRTC
Priority: P1 → --
Product: Firefox → Core
Marco,

I fear the problem here is your code, specifically the following line:

 remotePeerConnection.setRemoteDescription(description);

callers of setRemoteDescription are required by the specification to provide
success and failure callbacks. See:

  http://dev.w3.org/2011/webrtc/editor/webrtc.html#widl-RTCPeerConnection-setRemoteDescription-void-RTCSessionDescription-description-VoidFunction-successCallback-RTCPeerConnectionErrorCallback-failureCallback

Your code does not supply them and so Firefox is choking. I added them and
things behave as expected.


With that said, I'm sort of sad that we're not generating some kind of exception
or the like to complain about your code. needinfoing jib to see if that's intentional.
Flags: needinfo?(jib)
It is intentional. Just like Chrome, we never made the success and failure callbacks on setLocalDescription and setRemoteDescription a requirement, because code like this is out there and works thanks to our queue (though such code obviously wont catch setLocalDescription or setRemoteDescription errors).

While adding the callbacks avoids the problem, there is a regression here in FF32.

I've tracked it to Bug 942367 which overloads the onSuccess callback of setRemoteDescription with state to implement Identity handling:

http://mxr.mozilla.org/mozilla-central/source/dom/media/PeerConnection.js?mark=687-689#678 :
> let allDone = () => {
>   if (!setRemoteComplete || !idpComplete || !onSuccess) {
>     return;
>   }
>   this.callCB(onSuccess);
>   onSuccess = null;
>   this._executeNext();
> };

In Marco's case, onSuccess is undefined, so this code bails without executing this._executeNext() which freezes the queue, which is bad.
Flags: needinfo?(jib) → needinfo?(martin.thomson)
Status: UNCONFIRMED → NEW
Ever confirmed: true
hg.mozilla.org/mozilla-central/rev/50116088c244#l3.116
Summary: WebRTC broken with Firefox 32 → setRemoteDescription call without success-callback freezes PeerConnection queue (FF32 regression)
How much of existing WebRTC code do we think trips over this?  Clearly most of the major sites/services don't.  Were there any widely copied demos/example code that did this?
Keywords: regression
OS: Windows 8.1 → All
Hardware: x86_64 → All
Consider that the code i posted is not mine... it's code all over a loot of books on WebRTC... if you add callbacks as a requirement in FF32 without any documentation it means all the sample code that worked till now without success/error callback is today broken... that is very bad
Attachment #8485506 - Flags: review?(jib)
Comment on attachment 8485506 [details] [diff] [review]
Allow SetRemoteDescription to omit callbacks again

Review of attachment 8485506 [details] [diff] [review]:
-----------------------------------------------------------------

r- for race on allDone not addressed.

::: dom/media/PeerConnection.js
@@ +687,5 @@
> +      if (setRemoteComplete && idpComplete && !onSuccess) {
> +        // user didn't supply sucess/failure callbacks.  Violation of spec,
> +        // but we allow it for now
> +        this._executeNext();
> +      } else if (!setRemoteComplete || !idpComplete || !onSuccess) {

Codeflow is already rather complicated to follow here, so rather than adding an if-else, lets remove "|| !onSuccess" since callCB handles onSuccess being falsy just fine - which incidentally is what saves set*Local*Description - so the precaution is unnecessary and is what bites us. Makes for a smaller patch as well. But see next comment.

@@ +692,4 @@
>          return;
>        }
>        this.callCB(onSuccess);
>        onSuccess = null;

There's still a race here. Using onSuccess == null to catch the two parallel operations racing to call allDone is what's bad here, because Marco's call with onSuccess = undefined disrupts this logic. This is important because the race protector not only guards the callback from being called twice, but guards executeNext() from being called twice, which would upset the queue logic.

Instead, we should avoid touching onSuccess and add another local variable to the group further up:

  let alreadyDone = false;

the set alreadyDone = true here, and test on || !alreadyDone.
Attachment #8485506 - Flags: review?(jib) → review-
Attachment #8485506 - Attachment is obsolete: true
Attachment #8485531 - Flags: review?(jib)
Comment on attachment 8485531 [details] [diff] [review]
Allow SetRemoteDescription to omit callbacks again

Review of attachment 8485531 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comment nit.

::: dom/media/PeerConnection.js
@@ +689,3 @@
>          return;
>        }
> +      // May be null if the user didn't supply sucess/failure callbacks.

// onSuccess may be undefined if the user didn't supply success/failure callbacks.

also note 'success' typo.
Attachment #8485531 - Flags: review?(jib) → review+
I found another queue-freeze in the error-handling part of this code while reviewing it that I've filed as Bug 1064074, but that one shouldn't affect a lot of people or people here.
Tested with and without patch, using modified pc_test.html that does a setTimeout(stepN, 1000) instead of success/fail callbacks for this.  Works with patch, fails without.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9e1453b8ab17

We'll want this on every branch we can, though b2g32 may not make the bar.  If we can possibly get it into a 32.1 release, let's do so.
Whiteboard: [webrtc-uplift]
Target Milestone: --- → mozilla35
Severity: normal → major
We should also announce a time when we are going to enforce specification correctness.
(In reply to Eric Rescorla (:ekr) from comment #12)
> We should also announce a time when we are going to enforce specification
> correctness.

And while it's probably not important enough to uplift, I believe we should start emitting deprecation warnings when this happens, at least in 35. I've filed Bug 1064088.
(In reply to Randell Jesup [:jesup] from comment #11)
> Tested with and without patch, using modified pc_test.html that does a
> setTimeout(stepN, 1000) instead of success/fail callbacks for this.  Works
> with patch, fails without.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/9e1453b8ab17

Any chance of us landing a mochitest for this as well so it doesn't get
broken again?


> We'll want this on every branch we can, though b2g32 may not make the bar. 
> If we can possibly get it into a 32.1 release, let's do so.
https://hg.mozilla.org/mozilla-central/rev/9e1453b8ab17
Assignee: nobody → rjesup
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Tracking as Maire requested that this bug be considered for a 32.0.1 ride along.
Comment on attachment 8485531 [details] [diff] [review]
Allow SetRemoteDescription to omit callbacks again

Approval Request Comment
[Feature/regressing bug #]: bug 942367

[User impact if declined]: A large amount of example code and some webrtc applications (unclear how many) will silently break with no warning and no logs.  While incorrect vs the draft spec, we and Chrome have been explicitly allowing it (and I think early versions of the spec/impls allowed it).  The mandating of a success callback was accidental.

[Describe test coverage new/current, TBPL]: Manual testing; jib is working on an automated test to cover this we'll separately uplift.  Easily tested manually. (I'll attach my test page here)

[Risks and why]: See user impact.  Risk is low; changes are very self-contained.

[String/UUID change made/needed]: none
Attachment #8485531 - Flags: approval-mozilla-release?
Attachment #8485531 - Flags: approval-mozilla-beta?
Attachment #8485531 - Flags: approval-mozilla-b2g32?
Attachment #8485531 - Flags: approval-mozilla-aurora?
[Blocking Requested - why for this release]: See approval request
blocking-b2g: --- → 2.2?
Oops, hit End or something
blocking-b2g: 2.2? → 2.0?
test page
blocking-b2g: 2.0? → 2.0+
Attachment #8485531 - Flags: approval-mozilla-b2g32? → approval-mozilla-b2g32+
Comment on attachment 8485531 [details] [diff] [review]
Allow SetRemoteDescription to omit callbacks again

Approved for Beta and Aurora.
Attachment #8485531 - Flags: approval-mozilla-beta?
Attachment #8485531 - Flags: approval-mozilla-beta+
Attachment #8485531 - Flags: approval-mozilla-aurora?
Attachment #8485531 - Flags: approval-mozilla-aurora+
Flags: needinfo?(martin.thomson)
Flags: qe-verify+
Reproduced in FF 32 using the test page.
Verified fixed FF 33b2, 34.0a2 (2014-09-09), 35.0a1 (2014-09-09) Win 7, OS X 10.9.5, Ubuntu 12.10.
Comment on attachment 8485531 [details] [diff] [review]
Allow SetRemoteDescription to omit callbacks again

Taking this as a ride along with 32.0.1.
Attachment #8485531 - Flags: approval-mozilla-release? → approval-mozilla-release+
Verified as fixed using Firefox 32.0.1 (20140910203853) under Win 7 64-bit, Win 8 32-bit, Mac OSX 10.9.4 and Ubuntu 14.04 LTS 32-bit.
Whiteboard: [webrtc-uplift]
You need to log in before you can comment on or make changes to this bug.