Closed Bug 1087551 Opened 6 years ago Closed 5 years ago

addIceCandidate silently fails if called at the wrong time

Categories

(Core :: WebRTC: Networking, defect, P1)

36 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: drno, Assigned: drno)

References

Details

Attachments

(3 files, 5 obsolete files)

Apparently if addIceCandidate() gets called after providing the SDP offer, but before providing the SDP answer, then the ICE candidates get silently dropped to the floor as this log output and some code search shows:

63 INFO PeerConnectionWrapper (pcRemote): adding ICE candidate {"candidate":"candidate:0 2 UDP 2130379006 10.252.25.107 53782 typ host","sdpMid":"","sdpMLineIndex":0}
1999569680[1003292d0]: [main|fsm_sm] fsmdef.c:3629: SIPCC-FSM: fsmdef_addcandidate: Entered.
(ice/WARNING) ICE(PC:1414001237227225 (id=16 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html)): peer (PC:1414001237227225 (id=16 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html):default) has no stream matching stream 1414001237227225 (id=16 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html): stream1/audio
298348544[100329b20]: Couldn't parse trickle candidate for stream '1414001237227225 (id=16 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html): stream1/audio'
64 INFO PeerConnectionWrapper (pcRemote): 'onsignalingstatechange' event fired
65 INFO TEST-PASS | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | PeerConnectionWrapper (pcRemote): legal signaling state transition from stable to have-remote-offer 
66 INFO PeerConnectionWrapper (pcRemote): 'onsignalingstatechange' event 'have-remote-offer' received
67 INFO TEST-PASS | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | signalingState after remote setRemoteDescription is 'have-remote-offer' 
68 INFO PeerConnectionWrapper (pcRemote): Successfully added an ICE candidate

The peer streams the first error message complains about only get created here:
http://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c#3393
and here:
http://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c#3155
which in both cases is the after the answer. This also matches the log file from mochitest where this error only shows up on the answerer side, where we add the ICE candidates before generating the answer, but on the offer side everything is find as the candidates only get added after the answer for provided.

Really bad here is that still the success callback gets called, so the JS coder has no idea that something is going wrong here!
Svein at appear.in found that this is causing interop issues for both appear.in and talky.io between Firefox 36 and Chrome 43.

He told me "We only see this consistently with Chrome 43 on the other end.  But we also have a case on a strict network that we operate here, where this problem shows up on Firefox 36 with earlier versions of Chrome at the other end. It worked with earlier versions of Firefox on this network."

Here are logs from one occurrence of this bug, where the turn server per below does not show up in about:webrtc:
> ICE(PC:1425655695170739 (id=126 url=https://appear.in/svein)): peer (PC:1425655695170739 (id=126 url=https://appear.in/svein):default) pairing local trickle ICE candidatesrflx(IP4:10.20.32.61:64204/UDP|turn.appear.in:443)
> ICE(PC:1425655695170739 (id=126 url=https://appear.in/svein)): peer (PC:1425655695170739 (id=126 url=https://appear.in/svein):default) has no stream matching stream 1425655695170739 (id=126 url=https://appear.in/svein): stream3/data
> ICE(PC:1425655695170739 (id=126 url=https://appear.in/svein)): All could not pair new trickle candidate
> ICE(PC:1425655695170739 (id=126 url=https://appear.in/svein)): peer (PC:1425655695170739 (id=126 url=https://appear.in/svein):default) pairing local trickle ICE candidate turn-relay(IP4:10.20.32.61:64204/UDP|IP4:54.93.177.27:58165/UDP)
> ICE(PC:1425655695170739 (id=126 url=https://appear.in/svein)): peer (PC:1425655695170739 (id=126 url=https://appear.in/svein):default) has no stream matching stream 1425655695170739 (id=126 url=https://appear.in/svein): stream3/data
> ICE(PC:1425655695170739 (id=126 url=https://appear.in/svein)): All could not pair new trickle candidate

ni? Maire for triage/prioritization.
Flags: needinfo?(mreavy)
I've seen this a while ago, from the session traces on both sides Firefox is sending those candidates before sending the answer. Timing problem.
Bryon, Nils - Can we find the cause of this and an effort required to fix it?  I'd like to find the answers to this as soon as we can.  Thanks
Rank: 5
Flags: needinfo?(mreavy)
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
Priority: -- → P1
So, we do have a problem in that if you call addIceCandidate before setting the remote description, you can end up not seeing an error. But do we have any logging indicating that we fire icecandidate events before onSetLocalDescriptionSuccess fires?
(In reply to Byron Campen [:bwc] from comment #4)
> So, we do have a problem in that if you call addIceCandidate before setting
> the remote description, you can end up not seeing an error.

And to be clear, that would just be an early warning, since you shouldn't
be doing that anyway.


 But do we have
> any logging indicating that we fire icecandidate events before
> onSetLocalDescriptionSuccess fires?
(In reply to Eric Rescorla (:ekr) from comment #5)
> (In reply to Byron Campen [:bwc] from comment #4)
> > So, we do have a problem in that if you call addIceCandidate before setting
> > the remote description, you can end up not seeing an error.
> 
> And to be clear, that would just be an early warning, since you shouldn't
> be doing that anyway.

I think the problem here is:
a) Google Chrome apparently allows you to add ice candidates before setting the remote description, so most web developers are not used to the fact that Firefox does not allow it.
b) Someone/something needs to store the ICE candidates until SRD succeeded.

So I see three "solutions" to the problem:
1.1) We educate all web developers to build their own ICE candidate store into their WebRTC service JS code.
1.2) We educate all web developers to use Promises in their JS code to store the ICE candidates for them.
2a) We add some code to Firefox for storing ICE candidates before the SRD.
2b) We change/update the spec and then implement 2a
(In reply to Nils Ohlmeier [:drno] from comment #6)
> (In reply to Eric Rescorla (:ekr) from comment #5)
> > (In reply to Byron Campen [:bwc] from comment #4)
> > > So, we do have a problem in that if you call addIceCandidate before setting
> > > the remote description, you can end up not seeing an error.
> > 
> > And to be clear, that would just be an early warning, since you shouldn't
> > be doing that anyway.
> 
> I think the problem here is:
> a) Google Chrome apparently allows you to add ice candidates before setting
> the remote description, so most web developers are not used to the fact that
> Firefox does not allow it.
> b) Someone/something needs to store the ICE candidates until SRD succeeded.
> 
> So I see three "solutions" to the problem:
> 1.1) We educate all web developers to build their own ICE candidate store
> into their WebRTC service JS code.
> 1.2) We educate all web developers to use Promises in their JS code to store
> the ICE candidates for them.
> 2a) We add some code to Firefox for storing ICE candidates before the SRD.
> 2b) We change/update the spec and then implement 2a

   I should point out that with renegotiation, there's an additional wrinkle. If side A is still trickling candidates after offer/answer concludes, and side B decides to renegotiate, suddenly those trickle candidates start getting errors without any warning. This can easily happen prior to ICE establishing. I think there's a somewhat strong case for doing option 2b.
Flags: needinfo?(docfaraday)
(In reply to Nils Ohlmeier [:drno] from comment #6)
> (In reply to Eric Rescorla (:ekr) from comment #5)
> > (In reply to Byron Campen [:bwc] from comment #4)
> > > So, we do have a problem in that if you call addIceCandidate before setting
> > > the remote description, you can end up not seeing an error.
> > 
> > And to be clear, that would just be an early warning, since you shouldn't
> > be doing that anyway.
> 
> I think the problem here is:
> a) Google Chrome apparently allows you to add ice candidates before setting
> the remote description, so most web developers are not used to the fact that
> Firefox does not allow it.
> b) Someone/something needs to store the ICE candidates until SRD succeeded.
> 
> So I see three "solutions" to the problem:
> 1.1) We educate all web developers to build their own ICE candidate store
> into their WebRTC service JS code.
> 1.2) We educate all web developers to use Promises in their JS code to store
> the ICE candidates for them.
> 2a) We add some code to Firefox for storing ICE candidates before the SRD.
> 2b) We change/update the spec and then implement 2a

From a web developer standpoint, I would prefer the solution to be the same or at least similar in the different browsers. With different solutions, implementers will invariably end up with something that works in one browser and doesn't in another.
(In reply to Nils Ohlmeier [:drno] from comment #6)
> (In reply to Eric Rescorla (:ekr) from comment #5)
> > (In reply to Byron Campen [:bwc] from comment #4)
> > > So, we do have a problem in that if you call addIceCandidate before setting
> > > the remote description, you can end up not seeing an error.
> > 
> > And to be clear, that would just be an early warning, since you shouldn't
> > be doing that anyway.
> 
> I think the problem here is:
> a) Google Chrome apparently allows you to add ice candidates before setting
> the remote description, so most web developers are not used to the fact that
> Firefox does not allow it.
> b) Someone/something needs to store the ICE candidates until SRD succeeded.
> 
> So I see three "solutions" to the problem:
> 1.1) We educate all web developers to build their own ICE candidate store
> into their WebRTC service JS code.
> 1.2) We educate all web developers to use Promises in their JS code to store
> the ICE candidates for them.
> 2a) We add some code to Firefox for storing ICE candidates before the SRD.
> 2b) We change/update the spec and then implement 2a


I'm trying to figure out how this is happening. JSEP prohibits onicecandidate from
being called prior to setLocal. Is this a case of reordering in the network?
(In reply to Eric Rescorla (:ekr) from comment #9)
> I'm trying to figure out how this is happening. JSEP prohibits
> onicecandidate from
> being called prior to setLocal. Is this a case of reordering in the network?

There was a 35 bug (and also in 36.01 apparently) here which looked _very_ similar to this and where Firefox generated onicecandidate before the setLocalDescription success callback.
(In reply to Eric Rescorla (:ekr) from comment #9)
> I'm trying to figure out how this is happening. JSEP prohibits
> onicecandidate from
> being called prior to setLocal. Is this a case of reordering in the network?

If you count the signaling server as the network, then yes. At least that is my view. For example our tests do some sanity checking on the SDP before passing it on. That gives the ICE candidates the opportunity to bypass them.
(In reply to Philipp Hancke from comment #10)
> (In reply to Eric Rescorla (:ekr) from comment #9)
> > I'm trying to figure out how this is happening. JSEP prohibits
> > onicecandidate from
> > being called prior to setLocal. Is this a case of reordering in the network?
> 
> There was a 35 bug (and also in 36.01 apparently) here which looked _very_
> similar to this and where Firefox generated onicecandidate before the
> setLocalDescription success callback.

Hmm... I'd believe that since the behavior of gathering vis-a-vis
setLocal() was in flux in the spec for a while. I'll leave to philosophers
whether it's a bug or not, but if you send the offer out in the
createoffer callback, I bet you will find things happen in the right order.
(In reply to Eric Rescorla (:ekr) from comment #12)
> (In reply to Philipp Hancke from comment #10)
> > (In reply to Eric Rescorla (:ekr) from comment #9)
> > > I'm trying to figure out how this is happening. JSEP prohibits
> > > onicecandidate from
> > > being called prior to setLocal. Is this a case of reordering in the network?
> > 
> > There was a 35 bug (and also in 36.01 apparently) here which looked _very_
> > similar to this and where Firefox generated onicecandidate before the
> > setLocalDescription success callback.
> 
> Hmm... I'd believe that since the behavior of gathering vis-a-vis
> setLocal() was in flux in the spec for a while. I'll leave to philosophers
> whether it's a bug or not, but if you send the offer out in the
> createoffer callback, I bet you will find things happen in the right order.

FWIW, this bug has been fixed, and onicecandidate won't fire until after onSetLocalDescriptionSuccess.
(In reply to Eric Rescorla (:ekr) from comment #12)
> Hmm... I'd believe that since the behavior of gathering vis-a-vis
> setLocal() was in flux in the spec for a while. I'll leave to philosophers
> whether it's a bug or not, but if you send the offer out in the
> createoffer callback, I bet you will find things happen in the right order.

BTW the other way how this can easily happen is that SRD has not succeeded yet. SRD got called, but hasn't finished yet. Now you receive the first ICE candidate via your signaling channel.
(In reply to Eric Rescorla (:ekr) from comment #12)
> Hmm... I'd believe that since the behavior of gathering vis-a-vis
> setLocal() was in flux in the spec for a while. I'll leave to philosophers
> whether it's a bug or not, but if you send the offer out in the
> createoffer callback, I bet you will find things happen in the right order.

Right, which is what apprtc does. Kind of bad since there is no guarantee that SLD will succeed, especially if you munge the SDP. We'll see whether this also reduces the number of times _this_ bug happens in the wild. I'll have some not-yet-so-hard-data tomorrow.
(In reply to Byron Campen [:bwc] from comment #13)
> FWIW, this bug has been fixed, and onicecandidate won't fire until after
> onSetLocalDescriptionSuccess.

which changes the timing slightly so that the time between the SLD success and the first onicecandidate is now smaller, right?
(In reply to Philipp Hancke from comment #16)
> (In reply to Byron Campen [:bwc] from comment #13)
> > FWIW, this bug has been fixed, and onicecandidate won't fire until after
> > onSetLocalDescriptionSuccess.
> 
> which changes the timing slightly so that the time between the SLD success
> and the first onicecandidate is now smaller, right?

The timing has been tightened some, but not a lot.
(In reply to Eric Rescorla (:ekr) from comment #12)
> I'll leave to philosophers
> whether it's a bug or not, but if you send the offer out in the
> createoffer callback, I bet you will find things happen in the right order.

The more I think about it the more I'm convinced the sending side (as in generator of the SDP and the candidates) is pretty non-interesting for this scenario.
More concerning is if answer SDP and ICE candidates are readily waiting for your consumption on your signaling server already. A very typical flow is probably that you take the answer SDP hand it to SRD, and while you wait for the success callback of the SRD you read/receive the ICE candidates.
Everything is in order in this scenario, nobody has delayed anything, but you have to wait for the SRD success callback, before you are allowed to pass in the ICE candidates.

Admitably the gap here is small. But I think we should simply store the ICE candidates until we have internally reached the right state.
And if it helps avoiding concerns about queues and timers to destroy never used queues we could limit this to exactly the scenario: SRD has been called, but success or failure callback has not fired yet. If someone calls addIceCandidate() in that small gap we store the candidates and pass them on or throw them away once we fired the SRD callback.
Flags: needinfo?(drno)
(In reply to Nils Ohlmeier [:drno] from comment #18)
> (In reply to Eric Rescorla (:ekr) from comment #12)
> > I'll leave to philosophers
> > whether it's a bug or not, but if you send the offer out in the
> > createoffer callback, I bet you will find things happen in the right order.
> 
> The more I think about it the more I'm convinced the sending side (as in
> generator of the SDP and the candidates) is pretty non-interesting for this
> scenario.
> More concerning is if answer SDP and ICE candidates are readily waiting for
> your consumption on your signaling server already. A very typical flow is
> probably that you take the answer SDP hand it to SRD, and while you wait for
> the success callback of the SRD you read/receive the ICE candidates.
> Everything is in order in this scenario, nobody has delayed anything, but
> you have to wait for the SRD success callback, before you are allowed to
> pass in the ICE candidates.
>

I don't think that's quite correct. These operations are defined as being queued,
so the right thing should happen automatically.


> Admitably the gap here is small. But I think we should simply store the ICE
> candidates until we have internally reached the right state.
> And if it helps avoiding concerns about queues and timers to destroy never
> used queues we could limit this to exactly the scenario: SRD has been
> called, but success or failure callback has not fired yet. If someone calls
> addIceCandidate() in that small gap we store the candidates and pass them on
> or throw them away once we fired the SRD callback.
(In reply to Byron Campen [:bwc] from comment #13)

> FWIW, this bug has been fixed, and onicecandidate won't fire until after
> onSetLocalDescriptionSuccess.

If so, is the queuing of ICE candidates I do in pc_test.html (until SetLocal succeeds) no longer needed?
(In reply to Randell Jesup [:jesup] from comment #20)
> (In reply to Byron Campen [:bwc] from comment #13)
> 
> > FWIW, this bug has been fixed, and onicecandidate won't fire until after
> > onSetLocalDescriptionSuccess.
> 
> If so, is the queuing of ICE candidates I do in pc_test.html (until SetLocal
> succeeds) no longer needed?

   It should not be.
(In reply to Randell Jesup [:jesup] from comment #20)
> (In reply to Byron Campen [:bwc] from comment #13)
> 
> > FWIW, this bug has been fixed, and onicecandidate won't fire until after
> > onSetLocalDescriptionSuccess.
> 
> If so, is the queuing of ICE candidates I do in pc_test.html (until SetLocal
> succeeds) no longer needed?

To avoid further confusion: you are storing based on SetRemoteDescription and not on SetLocalDescription https://github.com/mozilla/webrtc-landing/commit/bbad4695ac507e5871ba3af00408f818a6a36879
And storing candidates before the success of SetRemoteDescription is still needed, as it otherwise fails silently, which is this bug report about.
The first thing which needs to get fixed is that addIceCandidate() no longer returns success if the operation failed in fact (because it got called in the wrong state).
(In reply to Nils Ohlmeier [:drno] from comment #23)
> And storing candidates before the success of SetRemoteDescription is still
> needed, as it otherwise fails silently, which is this bug report about.
> The first thing which needs to get fixed is that addIceCandidate() no longer
> returns success if the operation failed in fact (because it got called in
> the wrong state).

   Whether we queue or not, we won't detect errors easily, so queuing doesn't get us anything in that regard. If we want to make it more noticeable when this happens, we might want to display an error if onicecandidate fires before the success callback.
(In reply to Byron Campen [:bwc] from comment #24)
> (In reply to Nils Ohlmeier [:drno] from comment #23)
> > And storing candidates before the success of SetRemoteDescription is still
> > needed, as it otherwise fails silently, which is this bug report about.
> > The first thing which needs to get fixed is that addIceCandidate() no longer
> > returns success if the operation failed in fact (because it got called in
> > the wrong state).
> 
>    Whether we queue or not, we won't detect errors easily, so queuing
> doesn't get us anything in that regard. If we want to make it more
> noticeable when this happens, we might want to display an error if
> onicecandidate fires before the success callback.

We still seem to talk about to different scenarios here:
bwc: onicecandidate -> emits new iceCandidates locally (viewing from my local PC)
drno: addIceCandidate() -> receives iceCandidates from the far end

I think it would be cleaner if would leave the discussion here to the addIceCandidate() problem as the subject suggests, and open a separate bug report for the onicecandidate events if it is still needed.
(In reply to Nils Ohlmeier [:drno] from comment #18)
> Admitably the gap here is small. But I think we should simply store the ICE
> candidates until we have internally reached the right state.
> And if it helps avoiding concerns about queues and timers to destroy never
> used queues we could limit this to exactly the scenario: SRD has been
> called, but success or failure callback has not fired yet. If someone calls
> addIceCandidate() in that small gap we store the candidates and pass them on
> or throw them away once we fired the SRD callback.

As a web developer I strongly favor this approach. Implementing a remote candidate store for this specific case felt like an unnecessary workaround. _If_ the web developers are expected to store remote candidates until SRD has succeeded this should to be explicitly stated in the spec, and the examples provided there (Example 4 and 6) should also reflect this to avoid confusion.
I've added the remote and local SDP, and the logs from about:webrtc for fail case. In this case I'm adding the ICE candidates _after_ SRD has succeeded (triggered from the SRD success callback).
(In reply to Nils Ohlmeier [:drno] from comment #6)
> (In reply to Eric Rescorla (:ekr) from comment #5)
> > (In reply to Byron Campen [:bwc] from comment #4)
> > > So, we do have a problem in that if you call addIceCandidate before setting
> > > the remote description, you can end up not seeing an error.
> > 
> > And to be clear, that would just be an early warning, since you shouldn't
> > be doing that anyway.
> 
> I think the problem here is:
> a) Google Chrome apparently allows you to add ice candidates before setting
> the remote description, so most web developers are not used to the fact that
> Firefox does not allow it.

I was about to say "not correct", but... this sounds like a bug in chrome, will file one. Adding an ice candidate before calling SRD is certainly wrong.

(In reply to Nils Ohlmeier [:drno] from comment #6)
> (In reply to Eric Rescorla (:ekr) from comment #5)
> > (In reply to Byron Campen [:bwc] from comment #4)
> > > So, we do have a problem in that if you call addIceCandidate before setting
> > > the remote description, you can end up not seeing an error.
> > 
> > And to be clear, that would just be an early warning, since you shouldn't
> > be doing that anyway.
> 
> I think the problem here is:
> a) Google Chrome apparently allows you to add ice candidates before setting
> the remote description, so most web developers are not used to the fact that
> Firefox does not allow it.

I wanted to say "not true" but indeed... filed https://code.google.com/p/webrtc/issues/detail?id=4409
I do not think this is what happens here though.

In fact, I have a hard time reproducing this. Just calling addIceCandidate immediately after SRD succeeds (in a single page demo) which is what I expect.
(In reply to Byron Campen [:bwc] from comment #7)
> I think there's a somewhat strong case for doing option 2b.

+1. This is what you have to do on a local-loop call otherwise: http://jsfiddle.net/zy3e8rqo
Assignee: nobody → drno
Attached file MozReview Request: bz://1087551/drno (obsolete) —
/r/7013 - Bug 1087551: remove the queueing of ice candidates from mochitests

Pull down this commit:

hg pull -r e3b5263cf684c5514fbc7f5e5b464640dc460eb1 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review?(jib)
Sorry for the messed up patch title...

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da8aec461bdd

I added a form of queuing to PeerConnection.js, which does the job of queuing ice candidates IF addIceCandidate gets called before SetRemoteDescription got called.
BUT this approach has two problems:

1) As we don't have a remote SDP we can not evaluate if for example the m-line index exists in the SDP. Therefore the return code for addIceCanditate is fake and only tells you that the candidate has been queued. The real return code for from the addIceCandidate() call to our implementation gets dropped to the floor.

2) Returning a promise, which would resolve once the ice candidate has successfully (or not) been added after SRD success, would be nice, but breaks with the chain model, which requires us to process things in FIFO order. Therefore in case queuing happens the promise resolves immediately to unblock the chain (for the hopefully soon SRD).

The other alternative I see would be to return a clear error in case addIceCandidate() gets called before SRD() to point out to web developers that something went wrong. To cover for that case the developer would still then have to implement his/her own queuing.
Yes, this is why addIceCandidate() needs to be queued. See:
https://lists.w3.org/Archives/Public/public-webrtc/2014May/0227.html

I suggest that we re-raise this on the list.
A temporary measure would be to *just* set a flag for sRD being called and if it hasn't been thrown an error and if it has been then just assume the m= line exists.
(In reply to Nils Ohlmeier [:drno] from comment #31)
> 1) As we don't have a remote SDP we can not evaluate if for example the
> m-line index exists in the SDP. Therefore the return code for
> addIceCanditate is fake and only tells you that the candidate has been
> queued. The real return code for from the addIceCandidate() call to our
> implementation gets dropped to the floor.

Don't do it that way. Instead, use a promise as a gate, like in the 3 lines of code you removed from pc.js:

  this.holdIceCandidates = new Promise(r => this.releaseIceCandidates = r);
  ...
  setRemoteDescription: function(desc) { ... this.releaseIceCandidates(); }

  AddIceCandidate: function(c) { this.holdIceCandidates.then(() => { this.addIceCandidate(c); }); }

That's its entire implementation. It works because a new () => this.addIceCandidate(c) function is queued (not executed) inside the holdIceCandidates promise (acting as a gate) each time AddIceCandidate is called, and they'll all execute in FIFO order when the promise resolves (gate opens) later.

This means the calls execute later, after SRD, when m-lines exist.

Yes, this adds multiple .then() functions to a single promise, and they're not chained, instead each is a fork. This is important, because each fork can fail without affecting the other forks, and that failure belongs only to the respective caller of AddIceCandidate. So the final promise from its forked chain, is what AddIceCandidate needs to return. This is where the mochitest has a bug, the third line above should be:

  AddIceCandidate: function(c) { return this.holdIceCandidates.then(() => this.addIceCandidate(c)); }

Importantly, () => this.addIceCandidate(c) is the same as () => { return this.addIceCandidate(c); }, so AddIceCandidate now returns the final promise in this forked chain.

> 2) Returning a promise, which would resolve once the ice candidate has
> successfully (or not) been added after SRD success, would be nice, but
> breaks with the chain model, which requires us to process things in FIFO
> order. Therefore in case queuing happens the promise resolves immediately to
> unblock the chain (for the hopefully soon SRD).

I'm not sure what you mean with "breaks the chain model" but I'm fairly confident that what I describe above should just work, and that the internal operations chain is largely orthogonal to this.
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

https://reviewboard.mozilla.org/r/7011/#review5833

See comment 34.

::: dom/media/PeerConnection.js
(Diff revision 1)
> +      if (this._remoteType !== null) {

Another nice thing with using a promise as a gate is that the case where the promise is already resolved is handled transparently, removing for state tests and multiple codepaths.
Attachment #8591885 - Flags: review?(jib)
, removing the need for state tests and multiple codepaths.
(In reply to Thomas Bruun from comment #27)
> Created attachment 8575277 [details]
> Failing case when adding queued candidates after SRD succeeds
> 
> I've added the remote and local SDP, and the logs from about:webrtc for fail
> case. In this case I'm adding the ICE candidates _after_ SRD has succeeded
> (triggered from the SRD success callback).

I looked through the log file and I think our ICE stack gets really confused because the SDP offer uses exactly the same ICE candidates, even including the exact same ICE credentials, for two different m-lines, but without officially supporting BUNDLE.
I have not checked the spec, but that looks really wrong to me. Our stack has no way to distinguish the two, but from the SDP believes it needs to establish two separate transports. I hope this is/was only a side effect of you trying to trigger this bug.
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: queue remote ICE candidates if needed
/r/7131 - Bug 1087551: adapt mochitests to ICE candidate queueing

Pull down these commits:

hg pull -r acafd51b7907d1705f77b419ccca1a7d350163e5 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review?(jib)
(In reply to Nils Ohlmeier [:drno] from comment #38)
> Comment on attachment 8591885 [details]
> MozReview Request: bz://1087551/drno
> 
> /r/7013 - Bug 1087551: queue remote ICE candidates if needed
> /r/7131 - Bug 1087551: adapt mochitests to ICE candidate queueing
> 
> Pull down these commits:
> 
> hg pull -r acafd51b7907d1705f77b419ccca1a7d350163e5
> https://reviewboard-hg.mozilla.org/gecko/

Thanks for your comments jib. I think this a lot cleaner solution. I should probably add a new mochitest which verifies the new expected behavior of addIceCandidate (as a substitute for the tests I removed - which either never worked or no longer work).
https://reviewboard.mozilla.org/r/7011/#review5925

::: dom/media/PeerConnection.js
(Diff revision 2)
> +      .then(() => { this.releaseIceCandidates(); })
> +      .then(() => { }); // must return undefined

The above already returns undefined.
I don't think this is the right fix.

There's no valid reason to be calling addIceCandidate() prior to setRemote()
and instead of accommodating code which does that, we should report an error.
There's a promise here so there's no problem with doing that.

If there's some situation in which Firefox calls onicecandidate prior to returning
an offer/answer, we need to fix that.
To follow up on this: if you think we should be accepting addIceCandidate prior
to setRemote() then you need to discuss this on the mailing list, not just change
Firefox.
https://reviewboard.mozilla.org/r/7013/#review5939

lgtm with nits. Holding off for mailing list discussion. I can post something in a bit if we want.

::: dom/media/PeerConnection.js
(Diff revision 2)
> +      .then(() => { this.releaseIceCandidates(); })
> +      .then(() => { }); // must return undefined

What Martin said.

::: dom/media/PeerConnection.js
(Diff revision 2)
> +    if (this._dompc._onAddIceCandidateError) {
> -    this._dompc._onAddIceCandidateError(this.newError(message, code));
> +      this._dompc._onAddIceCandidateError(this.newError(message, code));
> +    }

And this

::: dom/media/PeerConnection.js
(Diff revision 2)
> +    if (this._dompc._onAddIceCandidateSuccess) {
> -    this._dompc._onAddIceCandidateSuccess();
> +      this._dompc._onAddIceCandidateSuccess();
> +    }

Why do you need this?
https://reviewboard.mozilla.org/r/7131/#review5943

::: dom/media/tests/mochitest/test_peerConnection_errorCallbacks.html
(Diff revision 1)
> -  function testAddIceCandidateError() {
> -    var pc = new mozRTCPeerConnection();
> -    info ("Testing addIceCandidate error");
> -    return pc.addIceCandidate(new mozRTCIceCandidate({ candidate: "Pony Lords, jump!",
> -                                                       sdpMid: "whee",
> -                                                       sdpMLineIndex: 1 }))
> -    .then(generateErrorCallback("addIceCandidate with nonsense candidate should fail"),
> -          validateReason);
> -  };

Why delete this one?
(In reply to Jan-Ivar Bruaroey [:jib] from comment #43)
> https://reviewboard.mozilla.org/r/7013/#review5939
> 
> lgtm with nits. Holding off for mailing list discussion. I can post
> something in a bit if we want.

Well, I think we should try to develop a consensus view here first.

I've explained above why I think this patch is misguided. I believe
Martin shares my view and as https://code.google.com/p/webrtc/issues/detail?id=4409
makes clear, Google intends to forbid this usage in Chrome.

If you believe that we should indeed permit addIceCandidate before
setRD, it would be useful if you explained why.
TL;DR: People should never have to cache addIceCandidate in JS to have WebRTC work reliably, even for local loops. We should remove our mochitest caching so we're testing what we ship, and not hide errors. If that works, then we're good. Otherwise I would argue a change is needed to addIceCandidate.

We want WebRTC to be reliable and easy to use and learn at the same time. Demos do not need extra challenges.

Whenever I wrote something like below a few months ago, I would intermittently get "ICE failed, see about:webrtc for more details" for no clear reason:

var pc1 = new mozRTCPeerConnection(), pc2 = new mozRTCPeerConnection();

pc1.onicecandidate = e => !e.candidate || pc2.addIceCandidate(e.candidate).catch(failed);
pc2.onicecandidate = e => !e.candidate || pc1.addIceCandidate(e.candidate).catch(failed);
pc2.onaddstream = e => (v2.mozSrcObject = e.stream, log("Connected!"));

function start() {
  navigator.mediaDevices.getUserMedia({ video: true })
  .then(stream => pc1.addStream(v1.mozSrcObject = stream))
  .then(() => pc1.createOffer())
  .then(offer => pc1.setLocalDescription(offer))
  .then(() => pc2.setRemoteDescription(pc1.localDescription))
  .then(() => pc2.createAnswer())
  .then(answer => pc2.setLocalDescription(answer))
  .then(() => pc1.setRemoteDescription(pc2.localDescription))
  .catch(failed);
}

var log = msg => div.innerHTML = div.innerHTML + "<p>" + msg + "</p>";
var failed = e => log(e.name +": "+ e.message +" line "+ e.lineNumber);

But when I try it on http://jsfiddle.net/fsLz3pdt now it works great for me. Did we fix something? If so then that would be great! In any case I suggest we remove our mochitest caching to find out. If this reveals a problem then I think we need to fix it properly, in our code, not in the tests.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #46)
> TL;DR: People should never have to cache addIceCandidate in JS to have
> WebRTC work reliably, even for local loops.

Why would they have to do that? Any implementation which generates
candidates before a local description is nonconformant and if
the signaling system reorders the messages then it's broken


> We should remove our mochitest
> caching so we're testing what we ship, and not hide errors. If that works,
> then we're good. 

OK, let's do that.


> Otherwise I would argue a change is needed to
> addIceCandidate.

Actually I believe that this reflects a defect in Firefox where it
is emitting candidates too early.


> Whenever I wrote something like below a few months ago, I would
> intermittently get "ICE failed, see about:webrtc for more details" for no
> clear reason:
> 
> var pc1 = new mozRTCPeerConnection(), pc2 = new mozRTCPeerConnection();
> 
> pc1.onicecandidate = e => !e.candidate ||
> pc2.addIceCandidate(e.candidate).catch(failed);
> pc2.onicecandidate = e => !e.candidate ||
> pc1.addIceCandidate(e.candidate).catch(failed);
> pc2.onaddstream = e => (v2.mozSrcObject = e.stream, log("Connected!"));

Note: this code is not correct because onaddstream() is not a signal
that you are connected but merely that the stream has been created.


> But when I try it on http://jsfiddle.net/fsLz3pdt now it works great for me.
> Did we fix something? If so then that would be great! In any case I suggest
> we remove our mochitest caching to find out. If this reveals a problem then
> I think we need to fix it properly, in our code, not in the tests.

Yes, I agree with that. Specifically it likely means we are emitting candidates
too early and need to fix that.
(In reply to Eric Rescorla (:ekr) from comment #47)
> > pc2.onaddstream = e => (v2.mozSrcObject = e.stream, log("Connected!"));
> 
> Note: this code is not correct because onaddstream() is not a signal
> that you are connected but merely that the stream has been created.

Thanks, what's a better signal?
Well, if you are interested in the negotiation being completed a change in the
signaling state to stable.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #48)
> (In reply to Eric Rescorla (:ekr) from comment #47)
> > > pc2.onaddstream = e => (v2.mozSrcObject = e.stream, log("Connected!"));
> > 
> > Note: this code is not correct because onaddstream() is not a signal
> > that you are connected but merely that the stream has been created.
> 
> Thanks, what's a better signal?

If you want to claim that you are connected I think you should monitor iceconnectionstatechange:

    pc1.oniceconnectionstatechange = function() {
      if (pc1.iceConnectionState == "connected") {
        log("HIP HIP HORRAY PC1 is connected");
      }
    }
(In reply to Jan-Ivar Bruaroey [:jib] from comment #44)
> https://reviewboard.mozilla.org/r/7131/#review5943
> 
> ::: dom/media/tests/mochitest/test_peerConnection_errorCallbacks.html
> (Diff revision 1)
> > -  function testAddIceCandidateError() {
> > -    var pc = new mozRTCPeerConnection();
> > -    info ("Testing addIceCandidate error");
> > -    return pc.addIceCandidate(new mozRTCIceCandidate({ candidate: "Pony Lords, jump!",
> > -                                                       sdpMid: "whee",
> > -                                                       sdpMLineIndex: 1 }))
> > -    .then(generateErrorCallback("addIceCandidate with nonsense candidate should fail"),
> > -          validateReason);
> > -  };
> 
> Why delete this one?

This test is actually completely off. From the name I'm assuming it tries to test the candidate parsing. But it never gets/got that far, because you always get the error that you are not allowed to call addIceCandidate in that signaling state.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #43)
> ::: dom/media/PeerConnection.js
> (Diff revision 2)
> > +    if (this._dompc._onAddIceCandidateError) {
> > -    this._dompc._onAddIceCandidateError(this.newError(message, code));
> > +      this._dompc._onAddIceCandidateError(this.newError(message, code));
> > +    }
> 
> And this
> 
> ::: dom/media/PeerConnection.js
> (Diff revision 2)
> > +    if (this._dompc._onAddIceCandidateSuccess) {
> > -    this._dompc._onAddIceCandidateSuccess();
> > +      this._dompc._onAddIceCandidateSuccess();
> > +    }
> 
> Why do you need this?

With my patch the callbacks for success or failure don't get set, so you get error messages every time addIceCandidate() succeeds or fails.
I fear that if we allow multiple addIceCandidate() calls to return Promises, which wait for something we would need to set these callbacks individually in the code path when the Promises return something (you can't set them at the time when addIceCandidate() gets called as these calls would overwrite each others callbacks).
(In reply to Nils Ohlmeier [:drno] from comment #52)
> > > -    return pc.addIceCandidate(new mozRTCIceCandidate({ candidate: "Pony Lords, jump!",
> > > -                                                       sdpMid: "whee",
> > > -                                                       sdpMLineIndex: 1 }))
> > > -    .then(generateErrorCallback("addIceCandidate with nonsense candidate should fail"),
> > > -          validateReason);
> > > -  };
> > 
> > Why delete this one?
> 
> This test is actually completely off. From the name I'm assuming it tries to
> test the candidate parsing. But it never gets/got that far, because you
> always get the error that you are not allowed to call addIceCandidate in
> that signaling state.

We should probably fix the test instead so we have coverage for candidate validation failing. We should also be testing that we're getting the right error.name (which would have detected the mistake).

(In reply to Nils Ohlmeier [:drno] from comment #53)
> > > +    if (this._dompc._onAddIceCandidateError) {
> > > -    this._dompc._onAddIceCandidateError(this.newError(message, code));
> > > +      this._dompc._onAddIceCandidateError(this.newError(message, code));
> > > +    }
> > 
> > And this
> > 
> > ::: dom/media/PeerConnection.js
> > (Diff revision 2)
> > > +    if (this._dompc._onAddIceCandidateSuccess) {
> > > -    this._dompc._onAddIceCandidateSuccess();
> > > +      this._dompc._onAddIceCandidateSuccess();
> > > +    }
> > 
> > Why do you need this?
> 
> With my patch the callbacks for success or failure don't get set, so you get
> error messages every time addIceCandidate() succeeds or fails.

I doubt this is true in the latest version, since the calls themselves are deferred and later executed in FIFO order. Could you remove them and see?

> I fear that if we allow multiple addIceCandidate() calls to return Promises,
> which wait for something we would need to set these callbacks individually
> in the code path when the Promises return something (you can't set them at
> the time when addIceCandidate() gets called as these calls would overwrite
> each others callbacks).

_onAddIceCandidateSuccess and _onAddIceCandidateError are protected by the operations .chain() which effectively serializes these calls (i.e. they're set and read before the next call sets them). It's probably the only reason we're chaining (I refuse to say queueing) addIceCandidate when the spec tells us not to.

The clean way to fix this is to pass promises to the c++ code, but there were some obstacles there (Bug 1111742). We could probably try it for addIceCandidate if we think it's important to unchain it.
After removing the queueing of ICE candidates and adding artificial sleeps I still see that adding ICE candidates at the "wrong" time fails in nICEr, but that never gets translated into errors in JS. I'm going to look into what is blocking the errors.
I found the problem in our tests which wasn't reporting the failures which happen when adding ICE candidates in the wrong state.

But with some more fixes applied I get on my local machine ICE failures for the H264 test. In that particular test I see on my local machine the pcRemote PC emitting ICE candidates while the test is waiting for the setLocalDescription success callback. So the answer SDP is generated, but the sLD request seems to take quite some time. During that time the ICE candidate gets transferred and added to pcLocal, before that has even gotten the answer SDP.
So it seems we still have a problem on the answerer side.
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: remove queuing of ICE candidates from mochitests

Pull down this commit:

hg pull -r 6f8517fc489e371247b164a7d5c3538803404747 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review?(docfaraday)
Forgot the try run of my latest changes which removes the candidates queuing in mochitest, plus some more tests and fixes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=21eb4563f630
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

https://reviewboard.mozilla.org/r/7011/#review6123

::: dom/media/tests/mochitest/pc.js:1187
(Diff revision 3)
> +      ok(false, "Adding ICE candidate failed with: " + e.message)

Can we prepend a |this| here?

::: dom/media/tests/mochitest/pc.js:1166
(Diff revision 3)
>      this._remote_ice_candidates.push(candidate);

We seem to still be using this in dumpSdp, but |_ice_candidates_to_add| in dumpSdp is never populated and probably needs to be removed (this was before your patch even, which is strange); I would have assumed this was part of the mechanism that queues up candidates...

::: dom/media/tests/mochitest/pc.js:1159
(Diff revision 3)
>     * Either adds a given ICE candidate right away or stores it to be added
>     * later, depending on the state of the PeerConnection.
>     *
>     * @param {object} candidate
>     *        The mozRTCIceCandidate to be added or stored
>     */
>    storeOrAddIceCandidate : function(candidate) {

Name and comment need to be updated.

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:10
(Diff revision 3)
> +    bug: "784519",
> +    title: "addCandidate (answer) in 'have-local-offer'"

This needs updating, right?

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:49
(Diff revision 3)
> +           sdpMid: "whee",

This is not going to be a valid mid, so let's omit the sdpMid property.

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:66
(Diff revision 3)
> +          ok(true, "Missing index in got automatically set to a valid value")

Yikes!

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:35
(Diff revision 3)
> +          {candidate:"candidate:Pony Lords, jump!",
> +           sdpMid: "whee",

I would make this a valid candidate, and remove the mid, since we are trying to isolate the behavior when the index is too high (yes, this candidate property will satisfy JsepSessionImpl now, but we probably want to fix that before too long).
Attachment #8591885 - Flags: review?(docfaraday)
(In reply to Eric Rescorla (:ekr) from comment #47)
> (In reply to Jan-Ivar Bruaroey [:jib] from comment #46)
> > TL;DR: People should never have to cache addIceCandidate in JS to have
> > WebRTC work reliably, even for local loops.
> 
> Why would they have to do that? Any implementation which generates
> candidates before a local description is nonconformant

OK I found my mistake in comment 46, and I should have known better.

See if you can spot it:

  function start() {
    navigator.mediaDevices.getUserMedia({ video: true })
    .then(stream => pc1.addStream(v1.mozSrcObject = stream))
    .then(() => pc1.createOffer())
    .then(offer => pc1.setLocalDescription(offer))
    .then(() => pc2.setRemoteDescription(pc1.localDescription))
    .then(() => pc2.createAnswer())
    .then(answer => pc2.setLocalDescription(answer))
    .then(() => pc1.setRemoteDescription(pc2.localDescription))
    .catch(failed);
  }

Here's a fix:

  function start() {
    navigator.mediaDevices.getUserMedia({ video: true })
    .then(stream => pc1.addStream(v1.mozSrcObject = stream))
    .then(() => pc1.createOffer())
    .then(offer => pc2.setRemoteDescription(offer))
    .then(() => pc1.setLocalDescription(pc2.remoteDescription))
    .then(() => pc2.createAnswer())
    .then(answer => pc1.setRemoteDescription(answer))
    .then(() => pc2.setLocalDescription(pc2.localDescription))
    .catch(failed);
  }

In local-loop demos, candidates are added directly on the peer:

  pc1.onicecandidate = e => !e.candidate || pc2.addIceCandidate(e.candidate).catch(failed);
  pc2.onicecandidate = e => !e.candidate || pc1.addIceCandidate(e.candidate).catch(failed);

In the first example, this means that pc2.addIceCandidate races with pc2.setRemoteDescription, and pc1.addIceCandidate races with pc1.setRemoteDescription.

The remedy seems solid, if subtle. It underscores the importance of getting the sdp sent first (whether signaling is used or not).

Unfortunately, the spec example [1] seems to have this race as well, in that it encourages people to setLocalDescription before sending sdp, which may be a mistake. Worth raising on the list?

Racy: http://jsfiddle.net/my30dg5n
 Fix: http://jsfiddle.net/gpf802k1

[1] http://w3c.github.io/webrtc-pc/#simple-peer-to-peer-example
(In reply to Jan-Ivar Bruaroey [:jib] from comment #60)
> The remedy seems solid, if subtle.

Except what if setLocalDescription fails...
What's the problem with calling sLD first as long as you send the offer in the success callback?
.then() functions are always invoked asynchronously (and they're part of the implementation now). Extending a race-guard across promises is tricky (but may be doable).
A specific solution to this may be to ensure that all onicecandidate events fire after the success sLD callback has been dispatched. That seems doable by dispatching them off the same promise.
Multiple .then() functions on the same promise are guaranteed to be executed fifo.
Disclaimer: I should say I haven't reproduced a race of late without the wait(2000) commands in comment 60, so there may not be a race here in practice.

I gotta say I find the current tight coupling of setLocalDescription and onicecandidate troubling, but that's how the spec is written. For instance, a naive view of asynchronous operations and promise chains may be to call it racy if one cannot add wait(2000) without breaking the state machine. This API certainly breaks such naive assumptions.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #64)
> A specific solution to this may be to ensure that all onicecandidate events
> fire after the success sLD callback has been dispatched.

This is the intended behavior. If Firefox isn't doing that, we have
a problem.


> That seems doable
> by dispatching them off the same promise.

Martin analyzed this and believes we are in good shape aready.
(In reply to Eric Rescorla (:ekr) from comment #67)
> (In reply to Jan-Ivar Bruaroey [:jib] from comment #64)
> > A specific solution to this may be to ensure that all onicecandidate events
> > fire after the success sLD callback has been dispatched.
> 
> This is the intended behavior. If Firefox isn't doing that, we have
> a problem.

We do this, and the tests should fail if we don't.
(In reply to Byron Campen [:bwc] from comment #68)
> (In reply to Eric Rescorla (:ekr) from comment #67)
> > (In reply to Jan-Ivar Bruaroey [:jib] from comment #64)
> > > A specific solution to this may be to ensure that all onicecandidate events
> > > fire after the success sLD callback has been dispatched.
> > 
> > This is the intended behavior. If Firefox isn't doing that, we have
> > a problem.
> 
> We do this, and the tests should fail if we don't.

That is exactly what I was describing in comment #56. We don't seem to do it on the answerer side.
(In reply to Byron Campen [:bwc] from comment #59)
> ::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:66
> (Diff revision 3)
> > +          ok(true, "Missing index in got automatically set to a valid value")
> 
> Yikes!

Opened bug 1157034 to fix this some time later.
Nils, please try with this patch to see if it helps.

There's setLocalDescription stuff going on in PeerConnection.js that doesn't seem accounted for in the icecandidate code which comes almost straight from c++.
Attachment #8595924 - Flags: feedback?(drno)
Tested and fixed a failure path.
Attachment #8595924 - Attachment is obsolete: true
Attachment #8595924 - Flags: feedback?(drno)
Attachment #8595932 - Flags: feedback?(drno)
Comment on attachment 8595932 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (2)

Review of attachment 8595932 [details] [diff] [review]:
-----------------------------------------------------------------

This seems like it's going in the wrong direction. The C++ events seem to passed to the JS in the right order, in which case they are being reordered somewhere in the JS. Instead of adding an extra layer of interlocks, let's figure out why they are being reordered and fix that.
Attachment #8595932 - Flags: feedback?(drno) → feedback-
I'll be on firmer ground once I learn if this patch fixes the problem Nils is seeing, but ...

As I said in comment 63, .then() functions are never synchronously invoked, which means that:

  setLocalDescription(offer).then(callback)

will never call callback in the same cycle as the c++ calls PCObserver.onSetLocalDescriptionSuccess() in.

The legacy callbacks are implemented on-top of the new API, and are therefore equally affected (same code).
Could we simply add a new function for event dispatch that added to the task queue (i.e., pc._chain())?  There are some events that we don't want to hold back in this fashion, but it seems like onnn and onicecandidate might benefit from that.
Comment on attachment 8595932 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (2)

Review of attachment 8595932 [details] [diff] [review]:
-----------------------------------------------------------------

I'm OK with this on balance.  It's ugly, but that's a consequence of the spec being ugly.

::: dom/media/PeerConnection.js
@@ +679,5 @@
>          this._impl.setLocalDescription(type, desc.sdp);
>        }));
>      });
> +    this._afterSetLocalDescription =
> +      this._afterSetLocalDescription.then(() => p).catch(() => {});

You probably don't need to chain these, just set
this._afterSetLocal = p.catch(_ => {});

@@ +1103,5 @@
> +    // Ensure event fires after setLocalDescription promise has resolved.
> +    this._dompc._afterSetLocalDescription.then(() => this.dispatchEvent(
> +      new this._dompc._win.RTCPeerConnectionIceEvent ("icecandidate",
> +                                                      { candidate: icecand })))
> +    .catch(e => this._dompc._win.setTimeout(() => { throw e; })); // coding error

This catch doesn't seem like it's doing anything valuable.  Why are you causing this to be thrown outside of the promise chain?

Also, I think that you need to move this code to foundIceCandidate()
(In reply to Martin Thomson [:mt] from comment #76)
> > +    this._afterSetLocalDescription =
> > +      this._afterSetLocalDescription.then(() => p).catch(() => {});
> 
> You probably don't need to chain these, just set
> this._afterSetLocal = p.catch(_ => {});

It seemed safer to chain but since all setLD calls go on the same queue, I think you're right.

> > +    // Ensure event fires after setLocalDescription promise has resolved.
> > +    this._dompc._afterSetLocalDescription.then(() => this.dispatchEvent(
> > +      new this._dompc._win.RTCPeerConnectionIceEvent ("icecandidate",
> > +                                                      { candidate: icecand })))
> > +    .catch(e => this._dompc._win.setTimeout(() => { throw e; })); // coding error
> 
> This catch doesn't seem like it's doing anything valuable.  Why are you
> causing this to be thrown outside of the promise chain?

It is valuable. Without this catch, any coding error in the .then() above - e.g. RTCPoorConnectionIceEvent - would be swallowed. With it, such coding errors gets reported correctly to the Browser console (not Web console), even with the right line number.

The setTimeout() is a pattern I found on stack-overflow (can't find the link). It's a way to throw outside of the promise chain.

> Also, I think that you need to move this code to foundIceCandidate()

I refactored out foundIceCandidate() since it was only used to distinguish the "" candidate.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #77)
> I refactored out foundIceCandidate() since it was only used to distinguish
> the "" candidate.

But if there are no candidates, then this will fire early.
Comment on attachment 8595932 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (2)

Review of attachment 8595932 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/PeerConnection.js
@@ +1099,5 @@
> +    var icecand = (candidate == "") ? null :
> +      new this._dompc._win.mozRTCIceCandidate({ candidate: candidate,
> +                                                sdpMid: mid,
> +                                                sdpMLineIndex: level });
> +    // Ensure event fires after setLocalDescription promise has resolved.

The new code has the same code-path for candidate == "". I refactored out the sub-function entirely, so all the code is here now.
so it should not fire early on null candidate.
Comment on attachment 8595932 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (2)

Review of attachment 8595932 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/PeerConnection.js
@@ +1099,5 @@
> +    var icecand = (candidate == "") ? null :
> +      new this._dompc._win.mozRTCIceCandidate({ candidate: candidate,
> +                                                sdpMid: mid,
> +                                                sdpMLineIndex: level });
> +    // Ensure event fires after setLocalDescription promise has resolved.

Yes, though the readability is dramatically worse.

@@ +1101,5 @@
> +                                                sdpMid: mid,
> +                                                sdpMLineIndex: level });
> +    // Ensure event fires after setLocalDescription promise has resolved.
> +    this._dompc._afterSetLocalDescription.then(() => this.dispatchEvent(
> +      new this._dompc._win.RTCPeerConnectionIceEvent ("icecandidate",

Extraneous space here.

@@ +1103,5 @@
> +    // Ensure event fires after setLocalDescription promise has resolved.
> +    this._dompc._afterSetLocalDescription.then(() => this.dispatchEvent(
> +      new this._dompc._win.RTCPeerConnectionIceEvent ("icecandidate",
> +                                                      { candidate: icecand })))
> +    .catch(e => this._dompc._win.setTimeout(() => { throw e; })); // coding error

This needs a comment to explain why this is necessary.  Note that we don't do this elsewhere and more or less assume that we have written valid code.
Incorporate feedback, added comment. Ekr reports this fixes his test.
Attachment #8595932 - Attachment is obsolete: true
Attachment #8596085 - Flags: review?(martin.thomson)
Comment on attachment 8596085 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (3)

Review of attachment 8596085 [details] [diff] [review]:
-----------------------------------------------------------------

I can live with this.

Thinking about this a renegotiation, I'm a little concerned that we aren't doing this properly.  We don't have any way to attribute the onicecandidate events with a particular setLD, so we could end up holding back the candidates in unnecessarily.  But by the same token, we don't really have the information to fix it either.

That's probably not a huge problem, since setLocalDescription runs quickly, but maybe we could just have this fire once.

::: dom/media/PeerConnection.js
@@ +1111,5 @@
> +
> +    this._dompc._afterSetLocal.then(() => this.dispatchEvent(
> +      new this._dompc._win.RTCPeerConnectionIceEvent ("icecandidate",
> +                                                      { candidate: icecand })))
> +    .catch(e => this._dompc._win.setTimeout(() => { throw e; })); // browser console

The indentation here is odd.  Maybe move the trailing ')' from 1114 down to this line.

Also, extra space on line 1113.
Attachment #8596085 - Flags: review?(martin.thomson) → review+
Removed space. How about this instead of the lone paren?
Attachment #8596085 - Attachment is obsolete: true
Attachment #8596106 - Flags: review+
(In reply to Jan-Ivar Bruaroey [:jib] from comment #84)
> Removed space. How about this instead of the lone paren?

Much better, thanks.
See Also: → 1157034
To document the findings:

With all the patches applied ICE candidates seem to get emitted at the right place. But the answer SDP travels with multiple dispatches and two test steps in template.js to the other side (pcLocal). While the ICE candidates directly from the pcRemote onicecandidate callback get put into the pcLocal addIceCandidate() call. This allows the ICE candidates in some cases to arrive at the pcLocal side before the SDP answer.

It all makes us sad, but to comply with the current spec we need to have code in our test framework which prevents the bypassing of the candidates (as reducing the amount of dispatches would only reduce, but not remove the possibility of this happening completely).

Two points should be raised on the mailing lists:
- In case of renegotiation ice candidates can not be linked to the right SDP. Ice candidates should have an indicator to which SDP they belong, like the SDP version or the ICE ufrag.
- To avoid having web developers to worry about passing different pieces of signaling properly serialized into the API it might be good to start a discussion of having addIceCandidate() return a promise which resolves once sRD got called.
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r d34f8852052ddeba27d65c4b1194b48cd3847c9d https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review?(docfaraday)
I think with catching the test failures we have fixed the initial report here.

For the topic of removing the ice candidate queuing: right now this does not seem easily achievable. If we would implement a generic signaling interface in our tests between PeerConnectionWrappers like suggested in bug 1020436, then we can revisit that topic.

At this point I think it makes sense to land the patches we have accumulated in here and close this ticket.
See Also: → 1020436
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r d34f8852052ddeba27d65c4b1194b48cd3847c9d https://reviewboard-hg.mozilla.org/gecko/
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r b3250b127d40ffa0c0ba5ef6e1af4a78ca619789 https://reviewboard-hg.mozilla.org/gecko/
Here is a try run of jib's and my combined patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc7dc928b85a
Nils, just so we're on the same page, your patch here doesn't actually remove storeOrAddIceCandidate() so the ok(false)-reporting added to AddIceCandidate failures all run after holdIceCandidates is resolved later, and therefore don't prove addIceCandidate isn't being called early on the peer due to potential racing between setLocalDescription and onicecandidate, which would be nice to test and confirm.

The real test of my patch would seem to be a try run where such caching is removed. When do we anticipate doing happening?
Flags: needinfo?(drno)
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

https://reviewboard.mozilla.org/r/7011/#review6271

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:37
(Diff revision 6)
> +        return test.pcRemote._pc.addIceCandidate(bogus)

and here

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:49
(Diff revision 6)
> +        return test.pcRemote._pc.addIceCandidate(bogus)

and here

::: dom/media/tests/mochitest/pc.js:1184
(Diff revision 6)
>    addIceCandidate : function(candidate) {
>      info(this + ": adding ICE candidate " + JSON.stringify(candidate));
>      return this._pc.addIceCandidate(candidate).then(() => {
>        info(this + ": Successfully added an ICE candidate");
> -    });
> +    })
> +    .catch( e =>
> +      // re-throwing results in uncaught JS errors, so just capture it
> +      ok(false, "Adding ICE candidate failed with: " + e.message)
> +    );

With this new catch, this addIceCandidate-emulating wrapper never fails, which seems undesirable. The asymmetry of info() on success vs ok(false) on failure, also suggests this error-check should be higher up. 

I also question why we wrap addIceCandidate at all, or why we're not consistently using the wrapper (other places in this same patch bypass the wrapper and call \_pc.addIceCandidate directly).

The only caller of this wrapper seems to be storeOrAddIceCandidate, which I hope is going away soon.

::: dom/media/tests/mochitest/pc.js:1190
(Diff revision 6)
> +      // re-throwing results in uncaught JS errors, so just capture it

I suspect this stems from storeOrAddIceCandidate ignoring the promise returned from this wrapper, which it shouldn't do.

::: dom/media/tests/mochitest/pc.js:1173
(Diff revision 6)
>      this.holdIceCandidates.then(() => {
>        this.addIceCandidate(candidate);
>      });

Should be returning promise from this.addIceCandidate here to avoid uncaught JS errors. Removing the \{ \} should be sufficient.

Hopefully storeOrAddIceCandidate() itself is going away soon?

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:10
(Diff revision 6)
> +    bug: "1087551",
> +    title: "addCandidate behavior in different states"

Looks like ReviewBoard doesn't pick up on hg rename. Beat by splinter???

https://hg.mozilla.org/try/rev/b3250b127d40#l3.3

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:25
(Diff revision 6)
> +        return test.pcLocal._pc.addIceCandidate(candidate).then(

Why are we bypassing the wrapper here?

::: dom/media/tests/mochitest/test_peerConnection_addIceCandidate.html:68
(Diff revision 6)
> +      function PC_REMOTE_ADD_VALID_CANDIDATE(test) {
> +        var broken = new mozRTCIceCandidate(
> +          {candidate:"candidate:1 1 UDP 2130706431 192.168.2.1 50005 typ host",
> +           sdpMLineIndex: 0});
> +        return test.pcRemote._pc.addIceCandidate(broken)
> +        .then(ok(true, "Successfully added valid ICE candidate"));
> +      }

I'm confused; valid or broken?
Attachment #8591885 - Flags: review?(jib)
(In reply to Jan-Ivar Bruaroey [:jib] from comment #92)
> When do we anticipate doing happening?

Uh, the central word missing being 'that', then pick a phrase. ;-)
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

https://reviewboard.mozilla.org/r/7011/#review6305

There's still a couple of issues you haven't gotten to yet; I closed and dropped my issues that you got already or that have been invalidated.
Attachment #8591885 - Flags: review?(docfaraday)
Comment on attachment 8596106 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (4) r=mt

Thinking of moving this patch to Bug 1157042 and land from there, as it seems unrelated in the end.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #92)
> Nils, just so we're on the same page, your patch here doesn't actually
> remove storeOrAddIceCandidate() so the ok(false)-reporting added to
> AddIceCandidate failures all run after holdIceCandidates is resolved later,
> and therefore don't prove addIceCandidate isn't being called early on the
> peer due to potential racing between setLocalDescription and onicecandidate,
> which would be nice to test and confirm.
> 
> The real test of my patch would seem to be a try run where such caching is
> removed. When do we anticipate doing happening?

Sorry this became not clear in our last week standup: I ran the combination of your patch plus my old patch (which removed the candidate queuing & catching the candidate adding failure) on my local machine and it kept running into the new error, where the ICE candidates from pcRemote get added to pcLocal, before the SDP answer gets added.
From my previous test runs I know that this would only result in lots of oranges on treeherder (as the build machines are usually even slower then my machine).

Then I though and discussed the problem more with mt after our meeting and we came to the conclusion that the source for this specific issue is the combination of our test framework plus how the API has to be handled according to the current spec.
In other words: right now removing the candidate queuing from our test framework would only result in test failures.

We should land the catching of the candidate adding errors as right now we are simply flying blind on that topic (test framework simply ignores these errors).

I think it is a good idea to land your patch as part of bug 1157042.
Flags: needinfo?(drno)
See Also: → 1158959
https://reviewboard.mozilla.org/r/7011/#review6511

> With this new catch, this addIceCandidate-emulating wrapper never fails, which seems undesirable. The asymmetry of info() on success vs ok(false) on failure, also suggests this error-check should be higher up. 
> 
> I also question why we wrap addIceCandidate at all, or why we're not consistently using the wrapper (other places in this same patch bypass the wrapper and call \_pc.addIceCandidate directly).
> 
> The only caller of this wrapper seems to be storeOrAddIceCandidate, which I hope is going away soon.

The problem here is that storeOrAddIceCandidate gets called from within the onincecandidate callback from the other PeerConnection. That results in any error thrown in here not being caught by the Promise race in templates.js. See bug 1158959.
Regarding the asymmetry: fixed it by converting the above info(...) into ok(true,...) :-)

We can discuss if because of these special circumstances it makes more sense to roll this functionality directly into the storeOrAddIceCandidate() function, as this seems to be now more of a special case then a generic wrapper function. But if you look at bug 1158959 I doubt that storeOrAddIceCandidate is really going to go away soon :-(

> Why are we bypassing the wrapper here?

Because we are expecting an error here and the wrapper function currently needs to catch errors by itself (see above).
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r 6f1da865fd5d566f3b4a855a2adc14b395e5b8c2 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review?(jib)
Attachment #8591885 - Flags: review?(docfaraday)
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r 6f1da865fd5d566f3b4a855a2adc14b395e5b8c2 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review?(docfaraday)
As I hope jib is not going to have major concerns with the latest version: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef461e7a7428
Attachment #8591885 - Flags: review?(jib) → review+
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

https://reviewboard.mozilla.org/r/7011/#review6525

::: dom/media/tests/mochitest/pc.js:1211
(Diff revisions 6 - 7)
> -    .catch( e =>
> +      .catch( e =>
> -      // re-throwing results in uncaught JS errors, so just capture it
> -      ok(false, "Adding ICE candidate failed with: " + e.message)
> +        // The onicecandidate callback runs independent of the test steps
> +        // and therefore errors thrown from in there don't get caught by the
> +        // race of the Promises around our test steps.
> +        // Note: as long as we are queuing ICE candidates until the success
> +        //       of sRD() this should never ever happen.
> +        ok(false, this + " adding ICE candidate failed with: " + e.message)
> -    );
> +      );

This is better, but instead of a comment talking about where this is called from and the special circumstances inflicted upon us here, why not return the promise and catch it there? Then the need for a comment would seem to go away.

I know storeOrAddIceCandidate is under no obligation to work like addIceCandidate and that it is called from two places, so I'll leave it up to you.

::: dom/media/tests/mochitest/pc.js:1208
(Diff revisions 6 - 7)
> -    return this._pc.addIceCandidate(candidate).then(() => {
> +      return this._pc.addIceCandidate(candidate).then(() => {
> -      info(this + ": Successfully added an ICE candidate");
> +        ok(true, this + " successfully added an ICE candidate");
> -    })
> +      })
> -    .catch( e =>
> +      .catch( e =>

Nit: This chain can be flattened:

  return this._pc.addIceCandidate(candidate);
})
.then(() =>  ok(true, this + " successfully added an ICE candidate");
.catch( e =>
  return this._pc.addIceCandidate(candidate);
})
.then(() =>  ok(true, this + " successfully added an ICE candidate"))
.catch( e =>
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r 8188da10aba3e25bab370bf4f1555f7a416f8649 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review+ → review?(jib)
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

Implemented the suggested fix by jib. Carrying forward r+=jib
Attachment #8591885 - Flags: review?(jib) → review+
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

/r/7013 - Bug 1087551: catch errors from addIceCandidate() and updated test case

Pull down this commit:

hg pull -r 6df10706661983885619e6e9e919a96a6348bddf https://reviewboard-hg.mozilla.org/gecko/
Attachment #8591885 - Flags: review+ → review?(jib)
Comment on attachment 8591885 [details]
MozReview Request: bz://1087551/drno

One should update his mercurial queues before pushing to review...

Still carrying forward r+=jib
Attachment #8591885 - Flags: review?(jib) → review+
Jib I was under the impression that you did not wanted to land your patch as part of this bug, correct?
Status: NEW → ASSIGNED
Flags: needinfo?(jib)
Backed out because of build bustage:

Bustage:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=899c470e0275

Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab4feae9bff8

Build error:
00:52:50     INFO -  mozbuild.frontend.reader.SandboxValidationError:
00:52:50     INFO -  ==============================
00:52:50     INFO -  ERROR PROCESSING MOZBUILD FILE
00:52:50     INFO -  ==============================
00:52:50     INFO -  The error occurred while processing the following file or one of the files it includes:
00:52:50     INFO -      /builds/slave/m-in-lx-0000000000000000000000/build/src/dom/media/moz.build
00:52:50     INFO -  The error occurred when validating the result of the execution. The reported error is:
00:52:50     INFO -      Error processing test manifest file /builds/slave/m-in-lx-0000000000000000000000/build/src/dom/media/tests/mochitest/mochitest.ini: Traceback (most recent call last):
00:52:50     INFO -        File "/builds/slave/m-in-lx-0000000000000000000000/build/src/python/mozbuild/mozbuild/frontend/emitter.py", line 986, in _process_test_manifest
00:52:50     INFO -          path, ', '.join(missing)), context)
00:52:50     INFO -      SandboxValidationError:
00:52:50     INFO -      ==============================
00:52:50     INFO -      ERROR PROCESSING MOZBUILD FILE
00:52:50     INFO -      ==============================
00:52:50     INFO -      The error occurred while processing the following file or one of the files it includes:
00:52:50     INFO -          /builds/slave/m-in-lx-0000000000000000000000/build/src/dom/media/moz.build
00:52:50     INFO -      The error occurred when validating the result of the execution. The reported error is:
00:52:50     INFO -          Test manifest (/builds/slave/m-in-lx-0000000000000000000000/build/src/dom/media/tests/mochitest/mochitest.ini) lists test that does not exist: test_peerConnection_addIceCandidate.html
Comment on attachment 8596106 [details] [diff] [review]
delay firing icecandidate until after setLocalDesciption promise resolves (4) r=mt

Moved to Bug 1157042.
Attachment #8596106 - Attachment is obsolete: true
Flags: needinfo?(jib)
https://hg.mozilla.org/mozilla-central/rev/fbe9a27dbb55
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Attachment #8591885 - Attachment is obsolete: true
Attachment #8618435 - Flags: review+
Attachment #8618436 - Flags: review+
You need to log in before you can comment on or make changes to this bug.