Closed Bug 1006809 Opened 12 years ago Closed 10 years ago

Bring triggered check behavior up-to-date with final ICE RFC

Categories

(Core :: WebRTC: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: bwc, Assigned: drno)

References

Details

Attachments

(1 file, 2 obsolete files)

Currently, we have implemented triggered check behavior from draft v15 of the ICE spec, with the additional behavior that we send error responses when the max retransmit count is hit when attempting to send a triggered check. This has a tendency to exacerbate problems caused by latency, since each triggered check eats away at our max retransmit count.
Some initial work; I am of the opinion that the language in the final spec is very hand-wavy when it comes to handing failure responses, but the approach of starting a new pair when triggered is much better defined, will have much more predictable results, and is mostly equivalent in terms of behavior. Might require some tweaks to how we test the stats API.
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Some fixes to the logic around nomination. Still has bugs; either creating new pairs or bringing existing ones back to WAITING won't be scheduled if all the pairs on the stream were already in progress, since the check timer isn't restarted in that case. Bug 1004530 has a fix for similar problems, so it probably needs to land first. Also, this is going to rot ICE-TCP pretty badly, so I want to land that first too.
Attachment #8418509 - Attachment is obsolete: true
Priority: -- → P3
Target Milestone: --- → mozilla33
Target Milestone: mozilla33 → mozilla35
backlog: --- → webRTC+
Rank: 35
See Also: → 1208254
See Also: → 1208278
See Also: → 1215002
Assignee: docfaraday → drno
Bug 1006809: update triggered check behavior to RFC 5245.
Attachment #8679103 - Flags: review?(mfroman)
Attachment #8679103 - Flags: review?(docfaraday)
Attachment #8419049 - Attachment is obsolete: true
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review20811 This is all I have time to look at today, will continue tomorrow. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:222 (Diff revision 1) > + /* Fall through */ A STUN ctx that has a peer_addr set, but is in either WAITING or CANCELLED seems really weird to me. This fallthrough needs a good comment, I think. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:410 (Diff revision 1) > + /* This inserts after the existing pair; the triggered check will hit > + * it next, possibly updating things like nomination status. */ > + //nr_ice_candidate_pair_insert(&pair->remote->stream->check_list,copy); > + /* Insert into the trigger check queue so it gets processed next */ > + //nr_ice_candidate_pair_append(&pair->remote->stream->trigger_list,copy); What needs to be done here? ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:416 (Diff revision 1) > + TAILQ_INSERT_TAIL(&pair->remote->stream->trigger_list,copy,entry); I think we're supposed to check for duplicates here (see 7.2.1.4.), unless |trigger_list| is not intended to be the "triggered check queue" in the spec. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:446 (Diff revision 1) > - /* Fall through */ > + //nr_ice_candidate_pair_append(&pair->remote->stream->trigger_list,pair); Make sure to remove this comment. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:449 (Diff revision 1) > - if(r=nr_ice_candidate_pair_start(pctx,pair)) > - ABORT(r); > + TAILQ_REMOVE(&pair->remote->stream->check_list,pair,entry); > + TAILQ_INSERT_TAIL(&pair->remote->stream->trigger_list,pair,entry); Should we be starting checks here? Also, I don't see anything in the spec about removing the pair from the normal check queue. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:581 (Diff revision 1) > +int nr_ice_candidate_pair_append(nr_ice_cand_pair_head *head,nr_ice_cand_pair *pair) > + { > + TAILQ_INSERT_TAIL(head,pair,entry); > + > + return(0); > + } > + Everywhere that uses this has been commented out. ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:709 (Diff revision 1) > + if(pair->remote->component->component_id!=comp->component_id) > + return(0); I realize this is just copied code, but as far as I can tell, we should never hit this... ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:819 (Diff revision 1) > + r_log(LOG_ICE,LOG_DEBUG,"ICE-PEER(%s)/CAND_PAIR(%s): Ignoring matching but canceled pair",comp->stream->pctx->label,pair->codeword); > + pair=TAILQ_NEXT(pair,entry); > + continue; If there is only a cancelled pair, won't this result in the creation of a new peer reflexive below? Should we instead move this logic into nr_ice_component_handle_check_matching_pair, and increment |found_valid| even if we don't do anything with it? ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:862 (Diff revision 1) > - /* Try to find matching peer active tcp candidate */ > + /* Now make a peer reflexive (remote) candidate */ > - pcand=TAILQ_FIRST(&comp->candidates); > - while(pcand){ > - if(pcand->tcp_type == TCP_TYPE_ACTIVE) { > - if(!nr_transport_addr_cmp(&pcand->addr,&req->src_addr,NR_TRANSPORT_ADDR_CMP_MODE_ALL)) > - break; > - } > - > - pcand=TAILQ_NEXT(pcand,entry_comp); > - } > - > - if (!pcand){ > - /* We now need to make a peer reflexive */ Did this code get eaten by bitrot? ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c (Diff revision 1) > - new_pcand_created=1; > - if(!nr_stun_message_has_attribute(sreq,NR_STUN_ATTR_PRIORITY,&attr)){ > - r_log(LOG_ICE,LOG_WARNING,"ICE-PEER(%s): Rejecting stun request without priority",comp->stream->pctx->label); > - *error=487; > - ABORT(R_BAD_DATA); > - } > - pcand->priority=attr->u.priority; > + pcand->priority=attr->u.priority; > - } > - Same question here.
Attachment #8679103 - Flags: review?(docfaraday)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf I found enough problems when still testing this, that I think more review would not help at this point.
Attachment #8679103 - Flags: review?(mfroman)
https://reviewboard.mozilla.org/r/23373/#review20811 > A STUN ctx that has a peer_addr set, but is in either WAITING or CANCELLED seems really weird to me. This fallthrough needs a good comment, I think. The problem here is that the spec says we should cancel, but then still wait for potential responses to come in for that cancelled request. And our ice_unittest show that this makes perfect sense, as often in non-NAT scenarios the response to the cancelled request comes in before the response for the newly trigger check created transaction. Problem is that nICEr seems to written with the assumption that the state canceled means you not only stop sending, but you also drop all incoming messages for that stun ctx. But that results in missing the valid responses for the canceled transactions. So we need to distinguish between A) we locally cancelled because e.g. things timed out and we really don't care any longer about any messages related to this and B) the trigger check cancel where receiving a response might save some round trips. That's why I introduced this new WAITING state to handle scenario B). I'm not sure if it is the best solution. Another option could be to have just the one CANCELLED state and distinguish A) and B) only by the fact if the |timer_handle| in the stun client ctx is still set (=B) or not (=A). But I'm open to other solution/suggestions here. > I think we're supposed to check for duplicates here (see 7.2.1.4.), unless |trigger_list| is not intended to be the "triggered check queue" in the spec. The |triggered| flag set in nr_ice_candidate_pair_do_triggered_check() should prevent that we get duplicates in here. Assumption is though that we don't have duplicates in the check list. > Should we be starting checks here? Also, I don't see anything in the spec about removing the pair from the normal check queue. For the FAILED and FROZEN scenario my assumption is that there is a pair in IN_PROGRESS right now. And the spec does not say that we should stop that ongoing check in these cases. So by putting them into the trigger check queue they should get handled next, once the current IN_PROGRESS pair finishes. Initially I thought of having two lists and each pair could be in both lists at the same time, what seems to be more what the spec suggests. Then I realized that I can't simply add |trigger_list| and re-use |entry|, because I assume |entry| is the pointer for the |check_list| already. But your comment made me just realize that the current code probably does not work, as all the code which searches for matching pairs for incoming requests and responses doesn't search through the |trigger_list|. So I guess I'll need to add an |entry2| to allow each pair to be in both lists at the same time. > Did this code get eaten by bitrot? No. I deleted it on purpose. This code never made really any sense as for TCP_TYPE_ACTIVE candidate the port number in the remote candidate should be port 9. Hopefully we will never encounter an implementation which really uses port 9 so that the nr_transport_addr_cmp() would return true. The other possibility here is that the far end tells us his ephemeral port number for his TCP_TYPE_ACTIVE candidate (that how our initial ICE TCP used to do it), which violates the spec. In that case this will A) only work if there is no NAT what so ever between the machines (pretty unlikely that ICE TCP is used in that scenario) and B) a normal search for a candidate pair should succeed in that case. On top of that the ICE TCP spec actually says that because of the port number 9 in the active candidate these are expected to create peer reflexive candidates. > Same question here. This code just got moved further up, right behind the if(!found_valid), because it makes more sense to me to sanity check the incoming request, before we start searching through all the candidate lists just to realize then that the request is missing crucial data.
https://reviewboard.mozilla.org/r/23373/#review20811 > I realize this is just copied code, but as far as I can tell, we should never hit this... Turns out that when I turned this into assertion it hits on the first ice unit test which connets, when it iterates through the pre-answer list.
https://reviewboard.mozilla.org/r/23373/#review20811 > The problem here is that the spec says we should cancel, but then still wait for potential responses to come in for that cancelled request. And our ice_unittest show that this makes perfect sense, as often in non-NAT scenarios the response to the cancelled request comes in before the response for the newly trigger check created transaction. Problem is that nICEr seems to written with the assumption that the state canceled means you not only stop sending, but you also drop all incoming messages for that stun ctx. But that results in missing the valid responses for the canceled transactions. > > So we need to distinguish between A) we locally cancelled because e.g. things timed out and we really don't care any longer about any messages related to this and B) the trigger check cancel where receiving a response might save some round trips. > That's why I introduced this new WAITING state to handle scenario B). I'm not sure if it is the best solution. Another option could be to have just the one CANCELLED state and distinguish A) and B) only by the fact if the |timer_handle| in the stun client ctx is still set (=B) or not (=A). But I'm open to other solution/suggestions here. In that case, we probably shouldn't have CANCELLED in the switch, and I don't think we should have WAITING there either, since we will transition to DONE if we receive a success response while in WAITING (see https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/stun_client_ctx.c#737). The other calls to nr_stun_client_fire_finished_cb end up happening in failure cases. > The |triggered| flag set in nr_ice_candidate_pair_do_triggered_check() should prevent that we get duplicates in here. Assumption is though that we don't have duplicates in the check list. Ah, that flag is documented incorrectly. Will comment there. > For the FAILED and FROZEN scenario my assumption is that there is a pair in IN_PROGRESS right now. And the spec does not say that we should stop that ongoing check in these cases. So by putting them into the trigger check queue they should get handled next, once the current IN_PROGRESS pair finishes. > > Initially I thought of having two lists and each pair could be in both lists at the same time, what seems to be more what the spec suggests. Then I realized that I can't simply add |trigger_list| and re-use |entry|, because I assume |entry| is the pointer for the |check_list| already. But your comment made me just realize that the current code probably does not work, as all the code which searches for matching pairs for incoming requests and responses doesn't search through the |trigger_list|. So I guess I'll need to add an |entry2| to allow each pair to be in both lists at the same time. It is pretty easy to end up receiving a check on a component/stream that is totally FROZEN, and I think we're supposed to unfreeze it when that happens, but I guess we don't need the flag to make that happen. On further reading of both the spec and the code, it seems that |start_checks| will ensure that the check timer is running in case the IN_PROGRESS pair is the last incomplete pair, since the check timer will not have been rescheduled. In that case, we probably want to set it in the FAILED case above, since that can take us from 0 incomplete pairs to 1 (although we need to watch out for the assert here: https://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c?case=true&from=nr_ice_media_stream_start_checks#375).
https://reviewboard.mozilla.org/r/23373/#review20907 ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.h:59 (Diff revision 1) > + UCHAR triggered; /* Was this created to send a triggered check? */ This is used to mark pairs that have received a STUN check, and have spawned a pair that will be used to send a triggered check, so they do not do it again.
(In reply to Byron Campen [:bwc] from comment #10) > https://reviewboard.mozilla.org/r/23373/#review20907 > > ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.h:59 > (Diff revision 1) > > + UCHAR triggered; /* Was this created to send a triggered check? */ > > This is used to mark pairs that have received a STUN check, and have spawned > a pair that will be used to send a triggered check, so they do not do it > again. I updated the comment. This flag gets set if the pair spawned a new pair through a trigger check and the new spawned pair gets that flag also set. Correct? Never the less. My point was that the adding of pair to the trigger check queue only happens in the code block inside the |triggered| if condition. And in the same code block we set the |triggered| flag. I'll add a safety check if the pairs trigger check queue is already set. But then the only remaining case IMHO is that we have two identical entries in the check list. I hope that should not happen, right?
(In reply to Nils Ohlmeier [:drno] from comment #11) > (In reply to Byron Campen [:bwc] from comment #10) > > https://reviewboard.mozilla.org/r/23373/#review20907 > > > > ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.h:59 > > (Diff revision 1) > > > + UCHAR triggered; /* Was this created to send a triggered check? */ > > > > This is used to mark pairs that have received a STUN check, and have spawned > > a pair that will be used to send a triggered check, so they do not do it > > again. > > I updated the comment. This flag gets set if the pair spawned a new pair > through a trigger check and the new spawned pair gets that flag also set. > Correct? > > Never the less. My point was that the adding of pair to the trigger check > queue only happens in the code block inside the |triggered| if condition. > And in the same code block we set the |triggered| flag. I'll add a safety > check if the pairs trigger check queue is already set. But then the only > remaining case IMHO is that we have two identical entries in the check list. > I hope that should not happen, right? Right, updating the comment makes this more clear.
Attachment #8679103 - Flags: review?(mfroman)
Attachment #8679103 - Flags: review?(docfaraday)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Bug 1006809: update triggered check behavior to RFC 5245.
Note: the code has still a lot of TODO's and debug messages in it, which I left to help with the review.
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review21271 The more I think about this, the more I worry about having a separate check queue. What if we prioritized pairs with |triggered| set when starting pairs? ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.h:73 (Diff revision 2) > - TAILQ_ENTRY(nr_ice_cand_pair_) entry; > + TAILQ_ENTRY(nr_ice_cand_pair_) entry; /* the check list */ > + TAILQ_ENTRY(nr_ice_cand_pair_) entry2; /* the trigger check queue */ I'm worried that it will be very easy to mix these two up. Renaming them to check_queue_entry and triggered_check_queue_entry will help some, but I think I'd prefer having each pair in at most one queue. ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:883 (Diff revision 2) > - r_log(LOG_ICE,LOG_INFO,"ICE-PEER(%s)/CAND-PAIR(%s): received STUN check on invalid pair, resurrecting: %s",comp->stream->pctx->label,pair->codeword,pair->as_string); > + /* Finally start the trigger check is needed */ "if needed" ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:987 (Diff revision 2) > + if(nr_transport_addr_is_loopback(&lcand->addr) && > + !nr_transport_addr_is_loopback(&pcand->addr)) > + continue; > + if(!nr_transport_addr_is_loopback(&lcand->addr) && > + nr_transport_addr_is_loopback(&pcand->addr)) > + continue; nr_transport_addr_is_loopback() != nr_transport_addr_is_loopback() would be a little more terse, perhaps. I'll leave it up to you. ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c (Diff revision 2) > - int fire_cb=0; > nr_ice_cand_pair *p2; > > - if(!comp->nominated) > - fire_cb=1; > - Good catch. ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:1108 (Diff revision 2) > - if(comp->nominated->priority > pair->priority) > + if(comp->nominated->priority >= pair->priority) Also a good catch. ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:1122 (Diff revision 2) > + p2=TAILQ_FIRST(&comp->stream->trigger_check_queue); > + while(p2){ > + if((p2 != pair) && > + (p2->remote->component->component_id == comp->component_id)) { > + assert(p2->state == NR_ICE_PAIR_STATE_WAITING || > + p2->state == NR_ICE_PAIR_STATE_CANCELLED); > + r_log(LOG_ICE,LOG_INFO,"ICE-PEER(%s)/STREAM(%s)/COMP(%d)/CAND-PAIR(%s): cancelling FROZEN/WAITING pair %s in trigger check queue because CAND-PAIR(%s) was nominated.",comp->stream->pctx->label,comp->stream->label,comp->component_id,p2->codeword,p2->as_string,pair->codeword); > + > + if(r=nr_ice_candidate_pair_cancel(pair->pctx,p2,0)) > + ABORT(r); > + } > + > + p2=TAILQ_NEXT(p2,entry2); > + } If we made sure a candidate pair was in at most one check list, and gave it only one next pointer, we could probably break this out into its own function, perhaps minus the assert.
Attachment #8679103 - Flags: review?(docfaraday)
Looks like I managed to get ice_unittests passing, but the mochitest are broken all over the place because candidates get nominated but never selected according to the stats. Interesting...
Attachment #8679103 - Flags: review?(mfroman)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review21537 ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.h:82 (Diff revision 2) > -int nr_ice_candidate_pair_cancel(nr_ice_peer_ctx *pctx,nr_ice_cand_pair *pair); > +int nr_ice_candidate_pair_cancel(nr_ice_peer_ctx *pctx,nr_ice_cand_pair *pair, int wait); The arg 'wait' doesn't jump out at me as a "boolean". I had to go look a the implementation. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:444 (Diff revision 2) > - /* Fall through */ > + /* Move it from the check list into the trigger check queue */ Fix comment - not really move, but add to the trigger check queue since it exists in both? ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:721 (Diff revision 2) > +static int nr_ice_component_handle_check_matching_pair(nr_ice_component *comp, nr_ice_cand_pair *pair, nr_stun_server_request *req, int *error) Would it make more sense for this to have triggered in the name? ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:830 (Diff revision 2) > + r_log(LOG_ICE,LOG_WARNING,"ICE-PEER(%s): Rejecting stun request without priority",comp->stream->pctx->label); > + *error=487; Is 487 the right error code for a missing priority? ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:987 (Diff revision 2) > + if(nr_transport_addr_is_loopback(&lcand->addr) && > + !nr_transport_addr_is_loopback(&pcand->addr)) > + continue; > + if(!nr_transport_addr_is_loopback(&lcand->addr) && > + nr_transport_addr_is_loopback(&pcand->addr)) > + continue; Maybe a comment here why these checks are here for the next person coming along?
https://reviewboard.mozilla.org/r/23373/#review21271 > I'm worried that it will be very easy to mix these two up. Renaming them to check_queue_entry and triggered_check_queue_entry will help some, but I think I'd prefer having each pair in at most one queue. I did go forth and back on this one. And I went with the solution of a second queue, because: A) If a pair can be in two different queues the amount of places where we would need to check both queues is mind boggling (see all the lines I touched for the name change). B) I think it is possible that we receive multiple trigger checks, before we start checks on a new pair from the queue. But if we simply pop the current pair at the front of the queue will have LIFO at the top of the check queue (in case of multiple pending trigger checks) instead of a FIFO. But you really want a FIFO here to answer to the first incoming trigger check first. I guess I could pop the pair to the front of the queue and then walk down the queue until I find a pair which has not |triggered| set yet....
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/2-3/
Attachment #8679103 - Flags: review?(mfroman)
Attachment #8679103 - Flags: review?(docfaraday)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review23041
Attachment #8679103 - Flags: review?(mfroman) → review+
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review23101
Attachment #8679103 - Flags: review?(docfaraday) → review+
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/3-4/
Attachment #8679103 - Attachment description: MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. → MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r=bwc,mjf
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/4-5/
Attachment #8679103 - Attachment description: MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r=bwc,mjf → MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/5-6/
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/6-7/
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/7-8/
Attachment #8679103 - Attachment description: MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf → MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r?bwc,mjf
I know you both gave r+ already, but tests were still busted in revision 3. So I would appreciate if you could review the interdiff between 3 and 8 (which should pass all tests now) one more time. Interdiff 3-8: https://reviewboard.mozilla.org/r/23371/diff/3-8/
Flags: needinfo?(mfroman)
Flags: needinfo?(docfaraday)
https://reviewboard.mozilla.org/r/23371/#review23509 ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:430 (Diff revisions 3 - 8) > + if(pair->peer_nominated) > + copy->peer_nominated=1; > + if(pair->nominated) > + copy->nominated=1; Why not just assignments here? copy->peer_nominated = pair->peer_nominated; copy->nominated = pair->nominated; Just curious. ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:745 (Diff revisions 3 - 8) > + /* Note: the RFC says to trigger first and the nominate. But in that case s/the nominate/then nominate/ ::: media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c:401 (Diff revisions 3 - 8) > + if (stream->ice_state != NR_ICE_MEDIA_STREAM_CHECKS_COMPLETED) Need brackets?
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review23617 ::: media/mtransport/nricemediastream.cpp:355 (Diff revisions 3 - 8) > + /* Do not expose cancelled pairs which got superseeded by another pair > + * through a triggered check. */ > + if ((p1->state == NR_ICE_PAIR_STATE_CANCELLED) && (p1->triggered)) { > + MOZ_ASSERT(!pair.selected); > + continue; > + } I think I'd prefer to solve this by looking for dupes and removing them (based on what state each one is in). Looking at code elsewhere, it looks possible to have two succeeded duplicate triggered check pairs, and maybe also failed ones too. Also, s/superseeded/superceded/ ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:249 (Diff revisions 3 - 8) > + /* This is the original pair which created a new triggered check > + * pair, but received a successful response before the new triggered > + * check pair (probably because this a non NAT scenario where the > + * two check request from both sides raced on the wire and where the > + * check response will arrive before triggered check response). In > + * this case nomination will not cancel the new triggered check > + * pair, so we need to that here to avoid confusion if the new > + * triggered check should also receive a successful response later > + * (which is very likely at this point because no NAT). Note: this > + * is only to avoid having two identical pairs show as selected > + * which does not work in JS dictionaries. */ > + orig_pair=TAILQ_FIRST(&pair->remote->stream->check_list); > + while(orig_pair) { > + if(orig_pair->triggered && > + (orig_pair->state==NR_ICE_PAIR_STATE_IN_PROGRESS) && > + (strncmp(pair->codeword,orig_pair->codeword,sizeof(pair->codeword))==0)) { > + r_log(LOG_ICE,LOG_INFO,"ICE-PEER(%s)/CAND-PAIR(%s): cancelling triggered clone which is in progress already", pair->pctx->label,pair->codeword); > + if(nr_ice_candidate_pair_cancel(pair->pctx,orig_pair,0)) > + r_log(LOG_ICE,LOG_WARNING,"ICE-PEER(%s)/CAND-PAIR(%s): failed to cancel triggered check clone pair", pair->pctx->label,pair->codeword); > + } > + orig_pair=TAILQ_NEXT(orig_pair,check_queue_entry); > + } Breaking this out into a well-named function could help reduce the need for this long comment. Maybe nr_ice_media_stream_cancel_duplicate_triggered_pair? Also, let's avoid bringing JS up all the way down here; all we need to say is we want to avoid having dupes. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:266 (Diff revisions 3 - 8) > + if(nr_ice_candidate_pair_cancel(pair->pctx,orig_pair,0)) Is it possible for this pair to end up succeeding too? What happens if both succeed? We don't seem to be trying to filter out this kind of duplicate. (Also, what happens if both fail?) ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c:745 (Diff revisions 3 - 8) > + /* Note: the RFC says to trigger first and the nominate. But in that case s/the nominate/then nominate/ ::: media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c:401 (Diff revisions 3 - 8) > + if (stream->ice_state != NR_ICE_MEDIA_STREAM_CHECKS_COMPLETED) We might need to add a comment here explaining that we might have a new triggered check to perform even if the stream is completed.
Attachment #8679103 - Flags: review+
https://reviewboard.mozilla.org/r/23373/#review23617 > I think I'd prefer to solve this by looking for dupes and removing them (based on what state each one is in). Looking at code elsewhere, it looks possible to have two succeeded duplicate triggered check pairs, and maybe also failed ones too. > > Also, s/superseeded/superceded/ That seems to be a good idea. The only problem I have with this is how to accomplish this: - two loops with O(n^2) appear pretty inefficient - checking the result vector is also not great, as it does not contain the |triggered| value, and I might have to do ugly removals from the vector in case we encounter a "better" candidate Building a new intermediary list and translate then that list into the result vector sounds like the most sensible thing to do here. > Is it possible for this pair to end up succeeding too? What happens if both succeed? We don't seem to be trying to filter out this kind of duplicate. (Also, what happens if both fail?) So here is how it goes: 1) Incoming check creates triggered check - which with this patch results in two pairs with the same codeword (and also otherwise identical). The original pair gets set to CANCELLED, with its STUN client set into the new WAIT state where it only waits for a response but does not retransmit any more. The newly cloned pair gets set to WAITING or IN_PROGRESS, depending on the state of the stream. These are the possible next outcomes for these two pairs: 2A) With no NAT or firewall in between most likely the response for the CANCELLED pair will be received before any response for the new STUN transaction from the cloned pair will be received. In that case the CANCELLED pair proceeds to SUCCEEDED. As no NAT or firewall is blocking anything most likely the cloned pair will also receive a success response some time later. Now nICEr itself does not care if the cloned pair also succeeds, it simply marks it as another succeeded pair in the list (the initial winner will stay as the selected pair). This only causes a problem when the JS stats try to create a dictionary/object of the pairs and suddenly randomly one of the two pairs with the same codeword show up in the stats. 2B) With a NAT or firewall in between most likely the CANCELLED pair will never receive a response. The STUN client in the wait state will simply time out and switch the pair to the FAILED state. In this scenario the new cloned pair should receive a response eventually and switch to SUCCEEDED. Again the original pair in the CANCELLED state will cause problems in the JS stats. (Note: it does not make much sense to try to cancel the CANCELLED original pair at this scenario at it is already cancelled, the only difference would be that we could move the STUN client from the wait state to the cancelled state so it ignores any responses.) 2C) Both STUN clients time out and switch their pairs to FAILED. But I don't see how the new triggered check should fail as the remote side managed to get a check request to us already. Now the bad thing about my over time evolved solution is that it solves 2A and 2B in different places. I guess to solve the problems the JS problems your idea of finding duplicates in general seems to be the better solution (see comments there). Which leaves us with the question: do we still want to cancel in the 2A scenario to save resources and avoid confusion (the two pairs with the same codeword in different states can be very confusing), or should just let as many pairs succeed as we can as nICEr doesn't really care?
(In reply to Nils Ohlmeier [:drno] from comment #30) > Now the bad thing about my over time evolved solution is that it solves 2A > and 2B in different places. I guess to solve the problems the JS problems > your idea of finding duplicates in general seems to be the better solution > (see comments there). Which leaves us with the question: do we still want to > cancel in the 2A scenario to save resources and avoid confusion (the two > pairs with the same codeword in different states can be very confusing), or > should just let as many pairs succeed as we can as nICEr doesn't really care? Looks like a solution where I look only one element ahead for duplicates solves the problem sufficiently on try (without cancelling anything, but let all pairs succeed): https://treeherder.mozilla.org/#/jobs?repo=try&revision=af82eb196fa0 It might still be worth to do the cancelling in the 2A scenario, as I don't see us gain anything from continuing to wait for the second pair. Opinions Byron?
(In reply to Nils Ohlmeier [:drno] from comment #30) > https://reviewboard.mozilla.org/r/23373/#review23617 > > > I think I'd prefer to solve this by looking for dupes and removing them (based on what state each one is in). Looking at code elsewhere, it looks possible to have two succeeded duplicate triggered check pairs, and maybe also failed ones too. > > > > Also, s/superseeded/superceded/ > > That seems to be a good idea. > > The only problem I have with this is how to accomplish this: > - two loops with O(n^2) appear pretty inefficient > - checking the result vector is also not great, as it does not contain the > |triggered| value, and I might have to do ugly removals from the vector in > case we encounter a "better" candidate > > Building a new intermediary list and translate then that list into the > result vector sounds like the most sensible thing to do here. I'm not too worried about O(n^2) on this (since we're talking about less than 100 pairs in pretty much any normal scenario). Using an intermediary set or something probably won't get you any performance benefit, although the code may be easier to read. I'll leave it up to your preference. > > Is it possible for this pair to end up succeeding too? What happens if both succeed? We don't seem to be trying to filter out this kind of duplicate. (Also, what happens if both fail?) > > So here is how it goes: > 1) Incoming check creates triggered check - which with this patch results in > two pairs with the same codeword (and also otherwise identical). The > original pair gets set to CANCELLED, with its STUN client set into the new > WAIT state where it only waits for a response but does not retransmit any > more. The newly cloned pair gets set to WAITING or IN_PROGRESS, depending on > the state of the stream. > > These are the possible next outcomes for these two pairs: > 2A) With no NAT or firewall in between most likely the response for the > CANCELLED pair will be received before any response for the new STUN > transaction from the cloned pair will be received. In that case the > CANCELLED pair proceeds to SUCCEEDED. As no NAT or firewall is blocking > anything most likely the cloned pair will also receive a success response > some time later. Now nICEr itself does not care if the cloned pair also > succeeds, it simply marks it as another succeeded pair in the list (the > initial winner will stay as the selected pair). This only causes a problem > when the JS stats try to create a dictionary/object of the pairs and > suddenly randomly one of the two pairs with the same codeword show up in the > stats. > 2B) With a NAT or firewall in between most likely the CANCELLED pair will > never receive a response. The STUN client in the wait state will simply time > out and switch the pair to the FAILED state. In this scenario the new cloned > pair should receive a response eventually and switch to SUCCEEDED. Again the > original pair in the CANCELLED state will cause problems in the JS stats. > (Note: it does not make much sense to try to cancel the CANCELLED original > pair at this scenario at it is already cancelled, the only difference would > be that we could move the STUN client from the wait state to the cancelled > state so it ignores any responses.) > 2C) Both STUN clients time out and switch their pairs to FAILED. But I don't > see how the new triggered check should fail as the remote side managed to > get a check request to us already. > > Now the bad thing about my over time evolved solution is that it solves 2A > and 2B in different places. I guess to solve the problems the JS problems > your idea of finding duplicates in general seems to be the better solution > (see comments there). Which leaves us with the question: do we still want to > cancel in the 2A scenario to save resources and avoid confusion (the two > pairs with the same codeword in different states can be very confusing), or > should just let as many pairs succeed as we can as nICEr doesn't really care? Allowing either to succeed could let ICE finish faster, which is always nice. What does the spec say about this?
Flags: needinfo?(docfaraday)
Flags: needinfo?(mfroman) → needinfo?
Flags: needinfo?
(In reply to Byron Campen [:bwc] from comment #32) > I'm not too worried about O(n^2) on this (since we're talking about less > than 100 pairs in pretty much any normal scenario). Using an intermediary > set or something probably won't get you any performance benefit, although > the code may be easier to read. I'll leave it up to your preference. Complexity: fair enough. What about the solution I have in comment #31? That works under the assumption that any duplicate pairs will be next to each other in the check list, which makes sense if we assume that duplicate pairs should have the same priority. We only need a more complex solution if we want/need to cover problems with duplicate pairs with different priorities. I think the restriction to same priority makes sense. What do you think Byron? > Allowing either to succeed could let ICE finish faster, which is always > nice. What does the spec say about this? Well one of the two pairs has already succeeded. So letting the second succeed later is not going to speed up anything here. It only offers more redundancy. The spec basically says that the original STUN transaction should go into a cancelled/wait state and still be able to receive responses, but no longer cause a failure if it times out. And it says we should start a new transaction for the triggered check. Which is exactly what we would have with this patch now. But it does not say what to do if the canceled STUN transaction actually succeeds, because it receives a success response. I guess either way should be fine. Letting the new triggered check transaction eventually succeed means less code complexity, but more ongoing STUN transactions. Cancelling the new triggered check saves some resources for the price of code complexity.
Flags: needinfo?(docfaraday)
Attachment #8679103 - Flags: review?(docfaraday)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/8-9/
(In reply to Nils Ohlmeier [:drno] from comment #34) > Comment on attachment 8679103 [details] > MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. > r?bwc,mjf > > Review request updated; see interdiff: > https://reviewboard.mozilla.org/r/23373/diff/8-9/ This contains the dupe detection for a dupe being the next entry in the check list (in case you are good with that - I can remove the code for canceling the in_progress triggered check). But I can also extend this to check the whole check list for potential dupes if you think that makes more sense.
(In reply to Nils Ohlmeier [:drno] from comment #33) > (In reply to Byron Campen [:bwc] from comment #32) > > I'm not too worried about O(n^2) on this (since we're talking about less > > than 100 pairs in pretty much any normal scenario). Using an intermediary > > set or something probably won't get you any performance benefit, although > > the code may be easier to read. I'll leave it up to your preference. > > Complexity: fair enough. > > What about the solution I have in comment #31? > That works under the assumption that any duplicate pairs will be next to > each other in the check list, which makes sense if we assume that duplicate > pairs should have the same priority. We only need a more complex solution if > we want/need to cover problems with duplicate pairs with different > priorities. I think the restriction to same priority makes sense. What do > you think Byron? > In normal cases that makes sense, although we could probably end up with dupes in other ways. The remote end could trickle the same candidate twice with different priorities, for example, and I don't think we would filter it out. > > Allowing either to succeed could let ICE finish faster, which is always > > nice. What does the spec say about this? > > Well one of the two pairs has already succeeded. So letting the second > succeed later is not going to speed up anything here. It only offers more > redundancy. > The spec basically says that the original STUN transaction should go into a > cancelled/wait state and still be able to receive responses, but no longer > cause a failure if it times out. And it says we should start a new > transaction for the triggered check. Which is exactly what we would have > with this patch now. But it does not say what to do if the canceled STUN > transaction actually succeeds, because it receives a success response. > I guess either way should be fine. Letting the new triggered check > transaction eventually succeed means less code complexity, but more ongoing > STUN transactions. Cancelling the new triggered check saves some resources > for the price of code complexity. I don't think anything is succeeded yet when the trigger check happens; it spawns a new STUN transaction that, when successful, leads to a succeeded pair, but it doesn't happen right away like it does with prflx learned from a STUN response. So there is some delay, where each pair races against the other. Cancelling the new pair if the original succeeds doesn't save very much in the way of resources, and if we can prune dupes another way, it probably doesn't matter too much to do this.
Flags: needinfo?(docfaraday)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review24445 ::: media/mtransport/nricemediastream.cpp:341 (Diff revisions 8 - 9) > + p1 = p2; I feel like this would be easier to read if we did a continue; here. ::: media/mtransport/nricemediastream.cpp:376 (Diff revisions 8 - 9) > /* Do not expose cancelled pairs which got superseeded by another pair > * through a triggered check. */ > + /* > if ((p1->state == NR_ICE_PAIR_STATE_CANCELLED) && (p1->triggered)) { > MOZ_ASSERT(!pair.selected); > continue; > } > + */ This all needs to go. ::: media/mtransport/nricemediastream.cpp:391 (Diff revisions 8 - 9) > + > + if (skip_next_pair && p2) { > + p1 = p2; > + } It would be easier to read if this check were performed up top, eg; if (skip_next_pair) { skip_next_pair = false; continue; } ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:248 (Diff revisions 8 - 9) > + /* > if(pair->state==NR_ICE_PAIR_STATE_CANCELLED && pair->triggered) { > - /* This is the original pair which created a new triggered check > + nr_ice_media_stream_cancel_duplicate_triggered_pair(pair); > - * pair, but received a successful response before the new triggered > - * check pair (probably because this a non NAT scenario where the > - * two check request from both sides raced on the wire and where the > - * check response will arrive before triggered check response). In > - * this case nomination will not cancel the new triggered check > - * pair, so we need to that here to avoid confusion if the new > - * triggered check should also receive a successful response later > - * (which is very likely at this point because no NAT). Note: this > - * is only to avoid having two identical pairs show as selected > - * which does not work in JS dictionaries. */ > - orig_pair=TAILQ_FIRST(&pair->remote->stream->check_list); > - while(orig_pair) { > - if(orig_pair->triggered && > - (orig_pair->state==NR_ICE_PAIR_STATE_IN_PROGRESS) && > - (strncmp(pair->codeword,orig_pair->codeword,sizeof(pair->codeword))==0)) { > - r_log(LOG_ICE,LOG_INFO,"ICE-PEER(%s)/CAND-PAIR(%s): cancelling triggered clone which is in progress already", pair->pctx->label,pair->codeword); > - if(nr_ice_candidate_pair_cancel(pair->pctx,orig_pair,0)) > - r_log(LOG_ICE,LOG_WARNING,"ICE-PEER(%s)/CAND-PAIR(%s): failed to cancel triggered check clone pair", pair->pctx->label,pair->codeword); > - } > - orig_pair=TAILQ_NEXT(orig_pair,check_queue_entry); > - } > } > + */ Just making sure we don't lose track of this commented-out code; I'm ok either way, I think. ::: media/mtransport/third_party/nICEr/src/ice/ice_media_stream.h:102 (Diff revisions 8 - 9) > +int nr_ice_media_stream_cancel_duplicate_triggered_pair(nr_ice_cand_pair *pair); Don't forget to remove this if you decide to not do this. ::: media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c:893 (Diff revisions 8 - 9) > +int nr_ice_media_stream_cancel_duplicate_triggered_pair(nr_ice_cand_pair *pair) If you decide not to do this canceling, make sure to remove. ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:248 (Diff revision 9) > + /* > + if(pair->state==NR_ICE_PAIR_STATE_CANCELLED && pair->triggered) { > + nr_ice_media_stream_cancel_duplicate_triggered_pair(pair); > + } > + */ Im ok either way on this, I think, just make sure to not leave it like this.
Attachment #8679103 - Flags: review?(docfaraday)
Attachment #8679103 - Flags: review?(docfaraday)
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/9-10/
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf https://reviewboard.mozilla.org/r/23373/#review24603 ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate_pair.c:584 (Diff revision 10) > + /* TODO: is there a better way to check this? */ I don't think so, probably makes sense to remove this TODO. ::: media/mtransport/third_party/nICEr/src/ice/ice_media_stream.h:66 (Diff revision 10) > + /* TODO: should this be a STAIL queue instead (it's only a FIFO)? */ I think this should be fine, we probably don't need this TODO.
Attachment #8679103 - Flags: review?(docfaraday) → review+
Comment on attachment 8679103 [details] MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23373/diff/10-11/
Attachment #8679103 - Attachment description: MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r?bwc,mjf → MozReview Request: Bug 1006809: update triggered check behavior to RFC 5245. r+bwc,mjf
Try run looks okay.
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Rank: 35 → 15
Priority: P3 → P1
Target Milestone: mozilla35 → mozilla45
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: