Closed Bug 1149298 Opened 5 years ago Closed 5 years ago

null candidate never fires on pc.onicecandidate (REGRESSION)

Categories

(Core :: WebRTC: Signaling, defect, major)

39 Branch
x86
macOS
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- unaffected
firefox39 + fixed
firefox40 --- fixed

People

(Reporter: jib, Assigned: bwc)

References

()

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

STR:
- Hit [Start!] button in URL.

Expected result:
> cand: {"candidate":"candidate:0 1 UDP 2130379007 192.168.1.2 54307 typ host","sdpMid":"","sdpMLineIndex":0}
> cand: {"candidate":"candidate:0 2 UDP 2130379006 192.168.1.2 60858 typ host","sdpMid":"","sdpMLineIndex":0}
> cand: {"candidate":"candidate:1 1 UDP 1694236671 24.47.153.9 54307 typ srflx raddr 192.168.1.2 rport 54307","sdpMid":"","sdpMLineIndex":0}
> cand: {"candidate":"candidate:1 2 UDP 1694236670 24.47.153.9 60858 typ srflx raddr 192.168.1.2 rport 60858","sdpMid":"","sdpMLineIndex":0}
> here!
> cand: null

Actual result:
> cand: {"candidate":"candidate:0 1 UDP 2130379007 192.168.1.2 54307 typ host","sdpMid":"","sdpMLineIndex":0}
> cand: {"candidate":"candidate:0 2 UDP 2130379006 192.168.1.2 60858 typ host","sdpMid":"","sdpMLineIndex":0}
> cand: {"candidate":"candidate:1 1 UDP 1694236671 24.47.153.9 54307 typ srflx raddr 192.168.1.2 rport 54307","sdpMid":"","sdpMLineIndex":0}
> cand: {"candidate":"candidate:1 2 UDP 1694236670 24.47.153.9 60858 typ srflx raddr 192.168.1.2 rport 60858","sdpMid":"","sdpMLineIndex":0}

This hangs any script relying on the null candidate to learn when ice gathering has finished, such as this carrier-pigeon remote call test script that used to work: http://jsfiddle.net/7vv2vxtt

mozRegression:

Last good revision: e7db788bc372
First bad revision: aca75f4ef930
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e7db788bc372&tochange=aca75f4ef930
The timecards definitely indicate that gathering never finishes. Not sure why this would be. Looking into it.
I think I see what is going on here. When we disable a peer stream, we aren't decrementing |uninitialized_candidates| on the ice_ctx for any candidates in that stream. Looking into a fix now.
Assignee: nobody → docfaraday
Attached file MozReview Request: bz://1149298/bwc (obsolete) —
/r/6329 - Bug 1149298: When destroying a candidate, ensure that the ice_ctx doesn't continue waiting for it to init.

Pull down this commit:

hg pull review -r 553ffb29f8a372d980d6e928fc4c1b811a1caeca
The reason this was not detected was because we never verify that end-of-candidates happens in the mochitests. It is easy enough to add this verification, but a question pops up; if we perform a renegotiation that results in no new ICE streams, should we fire an end-of-candidates just to make it clear to content not to expect any? It seems like a sensible thing to do.
Comment on attachment 8585758 [details]
MozReview Request: bz://1149298/bwc

/r/6331 - Bug 1149298 - Part 1: Test cases.
/r/6329 - Bug 1149298 - Part 2: When destroying a candidate, ensure that the ice_ctx doesn't continue waiting for it to init.
/r/6333 - Bug 1149298 - Part 3: Fire end of candidates signal when StartGathering has nothing to do, and only call StartGathering once per offer/answer.

Pull down these commits:

hg pull review -r 5f1274f5752e0ebca36319518cdff9d4366363f3
https://reviewboard.mozilla.org/r/6327/#review5293

::: media/mtransport/nricectx.cpp
(Diff revision 2)
> +    oldStream->Close();

This Close() call can cause gathering to finish, if this stream is the only one with uninitted candidates, which causes code elsewhere to walk the streams getting default candidates and such.
Comment on attachment 8585758 [details]
MozReview Request: bz://1149298/bwc

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d795a5c7437
Attachment #8585758 - Flags: review?(drno)
(In reply to Byron Campen [:bwc] from comment #4)
> if we perform a renegotiation that results in no new ICE streams, should we
> fire an end-of-candidates just to make it clear to content not to expect any?
> It seems like a sensible thing to do.

Makes sense to me.
This is a question for the WG.
(In reply to Byron Campen [:bwc] from comment #4)
> The reason this was not detected was because we never verify that
> end-of-candidates happens in the mochitests.

IIRC we actually had a check for that in the initial test extension for trickle ICE, but the problem was that on slow systems your test might end before you get the end-of-candidates.
Maybe we can work around this by not checking it on every single mochitest, but only have a specific test which waits for EOC.

> It is easy enough to add this
> verification, but a question pops up; if we perform a renegotiation that
> results in no new ICE streams, should we fire an end-of-candidates just to
> make it clear to content not to expect any? It seems like a sensible thing
> to do.

I think Ekr is right, that this should be raised on the mailing list.
As a developer I want to be able to re-use my signaling code as much as possible for renegotiation. So for that I think we need to send EOC.
BTW the topic is related to if the ice connection state should change during renegotiation if bundle is used and therefore nothing changes on the ICE transport layer.
https://reviewboard.mozilla.org/r/6331/#review5303

::: dom/media/tests/mochitest/templates.js
(Diff revision 1)
> +  function PC_LOCAL_WAIT_FOR_END_OF_TRICKLE(test) {
> +    return test.pcLocal.endOfTrickleIce;
> +  },
> +  function PC_REMOTE_WAIT_FOR_END_OF_TRICKLE(test) {
> +    return test.pcRemote.endOfTrickleIce;
>    }

As I mentioned in bz comment already I think this might be prone to cause intermittent errors on slow platforms. (Although I think this is waiting for EOC later then we did previously.)

I think it might be better to have a new mochitest which requires EOC to come in before finishing the test and leave this check out for the all other tests.
I agree. I think having one test for this would be sufficient. In my experience, waiting for all candidates can take up to 15 seconds on windows (and always blindingly fast on OSX for some reason).
(In reply to Nils Ohlmeier [:drno] from comment #11)
> https://reviewboard.mozilla.org/r/6331/#review5303
> 
> ::: dom/media/tests/mochitest/templates.js
> (Diff revision 1)
> > +  function PC_LOCAL_WAIT_FOR_END_OF_TRICKLE(test) {
> > +    return test.pcLocal.endOfTrickleIce;
> > +  },
> > +  function PC_REMOTE_WAIT_FOR_END_OF_TRICKLE(test) {
> > +    return test.pcRemote.endOfTrickleIce;
> >    }
> 
> As I mentioned in bz comment already I think this might be prone to cause
> intermittent errors on slow platforms. (Although I think this is waiting for
> EOC later then we did previously.)
> 
> I think it might be better to have a new mochitest which requires EOC to
> come in before finishing the test and leave this check out for the all other
> tests.

   Since this was rewritten to use promises, we won't end up in a situation where we miss the end of candidates signal. If this test is intermittent, it is probably a bug (we aren't using stun servers in these tests anymore, so we're just talking about host candidates). Also, at a minimum we'd need mochitests for a few different renegotiation variants, bundled and non-bundled, one m-section and more than one. (detecting this bug requires more than one m-line, the removeThenAdd renegotiation variants are what detected the fact that we weren't firing end-of-candidates on renegotiation that didn't require additional ICE, and the single m-line variants helped me fix a bug where we were sending the end-of-candidates signal twice in one offer/answer exchange)
Comment on attachment 8585758 [details]
MozReview Request: bz://1149298/bwc

https://reviewboard.mozilla.org/r/6327/#review5331

LGTM
Attachment #8585758 - Flags: review?(drno) → review+
Try needed some retriggers. Check back.
Flags: needinfo?(docfaraday)
https://hg.mozilla.org/mozilla-central/rev/dcf3bcce815d
https://hg.mozilla.org/mozilla-central/rev/695a7c1e513d
https://hg.mozilla.org/mozilla-central/rev/327aae812387
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
[Tracking Requested - why for this release]: I believe Bug 1146462 introduced this regression, and that was landed in Firefox 39.  So we want to uplift this fix to Firefox 39 (Aurora).  When Byron gets back, I'll ask him to write the uplift request for this (and I'll confirm with him that only Fx 39 is affected).
Tracking for 39. 

Also, that part about the ice gathering and the carrier pigeons wins the Bugzilla poetic quote of the day award.
Comment on attachment 8585758 [details]
MozReview Request: bz://1149298/bwc

I'm pretty sure this patchset will apply cleanly to 39, but if not I can backport.

Approval Request Comment
[Feature/regressing bug #]:

   Bug 1146462

[User impact if declined]:

   Some webrtc services may break or behave unexpectedly, although most should be fine.

[Describe test coverage new/current, TreeHerder]:

   New checks have been added to the mochitest suite, and additional ICE unit-tests have been added.

[Risks and why]: 

   Fairly low, but because this patch alters timing it could expose bugs elsewhere.

[String/UUID change made/needed]:

   None.
Attachment #8585758 - Flags: approval-mozilla-aurora?
Comment on attachment 8585758 [details]
MozReview Request: bz://1149298/bwc

This fix has been on m-c for almost 3 weeks. Let's get this regression fix onto Aurora. Aurora+
Attachment #8585758 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8585758 - Attachment is obsolete: true
Attachment #8619914 - Flags: review+
Attachment #8619915 - Flags: review+
Attachment #8619916 - Flags: review+
You need to log in before you can comment on or make changes to this bug.