null candidate fired ahead of SLD(answer) on SRD w/rollback or double SRD (regression)
Categories
(Core :: WebRTC: Signaling, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox74 | --- | wontfix |
firefox75 | --- | wontfix |
firefox76 | --- | wontfix |
firefox77 | --- | fixed |
People
(Reporter: jib, Assigned: bwc)
References
(Regression)
Details
(Keywords: regression)
Attachments
(11 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
Doing SRD twice on the answerer (which is allowed), causes ICE gathering to jump the gate, firing answer candidates ahead of answer.
This is against spec.
STRs:
- Open https://jsfiddle.net/jib1/gjcms01p/ and share camera
Expected results:
pc2.SRD DONE
pc2.SRD TWICE DONE
stable
pc2.SLD DONE
pc2.onicecandidate in stable
pc2.onicecandidate in stable
Actual results:
pc2.SRD DONE
pc2.SRD TWICE DONE
pc2.onicecandidate in have-remote-offer BOOOH!
pc1.addIce: InvalidStateError: Cannot add ICE candidate when there is no remote SDP
stable
pc2.SLD DONE
pc2.onicecandidate in stable
Regression range:
45:19.27 INFO: Narrowed inbound regression window from [442f57db, f829ca2e] (4 builds) to [fde54097, f829ca2e] (2 builds) (~1 steps left)
45:19.27 INFO: No more inbound revisions, bisection finished.
45:19.27 INFO: Last good revision: fde540977b4e52dd21031777aa35906ff81d6e63
45:19.27 INFO: First bad revision: f829ca2e1ad6d637eb01f5cbe50876f22fe53c37
45:19.27 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=fde540977b4e52dd21031777aa35906ff81d6e63&tochange=f829ca2e1ad6d637eb01f5cbe50876f22fe53c37
Assignee | ||
Comment 1•4 years ago
|
||
Yeah, this is just another race opened up by bug 1591199.
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 2•4 years ago
|
||
Seems like a logic problem, not a race. Note the
console.log("pc2.SRD DONE");
await wait(1000);
in the fiddle, which is not needed to reproduce btw, but shows it isn't timing related. I.e. we can wait any number of seconds after the first SRD and NO candidates fire prematurely.
It specifically takes a second SRD to trigger ICE, at which point it's 100% reproducible (no SLD needed). No timing involved. The other end will have no answer to apply these candidates to, likely causing the call to fail.
I think investigating why this happens should be a P2 in case there are other ways to trigger this logic error.
In fact, it appears to be happening on SRD implicit rollback: https://jsfiddle.net/jib1/gjcms01p/28/ which may miss an important candidate, causing calls to fail to connect.
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Comment hidden (obsolete) |
Updated•4 years ago
|
Comment 4•4 years ago
|
||
Hi Nico, is there currently someone working on this? Thank you?
Updated•4 years ago
|
Reporter | ||
Comment 5•4 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #3)
This test case widens the window of opportunity for the race, even without any waiting. The problem is that the handling for state changes for the sLD(answer) has been delayed by bug 1591199 enough that the candidates can arrive before those state changes are carried out, provided the ICE stack is given enough head start.
I don't understand this explanation, because this reproduces even without calling sLD(answer) https://jsfiddle.net/jib1/gjcms01p/39/
I also verified this is not fixed by the patches in bug 1622384.
Assignee | ||
Comment 6•4 years ago
|
||
Ok, that's weird. I'll look into it.
Reporter | ||
Comment 7•4 years ago
|
||
Ah, it's the null
candidate fired from handleIceGatheringStateChange - https://jsfiddle.net/jib1/gjcms01p/45/
Reporter | ||
Comment 8•4 years ago
|
||
iceGatheringState goes to "complete"
for some reason on second SRD(offer) - https://jsfiddle.net/jib1/gjcms01p/48/
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 9•4 years ago
|
||
I've verified it's the null candidate that's being fired on single SRD() with implicit rollback as well. https://jsfiddle.net/jib1/gjcms01p/50/
Assignee | ||
Comment 10•4 years ago
|
||
Ok, this is due to the implicit rollback itself; rolling back a sLD shuts down gathering, which means we emit an end-of-candidates. Looking at the spec now to determine what it says should happen on rollback.
Assignee | ||
Comment 11•4 years ago
•
|
||
Yeah, webrtc-pc says that whenever gathering transitions to complete for all ice transports, we fire this null candidate. I wonder if this should apply when we're rolling back to a state where there are no ICE transports at all? The ICE gathering state should be "new" in that case, probably?
Reporter | ||
Comment 12•4 years ago
•
|
||
Ah, unlike comment 8 which said pc2.onicecandidate in have-remote-offer null BOOOH!
,
comment 9 actually says pc2.onicecandidate null in have-local-offer BOOOH!
which is quite normal. I'll remove the BOOOH!
there.
But one problem remains with the comment 9 fiddle updated with onicegatheringstatechange: It looks like we're firing "complete"
without ever having fired "gathering"
, which the spec indicates we should have done "When the ICE Agent indicates that it began gathering" which sans a state diagram clearly has to have happened before "When the ICE Agent is finished gathering". That seems wrong (in Firefox).
Yeah, webrtc-pc says that whenever gathering transitions to complete for all ice transports, we fire this null candidate. I wonder if this should apply when we're rolling back to a state where there are no ICE transports at all? The ICE gathering state should be "new" in that case, probably?
I think it's a stretch to say ice gathering is "complete"
when it got aborted by its transport being rolled back and vaporized.
After all, the spec says: "When the ICE Agent is finished gathering a generation of candidates for an RTCIceTransport, and those candidates have been surfaced to the application". I'd argue it wasn't allowed to finish in this case.
ICE seems supportive of this interpretation: "ICE processing across all media streams also has a state associated with it. This state is equal to Running while ICE processing is under way. The state is Completed when ICE processing is complete and Failed if it failed without success."
But regardless of interpretation, I see no support anywhere for the behavior in comment 8 (without any SLD on pc2) and earlier, because the phrase "and those candidates have been surfaced to the application" above is a reference to [[EarlyCandidates]]
, which means we must not ever fire "complete"
before SLD(answer).
Assignee | ||
Comment 13•4 years ago
|
||
Yeah, I think in this case we should be going back to "new". As for going directly from "new" to "complete", there are corner-cases like failure to start gathering that look under-specified (although that isn't what is happening here).
As for comment 8, I think maybe we're triggering the ICE agent rollback stuff with sRD somehow? Looking into it.
Assignee | ||
Comment 14•4 years ago
|
||
Ok, flaw is here for comment 8:
Somehow, this is true in the double sRD(offer) case.
Assignee | ||
Comment 15•4 years ago
|
||
Assignee | ||
Comment 16•4 years ago
|
||
Depends on D72234
Assignee | ||
Comment 17•4 years ago
|
||
Depends on D72235
Assignee | ||
Comment 18•4 years ago
|
||
Depends on D72236
Assignee | ||
Comment 19•4 years ago
|
||
Depends on D72238
Assignee | ||
Comment 20•4 years ago
|
||
Depends on D72239
Assignee | ||
Comment 21•4 years ago
|
||
Depends on D72240
Assignee | ||
Comment 22•4 years ago
|
||
Depends on D72241
Assignee | ||
Comment 23•4 years ago
|
||
Depends on D72242
Assignee | ||
Comment 24•4 years ago
|
||
Depends on D72243
Assignee | ||
Comment 25•4 years ago
|
||
Depends on D72244
Assignee | ||
Comment 26•4 years ago
•
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b3ce3ef10ca835135961a70cfb435f3184d1fd6 Gtests here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c6d537e955d7ccc7a7f696956e92b30229d78b4
Assignee | ||
Comment 27•4 years ago
|
||
Try looks good.
Assignee | ||
Comment 28•4 years ago
|
||
More thorough try push here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0311cb13bc110167d3b76f33926734a3038d7b8f
Assignee | ||
Comment 29•4 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d23b8d97f8a2198f4ef62921b72ae65fcc452e04
Comment 30•4 years ago
|
||
Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e2f411ffa079 Test-case for bug. r=jib https://hg.mozilla.org/integration/autoland/rev/5e5e7b27d296 Fix bug where repeated sRD(offer) could start gathering. r=mjf https://hg.mozilla.org/integration/autoland/rev/673ef75f8fe6 Test that we do not transition to "gathering" on a simple reoffer. r=jib https://hg.mozilla.org/integration/autoland/rev/bee9068e5a39 Do not transition back to "gathering" if no gathering is needed. r=mjf https://hg.mozilla.org/integration/autoland/rev/41ac898a5d97 Test that we do not transition to "gathering" on rollback if we were already done gathering. r=jib https://hg.mozilla.org/integration/autoland/rev/3a5601c68d76 Fix bug where rollback of local offer would stomp the pre-existing transport with a new one with the same ufrag/pwd. r=mjf https://hg.mozilla.org/integration/autoland/rev/40c9fc4fa292 Test-cases for transitioning to "new" gathering state. r=jib https://hg.mozilla.org/integration/autoland/rev/065ddaaefe60 Transition gathering state to "new" if a negotiation leaves us with no transports. r=mjf https://hg.mozilla.org/integration/autoland/rev/09af49103607 Some logging that was helpful. r=mjf https://hg.mozilla.org/integration/autoland/rev/53ba1301dfdd Mochitest fixes. r=jib https://hg.mozilla.org/integration/autoland/rev/7917d7e0a471 Another test we should probably have. Passes. r=jib
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23274 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Comment 33•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e2f411ffa079
https://hg.mozilla.org/mozilla-central/rev/5e5e7b27d296
https://hg.mozilla.org/mozilla-central/rev/673ef75f8fe6
https://hg.mozilla.org/mozilla-central/rev/bee9068e5a39
https://hg.mozilla.org/mozilla-central/rev/41ac898a5d97
https://hg.mozilla.org/mozilla-central/rev/3a5601c68d76
https://hg.mozilla.org/mozilla-central/rev/40c9fc4fa292
https://hg.mozilla.org/mozilla-central/rev/065ddaaefe60
https://hg.mozilla.org/mozilla-central/rev/09af49103607
https://hg.mozilla.org/mozilla-central/rev/53ba1301dfdd
https://hg.mozilla.org/mozilla-central/rev/7917d7e0a471
Upstream PR merged by moz-wptsync-bot
Updated•4 years ago
|
Updated•4 years ago
|
Description
•