null candidate fired ahead of SLD(answer) on SRD w/rollback or double SRD (regression)
Categories
(Core :: WebRTC: Signaling, defect, P2)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr68 | --- | unaffected |
| firefox74 | --- | wontfix |
| firefox75 | --- | wontfix |
| firefox76 | --- | wontfix |
| firefox77 | --- | fixed |
People
(Reporter: jib, Assigned: bwc)
References
(Regression)
Details
(Keywords: regression)
Attachments
(11 files)
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review |
Doing SRD twice on the answerer (which is allowed), causes ICE gathering to jump the gate, firing answer candidates ahead of answer.
This is against spec.
STRs:
- Open https://jsfiddle.net/jib1/gjcms01p/ and share camera
Expected results:
pc2.SRD DONE
pc2.SRD TWICE DONE
stable
pc2.SLD DONE
pc2.onicecandidate in stable
pc2.onicecandidate in stable
Actual results:
pc2.SRD DONE
pc2.SRD TWICE DONE
pc2.onicecandidate in have-remote-offer BOOOH!
pc1.addIce: InvalidStateError: Cannot add ICE candidate when there is no remote SDP
stable
pc2.SLD DONE
pc2.onicecandidate in stable
Regression range:
45:19.27 INFO: Narrowed inbound regression window from [442f57db, f829ca2e] (4 builds) to [fde54097, f829ca2e] (2 builds) (~1 steps left)
45:19.27 INFO: No more inbound revisions, bisection finished.
45:19.27 INFO: Last good revision: fde540977b4e52dd21031777aa35906ff81d6e63
45:19.27 INFO: First bad revision: f829ca2e1ad6d637eb01f5cbe50876f22fe53c37
45:19.27 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=fde540977b4e52dd21031777aa35906ff81d6e63&tochange=f829ca2e1ad6d637eb01f5cbe50876f22fe53c37
| Assignee | ||
Comment 1•5 years ago
|
||
Yeah, this is just another race opened up by bug 1591199.
| Reporter | ||
Updated•5 years ago
|
| Reporter | ||
Comment 2•5 years ago
|
||
Seems like a logic problem, not a race. Note the
console.log("pc2.SRD DONE");
await wait(1000);
in the fiddle, which is not needed to reproduce btw, but shows it isn't timing related. I.e. we can wait any number of seconds after the first SRD and NO candidates fire prematurely.
It specifically takes a second SRD to trigger ICE, at which point it's 100% reproducible (no SLD needed). No timing involved. The other end will have no answer to apply these candidates to, likely causing the call to fail.
I think investigating why this happens should be a P2 in case there are other ways to trigger this logic error.
In fact, it appears to be happening on SRD implicit rollback: https://jsfiddle.net/jib1/gjcms01p/28/ which may miss an important candidate, causing calls to fail to connect.
| Reporter | ||
Updated•5 years ago
|
| Reporter | ||
Updated•5 years ago
|
| Comment hidden (obsolete) |
Updated•5 years ago
|
Comment 4•5 years ago
|
||
Hi Nico, is there currently someone working on this? Thank you?
Updated•5 years ago
|
| Reporter | ||
Comment 5•5 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #3)
This test case widens the window of opportunity for the race, even without any waiting. The problem is that the handling for state changes for the sLD(answer) has been delayed by bug 1591199 enough that the candidates can arrive before those state changes are carried out, provided the ICE stack is given enough head start.
I don't understand this explanation, because this reproduces even without calling sLD(answer) https://jsfiddle.net/jib1/gjcms01p/39/
I also verified this is not fixed by the patches in bug 1622384.
| Assignee | ||
Comment 6•5 years ago
|
||
Ok, that's weird. I'll look into it.
| Reporter | ||
Comment 7•5 years ago
|
||
Ah, it's the null candidate fired from handleIceGatheringStateChange - https://jsfiddle.net/jib1/gjcms01p/45/
| Reporter | ||
Comment 8•5 years ago
|
||
iceGatheringState goes to "complete" for some reason on second SRD(offer) - https://jsfiddle.net/jib1/gjcms01p/48/
| Reporter | ||
Updated•5 years ago
|
| Reporter | ||
Updated•5 years ago
|
| Reporter | ||
Updated•5 years ago
|
| Reporter | ||
Comment 9•5 years ago
|
||
I've verified it's the null candidate that's being fired on single SRD() with implicit rollback as well. https://jsfiddle.net/jib1/gjcms01p/50/
| Assignee | ||
Comment 10•5 years ago
|
||
Ok, this is due to the implicit rollback itself; rolling back a sLD shuts down gathering, which means we emit an end-of-candidates. Looking at the spec now to determine what it says should happen on rollback.
| Assignee | ||
Comment 11•5 years ago
•
|
||
Yeah, webrtc-pc says that whenever gathering transitions to complete for all ice transports, we fire this null candidate. I wonder if this should apply when we're rolling back to a state where there are no ICE transports at all? The ICE gathering state should be "new" in that case, probably?
| Reporter | ||
Comment 12•5 years ago
•
|
||
Ah, unlike comment 8 which said pc2.onicecandidate in have-remote-offer null BOOOH!,
comment 9 actually says pc2.onicecandidate null in have-local-offer BOOOH! which is quite normal. I'll remove the BOOOH! there.
But one problem remains with the comment 9 fiddle updated with onicegatheringstatechange: It looks like we're firing "complete" without ever having fired "gathering", which the spec indicates we should have done "When the ICE Agent indicates that it began gathering" which sans a state diagram clearly has to have happened before "When the ICE Agent is finished gathering". That seems wrong (in Firefox).
Yeah, webrtc-pc says that whenever gathering transitions to complete for all ice transports, we fire this null candidate. I wonder if this should apply when we're rolling back to a state where there are no ICE transports at all? The ICE gathering state should be "new" in that case, probably?
I think it's a stretch to say ice gathering is "complete" when it got aborted by its transport being rolled back and vaporized.
After all, the spec says: "When the ICE Agent is finished gathering a generation of candidates for an RTCIceTransport, and those candidates have been surfaced to the application". I'd argue it wasn't allowed to finish in this case.
ICE seems supportive of this interpretation: "ICE processing across all media streams also has a state associated with it. This state is equal to Running while ICE processing is under way. The state is Completed when ICE processing is complete and Failed if it failed without success."
But regardless of interpretation, I see no support anywhere for the behavior in comment 8 (without any SLD on pc2) and earlier, because the phrase "and those candidates have been surfaced to the application" above is a reference to [[EarlyCandidates]], which means we must not ever fire "complete" before SLD(answer).
| Assignee | ||
Comment 13•5 years ago
|
||
Yeah, I think in this case we should be going back to "new". As for going directly from "new" to "complete", there are corner-cases like failure to start gathering that look under-specified (although that isn't what is happening here).
As for comment 8, I think maybe we're triggering the ICE agent rollback stuff with sRD somehow? Looking into it.
| Assignee | ||
Comment 14•5 years ago
|
||
Ok, flaw is here for comment 8:
Somehow, this is true in the double sRD(offer) case.
| Assignee | ||
Comment 15•5 years ago
|
||
| Assignee | ||
Comment 16•5 years ago
|
||
Depends on D72234
| Assignee | ||
Comment 17•5 years ago
|
||
Depends on D72235
| Assignee | ||
Comment 18•5 years ago
|
||
Depends on D72236
| Assignee | ||
Comment 19•5 years ago
|
||
Depends on D72238
| Assignee | ||
Comment 20•5 years ago
|
||
Depends on D72239
| Assignee | ||
Comment 21•5 years ago
|
||
Depends on D72240
| Assignee | ||
Comment 22•5 years ago
|
||
Depends on D72241
| Assignee | ||
Comment 23•5 years ago
|
||
Depends on D72242
| Assignee | ||
Comment 24•5 years ago
|
||
Depends on D72243
| Assignee | ||
Comment 25•5 years ago
|
||
Depends on D72244
| Assignee | ||
Comment 26•5 years ago
•
|
||
| Assignee | ||
Comment 27•5 years ago
|
||
Try looks good.
| Assignee | ||
Comment 28•5 years ago
|
||
More thorough try push here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0311cb13bc110167d3b76f33926734a3038d7b8f
| Assignee | ||
Comment 29•5 years ago
|
||
Comment 30•5 years ago
|
||
Comment 33•5 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/e2f411ffa079
https://hg.mozilla.org/mozilla-central/rev/5e5e7b27d296
https://hg.mozilla.org/mozilla-central/rev/673ef75f8fe6
https://hg.mozilla.org/mozilla-central/rev/bee9068e5a39
https://hg.mozilla.org/mozilla-central/rev/41ac898a5d97
https://hg.mozilla.org/mozilla-central/rev/3a5601c68d76
https://hg.mozilla.org/mozilla-central/rev/40c9fc4fa292
https://hg.mozilla.org/mozilla-central/rev/065ddaaefe60
https://hg.mozilla.org/mozilla-central/rev/09af49103607
https://hg.mozilla.org/mozilla-central/rev/53ba1301dfdd
https://hg.mozilla.org/mozilla-central/rev/7917d7e0a471
Updated•5 years ago
|
Updated•5 years ago
|
Description
•