Closed Bug 1629565 Opened 4 years ago Closed 4 years ago

null candidate fired ahead of SLD(answer) on SRD w/rollback or double SRD (regression)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla77

Tracking Flags:

Tracking

Status

firefox-esr68

---

unaffected

firefox74

---

wontfix

firefox75

---

wontfix

firefox76

---

wontfix

firefox77

---

fixed

People

(Reporter: jib, Assigned: bwc)

References

(Regression)

Details

(Keywords: regression)

Attachments

(11 files)

Bug 1629565: Test-case for bug. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Fix bug where repeated sRD(offer) could start gathering. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Test that we do not transition to "gathering" on a simple reoffer. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Do not transition back to "gathering" if no gathering is needed. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Test that we do not transition to "gathering" on rollback if we were already done gathering. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Fix bug where rollback of local offer would stomp the pre-existing transport with a new one with the same ufrag/pwd. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Test-cases for transitioning to "new" gathering state. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Transition gathering state to "new" if a negotiation leaves us with no transports. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Some logging that was helpful. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Mochitest fixes. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1629565: Another test we should probably have. Passes. 4 years ago Byron Campen [:bwc] 47 bytes, text/x-phabricator-request		Details \| Review

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Description

•

4 years ago

•

Edited

Doing SRD twice on the answerer (which is allowed), causes ICE gathering to jump the gate, firing answer candidates ahead of answer.

This is against spec.

STRs:

Open https://jsfiddle.net/jib1/gjcms01p/ and share camera

Expected results:

pc2.SRD DONE
pc2.SRD TWICE DONE
stable
pc2.SLD DONE
pc2.onicecandidate in stable
pc2.onicecandidate in stable

Actual results:

pc2.SRD DONE
pc2.SRD TWICE DONE
pc2.onicecandidate in have-remote-offer BOOOH!
pc1.addIce: InvalidStateError: Cannot add ICE candidate when there is no remote SDP
stable
pc2.SLD DONE
pc2.onicecandidate in stable

Regression range:

45:19.27 INFO: Narrowed inbound regression window from [442f57db, f829ca2e] (4 builds) to [fde54097, f829ca2e] (2 builds) (~1 steps left)
45:19.27 INFO: No more inbound revisions, bisection finished.
45:19.27 INFO: Last good revision: fde540977b4e52dd21031777aa35906ff81d6e63
45:19.27 INFO: First bad revision: f829ca2e1ad6d637eb01f5cbe50876f22fe53c37
45:19.27 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=fde540977b4e52dd21031777aa35906ff81d6e63&tochange=f829ca2e1ad6d637eb01f5cbe50876f22fe53c37

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Assignee

Comment 1

•

4 years ago

Yeah, this is just another race opened up by bug 1591199.

Flags: needinfo?(docfaraday)

Priority: -- → P3

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

4 years ago

status-firefox74: --- → affected

status-firefox75: --- → affected

status-firefox76: --- → affected

status-firefox77: --- → affected

status-firefox-esr68: --- → unaffected

Summary: Early pc.onicecandidate before SLD on double SRD (regression) → Answer candidates fired ahead of answer on double SRD (regression)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 2

•

4 years ago

Seems like a logic problem, not a race. Note the

console.log("pc2.SRD DONE");
await wait(1000);

in the fiddle, which is not needed to reproduce btw, but shows it isn't timing related. I.e. we can wait any number of seconds after the first SRD and NO candidates fire prematurely.

It specifically takes a second SRD to trigger ICE, at which point it's 100% reproducible (no SLD needed). No timing involved. The other end will have no answer to apply these candidates to, likely causing the call to fail.

I think investigating why this happens should be a P2 in case there are other ways to trigger this logic error.

In fact, it appears to be happening on SRD implicit rollback: https://jsfiddle.net/jib1/gjcms01p/28/ which may miss an important candidate, causing calls to fail to connect.

Priority: P3 → P2

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

4 years ago

Flags: needinfo?(docfaraday)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

4 years ago

Summary: Answer candidates fired ahead of answer on double SRD (regression) → Answer candidates fired ahead of answer on SRD w/rollback or double SRD (regression)

Comment hidden (obsolete)

BugBot [:suhaib / :marco/ :calixte]

Updated

•

4 years ago

Keywords: regression

Jens Stutte [:jstutte]

Comment 4

•

4 years ago

Hi Nico, is there currently someone working on this? Thank you?

Flags: needinfo?(na-g)

Jens Stutte [:jstutte]

Updated

•

4 years ago

status-firefox74: affected → wontfix

status-firefox75: affected → wontfix

status-firefox76: affected → fix-optional

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 5

•

4 years ago

(In reply to Byron Campen [:bwc] from comment #3)

This test case widens the window of opportunity for the race, even without any waiting. The problem is that the handling for state changes for the sLD(answer) has been delayed by bug 1591199 enough that the candidates can arrive before those state changes are carried out, provided the ICE stack is given enough head start.

I don't understand this explanation, because this reproduces even without calling sLD(answer) https://jsfiddle.net/jib1/gjcms01p/39/

I also verified this is not fixed by the patches in bug 1622384.

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Assignee

Comment 6

•

4 years ago

Ok, that's weird. I'll look into it.

Assignee: nobody → docfaraday

Flags: needinfo?(docfaraday)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 7

•

4 years ago

Ah, it's the null candidate fired from handleIceGatheringStateChange - https://jsfiddle.net/jib1/gjcms01p/45/

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 8

•

4 years ago

iceGatheringState goes to "complete" for some reason on second SRD(offer) - https://jsfiddle.net/jib1/gjcms01p/48/

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

4 years ago

Flags: needinfo?(na-g)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

4 years ago

Summary: Answer candidates fired ahead of answer on SRD w/rollback or double SRD (regression) → null candidate fired ahead of answer on SRD w/rollback or double SRD (regression)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

4 years ago

Summary: null candidate fired ahead of answer on SRD w/rollback or double SRD (regression) → null candidate fired ahead of SLD(answer) on SRD w/rollback or double SRD (regression)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 9

•

4 years ago

I've verified it's the null candidate that's being fired on single SRD() with implicit rollback as well. https://jsfiddle.net/jib1/gjcms01p/50/

Byron Campen [:bwc]

Assignee

Comment 10

•

4 years ago

Ok, this is due to the implicit rollback itself; rolling back a sLD shuts down gathering, which means we emit an end-of-candidates. Looking at the spec now to determine what it says should happen on rollback.

Byron Campen [:bwc]

Assignee

Comment 11

•

4 years ago

•

Edited

Yeah, webrtc-pc says that whenever gathering transitions to complete for all ice transports, we fire this null candidate. I wonder if this should apply when we're rolling back to a state where there are no ICE transports at all? The ICE gathering state should be "new" in that case, probably?

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 12

•

4 years ago

•

Edited

Ah, unlike comment 8 which said pc2.onicecandidate in have-remote-offer null BOOOH!,
comment 9 actually says pc2.onicecandidate null in have-local-offer BOOOH! which is quite normal. I'll remove the BOOOH! there.

But one problem remains with the comment 9 fiddle updated with onicegatheringstatechange: It looks like we're firing "complete" without ever having fired "gathering", which the spec indicates we should have done "When the ICE Agent indicates that it began gathering" which sans a state diagram clearly has to have happened before "When the ICE Agent is finished gathering". That seems wrong (in Firefox).

Yeah, webrtc-pc says that whenever gathering transitions to complete for all ice transports, we fire this null candidate. I wonder if this should apply when we're rolling back to a state where there are no ICE transports at all? The ICE gathering state should be "new" in that case, probably?

I think it's a stretch to say ice gathering is "complete" when it got aborted by its transport being rolled back and vaporized.

After all, the spec says: "When the ICE Agent is finished gathering a generation of candidates for an RTCIceTransport, and those candidates have been surfaced to the application". I'd argue it wasn't allowed to finish in this case.

ICE seems supportive of this interpretation: "ICE processing across all media streams also has a state associated with it. This state is equal to Running while ICE processing is under way. The state is Completed when ICE processing is complete and Failed if it failed without success."

But regardless of interpretation, I see no support anywhere for the behavior in comment 8 (without any SLD on pc2) and earlier, because the phrase "and those candidates have been surfaced to the application" above is a reference to [[EarlyCandidates]], which means we must not ever fire "complete" before SLD(answer).

Byron Campen [:bwc]

Assignee

Comment 13

•

4 years ago

Yeah, I think in this case we should be going back to "new". As for going directly from "new" to "complete", there are corner-cases like failure to start gathering that look under-specified (although that isn't what is happening here).

As for comment 8, I think maybe we're triggering the ICE agent rollback stuff with sRD somehow? Looking into it.

Byron Campen [:bwc]

Assignee

Comment 14

•

4 years ago

Ok, flaw is here for comment 8:

https://searchfox.org/mozilla-central/rev/d2cec90777d573585f8477d5170892e5dcdfb0ab/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#2170-2171

Somehow, this is true in the double sRD(offer) case.

Byron Campen [:bwc]

Assignee

Comment 15

•

4 years ago

Attached file Bug 1629565: Test-case for bug. — Details

Byron Campen [:bwc]

Assignee

Comment 16

•

4 years ago

Attached file Bug 1629565: Fix bug where repeated sRD(offer) could start gathering. — Details

Depends on D72234

Byron Campen [:bwc]

Assignee

Comment 17

•

4 years ago

Attached file Bug 1629565: Test that we do not transition to "gathering" on a simple reoffer. — Details

Depends on D72235

Byron Campen [:bwc]

Assignee

Comment 18

•

4 years ago

Attached file Bug 1629565: Do not transition back to "gathering" if no gathering is needed. — Details

Depends on D72236

Byron Campen [:bwc]

Assignee

Comment 19

•

4 years ago

Attached file Bug 1629565: Test that we do not transition to "gathering" on rollback if we were already done gathering. — Details

Depends on D72238

Byron Campen [:bwc]

Assignee

Comment 20

•

4 years ago

Attached file Bug 1629565: Fix bug where rollback of local offer would stomp the pre-existing transport with a new one with the same ufrag/pwd. — Details

Depends on D72239

Byron Campen [:bwc]

Assignee

Comment 21

•

4 years ago

Attached file Bug 1629565: Test-cases for transitioning to "new" gathering state. — Details

Depends on D72240

Byron Campen [:bwc]

Assignee

Comment 22

•

4 years ago

Attached file Bug 1629565: Transition gathering state to "new" if a negotiation leaves us with no transports. — Details

Depends on D72241

Byron Campen [:bwc]

Assignee

Comment 23

•

4 years ago

Attached file Bug 1629565: Some logging that was helpful. — Details

Depends on D72242

Byron Campen [:bwc]

Assignee

Comment 24

•

4 years ago

Attached file Bug 1629565: Mochitest fixes. — Details

Depends on D72243

Byron Campen [:bwc]

Assignee

Comment 25

•

4 years ago

Attached file Bug 1629565: Another test we should probably have. Passes. — Details

Depends on D72244

Byron Campen [:bwc]

Assignee

Comment 26

•

4 years ago

•

Edited

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7b3ce3ef10ca835135961a70cfb435f3184d1fd6

Gtests here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c6d537e955d7ccc7a7f696956e92b30229d78b4

Byron Campen [:bwc]

Assignee

Comment 27

•

4 years ago

Try looks good.

Byron Campen [:bwc]

Assignee

Comment 28

•

4 years ago

More thorough try push here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0311cb13bc110167d3b76f33926734a3038d7b8f

Byron Campen [:bwc]

Assignee

Comment 29

•

4 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d23b8d97f8a2198f4ef62921b72ae65fcc452e04

Pulsebot

Comment 30

•

4 years ago

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2f411ffa079
Test-case for bug. r=jib
https://hg.mozilla.org/integration/autoland/rev/5e5e7b27d296
Fix bug where repeated sRD(offer) could start gathering. r=mjf
https://hg.mozilla.org/integration/autoland/rev/673ef75f8fe6
Test that we do not transition to "gathering" on a simple reoffer. r=jib
https://hg.mozilla.org/integration/autoland/rev/bee9068e5a39
Do not transition back to "gathering" if no gathering is needed. r=mjf
https://hg.mozilla.org/integration/autoland/rev/41ac898a5d97
Test that we do not transition to "gathering" on rollback if we were already done gathering. r=jib
https://hg.mozilla.org/integration/autoland/rev/3a5601c68d76
Fix bug where rollback of local offer would stomp the pre-existing transport with a new one with the same ufrag/pwd. r=mjf
https://hg.mozilla.org/integration/autoland/rev/40c9fc4fa292
Test-cases for transitioning to "new" gathering state. r=jib
https://hg.mozilla.org/integration/autoland/rev/065ddaaefe60
Transition gathering state to "new" if a negotiation leaves us with no transports. r=mjf
https://hg.mozilla.org/integration/autoland/rev/09af49103607
Some logging that was helpful. r=mjf
https://hg.mozilla.org/integration/autoland/rev/53ba1301dfdd
Mochitest fixes. r=jib
https://hg.mozilla.org/integration/autoland/rev/7917d7e0a471
Another test we should probably have. Passes. r=jib

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 31

•

4 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/23274 for changes under testing/web-platform/tests

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 32

•

4 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Andreea Pavel [:apavel]

Updated

•

4 years ago

Regressions: 1633533

Stefan Hindli [:stefan_hindli]

Comment 33

•

4 years ago

bugherder

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox77: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla77

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 34

•

4 years ago

Upstream PR merged by moz-wptsync-bot

Natalia Csoregi [:nataliaCs]

Updated

•

4 years ago

Regressions: 1633780

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

•

4 years ago

status-firefox76: fix-optional → wontfix

Andreea Pavel [:apavel]

Updated

•

4 years ago

Regressions: 1633595

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

You need to log in before you can comment on or make changes to this bug.