Closed Bug 1591199 Opened 3 years ago Closed 2 years ago

answer is offer for a brief moment after signalingstatechange event

Categories

(Core :: WebRTC: Signaling, defect, P2)

70 Branch
Unspecified
All
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- fixed

People

(Reporter: jib, Assigned: bwc)

References

(Blocks 1 open bug)

Details

Attachments

(19 files, 15 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

STRs:

  1. Open https://jsfiddle.net/jib1/4et5v62h/
await pc2.setRemoteDescription(await pc1.createOffer());
pc2.setLocalDescription(await pc2.createAnswer());
await new Promise(r => pc2.onsignalingstatechange = r);
console.log(pc2.localDescription.type);
await undefined;
console.log(pc2.localDescription.type);

Expected result:

answer
answer

Actual result:

offer
answer

Not a regression AFAIK.

Flags: needinfo?(docfaraday)
Priority: -- → P2

For context, here's the fiddle I was trying to make when I found this: https://jsfiddle.net/jib1/4vhur53e/

Would make a good wpt test since it fails in Chrome and Safari.

Yeah, this is because we're firing the signalingstate event from c++, and then jumping to JS, where we update the local SDP type:

https://searchfox.org/mozilla-central/rev/74cc0f4dce444fe0757e2a6b8307d19e4d0e0212/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#1352-1353

We cannot just switch the order here, because the sLD needs to resolve after the signalingstate event. We probably need to get rid of this _pendingRole stuff and modify the webidl so PeerConnectionImpl gives us the SDP and the type, not just the SDP.

Flags: needinfo?(docfaraday)
Assignee: nobody → docfaraday

Try looks good. Should probably add a wpt based on the fiddle, and maybe break the patch up.

This is snowballing a bit. Fixing one case where we aren't queueing a task basically forces me to fix all of the others. This is going to end up being a fairly large fix.

Depends on: 1408256

Hey, I'm seeing some ICE failures caused by MDNS resolution taking 5 seconds or more on Windows 7 (https://treeherder.mozilla.org/#/jobs?repo=try&revision=786f2cc241bae9b87eb63f57bdd2e2bfdca802d7&selectedJob=279922353). Any idea what might be going on here?

^

Flags: needinfo?(dminor)

(In reply to Byron Campen [:bwc] from comment #14)

Hey, I'm seeing some ICE failures caused by MDNS resolution taking 5 seconds or more on Windows 7 (https://treeherder.mozilla.org/#/jobs?repo=try&revision=786f2cc241bae9b87eb63f57bdd2e2bfdca802d7&selectedJob=279922353). Any idea what might be going on here?

Nope, when I was landing the mDNS stuff, I had one wpt test that was particularly bad on Windows 7 only, that I just ended up disabling. I filed bug 1594018 for that. I'd suggest just disabling your test for Windows 7, with a link to that bug as well, and I can follow up.

Flags: needinfo?(dminor)
Attachment #9107297 - Attachment description: Bug 1591199: Task queueing fixes to prevent races. → Bug 1591199: Test-cases for the bug. r?jib

Depends on D52228

Depends on D56403

Depends on D52228

Depends on D56459

Attached file Bug 1591199: Reduce redundant code. (obsolete) —

Depends on D56460

Try looks good.

Attachment #9114701 - Attachment is obsolete: true
Attachment #9114703 - Attachment is obsolete: true
Attachment #9114704 - Attachment is obsolete: true
Attachment #9114706 - Attachment is obsolete: true
Attachment #9114707 - Attachment is obsolete: true
Attachment #9114708 - Attachment is obsolete: true
Attachment #9114709 - Attachment is obsolete: true
Attachment #9114710 - Attachment is obsolete: true
Attachment #9114711 - Attachment is obsolete: true
Attachment #9114713 - Attachment is obsolete: true
Attachment #9114714 - Attachment is obsolete: true

Try looks good. Going to run all tests to catch any stuff elsewhere in the tree (eg; presentation API).

Blocks: 1548318
No longer blocks: 1548318

Depends on D59521

Attachment #9117794 - Attachment is obsolete: true
Attachment #9117793 - Attachment is obsolete: true
Attachment #9114603 - Attachment is obsolete: true
Attachment #9114604 - Attachment is obsolete: true

Hey Byron, just wanted to make sure you saw that Lando couldn't push this due to webidl changes missing DOM peer review.

Flags: needinfo?(docfaraday)

This looks cooked to me. AFAICT only https://phabricator.services.mozilla.com/D56403 of the accepted patches touches PeerConnectionImpl.webidl which is the file mentioned in the logs—which otherwise aren't very pinpointy—as missing dom review and blocking landing, and that review has smaug's rubberstamp, even on the most recent revision.

If I were to guess, I think tooling is tripping over https://phabricator.services.mozilla.com/D56460, which also touches PeerConnectionImpl.webidl, but is an abandoned patch that for some reason shows up in the lando stack.

I guess we can try one of two things:

  1. rebase away the abandoned revisions (why are they showing up?), or
  2. ask smaug or someone to r+ the abandoned https://phabricator.services.mozilla.com/D56460 to appease the bot.

Thoughts?

Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7e67ac9f5591
Test-cases for the bug. r=jib
https://hg.mozilla.org/integration/autoland/rev/447c8b096f5a
Task queueing fixes in c++ r=jib
https://hg.mozilla.org/integration/autoland/rev/a1a1f71b91fc
Update tests to no longer expect a signalingstatechange event when the PC is closed. r=jib
https://hg.mozilla.org/integration/autoland/rev/48bfe1c627e1
In the mochitest harness, shut off trickle candidates before sRD(rollback) to prevent them from slipping through afterward. r=jib
https://hg.mozilla.org/integration/autoland/rev/4973bd5df42d
Make this test-case less racy, and a little more modern. r=jib
https://hg.mozilla.org/integration/autoland/rev/55a69e5443ab
In these mochitests, reset the ICE candidate handler _before_ making changes that will restart ICE gathering. r=jib
https://hg.mozilla.org/integration/autoland/rev/3bc58d0f0b35
Start waiting for a negotiationneeded event earlier so we don't miss it. r=jib
https://hg.mozilla.org/integration/autoland/rev/897e8de078fa
Queue updates to pending/current descriptions. r=mjf,jib,smaug
https://hg.mozilla.org/integration/autoland/rev/ac47d18c085c
Reduce redundant code. r=mjf,smaug
https://hg.mozilla.org/integration/autoland/rev/8ba6518dfd42
Extend trickle ICE grace period, because windows 7 is taking a very long time to resolve MDNS candidates on try sometimes. r=dminor
https://hg.mozilla.org/integration/autoland/rev/5811f7b48aee
Wait for all the mute/unmute events so we don't miss any transitions. r=jib
https://hg.mozilla.org/integration/autoland/rev/96091b875760
Use correct time-base for rtcp stats comparison. r=ng
https://hg.mozilla.org/integration/autoland/rev/dc0ab0642f92
Fix a race where sRD(offer) could mute a track, then an RTP packet could arrive that unmutes it before negotiation completed, this time without needing to stop the conduit and send a bunch of extra RTCP BYEs. r=mjf
https://hg.mozilla.org/integration/autoland/rev/9ae196dc62b9
Mark this test as passing. r=jib
https://hg.mozilla.org/integration/autoland/rev/a67f2a68965d
Mark this test as long, since it does ICE. r=jib
https://hg.mozilla.org/integration/autoland/rev/cd44655b2740
Don't assert when a candidate is gathered for a transceiver that is not associated, since that can happen in certain rollback scenarios. r=mjf
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/21363 for changes under testing/web-platform/tests
Flags: needinfo?(docfaraday)
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR was closed without merging

(In reply to Andrei Ciure[:andrei_ciure] from comment #79)

Backed out 16 changesets (bug 1591199) for causing mda failures
https://hg.mozilla.org/integration/autoland/rev/c0d7cf71074de0d9ccadb96de2d98dec211e6d2f

push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=cd44655b27409dd0604cdd051da6302f1994f085&searchStr=mda

Is try fuzzy the only way to run that job? Because "try: -b do -p all -u all -t none" did not run it.

Flags: needinfo?(docfaraday) → needinfo?(ahal)

Yeah, I think android-hw platforms explicitly opt out of try syntax because they are extremely hardware constrained. IIRC you can't even run them with normal mach try fuzzy, you need to use ./mach try fuzzy --full. Fwiw, you should also be able to use ./mach try chooser --full if fuzzy isn't your thing. I believe the platform you want is android-aarch64.

Flags: needinfo?(ahal)

Aarch64 appears to struggle with lots of tests.

So... we're backing stuff out now for failing tests that only run with ./mach try fuzzy --full? When ./mach try fuzzy still can't be used to run the webrtc tests? Do I need to be running ./mach try fuzzy --full and running all of the jobs just to make sure I don't miss something? Seems expensive.

Flags: needinfo?(aciure)

No, that's not the intended side effect :).

I understand being backed out for something you had no control over is super frustrating. But in this case, it's the lesser of two evils. The alternative is to schedule those android-hw tasks on try by default but then have them take some ridiculous amount of time to complete (like 24hrs) as the demand will outweigh the supply.

Fwiw, I don't have any involvement with this platform so I'm the wrong person to be asking. But a conversation about increasing the pool or making these tier-2 would be a productive path forward.

When ./mach try fuzzy still can't be used to run the webrtc tests?

Normally it would, but this platform is explicitly excluded:
https://searchfox.org/mozilla-central/source/tools/tryselect/selectors/fuzzy.py#48

On an unrelated note, our main focus these days is creating smarter scheduling algorithms that can automatically choose which tasks should run based on machine learning and ccov. So in the future you hopefully won't run into situations like this.

:bwc not much more I can add except point out that that we can't let perma fails spread to all the following pushes as that might mask future perma fails in other changesets, that's the reasoning for backouts regardless of whose fault it is.

Flags: needinfo?(aciure)

Ok, so I guess that's the situation we're in. Kinda defeats the purpose of try though.

Since this is almost certainly a case of the platform being too slow to run the test, or a timing flaw in the test itself, I think I will need to disable this test on this platform. Trying to troubleshoot this test will probably take too much time on this limited hardware.

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ae3747204bb
Test-cases for the bug. r=jib
https://hg.mozilla.org/integration/autoland/rev/88a2a2175fa7
Task queueing fixes in c++ r=jib
https://hg.mozilla.org/integration/autoland/rev/d8d75d0ac290
Update tests to no longer expect a signalingstatechange event when the PC is closed. r=jib
https://hg.mozilla.org/integration/autoland/rev/78ab1b3faf21
In the mochitest harness, shut off trickle candidates before sRD(rollback) to prevent them from slipping through afterward. r=jib
https://hg.mozilla.org/integration/autoland/rev/99f30faf04ba
Make this test-case less racy, and a little more modern. r=jib
https://hg.mozilla.org/integration/autoland/rev/7d726ee6e1e5
In these mochitests, reset the ICE candidate handler _before_ making changes that will restart ICE gathering. r=jib
https://hg.mozilla.org/integration/autoland/rev/fb115a585596
Start waiting for a negotiationneeded event earlier so we don't miss it. r=jib
https://hg.mozilla.org/integration/autoland/rev/7b27ee1c2e0b
Queue updates to pending/current descriptions. r=mjf,jib,smaug
https://hg.mozilla.org/integration/autoland/rev/fa0542f31812
Reduce redundant code. r=mjf,smaug
https://hg.mozilla.org/integration/autoland/rev/73eadde199da
Extend trickle ICE grace period, because windows 7 is taking a very long time to resolve MDNS candidates on try sometimes. r=dminor
https://hg.mozilla.org/integration/autoland/rev/99cae41d1405
Wait for all the mute/unmute events so we don't miss any transitions. r=jib
https://hg.mozilla.org/integration/autoland/rev/2f0d6b422540
Use correct time-base for rtcp stats comparison. r=ng
https://hg.mozilla.org/integration/autoland/rev/5a56d03652dc
Fix a race where sRD(offer) could mute a track, then an RTP packet could arrive that unmutes it before negotiation completed, this time without needing to stop the conduit and send a bunch of extra RTCP BYEs. r=mjf
https://hg.mozilla.org/integration/autoland/rev/f73f5ba4e8f8
Mark this test as passing. r=jib
https://hg.mozilla.org/integration/autoland/rev/883b3c119d54
Mark this test as long, since it does ICE. r=jib
https://hg.mozilla.org/integration/autoland/rev/8f2415485e75
Don't assert when a candidate is gathered for a transceiver that is not associated, since that can happen in certain rollback scenarios. r=mjf
https://hg.mozilla.org/integration/autoland/rev/ccb5d5c6f1c2
Test suppressions for some of the android-hw jobs. r=jib
https://hg.mozilla.org/integration/autoland/rev/0bc8b977c76d
Stop expecting this test to encounter an error. r=jib
https://hg.mozilla.org/integration/autoland/rev/f829ca2e1ad6
Mark this test as intermittent, due to bug 1613433, since this patchset makes this failure more frequent. r=jib
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR merged by moz-wptsync-bot
Regressions: 1613976
Regressions: 1614055
Regressions: 1614803
Regressions: 1616382
Regressions: 1629565
You need to log in before you can comment on or make changes to this bug.