Open Bug 1412122 Opened 7 years ago Updated 2 years ago

ICE connection state verification is to strict

Categories

(Core :: WebRTC: Networking, enhancement, P2)

enhancement

Tracking

()

Tracking Status
firefox58 --- affected

People

(Reporter: drno, Unassigned)

References

Details

Attachments

(1 file)

The mochitest contain a ICE connection state verification which assumes that all states are exactly enter as in the state diagram of the webrtc spec.

But as the oniceconnectionstate events are delivered without the new state as an argument, the tests poll the current state after receiving the event. But in some situations the ICE connections state can already have progressed further. So to the JS it looks like a state was skipped, where in fact only the notification to JS was too slow for JS to observe the state in between.

So we need to relax that check to only ensure that we don't make illegal backwards transitions.
Rank: 25
Priority: -- → P2
Comment on attachment 8924012 [details]
Bug 1412122: relax ICE connection state checking.

https://reviewboard.mozilla.org/r/195218/#review200284

Are we really getting the state change event late? Or is something else going on here?
Attachment #8924012 - Flags: review?(docfaraday)
With queue-jumping removed, we can move to the final state without the middle transitory state being necessarily observable by JS, since we notify that a state changed, but don't forward the new state value as part of the event.
So I did a little bit more digging and it turns out that the functions where we call PeerConnectionObserver::OnStateChange via RUN_ON_THREAD are also places where we skipped the queue so far.

So the question is: do we want/need to continue to deliver these events directly when the event occurred, or can these even notifications (IceConnectionState and IceGatheringState) be queued on the thread queue and delivered later?
https://searchfox.org/mozilla-central/rev/aa1343961fca52e8c7dab16788530da31aab7c8d/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#3542
https://searchfox.org/mozilla-central/rev/aa1343961fca52e8c7dab16788530da31aab7c8d/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#3582

Interestingly for SignalingState we don't use RUN_ON_THREAD, but always call directly: https://searchfox.org/mozilla-central/rev/aa1343961fca52e8c7dab16788530da31aab7c8d/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#3256

Thoughts?
Flags: needinfo?(rjesup)
Flags: needinfo?(jib)
Flags: needinfo?(docfaraday)
Another thing to note: if we're on MainThread and skip the queue, if we were invoked from JS, and the event we run immediately invokes JS, we've just violated the JS run-to-completion mandate, and effectively spun the event loop.  This can cause subtle security/stability issues.

We should only be skipping the queue where there's an important requirement or strong advantage to doing so, and in those cases do it explicitly.  It does save a few cycles (not a significant amount).  It does mean some things happen sooner.  It also means that ordering of events becomes hard to reason about, and nasty bugs can and have been caused by this.
Flags: needinfo?(rjesup)
(In reply to Nils Ohlmeier [:drno] from comment #4)
> So the question is: do we want/need to continue to deliver these events
> directly when the event occurred, or can these even notifications
> (IceConnectionState and IceGatheringState) be queued on the thread queue and
> delivered later?

This depends on whether queuing has already happened in the C++ code or not.

All the JS APIs firing events say to do so from queued events, but at the same time we don't want to queue more than one.

Someone (Nils, Byron?) need to look at the spec, which details what needs to happen for each API affected by this, and make sure we queue a task correctly according to the spec steps of each API.

Might this also be causing bug 1394072?
Flags: needinfo?(jib)
We should also add tests here to verify this fix to our task queuing order (not to be confused with enqueueing order), that it is to spec (web-platform-tests used to have this at one point but backed off).

Needinfo me for tips (may need to use SpecialPowers.Services.tm.dispatchToMainThread to ensure things aren't queued too late)
So, the w3c spec says that we should be queueing a task that checks whether the new state is actually different than the current state (bail if not), updates the ICE connection state on the PC, and then immediately fires oniceconnectionstatechange. I don't see how this would allow an event handler to "miss" state changes.
Flags: needinfo?(docfaraday)
I don't have the capacity any more to work on this any time soon.
Assignee: drno → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: