Make it possible to check if a Window has a WebRTC connection

RESOLVED FIXED in Firefox 57

Status: RESOLVED FIXED (enhancement, P1, normal)

People: (Reporter: farre, Assigned: hsivonen)

Tracking: unspecified → mozilla57 (firefox57 fixed)

Attachments: (1 attachment, 1 obsolete attachment)

Blocks: 1377766
Assignee: nobody → afarre
Priority: -- → P1
Looks like we need to maintain a peer connection count on the inner window object.
Assignee: afarre → hsivonen
Status: NEW → ASSIGNED
Attachment #8893773 - Attachment is obsolete: true
IPeerConnectionManager defines an RTCPeerConnection as "active" if it exists (has been created and hasn't been garbage-collected yet). My current patch (rev 3 on MozReview) narrows that definition a bit by counting the connection as inactive once it is closed, instead of waiting for GC. (It's not quite clear to me whether there is a mechanism that ensures the connection gets closed if the JS program just drops its references to the RTCPeerConnection object without explicitly calling close().)

This appears to have the undesirable effect that a connection remains active for throttling purposes even if it has failed for practical purposes (e.g. the peer just went away). That is, arguably, the connection is "active" only if its iceConnectionState is neither "failed" nor "closed", but the current patch is unaware of "failed".

My main problem, though, is that I don't know the reason *why* an "active" RTCPeerConnection should inhibit throttling, so I don't know what definition of "active" would best fit the rationale of an "active" connection inhibiting throttling. 

What's the rationale for an "active" RTCPeerConnection needing to inhibit throttling?
Flags: needinfo?(overholt)
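The narrower definition discussed in the comment above can be sketched as an illustrative JS predicate (the helper name isActiveForThrottling is hypothetical, not an actual Gecko API):

```javascript
// Illustrative only: models the proposed "active" definition in which a
// connection stops inhibiting throttling once it is "failed" or "closed".
function isActiveForThrottling(pc) {
  return pc.iceConnectionState !== "failed" &&
         pc.iceConnectionState !== "closed";
}

// A stub standing in for an RTCPeerConnection, for demonstration purposes.
const stubPc = { iceConnectionState: "connected" };
console.log(isActiveForThrottling(stubPc)); // true
stubPc.iceConnectionState = "failed";
console.log(isActiveForThrottling(stubPc)); // false
```

Note this is exactly the definition whose rationale is being questioned: if "failed" is not terminal, the predicate would be too eager.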
Looks like we'll probably want to treat "failed" as "not active", but then ICE restart via pc.createOffer({iceRestart: true}) needs to be taken into account.

The case of the RTCPeerConnection getting GCed before either getting to "failed" or "closed" needs solving, too.
Flags: needinfo?(overholt)
I was going to suggest asking :jesup about these things.
What I've learned so far:

1) Currently, there exists IPeerConnectionManager (implemented in JS), which can be asked if a given inner window has any not-yet-GCed RTCPeerConnections. This is based on weak references.

2) Defining "active" for the purpose of back/forward cache and throttling as "an RTCPeerConnection exists" is rather unsatisfactory, because it means that e.g. https://www.sharedrop.io/ stays active forever after it has found a peer once, even if the peer closes its window or otherwise just disappears. Therefore, it's probably desirable to treat the "closed" and "failed" states as "not active".

3) The current mechanism that requires a call into JS to query for the existence of active connections doesn't work for reasons farre documents in https://searchfox.org/mozilla-central/source/dom/base/TimeoutManager.cpp#1248.

4) To avoid having to call into JS when we want to know if there are "active" RTCPeerConnections, the obvious remedy is to maintain the state "has 'active' RTCPeerConnections" on the inner window itself and have the implementation of RTCPeerConnection update that state as appropriate. This way, the state can be read quickly in a C++-only manner.

5) I can't find a guarantee that an RTCPeerConnection would transition to the "closed" state prior to being garbage-collected if the JS on the page just drops a reference to a non-closed RTCPeerConnection. Therefore, there needs to be some mechanism to update the state on the inner window around the time an RTCPeerConnection gets GCed.

6) According to bz on dev-platform, we don't have finalizers for JS-based XPCOM objects, so I conclude that the case of an RTCPeerConnection transitioning to "not active" due to getting GCed should be handled by the destructor of a C++ object that gets deleted around the time the RTCPeerConnection gets GCed.

7) The C++ object should, therefore, maintain knowledge of whether it is responsible for telling the inner windows that it's no longer active.

8) It *appears* that each JS-based RTCPeerConnection instance is paired with a C++-based PeerConnectionImpl instance that actually connects to the underlying connection machinery. Therefore, it seems that the PeerConnectionImpl instance should have the role of the C++ object contemplated in points 6 and 7.

9) It appears that the connection state changes on PeerConnectionImpl happen off-the-main-thread and a runnable is posted to the main thread to update the state on the JS-based RTCPeerConnection object.

Unresolved:

 * PeerConnectionImpl dispatches to its mThread for doing main-thread stuff instead of using a dispatch method that dispatches to the main thread. It's not clear to me what the significance of this is.

 * If the JS-based RTCPeerConnection had the responsibility of watching for transitions into the {"closed", "failed"} set of states in order to notify the inner window, it would have to sync back to PeerConnectionImpl the info on whether, if the PeerConnectionImpl were deleted now, its destructor would be responsible for notifying the inner window.

 * OTOH, if the notifying of the inner window lived completely in C++ in PeerConnectionImpl, it would be necessary to do one of:
   - Make the status on the inner window an atomic and update it from a non-main thread.
   - Add another set of runnables (another on top of the ones mentioned in point 9 above) for effecting the notification of the inner window from the main thread.
   - Make the runnables mentioned in point 9 carry extra info about the need to notify the inner window.

I'm trying to figure out which of these I should do. Using atomics without runnables would be less code, but I'm not sure whether that would be problematic in some other way.
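The per-window counting scheme from points 4-8 can be modeled in plain JS (names like addPeerConnection/removePeerConnection mirror the inner-window methods being contemplated; this is a sketch of the idea, not the actual Gecko code):

```javascript
// Sketch of the bookkeeping described above: the window keeps a simple
// count, and each connection remembers whether it still "owns" a count,
// so that close() followed by GC-time teardown decrements only once.
class InnerWindowModel {
  constructor() { this.peerConnectionCount = 0; }
  addPeerConnection() { this.peerConnectionCount++; }
  removePeerConnection() { this.peerConnectionCount--; }
  hasActivePeerConnections() { return this.peerConnectionCount > 0; }
}

class PeerConnectionModel {
  constructor(window) {
    this.window = window;
    this.activeOnWindow = true; // models the "responsible for notifying" flag from point 7
    window.addPeerConnection();
  }
  // Called on the connection reaching a terminal state.
  close() { this.deactivate(); }
  // Called from the destructor-equivalent (point 6) if the object is torn
  // down without ever reaching a terminal state.
  finalize() { this.deactivate(); }
  deactivate() {
    if (this.activeOnWindow) {
      this.activeOnWindow = false;
      this.window.removePeerConnection();
    }
  }
}

const win = new InnerWindowModel();
const pc = new PeerConnectionModel(win);
console.log(win.hasActivePeerConnections()); // true
pc.close();
pc.finalize(); // idempotent: the count is not decremented twice
console.log(win.hasActivePeerConnections()); // false
```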
jesup, do you have guidance on what to do about the issues mentioned in the previous comment?
Flags: needinfo?(rjesup)
" PeerConnectionImpl dispatches to its mThread for doing main-thread stuff instead of using a dispatch method that dispatches to the main thread. It's not clear to me what the significance of this is."

The answer to this question is that we have (or at least had) tests which didn't really run the main Gecko event loop and so needed to be provided with a thread that "main" things could run on.
(from IRC):
mThread goes back to the very beginning of the PeerConnectionImpl code, or close.  Had to do archaeology on the mothballed copy of alder that it was developed on, but the mThread var was added by Anant in July of 2012, so that they could pass nullptr to Initialize() for some fakemediastream tests in signaling_unittests.cpp (and also since at that time, the gtest tests might have been running with a non-MainThread mThread).  Nowadays, signaling_unittests.cpp calls NS_GetMainThread() and passes that in, so I think all support for passing in null threads has been removed (and even asserted against), though in theory one could pass a non-MainThread thread in to use for all the MainThread dispatches -- but likely this would break a ton.

Since those tests now just pass NS_GetMainThread() into Initialize(), I think we can remove the entire mThread machinery (it may in theory have very small perf wins, but that's OK).

Also note that there's a CheckThread/CheckThreadInt() call that checks if we're running on mThread; that would have to become a "check we're running on a virtual MainThread" check.
Flags: needinfo?(rjesup)
> 8) It *appears* that each JS-based RTCPeerConnection instance is paired with a C++-based PeerConnectionImpl instance that actually connects to the underlying connection machinery. Therefore, it seems that the PeerConnectionImpl instance should have the role of the C++ object contemplated in points 6 and 7.

Correct

jib: thoughts on atomics vs runnables/etc?  (jib implemented most of the PeerConnection.js code)
Flags: needinfo?(jib)
Please do not remove the mThread. I would like to retain the ability to sometime break out PC from Gecko, and while many other things would be required, this is one of them, so removing it would be a step backward.
(In reply to Randell Jesup [:jesup] from comment #13)
> jib: thoughts on atomics vs runnables/etc?

I would piggyback on the PCObserverStateType::IceConnectionState runnable [1], by changing it to run a C++ method instead - e.g. PeerConnectionImpl::IceConnectionStateChange_m - on the main thread that does both your thing and calls PeerConnectionObserver::OnStateChange synchronously. That way it's all main-thread state.

Note that "failed" is not necessarily terminal.

[1] https://dxr.mozilla.org/mozilla-central/rev/5322c03f4c8587fe526172d3f87160031faa6d75/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#3434

PS: I believe PeerConnectionImpl *should* GC whenever RTCPeerConnection is GC'ed, at least it did when I tried it in a simple test. We use a weakptr to the PeerConnectionObserver to avoid a cycle, since PeerConnectionImpl does not participate in cycle-collection (We had to pick between NS_DECL_THREADSAFE_ISUPPORTS and NS_DECL_CYCLE_COLLECTING_ISUPPORTS).
Flags: needinfo?(jib)
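The piggyback approach suggested above can be sketched as a single main-thread handler that updates the window-side state before notifying the observer (method and field names here are illustrative, not the real PeerConnectionImpl API):

```javascript
// Models running one main-thread task that (a) updates the inner window's
// active-connection count and (b) synchronously notifies the JS observer,
// so all the state stays main-thread-only with no atomics needed.
const mainThreadQueue = [];
function dispatchToMainThread(task) { mainThreadQueue.push(task); }
function drainMainThread() { while (mainThreadQueue.length) mainThreadQueue.shift()(); }

function iceConnectionStateChange_m(impl, newState) {
  impl.iceConnectionState = newState;
  if (newState === "closed" && impl.activeOnWindow) {
    impl.activeOnWindow = false;
    impl.window.peerConnectionCount--;   // update the window's state...
  }
  impl.observer.onStateChange(newState); // ...then notify the JS observer synchronously
}

// Simulate the off-main-thread ICE code posting the state change:
const impl = {
  window: { peerConnectionCount: 1 },
  activeOnWindow: true,
  iceConnectionState: "connected",
  observer: { onStateChange(s) { this.last = s; } },
};
dispatchToMainThread(() => iceConnectionStateChange_m(impl, "closed"));
drainMainThread();
console.log(impl.window.peerConnectionCount, impl.observer.last); // 0 closed
```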
jesup, ekr, jib: Thanks for the guidance.

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #15)
> Note that "failed" is not necessarily terminal.

Is there a need for the RTCPeerConnection to process events from the network while in the "failed" state?
Flags: needinfo?(jib)
(In reply to Henri Sivonen (:hsivonen) from comment #9)
>  * OTOH, if the notifying of the inner window lived completely in C++ in
> PeerConnectionImpl, it would be necessary to do one of:
>    - Make the status on the inner window an atomic and update it from a
> non-main thread.
>    - Add another set of runnables (another on top of the ones mentioned in
> point 9 above) for effecting the notification of the inner window from the
> main thread.
>    - Make the runnables mentioned in point 9 carry extra info about the need
> to notify the inner window.

Oops. My assumption that I'd have to start with a non-main thread seems wrong, since PeerConnectionImpl::IceConnectionStateChange runs on the main thread. I was fooled by it posting a runnable.
So this turned out to be much simpler than I thought in terms of threading, since I previously had a misunderstanding of PeerConnectionImpl threading.

It's still a bit unclear to me if "failed" should count as "not active" for throttling and back/forward cache purposes, but it looks like connections on sharedrop.io don't transition (at least not quickly) to "failed" even when the peer disappears to the point of no longer showing up in the sharedrop.io UI.

It could be that my pursuit of a solution that would make sharedrop.io count as "not active" after its peer goes away is misguided.
The patch assumes that
 1) There is no need to process network events in the "failed" state.
 2) It's misguided to pursue a definition of "active" that makes sharedrop.io go to "not active" when its peer goes away.

Tests are bug 1378402.
(In reply to Henri Sivonen (:hsivonen) from comment #16)
> Is there a need for the RTCPeerConnection to process events from the network
> while in the "failed" state?

If you include an incoming offer on the signaling-channel, then yes, I believe so.

From https://tools.ietf.org/html/rfc5245#section-2.6

  "Once ICE is concluded, it can be restarted at any time for one or all
   of the media streams by either agent.  This is done by sending an
   updated offer indicating a restart."

If the other agent has reset ICE then an incoming offer is effectively new work that takes us out of "failed", I believe.
Flags: needinfo?(jib)
When the partner disappears, a peerconnection (a connected one) will *eventually* transition to the failed state on consent/keepalive failure.  This will not be for a while, however (at least 30 seconds, perhaps more).
Flags: needinfo?(drno)
(In reply to Eric Rescorla (:ekr) from comment #14)
> Please do not remove the mThread. I would like to retain the ability to
> sometime break out PC from Gecko, and while many other things would be
> required, this is one of them, so removing it would be a step backward.

One issue with leaving mThread is that it forces sync points into the virtual MainThread pool of threads (really coroutines).  For DataChannelConnections, the pseudo-thread to use for "MainThread" is passed in from PeerConnectionImpl via 'mWindow->EventTargetFor(TaskCategory::Other)', so if we keep mThread, we should pass something like that in instead of GetMainThread().
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #23)
> If the other agent has reset ICE then an incoming offer is effectively new
> work that takes us out of "failed", I believe.

OK. I didn't realize that ICE reset didn't need to be initiated by the peer itself.

The latest patch doesn't try to count "failed" as active. In the new patch, only the state transitioning to "closed" or the connection getting GCed count as no longer active.
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> OK. I didn't realize that ICE reset didn't need to be initiated by the peer
> itself.

Well, "by itself" depends on what you mean. Technically the signaling-channel is outside the domain of WebRTC and RTCPeerConnection, in that the JS is expected to solve this with regular web tech. 99% of the time, the content script will use a websocket connection as its signaling channel. This might matter to you, if open websockets already mark the page as "active".

For example, in a situation where the other peer initiates an ICE restart, the other side does something like this:

  await pc.setLocalDescription(await pc.createOffer({iceRestart: true}));
  webSocket.send({sdp: pc.localDescription});

which restarts ice locally on that end, and sends a new offer on the websocket signaling channel (shortly to be followed by ICE candidates on the same signaling channel), which are received by the other end e.g. like this:

  webSocket.onmessage = async ({data: {sdp, candidate}}) => {
    if (sdp) {
      await pc.setRemoteDescription(sdp); // offer w/ICE restart in this case
      ...
    } else if (candidate) await pc.addIceCandidate(candidate);
  }

But users don't HAVE to use websockets. 
 - They could short-circuit a local-loop demo with no outside influence https://jsfiddle.net/yxbLvjm6/
 - They could use localStorage for a same-browser 2-tab demo https://jsfiddle.net/jib1/80xcgwmc/
 - They could use a data channel from same (or other if "failed") connection: http://jsfiddle.net/jib1/7vxzbybp/
 - They could use clipboard cut'n'paste to another machine somehow: https://jsfiddle.net/jib1/7vv2vxtt/
 - They could rely on bluetooth, SMS, QR codes, or something else.

So technically, "failed" might mean "not active" from the viewpoint of the RTCPeerConnection, but I don't know if we care about these esoteric cases.
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #28)
> (In reply to Henri Sivonen (:hsivonen) from comment #27)
> > OK. I didn't realize that ICE reset didn't need to be initiated by the peer
> > itself.
> 
> Well, "by itself" depends on what you mean.

I meant the PeerConnection itself causing the WebRTC machinery to send/receive packets.

> Technically the
> signaling-channel is outside the domain of WebRTC and RTCPeerConnection, in
> that the JS is expected to solve this with regular web tech. 99% of the
> time, the content script will use a websocket connection as its signaling
> channel. This might matter to you, if open websockets already mark the page
> as "active".

Indeed, a WebSocket connection makes the page "active" for the purpose of throttling. Curiously, a WebSocket connection doesn't appear to make the page ineligible to be put in the b/f cache, but that might itself be a bug (or there might be a less obvious mechanism that I'm not seeing).

> So technically, "failed" might mean "not active" from the viewpoint of the
> RTCPeerConnection, but I don't know if we care about these esoteric cases.

I don't know if we should care.

Comment 30 (mozreview-review)
Comment on attachment 8893774 [details]
Bug 1378123 - Make inner window track whether there is an active PeerConnection.

https://reviewboard.mozilla.org/r/164874/#review174558

Missing mActiveOnWindow = false in one key place, looks good otherwise. Any way to test this?

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3410
(Diff revision 7)
> +  // Uncount this connection as active on the inner window upon close.
> +  if (mWindow && mActiveOnWindow && mIceConnectionState == PCImplIceConnectionState::Closed) {
> +    mWindow->RemovePeerConnection();
> +    mActiveOnWindow = false;
> +  }

Sorry, I gave incomplete advice earlier!

Separate from this "closed" ICE state [1], which we unfortunately do not fire yet (bug 984580), we need to also do this for the "closed" signaling state [2] a.k.a. PCImplSignalingState::SignalingClosed, which is separate, to catch when JS calls pc.close(), which I believe we want to catch.

[1] http://w3c.github.io/webrtc-pc/#dom-rtciceconnectionstate
[2] FYI we're behind the spec (bug 1265827) which has now moved the "closed" signalingState to PeerConnectionState http://w3c.github.io/webrtc-pc/#dom-rtcpeerconnectionstate
Attachment #8893774 - Flags: review?(jib) → review-
Comment 32 (mozreview-review-reply)
Comment on attachment 8893774 [details]
Bug 1378123 - Make inner window track whether there is an active PeerConnection.

https://reviewboard.mozilla.org/r/164874/#review174558

I'm hoping that farre would add tests (and show generally how throttling should be tested) in bug 1378402.

> Sorry, I gave incomplete advice earlier!
> 
> Separate from this "closed" ICE state [1], which we unfortunately do not fire yet (bug 984580), we need to also do this for the "closed" signaling state [2] a.k.a. PCImplSignalingState::SignalingClosed, which is separate, to catch when JS calls pc.close(), which I believe we want to catch.
> 
> [1] http://w3c.github.io/webrtc-pc/#dom-rtciceconnectionstate
> [2] FYI we're behind the spec (bug 1265827) which has now moved the "closed" signalingState to PeerConnectionState http://w3c.github.io/webrtc-pc/#dom-rtcpeerconnectionstate

I added code with the assumption that the way things work is that the connection becomes permanently inactive when either the signaling state or the ICE state becomes closed. Is that correct?
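The assumption stated above amounts to treating either closed state as terminal; an illustrative predicate (not the actual C++ check, and state names are the WebIDL-level strings rather than the PCImpl enum values):

```javascript
// Models the post-review check: the connection stops counting as active when
// EITHER the ICE connection state or the signaling state reaches "closed",
// which catches both the (not-yet-fired) ICE "closed" state and pc.close().
function shouldDeactivate(iceConnectionState, signalingState) {
  return iceConnectionState === "closed" || signalingState === "closed";
}

console.log(shouldDeactivate("connected", "closed")); // true  (the pc.close() case)
console.log(shouldDeactivate("closed", "stable"));    // true
console.log(shouldDeactivate("connected", "stable")); // false
```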

Comment 33 (mozreview-review)
Comment on attachment 8893774 [details]
Bug 1378123 - Make inner window track whether there is an active PeerConnection.

https://reviewboard.mozilla.org/r/164874/#review175322
Attachment #8893774 - Flags: review?(jib) → review+

Comment 35
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/434a7c807ffe
Make inner window track whether there is an active PeerConnection. r=jib

Comment 36 (bugherder)
https://hg.mozilla.org/mozilla-central/rev/434a7c807ffe
Status: ASSIGNED → RESOLVED
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
(In reply to Randell Jesup [:jesup] from comment #24)
> When the partner disappears, a peerconnection (a connected one) will
> *eventually* transition to the failed state on consent/keepalive failure. 
> This will not be for a while, however (at least 30 seconds, perhaps more).
Yes, a remote peer which disappears will cause the ICE connection state to switch to failed after 30s.
Flags: needinfo?(drno)
Component: DOM → DOM: Core & HTML
Product: Core → Core