Closed Bug 1569183 Opened 1 year ago Closed 1 year ago

Write a TCP socket class that handles DNS queries and web-proxy

Categories

(Core :: Networking, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(17 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

To simplify the webrtc networking code, and provide a tool for fixing bug 1385030, bug 1421240, and bug 1189055, it would make sense to have a TCP socket implementation that takes an FQDN/port/IP protocol (the destination), and an SSL flag. This implementation would determine whether it should connect via a web-proxy (based on proxy config), do DNS A or AAAA lookups if needed, and then connect to the destination.

dom::TCPSocket could be modified to do this stuff, or we could create a new class (perhaps wrapping TCPSocket) that does this stuff.

It would also be nice for this API to be usable from any process (and have it automatically do IPC when needed), but that is not strictly necessary. It would need to have an IPC wrapper, at least.

Honza, which approach do you think would be more fruitful; augmenting dom::TCPSocket, or creating something new? On first glance, I am leaning toward augmenting dom::TCPSocket.

Flags: needinfo?(honzab.moz)

I think that if we were to use TCPSocket, all we would need to add is the web proxy support. It looks like it already handles FQDN, and upon further thought, we probably do not need any control over whether we resolve A or AAAA. The w3c spec for TCPSocket doesn't mention web proxies, so I'm guessing we would not want to use web proxy support by default.

One potential sticking point; TCPSocket works on the socket process currently, but it is unclear whether things like nsIProtocolProxyService work on the socket process. If they do not, we will likely need to use the PProxyLookup.ipdl that baku is putting together over in bug 1567892.

Taking this, since I am working on a strawman implementation.

Assignee: nobody → docfaraday

Hi, sorry for late answer.

The terminal API for WebRTC is something more up to you to decide, TCPSocket is probably fine.

My old idea for the necko changes is somewhat vague. I wanted to get rid of the 15+ years old and obsolete scheduling internal mess and rather just async-provide a stream-like or socket-like interface to dispatch any application level protocols on and not care about what it is - an end to end socket, proxy tunnel... it would clean things up and allow us to do optimizations better and easily support way more use cases, like this one.

To build this is a project and not small. So let's go incrementally.

What you need is similar to how websockets are created, except that any HTTP preflight is only needed for http(2) proxies (as you say in comment 1). Right now, all the preparations for a connection happen in nsHttpChannel, parent process. We resolve the proxy and host names somewhat in parallel, then pre-connect. If there is a proxy requiring authentication, the channel on the parent process will handle it (may also trigger the authentication dialog.) With the socket process, we will do proxy resolution still on the parent process, proxy auth handling as well; socket connections will be actually happening there including recognition if proxies and servers talk h2 or not (if yes, h2 stack is inserted, going to also live on socket process).

Good is that the data connection itself will fully live in the socket process, so, after all the establishing steps are done (like nsHttpChannels gave up the created socket/tunnel and went away) we can IPC from the socket process to the originally requesting process.

So, the process you will ask for a connection will be the parent process. When established we can then use PBackground or end-points to link the socket process and the requesting content process directly. The TCPSocket on the parent will look for a proxy for the URL in question. What URL you will use is up to you - tcp://example.com:12345 ?. If an 'http' or 'https' proxy is setup for that url, you will have to use http channel, created as 'proxied channel' [1] with the already resolved proxy info assigned directly on it and with 'connect only' flag set (introduced in bug 1203503, bug 1416220, I believe you know this well enough). Then, it's up to necko internals to provide (subject to figure out how exactly) an object implementing some suitable API to easily plug it in to TCPSocket.

If you are fine with the actual socket live on the parent process (full socket process is still in dev) then it may be easy, the result of 'connect only' is an actual socket. If there is an h2 proxy we may use the same code we already have for ws-over-h2, but I don't know that code that well.

[1] https://searchfox.org/mozilla-central/rev/b38e3beb658b80e1ed03e0fdf64d225bd4a40327/netwerk/base/nsIProxiedProtocolHandler.idl#32

Flags: needinfo?(honzab.moz)
Whiteboard: [necko-triaged]

Ok. So what I am putting together is mostly a refactoring of dom::TCPSocket into a class that provides the core functionality of dom::TCPSocket (including any necessary IPC), and a subclass that has the DOM hooks to satisfy the webidl interface for TCPSocket. Webrtc and TCPSocketParent will use the base-class with its core functionality. There will also be some cleanup and simplification that I did on the way.

Given that I am creating a base-class that isn't involved with DOM, should that live in netwerk? Maybe call it nsTCPSocket? RIght now it is named TCPSocketBase and lives in TCPSocket.h/cpp, but I want to ask before moving it somewhere else.

Flags: needinfo?(honzab.moz)

Yeah, the base class should probably live in netwerk/base, yes.

Flags: needinfo?(honzab.moz)

One of the larger snags I've hit is the fact that the socket process does not seem to have a cycle-collector, which causes assertions when I try to use things like dom::TCPSocket. dom::TCPSocket is cycle-collected because it registers itself as a listener on lots of its members, creating lots of cycles. The nsTCPSocket class will have this property also, and I don't see any easy way to avoid it. We could try to add some Destroy() API to nsTCPSocket that would deregister these callbacks (thus breaking the cycles), but it is unclear when dom::TCPSocket would call this API. Maybe dom::TCPSocket could hold a weak reference to the nsTCPSocket, but that seems awkward.

Is there a way we could enable the cycle collector on the socket process? Or is there some way around this mess that I am missing? Are there other things that we will eventually need the cycle collector for as we migrate necko stuff to the socket process?

Flags: needinfo?(kershaw)

I am afraid that there is no easy way to enable cycle collector on the socket process, since we only initialize minimal XPCOM on socket process. As far as I can tell, there is no stuff in necko that needs the cycle collector.

Maybe the best way is to let dom::TCPSocket hold a strong reference to nsTCPSocket. We just have to break the cycle manually.

Flags: needinfo?(kershaw)
Flags: needinfo?(honzab.moz)

So, there's another snag; the API that TCPSocket uses internally is not usable on the socket process yet, and the IPC protocol that TCPSocket uses is not usable on the socket process either. I guess it wouldn't be too hard to allow PTCPSocket to be used between the socket/parent process, and that's something I can do. But for future reference, does anyone have an idea how hard would it be to get the necko stuff that TCPSocket uses working on the socket process?

Flags: needinfo?(kershaw)

(In reply to Byron Campen [:bwc] from comment #9)

So, there's another snag; the API that TCPSocket uses internally is not usable on the socket process yet, and the IPC protocol that TCPSocket uses is not usable on the socket process either. I guess it wouldn't be too hard to allow PTCPSocket to be used between the socket/parent process, and that's something I can do. But for future reference, does anyone have an idea how hard would it be to get the necko stuff that TCPSocket uses working on the socket process?

Could you be more specific about which API not working?
I guess TCPSocket only uses these three objects to manipulate the tcp socket.

Flags: needinfo?(kershaw)

I think all have been answered during a f2f meeting.

  • the code will live on the parent process for now, sufficient for fulfilling the goal and simple to implement
  • we still need to use an http channel in the 'connect only' mode to tunnel via 'http' and 'https' proxies
Flags: needinfo?(honzab.moz)

Patches are basically cleaned up now, and now it is clear that there's yet another missing piece; an IPC channel between the socket and parent processes. This is necessary if we're going to use the TCP socket class on the socket process, because the socket process does not support a lot of the API it uses. Has anyone started an effort like this already that I could pick up and finish?

Flags: needinfo?(kershaw)

(In reply to Byron Campen [:bwc] from comment #12)

Patches are basically cleaned up now, and now it is clear that there's yet another missing piece; an IPC channel between the socket and parent processes. This is necessary if we're going to use the TCP socket class on the socket process, because the socket process does not support a lot of the API it uses. Has anyone started an effort like this already that I could pick up and finish?

No, no one in necko team has enough cycle to do this.
Would allowing PTCPSocket to be used between parent and socket process be enough?

Flags: needinfo?(kershaw)

(In reply to Kershaw Chang [:kershaw] from comment #13)

(In reply to Byron Campen [:bwc] from comment #12)

Patches are basically cleaned up now, and now it is clear that there's yet another missing piece; an IPC channel between the socket and parent processes. This is necessary if we're going to use the TCP socket class on the socket process, because the socket process does not support a lot of the API it uses. Has anyone started an effort like this already that I could pick up and finish?

No, no one in necko team has enough cycle to do this.
Would allowing PTCPSocket to be used between parent and socket process be enough?

Implementing something like PSocketProcessBridge for a socket/parent IPC link would be enough for my purposes, yes. I can implement that, I think, but was just wondering if anyone had started on something like that on a project branch or something.

We have implemented a PBackground ipc channel [1] between parent and socket process.
Not sure if this is what you need.

[1] https://hg.mozilla.org/projects/larch/rev/af35c413ac4ae3eb165399fc210a391095e00be9

So I got this mostly working, but I was really unhappy with some of the things I needed to do in the refactoring to make it usable without the cycle collector. I have tried an alternate approach that is much cleaner, that involves enhancing WebrtcProxyChannel to handle plain TCP connections.

Depends on D45104

Depends on D45105

Attachment #9091197 - Attachment description: Bug 1569183: (WIP) Remove some stuff from TCPSocket that NrTcpSocketIpc was the only user of. → Bug 1569183: Remove some stuff from TCPSocket that NrTcpSocketIpc was the only user of. r?mayhemer
Attachment #9091626 - Attachment description: Bug 1569183: Stop doing a proxy lookup to determine whether we're configured to use a proxy (for the proxy_only_if_behind_proxy)), and instead look at whether we loaded the doc using a proxy. r?mjf,mayhemer → Bug 1569183: Stop doing a proxy lookup to determine whether we're configured to use a proxy (for the proxy_only_if_behind_proxy pref), and instead look at whether we loaded the doc using a proxy. r?mjf,mayhemer

Try looks good. I am going to run a -u all try push now, because I've touched more than just webrtc stuff here.

Try run focused on webrtc looks good. Here's a -u all try run, since I've touched more than just webrtc code.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b03c3c7ab9f8c5040104735f65fcf1e178042b36

No unusual failures in that -u all try push.

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2b7dc87046da
Add some much-needed logging to this function. r=mjf
https://hg.mozilla.org/integration/autoland/rev/82af7b115f60
Teach WebrtcProxyChannel to handle non-proxied TCP connections, and SOCKS proxying because that was easy to do also. r=mjf
https://hg.mozilla.org/integration/autoland/rev/c684f79d8c67
Rename PWebrtcProxyChannel to PWebrtcTCPSocket. r=mjf
https://hg.mozilla.org/integration/autoland/rev/6059845767eb
Rename NrSocketProxy to NrTcpSocket, and use NrTcpSocket for TCP on content/socket process. r=mjf
https://hg.mozilla.org/integration/autoland/rev/ed1ae6a85255
Rename SetProxyServer to SetProxyConfig to better reflect what is actually going on. r=mjf
https://hg.mozilla.org/integration/autoland/rev/f5353ea9f1df
Add a proxy policy argument to NrSocketProxyConfig, and always set this config on the NrIceCtx. r=mjf
https://hg.mozilla.org/integration/autoland/rev/b879a0044cf3
Teach WebrtcTCPSocket to bind to a specific host/port. r=mjf
https://hg.mozilla.org/integration/autoland/rev/b83124cf3a89
Teach WebrtcTCPSocket to use TLS when configured to. r=mjf,mayhemer
https://hg.mozilla.org/integration/autoland/rev/da41f0fab427
Make proxy config parameters optional in WebrtcTCPSocket to get the NAT simulator working again. r=mjf,mayhemer
https://hg.mozilla.org/integration/autoland/rev/bd833f5afa30
Make the proxy_only pref disable UDP. r=mjf
https://hg.mozilla.org/integration/autoland/rev/bf207998251a
Remove NrTcpSocketIpc. r=mjf
https://hg.mozilla.org/integration/autoland/rev/940d7f787bea
Remove some stuff from TCPSocket that NrTcpSocketIpc was the only user of. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/c43589fd0f90
Update OnConnected callbacks to specify what type of proxying (if any) is being used, and implement IsProxied based on this information. r=mjf
https://hg.mozilla.org/integration/autoland/rev/6e1eba5c88ad
Stop doing a proxy lookup to determine whether we're configured to use a proxy (for the proxy_only_if_behind_proxy pref), and instead look at whether we loaded the doc using a proxy. r=mjf,mayhemer,jld
https://hg.mozilla.org/integration/autoland/rev/8196045d38b5
Create an ipdl struct called WebrtcProxyConfig, to cut down on the arg count on lots of functions. r=mjf,mayhemer
https://hg.mozilla.org/integration/autoland/rev/cf23522758d5
Simplify how NrSockets specify whether they're proxied or not. r=mjf
https://hg.mozilla.org/integration/autoland/rev/cd286d53c121
Fix bugs where OnProxyAvailable would be called with a proxy info whose resolve flags did not match what was asked for. r=mayhemer

Hi, I am wondering if it can also be inserted in "Blocks:" for
https://bugzilla.mozilla.org/show_bug.cgi?id=1239006
?

Yes, you're right.

Blocks: 1239006
Regressions: 1582646
Duplicate of this bug: 1421240
Duplicate of this bug: 1189055

Because of the number of patches here, I'm unclear on how big this change actually is. We have a customer that is experiencing

https://bugzilla.mozilla.org/show_bug.cgi?id=1505329

and this fixes it and they were wondering the likelihood of getting this on the ESR.

Is this an extremely large change? Is it scoped well?

It is a big change, yes. I'm not comfortable uplifting this to ESR.

You need to log in before you can comment on or make changes to this bug.