Open Bug 865589 Opened 7 years ago Updated 1 year ago

[meta] [WebRTC] e10s for network operations

Categories

(Core :: WebRTC: Networking, defect)

People

(Reporter: schien, Unassigned)

Details

(Keywords: meta)

Attachments

(1 file)

This bug is for tracking the discussion of e10s WebRTC network operations.
I evaluated the IPC overhead of making socket operations e10s on desktop. The results are based on the current TCPSocket e10s implementation, which passes serialized packets between the chrome process and the content process.

Test procedure:
1. Create a server socket and a client socket in the chrome or content process.
2. Send a 1 KB packet from the client socket to the server socket every K ms.
3. Measure the delay between sending and receiving each packet.
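
For reference, the procedure above can be sketched roughly as follows. This is a minimal Python sketch over loopback TCP inside a single process, not the TCPSocket-based test that produced the numbers in this bug. The names `recv_exact` and `measure_delays` and the default parameters are assumptions for illustration. Delays are collected in memory and only returned at the end of the run.

```python
import socket
import struct
import threading
import time

PACKET_SIZE = 1000  # bytes, roughly the 1 KB packet from the procedure


def recv_exact(conn, n):
    """Read exactly n bytes; TCP is a stream and may split or merge writes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf.extend(chunk)
    return bytes(buf)


def measure_delays(count=20, interval_ms=10, size=PACKET_SIZE):
    """Return per-packet send-to-receive delays in milliseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    delays = []

    def receiver():
        conn, _ = server.accept()
        with conn:
            for _ in range(count):
                packet = recv_exact(conn, size)
                # The first 8 bytes carry the sender's timestamp.
                (sent_at,) = struct.unpack_from("!d", packet)
                delays.append((time.perf_counter() - sent_at) * 1000.0)

    thread = threading.Thread(target=receiver)
    thread.start()
    with socket.create_connection(server.getsockname()) as client:
        padding = b"\0" * (size - 8)  # assumes size >= 8
        for _ in range(count):
            client.sendall(struct.pack("!d", time.perf_counter()) + padding)
            time.sleep(interval_ms / 1000.0)
    thread.join()
    server.close()
    return delays
```

Calling `measure_delays(count=1000, interval_ms=10)` would mirror the K = 10 case, but it only measures loopback latency with no IPC hop, so its numbers are not comparable to the tables in this bug.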

Test result (average over 1000 packets):
K    chrome    content
10   1.49ms    1.72ms  (+15.4%)
50   1.602ms   1.81ms  (+13.0%)

Looks like the IPC overhead on desktop is affordable. Patrick will update the test result on b2g phone later.
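
The percentages in the table are the relative increase of the content-process delay over the chrome-process delay. A small Python check of that arithmetic, using the figures above (the helper name `overhead_percent` is only for illustration):

```python
def overhead_percent(chrome_ms, content_ms):
    """Relative extra delay of the content process, in percent."""
    return (content_ms - chrome_ms) / chrome_ms * 100.0


# K (ms between sends) -> (chrome delay, content delay), from the table above.
results = {10: (1.49, 1.72), 50: (1.602, 1.81)}

for k, (chrome, content) in results.items():
    print(f"K={k} ms: +{overhead_percent(chrome, content):.1f}%")
# prints:
# K=10 ms: +15.4%
# K=50 ms: +13.0%
```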
Whiteboard: [WebRTC] → [webrtc][blocking-webrtc-]
I've just run the experiment on unagi. It is also based on the current e10s-enabled TCPSocket.

K     In-proc   oop
10    2.54ms    33.025ms
50    5.85ms    12.74ms

Each result is the average of 1000 send/recv pairs.
(In reply to Eric Rescorla (:ekr) from comment #1)
>
> See: https://wiki.mozilla.org/Media/WebRTC/WebRTCE10S/NetworkProxyInterface
> for a strawman interface.

The address field in |struct Interface| should probably be an array, as an interface may have multiple IPv4 addresses, and will most definitely have multiple IPv6 addresses.

For |SocketCreate|, PRNetAddr may not be enough to uniquely identify an interface in the case of connections to multiple private networks that happen to hand out colliding addresses; in that case, a separate interface specifier will be needed.
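
To make the two suggestions concrete, here is a rough Python sketch of the shape being proposed. It is not the strawman interface from the wiki page; the class and field names (`Interface`, `SocketCreateRequest`, `interface_name`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import List, Union

Address = Union[IPv4Address, IPv6Address]


@dataclass
class Interface:
    """One network interface carrying a list of addresses, not a single one."""
    name: str  # hypothetical field, e.g. "wlan0"
    addresses: List[Address] = field(default_factory=list)


@dataclass
class SocketCreateRequest:
    """Address plus a separate interface specifier to disambiguate collisions."""
    local_address: Address
    port: int
    interface_name: str = ""  # empty means "let the stack decide"


# Two private networks that happen to hand out the same address.
wifi = Interface("wlan0", [ip_address("10.1.1.1"), ip_address("fe80::1")])
cell = Interface("rmnet0", [ip_address("10.1.1.1")])

via_wifi = SocketCreateRequest(wifi.addresses[0], 0, wifi.name)
via_cell = SocketCreateRequest(cell.addresses[0], 0, cell.name)
```

With the address alone the two requests would be identical; the interface name is what tells them apart.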
(In reply to Mike Habicher [:mikeh] from comment #4)
> (In reply to Eric Rescorla (:ekr) from comment #1)
> >
> > See: https://wiki.mozilla.org/Media/WebRTC/WebRTCE10S/NetworkProxyInterface
> > for a strawman interface.
> 
> The address field in |struct Interface| should probably be an array, as an
> interface may have multiple IPv4 addresses, and will most definitely have
> multiple IPv6 addresses.

Good point. We are currently only prepared to handle one, however.


> For |SocketCreate|, PRNetAddr may not be enough to uniquely identify an
> interface in the case of connections to multiple private networks that
> happen to hand out colliding addresses; in that case, a separate interface
> specifier will be needed.

Generally, it's a pretty serious problem if you have the same
address on multiple interfaces because it means you have ambiguous
routing to addresses on the networks those interfaces are on.
AFAICT, sockets have no way to bind to a specific interface
other than specifying your address (at least that's the API
we assume) so I think this is OK.
(In reply to Eric Rescorla (:ekr) from comment #5)
> 
> Generally, it's a pretty serious problem if you have the same
> address on multiple interfaces because it means you have ambiguous
> routing to addresses on the networks those interfaces are on.
> AFAICT, sockets have no way to bind to a specific interface
> other than specifying your address (at least that's the API
> we assume) so I think this is OK.

Linux solves this with setsockopt(SO_BINDTODEVICE), which takes a string identifying the interface a socket is bound to (e.g. "eth0").  It's been a while since I've looked at the protocol stack code, but IIRC, this sets 'sk_bound_dev_if' in the socket structure, which is propagated to outgoing packets, and constrains routing decisions in the IP layer.  (I believe BSD also supports this, and newer versions of Windows support something similar.)

It's the only way to solve address collisions, which may be an issue on multihomed devices running Firefox OS, where the cell carrier and the WiFi connections are both NATed and both get, e.g., 10.1.1.1.

This doesn't happen often, but it can, and would break services that need access to the core carrier network while a higher-priority WiFi connection is active.  (Though whether or not any of these will be using WebRTC is a whole other matter.)
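
For illustration, a minimal Python sketch of the Linux call described above. The helper name `bind_to_device` is an assumption. The fallback value 25 is the `SO_BINDTODEVICE` number on common Linux architectures and is only used if the Python build does not expose the constant. The call is Linux-specific and may need elevated privileges (traditionally CAP_NET_RAW), so the sketch reports failure instead of raising.

```python
import socket

# Use the constant when the platform exposes it; 25 is an assumed fallback
# matching the value on common Linux architectures.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def bind_to_device(sock, ifname):
    """Try to pin a socket to one interface; return True on success."""
    try:
        # The option value is the NUL-terminated interface name.
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE,
                        ifname.encode() + b"\0")
        return True
    except OSError:
        # Not Linux, no such interface, or insufficient privileges.
        return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print("bound to lo:", bind_to_device(s, "lo"))
    s.close()
```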
(In reply to Mike Habicher [:mikeh] from comment #6)
> (In reply to Eric Rescorla (:ekr) from comment #5)
> > 
> > Generally, it's a pretty serious problem if you have the same
> > address on multiple interfaces because it means you have ambiguous
> > routing to addresses on the networks those interfaces are on.
> > AFAICT, sockets have no way to bind to a specific interface
> > other than specifying your address (at least that's the API
> > we assume) so I think this is OK.
> 
> Linux solves this with setsockopt(SO_BINDTODEVICE), which takes a string
> identifying the interface a socket is bound to (e.g. "eth0").  It's been a
> while since I've looked at the protocol stack code, but IIRC, this sets
> 'sk_bound_dev_if' in the socket structure, which is propagated to outgoing
> packets, and constrains routing decisions in the IP layer.  (I believe BSD
> also supports this, and newer versions of Windows support something similar.)

OK, I didn't know that. We don't currently support that, but we have
the information at the time we set up the socket, so we can certainly
add it to this interface.


> It's the only way to solve address collisions, which may be an issue on
> multihomed devices running Firefox OS, where the cell carrier and the WiFi
> connections are both NATed and both get, e.g., 10.1.1.1.
> 
> This doesn't happen often, but it can, and would break services that need
> access to the core carrier network while a higher-priority WiFi connection
> is active.  (Though whether or not any of these will be using WebRTC is a
> whole other matter.)

I'm not sure how it solves the problem, though. Which interface do we
think we send to 10.1.1/24 on?
(In reply to Eric Rescorla (:ekr) from comment #7)
> (In reply to Mike Habicher [:mikeh] from comment #6)

> > It's the only way to solve address collisions, which may be an issue on
> > multihomed devices running Firefox OS, where the cell carrier and the WiFi
> > connections are both NATed and both get, e.g., 10.1.1.1.
> > 
> > This doesn't happen often, but it can, and would break services that need
> > access to the core carrier network while a higher-priority WiFi connection
> > is active.  (Though whether or not any of these will be using WebRTC is a
> > whole other matter.)
> 
> I'm not sure how it solves the problem, though. Which interface do we
> think we send to 10.1.1/24 on?

Lacking a specific binding, that's determined by the order the interfaces appear in the kernel routing table.

What's important for us (WebRTC) is that our ICE machinery will use both interfaces as foundations (even if they have identical address ranges) and determine which one works for getting to a particular 10.1.1.x host. I haven't looked at it with an eye towards verifying that this works, but we'll want to make sure that it does.
(In reply to Adam Roach [:abr] from comment #8)
> Lacking a specific binding, that's determined by the order the interfaces
> appear in the kernel routing table.

Right, but in general this means if you are not running ICE
you are going to have a bad time.
Ben, Doug, Kyle -- We found some surprisingly high latency numbers on B2G.  (See Comment 3.)  We were wondering if you had run some tests like this yourself and what the results were.  Also, any theories as to what's going on are welcome.  Thanks.
Attached patch: Test on unagi
This is the code I added to measure the performance of TCPSocket on unagi. It creates a server that listens on port 56789, and any app can call runTCPTest to start the test. When the function is called by an OOP app, the result is the OOP measurement. The send/recv timestamps are printed to logcat.
Note that these tests involve a lot of JS. One question we should resolve before we get too deep in is whether that is a potential source of latency. Ben/Doug/Kyle: Can you take a look at this test and see if you think it's reasonably representative of what we need to measure?

Other potential issues: Nagle algorithm....
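
To rule out Nagle as a factor, the usual approach is to set TCP_NODELAY on the sending socket. A minimal Python sketch (the helper name `disable_nagle` is only for illustration; whether and how the TCPSocket code under test exposes this option is not something this sketch claims):

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm so small writes are sent without coalescing."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back to confirm it took effect.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("TCP_NODELAY set:", disable_nagle(s))
    s.close()
```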
Flags: needinfo?(khuey)
Updated desktop test results with different packet sizes. Overhead increases with packet size.

Test result (average over 1000 packets):
K(ms)  Size(bytes)  chrome(ms)  content(ms)
10     1000         1.508       1.677 (+11.2%)  
        500         1.263       1.443 (+14.3%)
        100         1.163       1.206 (+3.7%)
50     1000         1.505       1.773 (+17.8%)
        500         1.357       1.518 (+11.9%)
        100         1.247       1.252 (+0.4%)
I modified the code by moving the dump() calls to the end of the test.

Result:
K = 10, 1 KB, OOP: 18.92ms
K = 10, 1 KB, Inproc: 2.70ms

K = 50, 1 KB, OOP: 12.35ms
K = 50, 1 KB, Inproc: 5.75ms

Calling dump() every time affects our results a lot.
Why are the numbers from Patrick Wang so different from SC's? What does "K" mean?
K is the number of milliseconds between transfers, I believe.

SC gets very different numbers because he's running the test on a desktop machine.
My test case is run in xpcshell on desktop, which is a faster and simpler environment than a B2G device.
Bug 850175 also discusses IPC overhead, but its focus is on memory duplication. We are discussing COW with shared memory on the B2G mailing list. I think we can also use shared memory here to avoid memory copies and reduce IPC overhead. My idea is to create a tmpfile as the backing storage of a memory mapping for big strings or data blocks, and send the file descriptor to the IPC peer, so the peer can mmap the file descriptor to get the content.

I think it can be implemented in the IPC and nsString code by detecting whether a string or a uint8_t array is big enough to justify using shared memory in IPC.
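
As a rough illustration of the tmpfile idea, here is a Python sketch for Unix-like systems. It is not Gecko IPC code; the helper names `share_block` and `map_block` are hypothetical, and `os.dup` stands in for actually passing the descriptor to another process (which would normally go over a Unix domain socket).

```python
import mmap
import os
import tempfile


def share_block(data):
    """Write data to an anonymous temp file and return a descriptor for it."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        # Duplicate the descriptor so it stays valid after the file object closes.
        return os.dup(tmp.fileno())


def map_block(fd, length):
    """Peer side: map the file read-only instead of copying the bytes over IPC."""
    # length must be > 0 and no larger than the file.
    return mmap.mmap(fd, length, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    payload = b"x" * 4096
    fd = share_block(payload)
    view = map_block(fd, len(payload))
    print(view[:4])
    view.close()
    os.close(fd)
```

Only the descriptor and the length would need to cross the IPC channel; the payload itself stays in the file-backed pages.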
Isn't there already an IPDL type for carrying shared memory objects?

https://wiki.mozilla.org/IPDL/Shmem

I'd basically assume that we're sending over a fairly minimal packet size (~50 bytes)
with meta-information and a reference to the Shmem object.
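
A minimal Python sketch of that split between a small message and a bulk buffer. This is not the IPDL Shmem API: the anonymous `mmap` stands in for a shared segment, and the header layout (`REF`) and helper names are assumptions made up for the example.

```python
import mmap
import struct

# Hypothetical wire format: segment id (4 bytes), offset (8), length (4).
REF = struct.Struct("!IQI")


def write_payload(segment, offset, payload):
    """Place the payload in the segment and return the small IPC message."""
    segment[offset:offset + len(payload)] = payload
    return REF.pack(0, offset, len(payload))  # id 0: the only segment here


def read_payload(segment, message):
    """Receiver side: decode the reference and read the bytes it points at."""
    _segment_id, offset, length = REF.unpack(message)
    return bytes(segment[offset:offset + length])


# Anonymous mapping standing in for a shared memory segment.
segment = mmap.mmap(-1, 64 * 1024)

if __name__ == "__main__":
    msg = write_payload(segment, 128, b"p" * 1000)
    print(len(msg), "byte message for", len(read_payload(segment, msg)), "bytes")
```

The message stays the same size no matter how large the payload is, which is the property being assumed in the comment above.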
I want to share more by also mapping nsString objects in both parent and child onto the same backing storage with COW, using a tmpfile. I expect IPC users would not need to know about it, because the change would be in the nsString traits.

Basically, Shmem also uses the same technique (tmpfile) to create shared memory, so the implementation can reuse the backing objects (tmpfile) created by Shmem. But we may also want to use COW shmem for memcpy, and a more generic approach is required for that case.
I'm not following what you are saying here:

1. The only thing we need to copy at high volume in this case is byte arrays.
2. We don't need them to be COW. The target can simply own them
as long as there is some way for the sender to not have to constantly
allocate new buffers.
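
One way to read point 2 is a fixed pool of reusable buffers: the sender takes a free buffer, the target owns it while processing, and then hands it back. A minimal single-process Python sketch of that idea (the `BufferPool` class is hypothetical, and real code would place the buffers in shared memory):

```python
class BufferPool:
    """Fixed set of preallocated buffers that are recycled, never reallocated."""

    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        """Sender side: take a free buffer, or None if all are in flight."""
        return self._free.pop() if self._free else None

    def release(self, buf):
        """Target side: return the buffer once the packet has been consumed."""
        self._free.append(buf)


if __name__ == "__main__":
    pool = BufferPool(count=4, size=1500)
    buf = pool.acquire()
    buf[:5] = b"hello"   # sender fills the buffer in place
    pool.release(buf)    # target is done with it; no new allocation needed
```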
I am saying that what bug 850175 is trying to resolve can also be leveraged by this bug.
Does NrSocket handle all the network traffic for nICEr/mtransport, so that e10s-izing NrSocket would enable e10s for networking?
(In reply to Thinker Li [:sinker] from comment #22)
> I am saying that what bug 850175 is trying to resolve can also be leveraged by
> this bug.

Perhaps so, but they really seem to me to be solving different problems. Bug 850175 is
about handling of long-term data whereas we are explicitly dealing only with very short-term
data. Can you explain why the problem we have can't be solved simply by having the data
buffer be carried in an ipdl Shmem?
(In reply to Patrick Wang [:kk1fff] from comment #23)
> Does NrSocket handle all the network traffic for nICEr/mtransport, so that
> e10s-izing NrSocket would enable e10s for networking?

It handles all the I/O, but you also need to port addrs.c for interface
enumeration.

I think we should start from the bottom here: write the IPC interface
for the primitives and then we can discuss how to port the ICE code.
(In reply to Eric Rescorla (:ekr) from comment #24)
> Perhaps so, but they really seem to me to be solving different problems. Bug
> 850175 is
> about handling of long-term data whereas we are explicitly dealing only with
> very short-term
> data. Can you explain why the problem we have can't be solved simply by
> having the data
> buffer be carried in an ipdl Shmem?
Thinker is trying to reduce memory copies for primitive types by using shared memory directly in the ipdl codegen, so we wouldn't need to use ipdl Shmem ourselves. We could still use ipdl Shmem if we have to.
(In reply to Shih-Chiang Chien [:schien] from comment #26)
> (In reply to Eric Rescorla (:ekr) from comment #24)
> > Perhaps so, but they really seem to me to be solving different problems. Bug
> > 850175 is
> > about handling of long-term data whereas we are explicitly dealing only with
> > very short-term
> > data. Can you explain why the problem we have can't be solved simply by
> > having the data
> > buffer be carried in an ipdl Shmem?
> Thinker is trying to reduce memory copies for primitive types by using
> shared memory directly in the ipdl codegen, so we wouldn't need to use ipdl
> Shmem ourselves. We could still use ipdl Shmem if we have to.

I'm primarily concerned with how long this is going to take to implement.
It seems like we currently have Shmem so we don't need to improve the
ipdl system at all. That seems like a big win which we can revisit if
it turns out to still be too slow.
Note that all our HTTP traffic for B2G is currently using regular IPDL semantics, i.e. we just pass in arguments (CString httpPayload) and they get copied across IPC, without using shared memory.  That hasn't killed us, but we also haven't benchmarked the performance on a real B2G phone, so it may or may not be performant enough for the WebRTC case (and we may need to fix HTTP if it's too slow).

I think we need to get some necko xpcshell test numbers on unagi to see how we compare with desktop--the test that uses JS may have a lot of other stuff going on.  I tried to get xpcshell working with my unagi phone a few weeks ago and didn't get very far.  Has anyone else gotten it to work?
Component: WebRTC → WebRTC: Networking
It's a little unclear to me from the previous comments whether it would be even more helpful to entirely eliminate JS from the performance numbers you're looking at.  If so, basic C++ unit tests were functioning on Android at least as recently as March, and if they're not already functioning on B2G, my suspicion is that porting the Android work wouldn't be too hard.  If that is indeed a worthwhile direction to explore, I suggest adding needinfo? gbrown.
Depends on: 867933
Depends on: 870660
I'm not sure if it's still relevant, but I don't think I know enough about what we're trying to test here to answer comment 12?  Are we concerned about the cost of e10s or the cost of calling JS/etc?

Feel free to needinfo me again if you think I have something useful to add.
Flags: needinfo?(khuey)
Depends on: 950660
backlog: --- → webRTC+
Keywords: meta
QA Contact: jsmith
Whiteboard: [webrtc][blocking-webrtc-]
backlog: webRTC+ → ---
Summary: [WebRTC] e10s for network operations → [meta] [WebRTC] e10s for network operations