Closed Bug 1051685 · Opened 5 years ago · Closed last year
RTC data channels always use the default SCTP window size of 128K
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2067.0 Safari/537.36

Steps to reproduce:
Using a simple JS application that repeatedly sends data over a WebRTC data channel, very low throughput can be observed at large RTT. The unchanged window size can be confirmed with an SCTP log dump, where a_rwnd is always the default of 131072. I have attached a patch that could be used temporarily to increase the initial window size to 1M; ideally the window should be scaled dynamically. This has been discussed further at: https://groups.google.com/forum/#!topic/discuss-webrtc/0synE_0zeCQ

Actual results:
Low throughput at large RTT.

Expected results:
No drop in throughput when RTT is at reasonable levels (sub-200 ms).
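A minimal sketch of the kind of repro script described above (names and thresholds are illustrative, not the reporter's actual code; the sender assumes a browser with an already-open RTCDataChannel, while the throughput arithmetic is plain JS):

```javascript
// Pure helper: bytes transferred over elapsed seconds -> Mbit/s.
function throughputMbps(bytes, seconds) {
  return (bytes * 8) / seconds / 1e6;
}

// Browser-only part (hypothetical sketch): repeatedly send fixed-size chunks
// over an open RTCDataChannel, refilling whenever the buffer drains, and
// report throughput once totalBytes have been queued.
function pumpChannel(channel, chunkSize = 16 * 1024, totalBytes = 64 * 1024 * 1024) {
  const chunk = new Uint8Array(chunkSize);
  let sent = 0;
  const start = performance.now();
  channel.bufferedAmountLowThreshold = 1024 * 1024;
  const send = () => {
    // Keep the channel's buffer topped up without growing it unboundedly.
    while (sent < totalBytes && channel.bufferedAmount < 8 * 1024 * 1024) {
      channel.send(chunk);
      sent += chunkSize;
    }
    if (sent >= totalBytes) {
      const seconds = (performance.now() - start) / 1000;
      console.log(`~${throughputMbps(sent, seconds).toFixed(1)} Mbit/s`);
    }
  };
  channel.onbufferedamountlow = send;
  send();
}
```

With a large RTT and the default 128 KiB receiver window, a script like this reports rates far below the link capacity, which is the symptom reported here.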
Randall, I'm needinfo'ing you because this has a patch and you're the person I've interacted with most wrt webrtc - if you know a better person to help this bug along, please feel free to pass the needinfo along - thanks!
Component: Untriaged → WebRTC: Networking
Product: Firefox → Core
I'm not sure we want to unilaterally increase the window. Michael, any comments/suggestions/fixes?
Flags: needinfo?(rjesup) → needinfo?(tuexen)
Randell, the throughput is limited by min(my_send_window, peer_receive_window)/RTT. The submitted patch looks good for changing the send/receive window to 1 MB; you can set it to any number you think makes sense. The initial value was chosen conservatively, since SCTP will try to use the above limit if allowed by the congestion control. Whether this affects the media streams in the same peer connection is currently not taken into account. We might want to use a delay-based congestion control in the future, or couple the media-stream CC with SCTP's, but we have neither a concept nor code for that right now. The FreeBSD kernel uses something on the order of 2 MB for the send/receive buffers, but it doesn't deal with the impact on media streams. Does this help? Best regards, Michael
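Michael's bound can be made concrete with a quick calculation (a sketch with illustrative numbers, showing why the default 128 KiB window caps throughput at large RTT):

```javascript
// Upper bound on SCTP throughput: at most one full window of data can be in
// flight per round trip, so rate <= min(send_window, recv_window) / RTT.
function maxThroughputMbps(sendWindowBytes, recvWindowBytes, rttSeconds) {
  const windowBytes = Math.min(sendWindowBytes, recvWindowBytes);
  return (windowBytes * 8) / rttSeconds / 1e6;
}

const KiB = 1024, MiB = 1024 * 1024;

// Default 128 KiB window at 200 ms RTT: a ~5.2 Mbit/s ceiling.
console.log(maxThroughputMbps(128 * KiB, 128 * KiB, 0.2));
// 1 MiB window at the same RTT raises the ceiling ~8x, to ~41.9 Mbit/s.
console.log(maxThroughputMbps(1 * MiB, 1 * MiB, 0.2));
```

Note the min(): raising only one side's window does nothing, which is why both the send and receive windows need to change.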
Assignee: nobody → rjesup
Status: UNCONFIRMED → ASSIGNED
backlog: --- → webRTC+
Ever confirmed: true
Priority: -- → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Putting latency aside for a second, what might be interesting is that RAWRTC to RAWRTC results in ~310 Mbit/s throughput on the same machine when sending 1 GiB even though it uses the default receiver window (128 KiB) as well. Firefox 57 to Firefox 57 results in ~93 Mbit/s. Sending from Firefox 57 to RAWRTC (and vice versa) results in ~130 Mbit/s.
(In reply to Lennart Grahl from comment #5)
> Putting latency aside for a second, what might be interesting is that RAWRTC
> to RAWRTC results in ~310 Mbit/s throughput on the same machine when sending
> 1 GiB even though it uses the default receiver window (128 KiB) as well.
> Firefox 57 to Firefox 57 results in ~93 Mbit/s. Sending from Firefox 57 to
> RAWRTC (and vice versa) results in ~130 Mbit/s.

My guess would be that this is caused by additional buffering in Firefox e10s code (more specifically, the IPC code connecting the SCTP stack running in the child process with the actual network sockets running in the parent process). Lennart: since you have already run comparisons, can you check how the throughput changes with the patch applied?
Looks like in our tests we need to adjust the size of the big data here https://searchfox.org/mozilla-central/source/dom/media/tests/mochitest/dataChannel.js#206 and/or the low amount threshold in the same test to make that test still pass with this change.
Tested on my PC this time, so results are a little different:

- RAWRTC (128 KiB) to FF 58 (1 MiB): 238 Mbit/s
- RAWRTC (128 KiB) to FF 57 (128 KiB): 247 Mbit/s
- RAWRTC (128 KiB) to FF 58 (1 MiB, no e10s): 314 Mbit/s
- RAWRTC (128 KiB) to FF 57 (128 KiB, no e10s): 336 Mbit/s
- RAWRTC (128 KiB) to RAWRTC (128 KiB): 422 Mbit/s
- RAWRTC (1 MiB) to FF 58 (1 MiB): 245 Mbit/s
- RAWRTC (1 MiB) to FF 57 (128 KiB): 248 Mbit/s
- RAWRTC (1 MiB) to FF 58 (1 MiB, no e10s): 300 Mbit/s
- RAWRTC (1 MiB) to FF 57 (128 KiB, no e10s): 336 Mbit/s
- RAWRTC (1 MiB) to RAWRTC (1 MiB): 238 Mbit/s
- FF 58 (1 MiB) to FF 58 (1 MiB): 176 Mbit/s
- FF 58 (1 MiB) to FF 58 (1 MiB, no e10s): 210 Mbit/s
- FF 58 (1 MiB, no e10s) to FF 58 (1 MiB, no e10s): 272 Mbit/s
- (skipped testing FF 57 with e10s)
- FF 57 (128 KiB, no e10s) to FF 57 (128 KiB, no e10s): 285 Mbit/s

Results are a bit flaky, especially with non-e10s, so be aware they may not be entirely representative. When running non-e10s, both RAWRTC and FF are at nearly 100% CPU load (only one core), but I'm also seeing a lot of drops in CPU load; the more drops, the worse the final throughput. Overall, the results are puzzling to me, as the larger window seems to decrease throughput in a near-0-RTT scenario. Is that to be expected, Michael? However, this probably says nothing about large-RTT scenarios.
Flags: needinfo?(lennart.grahl) → needinfo?(tuexen)
Thanks for testing, Lennart! But these numbers don't look like we would actually gain much just from increasing the window size. If anything, it looks like we should adjust the window size depending on the RTT. Theoretically, ICE could tell us the RTT of the connection before SCTP kicks in, but that is a much more complex task.
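Adjusting the window to the RTT would amount to picking the bandwidth-delay product for a target rate. A hypothetical sketch (not proposed code; the target rate and clamping bounds are assumptions chosen for illustration):

```javascript
// Hypothetical: size the SCTP window to the bandwidth-delay product (BDP)
// for a target throughput at a measured RTT (e.g. from ICE), clamped between
// the current default (128 KiB) and an illustrative upper bound (4 MiB).
function windowForRtt(targetMbps, rttSeconds,
                      minBytes = 128 * 1024, maxBytes = 4 * 1024 * 1024) {
  // BDP in bytes: (bits per second / 8) * seconds of round trip.
  const bdpBytes = (targetMbps * 1e6 / 8) * rttSeconds;
  return Math.min(maxBytes, Math.max(minBytes, Math.ceil(bdpBytes)));
}

// Sustaining 100 Mbit/s across a 200 ms RTT needs a 2.5 MB window.
console.log(windowForRtt(100, 0.2)); // 2500000
// At 5 ms RTT the default 128 KiB already suffices, so the clamp keeps it.
console.log(windowForRtt(100, 0.005)); // 131072
```

This also shows why a single static value is a compromise: any fixed window is too small for some RTTs and wastefully large for others.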
Nils: this bug is about low rates at large RTT. A large RTT means you'll saturate your window and stop sending while waiting for acks. Lennart's tests were at ~0 RTT, and so should show little or no difference (perhaps even a negative one).
Like Randell said, my rather naive tests tell us nothing about large-RTT scenarios. What puzzles me is that throughput with a larger window is reduced slightly for FF (I would like to understand why that is the case, hence pinging Michael) and by a whopping amount for RAWRTC (although the RAWRTC case doesn't need to be discussed here). I'm thinking about creating a test setup for simulating delay but will talk to Michael first.
The send and receive buffer sizes should only affect performance substantially when the RTT is non-zero. So yes, it makes sense to set up a testbed for this. One can use dummynet to emulate delay. When running both applications on the same machine, I guess OS scheduling and other things like that dominate the performance.
These are my results of testing **in RAWRTC** with various window sizes and several RTT values: https://docs.google.com/spreadsheets/d/1Ze2hZl9KZJ1hKcm5Y9J0OIknQCBHUpw5eefskCc8xpM/edit?usp=sharing

I think we should at least go for 1 MiB window sizes by default, since it definitely has a positive impact on throughput, as you can see. However, the impact of large window sizes is not as large as I would have expected. If you look at my test results, there's definitely something odd going on (hence pinging Michael): CPU usage and throughput are erratic even though the RTT is constant. Interestingly, this only starts to happen once the windows are raised above 128 KiB.

Overall, CPU usage seems to be quite high (I ran this on an i7-6700 @ 3.4 GHz) relative to the throughput. Keep in mind that DTLS is also actively encrypting and decrypting packets (`ECDHE-ECDSA-AES128-GCM-SHA256` is the negotiated cipher suite), although I have the feeling this is not the reason for the high CPU usage. I've seen similar CPU usage in Firefox, so I don't expect it to behave much differently from RAWRTC, but I will do a sanity check.
Looks like 1 MiB would be a safe choice for now. It might be time to do some profiling to get an idea of where the stack spends most of its time.
Hi Lennart, when using large send/receive windows, are you observing packet drops? I'm wondering why the throughput is not as expected, and drops would be an explanation. So write a .pcapng file with the SCTP packets and see if you can find SACKs with gap reports. Please note that even an end host can drop packets in its UDP stack.
Comment on attachment 8923923 [details] Bug 1051685: increase SCTP window size from 128K to 1M https://reviewboard.mozilla.org/r/195080/#review242842
Michael and I had a brief email conversation. There is nothing obvious that would explain the low throughput. Sadly, I need to pause investigating this for now. I plan to come back to this in a few months time if possible. For now, I agree with Nils that we should increase the window sizes to 1 MiB.
I fully support trying to increase the window size - but it should be rigorously checked in terms of performance. You wouldn't want to introduce a change that for some reason increases cpu usage or decreases throughput for 10% of users.
From the email conversation between Michael and me, the traces indicate that the association is in congestion-avoidance mode and throughput increases, just very slowly (my tests were usually ~1 minute in length), and so does CPU usage. It very much looks like high throughput results in high CPU usage, but I haven't traced this, so it is simply what I observed. Since you obviously have some data channel use cases, I would suggest grabbing the package from the try build and checking for yourself. :)
(In reply to Shachar from comment #20)
> I fully support trying to increase the window size - but it should be
> rigorously checked in terms of performance.
> You wouldn't want to introduce a change that for some reason increases cpu
> usage or decreases throughput for 10% of users.

What do you suggest we do here, then?
Check throughput and CPU usage over the following matrix:
1. FF->FF, FF->Chrome, Chrome->FF
2. latency added: 0 ms, 10 ms, 50 ms, 150 ms, 250 ms, 500 ms
I tested our application and it behaves fine with the changes.

(In reply to Lennart Grahl from comment #14)
> These are my results of testing **in RAWRTC** with various window sizes and
> several RTT values:
> https://docs.google.com/spreadsheets/d/
> 1Ze2hZl9KZJ1hKcm5Y9J0OIknQCBHUpw5eefskCc8xpM/edit?usp=sharing

Can you please focus on the 200 ms and 500 ms cases - is it indeed the case that with the 1 MB window you get lower throughput there?
I honestly can't tell what's wrong with the 200 ms and 500 ms cases. But I don't think we should focus on that too much, since the throughput is already pretty low at that level. Like I said, the traces indicated that we were in congestion-avoidance mode, which is fine. Michael, is there anything you think should be investigated, or should we finally move forward and deploy 1 MiB as the new default?
Increasing the window only allows a higher bandwidth to be used. If that is fine (it might interact with media streams), then go ahead.
Let's get this upstream. :)
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/f7d1fd195963 increase SCTP window size from 128K to 1M. r=lgrahl