Closed Bug 1051685 · Opened 5 years ago · Closed last year
RTC data channels always use the default SCTP window size of 128K
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2067.0 Safari/537.36

Steps to reproduce:
Using a simple JS application that repeatedly sends data over a WebRTC data channel, very low throughput can be observed at large RTT. The unchanged window size can be confirmed with an SCTP log dump, where a_rwnd is always the default of 131072. I have attached a patch that could be used temporarily to increase the initial window size to 1M; ideally the window should be scaled dynamically. This has been discussed further at: https://groups.google.com/forum/#!topic/discuss-webrtc/0synE_0zeCQ

Actual results:
Low throughput at large RTT.

Expected results:
No drop in throughput when RTT is at reasonable levels (sub-200 ms).
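A minimal sketch of the kind of repro script described above (names and thresholds are illustrative, not the reporter's actual code; the sender assumes a browser with an already-open RTCDataChannel, while the throughput arithmetic is plain JS):

```javascript
// Pure helper: bytes transferred over elapsed seconds -> Mbit/s.
function throughputMbps(bytes, seconds) {
  return (bytes * 8) / seconds / 1e6;
}

// Browser-only part (hypothetical sketch): repeatedly send fixed-size chunks
// over an open RTCDataChannel, refilling whenever the buffer drains, and
// report throughput once totalBytes have been queued.
function pumpChannel(channel, chunkSize = 16 * 1024, totalBytes = 64 * 1024 * 1024) {
  const chunk = new Uint8Array(chunkSize);
  let sent = 0;
  const start = performance.now();
  channel.bufferedAmountLowThreshold = 1024 * 1024;
  const send = () => {
    // Keep the channel's buffer topped up without growing it unboundedly.
    while (sent < totalBytes && channel.bufferedAmount < 8 * 1024 * 1024) {
      channel.send(chunk);
      sent += chunkSize;
    }
    if (sent >= totalBytes) {
      const seconds = (performance.now() - start) / 1000;
      console.log(`~${throughputMbps(sent, seconds).toFixed(1)} Mbit/s`);
    }
  };
  channel.onbufferedamountlow = send;
  send();
}
```

With a large RTT and the default 128 KiB receiver window, a script like this reports rates far below the link capacity, which is the symptom reported here.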
Randall, I'm needinfo'ing you because this has a patch and you're the person I've interacted with most wrt webrtc - if you know a better person to help this bug along, please feel free to pass the needinfo along - thanks!
Component: Untriaged → WebRTC: Networking
Product: Firefox → Core
I'm not sure we want to unilaterally increase the window. Michael, any comments/suggestions/fixes?
Flags: needinfo?(rjesup) → needinfo?(tuexen)
Randell, the throughput is limited by min(my_send_window, peer_receive_window)/RTT. The submitted patch looks good for changing the send/receive window to 1 MB; you can set it to any number you think makes sense. The initial value was chosen conservatively, since SCTP will try to use the above limit if allowed by the congestion control. Whether this affects the media streams in the same peer connection is currently not taken into account. We might want to use a delay-based congestion control in the future, or couple the media-stream CC with SCTP's, but we have neither a concept nor code for that right now. The FreeBSD kernel uses something on the order of 2 MB for the send/receive buffers, but it doesn't deal with the impact on media streams. Does this help? Best regards, Michael
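Michael's bound can be made concrete with a quick calculation (a sketch with illustrative numbers, showing why the default 128 KiB window caps throughput at large RTT):

```javascript
// Upper bound on SCTP throughput: at most one full window of data can be in
// flight per round trip, so rate <= min(send_window, recv_window) / RTT.
function maxThroughputMbps(sendWindowBytes, recvWindowBytes, rttSeconds) {
  const windowBytes = Math.min(sendWindowBytes, recvWindowBytes);
  return (windowBytes * 8) / rttSeconds / 1e6;
}

const KiB = 1024, MiB = 1024 * 1024;

// Default 128 KiB window at 200 ms RTT: a ~5.2 Mbit/s ceiling.
console.log(maxThroughputMbps(128 * KiB, 128 * KiB, 0.2));
// 1 MiB window at the same RTT raises the ceiling ~8x, to ~41.9 Mbit/s.
console.log(maxThroughputMbps(1 * MiB, 1 * MiB, 0.2));
```

Note the min(): raising only one side's window does nothing, which is why both the send and receive windows need to change.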
Assignee: nobody → rjesup
Status: UNCONFIRMED → ASSIGNED
backlog: --- → webRTC+
Ever confirmed: true
Priority: -- → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Putting latency aside for a second, what might be interesting is that RAWRTC to RAWRTC results in ~310 Mbit/s throughput on the same machine when sending 1 GiB even though it uses the default receiver window (128 KiB) as well. Firefox 57 to Firefox 57 results in ~93 Mbit/s. Sending from Firefox 57 to RAWRTC (and vice versa) results in ~130 Mbit/s.
(In reply to Lennart Grahl from comment #5)
> Putting latency aside for a second, what might be interesting is that RAWRTC
> to RAWRTC results in ~310 Mbit/s throughput on the same machine when sending
> 1 GiB even though it uses the default receiver window (128 KiB) as well.
> Firefox 57 to Firefox 57 results in ~93 Mbit/s. Sending from Firefox 57 to
> RAWRTC (and vice versa) results in ~130 Mbit/s.

My guess would be that this is caused by additional buffering in Firefox e10s code (more specifically, the IPC code connecting the SCTP stack running in the child process with the actual network sockets running in the parent process). Lennart: since you have already run comparisons, can you check how the throughput changes with the patch applied?
Looks like in our tests we need to adjust the size of the big data here https://searchfox.org/mozilla-central/source/dom/media/tests/mochitest/dataChannel.js#206 and/or the low amount threshold in the same test to make that test still pass with this change.
Tested on my PC this time, so results are a little different:

- RAWRTC (128 KiB) to FF 58 (1 MiB): 238 Mbit/s
- RAWRTC (128 KiB) to FF 57 (128 KiB): 247 Mbit/s
- RAWRTC (128 KiB) to FF 58 (1 MiB, no e10s): 314 Mbit/s
- RAWRTC (128 KiB) to FF 57 (128 KiB, no e10s): 336 Mbit/s
- RAWRTC (128 KiB) to RAWRTC (128 KiB): 422 Mbit/s
- RAWRTC (1 MiB) to FF 58 (1 MiB): 245 Mbit/s
- RAWRTC (1 MiB) to FF 57 (128 KiB): 248 Mbit/s
- RAWRTC (1 MiB) to FF 58 (1 MiB, no e10s): 300 Mbit/s
- RAWRTC (1 MiB) to FF 57 (128 KiB, no e10s): 336 Mbit/s
- RAWRTC (1 MiB) to RAWRTC (1 MiB): 238 Mbit/s
- FF 58 (1 MiB) to FF 58 (1 MiB): 176 Mbit/s
- FF 58 (1 MiB) to FF 58 (1 MiB, no e10s): 210 Mbit/s
- FF 58 (1 MiB, no e10s) to FF 58 (1 MiB, no e10s): 272 Mbit/s
- (skipped testing FF 57 with e10s)
- FF 57 (128 KiB, no e10s) to FF 57 (128 KiB, no e10s): 285 Mbit/s

Results are a bit flaky, especially with non-e10s, so be aware they may not be entirely representative. When running non-e10s, both RAWRTC and FF are at nearly 100% CPU load (only one core), but I'm also seeing a lot of drops in CPU load; the more drops, the worse the final throughput. Overall, the results are puzzling to me, as the larger window seems to decrease throughput in a near-0-RTT scenario. Is that to be expected, Michael? However, this probably says nothing about large-RTT scenarios.
Flags: needinfo?(lennart.grahl) → needinfo?(tuexen)
Thanks for testing, Lennart! But these numbers don't look like we would actually gain much just from increasing the window size. If anything, it looks like we should adjust the window size depending on the RTT. Theoretically, ICE could tell us the RTT of the connection before SCTP kicks in, but that is a much more complex task.
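Adjusting the window to the RTT would amount to picking the bandwidth-delay product for a target rate. A hypothetical sketch (not proposed code; the target rate and clamping bounds are assumptions chosen for illustration):

```javascript
// Hypothetical: size the SCTP window to the bandwidth-delay product (BDP)
// for a target throughput at a measured RTT (e.g. from ICE), clamped between
// the current default (128 KiB) and an illustrative upper bound (4 MiB).
function windowForRtt(targetMbps, rttSeconds,
                      minBytes = 128 * 1024, maxBytes = 4 * 1024 * 1024) {
  // BDP in bytes: (bits per second / 8) * seconds of round trip.
  const bdpBytes = (targetMbps * 1e6 / 8) * rttSeconds;
  return Math.min(maxBytes, Math.max(minBytes, Math.ceil(bdpBytes)));
}

// Sustaining 100 Mbit/s across a 200 ms RTT needs a 2.5 MB window.
console.log(windowForRtt(100, 0.2)); // 2500000
// At 5 ms RTT the default 128 KiB already suffices, so the clamp keeps it.
console.log(windowForRtt(100, 0.005)); // 131072
```

This also shows why a single static value is a compromise: any fixed window is too small for some RTTs and wastefully large for others.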
Nils: this bug is about low rates at large RTT. A large RTT means you'll saturate your window and stop sending while waiting for acks. Lennart's tests were at ~0 RTT, and so should show little or no difference (perhaps even a negative one).
Like Randell said, my rather naive tests tell us nothing about large-RTT scenarios. What puzzles me is that throughput with a larger window is reduced slightly for FF (I would like to understand why that is the case, hence pinging Michael) and by a whopping amount for RAWRTC (although the RAWRTC case doesn't need to be discussed here). I'm thinking about creating a test setup for simulating delay but will talk to Michael first.
The send and receive buffer sizes should only affect performance substantially when the RTT is non-zero. So yes, it makes sense to set up a testbed for this. One can use dummynet to emulate delay. When running both applications on the same machine, I guess OS scheduling and other things like that dominate the performance.
These are my results of testing **in RAWRTC** with various window sizes and several RTT values: https://docs.google.com/spreadsheets/d/1Ze2hZl9KZJ1hKcm5Y9J0OIknQCBHUpw5eefskCc8xpM/edit?usp=sharing

I think we should at least go for 1 MiB window sizes by default, since it definitely has a positive impact on throughput, as you can see. However, the impact of large window sizes is not as large as I would have expected. If you look at my test results, there's definitely something odd going on (hence pinging Michael): CPU usage and throughput are erratic even though the RTT is constant. Interestingly, this only starts to happen once the windows are raised above 128 KiB.

Overall, CPU usage seems to be quite high (I ran this on an i7-6700 @ 3.4 GHz) relative to the throughput. Keep in mind that DTLS is also actively encrypting and decrypting packets (`ECDHE-ECDSA-AES128-GCM-SHA256` is the negotiated cipher suite), although I have the feeling this is not the reason for the high CPU usage. I've seen similar CPU usage in Firefox, so I don't expect it to behave much differently from RAWRTC, but I will do a sanity check.
Looks like 1 MiB would be a safe choice for now. It might be time to do some profiling to get an idea of where the stack spends most of its time.
Hi Lennart, when using large send/receive windows, are you observing packet drops? I'm wondering why the throughput is not as expected, and drops would be an explanation. So write a .pcapng file with the SCTP packets and see if you can find SACKs with gap reports. Please note that even an end host can drop packets in its UDP stack.
Comment on attachment 8923923 [details] Bug 1051685: increase SCTP window size from 128K to 1M https://reviewboard.mozilla.org/r/195080/#review242842
Michael and I had a brief email conversation. There is nothing obvious that would explain the low throughput. Sadly, I need to pause investigating this for now. I plan to come back to this in a few months time if possible. For now, I agree with Nils that we should increase the window sizes to 1 MiB.
I fully support trying to increase the window size - but it should be rigorously checked in terms of performance. You wouldn't want to introduce a change that for some reason increases cpu usage or decreases throughput for 10% of users.
From the email conversation between Michael and me, the traces indicate that the association is in congestion-avoidance mode and throughput increases, just very slowly (my tests were usually ~1 minute in length), and so does CPU usage. It very much looks like high throughput results in high CPU usage, but I haven't traced this, so it is simply what I observed. Since you obviously have some data channel use cases, I would suggest grabbing the package from the try build and checking for yourself. :)
(In reply to Shachar from comment #20)
> I fully support trying to increase the window size - but it should be
> rigorously checked in terms of performance.
> You wouldn't want to introduce a change that for some reason increases cpu
> usage or decreases throughput for 10% of users.

What do you suggest we do here, then?
Check throughput and CPU usage over the following matrix:
1. FF->FF, FF->Chrome, Chrome->FF
2. latency added: 0 ms, 10 ms, 50 ms, 150 ms, 250 ms, 500 ms
I tested our application and it behaves fine with the changes.

(In reply to Lennart Grahl from comment #14)
> These are my results of testing **in RAWRTC** with various window sizes and
> several RTT values:
> https://docs.google.com/spreadsheets/d/
> 1Ze2hZl9KZJ1hKcm5Y9J0OIknQCBHUpw5eefskCc8xpM/edit?usp=sharing

Can you please focus on the 200 ms and 500 ms cases - is it indeed the case that with the 1 MB window you get lower throughput there?
I honestly can't tell what's wrong with the 200 ms and 500 ms cases. But I don't think we should focus on that too much, since the throughput is already pretty low at that level. Like I said, the traces indicated that we were in congestion-avoidance mode, which is fine. Michael, is there anything you think should be investigated, or should we finally move forward and deploy 1 MiB as the new default?
Increasing the window only allows a higher bandwidth to be used. If that is fine (it might interact with media streams), then go ahead.
Let's get this upstream. :)
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/f7d1fd195963 increase SCTP window size from 128K to 1M. r=lgrahl