Right now we set TCP_NODELAY (via PR_SockOpt_NoDelay) for NSS sockets, but not for regular HTTP connections. Jonas has heard that other browsers set this, and that they get some performance benefit. If our HTTP request logic is doing multiple writes() smaller than an MSS (typically 1460 bytes on Ethernet, i.e. the 1500-byte MTU minus TCP/IP headers) for requests, we will in fact wind up waiting for an ACK from the server for earlier writes() before we send out the later ones. Setting TCP_NODELAY would let us send all those writes out immediately. We should get more data on this, do some benchmarking, and see if this is worth doing. Good first bug for anyone who knows TCP/IP well--you don't need to know many necko internals (just where to set the sockopt) to test/compare these.
Please don't use TCP_NODELAY if you do not know what you're doing - this is a typical beginner's mistake. It should only be done if you can write the HTTP request in a single write; otherwise each write would result in a separate packet. In the default case (without TCP_NODELAY), you are not waiting for the ACK, because the data should not have been written out yet (unless it was written more than 100 msec ago, or exceeds the MTU). Using TCP_NODELAY means that the application becomes responsible for the buffering. Be careful, sometimes the solution is worse than the problem.

I work for a major router manufacturer. I can assure you that we have code in the router to *reverse* this behavior in certain cases (packets are delayed and glued back together), although it's not very common. I have used TCP_NODELAY myself in code for telephone exchanges, but only in places where round-trip delay was important. And a few times I had to correct code from colleagues that used TCP_NODELAY, either by making sure that there was only a single write, or by removing the TCP_NODELAY altogether.

Personally I think it should have been called TCP_NOBUFFER; people seem to draw the wrong conclusion when they see the 'delay' word (especially manager types that lack actual developer or tester experience).
I think in most cases we are buffering writes internally, although that of course needs to be checked. But if we're not, we have a problem anyway, since the result is still that the initial packet goes out with less data than desired. Also note that latency is generally quite important. But yes, generally we should know what we're doing before we do something :)
I don't see any delays, though I don't have time to reverse engineer why that is. I looked at a lot of traces last winter and I never noticed Nagle-induced delays - and I certainly would have flagged them if I had, as the latency introduced would have killed mobile. Attached is a packet capture of a POST wherein the headers and the POST data all fit in one packet (worst case for TCP_NODELAY), and you'll see from the trace that not only are they in 1 packet but that packet goes out within a ms of the handshake being completed.
fwiw - Nagle probably doesn't impact the normal case for us. Nagle only introduces a delay if you want to send a runt packet while there is un-acked data outstanding. FF apparently coalesces writes at the application level (good!) and the request normally puts 1 packet or less of data onto an outbound stream that is either brand new or long-since fully acked (in the case of persistent connections), so it won't be delayed by Nagle.

There are some conditions I'll speculate could trigger the delay:

* Larger POST/PUT requests would obviously have to put many packets into the send stream. The last one in the request is likely a runt, and indeed is likely sent while the packet previous to it (at least) is unacked - assuming congestion control doesn't limit you anyhow - so that will result in a Nagle delay on the last packet of the POST. That's worth tracking down, though it is likely only a significant portion of the cost on a relatively small (though larger than 1 packet) POST - and that's the kind of thing that's likely to be congestion limited anyhow.

* Pipelined requests.

Jason, this is a good thought. Jo is right that we ought to know what we're doing. But that's not a reason for not knowing what we're doing :)
We've got verification that Chromium sets TCP_NODELAY:

http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/tcp_client_socket_win.cc?view=markup (Windows)
http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/tcp_client_socket_libevent.cc?view=markup (Mac & Linux)

I'm guessing there's not a huge win either way with the flag, and we shouldn't rush into blindly setting it, but we should investigate more (including making sure our internal buffering is good, some packet traces, etc.). I'm personally too busy with other things for now, but if someone else has the bandwidth, grab the bug and go for it.
Add'l comment via email from Artur Bergman below. The issue of long cookies (and/or long URIs) is an interesting one. It would be interesting to know how often these push GETs over a packet size.

-----

I've done research on this for Varnish on the server side. It makes a lot of sense for SSL, since the handshake is full of small packets. I think the issue is: if you have one outstanding full MSS unacked and you try to write a packet that isn't a full MSS, it will wait for the ack or 200ms. Let me think about it a bit more. Another detail has to do with the interaction of delayed acks; interesting comment from John Nagle: http://tech.slashdot.org/comments.pl?sid=174457&cid=14515105

I think this only really affects POSTs or a GET with a lot of cookies. (It would affect pipelining, but no one does that anyway.)
shaver also mentioned possibly using TCP_CORK on Linux.
Created attachment 512468 [details]
pcap showing nagle hurting pipeline latency

As I said in an earlier comment, this would probably be relevant to pipelines - and it is. You can see in the attached capture what happens starting at packet 10:

#10 - T0, request A sent in ~500 byte packet
#11 - T240, ACK rcvd
#12 - T240, requests B and C sent

B and C are obviously delayed waiting for an ack due to the Nagle algorithm - a runt packet (req A) is outstanding, which overrides CWND. We have shown that we do the appropriate coalescing within an individual request (i.e. the previous attachment), in which case we do want to disable Nagle - it is adding latency for us without addressing its primary use case (which is really interactive event sequences, i.e. bursts of tiny packets).
Created attachment 512469 [details]
pcap showing nagle off improving request latency

This capture was obtained while running a patch to turn Nagle off (i.e. nodelay on). Requests A, B, and C are sent in packets 4, 5, and 6. Packets 4 and 5 are close together; 6 follows about 140ms later, when the request is submitted to necko for processing - all of them within the ~240ms RTT (you can see it in the three-way handshake). The first response follows at just about 1 RTT from the submission of A. There is some junk after that due to some bugs in Amazon's infrastructure - it isn't germane to this bug (i.e. it is the same with or without Nagle).
Created attachment 512470 [details] [diff] [review] disable nagle on connection open v1
Comment on attachment 512470 [details] [diff] [review]
disable nagle on connection open v1

(bz: do you think this needs sr?)

OK, I think we're good to turn off Nagle. Note that your patch turns off Nagle for every socket that's opened with nsSocketTransportService2::createTransport, and that includes HTTP, HTTPS, FTP, and Websockets.

I've done a little more poking around, and I also think we're fine to turn Nagle off for HTTP. We always appear to coalesce small/medium-sized HTTP requests into a single sendto() call (I've verified with strace). Big POSTs turn into a series of sendto()s, but they're all (except possibly the last one) well above the MTU size, so irrelevant to Nagle. We're not going to suddenly be spewing lots of small packets onto the internet if we disable Nagle for HTTP.

HTTPS already disables Nagle for sslSockets. We open HTTPS connections with createTransport, but I'm not sure how/if they wind up mapping to sslSockets (SSL seems to be creating sockets of its own, too). It's possible that this patch needlessly duplicates the disable-Nagle system call for HTTPS sockets, and we might be able to avoid that (for instance, by setting 'delayDisabled=1' on any sslSocket we know comes via createTransport). But I'm not going to lose sleep over a single system call of overhead per socket.

I've looked at FTP in wireshark, and it also seems fine (lots of small packets, but in a call-response protocol, so nothing changes with Nagle disabled). I assume Websockets should want Nagle off by the very nature of the protocol. XHR is just layered on top of HTTP. Hopefully I'm not missing anything else. Comment 10 clearly shows pipelining will benefit from this. Yay!

Note on comment 3: Nagle blocks sendto/write calls with sz < MTU, not any small packet. I.e., the only time it blocks anything is if the program calls write() with a < MTU amount of data while other previous writes() are un-ACK'd.
So I'm a little confused by why you called your example of opening a socket and then immediately doing a < MTU-sized POST the "worst case" for TCP_NODELAY. The POST packet goes out immediately whether TCP_NODELAY is set or not, because it's the first sendto() call on the connection (Nagle doesn't delay a write just because the client's handshake SYN hasn't been ACK'd). Note that Nagle also doesn't delay 'runt' packets if they're part of the same write: when I did a trace on a POST that's just over MTU size, so it's split into 2 packets with the 2nd being < MTU, both packets went out immediately. Nagle could only bite us on large POSTs that get split into multiple write calls, where the last write call has a chance of being < MTU. Anyway, it's moot now that we're turning Nagle off :)
Didn't you mean to write

+ opt.value.no_delay = PR_TRUE;

instead of

+ opt.value.send_buffer_size = 1;

I know that it's a union, but it just looks wrong :-)
By the way, if we want to play things more conservatively, we could provide some sort of interface for telling nsSocketTransport(Provider) to disable Nagle, so the protocols for which we know there's a benefit (HTTP, Websockets) can use it, while the default for FTP and future protocols is to keep Nagle on. I don't think we need to go there from what I've seen. re: comment 13 - yeah, the union member looks wrong. I should really look over 3-line patches more carefully :) Thanks, Jo!
(In reply to comment #13) > > I know that it's a union, but it just looks wrong :-) yes, thanks. I can't figure out why I did that :)
> Note that your patch turns off Nagle for every socket that's opened with
> nsSocketTransportService2::createTransport, and that includes HTTP, HTTPS, FTP,
> and Websockets.

Yes. The default of nagle-on is totally backwards imo.

Q: What protocols benefit from Nagle?
A: Protocols that send repetitive tiny bursts of data and are willing to trade latency for bandwidth.

Honestly, I can't name any protocol on the 2010 Internet that generally wants to trade latency for bandwidth. The major lesson of the era is that our high-speed links are underutilized because of interactions with high-latency links. We've got a whole school of technologies that try to overcome that (pipelining, resource packages, Google's push to increase TCP initial windows, etc.). Even the traditional protocols Nagle is aimed at (one-keystroke-at-a-time telnet) are probably happy to trade the overhead of building a TCP packet for each data byte, because even with the packet overhead they run at rates so low compared to the link data rate. This is a different story than when Nagle was introduced - bandwidth is generally much more available.

Now this does mean that we need to be careful in the application to do coalescing ourselves instead of letting Nagle do it for us, but as you note we are already doing that, and the Nagle approach is suboptimal because it does not coalesce the first sub-packet-sized send() and imposes an RTT penalty to get the rest out. Honestly, if we were getting this wrong in the application I would rather be spewing small packets than absorbing RTT pauses - but we don't need to do either! Anyhow, if we do find something that really should run the other way around, I think it is fine to put the onus on that new technology to build an interface for setting the override.

> But I'm not going to lose
> sleep over a single system call overhead per socket.

Especially on an SSL socket, which has a lot of other overhead that dwarfs a system call.
> Note on comment 3: Nagle blocks sendto/write calls with sz < MTU, not any small
> packet. I.e., the only time it blocks anything if it the program calls write()
> with a < MTU amount of data, and there are other previous write() un-ACK'd. So
> I'm little confused by why you called your example of opening a socket and then
> immediately doing a < MTU-sized POST the "worst-case" for TCP_NODELAY. The
> POST packet goes out immediately whether TCP_NODELAY is set or not, because
> it's the first sendto() call on the connection (Nagle doesn't delay a write
> because the client's handshake SYN hasn't been ACK'd).

I wasn't very clear. What I was checking for in comment 3 was that necko was doing coalesced writes. So in that case it had a short amount of headers (let's say 700 bytes) and a short amount of POST data (let's say 200 bytes). If necko had done that with two writes and Nagle enabled (Nagle was enabled), you would see the post-data packet delayed until an ACK arrived for the headers, because the header packet was an outstanding 'runt' packet and the post-data packet was also a 'runt'. That's why I said it was a worst case: Nagle would kill it if the application didn't optimize the writes - but necko was successfully coalescing them into one packet, so I was arguing that disabling Nagle wouldn't introduce any additional packet overhead.

> Note that Nagle also doesn't delay 'runt' packets if they're part of the same
> write. When I did a trace on a POST that's just over MTU size, so it's split
> into 2 packets with the 2nd being < MTU, both packets go out immediately.

I don't think you're interpreting that right. Nagle allows one outstanding runt, so you got split into 1 full-size packet and 1 runt, and that is ok. The fact that they map to the same system call is irrelevant. Anyhow - on with the patch update!
Created attachment 520544 [details] [diff] [review] disable nagle on connection open v2 I assume this is r+ jduell as it matches the review comment. Now we just need somewhere to land this stuff :)
Comment on attachment 520544 [details] [diff] [review]
disable nagle on connection open v2

Patrick,

If a +r patch needs only minor fixes, you can carry forward the +r yourself (usually with a comment like "carrying forward jduell's +r").

Is this perhaps better wording for the comment?

// Disable Nagle (we do our own write aggregation). Keeping it on
// can delay pipelined requests that are smaller than an MTU, and/or
// the final segment of multi-segment POST/PUTs.

Your choice. (Your version implied an RTT delay, when it's actually MIN(RTT, 200 ms) - or, to be really specific, up to a 200ms delay depending on when the OS timer fires, IIRC; I think just "delay" is fine.)
Backed out on suspicion of causing tp4 regression: http://hg.mozilla.org/mozilla-central/rev/096bf9dcf26f
this is cleared of being the tp4 regression (it was 628561).. ready to push again
pushed to cedar http://hg.mozilla.org/projects/cedar/rev/af0342b4004d
Pushed to m-c: http://hg.mozilla.org/mozilla-central/rev/af0342b4004d