Last Friday I received email from someone privately testing a new spdy deployment with firefox. While troubleshooting an unrelated issue they noted a case where firefox sent a stream ID out of order.. it was a large ID (1009 iirc) in the middle of a big burst of streams.. it came in after 1021 or something along those lines.. everything else was ok. that's obviously not good.. not only is that just plain out of spec, but if the ID is out of order its extremely likely that the compression state was our of order too.. now we are doing only literal (i.e. literal copying) compression right now because of CRIME so it doesn't turn out to matter - but we want to keep that compression context in tact as the replacement scheme will likely be stateful with respect to ordering too. This didn't actually cause a problem for the reporter they just reset the stream and everything went on ok. If it was reset with REFUSED_STREAM then it would even be restarted and its also seems rare. I was eventually able to reproduce it by doing the following: * changing the socket send buffer to just 1000 bytes * lowering the default session output queue buffer size to 6000 bytes with its 4096 reserved tail. (normally 16KB with 4KB reserve) When that session buffer was stressed enough eventually the stream sending code deferred sending a blocked SYN_STREAM and managed to generate a stream id (and transmit it) for a totally different stream first before getting back to the deferred one (at which point they were out of order). The session buffer can indeed grow, but it prefers to push back on the streams rather than just buffer everything - this is really aimed at the request body generation code as those can be large. its pretty impressive that someone's installation with default socket send buffers and a 16KB session send buffer managed to trigger than condition with http requests. The fact that they are uncompressed right now (and not really sized with that in mind) contributes I'm sure but I wasn't able to come anywhere near reproducing with the defaults - but maybe there was a lower level network sending glitch that coincided with the issue. Anyhow - the fix is pretty straight forward. SYN_STREAMS contain session state and can't be reordered so when they are generated they need to be buffered or sent immediately, never blocked (and potentially reordered) the way a DATA frame can be. So I've added a couple flags to say "always buffer if you can't send" to those paths.. I can then confirm (with my buffer size changes) that I can trigger that path and no reordering occurs. I've also doubled the inital session send buffer size as we're dealing with uncompressed requests at the moment. As this isn't causing an actual interop problem I don't want to backport the whole patch - but the issue is serious enough that I think we should backport the increases in buffer sizes as that's very safe and should reduce the likelyhood of the rare event even further. patches to follow.
Attachment #693124 - Flags: review?(honzab.moz) → review+
Attachment #693128 - Flags: review?(honzab.moz) → review+
Comment on attachment 693128 [details] [diff] [review] buffer size workaround for backport version 0 [Approval Request Comment] the tl;dr here is that this is potentially a serious bug (in that it causes a failure - but reload will fix it) but in practice is extremely hard to produce and indeed the one production of it in the field is now believed to be correlated to testing of load balancer equipment at the site under test. Normally that would argue for not backporting it at all - but we can mitigate the issue even further (making it harder to trigger) by simply making an undersized buffer larger and that can be done very safely. so we should do that. Bug caused by (feature/regressing bug #): original spdy bug User impact if declined: risk of http transaction failure Testing completed (on m-c, etc.): hand testing Risk to taking this patch (and alternatives if risky): extremely low - all it does is make a buffer larger. Alternative would be just wontfix it beyond mozilla-central. String or UUID changes made by this patch: none
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Comment on attachment 693128 [details] [diff] [review] buffer size workaround for backport version 0 This isn't critical enough to take for our final beta, but we'll take this on Aurora 19.
You need to log in before you can comment on or make changes to this bug.