Created attachment 542757
User Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Build ID: 20110615151330
Steps to reproduce:
When I try to access the SMS-send page from a NowSMS installation using Firefox 5.0, I see the source of the page, not the actual page.
Firefox 5 displays the raw text received from the server, including the HTTP response, thus:
HTTP/1.1 200 OK
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 10">
<meta name=Originator content="Microsoft Word 10">
I sniffed the traffic with Wireshark (log attached) and it seems that the server sends the initial CRLF in a separate TCP packet from the following "HTTP/1.1 200 OK".
The browser should grab data until it has a full header before trying to parse it.
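To make "grab data until it has a full header" concrete, here is a minimal Python sketch. It is not Firefox's code: `read_header_block`, the `chunks` iterable (one item per network read), and the `limit` value are all hypothetical names chosen for illustration.

```python
def read_header_block(chunks, limit=65536):
    """Accumulate reads until a blank line terminates the header block.

    `chunks` is any iterable of bytes objects, one per network read
    (a stand-in for successive TCP segments).
    Returns (header_bytes, leftover_bytes), or (None, buffered_bytes) if the
    stream ends or `limit` is exceeded before a full header has arrived.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        end = buf.find(b"\r\n\r\n")
        if end != -1:
            # Everything up to and including the blank line is the header.
            return buf[:end + 4], buf[end + 4:]
        if len(buf) > limit:
            break  # give up rather than buffer without bound
    return None, buf
```

With this approach the result is the same whether the stray CRLF arrives in its own read or together with the status line, because parsing only starts once the whole block is buffered.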
Patrick, could you take a look?
I'm not sure why the server is sending anything at all before "HTTP/1.1 200 OK", though... That seems weird.
In particular, I would expect that that bogus CRLF leads us to fall back to HTTP 0.9 in this situation, and there are no headers in HTTP 0.9.
(In reply to comment #2)
> In particular, I would expect that that bogus CRLF leads us to fall back to
> HTTP 0.9 in this situation, and there are no headers in HTTP 0.9.
No doubt. I can't say I've ever seen breakage quite like that before the first response on a connection.
The options I see:
a] not worry about it as the use case is broken and (seems) rare.
b] implement a general "toss leading whitespace" routine before parsing the status line in all circumstances.
c] if buf == whitespace then push it into the "look for http in this stream of stuff" routine... pushing everything through that routine in the past led to some problems (stuff that really was 0.9 was not being identified that way, iirc)
I guess I'd favor b or a.
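For illustration only, option b] could look roughly like the Python sketch below. This is an assumption about the shape of such a routine, not the actual parser; `parse_status_line` is a hypothetical name.

```python
def parse_status_line(data: bytes):
    """Sketch of option b]: toss leading whitespace, then parse the status line.

    Returns (version, status_code, reason), or None if `data` does not yet
    hold a complete HTTP/1.x status line.
    """
    stripped = data.lstrip(b" \t\r\n")            # drop any leading whitespace/CRLF
    line, sep, _rest = stripped.partition(b"\r\n")
    if not sep or not line.startswith(b"HTTP/"):
        return None                               # incomplete, or not a status line
    parts = line.split(b" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    version = parts[0].decode("latin-1")
    reason = parts[2].decode("latin-1") if len(parts) == 3 else ""
    return version, int(parts[1]), reason
```

Note that a read consisting only of whitespace yields None here, so the caller would have to keep reading rather than decide on HTTP 0.9 at that point.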
Patrick, would we also fall back to HTTP 0.9 if the CRLF came in the same packet as the status line? Or do we skip over the CRLF in that situation?
(In reply to comment #4)
> Patrick, would we also fall back to HTTP 0.9 if the CRLF came in the same
> packet as the status line? Or do we skip over the CRLF in that situation?
Commonly we would skip up to 4 bytes of preamble when matching against the read that delivers the 'HTTP/1.' prefix. Partial reads that contain only a subset of the status line (e.g. 'HT') are accepted, but they do not tolerate any preamble. (If they turn out not to be HTTP, I believe they fail at a later time.)
It's all rather byzantine and inconsistent if the server is not in spec - I don't really know where it all originates from.
I do know that around 4.0 many more cases were pushed into the more generous parser (the one that ignores content bodies on preceding 304s, for example), and that actually produced regressions (I think of the form 'this is really 0.9 but you are declaring otherwise'). So we went back to the strange set of rules for any case that wasn't specifically targeted (i.e. the no-content-with-content case).
OK. I guess what bothers me is when behavior depends on packet boundaries. Any time that happens, it feels like a definite bug.
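A toy Python model of the first-read rules as described in the reply above shows the dependence on packet boundaries. This only mirrors the prose description (skip up to 4 bytes of preamble on a read containing 'HTTP/1.', no preamble on partial reads); it is not the real code, and `looks_like_http` is a hypothetical name.

```python
MARKER = b"HTTP/1."
MAX_PREAMBLE = 4  # "up to 4 bytes of preamble", per the description above


def looks_like_http(first_read: bytes) -> bool:
    """Toy model: does the first read on a connection look like HTTP/1.x?"""
    # Case 1: the read contains the whole marker after a short preamble.
    pos = first_read.find(MARKER)
    if 0 <= pos <= MAX_PREAMBLE:
        return True
    # Case 2: a partial read is accepted only if it is a strict prefix of the
    # marker, i.e. no preamble at all is tolerated.
    return 0 < len(first_read) < len(MARKER) and MARKER.startswith(first_read)
```

Under this model the same byte stream is classified differently depending on how it is split: CRLF plus status line in one read is accepted, while a lone CRLF in the first read is not, which matches the reported symptom.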
The spec says to skip a limited amount of whitespace on the server side:
In the interest of robustness, servers SHOULD ignore at least one
empty line received where a Request-Line is expected. In other
words, if the server is reading the protocol stream at the beginning
of a message and receives a CRLF first, it SHOULD ignore the CRLF.
Should this be spec'd for the client side too?
I agree that making behaviour dependent on packet boundaries isn't great.
(In reply to comment #7)
> Should this be spec'd for the client side too?
I really dislike specs that say "sender MUST NOT do foo" but "receiver must tolerate foo if A sends it anyhow". At its most basic, that's just silly and results in unmanageable test matrices.
It's unpleasant, but it's a fact of life, especially with widely deployed protocols. We have to specify things so that both senders and receivers know what to expect; leaving it up to statements like "X MUST NOT appear on the wire" leaves too many questions and hurts interop (as we've seen many times).
Anyway, getting off-topic here; will take it to the httpbis list. Thanks,
Just for info:
1) The source of the problem pages is NowSMS 2010.11.4. They have fixed this bug in a more recent release. (http://www.nowsms.com/nowsms-update-2011-03-21). I note the bugfix response is "Web Interface: Fix for problem introduced in 2010.11.04 version where the web interface was not working properly with Firefox." instead of "fix HTTP specification compliance failure". Sigh.
2) Unfortunately, Internet Explorer renders the page fine!