Bug 668168 - If the HTTP response is fragmented, Firefox fails to parse the content-type
Status: RESOLVED INVALID
Product: Core
Classification: Components
Component: Networking: HTTP
Version: 5 Branch
Platform: All Other
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
: Patrick McManus [:mcmanus]
Mentors:
Depends on:
Blocks:
Reported: 2011-06-29 03:28 PDT by Hayden Clark
Modified: 2016-02-09 11:54 PST

Attachments
fragmentation-breaks-response-content-type-parse-marked.pcap (4.09 KB, application/octet-stream)
2011-06-29 03:28 PDT, Hayden Clark

Description Hayden Clark 2011-06-29 03:28:05 PDT
Created attachment 542757 [details]
fragmentation-breaks-response-content-type-parse-marked.pcap

User Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Build ID: 20110615151330

Steps to reproduce:

When I try to access the SMS-send page from a NowSMS installation using Firefox 5.0, I see the source of the page, not the actual page.


Actual results:

Firefox 5 displays the raw text received from the server, including the HTTP response, thus:

HTTP/1.1 200 OK
Connection: Keep-Alive
Content-type: text/html
Content-Length: 2720

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 10">
<meta name=Originator content="Microsoft Word 10">
<title>Now SMS</title>
[snip]

I sniffed the traffic with Wireshark (log attached) and it seems that the server sends the initial CRLF in a separate TCP packet from the following "HTTP/1.1 200 OK". 


Expected results:

The browser should grab data until it has a full header before trying to parse it.
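
A minimal sketch of what "grab data until it has a full header" could look like, in Python. The function name `read_header_block` and the list of byte chunks standing in for socket reads are hypothetical, chosen only for illustration. Buffering like this removes the dependence on packet boundaries, but the buffered data would still begin with the stray CRLF, which has to be handled separately.

```python
def read_header_block(chunks):
    """Accumulate successive reads until a blank line ends the header block.

    `chunks` is any iterable of bytes objects standing in for socket reads
    (an assumption for this sketch). Returns (header_bytes, leftover_bytes),
    or None if the stream ends before the headers are complete.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        end = buf.find(b"\r\n\r\n")
        if end != -1:
            # Include the terminating blank line in the header block.
            return buf[:end + 4], buf[end + 4:]
    return None


# The fragmentation reported here: CRLF alone, then the status line.
reads = [
    b"\r\n",
    b"HTTP/1.1 200 OK\r\nContent-type: text/html\r\n",
    b"\r\n<html>",
]
result = read_header_block(reads)
```
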
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2011-06-29 07:08:15 PDT
Patrick, could you take a look?

I'm not sure why the server is sending anything at all before "HTTP/1.1 200 OK", though...  That seems weird.
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2011-06-29 07:20:41 PDT
In particular, I would expect that that bogus CRLF leads us to fall back to HTTP 0.9 in this situation, and there are no headers in HTTP 0.9.
Comment 3 Patrick McManus [:mcmanus] 2011-06-29 08:34:26 PDT
(In reply to comment #2)
> In particular, I would expect that that bogus CRLF leads us to fall back to
> HTTP 0.9 in this situation, and there are no headers in HTTP 0.9.

no doubt. I can't say I've ever seen breakage quite that way before the first response on a connection.

We could:
 a] not worry about it as the use case is broken and (seems) rare.
 b] implement a general "toss leading whitespace" routine before parsing the status line in all circumstances.
 c] if buf[0] == whitespace then push it into the "look for http in this stream of stuff" routine... pushing everything through that routine in the past led to some problems (stuff that really was 0.9 was not being identified that way iirc)

I guess I'd favor b or a.
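
For illustration only, option b could be sketched roughly as below. This is Python rather than the actual Necko C++ code, and the names `strip_leading_whitespace` and `parse_status_line` are hypothetical; it assumes the caller has already buffered at least the whole status line.

```python
def strip_leading_whitespace(buf: bytes) -> bytes:
    """Drop any leading spaces, tabs, CRs and LFs."""
    return buf.lstrip(b" \t\r\n")


def parse_status_line(buf: bytes):
    """Return (version, status, reason), or None if no status line is found.

    Leading whitespace is tossed first, in all circumstances, so a stray
    CRLF before "HTTP/1.1 200 OK" no longer changes the outcome.
    """
    data = strip_leading_whitespace(buf)
    line, sep, _rest = data.partition(b"\r\n")
    if not sep or not line.startswith(b"HTTP/"):
        return None  # incomplete line, or not an HTTP/1.x response
    parts = line.split(b" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    version = parts[0].decode("latin-1")
    reason = parts[2].decode("latin-1") if len(parts) > 2 else ""
    return version, int(parts[1]), reason


parsed = parse_status_line(b"\r\nHTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\n")
```
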
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2011-06-29 08:39:25 PDT
Patrick, would we also fall back to HTTP 0.9 if the CRLF came in the same packet as the status line?  Or do we skip over the CRLF in that situation?
Comment 5 Patrick McManus [:mcmanus] 2011-06-29 09:00:39 PDT
(In reply to comment #4)
> Patrick, would we also fall back to HTTP 0.9 if the CRLF came in the same
> packet as the status line?  Or do we skip over the CRLF in that situation?

commonly we would skip up to 4 bytes of preamble before matching against the read that delivers the 'HTTP/1.'. Partial reads that contain only a subset of the status line (e.g. 'HT') are accepted - but they do not tolerate any preamble. (If they turn out not to be HTTP I believe they fail at a later time.)

It's all rather byzantine and inconsistent if the server is not in spec - I don't really know where it all originates from.

I do know that around 4.0 many more cases were pushed into the more generous parser (the one that ignores content bodies on preceding 304's for example) and that actually produced regressions (I think of the form 'this is really 0.9 but you are declaring otherwise') so we went back to the strange set of rules for any case that wasn't specifically targeted (i.e. the no-content with content case).
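
A toy model of the rules as described in this comment, written in Python. It is not the Firefox code: the constant names and `first_read_looks_like_http` are hypothetical, and it looks only at the first read on a connection. It shows how the same bytes can be classified differently depending on where the reads are split.

```python
PREFIX = b"HTTP/1."
MAX_PREAMBLE = 4  # bytes of junk tolerated before a complete prefix


def first_read_looks_like_http(first_read: bytes) -> bool:
    """Classify a response from its first read only (toy model)."""
    # Complete prefix in this read: up to MAX_PREAMBLE leading bytes are skipped.
    pos = first_read.find(PREFIX)
    if 0 <= pos <= MAX_PREAMBLE:
        return True
    # Partial prefix such as b"HT": accepted, but only with no preamble at all.
    if 0 < len(first_read) < len(PREFIX) and PREFIX.startswith(first_read):
        return True
    # Anything else would be treated as HTTP/0.9 in this model.
    return False


same_packet = first_read_looks_like_http(b"\r\nHTTP/1.1 200 OK\r\n")
split_packet = first_read_looks_like_http(b"\r\n")
partial_prefix = first_read_looks_like_http(b"HT")
```

In this model `same_packet` and `partial_prefix` are true while `split_packet` is false, which matches the behaviour reported in this bug.
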
Comment 6 Boris Zbarsky [:bz] (still a bit busy) 2011-06-29 09:14:12 PDT
OK.  I guess what bothers me is when behavior depends on packet boundaries.  Any time that happens, it feels like a definite bug.
Comment 7 mnot 2011-06-29 15:44:02 PDT
Spec is to skip a limited amount of whitespace on the server side:

   In the interest of robustness, servers SHOULD ignore at least one
   empty line received where a Request-Line is expected.  In other
   words, if the server is reading the protocol stream at the beginning
   of a message and receives a CRLF first, it SHOULD ignore the CRLF.

http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-14#section-3.1

Should this be spec'd for the client side too?

I agree that making behaviour dependent on packet boundaries isn't great.
Comment 8 Patrick McManus [:mcmanus] 2011-06-29 15:59:24 PDT
(In reply to comment #7)

> Should this be spec'd for the client side too?

no.

I really dislike specs that say "sender MUST NOT do foo" but "receiver must tolerate foo if A sends it anyhow". At its most basic, that's just silly and results in unmanageable test matrices.
Comment 9 mnot 2011-06-29 16:11:58 PDT
It's unpleasant, but it's a fact of life, especially with widely deployed protocols. We have to specify things so that both senders and receivers know what to expect; leaving it up to statements like "X MUST NOT appear on the wire" leaves too many questions and hurts interop (as we've seen many times).

Anyway, getting off-topic here; will take it to the httpbis list. Thanks,
Comment 10 Hayden Clark 2011-06-30 03:09:09 PDT
Just for info:

1) The source of the problem pages is NowSMS 2010.11.4. They have fixed this bug in a more recent release. (http://www.nowsms.com/nowsms-update-2011-03-21). I note the bugfix response is "Web Interface: Fix for problem introduced in 2010.11.04 version where the web interface was not working properly with Firefox." instead of "fix HTTP specification compliance failure". Sigh.
2) Unfortunately, Internet Explorer renders the page fine!
