Closed Bug 363109 Opened 18 years ago Closed 14 years ago

body of HTTP 304 response is treated as a HTTP/0.9 response to subsequent HTTP request

Categories

(Core :: Networking: HTTP, defect)


Tracking


RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: robert, Assigned: michal)

References


Details

Attachments

(4 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.8) Gecko/20061025 Firefox/1.5.0.8
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.8) Gecko/20061025 Firefox/1.5.0.8

Reloading is extremely slow, and the page is rendered incorrectly after reloading, apparently because the CSS and/or JS get corrupted.

On the surface this looks like bug #222457, which was resolved as a duplicate of bug #227976. What I'm reporting here only happens when you do a reload: the browser sends an If-Modified-Since request, the server gives an HTTP 304 reply, and then the CSS and JS files appear to get corrupted. Perhaps this is a corner case that wasn't fixed in bug #227976?

Reproducible: Always

Steps to Reproduce:
1. Go to the given URL.
2. Press reload.

Actual Results:  
The page first loads fast and renders fine. Then, after reload is hit, the page loads dead slow and renders incorrectly.

Expected Results:  
The page should load fast and render correctly after a reload as well.
The URL https://www.fondsfinans.no/ff/viewIndex.do given as a testcase below has since been fixed to work around the bug, so it can no longer be used to demonstrate the bug.
Hi,

We can confirm that this issue still exists in Mozilla 2.0.0.6. Has anyone looked into this?

Damon.
Application Development Team
Amnesty International
> Does it occur with a trunk build too?

I can confirm that the problem still exists with Minefield 3.0a8pre.

Damon.
Are you able to attach a minimal testcase or put up a testcase that can stay around for some time?
Ok, we have put together a testcase demonstrating the problem here. Can you confirm you can reproduce the issue?

https://195.234.175.134/test_deflate/ 

Thanks,
Damon.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a8pre) Gecko/2007091207 Minefield/3.0a8pre ID:2007091207

With the above URL, I do see the page load quickly with a Ctrl+F5 but more slowly with a normal F5.

I do not see that the "page is rendered incorrectly after reloading, apparently due to the CSS and/or JS getting corrupted."
But with Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.7pre) Gecko/2007081203 BonEcho/2.0.0.7pre I do see the incorrect rendering / CSS problems.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a8pre) Gecko/2007091207 Minefield/3.0a8pre ID:2007091207

I made a new profile, then deleted the BBC News RSS bookmark, turned off Google anti-phishing and disabled all update checks for Firefox/search engines/extensions. Then, using http://www.mozilla.org/projects/netlib/http/http-debugging.html, I made an HTTP log of me:
1. Starting firefox
2. Going to https://195.234.175.134/test_deflate/ (page loads quickly)
3. Pressing F5 for a refresh (page loads slowly)

After step 2 I made a note of how many lines had been logged so far. Then after step 3 I killed the browser window and deleted from the HTTP log all the lines caused by steps 1 and 2. So the attached log is relevant only to step 3.

On branch, step 3 takes longer than expected and there are rendering problems (looks like the page isn't getting styled). On trunk, step 3 renders fine, but still takes longer than expected to load.
The branch manifestation of this problem (CSS appearing not to load, page unstyled) may have been fixed on trunk by bug 381812.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.7pre) Gecko/2007081203 BonEcho/2.0.0.7pre

Same as comment 9, but with a branch build (Page is slow to load _and_ rendering screws up).
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee: nobody → michal
Status: ASSIGNED → NEW
It took me quite a lot of time to inspect this, but I finally know what is going on here. The problem is on the server side. I put some logging of received data into nsSocketTransport and found out that the server's response is:

HTTP/1.1 304 Not Modified\r\n
Date: Wed, 12 Mar 2008 09:54:41 GMT\r\n
Server: Apache-Coyote/1.1\r\n
Vary: Accept-Encoding,User-Agent\r\n\r\n
%1F%8B%08%00%00%00%00%00%00%03%03%00%00%00%00%00%00%00%00%00

The message body is binary, but I escaped it to be able to post it.
According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.5,
a 304 reply MUST NOT contain a message body. This binary data is actually gzipped "nothing": it decompresses to an empty file. And this is what happens in Firefox:

1) Some nsHttpTransaction performs a request with an "If-Modified-Since" header.
2) The server responds 304 with a message body.
3) The nsHttpTransaction consumes data up to the message body.
4) A new nsHttpTransaction that reuses this connection performs some request, and the reply will always begin with the message body from the previous request. This is identified as an HTTP/0.9 response, which automatically means status 200, and all received data is treated as the content of the requested file, which can look like:

%1F%8B%08%00%00%00%00%00%00%03%03%00%00%00%00%00%00%00%00%00
HTTP/1.1 304 Not Modified\r\n
Date: Wed, 12 Mar 2008 09:54:41 GMT\r\n
Server: Apache-Coyote/1.1\r\n
Vary: Accept-Encoding,User-Agent\r\n\r\n
%1F%8B%08%00%00%00%00%00%00%03%03%00%00%00%00%00%00%00%00%00

This is why the CSS and JS can't be parsed in the testcase.

IIRC an HTTP/0.9 server should close the connection just after the file is served. So Firefox is waiting for the connection to close and the server is waiting for another request. The connection times out after a while. That's why it "hangs" for a while.
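For reference, the stray bytes really are a complete gzip member that inflates to nothing. A quick standalone check with zlib (illustration only, not part of the Firefox code) looks like this:

#include <cassert>
#include <cstring>
#include <zlib.h>

// Verify that the 20 escaped bytes from the 304 body above form a valid gzip
// member whose decompressed payload is empty ("gzipped nothing").
int main()
{
    static const unsigned char body[] = {
        0x1F, 0x8B, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03,
        0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    };

    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    int rc = inflateInit2(&zs, 16 + MAX_WBITS);  // 16 + MAX_WBITS selects the gzip wrapper
    assert(rc == Z_OK);

    unsigned char out[16];
    zs.next_in = const_cast<unsigned char *>(body);
    zs.avail_in = sizeof(body);
    zs.next_out = out;
    zs.avail_out = sizeof(out);

    rc = inflate(&zs, Z_FINISH);
    // Z_STREAM_END means the member is complete; total_out == 0 means it
    // decompresses to zero bytes.
    assert(rc == Z_STREAM_END && zs.total_out == 0);

    inflateEnd(&zs);
    return 0;
}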


I see 2 solutions:

1) Close this bug as invalid.

2) Once we detect either a 1.0 or 1.1 response on an HTTP connection, we no longer allow any further response to be detected as 0.9. Instead of assuming an HTTP/0.9 response, the connection can be closed immediately and the status set to some 4XX error.

Any opinions?
Thanks for the bug hunting and for the nice summary, Michal.

It seems the server-side problem is widespread.
1. Any CGI/PHP-script may cause it, see for instance:
https://issues.apache.org/bugzilla/show_bug.cgi?id=40953
2. Anything that uses Tomcat may cause it, see for instance:
https://issues.apache.org/bugzilla/show_bug.cgi?id=44448

It's going to take forever to get all the server-side issues fixed and deployed, so my vote is for a fix in the browser. Quoting the late Jon Postel:
"Be liberal in what you accept, and conservative in what you send."

Regards, Robert.
Attached file testcases
I have created a set of testcases for this bug. They are CGI scripts written in bash. All responses with status 200 contain a "Last-Modified" header with a fixed date. If there is an "If-Modified-Since" header in the request, then status 304 is returned, followed by a message body.

Testcases are available also here:
http://michal.lightcomp.cz/bug363109/img/test.html
http://michal.lightcomp.cz/bug363109/iframe/test.html
http://michal.lightcomp.cz/bug363109/css/test.html

Usage is simple. Delete the cache and load the page (or reload it using shift-reload), then reload the page. The excess message body contains data chosen so that the bug can be checked visually.
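For illustration, the broken server behaviour the testcases simulate boils down to the following sketch (the attached scripts are bash CGI; this hypothetical C++ CGI equivalent only shows the idea, and the date, content type and body are made up):

#include <cstdio>
#include <cstdlib>

int main()
{
    // The gzip encoding of zero bytes, i.e. the stray body seen in the logs.
    static const unsigned char kGzipEmpty[] = {
        0x1F, 0x8B, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03,
        0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    };

    // CGI: conditional request headers arrive as environment variables.
    if (std::getenv("HTTP_IF_MODIFIED_SINCE")) {
        // Broken behaviour under test: a 304 followed by a message body.
        std::fputs("Status: 304 Not Modified\r\n\r\n", stdout);
        std::fwrite(kGzipEmpty, 1, sizeof(kGzipEmpty), stdout);
    } else {
        // Fixed Last-Modified date so the next reload sends If-Modified-Since.
        std::fputs("Status: 200 OK\r\n"
                   "Last-Modified: Mon, 01 Jan 2007 00:00:00 GMT\r\n"
                   "Content-Type: text/css\r\n\r\n", stdout);
        std::fputs("body { background: green; }\n", stdout);
    }
    return 0;
}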

I've been testing the behavior of Firefox 3, IE6, Konqueror 3.5 and Opera 9. All browsers except Opera have problems with these testcases. From sniffed traffic I can see that Opera tries to download the problematic files again. It uses HTTP pipelining and 4 connections to the server, so the whole process for the first testcase looks like:

Connection 1:
-> GET test.html (doesn't contain If-Modified-Since)
-> GET img01.png
-> GET img05.png
-> GET img09.png
-> GET img13.png
<- 200 for test.html and message body
<- 304 for img01.png
<- unexpected message body for img01.png => close connection

Connections 2,3,4 do the same for other pictures.

Connection 5:
-> GET img05.png
-> GET img09.png
... and so forth

Almost all pictures are requested from the server at least twice. Repeating a request can be problematic in the case of posting data from forms. I tried a lot of tests and it seems that Opera always uses a new connection when submitting a form. By contrast, Firefox can reuse any connection when submitting a form.

IMHO to fix this bug Firefox should implement the same logic as Opera. Any opinions?
Is there any way to just read-and-ignore the bogus 304 body and move on with life?  Would that solve the problem?
But we don't know the length of the body. We don't even know that there is such a body. Or, more precisely, we only know it once the response to the next request is being read.
Hmm.  So we just have no way to tell whether the following bytes are part of the 304, especially on a persistent (or worse yet pipelined) connection?  Or rather we can tell that there are bytes there, and that we need to stop using this connection, but that's all we can tell?  Just killing the connection will apparently suck on mobile, but if it's the best we can do, we should do it, I guess.
Hi - that server response is, as already noted, pretty badly screwed up and hard to work around.

1] 304s are de jure 0-byte bodies.

If we wanted to be liberal in parsing it, always a good idea, we could apply the usual message delimiter rules (chunked encodings, content-length, MIME, etc.) - BUT as c#16 indicates, this response doesn't provide any of that either. If we took it at face value (ignoring that face value says it is 0 bytes by spec) it indicates delimitation via EOF - meaning read all the following data until the server closes. That is obviously too much data (it is not only the gzip stuff, but the next response, etc.)

2] I'm not even clear if there is an empty CRLF between the header and "body" - I guess there probably is from the behavior reported.

3] Michal makes the awesome observation that it is really 0 bytes of data gzipped - but the response headers don't label it as compressed, so it is pretty hard to infer that in the code.

I believe option #2 in comment #12 is a reasonable approach. Assuming HTTP/0.9 is extremely dangerous (because * qualifies as 0.9, and * is more likely to be junk than 0.9 in real life). If there is something concrete available to rule it out (in this case, that the 0.9 is found on a persistent connection as transaction > 1), I agree it is best to maybe consider it "connection closed" so that the transaction gets rescheduled on a new connection.

As for performance and mobile, yes it will suck when it comes up. But the 0.9 strategy is no better as 0.9 has no provision for persistent connections and would result in closing the connection as well. 

The ultimate A++ workaround would be: 
 a] treat the 304 as usual - it has no message body, so don't try to read one
 b] have the next transaction determine from the first few bytes whether it is HTTP >= 1.0 or not; if not, see if we can identify from a magic number whether this is a self-delimited format such as gzip, and if so read it to the end and throw it all away; goto b.

Most clients do a variation of [b] for runs of whitespace (off the top of my head I don't recall if Firefox does) as a workaround for a very common kind of illegal HTTP response. In a previous life I did a similar thing for XML documents, which are also self-delimited.
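A rough sketch of step b] (not the Necko implementation; the function name and surrounding I/O are hypothetical): check for the gzip magic number at the head of the reused connection's data and, if it is there, inflate and discard that member before retrying the status-line parse.

#include <cstring>
#include <zlib.h>

// Returns the number of leading bytes that belong to a stray, complete gzip
// member at the start of buf, or 0 if buf does not start with one (no gzip
// magic number, malformed data, or the member is not complete yet).
static size_t SkipLeadingGzipMember(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != 0x1F || buf[1] != 0x8B)
        return 0;                          // no gzip magic number

    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    if (inflateInit2(&zs, 16 + MAX_WBITS) != Z_OK)  // expect the gzip wrapper
        return 0;

    zs.next_in = const_cast<unsigned char *>(buf);
    zs.avail_in = static_cast<uInt>(len);

    unsigned char scratch[512];
    int rc;
    do {
        zs.next_out = scratch;             // decompressed output is thrown away
        zs.avail_out = sizeof(scratch);
        rc = inflate(&zs, Z_NO_FLUSH);
    } while (rc == Z_OK);

    size_t consumed = (rc == Z_STREAM_END) ? (len - zs.avail_in) : 0;
    inflateEnd(&zs);
    return consumed;                       // caller resumes header parsing here
}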
Ok.  Sounds like we should do either option 2 from comment 12, or Patrick's suggestion if we can do it reasonably...
(In reply to comment #18)
> I believe option #2 in comment #12 is a reasonable approach. assuming HTTP/0.9
> is extremely dangerous (because * qualifies as 0.9 and * is more likely to be
> junk than 0.9 in real life). If there is something conrete available to rule it
> out (in this case that 0.9 is found on a persistent connection as transaction >
> 1), I agree it is best to maybe consider it "connection closed" so that the
> transaction gets rescheduled on a new transaction.

And what about POSTs? Is it a security problem to re-post a form? If yes, then a new connection must always be used when submitting a form, as noted in comment #14.

> The ultimate A++ work around would be: 
>  a] treat the 304 as usual - it has no message body, don't try and read one
>  b] have the next transaction determine if it is HTTP >= 1.0 or not from the
> first few bytes.. if not see if we can identify from a magic number if this is
> a self delimited format such as gzip and if so read it to the end and throw it
> all away. goto b.

I'm not sure if the junk should be skipped somehow intelligently or if we should simply search for "HTTP/1.".
Resending POSTs is bad.  It's not a security problem to re-post, but it can cause reprocessing of the transaction.  There must be no automatic re-posting, ever.
I agree that a PUT/POST/etc. in this situation should just end in a 500, while a GET/HEAD/etc. can be retried.

There is no need to always do the POST on a fresh connection - that incurs a performance penalty on all POSTs in order to work around an invalid protocol implementation. I'm all for working around broken implementations where it can be done, but not at the cost of optimizations for valid implementations. It's not like this bug is the common case.

The suggestion of just scanning ahead for "HTTP/1." is a good one - it's just a heuristic for a broken server, after all - and doing so probably makes the fresh-connection discussion moot.

Attached patch fix v1 (obsolete) — Splinter Review
This patch fixes the problem so that once a connection is detected as HTTP/1.x, no further response can be assumed to be an HTTP/0.9 response, and the string "HTTP/1." is searched for instead.

First I just tried to reset the connection and reschedule the request on another connection. I was hoping that this could be achieved by returning the right error value when processing the content in nsHttpTransaction. It seems that I would need to implement this functionality in nsHttpChannel. But in the end I think that searching for "HTTP/1." is enough.

This patch fixes bug #375701 too.
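To make the approach concrete, here is a standalone sketch of the search (illustration only, not the code in the attached patch; in particular the real patch also has to handle the "HTTP/1." token straddling two socket reads, which is what the mLineBuf handling in the review below is about):

#include <cctype>
#include <cstddef>

static const char kToken[] = "HTTP/1.";

// Returns the offset of a case-insensitive "HTTP/1." in buf, or len if it is
// not found; everything before that offset is junk left over from the
// previous response and can be discarded.
static size_t LocateHttp1Start(const char *buf, size_t len)
{
    const size_t tokenLen = sizeof(kToken) - 1;
    for (size_t i = 0; i + tokenLen <= len; ++i) {
        size_t j = 0;
        while (j < tokenLen &&
               std::toupper(static_cast<unsigned char>(buf[i + j])) == kToken[j])
            ++j;
        if (j == tokenLen)
            return i;   // start of a real status line
    }
    return len;         // not found (a partial match at the end of the buffer
                        // must be remembered for the next read)
}

On the corrupted byte stream shown in the analysis above, this returns 20: the stray gzip bytes are skipped and parsing starts at the real "HTTP/1.1 304 Not Modified" status line.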
Attachment #343299 - Flags: review?(bzbarsky)
Comment on attachment 343299 [details] [diff] [review]
fix v1

Christian would be a much better reviewer for this...
Attachment #343299 - Flags: review?(bzbarsky) → review?(cbiesinger)
OS: Linux → All
Hardware: x86 → All
So won't the attached patch break if the response-body contains the string "HTTP/1."?  I think we should just kill the connection if this happens instead of trying to guess like that...

One question: what does Chrome (for which we can read the network stack source) do?  I'm curious how Darin solved this over there, if at all.
Blocks: 264354
Summary: gzipped CSS and JS over SSL fails when reloading → body of HTTP 304 response is treated as a HTTP/0.9 response to subsequent HTTP request
(In reply to comment #27)
> So won't the attached patch break if the response-body contains the string
> "HTTP/1."?  I think we should just kill the connection if this happens instead
> of trying to guess like that...

You're right. But it will work in most cases. Each time I saw this problem, there was an unexpected gzipped body. This doesn't mean that a response body can't contain the string "HTTP/1.", but it doesn't seem to be a common case.

> One question: what does Chrome (for which we can read the network stack source)
> do?  I'm curious how Darin solved this over there, if at all.

It isn't solved in Chrome. Chrome fails on tests mentioned in comment #14.
(In reply to comment #28)
> > So won't the attached patch break if the response-body contains the string
> > "HTTP/1."?  I think we should just kill the connection if this happens instead
> > of trying to guess like that...
> 
> You're right. But it will work in most cases. Each time I saw this problem,
> there was an unexpected gzipped body. This doesn't mean that response body
> can't contain string "HTTP/1." but it doesn't seem to be a common case.

Is it safe against HTTP response splitting attacks?
> It isn't solved in Chrome. Chrome fails on tests mentioned in comment #14.

Hmm.  For some reason I thought bug 583827 mentioned the issue not existing in Chrome, but I was just wrong about that.

So the upshot is that if the response body contains "HTTP/1." the following response will still be busted (as in, this bug will resurface), right?

Is this issue really common enough that killing the connection is not reasonable?
(In reply to comment #30)
> So the upshot is that if the response body contains "HTTP/1." the following
> response will still be busted (as in, this bug will resurface), right?

Yes

> Is this issue really common enough that killing the connection is not
> reasonable?

I don't have any stats, but it seems that it is quite common when the 304 response is generated by a script and the server uses gzipped transfers. Do you mean just kill the connection, or kill the connection and automatically resend the request? In any case, we should probably use a fresh connection when submitting a form, as Opera does (see comment #14).
> but it seems that it is quite common when the 304 response is generated by a
> script and server uses gzipped transfers.

Hmm.  We can special-case this case as needed... (it's a known entity-body we could compare to).

> Do you mean just kill the connection, or kill the connection and automatically
> resend the request?

Which request?  Unless we're pipelining, killing the connection that got a 304 with a nonempty response body will not require anything being resent, right?  The pipelining case I'm not sure about.

Note, though, that killing the connection on nonempty 304 won't let us recover from bogus Content-Length, while the attached patch _might_.

Maybe we need some combination of all the proposed approaches.  e.g.:

1)  If the response is gzipped 304 and the entity body is nonempty but matches
    gzipped empty string, just consume it and move on.
2)  Else if the response is 304 and entity body is nonempty, kill connection
    and restart anything we already sent on that connection.  That involves not
    sending POSTs or GETs with nonempty query, I think, on connections with
    in-flight requests.
3)  Once we see HTTP/1.x don't allow 0.9 responses.
(In reply to comment #32)
> > Do you mean just kill the connection, or kill the connection and automatically
> > resend the request?
> 
> Which request?  Unless we're pipelining, killing the connection that got a 304
> with a nonempty response body will not require anything being resent, right? 
> The pipelining case I'm not sure about.

The current patch detects the bogus body when handling the next request. That's the request I'm talking about.

You're proposing that we should detect it within the current request, which makes sense. But we don't know whether the body is coming until we receive it. So what should we do in the case where the body comes a bit later? I.e. we have successfully received the 304 response and there is no additional data at this moment (the bogus body would be received on the next OnSocketReadable()), so there is no reason to call AsyncWait().
Ah, hmm.  You're right; I'm not sure how to best handle that case...

I guess we should take this patch, and then file a followup bug on doing #1 and #2 from comment 32?
Attachment #343299 - Flags: review?(cbiesinger) → review?(honzab.moz)
Comment on attachment 343299 [details] [diff] [review]
fix v1

        if (PL_strncasecmp(buf, HTTPHeader + mLineBuf.Length(), checkChars) == 0) {

Please add an assertion before this line that mLineBuf.Length() < strlen(HTTPHeader)

+            if (mLineBuf.Length() == strlen(HTTPHeader)) {
+                // We've found whole HTTPHeader sequence. Return pointer at the
+                // end of matched sequence since it is stored in mLineBuf.
+                return (buf + checkChars);

Don't you need a mLineBuf.Truncate() here?

Why are you sometimes using strlen(HTTPHeader) and sometimes sizeof(HTTPHeader)-1? Please be consistent. Also please add spaces around -

        if (PL_strncasecmp(buf, HTTPHeader, PR_MIN(len, sizeof(HTTPHeader)-1)) == 0) {

please limit your lines to 80 chars
Attachment #343299 - Flags: review?(honzab.moz) → review-
Attached patch patch v2 (obsolete) — Splinter Review
(In reply to comment #35)
> +            if (mLineBuf.Length() == strlen(HTTPHeader)) {
> +                // We've found whole HTTPHeader sequence. Return pointer at
> the
> +                // end of matched sequence since it is stored in mLineBuf.
> +                return (buf + checkChars);
> 
> Don't you need a mLineBuf.Truncate() here?

No, we are only searching for the beginning of the status line; we're not parsing it, so we can't eat it. The part of the status line that we've consumed during the check stays in mLineBuf, and the rest of the line is appended by nsHttpTransaction::ParseLineSegment(), called from nsHttpTransaction::ParseHead().

All other comments fixed.
Attachment #343299 - Attachment is obsolete: true
Attachment #486227 - Flags: review?(cbiesinger)
Comment on attachment 486227 [details] [diff] [review]
patch v2

you're right. r=biesi
Attachment #486227 - Flags: review?(cbiesinger) → review+
Attachment #486227 - Flags: superreview?(bzbarsky)
Comment on attachment 486227 [details] [diff] [review]
patch v2

>+        // If HTTP 0.9 response isn't allowed (i.e. 1.0 or 1.1 was already
>+        // detected when talking to the server) then find a HTTP response.
>+        // Otherwise do a simple junk detection.

This comment seems to be backwards.  sr=me with that fixed?
Attachment #486227 - Flags: superreview?(bzbarsky) → superreview+
Michal, can we get an updated patch here so someone can land this? Blocking on this since it will help with other bugs that we want fixed for Firefox 4.
blocking2.0: --- → betaN+
I'll do it asap.
Attachment #486227 - Attachment is obsolete: true
Keywords: checkin-needed
Landed, marking FIXED.

http://hg.mozilla.org/mozilla-central/rev/37aefdd76a22
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Keywords: checkin-needed
Target Milestone: --- → mozilla2.0b8
Is bug 616591 a possible regression?
Depends on: 628832
Depends on: 632061
Depends on: 632835
Depends on: 635892