Closed Bug 583827 Opened 14 years ago Closed 14 years ago

Failure to gracefully recover from invalid 304 with response-body over a persistent connection

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 363109

People

(Reporter: jab_creations, Unassigned)

Details

(Keywords: student-project)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Build Identifier: 

If a (X)HTML (both text/html as well as application/xhtml+xml page is cached the first asset file (.css or .js) is NOT requested.

If a .css or .js file is loaded from the cache (HTTP 304) then the NEXT asset (.css or .js file) is NOT requested.

If the second file is 304 the third asset file is NOT requested.

If there are fifty asset files and the 45th one is HTTP 304 then the 46th asset file will likely not be loaded (did not test this out).

Since this bug has sucked up four days of my life I hardly even have the desire to create this bug report though I will answer questions about what I did with this report.

This bug is so detailed I actually shot and posted a video which can be viewed here for further clarification...
http://www.youtube.com/watch?v=M7vh-c4ioIo

The key thing that I'm 99% sure about is that whatever is cached the following file that SHOULD be requested going from a TOP TO BOTTOM approach is NOT requested. I used Fiddler and paid very close attention to server access logs.

Here is an example of the HTTP headers in Firefox 3.6...
/*************
HTTP/1.1 304 Not Modified
Date: Mon, 02 Aug 2010 19:32:07 GMT
Server: Apache/2.2.12 (Win32) DAV/2 mod_ssl/2.2.12 OpenSSL/0.9.8k mod_autoindex_color PHP/5.3.0 mod_perl/2.0.4 Perl/v5.10.0
Connection: Keep-Alive
Keep-Alive: timeout=5, max=97
Expires: Tue, 03 Aug 2010 19:32:07 GMT
Cache-Control: max-age=3600, must-revalidate
Vary: Accept-Encoding
Set-Cookie: member=3ci0smauv10dg1k7j88lpkgo71; path=/
Set-Cookie: date=2010-07-26+20%3A32%3A07; expires=Wed, 01-Sep-2010 19:32:07 GMT; path=/
*************/



Reproducible: Always

Steps to Reproduce:
1. Clear browser cache and fiddler's log.
2. Ensure web page or a JavaScript or style sheet asset file will load from the browser cache on second page load.
3. Load ANY (X)HTML, CSS, or JavaScript file (files that are naturally cached such as images do not apply).
Actual Results:  
If the page response is HTTP 304 the first script or style sheet file (regardless of type) is not REQUESTED.

If the second asset file is loaded from the browser cache then the third asset file is not REQUESTED.

Expected Results:  
All files should be requested unless there is some intentional standards based reason they should not.

The mime or media type does not seem to make any difference (text/html or application/xhtml+xml).

The presence of the base element does not seem to make any difference.

I am NOT aware if this bug afflicts assets loaded from a third party server.

This bug is present in Firefox 3.5, 3.6, and nightly builds of Firefox 4 Beta as late as August 2nd 2010.

I am using Windows XP 64 bit.
John, thank you for the bug report.  I just tried the steps to reproduce over here, and I can't reproduce the problem, unfortunately....

Is there a specific site you see the problem with?  If so, is the site publicly available, or available behind a password you can share with a small group (e.g. one person).  If there's no way I can load a site that shows the problem myself, does an HTML file including a single stylesheet should show the problem for you?  If it does, would you be willing to create a log using the directions at https://developer.mozilla.org/en/HTTP_Logging and attach it to this bug?

Two other things to check first, though:
1) Do you have any extensions installed in the relevant profile?  If so does
   the problem happen in safe mode?
2) Did you by any chance turn on HTTP pipelining in this profile?
First off this is a private testing environment in alpha, albeit only a few weeks from beta. Besides cache and compression everything else on this build seems to be working rock solid. The test page is the front page (with the flags for translations). For basic reference this bug is posted when this build version is Version 2.9 Alpha 5. Any one poking around this build before it goes live interested in causing trouble shouldn't bother as all registered accounts require my personal approval.

I have tested this in the following builds...

Windows NT 5.2; en-US; rv:1.8.1.20) Gecko/20081217 Firefox/2.0.0.20
Windows NT 5.2; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19
Windows NT 5.2; en-US; rv:1.9.1.9) Gecko/20100315 Firefox/3.5.9
Windows NT 5.2; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Windows NT 5.2; WOW64; rv:2.0b3pre) Gecko/20100726 Minefield/4.0b3pre

My nightly won't update though it will not likely have a different result for those running the absolute latest builds. Gecko 1.8 acted differently though that may have been a different cache related bug. Gecko 1.9.0~2.0.x all seem to have the same reaction.

An important clarification I should make...if I cache the first requested style sheet I think the second style sheet will also likely not be loaded...the same may be for the JavaScript files. If desired I will spend the time to create additional dummy CSS and JavaScript includes if this helps debug the issue.

This is the first time I have worked with cache in conjunction with pulling data from a database and therefore found myself having to manually program PHP to make caching possible. It's nothing super-critical so I do not mind posting that code if desired as I would not proclaim to be 100% absolutely certain that I am executing the code correctly however I would not post this bug if I did not feel that Gecko has a legitimate bug as it makes no sense for the next vertically included asset (external CSS or JavaScript files) to not even be requested as they are independent resources.

As far as verifying the easiest and quickest way to do this in order to make it absolutely painless is to use Firefox's clear private data option to clear the cache, then use Chris Pederick's web developer toolbar to disable cookies (in order to minimize requests on my site), load up Eric Lawrence's Fiddler2 (ensure there are no other active HTTP connections), load the URL which the front page should have an HTTP code of 200 (the test layer should be visible); then simply reload the page with cache enabled and the layer should disappear. This is the easiest visual way to confirm that the CSS file "style-gecko-cache-bug.css" was not requested...check Fiddler. If the front page's HTTP code is 304 Not Modified then that layer will not be styled because Firefox will NOT have requested that style sheet. If Firefox receives an HTTP 200 response from the front page then the style sheet WILL be loaded. Again to reiterate the bug, if the front page is HTTP 304 then the first asset file will NOT be requested.

Here is the link to my private test build...
http://www.jabcreations.com/members/

Again I will be happy to provide clarification and testing to help squash this bug.
I forgot to mention I have tested this in safe mode using Firefox 3.6.8 and a completely fresh profile while not in safe mode in my nightly build (listed above) in Firefox 4.0.
OK, I've managed to reproduce the bug with Firefox 3.6.8 once so far; usually .  When I see it, here's what necko thinks is going on with the second response for that stylesheet (the one that makes it not show up):

-1340051456[44d070]: nsHttpTransaction::ProcessData [this=193c4540 count=589]
-1340051456[44d070]: nsHttpTransaction::ParseHead [count=589]
-1340051456[44d070]: nsHttpResponseHead::ParseVersion [version=]
-1340051456[44d070]: looks like a HTTP/0.9 response
-1340051456[44d070]: Have status line [version=9 status=200 statusText=OK]
-1340051456[44d070]: nsHttpTransaction::HandleContent [this=193c4540 count=589]
-1340051456[44d070]: nsHttpTransaction::HandleContentStart [this=193c4540]
-1340051456[44d070]: http response [
-1340051456[44d070]: ]

Necko does think it sent an HTTP GET for that URI...  Of course that says nothing about what went on the wire.
I meant to say "usually on reload the problem doesn't appear".
OK, so in the log I pretty consistently see one response that looks like HTTP 0.9 to necko; which response varies, though.

The actual bytes involved also vary once you get far enough into the necko code, but if I trace things back to where we first detect something as having gone wrong (and assert when Pushback is called on an nsHttpConnection, which is bogus; we should handle it if it can happen) the bytes look somewhat like this:

      0x1f    0x8b    0x08    0x00    0x00    0x00    0x00    0x00
      0x00    0x03    0xbc    0x5a    0x4d    0x8f    0xdc    0xb6
      0x19    0xbe    0x1b    0xf0    0x7f    0xa0    0x15    0x20
      0x27    0xcf    0x68    0x67    0x1d    0xb7    0xfe    0xd8
      0x19    0xc3    0xf6    0xfa    0xb3    0x4e    0x76    0x51
      0xaf    0x9d    0xf4    0xb4    0xa0    0xa4    0x77    0x24
      0xee    0x4a    0xa2    0x4c    0x52    0x3b    0x3b    0x7b
      0xaa    0x8b    0xa0    0x68    0x91    0x53    0x0f    0x69

I'm looking at where we read from the socket, so that's raw gzip data on the wire.  If I gunzip this data, it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>JAB Creations</title>

etc.  Which is just the site's front page... and not something I would expect to see on a socket where we're listening for a CSS or JS file.
On my site I use Apache rewrite however (at least the way I have it implemented) I have to *manually* set the mime for CSS and JavaScript files. The mime *is* set correctly on all CSS and JavaScript files...so what URL did you encounter that response with?
I ran into it a few times now with the "bug" stylesheet, and a few times with a script (possibly different scripts; I didn't look that carefully).

Going to see if I can get a useful packet log from outside Gecko to see what that looks like.
OK, and today I just can't reproduce the issue at all (though the server is also responding with HTTP 200 to requests for the first several resources)...
John, if you can reproduce the issue reliably, would you be willing to create a Wireshark log of it and either mail it to me or attach it to this bug?
I think I overwrote the file from my local build by accident so I'll pick a specific URL for test purposes as the caching is effecting development of my site in general. I'm going to work on that now and follow up once I've got the bug reproducing on the specific URL (the cache feature will not be enabled on any other page on this test build of my site). I'll look in to Wireshark after I've posted my follow up in a little bit.
Here is my follow up; I've moved the test page to the following URL...
http://www.jabcreations.com/members/web/gecko-bug-583827

With Firefox 3.6's cache cleared and cookies disabled loading the page twice reproduces the problem at this URL now.

I have set the site to *ONLY* reproduce this cache bug at this specific URL so that way I can continue to work on other parts of my site without this test page being effected. If for some reason this bug does not seem to be reproducible again please let me know and I'll retest it however the code specific to this page and the cache feature is exceptionally unlikely to be modified even if I modify the file the code is contained on.

I'm going to have lunch and after I return I'll look in to Wireshark unless you think it's no longer applicable at this point? Please let me know either way.
I installed Wireshark however I can't read anything as the font-size is set to negative z, changing the font-size has no effect, and reducing my screen resolution to 800x600 did not make anything readable either. If you still want a log of some sorts I'd be willing to try a different program.
John, thanks.  The url in comment 12 shows the problem for me just fine, so no need for you to worry about Wireshark.

What I see with the packet-sniffer there is that we make the request for "/members/web/gecko-bug-583827" and the server responds with a 304 response like so:

HTTP/1.1 304 Not Modified
Date: Wed, 11 Aug 2010 03:09:30 GMT
Server: Apache/2.2.15 (Unix) mod_ssl/2.2.15 OpenSSL/0.9.7a mod_bwlimited/1.4 PHP/5.2.13
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
Expires: Thu, 12 Aug 2010 03:09:30 GMT
Cache-Control: max-age=3600, must-revalidate
Vary: Accept-Encoding
Set-Cookie: member=a9a6fc320ad143af0553017785c75726; path=/
Set-Cookie: date=2010-08-04+4%3A09%3A30; expires=Fri, 10-Sep-2010 03:09:30 GMT; path=/
Set-Cookie: settings29=audio..0__broadbandimages..0__checkboxes..0__connection..0__css3..0__css3columns..2__cursors..0__dhtml..0__initialfocus..search_query__keyboardlayout..developer__pagination..form__personality..0__powerkeys..0__scrollbars..0__sidebar..20__sounds..0__theme..classic__wallpaper..central-cygnus.jpg__; expires=Fri, 10-Sep-2010 03:09:30 GMT; path=/
Set-Cookie: counter=1; expires=Fri, 10-Sep-2010 03:09:30 GMT; path=/

but then followed by a bunch of gzip data.  Presumably the server thinks it's sending an actual response body as part of the 304 response (and in fact the gzip data is just the gzipped main page), but of course the HTTP spec says (RFC 2616 section 10.3.5, with the capitalization directly quoted from the spec):

  The 304 response MUST NOT contain a message-body, and thus is always
  terminated by the first empty line after the header fields. 

Now I think we end up making the request for that CSS file on the same persistent connection, and then when we read data from it we think we're getting back the CSS data we asked for but we get the gzipped main page instead.  Which of course doesn't have any HTTP headers, so we assume it's an HTTP 0.9 response, etc.

We do know that we're sort of in this situation, in the sense that we make that PushBack() call on the nsHttpConnection, but I don't think we know at that point how much data we need to read from the connection to get to non-bogus data...

What we should probably do if we end up having to PushBack() on a non-pipelined connection is to kill the connection off and not reuse it for any other requests, since we have no way to get it to a reasonable state.

John, you probably want to fix your server to not send invalid 304 responses, since the best you can hope for from a fix on our end is killing off the main-page persistent connection and falling back on other connections for loading the subresources, which just slows down the load.

I really appreciate you creating the testcase and bearing with me here, though; we do need to fix our recovery in this situation.
Severity: critical → normal
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: x86 → All
Summary: Firefox cache - Failure to REQUEST next vertical top to bottom external file (JavaScript or CSS style sheet) → Failure to gracefully recover from invalid 304 with response-body over a persistent connection
AWESOME! With your reply (specifically the part about sending gzip data) I tried something I thought might work and it did! Thank you Boris!

I program server-side code with PHP so all I had to do was use the die() command just after setting the HTTP 304 header. That occurs *before* any compression is set (I enable compression after all the headers are sent) and in PHP when you use the die() command that particular request sends out anything that has been put together by PHP and stops executing any/all commands for that specific request.

I have successfully tested this both locally and live on that test URL successfully and there does not seem to be from my initial/casual perspective anything wrong with the way the page loads.

Please feel free to retest this out at your end and let me know if there is anything else I can do to help out. Most importantly thank you for your hard work pinning this down!
John, I don't think we need anything else in the way of help here.  It's pretty clear what the problem is on our end, and it should be pretty easy to reproduce it locally in our test server.  Thanks for offering, though!
This seems to be dupe of bug #363109.
Yes, indeed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.