Bug 352848 - cache expiration problems with blog sites? (14 years of Heuristic Expiration instead of considering as "expired", if "Expires: -1" is returned)
Status: RESOLVED FIXED
Whiteboard: [Fx 2.0.0.1] uiHitList
Keywords: fixed1.8.1.1
Product: Core
Classification: Components
Component: Networking: Cache
Version: Trunk
Hardware: x86 Windows XP
Importance: -- normal
Target Milestone: mozilla1.9alpha1
Assigned To: Darin Fisher
Reported: 2006-09-15 11:27 PDT by chris hofmann
Modified: 2007-04-02 18:32 PDT
mtschrep: blocking1.8.1-
dveditz: blocking1.8.1.1+


Attachments
v1 patch (1.33 KB, patch)
2006-09-27 21:42 PDT, Darin Fisher
cbiesinger: review+
dveditz: approval1.8.1.1+

Description chris hofmann 2006-09-15 11:27:22 PDT
reported to webmaster...

I like http://crawfordslist.blogspot.com/  It has not shown up in my Firefox since Sunday, Sept 10.  I can get the daily blog with no problem in Safari.  Is there a problem with your browser?

Rousculp 

-----------------------

my wife sees this a lot on her blog as well...  she will publish an update, then come bug me because the new update can't be viewed in firefox.  if we clear the cache, firefox goes out and grabs the latest content and everything is fine.  I've seen this problem off and on in the code base since pre-necko days.  I'm wondering what the best way to investigate it is?
Comment 1 WADA 2006-09-17 23:12:28 PDT
Is this the same phenomenon as Bug 277813 (and Bug 328605)?

Read Bug 271652, which is listed in Bug 328605.
(Read the other bugs listed there for more examples.)
Then check the status of the related files in the cache (the Expires: value shown in about:cache),
and capture the HTTP header data.
 See Bug 221036 Comment #7 for capturing the data with NSPR logging.
 See Bug 221036 Comment #6 for capturing the data with LiveHTTPHeaders.

Comment 2 Gerry Daly 2006-09-26 16:26:18 PDT
Some additional information. I downloaded and installed "Live HTTP Headers" from http://livehttpheaders.mozdev.org/ and waited until I experienced this problem (which I have seen before).

I went to Ann Althouse's Blog at http://www.althouse.blogspot.com/ and noticed that it was a stale copy. The request headers were as follows.
REQUEST: Get / HTTP/1.1
Host: www.althouse.blogspot.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7
Keep-Alive: 300
Connection: keep-alive

Response headers:
RESPONSE: HTTP/1.1 200 OK
Server: Apache
Vary: Accept-Encoding
test: %{HOSTNAME}e
Last-Modified: Tue, 26 Sep 2006 03:32:52 GMT
ETag: W/"17d502a-2a4df-4518ae02"
Accept-Ranges: none
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 47108
Date: Tue, 26 Sep 2006 03:26:46 GMT
Cache-Control: private, xgzip-ok=""
Pragma: no-cache
Expires: -1

On the "General" tab of the Page Info (with Live Http Headers installed), it shows Expires: Tuesday, May 05, 2020 11:04:54 PM. That appears to me to be the problem. I have no idea why this page is getting stored in the cache with an Expires date 14 years in the future when the returned headers are as specified.

Comment 3 WADA 2006-09-26 16:54:08 PDT
(In reply to comment #2)
> Expires: -1
Who generated this header? Apache? The weblog application? Or your script?
Comment 4 WADA 2006-09-27 00:24:16 PDT
(In reply to comment #2)
> Last-Modified: Tue, 26 Sep 2006 03:32:52 GMT
> Date: Tue, 26 Sep 2006 03:26:46 GMT
Another question.
Why is a future timestamp returned as Last-Modified:?
Is the "Date:" timestamp the start of script execution, and the Last-Modified: timestamp the end of script execution? (6 minutes to execute the script...)
Comment 5 WADA 2006-09-27 00:32:04 PDT
(Addition to comment #4)
Is a proxy server in use?
"Content-Encoding: gzip" for HTML is rare from an ordinary server, I think, but common with a proxy server. So a clock mismatch between the origin server and the proxy server could produce such HTTP headers.
Comment 6 Gerry Daly 2006-09-27 10:24:35 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > Expires: -1
> Who generated this header? Apache? The weblog application? Or your script?
> 

Professor Althouse tried various ways of getting her site to work well with Firefox and that was one of them. Per the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21

"HTTP/1.1 clients and caches MUST treat other invalid date formats, especially including the value "0", as in the past (i.e., "already expired")."

A -1 is clearly not a valid date format, so it should be treated as already expired. It is not. Sometimes it results in "Not specified" but much of the time it results in a date sometime in the year 2020.

Also, Cache-Control: no-cache and Pragma: no-cache were tried by her as well, and neither of them worked. She also tried a valid in-the-past date. Same deal.
Comment 7 Gerry Daly 2006-09-27 10:28:08 PDT
"Why is a future timestamp returned as Last-Modified:?
Is the "Date:" timestamp the start of script execution, and the
Last-Modified: timestamp the end of script execution? (6 minutes to execute the script...)"

And 

""Content-Encoding: gzip" for HTML is rare from an ordinary server, I think, but
common with a proxy server. So a clock mismatch between the origin server and the
proxy server could produce such HTTP headers."

Regarding these questions: the answer is that this is the way Blogspot (which is owned by Google) does things.

It would be nice if Blogspot did not do unusual things, but it would also be nice if Firefox did not do even more unusual things in response (including, by my read, not quite following the RFC on what to do when the Expires header is invalidly formatted).
Comment 8 Gerry Daly 2006-09-27 11:18:32 PDT
(In reply to comment #6)
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
> 
> "HTTP/1.1 clients and caches MUST treat other invalid date formats, especially
> including the value "0", as in the past (i.e., "already expired")."

Per this from the RFC, I believe that lines 545-552 are in error in nsHttpResponseHead.cpp:

535 nsresult
536 nsHttpResponseHead::GetExpiresValue(PRUint32 *result)
537 {
538     const char *val = PeekHeader(nsHttp::Expires);
539     if (!val)
540         return NS_ERROR_NOT_AVAILABLE;
541 
542     PRTime time;
543     PRStatus st = PR_ParseTimeString(val, PR_TRUE, &time);
544     if (st != PR_SUCCESS) {
545         // parsing failed... maybe this is an "Expires: 0"
546         nsCAutoString buf(val);
547         buf.StripWhitespace();
548         if (buf.Length() == 1 && buf[0] == '0') {
549             *result = 0;
550             return NS_OK;
551         }
552         return NS_ERROR_NOT_AVAILABLE;
553     }
554 
555     if (LL_CMP(time, <, LL_Zero()))
556         *result = 0;
557     else
558         *result = PRTimeToSeconds(time); 
559     return NS_OK;
560 }

I believe the correct code should be:

nsresult
nsHttpResponseHead::GetExpiresValue(PRUint32 *result)
{
    const char *val = PeekHeader(nsHttp::Expires);
    if (!val)
        return NS_ERROR_NOT_AVAILABLE;

    PRTime time;
    PRStatus st = PR_ParseTimeString(val, PR_TRUE, &time);
    if (st != PR_SUCCESS) {
        // parsing failed but header exists. Treat as already expired...
        *result = 0;
        return NS_OK;
    }

    if (LL_CMP(time, <, LL_Zero()))
        *result = 0;
    else
        *result = PRTimeToSeconds(time);
    return NS_OK;
}
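
To make the difference between the two versions easier to try out, here is a standalone sketch that does not depend on NSPR. Everything in it is an assumption for illustration: parseHttpDate is a hypothetical, much stricter stand-in for PR_ParseTimeString (it accepts only the "Tue, 26 Sep 2006 03:26:46 GMT" form), and expiresBefore / expiresAfter are hypothetical names that mimic the old and proposed handling of a header that is present but unparseable. The "header missing" case is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Days since 1970-01-01 for a Gregorian calendar date (standard civil-date arithmetic).
static int64_t daysFromCivil(int64_t y, unsigned m, unsigned d) {
    y -= (m <= 2);
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Hypothetical stand-in for PR_ParseTimeString: RFC 1123 dates in GMT only.
// The day-of-week token is read but not validated.
static bool parseHttpDate(const std::string& s, int64_t* seconds) {
    static const char kMonths[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char wday[4] = {0}, mon[4] = {0}, zone[4] = {0};
    int day = 0, year = 0, hh = 0, mm = 0, ss = 0;
    if (std::sscanf(s.c_str(), "%3s, %d %3s %d %d:%d:%d %3s",
                    wday, &day, mon, &year, &hh, &mm, &ss, zone) != 8)
        return false;
    const char* p = std::strstr(kMonths, mon);
    if (std::strlen(mon) != 3 || !p || (p - kMonths) % 3 != 0 ||
        std::strcmp(zone, "GMT") != 0)
        return false;
    if (day < 1 || day > 31 || hh < 0 || hh > 23 || mm < 0 || mm > 59 ||
        ss < 0 || ss > 60)
        return false;
    const unsigned month = static_cast<unsigned>((p - kMonths) / 3) + 1;
    *seconds = daysFromCivil(year, month, static_cast<unsigned>(day)) * 86400 +
               hh * 3600 + mm * 60 + ss;
    return true;
}

// Old handling: only a literal "0" counts as "already expired"; any other
// unparseable value returns false, as if no Expires header had been sent.
bool expiresBefore(const std::string& header, uint32_t* result) {
    assert(result != nullptr);
    int64_t t = 0;
    if (!parseHttpDate(header, &t)) {
        std::string buf(header);
        buf.erase(std::remove_if(buf.begin(), buf.end(),
                                 [](unsigned char c) { return std::isspace(c) != 0; }),
                  buf.end());
        if (buf == "0") {
            *result = 0;
            return true;
        }
        return false;
    }
    *result = t < 0 ? 0 : static_cast<uint32_t>(t);
    return true;
}

// Proposed handling: any unparseable value is treated as already expired.
bool expiresAfter(const std::string& header, uint32_t* result) {
    assert(result != nullptr);
    int64_t t = 0;
    *result = (parseHttpDate(header, &t) && t > 0) ? static_cast<uint32_t>(t) : 0;
    return true;
}
```

With "Expires: -1", expiresBefore reports that no value is available, which leaves the cache free to fall back on heuristics, while expiresAfter reports an expiration time of 0. Both agree on "0" and on well-formed dates.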

I apologize in advance for not knowing how to go about creating a formal patch to submit as a proposed solution. If someone wants to email me and teach me how, I would be glad to learn.

Gerry
Comment 9 WADA 2006-09-27 17:44:47 PDT
(In reply to comment #6)
> Also, Cache-Control: no-cache and Pragma: no-cache were tried by her as well,
> and neither of them worked. She also tried a valid in-the-past date.

Server returns both "Cache-Control: private" and "Pragma: no-cache".
And, I couldn't find "Cache-Control: no-cache" in your HTTP header log.  
> RESPONSE: HTTP/1.1 200 OK
> Cache-Control: private, xgzip-ok=""
> Pragma: no-cache
Gerry Daly, do you know of any specific description of this situation in the HTTP protocol definition?

HTTP 1.1 says "Pragma: no-cache should be treated as if Cache-Control: no-cache is specified", but I think that applies only when there is no Cache-Control: header, because HTTP 1.1 defines "Pragma" for backward compatibility only.
Even if "Pragma: no-cache" is always to be treated as "Cache-Control: no-cache", I don't know what should be done when both "Cache-Control: private" and "Cache-Control: no-cache" are returned.
And the server says "I'm HTTP 1.1"...
Comment 10 Darin Fisher 2006-09-27 21:39:47 PDT
The patch in comment #8 looks good to me.
Comment 11 Darin Fisher 2006-09-27 21:42:18 PDT
Created attachment 240408
v1 patch

Patch based on comment #8.  Thanks!
Comment 12 Gerry Daly 2006-09-28 07:17:59 PDT
(In reply to comment #9)
> Server returns both "Cache-Control: private" and "Pragma: no-cache".
> And, I couldn't find "Cache-Control: no-cache" in your HTTP header log.  

I am sorry for any ambiguity. Let me try to clarify.

First, she has been battling this issue for a few months, along with the help of some of her readers (like me). We told her some things to try, including those. Some of them she tried in the past, and did not work. Some of them she tried in the example I presented. I understand that the headers do not show all of the things I said she has tried, because that was just one example.

Second, she does not have direct control over the response headers. She has been attempting to get around this problem using META HTTP-EQUIV tags. The patch I suggested above implements the RFC's rule that any invalid date in the Expires header should be considered already expired, but further testing by me indicates that a META HTTP-EQUIV="Expires" CONTENT="0" still does not do the trick; the Expires shows up in the Page Info (and in about:cache) as being in the year 2020 whenever Blogspot has returned a Date header that is earlier than the Last-Modified header (thanks, WADA, for pointing me in the right direction).

In other words, the proposed patch I came up with fixes a bug, just not the one that was reported here. :-/
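
One possible explanation for the year 2020, offered as a guess and not taken from the Mozilla source: if the heuristic lifetime is computed as one tenth of (Date minus Last-Modified) in unsigned 32-bit arithmetic, a Last-Modified that is later than Date makes the subtraction wrap around. The function names below are hypothetical, and the 10% rule is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumed heuristic: fresh for 10% of the document's age when it was served.
// With unsigned arithmetic the subtraction wraps if lastModified > date.
uint32_t wrappedLifetime(uint32_t date, uint32_t lastModified) {
    return static_cast<uint32_t>(date - lastModified) / 10;
}

// Hypothetical guarded variant: a Last-Modified in the future yields no
// heuristic freshness at all.
uint32_t guardedLifetime(uint32_t date, uint32_t lastModified) {
    if (lastModified > date)
        return 0;
    return (date - lastModified) / 10;
}

// Worked example using the Date and Last-Modified values from comment 2.
void workedExample() {
    const uint32_t date = 1159241206u;          // Tue, 26 Sep 2006 03:26:46 GMT
    const uint32_t lastModified = 1159241572u;  // Tue, 26 Sep 2006 03:32:52 GMT
    // date - lastModified is -366, which wraps to 4294966930.
    assert(wrappedLifetime(date, lastModified) == 429496693u);
    assert(guardedLifetime(date, lastModified) == 0u);
}
```

429496693 seconds is roughly 13.6 years, and adding it to the Date value lands in early May 2020, which is in line with the Page Info value from comment 2.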
Comment 13 Christian :Biesinger (don't email me, ping me on IRC) 2006-09-28 22:11:04 PDT
(In reply to comment #12)
> the Expires shows up in the Page Info
> (and in about:cache) as being in the year 2020 whenever Blogspot has returned a
> Date header that is earlier than the Last-Modified header (thanks, WADA, for
> pointing me in the right direction).

That was fixed in bug 323708, right?
Comment 14 Gerry Daly 2006-09-29 06:46:20 PDT
(In reply to comment #13)
> 
> That was fixed in bug 323708, right?
> 

That does look to my eyes like it would do the trick. Excellent!
Comment 15 Christian :Biesinger (don't email me, ping me on IRC) 2006-09-29 12:16:08 PDT
Comment on attachment 240408
v1 patch

this will lead to additional requests if servers use a nonstandard date format... I guess that's ok
Comment 16 chris hofmann 2006-09-29 12:44:59 PDT
if we think this is really low risk it might have a pretty positive impact on folks that do blogging and a lot of other places where folks are seeing stale content and getting frustrated.
Comment 17 Mike Schroepfer 2006-09-29 15:50:50 PDT
It's too late to get this into FF2, but if we can get the patch into the trunk we'd love to consider it for 2.0.0.1.
Comment 18 Darin Fisher 2006-10-03 19:54:22 PDT
fixed-on-trunk
Comment 19 Reed Loden [:reed] (use needinfo?) 2006-11-22 20:04:16 PST
As this is blocking1.8.1.1+, please either request approval1.8.1.1 on the current patch or, if needed, attach a branch version of the patch and request approval1.8.1.1 on it.
Comment 20 Reed Loden [:reed] (use needinfo?) 2006-11-28 12:46:42 PST
Comment on attachment 240408
v1 patch

From a Bonsai inspection, it looks like the same patch would work fine on the branch.
Comment 21 Daniel Veditz [:dveditz] 2006-11-29 10:17:30 PST
Comment on attachment 240408
v1 patch

Darin confirmed we don't need a separate patch for the branch.

approved for 1.8, a=dveditz
Comment 22 Daniel Veditz [:dveditz] 2006-11-29 13:51:37 PST
Fixed on 1.8 branch

Checking in nsHttpResponseHead.cpp;
/cvsroot/mozilla/netwerk/protocol/http/src/nsHttpResponseHead.cpp,v  <--  nsHttpResponseHead.cpp
new revision: 1.42.2.3; previous revision: 1.42.2.2
