Closed Bug 419194 Opened 16 years ago Closed 16 years ago

The "refresh" in addition to validators sends "max-age=0", breaks HTTP 1.1, huge impact

Categories

(Core :: Networking, defect)

1.8 Branch
x86
Windows XP
defect
Not set
major

RESOLVED INVALID

People

(Reporter: wlodek, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

This is a basic HTTP/1.1 handling issue.
It can be observed with any URL that returns correct cache-control headers, like the URL above.
Any "refresh" action sends a GET request with conditional request headers (strong validation in the submitted example) plus the additional header "Cache-Control: max-age=0".

Sending 'max-age=0' in this case is incorrect, unjustified and contradictory to HTTP 1.1. 'max-age=0' actually forces revalidation with the origin.

The effects are that:
- the browser cache is not used for any simple 'refresh' request
- intermediate caches force origin revalidation, even though strong validation should allow content to be served from the caches whenever possible
- there is no difference between the F5 and Ctrl+F5 browser actions
(Ctrl+F5 adds yet another header, "Pragma: no-cache", which should be equivalent to the already present max-age=0)
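
The effect on a cache along the path can be sketched roughly as follows. This is only an illustration, assuming a simplified cache that looks at nothing but `max-age` in the request and in the stored response; the function and parameter names are hypothetical and are not taken from Gecko or from any real cache server.

```python
def parse_max_age(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    if not cache_control:
        return None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return None


def can_serve_without_revalidation(entry_age, response_cache_control,
                                   request_cache_control=None):
    """Decide whether a stored response may be served without going upstream.

    entry_age is the current age of the stored response in seconds.
    Simplification (assumption): only max-age is considered, not Expires,
    no-cache, must-revalidate or heuristic freshness.
    """
    response_max_age = parse_max_age(response_cache_control)
    if response_max_age is None or entry_age >= response_max_age:
        return False  # stale, or no explicit lifetime: revalidate
    request_max_age = parse_max_age(request_cache_control)
    if request_max_age is not None and entry_age >= request_max_age:
        return False  # the client refuses a response this old
    return True
```

With a stored response that is a few seconds old and carries `max-age=3600`, a plain request is answered from the cache, while the same request with `Cache-Control: max-age=0` always goes upstream.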

We are working with ATT's CDN, which happens to honor HTTP 1.1 and does revalidate on 'max-age=0', while there are other CDNs (like the one used by cnn.com) that disregard 'max-age' and serve content from their caches.
In other words, Firefox's refresh strategy penalizes the good HTTP CDN and cache server implementations.

The issue has a very serious impact on origin server performance, resource use, CDN use, end-user network use and of course end-user performance.
We estimate that our sites with a large Firefox population (40%) create 10 times more hits to origin servers than comparable sites with a negligible Firefox population.

Generally speaking, browsers should NOT use "Cache-Control: max-age=0";
the Pragma header already in use should be enough.

 
This is what (shame to say) IE does.

Please look at urgently fixing at least the standard refresh by removing the cache-control header; this is really urgent.

Then, why not get rid of 'max-age=0' in all circumstances, since Ctrl+F5 inserts "Pragma: no-cache" anyway?


best regards
Wlodek Stankiewicz
network architect
navlink gmbh
+33 4 97 232 255







Reproducible: Always

Steps to Reproduce:
1. Do a GET to any URL with correct caching parameters (max-age etc.).
2. Push refresh.
3. Look at the GET headers.

If a CDN is in between, look at the 'Age' header and the return code.
A properly working CDN will return 200 and Age: 0.

A CDN that disregards 'max-age' in the GET will send a 304 and Age: xx
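
The same check can be scripted instead of using a browser plug-in. The sketch below is an assumption about how one might do it with Python's standard `urllib`; the helper names `refresh_headers` and `fetch_status_and_age` are made up for this example, and the validator values have to be copied from a previous response of the URL under test.

```python
import urllib.error
import urllib.request


def refresh_headers(etag, last_modified, send_max_age_zero):
    """Build the conditional request headers that a refresh would send."""
    headers = {"If-None-Match": etag, "If-Modified-Since": last_modified}
    if send_max_age_zero:
        # The extra directive this report is about.
        headers["Cache-Control"] = "max-age=0"
    return headers


def fetch_status_and_age(url, headers):
    """Send a GET and return (status code, value of the Age header or None)."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get("Age")
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 304 Not Modified; the error object
        # still carries the status code and the response headers.
        return error.code, error.headers.get("Age")
```

Calling `fetch_status_and_age` twice for the same URL, once with `refresh_headers(..., True)` and once with `refresh_headers(..., False)`, shows whether the cache in between treats the two requests differently.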
Actual Results:  
=== trace from 'live http headers' plug-in ===
http://www.volvo.com/internet/img/volvocom/logo_wordmark.gif

GET /internet/img/volvocom/logo_wordmark.gif HTTP/1.1
Host: www.volvo.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Cache-Control: max-age=3600
Content-Length: 1711
Content-Type: image/gif
Last-Modified: Tue, 29 Jan 2008 14:08:15 GMT
Accept-Ranges: bytes
Etag: "5e52648062c81:2b8"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 23 Feb 2008 16:28:28 GMT
----------------------------------------------------------

===== click on refresh icon ====

http://www.volvo.com/internet/img/volvocom/logo_wordmark.gif

GET /internet/img/volvocom/logo_wordmark.gif HTTP/1.1
Host: www.volvo.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
If-Modified-Since: Tue, 29 Jan 2008 14:08:15 GMT
If-None-Match: "5e52648062c81:2b8"
Cache-Control: max-age=0  <====== this is the smoking gun !!!

HTTP/1.x 304 Not Modified  <=== this is the revalidation from origin server
Cache-Control: max-age=3600
Last-Modified: Tue, 29 Jan 2008 14:08:15 GMT
Accept-Ranges: bytes
Etag: "5e52648062c81:2b8"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 23 Feb 2008 16:28:37 GMT



Expected Results:  
Never send 'Cache-Control: max-age=0' in a standard refresh (urgent fix needed).

Never send 'Cache-Control: max-age=0' in any request (it duplicates functionality already available via "Pragma: no-cache").
Component: General → Networking
Product: Firefox → Core
QA Contact: general → networking
Version: unspecified → 1.8 Branch
Is this problem still present in Firefox 3 beta 3?
Mozilla does an end-to-end revalidate with Reload/CTRL+R.

from http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
"Specific end-to-end revalidation
The request includes a "max-age=0" cache-control directive, which forces each cache along the path to the origin server to revalidate its own entry, if any, with the next cache or server. The initial request includes a cache-validating conditional with the client's current validator. "

Shift+Reload does an end-to-end reload

"End-to-end reload
The request includes a "no-cache" cache-control directive or, for compatibility with HTTP/1.0 clients, "Pragma: no-cache". Field names MUST NOT be included with the no-cache directive in a request. The server MUST NOT use a cached copy when responding to such a request. "

This bug report looks invalid to me.
Gecko does what it should and follows the RFC.
Matthias,

There is no issue with "Ctrl/Reload" (or maybe a small one that we argue about).

The issue is that just "Reload" inserts "cache-control:max-age=0".

In the example above, the content was received at "Date: Sat, 23 Feb 2008 16:28:28 GMT" with max-age=3600, that is, 1 h.
So it should be fresh until 17:28:28.
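
The arithmetic can be checked with a few lines of Python. This is a simplified sketch: it takes the Date header of the 304 response in the trace (16:28:37) as the time of the refresh and ignores the Age header and network delay, which a full HTTP age calculation would include.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the trace in this report.
response_date = parsedate_to_datetime("Sat, 23 Feb 2008 16:28:28 GMT")
refresh_date = parsedate_to_datetime("Sat, 23 Feb 2008 16:28:37 GMT")
max_age = timedelta(seconds=3600)

fresh_until = response_date + max_age
age_at_refresh = refresh_date - response_date
is_fresh = age_at_refresh < max_age

print(fresh_until.strftime("%H:%M:%S"))      # 17:28:28
print(int(age_at_refresh.total_seconds()))   # 9
print(is_fresh)                              # True
```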

A "Reload" (without Ctrl) was initiated and Gecko sent a revalidation request with "Cache-Control: max-age=0", thus forcing end-to-end revalidation.

It should have done one of the two things instead:
(1) satisfy the request from the browser cache
(2) request a revalidation without ANY cache-control directives.

Forcing end-to-end revalidation when the user has not requested it and there is fresh content in the browser cache is not correct operation according to the HTTP RFC.
Ctrl+R is just the normal reload and it should insert the max-age=0 header because it should be an end-to-end revalidate. It would be a bug if Gecko didn't do this (see for example bug 208797).

It's correct operation according to the HTTP RFC:
"End-to-end revalidation might be necessary if either the cache or the origin server has overestimated the expiration time of the cached response."

Please read http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4 including the part of max-age and performance issues.

Marking invalid; this is by design, but I will cc biesi, who owns necko, to confirm this.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Dear Matti,

I think I understand the confusion!
I've mixed up the 'reload' and 'refresh' terms; in this case 'Ctrl/reload' does not make sense.

So, again.

GECKO inserts 'Cache-Control: max-age=0' on Refresh, that is, "F5" or clicking on the "refresh icon" (circular arrow); this is the issue.

A Reload request, "F5/Ctrl" or "Ctrl + Refresh icon", is a different action.

The browser always requires end-to-end validation on Reload.

I'm sorry for the confusion.

===============
Fixing the issue is not as straightforward as it seems.
Ideally one should just make sure that GECKO sticks to HTTP:
if content is received with max-age=3600, GECKO serves this content from its cache for 3600 seconds.
There are two risks with this approach:
- GECKO will start making freshness decisions; previously all refresh requests were revalidated against upstream servers. So new or dormant code will be executed (a special case is handling content with no explicit caching information).
- It is possible that sites supplying incorrect or no caching information at all were still working properly (that is, could reasonably update content) because of frequent revalidation requests
(say, content 'max-age' is set to "a week" for a site that makes daily updates).

IE revalidates content frequently even if it has fresh content in its cache.

A safe engineering decision would be to request revalidation of the browser cache more often than is necessary from the HTTP RFC point of view.

======================
I see the following corrective strategies.

1. Very safe one.
Require revalidation on every refresh request (standard conditional GET with no cache-control headers), and thus never serve browser cache content without revalidation. No new browser code is executed.

There is a performance gain: validation with the first upstream server is faster than end to end.
The big win is that it does not kill properly working cache servers.

2. Safe with more caching.
The objective should be to serve content from the browser cache without revalidation whenever possible but still revalidate at a reasonable rate.
A very simple solution would be to cap the time content can sit in the browser cache without validation at a 'reasonable' value, like one to a few minutes.
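
Strategy 2 could look roughly like this. It is a sketch of the idea only, not a description of Gecko code; the 120-second cap and the names below are assumptions chosen for illustration.

```python
# Hypothetical cap, standing in for "one to a few minutes".
MAX_UNVALIDATED_SECONDS = 120


def serve_from_browser_cache(seconds_since_validation, max_age,
                             cap=MAX_UNVALIDATED_SECONDS):
    """Return True if the cached copy may be shown without revalidation.

    The effective lifetime is the server's max-age, but never more than
    the cap, so content with an over-long max-age is still rechecked often.
    """
    effective_lifetime = min(max_age, cap)
    return seconds_since_validation < effective_lifetime
```

With these numbers, content with max-age=3600 is served from the cache for the first two minutes after its last validation and revalidated with a plain conditional GET after that.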

cheers

wlodek




Do you mean that Gecko sends max-age=0 if it refreshes the site, for example, using the back button (not reload or shift+reload)?
This would be a bug, but you can't use livehttpheaders to validate this because there is AFAIK a bug somewhere with Page Info.
Create an HTTP log from Mozilla and attach it:
http://www.mozilla.org/projects/netlib/http/http-debugging.html
 
I was doing my tests with F5 (see the text from the Firefox help):
"Reload
        F5
        Ctrl/Cmd+R

Reload (override cache)
        Ctrl/Cmd+F5
        Ctrl/Cmd+Shift+R"

It seems to imply that a simple "Reload" does not override the cache, and this is the thing that sends "max-age=0".

Because of the very high volume of requests with 'max-age=0', I think it's created during normal site browsing, when requesting content like a style sheet on the 'next' page. I will trace that tomorrow.