Bug 269303 - if-modified-since sent even though vary: cookie indicates cached page is outdated.
Status: RESOLVED DUPLICATE of bug 510359
Product: Core
Classification: Components
Component: Networking: HTTP
Version: Trunk
Platform: x86 Linux
Importance: -- minor with 2 votes
Assigned To: Nobody; OK to take it and work on it
QA Contact: Patrick McManus [:mcmanus]
Duplicates: 338656 341779
Reported: 2004-11-11 18:11 PST by Phil Endecott
Modified: 2009-08-20 01:51 PDT
CC: 8 users

Description Phil Endecott 2004-11-11 18:11:32 PST
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1

I am observing what I consider the wrong behaviour when vary: cookie and
if-modified-since are combined.  I enclose an HTTP dump below that illustrates
the problem, but the quick summary is the following:

You have never visited this site before, so you have nothing cached and no cookies.

You visit page P.  Its response includes a vary: cookie header and a last-modified time.
You cache this response.

You visit some other pages in the same site and one of them sets a cookie.

You visit page P again.  Since you now have a cookie, and the page declares that
cookies influence the content, you should fetch a fresh version of the page
rather than using the cached version.  Instead it seems that Mozilla sends an
if-modified-since request.  My server replies "not modified" since the content
it would return to a visitor with the cookie has not changed since the specified
date.

Any thoughts?

HTTP dump follows.

Regards,  Phil.


First page request.  No cookies, nothing in the cache.  Response
includes vary: cookie and a last-modified time.

1084485888[809d868]: http request [
1084485888[809d868]:   GET /treefic/work/treefic/test?a=tree_page HTTP/1.1
1084485888[809d868]:   Host: andorra
1084485888[809d868]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
1084485888[809d868]:   Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
1084485888[809d868]:   Accept-Language: en-us,en;q=0.5
1084485888[809d868]:   Accept-Encoding: gzip,deflate
1084485888[809d868]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
1084485888[809d868]:   Keep-Alive: 300
1084485888[809d868]:   Connection: keep-alive
1084485888[809d868]: ]
1103780784[812bb98]: http response [
1103780784[812bb98]:   HTTP/1.1 200 OK
1103780784[812bb98]:   Date: Fri, 12 Nov 2004 01:54:23 GMT
1103780784[812bb98]:   Server: Apache/2.0.48 (Debian GNU/Linux)
1103780784[812bb98]:   Vary: Cookie
1103780784[812bb98]:   Last-Modified: Fri, 12 Nov 2004 00:12:47 GMT
1103780784[812bb98]:   Keep-Alive: timeout=15, max=100
1103780784[812bb98]:   Connection: Keep-Alive
1103780784[812bb98]:   Transfer-Encoding: chunked
1103780784[812bb98]:   Content-Type: text/html
1103780784[812bb98]: ]

Various images and style sheets follow.  Nothing to see here.

1084485888[809d868]: http request [
1084485888[809d868]:   GET /treefic/work/treefic.css HTTP/1.1
1084485888[809d868]:   Host: andorra
1084485888[809d868]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
1084485888[809d868]:   Accept: text/css,*/*;q=0.1
1084485888[809d868]:   Accept-Language: en-us,en;q=0.5
1084485888[809d868]:   Accept-Encoding: gzip,deflate
1084485888[809d868]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
1084485888[809d868]:   Keep-Alive: 300
1084485888[809d868]:   Connection: keep-alive
1084485888[809d868]:   Referer: http://andorra/treefic/work/treefic/test?a=tree_page
1084485888[809d868]: ]
1103780784[812bb98]: http response [
1103780784[812bb98]:   HTTP/1.1 200 OK
1103780784[812bb98]:   Date: Fri, 12 Nov 2004 01:54:25 GMT
1103780784[812bb98]:   Server: Apache/2.0.48 (Debian GNU/Linux)
1103780784[812bb98]:   Last-Modified: Sat, 25 Sep 2004 14:24:03 GMT
1103780784[812bb98]:   Etag: "bcb18-46a5-d8eca6c0"
1103780784[812bb98]:   Accept-Ranges: bytes
1103780784[812bb98]:   Content-Length: 18085
1103780784[812bb98]:   Keep-Alive: timeout=15, max=100
1103780784[812bb98]:   Connection: Keep-Alive
1103780784[812bb98]:   Content-Type: text/css
1103780784[812bb98]: ]
1084485888[809d868]: http request [
1084485888[809d868]:   GET /treefic/work/imgs/logo.png HTTP/1.1
1084485888[809d868]:   Host: andorra
1084485888[809d868]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
1084485888[809d868]:   Accept: image/png,*/*;q=0.5
1084485888[809d868]:   Accept-Language: en-us,en;q=0.5
1084485888[809d868]:   Accept-Encoding: gzip,deflate
1084485888[809d868]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
1084485888[809d868]:   Keep-Alive: 300
1084485888[809d868]:   Connection: keep-alive
1084485888[809d868]:   Referer: http://andorra/treefic/work/treefic/test?a=tree_page
1084485888[809d868]: ]
1103780784[812bb98]: http response [
1103780784[812bb98]:   HTTP/1.1 200 OK
1103780784[812bb98]:   Date: Fri, 12 Nov 2004 01:54:25 GMT
1103780784[812bb98]:   Server: Apache/2.0.48 (Debian GNU/Linux)
1103780784[812bb98]:   Last-Modified: Mon, 12 Jul 2004 12:52:13 GMT
1103780784[812bb98]:   Etag: "a15a7-1889-d2679940"
1103780784[812bb98]:   Accept-Ranges: bytes
1103780784[812bb98]:   Content-Length: 6281
1103780784[812bb98]:   Keep-Alive: timeout=15, max=99
1103780784[812bb98]:   Connection: Keep-Alive
1103780784[812bb98]:   Content-Type: image/png
1103780784[812bb98]: ]

Navigate to another page.  Nothing special happens.

1084485888[809d868]: http request [
1084485888[809d868]:   GET /treefic/work/treefic/test?a=timeline HTTP/1.1
1084485888[809d868]:   Host: andorra
1084485888[809d868]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
1084485888[809d868]:   Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
1084485888[809d868]:   Accept-Language: en-us,en;q=0.5
1084485888[809d868]:   Accept-Encoding: gzip,deflate
1084485888[809d868]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
1084485888[809d868]:   Keep-Alive: 300
1084485888[809d868]:   Connection: keep-alive
1084485888[809d868]:   Referer: http://andorra/treefic/work/treefic/test?a=tree_page
1084485888[809d868]: ]
1103780784[812bb98]: http response [
1103780784[812bb98]:   HTTP/1.1 200 OK
1103780784[812bb98]:   Date: Fri, 12 Nov 2004 01:55:02 GMT
1103780784[812bb98]:   Server: Apache/2.0.48 (Debian GNU/Linux)
1103780784[812bb98]:   Vary: Cookie
1103780784[812bb98]:   Last-Modified: Fri, 12 Nov 2004 00:12:47 GMT
1103780784[812bb98]:   Keep-Alive: timeout=15, max=100
1103780784[812bb98]:   Connection: Keep-Alive
1103780784[812bb98]:   Transfer-Encoding: chunked
1103780784[812bb98]:   Content-Type: text/html
1103780784[812bb98]: ]

Send a POST request, which results in a cookie being set.

1084485888[809d868]: http request [
1084485888[809d868]:   POST /treefic/work/treefic/test HTTP/1.1
1084485888[809d868]:   Host: andorra
1084485888[809d868]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
1084485888[809d868]:   Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
1084485888[809d868]:   Accept-Language: en-us,en;q=0.5
1084485888[809d868]:   Accept-Encoding: gzip,deflate
1084485888[809d868]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
1084485888[809d868]:   Keep-Alive: 300
1084485888[809d868]:   Connection: keep-alive
1084485888[809d868]:   Referer: http://andorra/treefic/work/treefic/test?a=timeline
1084485888[809d868]: ]
1103780784[812bb98]: http response [
1103780784[812bb98]:   HTTP/1.1 200 OK
1103780784[812bb98]:   Date: Fri, 12 Nov 2004 01:55:14 GMT
1103780784[812bb98]:   Server: Apache/2.0.48 (Debian GNU/Linux)
1103780784[812bb98]:   Vary: Cookie
1103780784[812bb98]:   Set-Cookie: treefic_test_SessionID="89029931"; Version="1"
1103780784[812bb98]:   Last-Modified: Fri, 12 Nov 2004 00:12:47 GMT
1103780784[812bb98]:   Keep-Alive: timeout=15, max=99
1103780784[812bb98]:   Connection: Keep-Alive
1103780784[812bb98]:   Transfer-Encoding: chunked
1103780784[812bb98]:   Content-Type: text/html
1103780784[812bb98]: ]

Now return to the original page.  The cookie received above is sent,
as well as what I consider to be an erroneous if-modified-since
header.

1084485888[809d868]: http request [
1084485888[809d868]:   GET /treefic/work/treefic/test?a=tree_page HTTP/1.1
1084485888[809d868]:   Host: andorra
1084485888[809d868]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040719 Firefox/0.9.1
1084485888[809d868]:   Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
1084485888[809d868]:   Accept-Language: en-us,en;q=0.5
1084485888[809d868]:   Accept-Encoding: gzip,deflate
1084485888[809d868]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
1084485888[809d868]:   Keep-Alive: 300
1084485888[809d868]:   Connection: keep-alive
1084485888[809d868]:   Referer: http://andorra/treefic/work/treefic/test
1084485888[809d868]:   Cookie: treefic_test_SessionID="89029931"
1084485888[809d868]:   If-Modified-Since: Fri, 12 Nov 2004 00:12:47 GMT
1084485888[809d868]: ]
1103780784[812bb98]: http response [
1103780784[812bb98]:   HTTP/1.1 304 Not Modified
1103780784[812bb98]:   Date: Fri, 12 Nov 2004 01:55:22 GMT
1103780784[812bb98]:   Server: Apache/2.0.48 (Debian GNU/Linux)
1103780784[812bb98]:   Connection: Keep-Alive
1103780784[812bb98]:   Keep-Alive: timeout=15, max=98
1103780784[812bb98]:   Vary: Cookie
1103780784[812bb98]: ]

The server replies saying the page is not modified.


Reproducible: Always
Steps to Reproduce:
Comment 1 Darin Fisher 2004-11-11 18:51:02 PST
Mozilla interprets the Vary header to mean that the cached content needs to be
validated before being used.  Issuing a conditional request is meant as an
optimization to allow the server to either respond with a 304 or a 200 response.
It would seem that the server should return a 200 response if the content is
indeed a function of the given Cookie header(s).

This bug sounds invalid to me.

From section 14.44 of RFC 2616:

  The Vary field value indicates the set of request-header fields that fully 
  determines, while the response is fresh, whether a cache is permitted to use 
  the response to reply to a subsequent request without revalidation.

Hence, I believe Mozilla's implementation is correct.  Marking INVALID.
Comment 2 Darin Fisher 2004-11-11 19:13:37 PST
BTW, you might want to use an ETag instead of relying on the Last-Modified
header value to discern entities.  After all, if the entity depends on the value
of the Cookie header, then you really have two (or more) different entities. 
So, why not create unique ETag values for each entity?  That way, the browser
will send you an If-None-Match header instead of an If-Modified-Since, and you
will be able to look at the ETag value to determine whether or not it is safe to
return 304.
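Darin's suggestion can be sketched from the server side. The C++ function below (its name, `ifNoneMatchHits`, is hypothetical) decides whether an If-None-Match request header matches the entity tag of the variant the server would send now, in which case a GET may be answered with 304. It is a minimal sketch, assuming no entity tag contains a comma; it uses the weak comparison that RFC 2616 section 14.26 permits for GET and HEAD, so a `W/` prefix is ignored.

```cpp
#include <cassert>
#include <string>

// Trim spaces and tabs from both ends.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(" \t") - first + 1);
}

// Drop a leading weak indicator so W/"x" and "x" compare equal (weak comparison).
static std::string stripWeak(const std::string& tag) {
    return tag.rfind("W/", 0) == 0 ? tag.substr(2) : tag;
}

// True if the If-None-Match value lists currentEtag (or is "*"), meaning the
// server may answer a GET with 304 Not Modified.
// Simplification: assumes no entity tag contains a comma.
bool ifNoneMatchHits(const std::string& headerValue, const std::string& currentEtag) {
    std::string::size_type start = 0;
    while (start <= headerValue.size()) {
        auto comma = headerValue.find(',', start);
        if (comma == std::string::npos) comma = headerValue.size();
        const std::string tag = trim(headerValue.substr(start, comma - start));
        start = comma + 1;
        if (tag == "*") return true;
        if (!tag.empty() && stripWeak(tag) == stripWeak(currentEtag)) return true;
    }
    return false;
}
```

If each variant carries its own tag, a request that now sends a cookie presents the tag of the cookie-less copy, the check fails, and the server returns 200 with the correct variant.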
Comment 3 Phil Endecott 2004-11-12 04:46:42 PST
Darin,

Thanks for the quick response.  I don't agree with you, but thanks for being
quick :-)

I have reviewed RFC2616 (esp. section 13.6) and it does seem to agree with
Mozilla's behaviour.  But surely this will lead to the "wrong" thing happening.
For example, if I use "vary: accept-language": if I fetch the EN version of a page,
then change my preference to FR and fetch it again I expect to see the FR
version of the page.  If Mozilla behaves as you suggest it will do an
if-modified-since fetch, discover that the FR version has not changed recently,
and display the EN version again.

If you are right, then sites that use Vary: must never return 304 replies
(unless, perhaps, they are also using Etags).

Is this a fault in RFC 2616?  Time to find an appropriate mailing list...

Phil.
Comment 4 Darin Fisher 2004-11-12 08:20:43 PST
I think it is intentionally designed this way.  The browser does not know if the
FR version of the page exists.  So, it tells the server what it has in its cache
(by sending it a "If-None-Match: entity-tag" request header).  Then, the server
uses that to decide how to respond.

This is another reason why entity-tags are better than last-modified time stamps
for validating cache entries.  If you use last-modified as the cache validator,
then you are saying essentially that the URL is enough to uniquely identify the
content.

So, maybe Mozilla should bend the rules a bit and not send If-Modified-Since
when validating due to a Vary header.  But, I could turn it around and ask why
the server bothers sending a 304 for a document that is dynamically selected
based on some variable request header?

IMO, the server should use ETags if it wants to send 304 responses sometimes. 
Otherwise, it should ignore conditional requests and serve the content with a
200 response.
Comment 5 Phil Endecott 2004-11-12 08:33:58 PST
Hi again,

Certainly if you use Etags it should all just work.  It needs more work in the
server though.  Last-modified with vary: looks broken-by-design.  I will change
my server code to use Etags (exclusively).  Then will find that it breaks
something else.....

Thankfully I have the weekend before I need worry about this again.

Phil.
Comment 6 Rob Marshall [tH] 2006-06-16 09:48:28 PDT
*** Bug 341779 has been marked as a duplicate of this bug. ***
Comment 7 Chris Lightfoot 2006-06-16 10:07:41 PDT
No, the resolution as "INVALID" here is incorrect -- see also bug 341779. The statement that "Last-modified with vary: looks broken-by-design" suggests a misunderstanding. It is true that the HTTP spec is a bit opaque on this subject, but hopefully the following should clarify:

Here's a thought experiment. Suppose you have a URL A which produces two different responses depending on whether a cookie x=1 is set, and which has a Last-Modified: time in the past, and neither page will ever change in the future.

First consider this request:

<- GET /A HTTP/1.1
<- Host: A

and the server responds,

-> HTTP/1.1 200 OK
-> Content-Type: text/html
-> Vary: Cookie
-> Last-Modified: Sat, 01 Jan 2000 00:00:00 GMT
->
-> body-text-with-no-cookie-set

Clearly this result will never change. So, suppose that the client sends,

<- GET /A HTTP/1.1
<- Host: A
<- If-Modified-Since: Sat, 01 Jan 2000 00:00:00 GMT

Evidently the server may always validly send the response,

-> HTTP/1.1 304 Not Modified
-> Content-Type: text/html
-> Vary: Cookie
-> Last-Modified: Sat, 01 Jan 2000 00:00:00 GMT

because the result has not changed.


Now consider the request,

<- GET /A HTTP/1.1
<- Host: A
<- Cookie: x=1

The server sends the response,

-> HTTP/1.1 200 OK
-> Content-Type: text/html
-> Vary: Cookie
-> Last-Modified: Sat, 01 Jan 2000 00:00:00 GMT
->
-> body-text-with-cookie-x=1-set

This is always the correct response to such a request (see assumptions above). Therefore, if the client sends the corresponding conditional request,

<- GET /A HTTP/1.1
<- Host: A
<- Cookie: x=1
<- If-Modified-Since: Sat, 01 Jan 2000 00:00:00 GMT

it is clearly correct for the server always to send the response,

-> HTTP/1.1 304 Not Modified
-> Content-Type: text/html
-> Vary: Cookie
-> Last-Modified: Sat, 01 Jan 2000 00:00:00 GMT

because the response to the non-conditional form of the request will never have changed.
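The thought experiment can be restated as a small server handler. This is a hypothetical C++ sketch: `Request`, `Response`, `handleA` and `kLastModified` are illustrative names, and HTTP-date parsing is assumed to happen elsewhere, so dates are plain seconds since the Unix epoch (946684800 is 1 January 2000, 00:00:00 GMT). The Cookie header selects the variant, and If-Modified-Since is compared only with that variant's Last-Modified time.

```cpp
#include <cassert>
#include <ctime>
#include <optional>
#include <string>

// Hypothetical, already-parsed view of a request for /A.
struct Request {
    bool cookieX1;                               // true if "Cookie: x=1" was sent
    std::optional<std::time_t> ifModifiedSince;  // If-Modified-Since, if present
};

struct Response {
    int status;        // 200 or 304
    std::string body;  // empty for 304
};

// Both variants of /A were last modified at 1 Jan 2000 00:00:00 GMT and never change.
constexpr std::time_t kLastModified = 946684800;

Response handleA(const Request& req) {
    // The Cookie header selects the variant, hence "Vary: Cookie".
    const std::string body = req.cookieX1 ? "body-text-with-cookie-x=1-set"
                                          : "body-text-with-no-cookie-set";
    // The conditional asks whether *this* variant changed after the given date.
    if (req.ifModifiedSince && kLastModified <= *req.ifModifiedSince) {
        return {304, ""};
    }
    return {200, body};
}
```

Every 304 this handler returns is correct for the variant the request selects. The failure described in this bug arises only when a client pairs such a 304 with a body it cached for a different variant.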

Does this mean that Vary:... with If-Modified-Since: is broken? No, not at all. The problem occurs *only* if the browser takes a cached copy of the response obtained without the cookie set and assumes that it is also a valid cached copy of the response that would have been obtained if the cookie had been set. This is a bug in Mozilla, which, as Darin Fisher says above, interprets Vary as meaning "that the cached content needs to be validated before being used". This is true if and only if you have cached content FOR THE REQUEST YOU ARE MAKING. You cannot take a response from one request and assume that it is the correct response for another request that you have not made; if you do, you will come unstuck, which is what Mozilla does.
Comment 8 Chris Lightfoot 2006-06-16 10:11:37 PDT
oh, two other brief points: firstly it doesn't matter whether you say Vary: Cookie or Vary: * -- Mozilla gets it wrong in both cases; secondly, for comparison, IE gets this right and Opera gets it wrong.
Comment 9 Phil Endecott 2006-06-16 11:13:44 PDT
It's now a long time since I filed this bug and I've forgotten the details.  But I suggest that "thought experiments" are less useful than carefully reading what the RFC says!  My recollection is that I originally felt as you do about how it should work "in theory", but in practice that is not what the spec requires.  

It's all easy if you use Etags.

--Phil.
Comment 10 Chris Lightfoot 2006-06-16 11:34:48 PDT
The thought experiment is there to clarify what the RFC says, since, as I say, its own words are a bit opaque. You are correct that the existing Mozilla implementation does not match what the standard requires, but that means that Mozilla is wrong, not that the meaning of the RFC has changed.

However, let's risk further confusion by wading through the relevant bit of the RFC:

Firstly, what does a conditional GET with If-Modified-Since: mean?

| 14.25 If-Modified-Since
| 
|    The If-Modified-Since request-header field is used with a method to
|    make it conditional: if the requested variant has not been modified
|    since the time specified in this field, an entity will not be
|    returned from the server; instead, a 304 (not modified) response will
|    be returned without any message-body.

In English: If-Modified-Since: allows you to check whether a "variant" of an entity has changed since a given date; it will yield a 304 response if there has been no change. What is a "variant"? s.1.3:

|    variant
|       A resource may have one, or more than one, representation(s)
|       associated with it at any given instant. Each of these
|       representations is termed a `varriant'[sic.].  Use of the term `variant'
|       does not necessarily imply that the resource is subject to content
|       negotiation.

A resource is "a thing identified by a URI"; a "variant" is "a thing identified by a URI and (perhaps) some other information". A "variant" and a "representation [of a resource]" are equivalent. The Vary: header (s.14.44),

|    The Vary field value indicates the set of request-header fields that
|    fully determines, while the response is fresh, whether a cache is
|    permitted to use the response to reply to a subsequent request
|    without revalidation. For uncacheable or stale responses, the Vary
|    field value advises the user agent about the criteria that were used
|    to select the representation.

In English: if a response contains a Vary: header, then that header tells you which fields in your request were used to choose the specific variant of the resource that was sent to you; therefore, if any of those fields changes in a subsequent request, then you will get a different variant that time.


So: when you send a request you get a "representation of a resource" -- a "variant". Which variant you get depends on the URI and on the fields in your request which were listed in the Vary: header of the response. An If-Modified-Since: header in a request allows you to test whether an old copy of a given variant -- NB NOT a resource -- is still valid. A 304 response to a conditional request for a particular resource tells you that the variant which would have been returned by a previous non-conditional request would still be valid; it does not tell you about any other variant or about the resource in general, and in particular a conditional request for one variant (say, the one you get sending a cookie) does not tell you anything about the validity of any other variant (for instance, the one you get not sending a cookie), and nor does it tell you whether the variant you have is the same as any other variant.

Mozilla's behaviour here is wrong. It gets one variant, asks whether another has changed, and if told "no" shows the user the first variant again.
Comment 11 Phil Endecott 2006-06-16 12:03:09 PDT
The counter-argument relies on the last two words of the paragraph cited in comment #1:  

  The Vary field value indicates the set of request-header fields that fully 
  determines, while the response is fresh, whether a cache is permitted to use 
  the response to reply to a subsequent request without revalidation.

The key thing is _revalidation_ : it says "permitted to use ... without revalidation", not "permitted to use ... at all".  The implication is that it _is_ allowed to reuse the response as long as it revalidates it, which is what Moz does.

There is ambiguity and complexity here.  I can't see any holes in your reading of the RFC: certainly the use of "variant" in the definition that you cite of If-Modified-Since is interesting.  Perhaps the best thing to do is to ask the people who wrote the RFC?

Practically, I decided that I needed something that worked and I re-implemented my server-side code to use Etags, and it now works fine.  You just need to hash together the last-modified time and the significant cookie values to generate an Etag, and return that with the response.
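A minimal C++ sketch of that recipe, assuming the server already knows which cookies affect the page. `variantEtag` and its parameters are hypothetical, and 64-bit FNV-1a is swapped in as the hash purely for illustration; `std::hash` is avoided because it is only required to give the same result within a single run of a program, whereas an ETag must stay stable across server restarts. Any stable hash would do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// 64-bit FNV-1a: simple and stable across runs and compilers.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Strong entity tag for one variant, built from the resource's Last-Modified
// value plus the cookies that select the variant (hypothetical parameter:
// only the cookies the page content actually depends on).
std::string variantEtag(const std::string& lastModified,
                        const std::map<std::string, std::string>& significantCookies) {
    std::string key = lastModified;
    for (const auto& [name, value] : significantCookies) {  // map iterates in key order
        key += '\n';
        key += name;
        key += '=';
        key += value;
    }
    char buf[19];  // quote + 16 hex digits + quote + NUL
    std::snprintf(buf, sizeof buf, "\"%016llx\"",
                  static_cast<unsigned long long>(fnv1a64(key)));
    return buf;
}
```

Because `std::map` iterates in key order, the same set of cookies always produces the same tag, regardless of the order in which the browser sent them.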

One other comment: you mention that this works as you expect in IE.  My experience is that IE is very, very pessimistic about caching; I don't think it will ever cache anything if it got a Vary header.
Comment 12 Chris Lightfoot 2006-06-16 15:47:02 PDT
> "permitted to use ... without revalidation", not "permitted to use ... at all"

Your error is in thinking that a cached copy of one variant may be used to validate another. You cannot retrieve one variant, then ask questions about its validity and interpret them as giving you information about the validity of another variant (of which you do not even have a copy!).

> One other comment: you mention that this works as you expect in IE.  My
> experience is that IE is very, very pessimistic about caching; I don't think it
> will ever cache anything if it got a Vary header.

yeah. My guess is that they read the RFC, realised they didn't understand it, and decided to make conservative assumptions which were correct, even if they were not as economical as possible. Being more charitable to Microsoft, it may be as simple as that they understood it where authors of other browsers did not.
Comment 13 Henrik Nordstrom 2006-06-16 16:42:10 PDT
I don't agree that the RFC is poorly worded. There are very clear MUST-level rules governing this.

How a cache is supposed to operate is defined in 13.6. In particular, it says:

  "the cache MUST NOT use such a cache entry to construct a response to
   the new request unless all of the selecting request-headers present
   in the new request match the corresponding stored request-headers in
   the original request."

and

  "the cache MUST NOT use a cached entry to satisfy the request unless
   it first relays the new request to the origin server in a conditional
   request and the server responds with 304 (Not Modified), including an
   entity tag or Content-Location that indicates the entity to be used."
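The first rule can be sketched as a predicate a cache might evaluate before reusing a stored entry. In this hypothetical C++ sketch (`Headers` and `selectingHeadersMatch` are illustrative names), `stored` holds the request headers saved with the cached response, `incoming` holds the new request's headers, and `vary` is the cached response's Vary value. Field values are compared byte for byte, which is stricter than section 13.6 requires (it also allows whitespace normalization and combining repeated fields) and therefore errs toward refetching.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// HTTP header names are case-insensitive, so order the map case-insensitively.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
    }
};
using Headers = std::map<std::string, std::string, CaseInsensitiveLess>;

static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(" \t") - first + 1);
}

// True only if every header named in `vary` is absent from both requests or
// present in both with identical values. "Vary: *" never matches.
bool selectingHeadersMatch(const std::string& vary, const Headers& stored,
                           const Headers& incoming) {
    std::string::size_type start = 0;
    while (start <= vary.size()) {
        auto comma = vary.find(',', start);
        if (comma == std::string::npos) comma = vary.size();
        const std::string name = trim(vary.substr(start, comma - start));
        start = comma + 1;
        if (name.empty()) continue;
        if (name == "*") return false;
        const auto s = stored.find(name);
        const auto n = incoming.find(name);
        const bool inStored = s != stored.end();
        const bool inIncoming = n != incoming.end();
        if (inStored != inIncoming) return false;  // e.g. a Cookie header appeared
        if (inStored && s->second != n->second) return false;
    }
    return true;
}
```

In the scenario from the original description, the entry for ?a=tree_page was stored from a request with no Cookie header, so the later request carrying `treefic_test_SessionID` fails this test, and under the second rule the cached body may only be used if a 304 identifies it through an entity tag or Content-Location.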
Comment 14 Darin Fisher 2006-06-19 17:06:07 PDT
OK, I'm willing to reopen this bug and change the behavior to not attempt a cache validation when there is no ETag, provided someone can show me an existing website, where this causes Firefox not to work.  Otherwise, I'd rather remain consistent with our existing interpretation of RFC 2616.
Comment 15 Chris Lightfoot 2006-06-20 01:11:56 PDT
See the example in bug 341779, with a test case, reproduced below:

Steps to Reproduce:
1. Go to http://caesious.beasts.org/~chris/cgi-bin/vary
2. Note the first line, which tells you whether a particular cookie is set
3. Click the button in the page
4. Click the link below the button

Actual Results:  
The page loaded when the link is clicked is exactly the same as the one
originally loaded.

Expected Results:  
Firefox should have downloaded the new page.

The source for the CGI script above is here:
http://caesious.beasts.org/~chris/tmp/20060616/vary

The request and response headers look like this:

1. first request (actually this was shift-reload)

GET /~chris/cgi-bin/vary HTTP/1.1
Host: caesious.beasts.org
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20060116 Firefox/1.5
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://caesious.beasts.org/~chris/cgi-bin/vary
Pragma: no-cache
Cache-Control: no-cache

2. First response. The request is not conditional so the full response is sent;
the body states that no cookie was received:

HTTP/1.1 200 OK
Date: Fri, 16 Jun 2006 16:21:39 GMT
Server: Apache/1.3.19 (Unix) mod_fastcgi/mod_fastcgi-SNAP-0404142202
Vary: Cookie
Last-Modified: Fri, 16 Jun 2006 00:00:00 GMT
Keep-Alive: timeout=15, max=95
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html

3. Second request sent, after clicking the "Set cookie" button in the page. The
browser wrongly sends a conditional GET with the newly-set vary_test=1 cookie,
even though it does not have a valid cached copy of the resource:

GET /~chris/cgi-bin/vary HTTP/1.1
Host: caesious.beasts.org
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20060116 Firefox/1.5
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://caesious.beasts.org/~chris/cgi-bin/vary
Cookie: vary_test=1
If-Modified-Since: Fri, 16 Jun 2006 00:00:00 GMT

4. The server sends a correct 304 Not Modified response:

HTTP/1.1 304 Not Modified
Date: Fri, 16 Jun 2006 16:21:55 GMT
Server: Apache/1.3.19 (Unix) mod_fastcgi/mod_fastcgi-SNAP-0404142202
Vary: Cookie
Last-Modified: Fri, 16 Jun 2006 00:00:00 GMT
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/plain


What I suspect is happening here is that the cache is not keyed on the full set
of URL + relevant request headers (i.e., those named in the Vary: header sent
with the cached response). So when the user clicks the link to get the new
version of the page, the cache is consulted and the browser wrongly concludes
that it has a fresh cached copy, and all it needs to do is check with the
server that it is still valid. The server, thinking that it is being asked, "is
the copy of the resource with URL
'http://caesious.beasts.org/~chris/cgi-bin/vary' and a header 'Cookie:
vary_test=1' still valid?", responds that it is.
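Chris's suspicion amounts to saying that the cache key should be the URL plus the values of the request headers named in the stored response's Vary field (RFC 7234 later calls these secondary keys). Below is a hypothetical C++ helper, `variantKey`, for building such a key; it assumes request header names have already been lowercased and that the cache remembers the Vary value it last saw for the URL.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <set>
#include <sstream>
#include <string>

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Key = URL plus the value of each request header named in `vary`.
// Assumes requestHeaders uses lowercase names. Returns std::nullopt for
// "Vary: *", which never matches a later request.
std::optional<std::string> variantKey(const std::string& url, const std::string& vary,
                                      const std::map<std::string, std::string>& requestHeaders) {
    std::set<std::string> names;  // lowercased, sorted, de-duplicated field names
    std::istringstream fields(vary);
    std::string item;
    while (std::getline(fields, item, ',')) {
        item.erase(0, item.find_first_not_of(" \t"));  // trim leading blanks
        item.erase(item.find_last_not_of(" \t") + 1);  // trim trailing blanks
        if (item == "*") return std::nullopt;
        if (!item.empty()) names.insert(lower(item));
    }
    std::string key = url;
    for (const auto& name : names) {
        const auto it = requestHeaders.find(name);
        key += '\n';
        key += name;
        if (it == requestHeaders.end()) {
            key += '!';  // header absent
        } else {
            key += ':';
            key += it->second;
        }
    }
    return key;
}
```

With such a key, the cookie-less copy of /~chris/cgi-bin/vary and a request carrying `vary_test=1` map to different entries, so the second request is a plain cache miss and no If-Modified-Since is sent. Marking an absent header with `!` keeps it distinct from a header sent with an empty value.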

Note that this doesn't just apply to 'Vary: Cookie' -- the same bug occurs with
http://caesious.beasts.org/~chris/cgi-bin/vary2, which sends 'Vary: *'.
Nevertheless the browser still sends a conditional GET request after the user
clicks the link in the page.

NB this is *not* a duplicate of bug 94123, and the statement in the comment by
Thomas Rutter that Mozilla handles this case correctly but suboptimally is not
accurate (though it may have been in previous versions).



More generally, "provided someone can show me an existing website, where this causes Firefox not to work" is a dangerous approach. Web application authors test their applications against different browsers (this is how I discovered that Mozilla is broken in this case); if a particular feature specified in the standard doesn't work in a common browser, as here, then you change the site not to use it, because it is not practical to replace all users' browsers. You cannot test whether something is a "valid" bug by looking for sites broken by it on the web, because it is very likely that the mere fact that the bug exists will have caused web site developers to work around the problem (inefficiently, in this case) so that it does not exhibit. The correct approach is to correctly implement the standard.
Comment 16 Darin Fisher 2006-06-20 11:10:42 PDT
Chris: That's a testcase that you created right?  How about an actual deployed website?  Is this a real problem that causes Firefox not to work on real websites? 
Comment 17 Chris Lightfoot 2006-06-20 11:23:57 PDT
This will affect any website which uses (say) cookies for login, and supports If-Modified-Since:. However, because Mozilla is broken as described above, I would expect anyone who has tried to implement such a website to have suppressed support for such conditional GETs (from Mozilla at least), because otherwise users of that browser will get (at best) a confusing user experience or (at worst) a site that does not work at all. Developers who discover that Mozilla's support for conditional GET does not work are forced to handle conditional GETs as non-conditional GETs (thereby negating their advantages), because Mozilla is now prevalent enough that its users cannot be ignored, even though they are using a non-standards-compliant browser. That is why I provided a test case. 

Obviously even if you do fix this it will be some time before it is safe to implement conditional GET, because there will be a lot of broken copies of Mozilla out there for a long time from now, but it would be nice if at *some point* in the future this stuff started working again (from comments on another bug I understand that this feature used to work properly but has since been broken).
Comment 18 Darin Fisher 2006-06-20 11:28:04 PDT
You're overstating the problem.  Conditional requests that depend on cookie values work fine provided the server uses ETags appropriately.  If there's a real website where this problem occurs, then it becomes a high priority bug to fix.  Otherwise, we're just speculating that it is a problem.
Comment 19 Chris Lightfoot 2006-06-20 11:31:20 PDT
gah! It's a SELECTION EFFECT! Because this feature of Mozilla is broken, nobody can use this feature in a website. But this feature is desirable because it saves bandwidth and improves performance.


Thought experiment: suppose Mozilla's support for (rolls dice...) CSS was horribly broken, and so nobody used CSS in their website. Would the correct response of the maintainers to the report that that feature was broken be, "nobody uses that so we don't need to fix it"?

When did policy on making Mozilla standards-compliant change, ooi?
Comment 20 Darin Fisher 2006-06-20 11:33:07 PDT
fair enough... patches welcome
Comment 21 Darin Fisher 2006-06-20 11:33:56 PDT
-> default owner
Comment 22 Phil Endecott 2006-06-20 14:10:57 PDT
(In reply to comment #16)
> How about an actual deployed website?

When I reported this in November 2004 there was an "actual deployed website" where it caused a problem - treefic.com.  A user reported that sometimes they would log in yet would not be able to access certain functions.  It took me many days to track the problem down and create the HTTP dump in the description at the top of this bug.  After Darin's response in comment #1 I went away and recoded it to use Etags.  Treefic.com is sadly no longer active.
Comment 23 Darin Fisher 2006-06-20 14:21:12 PDT
Thanks for the info Phil!
Comment 24 Chris Lightfoot 2006-06-20 14:38:29 PDT
Try this (not tested):

--- netwerk/protocol/http/src/nsHttpChannel.cpp.orig      Tue Jun 20 22:35:38 2006
+++ netwerk/protocol/http/src/nsHttpChannel.cpp   Tue Jun 20 22:36:36 2006
@@ -1443,7 +1443,7 @@
         }
     }
 
-    PRBool doValidation = PR_FALSE;
+    PRBool doValidation = PR_FALSE, varies = PR_FALSE;
 
     // Be optimistic: assume that we won't need to do validation
     mRequestHead.ClearHeader(nsHttp::If_Modified_Since);
@@ -1485,6 +1485,7 @@
     else if (ResponseWouldVary()) {
         LOG(("Validating based on Vary headers returning TRUE\n"));
         doValidation = PR_TRUE;
+        varies = PR_TRUE;
     }
     // Check if the cache entry has expired...
     else {
@@ -1562,7 +1563,7 @@
             const char *val;
             // Add If-Modified-Since header if a Last-Modified was given
             val = mCachedResponseHead->PeekHeader(nsHttp::Last_Modified);
-            if (val)
+            if (val && !varies)
                 mRequestHead.SetHeader(nsHttp::If_Modified_Since,
                                        nsDependentCString(val));
             // Add If-None-Match header if an ETag was given in the response
Comment 25 Phil Endecott 2006-06-20 15:14:39 PDT
What does the patch do?  It looks to me as if it unconditionally fetches pages if they have a Vary: header.  Is that right?
Comment 26 Chris Lightfoot 2006-06-20 15:26:13 PDT
Not quite unconditionally -- If-None-Match: will still be sent if there is an entity tag.
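The effect of the patch on which validators get sent can be modelled in isolation. This is a minimal standalone sketch, not the real nsHttpChannel code: the names `CachedValidators`, `ConditionalHeaders` and `ChooseConditionalHeaders` are hypothetical, and it assumes the only inputs are whether the cached response carried Last-Modified and/or ETag and whether the Vary check triggered validation.

```cpp
#include <optional>
#include <string>

// Hypothetical summary of the validators stored with a cache entry.
struct CachedValidators {
    std::optional<std::string> lastModified;  // cached Last-Modified, if any
    std::optional<std::string> etag;          // cached ETag, if any
};

// Hypothetical result: the conditional headers placed on the revalidation request.
struct ConditionalHeaders {
    std::optional<std::string> ifModifiedSince;
    std::optional<std::string> ifNoneMatch;
    bool isConditional() const { return ifModifiedSince || ifNoneMatch; }
};

// Models the patched logic: if validation was triggered by Vary, If-Modified-Since
// is suppressed, but If-None-Match is still sent whenever an ETag was stored.
ConditionalHeaders ChooseConditionalHeaders(const CachedValidators& cached, bool varies) {
    ConditionalHeaders h;
    if (cached.lastModified && !varies) {
        h.ifModifiedSince = cached.lastModified;
    }
    if (cached.etag) {
        h.ifNoneMatch = cached.etag;
    }
    return h;
}
```

With `varies` set and only a Last-Modified stored, no conditional header is sent at all, so the server has to return the full entity instead of a 304.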
Comment 27 Henrik Nordstrom 2006-06-20 16:02:59 PDT
Chris: this will make Firefox not validate Vary objects without an ETag, right? In other words, it is an optimization to avoid falling into the second-to-last paragraph of RFC 2616 section 10.3.5, 304 Not Modified (which is a MUST-level condition, btw).

There are still loopholes where the cache won't operate per the RFC. This should be complemented with an equality check in the 304 processing (second-to-last paragraph of 10.3.5) to catch a number of unexpected cases, and a Content-Location check should be added to the above optimization to allow a conditional request when Content-Location is known.

In the equality tests below, a missing header is treated as a blank value to simplify things.

  if (304_have_etag && old_etag == 304_etag)
     ok
  else if (old_etag != 304_etag)
     retry without conditional
  else if (304_have_content_location && old_content_location == 304_content_location)
     ok
  else if (old_content_location != 304_content_location)
     retry without conditional
  else
     ok
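Henrik's 304 acceptance check translates fairly directly into code. The following is a hedged C++ sketch of the pseudocode above, assuming (as he states) that a missing header compares as an empty string and that header names are already normalized; `HeaderSet`, `Header`, `Verdict` and `Accept304` are illustrative names, not Mozilla APIs.

```cpp
#include <map>
#include <string>

// Illustrative header store: header name -> value (names assumed already normalized).
using HeaderSet = std::map<std::string, std::string>;

enum class Verdict { Ok, RetryUnconditional };

// A missing header compares as a blank value, as in the pseudocode.
std::string Header(const HeaderSet& h, const std::string& name) {
    auto it = h.find(name);
    return it == h.end() ? std::string() : it->second;
}

// Decide whether a 304 can be merged with the stored entry, or whether the
// request must be retried without conditional headers.
Verdict Accept304(const HeaderSet& cached, const HeaderSet& notModified) {
    const std::string oldEtag = Header(cached, "ETag");
    const std::string newEtag = Header(notModified, "ETag");
    if (notModified.count("ETag") && oldEtag == newEtag) return Verdict::Ok;
    if (oldEtag != newEtag) return Verdict::RetryUnconditional;

    const std::string oldLoc = Header(cached, "Content-Location");
    const std::string newLoc = Header(notModified, "Content-Location");
    if (notModified.count("Content-Location") && oldLoc == newLoc) return Verdict::Ok;
    if (oldLoc != newLoc) return Verdict::RetryUnconditional;

    return Verdict::Ok;  // neither header distinguishes the two responses
}
```

Note that a stored ETag paired with a 304 that carries none is treated as a mismatch, because the missing header compares as blank.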


If it's possible to remember the previous request headers, then the logic can be fine-tuned a bit to allow caching even if the server returns neither ETag nor Content-Location on Vary:ing responses, but I am not sure that would be a good thing...


All of the above assumes a simple cache with at most one object per URL. Shared caches like Squid, which I work with most, have a bit more to take into account.
Comment 28 Henrik Nordstrom 2006-06-20 16:21:32 PDT
Chris: Right. It is a somewhat simplified approach to the problem and perhaps not exactly what the RFC had in mind, but it should mask nearly all of the problem cases and is not in any way a violation.

In theory If-Modified-Since should be sent whenever validating an object whose modification time is known, but it's not required. If-None-Match takes higher priority anyway, and not much is gained from supporting Vary without ETag. So your optimization is very reasonable.

Still, the RFC requires the 304 response to be validated before it is accepted. This is to catch situations that occur when servers are upgraded to add Vary+ETag support to the content, and other corner cases.
Comment 29 Philip Withnall (unavailable) 2006-06-20 23:13:13 PDT
*** Bug 341779 has been marked as a duplicate of this bug. ***
Comment 30 Julian Reschke 2006-06-27 03:32:56 PDT
I've got a related problem over here.

In some cases, my server returns 

- Last-Modified
- ETag
- Expires: (5 minutes in the future)

According to LiveHttpHeaders and about:cache, Firefox gets the resource once (status 200), and recognizes the Expires header.

If I re-access the resource while it's fresh, Firefox revalidates its cache entry (with If-None-Match; the server returns 304), although it could have used the cached response (the request headers are identical, after all).

That's a bug, right?
Comment 31 Henrik Nordstrom 2006-07-02 14:01:06 PDT
(In reply to comment #30)

> If I re-access the resource while it's fresh, Firefox re-validates it's cache
> entry (with If-None-Match, server returns 304), although it could have used the
> cached response (request headers are identical, after all).
> 
> That's a bug, right?

It depends on how you re-access the resource. Certain GUI actions force a fresh copy.

If it was a plain request made by following a link back to the resource, then yes, it smells like a bug, but in that case it's a different bug from the one this report is about.

This bug report is about Firefox not honoring the Vary header properly, causing the wrong content to be displayed if the server does not support If-None-Match (or when there is no ETag to use in If-None-Match).
Comment 32 Matthew Somerville 2007-09-05 04:59:48 PDT
I agree that the RFC is quite clear; here's the full sentence (RFC 2616, section 13.6) that Henrik quotes part of in comment #13: "When the cache receives a subsequent request whose Request-URI specifies one or more cache entries including a Vary header field, the cache MUST NOT use such a cache entry to construct a response to the new request unless all of the selecting request-headers present in the new request match the corresponding stored request-headers in the original request."
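The rule quoted above implies that a cache must remember, per entry, the values of the request headers named in Vary, and compare them against each new request. A minimal C++ sketch, assuming header names are already lower-cased, that values are compared as exact strings (no normalization), and that a header absent from both requests counts as a match; `StoredEntry`, `MakeEntry` and `CanUseStoredEntry` are hypothetical names. RFC 2616 also states that `Vary: *` always fails to match, which the sketch follows.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Request headers keyed by lower-cased name (normalization assumed done by the caller).
using Headers = std::map<std::string, std::string>;

// Hypothetical cache entry that also records the selecting request headers.
struct StoredEntry {
    std::vector<std::string> vary;                                // field names from Vary
    std::map<std::string, std::optional<std::string>> selecting;  // values on the original request
};

std::optional<std::string> Lookup(const Headers& h, const std::string& name) {
    auto it = h.find(name);
    if (it == h.end()) return std::nullopt;
    return it->second;
}

// Record the selecting request headers when the response is stored.
StoredEntry MakeEntry(const std::vector<std::string>& vary, const Headers& request) {
    StoredEntry e{vary, {}};
    for (const auto& name : vary) e.selecting[name] = Lookup(request, name);
    return e;
}

// The entry may be reused only if every selecting header matches the original request.
bool CanUseStoredEntry(const StoredEntry& e, const Headers& request) {
    for (const auto& name : e.vary) {
        if (name == "*") return false;  // "Vary: *" never matches
        std::optional<std::string> original;
        auto it = e.selecting.find(name);
        if (it != e.selecting.end()) original = it->second;
        if (original != Lookup(request, name)) return false;
    }
    return true;
}
```

The `cookie` case in the usage below corresponds to the original report: an entry stored before any cookie existed must not be reused once a Cookie header is sent.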

Here are easy steps to show the bug:
* set Firefox to prefer English (Tools->Options->Advanced->Languages)
* clear the cache to make sure we have a clean slate
* fetch a web page that correctly sets Vary: Accept-Language, sends a Last-Modified header, and doesn't send an ETag - e.g. http://www.dracos.co.uk/about/
* change Firefox to prefer French
* fetch the same web page with a refresh

Because the Accept-Language request header has changed (from "en,fr;q=0.5" to "fr,en;q=0.5" here), it no longer matches the request header in the original request, and Firefox "MUST NOT" use its cached copy, as per the RFC. However, it currently does, as you can see, and reshows the English page rather than the new French one.

Chris's patch seems sensible to me, with no side effects that I can see, certainly improving Mozilla. Is there anything I can do to help get this implemented?
Comment 33 Henrik Nordstrom 2007-09-05 05:16:47 PDT
Chris's patch reduces the value of the cache for Vary:ing objects without an ETag. But it's a reasonable compromise short of extending the cache to also store the relevant request headers (those listed in Vary).

Extending the cache to also store the relevant request headers is required to be fully compliant, however (or alternatively keeping an invalidation timestamp and making sure the cache is invalidated on actions that may change the request headers, but that does not feel like a good approach). This also allows cache validation of Vary:ing objects with Last-Modified only (no ETag).

The example you provided is a good one for explaining the scope of the problem. After the user has changed his language preferences, cached copies of language-dependent objects are not supposed to be considered valid, and any visit to such an object (even a fresh one) should cause the cache to be revalidated. It should not be required by the user to force a refresh in order to have the new language preferences reflected. To make that point even more obvious, consider the user closing his browser and returning a day later: since the object is still fresh in the cache, he still receives the old language version...
Comment 34 Matthew Somerville 2007-09-05 05:28:35 PDT
"If should not be required by the user to force a refresh in order to have the new language preferences reflected." - it doesn't matter if they do that; even a Ctrl-F5 in Firefox here sends an If-Modified-Since header and so you still get the English version, no matter how much you want the French one. You have to actually clear the cache in order to get the other language.
Comment 35 Phil Endecott 2007-09-05 06:52:09 PDT
> Chris's patch seems sensible to me, with no side effects that I can see

The side effect is increased server load in a quite common case.  It really is necessary to do this properly, i.e. to put the relevant headers in the cache.

Here's the common case: a site has a default appearance that is seen by 99% of visitors.  The remaining 1% are subscribers who see a personalised version.  Subscribers are identified by a cookie, so the site sends vary:cookie.  If done properly, with any cookie header stored in the cache, the cache will hit until the cookie changes.  If done as Chris suggests, the cache will never hit.  So the server load is increased for all visitors, not just subscribers.

Just to reiterate: if you're implementing a site, your life will be much simpler if you use ETags.
Comment 36 Tim McCormack 2007-09-24 19:56:31 PDT
If you need another test case (or live demo) of the Vary-ignoring bug, here's a great one: http://www.tradeups.net/user/admin

In case the page is down, here's a description of the chain of events:
1. Page /user/admin is requested with standard request headers, HTML page is returned. 
2. A script on the page requests /user/admin with Accept: application/json in order to get a JS-readable version of the data. JSON is returned through the AJAX call.
3. View Source or Save Page gets you the JSON, not the HTML, since the JSON is the last to be cached.

All these requests return Vary: Accept in the response headers, but it doesn't help.
Comment 37 Tyler Downer [:Tyler] 2009-06-27 17:34:01 PDT
*** Bug 338656 has been marked as a duplicate of this bug. ***
Comment 38 Christian :Biesinger (don't email me, ping me on IRC) 2009-08-20 01:51:13 PDT
It turns out that a newer bug was filed and has a patch, so marking this as a duplicate of the newer bug.

*** This bug has been marked as a duplicate of bug 510359 ***
