Closed Bug 718797 Opened 12 years ago Closed 9 years ago

Heuristic freshness calculation does not account for relaxation of rules in HTTPbis

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: briansmith, Assigned: mcmanus)

References

()

Details

(Keywords: perf, Whiteboard: [HTTPbis][parity-Chrome])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #468594 +++

This is a spin-off from bug #462243 addressing the problem of heuristic query-freshness. In that bug, we implemented the RFC 2616 requirement for validating cache entries that contain a query string:

   From http://tools.ietf.org/html/rfc2616#section-13.9:
   
  "We note one exception to this rule: since some applications have
   traditionally used GETs and HEADs with query URLs (those containing a
   "?" in the rel_path part) to perform operations with significant side
   effects, caches MUST NOT treat responses to such URIs as fresh unless
   the server provides an explicit expiration time. This specifically
   means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
   be taken from a cache. See section 9.1.1 for related information."

NOTICE however, that this is limited to ***HTTP/1.0*** servers and does not apply to HTTP/1.1 responses.

HTTPbis has proposed to change this:

   From http://tools.ietf.org/html/draft-ietf-httpbis-p6-cache-18#section-2.3.1.1

  "Note: RFC 2616 ([RFC2616], Section 13.9) required that caches do
   not calculate heuristic freshness for URIs with query components
   (i.e., those containing '?').  In practice, this has not been
   widely implemented.  Therefore, servers are encouraged to send
   explicit directives (e.g., Cache-Control: no-cache) if they wish
   to preclude caching."

We should comment on ietf-http-wg@w3.org that the query string restriction HAS been widely implemented (at least, it was implemented by us, which qualifies on its own as "widely").

We should either change our implementation to take the HTTP version into account (only do validation for 1.0, but not 1.1+). Or, if we think that is a bad idea, we should push back on the HTTPbis change ASAP with our rationale to have the change reverted. I do not know the HTTPbis timeline but this probably needs to be done ASAP.

We should also investigate what IE8/9/10 (at least) do in this situation.

See bug 703995 where a Patrick Müller demonstrated that this is a significant performance difference between Chrome and Firefox.

https://bugzilla.mozilla.org/show_bug.cgi?id=703995
(In reply to Brian Smith (:bsmith) from comment #0)
>    From http://tools.ietf.org/html/rfc2616#section-13.9:
>    
>   "We note one exception to this rule: since some applications have
>    traditionally used GETs and HEADs with query URLs (those containing a
>    "?" in the rel_path part) to perform operations with significant side
>    effects, caches MUST NOT treat responses to such URIs as fresh unless
>    the server provides an explicit expiration time. This specifically
>    means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
>    be taken from a cache. See section 9.1.1 for related information."
> 
> NOTICE however, that this is limited to ***HTTP/1.0*** servers and does not
> apply to HTTP/1.1 responses.

Fwiw, I don't agree with your interpretation. My understanding is that 1.1 responses MUST do a conditional request for such cached entries, but since 1.0 do not define conditional requests we cannot send one, and hence cannot use cached 1.0 entries at all (which we won't, because a 1.0 server will not understand the cond req and just return the resource again with a 200).

> HTTPbis has proposed to change this:
> 
>    From
> http://tools.ietf.org/html/draft-ietf-httpbis-p6-cache-18#section-2.3.1.1
> 
>   "Note: RFC 2616 ([RFC2616], Section 13.9) required that caches do
>    not calculate heuristic freshness for URIs with query components
>    (i.e., those containing '?').  In practice, this has not been
>    widely implemented.  Therefore, servers are encouraged to send
>    explicit directives (e.g., Cache-Control: no-cache) if they wish
>    to preclude caching."

This is fine but doesn't really specify how a client should behave, right? :) It just states that servers should be more explicit. (Which will avoid the whole problem.)

> We should comment on ietf-http-wg@w3.org that the query string restriction
> HAS been widely implemented (at least, it was implemented by us, which
> qualifies on its own as "widely").

Agreed.
(In reply to Bjarne (:bjarne) from comment #1)
> (In reply to Brian Smith (:bsmith) from comment #0)
> Fwiw, I don't agree with your interpretation. My understanding is that 1.1
> responses MUST do a conditional request for such cached entries, but since
> 1.0 do not define conditional requests we cannot send one, and hence cannot
> use cached 1.0 entries at all (which we won't, because a 1.0 server will not
> understand the cond req and just return the resource again with a 200).

You are right. I misread the spec. 

> This is fine but doesn't really specify how a client should behave, right?
> :) It just states that servers should be more explicit. (Which will avoid
> the whole problem.)

The effect is that the requirement "caches MUST NOT treat responses to such URIs as fresh unless the server provides an explicit expiration time" is REMOVED from the HTTP 1.1 specification, which means effectively that we could adopt Chrome's behavior and conform to HTTPbis vs RFC2616.

If we think that HTTPbis is wrong here, and we think we must validate these cached responses, then we need to argue for HTTPbis to revert to the RFC 2616 requirement so that Chrome is considered to be non-compliant. As it is right now, Chrome is just as compliant with HTTPbis as we are (in this respect) AND has much better performance in this scenerio. Not a good situation.

The other thing is that we could be more lenient with some things and not others. For example, we might use heuristic expiration for images, stylesheets, and/or scripts (and related things) either by mime type or by context (e.g. <img src=''>), but force validation for HTML documents, XHR, etc. If we were to do this, it would be difficult for the HTTP specification to say anything too concrete about it, but we should make sure the spec. is more in line with what we do.
Summary: Heuristic freshness calculation does not take into account HTTP version and does not account for relaxation of rules in HTTPbis → Heuristic freshness calculation does not account for relaxation of rules in HTTPbis
on this one choose fast over safe.
If Chrome has been doing this for a while (do we know how long?), then I think it's safe to assume that HTTP/1.1 sites are either not returning transient pages with get + query string, or will fix themselves not to.  So seems like we should take the HTTPBiz behavior.
(In reply to Bjarne (:bjarne) from comment #1)

> My understanding is that 1.1
> responses MUST do a conditional request for such cached entries, but since
> 1.0 do not define conditional requests we cannot send one, and hence cannot
> use cached 1.0 entries at all (which we won't, because a 1.0 server will not
> understand the cond req and just return the resource again with a 200).

Conditional requests still work in 1.0; HTTP versioning isn't monolithic.



> This is fine but doesn't really specify how a client should behave, right?
> :) It just states that servers should be more explicit. (Which will avoid
> the whole problem.)
> 
> > We should comment on ietf-http-wg@w3.org that the query string restriction
> > HAS been widely implemented (at least, it was implemented by us, which
> > qualifies on its own as "widely").
> 
> Agreed.

In that text, "widely" means "by many cache implementations" (i.e., it's not talking about market share). Still happy to talk about it in the WG, of course. 

Please do do quickly, however; we're trying to get to LC.

While you're looking at cache-related issues, see also:
  http://trac.tools.ietf.org/wg/httpbis/trac/ticket/212
... as it'd be good to confirm that works for you too.
Well, it doesn't seem like we ever took this back to the WG.  Should we change something about our implementation here?
Flags: needinfo?(mcmanus)
I've confirmed that chrome will cache these documents - I'm going to make the change to match that behavior.
Flags: needinfo?(mcmanus)
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Comment on attachment 8671883 [details] [diff] [review]
allow heuristic cache of query string resources

Review of attachment 8671883 [details] [diff] [review]:
-----------------------------------------------------------------

I, for one, welcome our new heuristically-valid overlords!
Attachment #8671883 - Flags: review?(hurley) → review+
https://hg.mozilla.org/mozilla-central/rev/bb031a571493
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: