Closed Bug 619376 Opened 9 years ago Closed 7 years ago

Caching files in memory with expired date and/or no-cache headers

Categories

(Core :: Networking: Cache, defect)

x86
Linux
defect
Not set

RESOLVED INVALID

People

(Reporter: jduell.mcbugs, Unassigned)

Details

Attachments

(1 file)

Bjarne or Michal: can one of you look at this?

Reported by hughnougher@gmail.com in bug 559729 comment 9 and after.

There's a bunch of possible issues here, but the most serious possibility seems to be that we cache responses such as

Cache-Control: private, no-cache, no-store, must-revalidate
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache

Not sure about the Unix epoch expiration (do we use that when no explicit expiration date is given?), but presumably no-cache content should not be cached.

Definitely worth looking into to make sure we're not doing something terribly wrong here.
From a quick read-through I get the impression that the cache-entries in question reside in memory cache? AFAIK, we keep non-cacheable entries in memory-cache to facilitate history operations and these entries are not supposed to be used on subsequent (non-history) requests. If they are, it's clearly a problem.

Or am I missing something crucial?
(In reply to comment #1)
> From a quick read-through I get the impression that the cache-entries in
> question reside in memory cache? AFAIK, we keep non-cacheable entries in
> memory-cache to facilitate history operations and these entries are not
> supposed to be used on subsequent (non-history) requests. If they are, it's
> clearly a problem.

Yes. For the entries in the memory cache that is exactly the problem.

After looking up section 14.9 in http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html the no-store directive is the most important in this because it means "the cache MUST NOT intentionally store the information in non-volatile storage, and MUST make a best-effort attempt to remove the information from volatile storage as promptly as possible".

The "no-cache" in both Cache-Control and Pragma means exactly the same thing, and it seems I was slightly off in my knowledge of it. I always thought it meant the response should not be cached, but it seems it can be cached as long as it is revalidated with the server on each use.

The Expires header seems to do the same as the "no-cache" directive once the date has passed.

This is about all I can comment on so I'll just follow this and listen to what others say on this topic.
Hugh, "non-volatile storage" is another way of saying "disk" as opposed to "memory".  Memory is volatile storage, or that is to say, it's lost when the computer is shut down.

So that means if anything, the last MUST of that section is potentially not being given enough "best" effort.

I agree that no-cache entries being in there after 3 days isn't really ideal.  From a security perspective, I expect content served as no-store and no-cache and the like to be gone from my cache as soon as reasonable, e.g. if they're from a banking website, etc., without even having to close my browser.

I guess the other issue mentioned was that the expiry dates of 1970 are confusing to users who don't know about Unix timestamps, but I doubt that's a big issue.

-[Unknown]
As you say, the second part of the quote I took from the RFC was the reason I quoted it, since it obviously says that all those entries with "no-store" that I saw in memory should not still be there.

With the "no-cache" directive it's an interesting choice what to do with them. Since the RFC (I hope the one I linked is the one we should use) says they can be used again if revalidated, maybe we need to check what other browsers/proxies do with these, if it's even implemented, or run a trial on real usage. I say if it's not used then just treat them like "no-store", because there is no chance of using them again. If it is implemented then maybe just keep larger items, or keep them only for a day?

With the 1970 expiries it's definitely confusing, and I would also like some sort of pagination/search feature, since with a large disk cache it kills Firefox while trying to display it. Maybe this issue should be part of bug 576814?
Changed summary to make it clear that this is for caching in memory (RAM) which is referred to in the standards as 'volatile storage'. 
Common practice is to also keep the 'no-store' objects in memory (for as long as the browser runs). There is no active eviction policy on the memory cache (other than for size and for doomed entries).

"MUST make a best-effort attempt to remove the information from volatile storage as promptly as possible": What does 'as promptly as possible' mean in practice? It needs to be balanced against performance, and against what other browsers (such as Chrome) do.

Keeping a more strict policy and flushing objects sooner will impact performance negatively, especially compared to the other browsers.
Summary: Caching files with expired date and/or no-cache headers → Caching files in memory with expired date and/or no-cache headers
Well, that's a reasonable point.

I've also noticed that the handling of certain headers seems to vary.  The common tactic I've seen is to send "Expires: 0", which is explicitly noted in the HTTP 1.1 spec for the purpose of "already expiring" content.

Interestingly, many sites send Expires headers in the past.  These are the ones that end up persisting in the memory cache, from some experiments I've been doing tonight.  "Expires: 0" seems to be successfully and immediately evicted (which is great, because that's exactly what banking websites use.)

Technically, HTTP 1.1 says that the following should be treated equally: "0", values before the Date header, and the same value as the Date header.  At least as far as I've always interpreted section 14.21.

Is it possible that's simply not being honored?

I'm not really alarmed, myself, as long as Expires: 0 is properly working, since that's the one most people use who actually care about it not being cached.  AFAIK it works everywhere.

-[Unknown]
After reading section 14.21 and 14.18, I think the "Expires: 0" is supposed to be treated like the 1970 dates and therefore cached unless the "no-store" is given.

And any Expires date in the past should be changed to expires = current date, the date being the Date field from the response if it exists, or the local time if not. http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.4 seems to imply this, so that "freshness_lifetime" is zero in these cases.
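
To make the freshness arithmetic in this comment concrete, here is a minimal Python sketch of the RFC 2616 section 13.2.4 rule freshness_lifetime = expires_value - date_value. It is only an illustration, not how Necko computes it: the function names are made up, Cache-Control: max-age (which would take precedence) is ignored, and clamping negative results to zero is an assumption that matches the reading above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Return an aware datetime, or None if value is not a valid HTTP date."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed is None:
        return None
    if parsed.tzinfo is None:
        # Treat dates without a usable zone as UTC (assumption for this sketch).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def freshness_lifetime(headers, now=None):
    """Seconds the response stays fresh, based only on Expires and Date.

    Hypothetical helper: returns None when there is no Expires header,
    because heuristic freshness is out of scope here.
    """
    if "Expires" not in headers:
        return None
    expires = parse_http_date(headers["Expires"])
    if expires is None:
        # Section 14.21: invalid dates, including "0", mean "already expired".
        return 0
    date = parse_http_date(headers.get("Date", ""))
    if date is None:
        # No usable Date header: fall back to the local clock.
        date = now or datetime.now(timezone.utc)
    # Clamp so that Expires <= Date gives zero rather than a negative value.
    return max(0, int((expires - date).total_seconds()))
```

Under these assumptions "Expires: 0", an Expires value before Date, and an Expires value equal to Date all give a freshness lifetime of zero, so the entry may be stored but has to be revalidated before reuse.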

(In reply to comment #5)
With "no-store" content being removed from the cache asap, it should not affect performance, and all browsers should (or by the RFC it's almost a must) behave the same. Having "no-store" means it must be fetched from the server again for a new page request.

If you are talking about the history list, section 13.13 says "a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved", so each time the back button is pressed it could have a different historical copy of the "no-store" resource. If a browser is reusing a resource that has a "no-store" on it between pages, then it's not following HTTP 1.1, is what I say.
I'm not entirely certain where we are on this subject now...  Maybe reporter would summarize what is still believed to be bugs in this context, and we'll take it from there? If possible, URLs, about-cache dumps and other information might be attached in order to illustrate any issues.
Collecting together proof of this is tricky because at least the "no-store" usually relates to private content. I will attempt it anyway. Below are 4 examples from my memory cache. I have no idea if the same problem is in the disk cache because it's too difficult to search through.

1. One page which is likely the easiest to reproduce is the Twitter calls from (I think) the Echofon add-on. The responses are gaining fetch counts even though they have a no-store on them. Will attach a screenshot of an entry.

2. I also see lots of entries from http://pixel.quantserve.com/pixel which have a no-store. They also use a random number in the address, so it's not affecting them, but these are cache entries that have no chance of being used again.

3. This might be the easiest to reuse. Crash pages on http://crash-stats.mozilla.com appear to give the no-store in the headers. One I was looking at earlier was http://crash-stats.mozilla.com/report/index/11200f9a-d074-4fc0-9498-dc2a52101217 which is both still cached and being reused.

4. The entries that brought my attention to it, http://www.facebook.com/plugins/like.php type pages that contain the facebook like button which appears on many pages. (even though they have the no-store on it, this would seem like a good thing to cache for many sites. maybe talk to them?)

My current thinking for the expired entries is that they are fine to keep, but maybe items below a certain size should be discarded after expiry due to the overhead of storing them and revalidating with the server. (512B or 1KB?)

My other arguments in comments above also had other possible problems along a similar line though might just be taken as off topic. They include:
- "Expires: 0" content is able to be cached.
- Content with no-store must not be reused between requests, even if it's on successive pages in the history of a tab (though I guess it could be allowed to save memory IFF it's first proven to be the same).
- Content with no-store must not be in disk cache and must be removed from memory cache asap. I also think these items should not be visible in about:cache since they have the most chance of being very private information like bank details.

I hope this collects the issues together and is helpful.
According to the RFC it is OK to reuse a response with no-cache header after validating it with the server, so it makes sense to cache it. And we do cache also expired and no-store responses for purposes like view-source, save as, etc.

> 1. One page which is likely the easiest to reproduce is the twitter calls from
> (I think) echofon addon. The responses are gaining fetch counts even though it
> has a no-store on it. Will attach screenshot of an entry.

The fetch count is increased when the entry is activated in nsCacheService::ActivateEntry(). It doesn't actually mean that the entry was reused that many times. For example, viewing the entry in about:cache increases the count, and so does activating the entry just to find out that we can't reuse it.
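
As a rough illustration of why the fetch count overstates reuse, here is a toy Python model. Everything in it is hypothetical (the CacheEntry and ToyCache classes and their methods are invented for this sketch and are not the nsCacheService code); the only behaviour taken from the comment above is that the counter goes up on activation, whether or not the entry ends up being reused.

```python
class CacheEntry:
    """Hypothetical cache entry; `reusable` stands in for the real validity checks."""

    def __init__(self, key, reusable):
        self.key = key
        self.reusable = reusable
        self.fetch_count = 0


class ToyCache:
    """Toy model: every activation bumps fetch_count, even without reuse."""

    def __init__(self):
        self.entries = {}

    def store(self, key, reusable):
        self.entries[key] = CacheEntry(key, reusable)

    def activate(self, key):
        # Counting happens here, before anyone decides whether to reuse.
        entry = self.entries[key]
        entry.fetch_count += 1
        return entry

    def load(self, key):
        """Page load: activate, then reuse only if the entry allows it."""
        entry = self.activate(key)
        return entry if entry.reusable else None

    def inspect(self, key):
        """Viewer such as about:cache: activates the entry just to display it."""
        return self.activate(key)


cache = ToyCache()
cache.store("https://example.invalid/api", reusable=False)
cache.load("https://example.invalid/api")     # rejected, but still counted
cache.inspect("https://example.invalid/api")  # viewing counts too
print(cache.entries["https://example.invalid/api"].fetch_count)
```

In this model the entry is never served from cache, yet its count still reaches 2, which is the pattern described for the no-store entries.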
(In reply to Michal Novotny (:michal) from comment #11)
> According to the RFC it is OK to reuse a response with no-cache header after
> validating it with the server, so it makes sense to cache it. And we do
> cache also expired and no-store responses for purposes like view-source,
> save as, etc.

and offline mode and session restore, AFAICT.
(In reply to Jason Duell (:jduell) from comment #0)

First, we're talking HTTP, so by default we're allowed to cache the content.

> Cache-Control: private, no-cache, no-store, must-revalidate

This means "If you do cache this, you should not put it in a shared cache, you must revalidate the entry before using it, you must not store it on disk, and if the entry is expired you must revalidate it before using it." (Notice how no-cache subsumes must-revalidate so the must-revalidate is redundant, AFAICT.)
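
The reading of the header given here can be written down as a small Python sketch. It is only an illustration of this comment's interpretation, not Necko's logic: parse_cache_control and storage_policy are made-up names, the policy keys are invented labels, and the comma split is a simplification that would break on quoted arguments containing commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of lower-cased directives.

    Simplified: does not handle quoted arguments that contain commas.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def storage_policy(value):
    """Map directives to the constraints described in the comment above."""
    d = parse_cache_control(value)
    return {
        # private: must not be stored in a shared cache
        "shared_cache_allowed": "private" not in d,
        # no-store: must not be written to disk (non-volatile storage)
        "disk_allowed": "no-store" not in d,
        # no-cache: must be revalidated before every use
        "revalidate_before_every_use": "no-cache" in d,
        # must-revalidate: must be revalidated once stale; no-cache implies it
        "revalidate_when_stale": "must-revalidate" in d or "no-cache" in d,
    }


policy = storage_policy("private, no-cache, no-store, must-revalidate")
for constraint, flag in policy.items():
    print(constraint, flag)
```

Note that nothing in this mapping forbids keeping the response in memory, which is the point being made; dropping must-revalidate from the header leaves the resulting policy unchanged because no-cache already covers it.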

> Expires: Sat, 01 Jan 2000 00:00:00 GMT

We must revalidate the entry before using it because it is in the past and there's Cache-Control: must-revalidate. (But, even without Expires and must-revalidate, we must do so, because of no-cache.)

> Pragma: no-cache

This means that the developer of the server does not understand that "Pragma: no-cache" is a request directive, not a response directive. :)

From this, we can conclude that it is totally cromulent for us to be caching this response in memory, so I'm resolving this as invalid. But, please re-open if I overlooked something.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID