Closed Bug 233293 Opened 22 years ago Closed 10 years ago

New Cache Comparison Option

Categories

(Core :: Networking: Cache, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

RESOLVED WONTFIX

People

(Reporter: david, Unassigned)

Details

User-Agent: Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.6) Gecko/20040113

Provide a new option for comparing the page in the cache with the page on the network. This would be "When cache page is more than XXX minutes old", where the user can set the number of minutes.

For dynamic pages (e.g., those with server-side includes), the option "Every time I view the page" will always result in fetching a new page from the Internet, as if the "Never" option were selected. The other two existing options will fail to fetch a page with actual updates (unless the expiration date has passed for the "When the page is out of date" option, which means no fetch of an unexpected revision prior to expiration).

This new option would improve the performance of Web browsing that involves navigating back and forth among pages in a Web site by allowing the user to ignore minor changes in a page (e.g., caused by dynamic pages) for a set number of minutes.

Reproducible: Always
Assignee: general → darin
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking: Cache
Ever confirmed: true
QA Contact: general → cacheqa
dupe of bug 80731 ?
Reference comment #1: This is not a duplicate of bug #80731 although both might possibly be implemented at the same time. For #80731, a cache file would expire after a user-set time. In that case, the file might as well be deleted (automatically by Mozilla). For bug #233293, the cache file would not expire. Instead, it would be used without any checking of the page on the Internet until the file is older than some user-set time. After that time, the cache file would still be used (not expired) unless a check of the page on the Internet shows a change in that page. This is similar to setting the "Never" option until the user-set time elapses, after which the "Every time I view the page" would apply. However, it requires selecting one or the other of those two options for each cache file independently, according to the age of the file. Each of the two indicated enhancements should stand on its own merits because neither overrides or duplicates the other.
Can you give some example headers of a page that includes server-side includes? I'm not sure why it behaves the way you suggest.
Reference comment #3: My <http://www.rossde.com/retired.html> uses an SSI to report the date the page was last updated. My <http://www.rossde.com/editorials/edtl_downey_bush.html> uses an SSI to compute and report the number of days remaining until some future date. In both cases, do the following on a system with a dial-up Internet connection:

1. Clear your cache.
2. Go to the indicated page.
3. Use the back button to return to a prior page.
4. Go to the indicated page by again selecting the link.
5. Use the back button to return to a prior page.
6. Go to the indicated page by using the Forward button.

Steps 2 and 4 take approximately the same amount of time for the page to load, during which modem activity is also approximately the same. Step 6, which forces the use of the cache, is much quicker and no modem activity is seen.

Remember, an SSI that reports a result causes a Web page to be dynamically generated, at least in part. In the two cited pages, the SSIs report results that, to the user, are indistinguishable from fixed-coded HTML. Indeed, if you first load either of those two pages and then view the Page Source, you cannot see where the SSI appears; you can only see the results. Each time this happens, the page then has a new time-stamp, causing the cached version to be treated as obsolete.
So you want an entirely different mode. How would it look in prefs? Like the cookie pref for limited lifetime?
Reference comment #5: This would require a fifth radio button as an alternative to the existing four. The label might be something like "When the cached page is older than |__| min", where the |__| is an input box. This might require rearranging the radio buttons into a single column rather than the two columns that exist now. And yes, I do think a time-limit of minutes is appropriate. I wrote this bug while thinking in terms of navigating within a Web site. If I return to the same site 30-45 minutes later, I might indeed want to perform the comparison rather than rely entirely on the cached file. Of course, the input box should be large enough to allow someone to enter three digits in case hours are wanted. (Anyone who wants a time-limit of days might instead choose "Never" and then manually clear the cache once a week.) As for bug #80731, that should be a new checkbox independent of the radio buttons. The label for that might be something like "Expire and remove cached pages older than |__| days". Here, I indicate "days" because that is what the author of that bug indicated.
Reference my comment #4: The page at the first URL now includes both the date the page was last updated and a computation of the number of weeks until a future date. The second URL (for days until a future date) is for a page that no longer exists. I have a number of other pages at my Web site that include SSI scripts for counters, reports of when the page was last updated, and computations of days until some future dates or since some past dates. Rather than adding a new comment here every time one of those changes, please send me an E-mail requesting the URLs whenever a demonstration of this problem or a test of a fix is needed.
-> default owner
Assignee: darin → nobody
How would this interact with HTTP caching? I.e., would it override the freshness information in responses, or would it be limited by that? One way to spin this would be to add a preference that exposes Cache-Control: max-stale, allowing the user to say "I'm willing to accept cached things that are at most n seconds stale." I think it's likely that even that would need to be a page-by-page preference, rather than browser-wide; fooling with this sort of thing on a Web-wide basis will break many sites.
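As a rough illustration of the max-stale idea, here is a minimal Python sketch that builds a request carrying the directive. The helper name, the example URL, and the 30-minute tolerance are placeholders chosen for illustration; nothing is sent over the network, and this says nothing about how the browser cache itself is implemented.

```python
from urllib.request import Request


def request_with_max_stale(url: str, seconds: int) -> Request:
    """Build (but do not send) a request that tolerates stale cached responses.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if seconds < 0:
        raise ValueError("max-stale must be non-negative")
    return Request(url, headers={"Cache-Control": f"max-stale={seconds}"})


# "I'm willing to accept cached things that are at most 30 minutes stale."
req = request_with_max_stale("http://example.com/page.html", 30 * 60)

# urllib stores header names capitalized as "Cache-control".
print(req.get_header("Cache-control"))
```

A cache that sits between this client and the origin server would be the component that interprets the directive; the client only states its tolerance.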
Here is how I see the proposed functionality, with bug #80731 also implemented.

For this bug, there would be five radio buttons where there are now four. The fifth button would be "After cache version is more than |_| minutes old", where the |_| indicates a user-input area for a number. If the user selects this new radio button without inputting a number of minutes, the default of 30 (minutes) would be provided.

When a page is cached, the date-time of that action is recorded in the cache (new feature if this is not already done). This is required for both bug enhancements and is in addition to any expiration date in the HTTP header.

If any of the existing radio buttons is selected, the functionality would be as already exists. If the new radio button is selected:

1. When a page is requested and the present time is more minutes than the user-specified interval from the time that page was cached, the functionality would be as if the "Every time I view the page" radio button were selected.
2. When a page is requested and the present time is fewer minutes than the user-specified interval from the time that page was cached, the functionality would be as if the "Never" radio button were selected.

(I will leave to whoever implements this how to resolve (case 1 or 2) when the present time exactly equals the user-specified interval from the time the page was cached. Note the reliance on existing functionality in the two cases.)

For #80731, there would be a new check-box "Expire cached pages after |_| days", where the |_| again indicates a user-input area for a number. If the user selects this new check-box without inputting a number of days, the default of 30 (days) would be provided. If the check-box is checked, any cached page that was cached more than the user-indicated number of days ago would be deleted from the cache. This would be checked both when the browser is started and (for users who leave their browsers running a long time) each time the cache is checked for a requested page.

I see both of these integrating well with the existing caching capabilities without any conflict. These would "override the freshness information in responses" only to the extent already done by the current radio buttons (this bug) and to the extent that users sometimes (often?) clear their caches (#80731). However, in the latter case, this would be similar to a selective clearing of the cache, which is not a current capability.
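The two cases and the expiry check-box described in this comment could be sketched roughly as follows. This is only an illustration in Python, not Mozilla code: the names `Prefs`, `Action`, and `decide` are invented here, times are plain seconds, and the tie when the age exactly equals the interval (which the comment leaves open) is arbitrarily resolved in favor of using the cache.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    USE_CACHE = "use the cached copy without checking"   # like "Never"
    REVALIDATE = "compare with the page on the network"  # like "Every time I view the page"
    EVICT = "delete the cached copy"                     # bug 80731 check-box


@dataclass
class Prefs:
    compare_after_minutes: int = 30          # default suggested in the comment
    expire_after_days: Optional[int] = None  # None means the check-box is unchecked


def decide(cached_at: float, now: float, prefs: Prefs) -> Action:
    """Pick an action from the age of the cached copy (both times in seconds)."""
    age_seconds = now - cached_at
    # Bug 80731: remove copies cached more than N days ago.
    if prefs.expire_after_days is not None and age_seconds > prefs.expire_after_days * 86400:
        return Action.EVICT
    # Case 1: older than the interval, so behave like "Every time I view the page".
    if age_seconds > prefs.compare_after_minutes * 60:
        return Action.REVALIDATE
    # Case 2 (and, by assumption, the exact tie): behave like "Never".
    return Action.USE_CACHE


print(decide(cached_at=0, now=10 * 60, prefs=Prefs()))
print(decide(cached_at=0, now=45 * 60, prefs=Prefs()))
```

Keeping the expiry check separate from the comparison interval mirrors the point that the two enhancements are independent: either preference can be set without the other.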
I think that maps roughly to a system-wide Cache-Control: max-stale setting. I don't have any objection to this, but personally I'd rather see focus on assuring that the cache is implemented correctly, according to the RFC. Users overriding / directing the cache behaviour should be a last-ditch mechanism; this stuff should work well automatically.
Which maps to the max-stale setting: my request in this bug report, bug #80731, or both? I suggested that both bugs be implemented together because (1) they both involve the management of the cache and (2) they both would require changes to the [Preference > Advanced > Cache] window. Beyond that, there are real differences between the two bugs. The problem I'm trying to address with this bug is the impact on caching caused by Web pages generated dynamically at the time they are requested. At least with pages containing content generated by server-side include scripts, I know for sure that the Web servers apply time stamps reflecting when the scripts inserted the content. This invalidates any cached pages because the cached pages will always have earlier time stamps even if there are no real differences between the fetched and cached pages. The problem addressed in #80731 is a desire to delete cached pages merely because the cached copy is old even if the pages' expiration dates have not yet been reached. This deletion should occur without deleting more recently cached pages (unless of course the expiration date has been reached for a more recently cached page). (I hope I interpreted #80731 correctly; I am not the author of that one.) I think I might be satisfied with an alternative solution to these problems, provided the existing functionality indicated on the [Preference > Advanced > Cache] window is retained. I have added the author of #80731 to the CC list of this bug report so that he may participate in this discussion, which is not being reflected in his bug report.
I was speaking to your previous summary, except now I notice that you have additional requirements; you want the cache to somehow know when the content includes Server-Side includes. There really isn't any way for it to do this, unless some sort of HTTP extension is defined. I think that what you want can be done by properly using the caching directives already in HTTP (e.g., no-cache, max-age, etc.). With that in mind, I agree with others that this is a dup of bug 80731; the browser doesn't know the difference between "dynamic" and other content; it just caches according to the directives given to it.
I guess I was not clear. I know you can't detect the effect of SSIs in a file sent by a Web server. But I want to compensate for the fact that SSIs cause the server to time-stamp the file with the time when the SSIs execute, not with the time the file was last modified or installed on the server. For example, the file for my <http://www.rossde.com/malaprops/index.html> has the time-stamp of 1 May 2006 while residing on my ISP's Web server. However, if you view that Web page and select [View > Page Info] from the menu bar, you will see the Modified date is equal to when the page loaded and was displayed. That is because I have a SSI "hit" counter embedded in the page. To compensate for this, this enhancement bug report requests that the user have an option to ignore the date provided in the Head information from the server when evaluating whether to use the cached file, ignoring it for at least some user-specified number of minutes after the file was cached. If my indicated default of 30 minutes is used, that means the user will not see a change in the page caused by page counter or other SSI -- OR ANY OTHER CHANGE -- that occurred in those 30 minutes. There are now many Web pages that are dynamically generated at the time they are requested. They too defeat the purpose of caching. Without any attempt to determine what is happening -- without detecting SSIs or any other dynamic generation of content or whole pages -- this change would enable the use of the cache when someone views a page and then decides he or she wants to view it again after a short interval. (For example, you view it. You follow a link elsewhere and yet another link even further. Then you want to return to the page with dynamic content.) Thus, I am attempting to address the problem of how SSIs and other dynamic content impact caching without the impossible attempt to detect them.
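To make the example in this comment concrete, here is a small Python simulation. It assumes, as the comment describes, a page whose server-side time-stamp always equals "now", so every timestamp comparison concludes that the page has changed. The function name and the visit times are invented for illustration, and an age exactly equal to the interval is treated as still within it.

```python
def count_fetches(visit_minutes, grace_minutes=None):
    """Count full page transfers for a page whose server time-stamp is always 'now'.

    visit_minutes: times (in minutes) at which the user opens the page.
    grace_minutes: the proposed user-set interval, or None for today's behavior.
    """
    fetches = 0
    cached_at = None  # minute at which the current cached copy was stored
    for t in visit_minutes:
        within_grace = (
            grace_minutes is not None
            and cached_at is not None
            and t - cached_at <= grace_minutes
        )
        if within_grace:
            continue  # reuse the cached copy without asking the server
        # Otherwise the time-stamps are compared; the server reports "now",
        # which is always newer than the cached copy, so the page is re-fetched.
        fetches += 1
        cached_at = t
    return fetches


visits = [0, 5, 12, 45]            # view, wander off, come back, return later
print(count_fetches(visits))       # 4: every visit transfers the page again
print(count_fetches(visits, 30))   # 2: only the first visit and the one at 45 minutes
```

With a 30-minute interval, the two return visits inside the window are served from the cache, which is the saving the request is after; the cost is that any real change made within that window is not seen.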
With the growing proliferation of Web pages generated "on the fly" -- pages that have LAST_MODIFIED (at least on an Apache server) always equal to "right now" -- this RFE has become more important.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Every time a bug report is marked Resolved/WontFix, a comment should be added to indicate why it will not be fixed.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Fine-grained, little-used options like this add complexity that isn't in the interest of the product. This is the kind of thing that add-ons can actually do more successfully.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX