Open Bug 288462 Opened 15 years ago Updated 2 years ago

[Meta] Mozilla sometimes re-retrieves pages instead of re-using already-retrieved copy; violates RFC 2616 (Back, Send Page, etc.);

Categories

(Core :: DOM: Navigation, defect)

defect
Not set
normal

People

(Reporter: dsb, Unassigned)

References

(Depends on 6 open bugs, Blocks 1 open bug)

Details

(Keywords: dataloss, meta)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041217
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041217

In some cases, Mozilla retrieves a new version of a page instead of using the 
previously-retrieved copy.  (For example, Send Page doesn't send what you see;
it retrieves a new, possibly different, copy and sends that.)

This behavior violates both the spirit and letter of IETF RFC 2616, Hypertext 
Transfer Protocol--HTTP/1.1.  It also violates user expectations and fails to 
meet user requirements.

Although this bug report partially duplicates several existing ones, I'm 
hoping that this report can serve as a meta-bug report, so the issue
can be dealt with consistently by all the relevant commands and features in
Mozilla.


RFC 2616, section 13 (http://256.com/gray/docs/rfc2616/13.html#13.13)
says:

  13.13 History Lists

  User agents often have history mechanisms, such as "Back" buttons and 
  history lists, which can be used to redisplay an entity retrieved earlier 
  in a session.

  History mechanisms and caches are different. In particular history 
  mechanisms SHOULD NOT try to show a semantically transparent view of the
  current state of a resource. Rather, a history mechanism is meant to show 
  exactly what the user saw at the time when the resource was retrieved.

  By default, an expiration time does not apply to history mechanisms. If 
  the entity is still in storage, a history mechanism SHOULD display it even 
  if the entity has expired, unless the user has specifically configured the 
  agent to refresh expired history documents.

  This is not to be construed to prohibit the history mechanism from telling 
  the user that a view might be stale.

  Note: if history list mechanisms unnecessarily prevent users from viewing 
  stale resources, this will tend to force service authors to avoid using HTTP 
  expiration controls and cache controls when they would otherwise like to.  
  Service authors may consider it important that users not be presented with 
  error messages or warning messages when they use navigation controls (such 
  as BACK) to view previously fetched resources. Even though sometimes such 
  resources ought not to cached, or ought to expire quickly, user interface 
  considerations may force service authors to resort to other means of 
  preventing caching (e.g. "once-only" URLs) in order not to suffer the 
  effects of improperly functioning history mechanisms. 

Mozilla seems to violate the letter and the spirit of that specification in
several places, for example:
- When trying to go back to a page triggers the dialog box that says that
  the page has expired and asking whether you want to re-post data,
  Mozilla only gives you the options to re-post data or cancel; it does not 
  also give you the option to simply re-display the already-retrieved data.
- The Send Page command does not send what you see; it retrieves a new version
  and sends that.
- If you get a page, get a second page, and then get the first page again, 
  then if you use Back, Back to try to get to the original version of the 
  first page, Mozilla displays the updated version instead of the original 
  version (what you saw two steps earlier).

Even ignoring what RFC 2616 says, Mozilla's behavior doesn't support user
needs and violates user expectations.


It seems that Mozilla has two, possibly systemic, problems:
- Mozilla doesn't distinguish correctly between the page-data-history 
  mechanism (Back/Forward) and the HTTP caching mechanism.
- Commands that operate on the current page don't remember to operate on
  the already-retrieved copy.  Send Page definitely has this problem.
  I thought I saw similar behavior with View Source, but I can't reproduce
  it.
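
The distinction drawn in the first item can be sketched in a few lines. This is only an illustration, assuming a stored entity with an expiration time; the names `StoredEntity`, `cacheLookup`, `historyLookup`, and `HistoryView` are hypothetical and do not come from Mozilla's code. The cache lookup refuses an expired entity, while the history lookup follows RFC 2616 section 13.13 and returns whatever is still in storage, flagged as possibly stale.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>

// Hypothetical types for illustration; none of these names come from Mozilla.
using Clock = std::chrono::system_clock;

struct StoredEntity {
    std::string body;           // what the user actually saw
    Clock::time_point expires;  // HTTP expiration time of the response
};

// Cache semantics: an expired entity is not served without going to the network.
inline std::optional<std::string> cacheLookup(const StoredEntity* e, Clock::time_point now) {
    if (e == nullptr || now >= e->expires) return std::nullopt;
    return e->body;
}

// History semantics (RFC 2616, 13.13): expiration does not apply by default.
struct HistoryView {
    std::string body;
    bool mightBeStale;  // the UI may tell the user the view might be stale
};

inline std::optional<HistoryView> historyLookup(const StoredEntity* e, Clock::time_point now) {
    if (e == nullptr) return std::nullopt;  // evicted: only now is a reload needed
    return HistoryView{e->body, now >= e->expires};
}

// Usage: an entry that expired long ago is rejected by the cache path
// but still shown by the history path.
inline void exampleExpiredEntry() {
    const StoredEntity page{"page as first shown", Clock::time_point{}};
    const auto now = Clock::now();
    assert(!cacheLookup(&page, now).has_value());
    assert(historyLookup(&page, now).has_value());
}
```

Commands such as Send Page would, under this sketch, go through the history-style lookup rather than the cache-style one.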


Reproducible: Always
Noting related bug: bug 86261
Keywords: dataloss
Noting somewhat-related bug: bug 44845.
Noting related bug: bug 84106.
Noting related bug: bug 136633.
Noting related bug: bug 126530.
Noting related bug: bug 200208. ***
Noting related bug: bug 214783.
Noting related bug: bug 38486.
Noting related bug: bug 274784. ***
Noting possibly-related bug: bug 267253.
Noting related bug: bug 246692.
Noting related bug: bug 254153.
-> networking, but presumably some API users are also buggy
Assignee: general → darin
Component: General → Networking: Cache
Product: Mozilla Application Suite → Core
QA Contact: general → networking.cache
Version: unspecified → 1.7 Branch
Noting related bug: bug 243534.
If you want this to be a tracking bug, use the "depends" and "blocks" fields...
Please don't just comment on all sorts of other bugs pointing to this one.
Noting related bug: bug 235142.
Noting related bug: bug 77411.
Noting related bug: bug 189472.
Noting related bug: bug 209292.
Noting related bug: bug 209290.
Noting related bug: bug 39539.
Noting related bug: bug 159387.
Noting related bug: bug 131033.
Noting related bug: bug 251231.
> If you want this to be a tracking bug, use the "depends" and "blocks" fields...
> Please don't just comment on all sorts of other bugs pointing to this one.

Okay.

If I add bugs to the depends-on list in this bug, will the other bugs'
pages display a pointer to this one?
> If you want this to be a tracking bug, use the "depends" and "blocks" fields...

Another question:

Should I use "blocks", to mean that fixing all the other bugs right depends
on first understanding the intended behavior and refining the architecture to 
support it?

Or should I use "depends on" to mean that the other bugs need to be fixed
for this one to be fully satisfied?

> Please don't just comment on all sorts of other bugs pointing to this one.

Did you mean that I should not also point other bug reports to this one 
with comments when I add dependency data, or did you just mean that I should
not comment without also adding dependency data?

Thanks,
Daniel
(In reply to comment #28)
> If I add bugs to the depends-on list in this bug, will the other bugs'
> pages display a pointer to this one?

yes, in the "bug 123 blocks" field

(In reply to comment #29)
> Should I use "blocks", to mean that fixing all the other bugs right depends
> on first understanding the intended behavior and refining the architecture to 
> support it?

the architecture supports it

> Or should I use "depends on" to mean that the other bugs need to be fixed
> for this one to be fully satisfied?

so, do this.

> Did you mean that I should not also point other bug reports to this one 
> with comments when I add dependency data, or did you just mean that I should
> not comment without also adding dependency data?

please don't add comments when adding dependency data. bugs with a lot of
comments are not readable.
Note that it's impossible to NEVER refetch the data from the server on 'back'.
The caching mechanism that mozilla uses has for obvious reasons limitations on
the amount of data it will store. So if you go to a bunch of other pages in a
different tab or similar, the page that you go 'back' to is simply unavailable.

Also note that in many cases you don't want to use 'what you see', but rather
what was originally retrieved from the server. For example, when the page contains
javascript that modifies the page as it is loaded (using document.write or
similar) you'd end up performing that modification twice. This is a problem with
html and nothing we can really do anything about.

However, there definitely might be a bug here too, i.e. that we hit the wire even
though there's a perfectly good copy of the document hanging around in the cache.
Depends on: 44845, 86261
Depends on: 251231
Depends on: 84106
> Note that it's impossible to NEVER refetch the data from the server ... 
> The caching mechanism ...has... 
> limitations on the amount of data it will store. So if you go to a bunch of 
> other pages in a different tab or similar, the page that you go 'back' to
> is simply unavailable.

Of course; and that's fine.  Mozilla should probably tell you it can't 
re-display the original version without re-loading and give you the option 
to re-load the page.

> Also note that in many cases you don't want to use 'what you see', but 
> rather what was originally retrieved from the server. 

Why wouldn't you want to go back to what you saw (or want to send or 
print what you see)?  


> For example when the 
> page contains javascript that modifies the page as it is loaded (using 
> document.write or similar) you'd end up performing that modification twice. 

You're making an invalid assumption somewhere.  

If Mozilla properly let you go back to see what you saw before, it
certainly wouldn't run the JavaScript twice.  (On the initial load,
Mozilla would retrieve the response entity from the server, interpret
it to some degree (e.g., execute scripts and maybe parse into DOM), 
save it, and interpret it the rest of the way (i.e., render HTML).  To
go back, Mozilla would retrieve the saved data and re-render it.)

It does sound like Mozilla should also save the original data, to 
support View Source.
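
A rough sketch of that load sequence, assuming a snapshot that keeps both the original bytes and the state after scripts have run once. The names `PageSnapshot`, `runScriptsOnce`, `loadPage`, `goBack`, and `viewSource` are made up for illustration, and `runScriptsOnce` is only a stand-in for a real script engine.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of one page visit; field names are illustrative only.
struct PageSnapshot {
    std::string originalSource;  // bytes as received from the server (for View Source)
    std::string renderedState;   // document after scripts ran once (for Back, Print, Send Page)
};

// Stand-in for script execution; a real engine would run document.write and similar.
inline std::string runScriptsOnce(const std::string& source) {
    return source + "<!-- script output -->";
}

// Initial load: interpret once, then save both forms.
inline PageSnapshot loadPage(const std::string& responseBody) {
    return PageSnapshot{responseBody, runScriptsOnce(responseBody)};
}

// Going back re-renders the saved state; scripts are not executed a second time.
inline const std::string& goBack(const PageSnapshot& snap) { return snap.renderedState; }

// View Source reads the untouched original.
inline const std::string& viewSource(const PageSnapshot& snap) { return snap.originalSource; }

// Usage: repeated Back navigation yields the same state, with script output applied once.
inline void exampleSnapshot() {
    const PageSnapshot snap = loadPage("<p>hello</p>");
    assert(goBack(snap) == "<p>hello</p><!-- script output -->");
    assert(viewSource(snap) == "<p>hello</p>");
}
```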

> This is a problem with html and nothing we can really do anything about.

What exactly is the problem?
No longer depends on: 287358
Depends on: 115174
Depends on: 120809
I would like to add that whether this bug is completely fixed or not, Mozilla
should never, ever, ever, ever, under any circumstances, repost previously
submitted form data without warning the user and asking for confirmation (should
this be submitted as a separate bug?).  See bug 115174 for examples ranging from
quietly lost confirmation pages to potential financial loss resulting from the
use of "Save Page As, Web Page, HTML only".
This is not limited to Windows XP; Linux exhibits identical behaviour.
Please change OS to "All".
OS: Windows XP → All
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.1? → blocking-aviary1.1-
No longer depends on: 285560
No longer depends on: 286365
No longer depends on: 251231
Depends on: 251231
Component: Networking: Cache → History: Session
Version: 1.7 Branch → Trunk
Depends on: 294775
Blocks: 327304
Further bugs related to this issue:

Bug 306916
Bug 307089
Bug 321291
Bug 340120

This bug is a MAJOR pain in the ass for web developers, making Firefox effectively useless for app development. View Source should give me the source of the page I'm looking at, without touching the server again!

This was fixed in the pre-1.0 days, it shouldn't have been broken again!
So basically what we want is for each page to have a unique id in order to restore its own individual copy of the history, right? That way, we retain the current DOM state and all form data as well. Therefore, revisiting the page only makes a new copy without affecting the old one. View Source would also be fixed by passing the unique id for the window as an argument, as would the Save As bug. I think this would need to be a new layer on top of the basic cache mechanism.

So two identical pages loaded twice would look like this:

Page A(1)       Page A(2)
    |               |
cache (uid=1)   cache (uid=2)
       |          |
           |    |
             |
         remote page

My only concern is that storing a local copy of every instance of a page instead of just the url might drastically increase the cache size and cause pages to expire more quickly. Any thoughts?
To expand on my original idea a bit, the cache probably needs to have two layers, since new pages should access the shared cache of the single uri, kind of like a cache buffer.

Page 1(A)        Page 1(B)
    |                |
cache (uid=1)   cache (uid=2) 
        |          |
           |    |
              |
        uri-based cache
              |
         remote page

If a page doesn't have its own cache copy or it's expired, it should get another copy from the uri-based cache (the same cache mechanism in use now).
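
The two-layer idea might look roughly like the sketch below. It is only an illustration under simple assumptions (everything held in memory, no expiry or size limits), and `LayeredCache`, `visit`, and `contentForVisit` are hypothetical names, not existing Mozilla interfaces. Each visit receives a uid and a private copy taken from the shared uri-based layer, so a later refresh of the same uri leaves earlier visits untouched.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical two-layer store; class and member names are illustrative.
class LayeredCache {
public:
    using Fetcher = std::function<std::string(const std::string& uri)>;

    explicit LayeredCache(Fetcher fetchRemote) : fetchRemote_(std::move(fetchRemote)) {}

    // A new visit gets its own uid and a private copy taken from the shared
    // uri-based layer, which is filled from the network on a miss or refresh.
    int visit(const std::string& uri, bool forceRefresh = false) {
        auto it = byUri_.find(uri);
        if (it == byUri_.end() || forceRefresh) {
            it = byUri_.insert_or_assign(uri, fetchRemote_(uri)).first;
        }
        const int uid = nextUid_++;
        byVisit_[uid] = it->second;  // private copy; later refreshes do not touch it
        return uid;
    }

    // Back, View Source, Save As: read the per-visit copy, never the network.
    const std::string& contentForVisit(int uid) const { return byVisit_.at(uid); }

private:
    Fetcher fetchRemote_;
    std::map<std::string, std::string> byUri_;  // shared layer, one entry per uri
    std::map<int, std::string> byVisit_;        // per-visit layer, keyed by uid
    int nextUid_ = 1;
};

// Usage: two visits to the same uri keep their own versions.
inline void exampleLayeredCache() {
    int version = 0;
    LayeredCache cache([&version](const std::string&) {
        return "version " + std::to_string(++version);
    });
    const int first = cache.visit("http://example.test/page");
    const int second = cache.visit("http://example.test/page", true);
    assert(cache.contentForVisit(first) == "version 1");
    assert(cache.contentForVisit(second) == "version 2");
}
```

The storage concern raised earlier still applies: every visit holds a full copy here, so a real design would need sharing or eviction on top of this.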
-> reassign to default owner
Assignee: darin.moz → nobody
QA Contact: networking.cache → history.session
No longer blocks: 384222
Depends on: 384222
No longer blocks: 327304
Depends on: 327304
I think that Severity needs to be elevated.
(In reply to comment #39)
> I think that Severity needs to be elevated.
> 

I concur. This bug caused me to submit duplicate orders for a $3300 purchase. If the person verifying the orders hadn't caught it, it would've possibly cost me money and at the very least been a pain to resolve.

Firefox should *never* *ever* resubmit form information without warning the user.
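
One way to state that rule as code, purely as a sketch: the enum and function names below are hypothetical, and the policy for GET requests is an assumption. The point is only that a POST is never repeated unless the stored copy is gone and the user has been asked.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not Mozilla's navigation code.
enum class Method { Get, Post };

enum class BackAction {
    ShowStoredCopy,      // redisplay what the user already saw
    Refetch,             // assumed acceptable for GET when no copy is left
    ConfirmBeforeRepost  // never resend form data without asking
};

inline BackAction decideBackAction(Method method, bool storedCopyAvailable) {
    if (storedCopyAvailable) return BackAction::ShowStoredCopy;
    return method == Method::Post ? BackAction::ConfirmBeforeRepost : BackAction::Refetch;
}

// Usage: a POST result that is still in storage is shown without any dialog.
inline void exampleBackAction() {
    assert(decideBackAction(Method::Post, true) == BackAction::ShowStoredCopy);
    assert(decideBackAction(Method::Post, false) == BackAction::ConfirmBeforeRepost);
}
```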
Component: History: Session → Document Navigation
QA Contact: history.session → docshell
This is of critical importance for regular users and developers of dynamic web content -- and now that I have read this report, clearly a spec violation as well.  Importance should be raised to Critical.

As for people talking about using more resources to save all these versioned pages:

Simply provide the user with easily accessible preferences that include a minimum number of "most recent" objects to keep in memory, and a maximum size of memory and disk cache to use.  For example 100 objects (pages/images/etc.) or 20MB, whichever comes later.  It might be good to have a separate number for pages and objects, so that even if a page needs to reload an image, at least the HTML would remain as it was when it was originally viewed.

No matter what these numbers in the preferences are, the currently open tabs should under no circumstance be reloaded from the network for printing, saving, viewing source, sending, etc.  These pages should always be pinned in the cache.
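
As a sketch of how such preferences could drive eviction: the code below reads "whichever comes later" as "evict only while both limits are exceeded", which is an interpretation, and the names `Entry`, `RetentionPolicy`, and `evict` are invented for this example. Entries belonging to open tabs are marked pinned and are never removed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stored object; `pinned` is true while it belongs to an open tab.
struct Entry {
    std::string key;
    std::size_t bytes;
    bool pinned;
};

struct RetentionPolicy {
    std::size_t minObjects;  // always keep at least this many entries
    std::size_t maxBytes;    // size budget, enforced only above minObjects
};

// `entries` is ordered oldest first. Removes unpinned entries from the old end
// while BOTH limits are exceeded, and returns how many were removed.
inline std::size_t evict(std::list<Entry>& entries, const RetentionPolicy& policy) {
    std::size_t total = 0;
    for (const Entry& e : entries) total += e.bytes;

    std::size_t evicted = 0;
    auto it = entries.begin();
    while (it != entries.end() && entries.size() > policy.minObjects &&
           total > policy.maxBytes) {
        if (it->pinned) {  // open tabs are never discarded
            ++it;
            continue;
        }
        total -= it->bytes;
        it = entries.erase(it);
        ++evicted;
    }
    return evicted;
}

// Usage: a pinned page survives even when the size budget is exceeded.
inline void examplePinnedSurvives() {
    std::list<Entry> entries = {{"open-tab", 100, true}, {"old", 100, false}};
    const std::size_t removed = evict(entries, RetentionPolicy{0, 50});
    assert(removed == 1);
    assert(entries.front().key == "open-tab");
}
```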
Notice these two bugs were considered worthy of VERIFIED FIXED status. I don't understand why since the issue clearly still exists.
Bug 40867
Bug 55583
Hardware: PC → All
Duplicate of this bug: 472895
This bug is a meta bug for tracking purposes. Adding "[Meta]" to the bug summary to avoid confusion.
No longer depends on: 472895
Summary: Mozilla sometimes re-retrieves pages instead of re-using already-retrieved copy; violates RFC 2616 (Back, Send Page, etc.); → [Meta] Mozilla sometimes re-retrieves pages instead of re-using already-retrieved copy; violates RFC 2616 (Back, Send Page, etc.);
Reinstating dependency apparently lost in edit conflict.
Depends on: 472895
No longer depends on: 472895
This should depend on bug 569142; sorry if I'm blind, but I don't see where I can add that dependency myself.
Depends on: 569142
I can confirm this when adding reviews on https://addons.mozilla.org/
Still broken in 4.0.1.
Depends on: 567365, 261312
Noting related Bug 666076 - Inform user when performing automatic reload on back/forward (which applies only to cases where the already-retrieved copy has been discarded)
> Inform user when performing automatic reload on back/forward (which applies only to cases where the already-retrieved copy has been discarded)

Yes, and conversely, if the retrieved copy has _not_ yet been discarded, then just show it...

... which is apparently not the case now (see testcase at http://www.alain.knaff.lu/bug-reports/mozillaNoStore/no-store.cgi )

Clicking continue and going back shows you a new timestamp each time.

However, if before going back, you tick File->"Work Offline", you see the old timestamp -> which proves that the copy has not actually been discarded...

Interestingly enough, even for the "POST" variant of this bug, you can click "cancel" on the postdata box, tick "Work Offline", and go back again: this time you see the page as it was when first visited, without that annoying dialog!
I suppose it would be good to inform the user when discarding the previous visit from session history on back/forward, but if it's not going to get used why is it stored?  Would it be an easier change to inform the user than to use the session history?
(In reply to comment #52)
> Would it be an easier change to inform the user than to use the session history?

Actually, just using the session history would be easier, it would literally be a one-line fix, as shown in bug 567365 comment #1.

Indeed, in most cases the page actually stays in the cache history, but is ignored at the time when it should be shown.

 2101        else if (mLoadFlags & VALIDATE_NEVER) {
 2102            LOG(("VALIDATE_NEVER set\n"));
 2103            // if no-store or if no-cache and ssl, validate cached response (see
 2104            // bug 112564 for an explanation of this logic)
 2105            if (mCachedResponseHead->NoStore() ||
 2106               (mCachedResponseHead->NoCache() && mConnectionInfo->UsingSSL())) {
 2107                LOG(("Validating based on (no-store || (no-cache && ssl)) logic\n"));
 2108                doValidation = PR_TRUE;
 2109            }

Just changing the condition in lines 2105/2106 fixes the issue (I tried it back in the Firefox 3 days...), but for some reason, Firefox developers don't want to hear about it.

According to bug 101832 comment #18 and bug 112564, the _real_ explanation (which you won't hear today...) for this strange history-trashing behavior is that Firefox got shoved around by some online banks back in 2001... And look at the other comments to both bugs: even back then, nobody (except Mr Fisher himself) really wanted this...

Firefox should remember that by now it has grown teeth, bite the "bullet", change these 2 lines (or remove the entire paragraph!) and become RFC 2616/13.13 compliant again.

Fortunately, the silver lining is that even Darin Fisher seems to be ok with a preference setting that would allow users to disable this annoying behavior. Now, we're almost 10 years later, and we still haven't got that setting! Please, pretty please...
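
To make the requested preference concrete, here is a standalone sketch of the decision in the quoted snippet with one extra switch. It is not a patch: `CachedResponseInfo` and the `showStoredHistoryCopy` flag are hypothetical stand-ins, and no such preference name exists as far as this thread shows. With the flag off, the function reproduces the quoted (no-store || (no-cache && ssl)) condition; with it on, a history load never revalidates.

```cpp
#include <cassert>

// Simplified stand-ins for the inputs used by the quoted snippet; the struct
// and the preference flag are hypothetical and not part of Mozilla's code.
struct CachedResponseInfo {
    bool noStore;   // cached response is marked no-store
    bool noCache;   // cached response is marked no-cache
    bool usingSSL;  // connection used SSL
};

// Decides whether a history load (VALIDATE_NEVER) should still revalidate.
// showStoredHistoryCopy == false: same condition as the quoted snippet.
// showStoredHistoryCopy == true: always show the stored copy (RFC 2616, 13.13).
inline bool shouldValidateOnHistoryLoad(const CachedResponseInfo& r,
                                        bool showStoredHistoryCopy) {
    if (showStoredHistoryCopy) return false;
    return r.noStore || (r.noCache && r.usingSSL);
}

// Usage: a no-store page is refetched today, but not with the preference on.
inline void exampleNoStorePage() {
    const CachedResponseInfo noStorePage{true, false, false};
    assert(shouldValidateOnHistoryLoad(noStorePage, false));
    assert(!shouldValidateOnHistoryLoad(noStorePage, true));
}
```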
Blocks: 718249