User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1
Steps to reproduce:
A page that sets a URL that actually exists via pushState, after the user leaves the page and later comes back via the Back button of the browser, on some occasions, the browser will show an older copy in the cache and not the one the user just saw.
Firefox showed an older copy from cache.
This behavior happened on multiple copies of Firefox that supports pushState, 4 and 5.
It is observable on the site www.sapo.pt presently.
Steps to replicate:
1) open page
2) click on the several tabs "Desporto", "Tecnologia" and have the URL change
3) click on any headline link (to an outer page)
4) on the other page, click back
5) you see an older copy of the page (in cache of the real url http://www.sapo.pt/desporto) instead of the same html you just saw.
The user should come back to the page / state that he/she saw. At least, show the same HTML content and fire the DOMready events and so on.
Correct use of pushState means that if the URI that was pushed is loaded directly it will show the same content as is shown after the pushState call.
If they're not the same, then the situation is no different from web pages that return different content for the same URI under some circumstances without telling the UA... in which case sometimes the user will see the "wrong" thing.
Sounds to me like www.sapo.pt is just buggy here.
but are you saying that "returning to the page via a back button" is the same as "loading the url from scratch"? The page won't get cached if loaded directly, why is this scenario different?
This is weird, because this behavior only started happening when we moved from fragment hashes to pushState.
I'll have a look at the cache headers & documentation at w3c/what-wg. Btw, I'm not setting an object to represent the history state because I don't need it. But the problem is not that, it's the HTML that is completely different (several days outdated).
If we're doing something wrong, then I apologize for bringing it here. We did have a look before submitting and it only showed up on firefox and after pushState...
When you click back in step (4), we're re-fetching the page. Normally we try to keep these documents alive in what we call the "back/forward cache" (bfcache). If a document is in bfcache, then when you go back to it, it won't be re-fetched from the network -- it'll just be there, as though you never went away.
There are a variety of ways sites can prevent us from caching their page in the bfcache. Registering an onunload handler is one common way.
So the problem is that we can't bfcache this page, for whatever reason. When you go back, we re-fetch from the network. But we re-fetch from the document's current address -- that is, the address after pushstate. It sounds like your cache headers aren't right, so when we fetch, we use a cached version of the page!
I guess the unexpected behavior here is that when you load X.html, then call pushState and change a document's URI to Y.html, that doesn't set the cache entry for Y.html to the data we retrieved for X.html. But I don't think we want to do that.
> but are you saying that "returning to the page via a back button" is the same
> as "loading the url from scratch"?
In some cases it can be, yes.
> The page won't get cached if loaded directly
It sure can be; that's why we have a cache.
For history navigation we force a load from cache no matter what, even if the cached content is stale. Which is what causes problems here: the content we have cached for Y.html in Justin's example in comment 3 is not the same as a user would get if they loaded Y.html now, presumably.
Justin, do you think it makes sense to not do the "force from cache" thing in the "loading a pushed state from network" case? It really doesn't make sense there, because the cache may well not be primed with the right data (unlike normal history navigations).
Thanks @Justin, that made sense. The most commonly desired behavior would be to use the copy in the bfcache and not the network. That's the best UX in any case (returning to the same content he just "saw").
Now, I'm not aware of what specs say in this cases (and whether they should, i can bring it up on what-wg/w3)... but like Boris say, if you change the URL via pushState you want that URL to be that content and not have it fetched from the regular cache is what makes most sense.
Indeed, the direct access can be cached by the user agent, I'm not really sure if we want to avoid it by sending the appropriate caches, but that is not to say that we'd prefer the contents of the network cache over the pushState cache.
I'll read up tonight on this matter on the spec documents to see if they mention anything about caching. A coherent behavior is always better than an optimal but diverse from UA to UA. ;)
> For history navigation we force a load from cache no matter what, even if the
> cached content is stale.
Ah, I didn't realize this.
> Justin, do you think it makes sense to not do the "force from cache" thing in
> the "loading a pushed state from network" case? It really doesn't make sense
> there, because the cache may well not be primed with the right data (unlike
> normal history navigations).
Yes, I agree! If we pushState from X to Y, we could just nuke the cache for Y.
I don't see a need to nuke the cache; just don't set VALIDATE_NEVER in the history case in DoChannelLoad if we're loading something that was pushstated. How to get that info down there, now....
I guess nuking the cache is unnecessary. I was concerned what would happen if they open in a new tab a link to the current URL -- we could fetch from cache, which wouldn't match what the user is currently seeing. But if the page has screwed up its cache headers, that's a separate issue.
> How to get that info down there, now....
The SHEntry has mLoadType. That might indicate what we want.
Ugh, no the load type apparently isn't what we want.
// If the user did a shift-reload on this frameset page,
// we don't want to load the subframes from history.
if (loadType == nsIDocShellLoadInfo::loadReloadBypassCache ||
loadType == nsIDocShellLoadInfo::loadReloadBypassProxy ||
loadType == nsIDocShellLoadInfo::loadReloadBypassProxyAndCache ||
loadType == nsIDocShellLoadInfo::loadRefresh)
I guess we'll have to add another field to the SHEntry.
I'm writing a test for this, but I'm having difficulty figuring out what to use for the cache-control header.
The page we retrieve needs to be cachable, because if we never insert it into the cache, we can't possibly retrieve it when we force-load from the cache. But we also want the cache entry to be invalid when we go <back> and load the page again -- if the cache entry were valid, then it would be correct to display it. The bug is that we display expired cache entries.
If I make the document expire sometime in the past, we won't insert it into the cache at all, right? But if I make it expire sometime in the future, then if the OS hiccups for long enough, that date will become the past, and the test will fail.
I tried no-cache (which unfortunately means "always revalidate", not "don't cache") and that doesn't seem to work -- it seems that we respect this header and revalidate the page when we load forced from cache.
Created attachment 544385 [details] [diff] [review]
Test (passes on trunk, see comment 10)
> If I make the document expire sometime in the past, we won't insert it into the
> cache at all, right?
No, we should insert that into the cache.....
You could try that, or you could try "Cache-control: max-age=0"
Neither an expires header from the past nor max-age=0 seems to work. The test I wrote passes with both those headers.
Here's exactly what's being sent:
> Date:Thu, 07 Jul 2011 17:30:45 GMT
> Expires:01 Jan 2000 00:00:00 GMT
(Doesn't work when I send both the cache-control and the expires header or when I send just one.)
I also tried
> Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
(which I stole from Wikipedia) -- same results.
Created attachment 544544 [details]
This is a log (NSPR_LOG_MODULES=nsHttp:5) of the following:
* Clear cache.
* Visit http://sapo.pt.
* Click desporto.
* Refresh (so sapo.pt/noticias/desporto is in the cache).
* Visit http://sapo.pt.
* Click desporto.
* Click first headline (Rui Machado defronta Federer)
* Click back.
Created attachment 544550 [details]
HTTP log from running test
This is with the test sending Cache-Control: max-age=0.
Created attachment 544570 [details] [diff] [review]
Test (fails, as expected, on trunk)
bz helped me figure out what was wrong with this test -- we weren't initially priming the cache, so of course we weren't getting cache hits where expected!
Created attachment 544613 [details] [diff] [review]
Comment on attachment 544613 [details] [diff] [review]
The replaceState comment doesn't match the code, as discussed. Fix that, please.
r=me with that.
Pushed to mozilla-inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/470cfda4dd83
André, someone should come by and mark this bug as fixed and pushed to mozilla-central within a day or so, assuming this change doesn't break anything. One day after that, you'll be able to download a nightly build of Firefox  which contains this change. Please let us know if it fixes your problem.
Thanks for filing this bug!
Actually, you can just try a build with this fix tomorrow morning at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-inbound/