Closed Bug 840299 Opened 11 years ago Closed 2 years ago

Force use of BFCache for pages that normally do not qualify

Categories: Core :: DOM: Navigation (enhancement)
Status: RESOLVED INCOMPLETE
People: (Reporter: jeffp, Unassigned)
Facebook uses cache-control: no-cache and cache-control: no-store over HTTPS connections to prevent ISPs from MITM-caching requests and thus potentially showing a page intended for one user to another.  An undesirable side effect is that Firefox will consequently not place Facebook pages in the bfcache, which would otherwise improve efficiency (by avoiding unnecessary requests) and user experience (by letting a user return to a page instantly rather than waiting for it to reload).  Other websites also have trouble making it into the bfcache (e.g. Google Maps), although possibly for different reasons.

A direct solution to this problem would be to either add a new header along the lines of "cache-control: bfcacheable", or alternatively a JavaScript call such as "window.bfcacheable = true", which would inform Firefox that the page can be safely bfcached regardless of whether it otherwise satisfies the normal rules for automatic inclusion.
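To make the header-based opt-in concrete, a purely illustrative response might look like the following (hypothetical; neither proposed form was ever standardized or implemented):

```
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, bfcacheable
```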
See Also: → 738599
Component: Untriaged → Document Navigation
Product: Firefox → Core
Justin, did you have a hand in writing bfcache? What do you think of the proposal? (Also, can you CC anyone else you think is relevant?)
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Dan Witte (:dwitte) (not reading bugmail, email to contact) from comment #1)
> Justin, did you have a hand in writing bfcache?

no; smaug, bz, and sicking know much more about it than I.

> What do you think of the proposal?

This sounds like a pretty scary idea to me.  You can assert that your page will function properly if placed into Firefox's bfcache as it is designed today, but you can't assert that your page will function correctly in another browser's bfcache, or that your page can handle any additional requirements we place on pages before they may enter the bfcache.

I wonder if bfcache is really your biggest problem here.  Sending these headers will also cause browsers not to cache anything, right?
The only potentially tricky case I'm aware of is open XHRs; these could just be aborted or otherwise terminated in some documented way, and it's up to the site to make sure it handles this (either just before the page is hidden or when it is shown again).
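As an illustration of the "abort just before the page is hidden" approach (my sketch, not from the bug; AbortController postdates this discussion, and XHR.abort() would have been the contemporary equivalent):

```javascript
// Sketch: cancel all in-flight requests on pagehide so that open network
// connections don't disqualify the page from entering the bfcache.

// Abort every outstanding request; returns how many were actually cancelled.
function abortAll(controllers) {
  let cancelled = 0;
  for (const c of controllers) {
    if (!c.signal.aborted) {
      c.abort();
      cancelled++;
    }
  }
  return cancelled;
}

// Browser-only wiring (guarded so the helper above stays testable elsewhere).
if (typeof window !== 'undefined') {
  const pending = new Set();  // populated wherever the site issues requests
  window.addEventListener('pagehide', () => abortAll(pending));
}
```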

If a page makes it into the bfcache and gets restored, the browser's network-level cache shouldn't (AFAIK) matter.  In general, though, the real cost here isn't the time spent getting the HTML (our other, static assets don't use cache-busting); it's the time spent dynamically pulling and rendering data once that HTML is loaded (which could be avoided by use of the bfcache).
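The "refresh dynamic data on restore, rather than reloading everything" pattern could be sketched like this (my illustration; `refreshDynamicData` is a hypothetical site-specific function):

```javascript
// Sketch: on a bfcache restore, re-pull only the dynamic data instead of
// reloading the entire page.

// pageshow fires with persisted === true when the page comes out of bfcache.
function restoredFromBFCache(event) {
  return event.persisted === true;
}

if (typeof window !== 'undefined') {
  window.addEventListener('pageshow', (e) => {
    if (restoredFromBFCache(e)) {
      // refreshDynamicData();  // hypothetical: re-pull personalized content
    }
  });
}
```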
Doesn't sound like too bad an idea. It would be left to the page to do the right thing if a pageshow event
is dispatched later. But we'd need to design carefully in which cases bfcaching would be possible.

http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDocument.cpp?rev=966fddc4ff1f#7311
has some current limitations
Running IndexedDB transactions will also be aborted when the user leaves a page.

And it's not just running XHR requests that will be aborted. All running network activity will be aborted, including any image loads, iframe loads, CSS loads, WebSocket connections etc.

Something that's a big problem is that the error/abort events fired in response to all these aborts won't fire because we suppress all event firing once the user has left the page. We'd likely have to stall rather than suppress those events somehow.


But if we keep suppressing bfcache in all those instances, and only ignore the bfcache rejection due to cache-control headers, then I think we'd be fine.


In general, if we add features allowing a page to opt in to bfcacheability, we should probably not let the page say "I can be bfcached, trust me", but rather "don't kick me out of bfcache for reasons X and Y, I can handle failures in those features".
I like the idea of more granular control.  And it looks (to my unfamiliar eye) as though suppressing rejection due to cache-control would be straightforward, and definitely a big improvement.  Hopefully it would also be relatively easy to just ignore the beforeunload/unload events (to eliminate the need to special-case older browsers that don't support bfcache-forcing).

Being able to stall (or raise immediately, before stashing in bfcache) the network error/abort events would also be great if it could be done, since IIRC even if you listen to PageHide and abort them yourself open network connections still prevent bfcaching (which could be seen as a bug in itself).
(In reply to Jeff Pasternack from comment #6)
> Hopefully it would also
> be relatively easy to just ignore the beforeunload/unload events (to
> eliminate the need to special-case with older browsers that don't support
> bfcache-forcing).

I don't understand what you mean here.

> Being able to stall (or raise immediately, before stashing in bfcache) the
> network error/abort events would also be great if it could be done, since
> IIRC even if you listen to PageHide and abort them yourself open network
> connections still prevent bfcaching (which could be seen as a bug in itself).

Indeed. This sounds like a good idea to allow pages to handle network requests. Definitely worth a separate bug.
(In reply to Jonas Sicking (:sicking) from comment #7)
> (In reply to Jeff Pasternack from comment #6)
> > Hopefully it would also
> > be relatively easy to just ignore the beforeunload/unload events (to
> > eliminate the need to special-case with older browsers that don't support
> > bfcache-forcing).
> 
> I don't understand what you mean here.

Right now these events stop a page from being bfcached, since they might never get run if a page goes into bfcache and never comes out again.  The alternative is to use pagehide, which much older browsers don't support; it'd be convenient to be able to say "sure, I've got beforeunload/unload handlers, but if they ultimately don't run, meh".  Not a big deal, though, because we can still address this on our side with feature detection.
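That feature detection might look like the following sketch (my illustration, under the assumption that checking for `onpagehide` support is sufficient):

```javascript
// Sketch: use pagehide where supported (it doesn't block bfcache entry),
// falling back to unload on much older browsers.
function teardownEventName(win) {
  return 'onpagehide' in win ? 'pagehide' : 'unload';
}

if (typeof window !== 'undefined') {
  window.addEventListener(teardownEventName(window), () => {
    // site-specific teardown goes here
  });
}
```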

> > Being able to stall (or raise immediately, before stashing in bfcache) the
> > network error/abort events would also be great if it could be done, since
> > IIRC even if you listen to PageHide and abort them yourself open network
> > connections still prevent bfcaching (which could be seen as a bug in itself).
> 
> Indeed. This sounds like a good idea to allow pages to handle network
> requests. Definitely worth a separate bug.

I encountered this about 6 months ago, so I'm not certain it's still present.  But I'll try to reproduce and then file if it is.
> and so potentially showing a page intended for one user to another.

The proper Vary headers really don't cover that?

Past that, the only reason we don't bfcache no-cache/no-store stuff is so that someone can't walk up to a computer after you log out of your bank and go back in history to your account data.  So I would have no problem adding an opt-in for a page to say it should be bfcachable even if it has no-cache or no-store, if it's not actually sensitive in any way.  But again, I think using no-store for non-sensitive pages is abusing the protocol; it seems pretty odd to me that intermediate caches would honor that but not the right Vary headers...
(In reply to Boris Zbarsky (:bz) from comment #9)
> > and so potentially showing a page intended for one user to another.
> 
> The proper Vary headers really don't cover that?
> 
> Past that, the only reason we don't bfcache no-cache/no-store stuff is so
> that someone can't walk up to a computer after you log out of your bank and
> go back in history to your account data.  So I would have no problem adding
> an opt-in for a page to say it should be bfcachable even if it has no-cache
> or no-store, if it's not actually sensitive in any way.  But again, I think
> using no-store for non-sensitive pages is abusing the protocol; it seems
> pretty odd to me that intermediate caches would honor that but not the right
> Vary headers...

Well, we have to act defensively; if we were to set a Vary header without cache-control cache-busting headers and a poorly-coded or very old proxy cached the request when it shouldn't, we'd be in trouble.

WRT security, in our case we have a secure session but people rarely log out, so loading out of bfcache is no more risky than letting them go back and reload the page.  However, in the cases where the user does log out, we would indeed need a way to tell Firefox "please invalidate everything from our domain currently in the bfcache".
> and a poorly-coded or very old proxy cached the request when it shouldn't,

A poorly-coded proxy can also ignore cache-control headers....  At some point you have to assume some level of non-broken.

> we would indeed need a way to tell Firefox "please invalidate everything from our domain
> currently in the bfcache".

Right.  There is no facility for that right now; bfcache is meant to be completely transparent to pages as much as possible...
(In reply to Boris Zbarsky (:bz) from comment #11)
> A poorly-coded proxy can also ignore cache-control headers....  At some
> point you have to assume some level of non-broken.

We err on the side of caution and basically set every header possible to prevent caching.  Risking user privacy on the hope that intermediaries will behave themselves is just something we can't do, unfortunately.
 
> Right.  There is no facility for that right now; bfcache is meant to be
> completely transparent to pages as much as possible...

Yes; transparency is great because you can take a conservative approach to the bfcache and avoid incorrect behavior while still getting some benefit.  But bfcache can make a huge impact on the user experience with sites that rely heavily on dynamic content/state, so I definitely think it makes sense to have an explicit mechanism for those who wish to use it.
> Risking user privacy on the hope that intermediaries will behave themselves is just 
> something we can't do, unfortunately.

Okay, but we should recognize that you already rely on intermediaries to behave themselves.  They're acting as intermediaries on a secure connection, so they can do whatever they want.  So these extra headers may or may not reduce the risk to privacy posed by these proxies, but they by no means eliminate it.

I don't mean to sound snarky, because of course we take both privacy and speed very seriously here, but the value proposition of us adding a new feature to the Web explicitly to work around misbehaving SSL MITMs which may or may not even exist is not great IMO.
> Okay, but we should recognize that you already rely on intermediaries to
> behave themselves.  They're acting as intermediaries on a secure connection,
> so they can do whatever they want.  So these extra headers may or may not
> reduce the risk to privacy posed by these proxies, but they by no means
> eliminate it.

Sure; one can never completely eliminate the risk here, but we need to minimize it.

> I don't mean to sound snarky, because of course we take both privacy and
> speed very seriously here, but the value proposition of us adding a new
> feature to the Web explicitly to work around misbehaving SSL MITMs which may
> or may not even exist is not great IMO.

This is one of the reasons explicit bfcache control is needed, but certainly not the only one.  We also don't want the browser (or anyone else) storing someone's page long-term in some accessible place (e.g. a cache folder), so even in the absence of SSL MITMs we still couldn't use the bfcache; and, as mentioned, we need a way to invalidate bfcache'd pages when a user logs out.  This enhancement would be very beneficial to our users (~1B people and counting) as well as to Google and anyone else whose pages contain dynamic, personalized content; I'd certainly hope that's enough upside to make it worthwhile!
(In reply to Jeff Pasternack from comment #14)
> and, as
> mentioned, we need a way to invalidate bfcache'd pages when a user logs out.
> This enhancement would be very beneficial to our users (~1B people and
> counting) in addition to Google and anyone else whose pages contain dynamic,
> personalized content; I'd certainly hope this would be enough upside to make
> it worthwhile!

In my opinion, the problem is that there's no standard way of "logging in" and "logging out", so a browser can't tell whether its user is using the web service or not...

... and that's not entirely true. Most browsers are able to tell whether a website is HTTP-authenticated or not. The actual problem is that few, if any, websites use HTTP auth, mainly because of browsers' poor UI. Firefox does not explicitly support "logging out" of HTTP auth. Not only major web service providers like Facebook and Google but also bugzilla.mozilla.org itself ignores HTTP auth.

I'm a little curious: is there any chance that Facebook will shift from session-ID-driven login to HTTP-auth-driven login, if those UIs of IE/Fx/Safari/Chrome etc. (and HTTP spec if necessary) are *unbelievably* improved? In other words, will it be possible for a user to control "log in/out" status not through content-area UI but through chrome-area UI?

If a user notifies the browser on logging out, the browser has no difficulty deleting private information (bfcache, image/video/audio cache, and form-filled content) altogether. This would be speedy and secure enough, wouldn't it?
Authentication tends to be deeply embedded in major sites, and while I agree improvements to HTTP auth are definitely warranted, it's not a feasible solution in the foreseeable future if only for reasons of practicality and the need for extremely widespread availability.

Some kind of generic "begin private session" and "end private session" mechanism would be great (assuming everything cached was held securely in RAM and it would cause the browser to ignore cache-control headers as appropriate, etc.) but this would also be a more difficult task than bfcache control alone (which is the primary concern).
> Authentication tends to be deeply embedded in major sites

I agree. Modern technology such as OAuth2 (Graph API) would accelerate the tendency, if someone inside won't try to stop it.

> but this would also be a more difficult task than bfcache
> control alone (which is the primary concern).

Really? Hmm. If I remember correctly, the HTTP response header to prevent ISPs / proxy servers from creating cached copies is "cache-control: private". With "cache-control: private", I'm not too sure of the difference between the normal cache and the bfcache. If bfcache is such a great thing, I believe we can/should handle the normal cache in exactly the same way.
The cache-control headers will prevent either from being used; that's basically the problem.  We need those headers to stop intermediaries from caching the data, but we also want to instruct the browser itself to use its bfcache regardless (and to clear it when the user logs out).  The bfcache stores the state of a page, while the "normal" cache stores assets pulled off the network (control over the latter would be nice to have, but is still much less useful than bfcache control itself).

If there's still interest in this, it should be standardized in WHATWG's HTML standard first.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE