Open Bug 1318234 Opened 3 years ago Updated 8 months ago

view-source: uses the cached version of a page, not respecting the caching headers from GMail or other sites

Categories

(Core :: Networking, defect, P3, critical)

49 Branch
defect

Tracking

()

ASSIGNED
Tracking Status
platform-rel --- -

People

(Reporter: sebg.1969, Assigned: mayhemer)

References

Details

(Keywords: sec-low, Whiteboard: [necko-active][platform-rel-Google][platform-rel-Gmail])

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0
Build ID: 20161019084923

Steps to reproduce:

-Login to Gmail
-Logout of Gmail
-View the Gmail source, and you can see all mails in the last <script>-part
 (view-source:https://mail.google.com/mail/u/0)


Actual results:

You can see the Mails of the last out-logged person.


Expected results:

It must respecting the caching headers that GMail is sending.
(It have to use the code of the actual website, and not a cached version of a page.)
Severity: normal → critical
Component: Untriaged → Security
Not confirmed yet, but moving to an appropriate component. The mail page is sent with Cache-Control "no-cache, no-store, max-age=0, must-revalidate"

Need to compare to chrome's behavior. In the steps above the user never leaves or closes the page so I don't see where cache comes into it -- if the data is still in the page then Google has left it there.
Group: firefox-core-security
Component: Security → Networking
Product: Firefox → Core
Unhiding because this is not remotely attackable
Maby i should say, that i have it report to Google too, but they said, it i a bug of Firefox...
Firefox do not respecting the caching headers, that Gmail sending.
Cache -> me
Assignee: nobody → honzab.moz
Whiteboard: [necko-active]
The view-source requests is using LOAD_FROM_CACHE which tells the channel to load the cached content regardless no-cache: "NOT validating based on LOAD_FROM_CACHE load flag" [1].

The whole cache header in response to https://mail.google.com/mail/u/0 in my case is:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate

no-cache: tells the local cache to validate before presenting to user under normal circumstances
no-store: don't use non-volatile storage
max-age: the entry expires immediately (doesn't mean it cannot be reused, tho!)
must-revalidate: tells intermediate caches to validate the resource with the server before the cached version is used

So, a "solution" here may be to let view-source use VALIDATE_NEVER instead of LOAD_FROM_CACHE.  That flag makes us revalidate the resource when there is no-store in the cache directive.  

But I don't see it as an appropriate fix at all.  Here the problem is that the data has a privacy aspect and is based on the logged in session, which means presenting the right cookie.  

There should actually be "Vary: cookie" response header set by the gmail web server.  that would override the LOAD_FROM_CACHE header.

Also, the cached entry can be found in about:cache in the list of memory entries, so even adding the proper Vary header would not hide the content after logging out.




So, in the end this is about how we treat no-store.  For ages we keep no-store entries in the memory cache, never store them to disk.

My understanding of the specification here is to no use a non-volatile (which means persistent) storage for such entries.  RAM is volatile and the content goes away when the browser instance goes away or when the cache entry is evicted using normal eviction mechanism (mostly memory cache is hitting its limit).


:dveditz, what do you think?  Should we change the policy how we keep no-store marked entries?  I was not able to find https://mail.google.com/mail/u/0 in Chrome's cache (chrome://cache) even when logged in.



[1] https://dxr.mozilla.org/mozilla-central/rev/28e2a6dde76ab6ad4464a3662df1bd57af04398a/netwerk/protocol/http/nsHttpChannel.cpp#3872
Flags: needinfo?(dveditz)
(In reply to Honza Bambas (:mayhemer) from comment #6)
> must-revalidate: tells intermediate caches to validate the resource with the
> server before the cached version is used

must-revalidate applies to all caches, public and private (maybe that's what you meant by intermediate?). Only applies to stale entries, but in this case max-age=0 means it's always stale.

> the data has a privacy aspect and is based on the logged in session, which
> means presenting the right cookie.  
> 
> There should actually be "Vary: cookie" response header set by the gmail web
> server.  that would override the LOAD_FROM_CACHE header.

That only makes sense for data the server expects to be cached for some amount of time. For extremely dynamic content many sites want the client to always grab the latest copy; a Vary header makes no sense if they want zero caching full stop. We can wish it were different, but the advice to use "no-cache, no-store, max-age=0, must-revalidate" in this situation is all over the web and that's what we have to live with.

> So, in the end this is about how we treat no-store.  For ages we keep
> no-store entries in the memory cache, never store them to disk.

That's fine, and the spec says as much (only forbids 'non-volatile' storage). It's more about the max-age/must-revalidate combination. When and how often do we force load stuff from the cache without validation? Would it be worth saving the memory and never caching these at all for the special-case max-age=0? I assume there must be important performance considerations.

> So, a "solution" here may be to let view-source use VALIDATE_NEVER instead
> of LOAD_FROM_CACHE.  That flag makes us revalidate the resource when there
> is no-store in the cache directive.

DevTools are a big user of LOAD_FROM_CACHE, and view-source: is essentially a developer tool. It totally makes sense that if you're inspecting some web resource you want to see the source you loaded, not re-load it and potentially get different values. I'm sure in this case LOAD_FROM_CACHE was chosen based on functionality and not performance -- I'd hate to break that.

In fact we did make this explicit choice 17 years ago: bug 6119 (and many other dupes), reiterated when we changed the cache flags (bug 76657). view-source reloading content was causing problems debugging dynamic sites.

> Also, the cached entry can be found in about:cache in the list of memory
> entries, so even adding the proper Vary header would not hide the content
> after logging out.

Yup. The only solution would be whether we want to consider "no-store,max-age=0,must-revalidate" a special case and not even put it in the memory cache. Just "fixing" this for view-source (by breaking view-source) isn't solving the problem the reporter is concerned about.

I bet that would lead to horrible perf and lots of extra network traffic.

> I was not able to find https://mail.google.com/mail/u/0 in Chrome's cache
> (chrome://cache) even when logged in.

Yeah, neither could I. Does that represent only their disk cache or does that cover disk and memory? Hard to imagine they're actually reloading resources for all the multiple transient internal uses. According to their network tools they are for view-source: though.


> :dveditz, what do you think?  Should we change the policy how we keep
> no-store marked entries?  

Not convinced we need to. Certainly not for no-store in general. If we special-cased max-age=0,must-revalidate (which seems common) could we measure the perf impact? Seems like normal web loads are going to go down the revalidate path so caching isn't buying us anything there. And if we made that a special case then we'd fix view-source: as a side-effect in this case without breaking view-source in the general case.
Flags: needinfo?(dveditz)
Keywords: sec-moderatesec-low
Summary: Firefox uses the cached version of a page when viewing the source, and it appears that Firefox is not respecting the caching headers that GMail or other Sites are sending. → view-source: uses the cached version of a page, not respecting the caching headers from GMail or other sites
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
according comment 7, which I agree with, we may treat the must-revalidate and max-age=0 case as "don't ever cache".  it will never be reused from cache under the normal circumstances anyway.  and we may treat it a bit as a "privacy" flag.
Attachment #8814997 - Flags: review?(mcmanus)
Comment on attachment 8814997 [details] [diff] [review]
v1 (must-revalidate + max-age=0 is not cached)

Review of attachment 8814997 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ -46,5 @@
>  
>  HARNESS_TIMEOUT = 5 * 60
>  
>  # benchmarking on tbpl revealed that this works best for now
> -NUM_THREADS = int(cpu_count() * 4)

ignore this bit, it slipped in by mistake
(In reply to Honza Bambas (:mayhemer) from comment #8)
> Created attachment 8814997 [details] [diff] [review]
> v1 (must-revalidate + max-age=0 is not cached)
> 
> according comment 7, which I agree with, we may treat the must-revalidate
> and max-age=0 case as "don't ever cache".  it will never be reused from
> cache under the normal circumstances anyway.  and we may treat it a bit as a
> "privacy" flag.

I'm fine with titling the balance towards correctness and away from cache-hit rates in the face of these kinds of directives. Not caching anything would be spec compliant - so tweaks at the tail are fine by me. So I'll r+.

seems weird to give this combination 'safer' handling than no-store though - which seems like a stronger explicit signal. Do you want to rethink that?
Comment on attachment 8814997 [details] [diff] [review]
v1 (must-revalidate + max-age=0 is not cached)

Review of attachment 8814997 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_must-revalidate-max-age-0-uncached.js
@@ +25,5 @@
> +{
> +  do_check_eq(buffer, responseBody);
> +
> +  asyncOpenCacheEntry(URL, "disk", Ci.nsICacheStorage.OPEN_READONLY, null,
> +    new OpenCallback(NOTFOUND, null, null, function() {

I think you should document that you are testing desired implementation behavior, not spec behavior.. to make it easier to sort out what's required if a change is made in the future. i.e. it would be spec compliant to store and revalidate on reuse.
Attachment #8814997 - Flags: review?(mcmanus) → review+
Duplicate of this bug: 1322974
(In reply to Patrick McManus [:mcmanus] from comment #10)
> seems weird to give this combination 'safer' handling than no-store though -
> which seems like a stronger explicit signal. Do you want to rethink that?

Also see bug 1322974 where the combo is Cache-Control: no-cache, no-store, must-revalidate.  result is the same as in this bug.  Maybe we should do both: must-revalidate && (no-store || max-age==0).

I liked the idea of must-revalidate && max-age==0 because those two are connected.  must-revalidate forces validation when the entry is expired, which here it explicitly is.  no-store has no relation to must-revalidate.

We special case when a cached entry has Authorization request header stored.  I mention it because it's an indication of kind of a "login".  We could do the same when must-revalidate is specified and cookie is present.  But what doesn't have cookies set today, right? :)

Or, we could enforce vary: cookie when must-revalidate and no-store is present.  what about that?
platform-rel: --- → ?
Whiteboard: [necko-active] → [necko-active][platform-rel-Google][platform-rel-Gmail]
Why hasn't this landed?
Flags: needinfo?(honzab.moz)
Duplicate of this bug: 1331200
Because the patch needs an update and it's not entirely clear what we want to do here.  And I simply didn't get back to this bug.

Patrick, please see comment 13.
Flags: needinfo?(honzab.moz) → needinfo?(mcmanus)
Comment on attachment 8814997 [details] [diff] [review]
v1 (must-revalidate + max-age=0 is not cached)

Dropping r since we may need to think of something better here.
Attachment #8814997 - Flags: review+
platform-rel: ? → -
reading this from the top, I'm wondering if we don't just want an asthetic change.

view-source accesses non persistent storage which is perfectly fine and is not disrespecting headers and its also perfectly fine to use stale content if that's your policy. Based on what I've re-read I don't think I would change anything.

However I think perhaps I would special case no-store when producing about:cache - and only report items in the persistent storage when making that listing. Its a minor tweak, but I think it represents what is really going on.
Flags: needinfo?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #18)
> reading this from the top, I'm wondering if we don't just want an asthetic
> change.
> 
> view-source accesses non persistent storage which is perfectly fine and is
> not disrespecting headers and its also perfectly fine to use stale content
> if that's your policy. Based on what I've re-read I don't think I would
> change anything.
> 
> However I think perhaps I would special case no-store when producing
> about:cache - and only report items in the persistent storage when making
> that listing. 

Oh, no.  no-store items are in the cache, specifically in the memory cache.  Not showing them would be inconsistent, since we use such cached entries.

> Its a minor tweak, but I think it represents what is really
> going on.

I'm more thinking about the implicit vary:cookie thing which is actually what's going on here.  I'll think about the right condition to enforce it.  OTOH, some cookie parts may change regardless login, leading to less cache hits.
(In reply to Honza Bambas (:mayhemer) from comment #19)
> (In reply to Patrick McManus [:mcmanus] from comment #18)

> 
> Oh, no.  no-store items are in the cache, specifically in the memory cache. 
> Not showing them would be inconsistent, since we use such cached entries.

we only sort of use them (just for offline). If we're going to enumerate them maybe we can indicate that in some way - like a light grey stylesheet.
Duplicate of this bug: 1389635
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Priority: P1 → P2
so, I believe this is the way to go.  Other browsers don't cache no-store responses, apparently.  I only had to update two xpcshell network tests to accommodate this change, otherwise try looks good.
Comment on attachment 8972866 [details] [diff] [review]
v2 (pref to prevent any kind of caching of no-store responses, on by default)

Review of attachment 8972866 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is an improvement. thanks
Attachment #8972866 - Flags: review?(mcmanus) → review+
Comment on attachment 8972866 [details] [diff] [review]
v2 (pref to prevent any kind of caching of no-store responses, on by default)

Dropping r to get this back on my no-patch list.  We first want to make sure that a bf-cached no-store POST is not broken in a way that would break e.g banking.
Attachment #8972866 - Flags: review+
Duplicate of this bug: 1483249
Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.