Open Bug 1318234 Opened 8 years ago Updated 1 month ago

view-source: uses the cached version of a page, not respecting the caching headers from GMail or other sites

Categories

(Core :: Networking, defect, P3)

49 Branch
defect

Tracking

()

Tracking Status
platform-rel --- -

People

(Reporter: sebg.1969, Unassigned)

References

Details

(Keywords: sec-low, Whiteboard: [necko-triaged][platform-rel-Google][platform-rel-Gmail][necko-priority-next])

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0
Build ID: 20161019084923

Steps to reproduce:

-Login to Gmail
-Logout of Gmail
-View the Gmail source, and you can see all mails in the last <script>-part
 (view-source:https://mail.google.com/mail/u/0)


Actual results:

You can see the Mails of the last out-logged person.


Expected results:

It must respecting the caching headers that GMail is sending.
(It have to use the code of the actual website, and not a cached version of a page.)
Severity: normal → critical
Component: Untriaged → Security
Not confirmed yet, but moving to an appropriate component. The mail page is sent with Cache-Control "no-cache, no-store, max-age=0, must-revalidate"

Need to compare to chrome's behavior. In the steps above the user never leaves or closes the page so I don't see where cache comes into it -- if the data is still in the page then Google has left it there.
Group: firefox-core-security
Component: Security → Networking
Product: Firefox → Core
Unhiding because this is not remotely attackable
Maby i should say, that i have it report to Google too, but they said, it i a bug of Firefox...
Firefox do not respecting the caching headers, that Gmail sending.
Cache -> me
Assignee: nobody → honzab.moz
Whiteboard: [necko-active]
The view-source requests is using LOAD_FROM_CACHE which tells the channel to load the cached content regardless no-cache: "NOT validating based on LOAD_FROM_CACHE load flag" [1].

The whole cache header in response to https://mail.google.com/mail/u/0 in my case is:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate

no-cache: tells the local cache to validate before presenting to user under normal circumstances
no-store: don't use non-volatile storage
max-age: the entry expires immediately (doesn't mean it cannot be reused, tho!)
must-revalidate: tells intermediate caches to validate the resource with the server before the cached version is used

So, a "solution" here may be to let view-source use VALIDATE_NEVER instead of LOAD_FROM_CACHE.  That flag makes us revalidate the resource when there is no-store in the cache directive.  

But I don't see it as an appropriate fix at all.  Here the problem is that the data has a privacy aspect and is based on the logged in session, which means presenting the right cookie.  

There should actually be "Vary: cookie" response header set by the gmail web server.  that would override the LOAD_FROM_CACHE header.

Also, the cached entry can be found in about:cache in the list of memory entries, so even adding the proper Vary header would not hide the content after logging out.




So, in the end this is about how we treat no-store.  For ages we keep no-store entries in the memory cache, never store them to disk.

My understanding of the specification here is to no use a non-volatile (which means persistent) storage for such entries.  RAM is volatile and the content goes away when the browser instance goes away or when the cache entry is evicted using normal eviction mechanism (mostly memory cache is hitting its limit).


:dveditz, what do you think?  Should we change the policy how we keep no-store marked entries?  I was not able to find https://mail.google.com/mail/u/0 in Chrome's cache (chrome://cache) even when logged in.



[1] https://dxr.mozilla.org/mozilla-central/rev/28e2a6dde76ab6ad4464a3662df1bd57af04398a/netwerk/protocol/http/nsHttpChannel.cpp#3872
Flags: needinfo?(dveditz)
(In reply to Honza Bambas (:mayhemer) from comment #6)
> must-revalidate: tells intermediate caches to validate the resource with the
> server before the cached version is used

must-revalidate applies to all caches, public and private (maybe that's what you meant by intermediate?). Only applies to stale entries, but in this case max-age=0 means it's always stale.

> the data has a privacy aspect and is based on the logged in session, which
> means presenting the right cookie.  
> 
> There should actually be "Vary: cookie" response header set by the gmail web
> server.  that would override the LOAD_FROM_CACHE header.

That only makes sense for data the server expects to be cached for some amount of time. For extremely dynamic content many sites want the client to always grab the latest copy; a Vary header makes no sense if they want zero caching full stop. We can wish it were different, but the advice to use "no-cache, no-store, max-age=0, must-revalidate" in this situation is all over the web and that's what we have to live with.

> So, in the end this is about how we treat no-store.  For ages we keep
> no-store entries in the memory cache, never store them to disk.

That's fine, and the spec says as much (only forbids 'non-volatile' storage). It's more about the max-age/must-revalidate combination. When and how often do we force load stuff from the cache without validation? Would it be worth saving the memory and never caching these at all for the special-case max-age=0? I assume there must be important performance considerations.

> So, a "solution" here may be to let view-source use VALIDATE_NEVER instead
> of LOAD_FROM_CACHE.  That flag makes us revalidate the resource when there
> is no-store in the cache directive.

DevTools are a big user of LOAD_FROM_CACHE, and view-source: is essentially a developer tool. It totally makes sense that if you're inspecting some web resource you want to see the source you loaded, not re-load it and potentially get different values. I'm sure in this case LOAD_FROM_CACHE was chosen based on functionality and not performance -- I'd hate to break that.

In fact we did make this explicit choice 17 years ago: bug 6119 (and many other dupes), reiterated when we changed the cache flags (bug 76657). view-source reloading content was causing problems debugging dynamic sites.

> Also, the cached entry can be found in about:cache in the list of memory
> entries, so even adding the proper Vary header would not hide the content
> after logging out.

Yup. The only solution would be whether we want to consider "no-store,max-age=0,must-revalidate" a special case and not even put it in the memory cache. Just "fixing" this for view-source (by breaking view-source) isn't solving the problem the reporter is concerned about.

I bet that would lead to horrible perf and lots of extra network traffic.

> I was not able to find https://mail.google.com/mail/u/0 in Chrome's cache
> (chrome://cache) even when logged in.

Yeah, neither could I. Does that represent only their disk cache or does that cover disk and memory? Hard to imagine they're actually reloading resources for all the multiple transient internal uses. According to their network tools they are for view-source: though.


> :dveditz, what do you think?  Should we change the policy how we keep
> no-store marked entries?  

Not convinced we need to. Certainly not for no-store in general. If we special-cased max-age=0,must-revalidate (which seems common) could we measure the perf impact? Seems like normal web loads are going to go down the revalidate path so caching isn't buying us anything there. And if we made that a special case then we'd fix view-source: as a side-effect in this case without breaking view-source in the general case.
Flags: needinfo?(dveditz)
Keywords: sec-moderatesec-low
Summary: Firefox uses the cached version of a page when viewing the source, and it appears that Firefox is not respecting the caching headers that GMail or other Sites are sending. → view-source: uses the cached version of a page, not respecting the caching headers from GMail or other sites
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
according comment 7, which I agree with, we may treat the must-revalidate and max-age=0 case as "don't ever cache".  it will never be reused from cache under the normal circumstances anyway.  and we may treat it a bit as a "privacy" flag.
Attachment #8814997 - Flags: review?(mcmanus)
Comment on attachment 8814997 [details] [diff] [review]
v1 (must-revalidate + max-age=0 is not cached)

Review of attachment 8814997 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ -46,5 @@
>  
>  HARNESS_TIMEOUT = 5 * 60
>  
>  # benchmarking on tbpl revealed that this works best for now
> -NUM_THREADS = int(cpu_count() * 4)

ignore this bit, it slipped in by mistake
(In reply to Honza Bambas (:mayhemer) from comment #8)
> Created attachment 8814997 [details] [diff] [review]
> v1 (must-revalidate + max-age=0 is not cached)
> 
> according comment 7, which I agree with, we may treat the must-revalidate
> and max-age=0 case as "don't ever cache".  it will never be reused from
> cache under the normal circumstances anyway.  and we may treat it a bit as a
> "privacy" flag.

I'm fine with titling the balance towards correctness and away from cache-hit rates in the face of these kinds of directives. Not caching anything would be spec compliant - so tweaks at the tail are fine by me. So I'll r+.

seems weird to give this combination 'safer' handling than no-store though - which seems like a stronger explicit signal. Do you want to rethink that?
Comment on attachment 8814997 [details] [diff] [review]
v1 (must-revalidate + max-age=0 is not cached)

Review of attachment 8814997 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_must-revalidate-max-age-0-uncached.js
@@ +25,5 @@
> +{
> +  do_check_eq(buffer, responseBody);
> +
> +  asyncOpenCacheEntry(URL, "disk", Ci.nsICacheStorage.OPEN_READONLY, null,
> +    new OpenCallback(NOTFOUND, null, null, function() {

I think you should document that you are testing desired implementation behavior, not spec behavior.. to make it easier to sort out what's required if a change is made in the future. i.e. it would be spec compliant to store and revalidate on reuse.
Attachment #8814997 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #10)
> seems weird to give this combination 'safer' handling than no-store though -
> which seems like a stronger explicit signal. Do you want to rethink that?

Also see bug 1322974 where the combo is Cache-Control: no-cache, no-store, must-revalidate.  result is the same as in this bug.  Maybe we should do both: must-revalidate && (no-store || max-age==0).

I liked the idea of must-revalidate && max-age==0 because those two are connected.  must-revalidate forces validation when the entry is expired, which here it explicitly is.  no-store has no relation to must-revalidate.

We special case when a cached entry has Authorization request header stored.  I mention it because it's an indication of kind of a "login".  We could do the same when must-revalidate is specified and cookie is present.  But what doesn't have cookies set today, right? :)

Or, we could enforce vary: cookie when must-revalidate and no-store is present.  what about that?
platform-rel: --- → ?
Whiteboard: [necko-active] → [necko-active][platform-rel-Google][platform-rel-Gmail]
Why hasn't this landed?
Flags: needinfo?(honzab.moz)
Because the patch needs an update and it's not entirely clear what we want to do here.  And I simply didn't get back to this bug.

Patrick, please see comment 13.
Flags: needinfo?(honzab.moz) → needinfo?(mcmanus)
Comment on attachment 8814997 [details] [diff] [review]
v1 (must-revalidate + max-age=0 is not cached)

Dropping r since we may need to think of something better here.
Attachment #8814997 - Flags: review+
platform-rel: ? → -
reading this from the top, I'm wondering if we don't just want an asthetic change.

view-source accesses non persistent storage which is perfectly fine and is not disrespecting headers and its also perfectly fine to use stale content if that's your policy. Based on what I've re-read I don't think I would change anything.

However I think perhaps I would special case no-store when producing about:cache - and only report items in the persistent storage when making that listing. Its a minor tweak, but I think it represents what is really going on.
Flags: needinfo?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #18)
> reading this from the top, I'm wondering if we don't just want an asthetic
> change.
> 
> view-source accesses non persistent storage which is perfectly fine and is
> not disrespecting headers and its also perfectly fine to use stale content
> if that's your policy. Based on what I've re-read I don't think I would
> change anything.
> 
> However I think perhaps I would special case no-store when producing
> about:cache - and only report items in the persistent storage when making
> that listing. 

Oh, no.  no-store items are in the cache, specifically in the memory cache.  Not showing them would be inconsistent, since we use such cached entries.

> Its a minor tweak, but I think it represents what is really
> going on.

I'm more thinking about the implicit vary:cookie thing which is actually what's going on here.  I'll think about the right condition to enforce it.  OTOH, some cookie parts may change regardless login, leading to less cache hits.
(In reply to Honza Bambas (:mayhemer) from comment #19)
> (In reply to Patrick McManus [:mcmanus] from comment #18)

> 
> Oh, no.  no-store items are in the cache, specifically in the memory cache. 
> Not showing them would be inconsistent, since we use such cached entries.

we only sort of use them (just for offline). If we're going to enumerate them maybe we can indicate that in some way - like a light grey stylesheet.
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Priority: P1 → P2
so, I believe this is the way to go.  Other browsers don't cache no-store responses, apparently.  I only had to update two xpcshell network tests to accommodate this change, otherwise try looks good.
Comment on attachment 8972866 [details] [diff] [review]
v2 (pref to prevent any kind of caching of no-store responses, on by default)

Review of attachment 8972866 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is an improvement. thanks
Attachment #8972866 - Flags: review?(mcmanus) → review+
Comment on attachment 8972866 [details] [diff] [review]
v2 (pref to prevent any kind of caching of no-store responses, on by default)

Dropping r to get this back on my no-patch list.  We first want to make sure that a bf-cached no-store POST is not broken in a way that would break e.g banking.
Attachment #8972866 - Flags: review+
Priority: P2 → P3
Assignee: honzab.moz → nobody
Status: ASSIGNED → NEW

This is my first contribution with bugzilla, so this may be inappropriate or the wrong place to submit but I think my experience is related to this ticket.

I'm recently switched from chrome to firefox 84.0.2 (64-Bit) Windows version as my local development/daytoday browser.

I encountered an issue while dealing with locahost form submissions. My form is being cached (filled out) until I explicitly press "SHIFT+STRG+R". Even with the appropriate no-cache headers. (Cache-Control: max-age=0, private, must-revalidate)

I measured this over 15 minutes now and a normal "refresh" wont purge my form. This leads to several backend issues and I can imagine that there is a lot of bug-potential when firefox just caching forms, that may be dynamic, or just "stale". How many forms out there are not being processed with this behavior?

I discovered while evaluating form submission during development, and it took me about 20 minutes to realize my code is just fine, but the form which is being submitted is stale... (i shouted at rails first, sorry buddy...). Whats also interesting, that even with a errored form submission (500) firefox just happily caches the form, when pressing the back button.

Coming from chrome, where a "STRG+R" purges your form, makes it hard for me arguing that this is not a bug. If so, please let me know.

(In reply to Niklas.Hanft from comment #30)

This is my first contribution with bugzilla, so this may be inappropriate or the wrong place to submit but I think my experience is related to this ticket.

I'm recently switched from chrome to firefox 84.0.2 (64-Bit) Windows version as my local development/daytoday browser.

Thank you for switching.

I encountered an issue while dealing with locahost form submissions. My form is being cached (filled out) until I explicitly press "SHIFT+STRG+R". Even with the appropriate no-cache headers. (Cache-Control: max-age=0, private, must-revalidate)

This bug report on which you're commenting is specifically related to "view source" functionality, and the issue you're reporting does not appear to be about that (even if both are related to caching - caching is a pretty broad subject!). Can you file a new bug, please? You can either flag me for needinfo on the new bug or comment here with the new bug number when you've filed it and I can make sure it is looked at.

Flags: needinfo?(Niklas.Hanft)

(In reply to :Gijs (he/him) from comment #31)

(In reply to Niklas.Hanft from comment #30)

This is my first contribution with bugzilla, so this may be inappropriate or the wrong place to submit but I think my experience is related to this ticket.

I'm recently switched from chrome to firefox 84.0.2 (64-Bit) Windows version as my local development/daytoday browser.

Thank you for switching.

I encountered an issue while dealing with locahost form submissions. My form is being cached (filled out) until I explicitly press "SHIFT+STRG+R". Even with the appropriate no-cache headers. (Cache-Control: max-age=0, private, must-revalidate)

This bug report on which you're commenting is specifically related to "view source" functionality, and the issue you're reporting does not appear to be about that (even if both are related to caching - caching is a pretty broad subject!). Can you file a new bug, please? You can either flag me for needinfo on the new bug or comment here with the new bug number when you've filed it and I can make sure it is looked at.

Sorry, I must overread that! I opened a new bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1688183

Flags: needinfo?(Niklas.Hanft)

I just found this comments looking for the bug I just discovered : YES ! "view-source" have cache problems ! And is VERY annoying, because I'm trying to depurate my own webpage and Firefox is NOT refreshing the source code

I change one of the HTMLs I'm programming and Firefox can see the webpage changes, no problem, just "F5" refresh the webpage perfectly, but, if I hit "CTRL-U" to see the source code, Always show the same obsolete old code ! The only way to fix that is to clean Firefox cache every time I change the code

So, I'm testing my code in Chrome, no problem showing the source code updated when I hit "CTRL-U", but also works in Opera, not tested in MS browser, I don't like it too much, but I suppose will work OK

A few hours lost because of this bug ! But I love Firefox, I hope is fixed soon :D

Cheers !

I can confirm there is a bug of showing old cached page when using "view-source" function.

Looks like a serious regression.

Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0

View source showing cached page still not fixed in Firefox 88. Please fix, not exactly developer friendly with this bug.

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0

I'm using Firefox 88 on Linux Mint and view source is showing cached info.
Serious problem.

On FF 88.0.1 / Linux mint this still happens, very annoying, my view source is going back to the cached login page from a previous redirect, even though I'm logged in to a site I'm testing, which difficults testing.

I'm on FF Developer 90.0b12 / Windows 7, and this is still a problem. Is there any timetable or something about when this could be fixed?
It not only is very annoying, but very problematic.
A couple of examples:

I teach web development.
When evaluating exams, one point is validating HTML to see if there's any issues. When multiple students use the sames URLs (for instance, using the same base server http://localhost:8000), like /login or /productos, if I don't remember to every single time to refresh the 'source code', then I will not be validating the correct HTML, which depending which was initially, could mistakingly mark some errores, or mistakingly show no errors.

Another case which has arisen, is when teaching HTML/php beginners a login form. When trying to login with wrong credentials, sometimes the student repopulates no only the email/user, but also the password, which is a BIG issue. One way to easily show them why it's dangerous, it's show them in the 'source code' that the password is there plainly to see. However, with this bug that is not possible.
The first time I enter the login form (and FF caches the page's source code) the form is empty. Then when I insert wrong credentials and get redirected again to the form, the values in the page will be repopulated in the form. But when I view the source code, it will show me again the original source, with the form empty. In that scenario, I can't even refresh the source code cache, because will again load the form empty.
Usually I have to show them with the Devtools or something similar, but for people very new, that can be confusing.

And also, many new students have suffered lots of time lost because this over-aggressive cache strategy. Caching local assets in localhost by default is already pretty aggressive imo. But caching the source code also is too much. I can't see how that could ever be useful.

I think it's very important to fix this, at the very least for FF Developer. For experienced users, it's extremely annoying. And for beginners, it's very confusing, error-prone and frustrating. I can attest to that from teaching.

I hope we can get some input from the dev team on this, it would be greatly appreacited.

Best regards!

@gallino.santi The workaround for web development is currently to remember to always reload the source-view: tab (!, page tab is not enough) after opening it with Firefox.

A common pattern is (FF 89.0.2):

  • load a page
  • change code
  • reload page (CTRL+F5)
  • check source code (CTRL+U)

Expected:

  • see changed code

Actual:

  • see source code from first page load

This has lead to much frustration here, too.

(In reply to flightvision from comment #40)

@gallino.santi The workaround for web development is currently to remember to always reload the source-view: tab (!, page tab is not enough) after opening it with Firefox.

A common pattern is (FF 89.0.2):

  • load a page
  • change code
  • reload page (CTRL+F5)
  • check source code (CTRL+U)

Expected:

  • see changed code

Actual:

  • see source code from first page load

This has lead to much frustration here, too.

Reload doesn't work if the cached page is a redirect to another page, like a redirect to the login page, you will be reloading the login page in the view-source. The problem is the caching when developing.

See Also: → 1721580

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --
Severity: -- → S3
Whiteboard: [necko-active][platform-rel-Google][platform-rel-Gmail] → [necko-triaged][platform-rel-Google][platform-rel-Gmail]
Duplicate of this bug: 1806223
See Also: → 1806223

Given the slew of more recent comments here (vs. it being filed 6 years ago), as well as dupes, I wonder if something else has changed? In bug 1806223, the headers have no-cache and Pragma: no-cache but not no-store or max-age=0, AFAICT - but I'm not a cache expert so not sure if that's relevant - I don't know if we're still seeing the same thing or if this problem is now more wide-spread (comment #34 claims a regression but it's not clear on what basis). I also don't understand how refreshing the original page and then using view-source would end up showing the old version of the page - which again is different from what was reported in comment 0, as the logout would not have been on the same URL - I'd have expected the cache to have been replaced with the cache of the new page. Valentin or Kershaw, can you take another look? (Can't seem to needinfo Valentin atm...)

Flags: needinfo?(kershaw)
See Also: 1806223

This is getting worse. Now view-source if used after submit resubmits form on the page without confirmation. I don't think this shoud happen given the possible implications.
Chromium based browsers reload the page on view-source, but they confirm form resubmission.
I think reloading the page on view-source is wrong. If page is reloaded from the server on view-source, it won't show the source of the page on view, but a newly loaded copy.

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0

(In reply to Esa Ristilä from comment #45)

This is getting worse. Now view-source if used after submit resubmits form on the page without confirmation.

This is bug 1721580.

See Also: → 1851015

(In reply to :Gijs (he/him) from comment #44)

Given the slew of more recent comments here (vs. it being filed 6 years ago), as well as dupes, I wonder if something else has changed? In bug 1806223, the headers have no-cache and Pragma: no-cache but not no-store or max-age=0, AFAICT - but I'm not a cache expert so not sure if that's relevant - I don't know if we're still seeing the same thing or if this problem is now more wide-spread (comment #34 claims a regression but it's not clear on what basis). I also don't understand how refreshing the original page and then using view-source would end up showing the old version of the page - which again is different from what was reported in comment 0, as the logout would not have been on the same URL - I'd have expected the cache to have been replaced with the cache of the new page. Valentin or Kershaw, can you take another look? (Can't seem to needinfo Valentin atm...)

Sorry for not having time looking at this bug. I'll bring this bug to our team's triage meeting and see what's the next step.

Flags: needinfo?(kershaw)
Whiteboard: [necko-triaged][platform-rel-Google][platform-rel-Gmail] → [necko-triaged][platform-rel-Google][platform-rel-Gmail][necko-priority-new]

So I'm thinking we can probably change nsHttpChannel so that even if the channel has LOAD_FROM_CACHE, it the cache entry has no-cache and the request is for view-source, we always go to the network.

Whiteboard: [necko-triaged][platform-rel-Google][platform-rel-Gmail][necko-priority-new] → [necko-triaged][platform-rel-Google][platform-rel-Gmail][necko-priority-review]
Whiteboard: [necko-triaged][platform-rel-Google][platform-rel-Gmail][necko-priority-review] → [necko-triaged][platform-rel-Google][platform-rel-Gmail][necko-priority-next]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: