Closed Bug 1489624 Opened 2 years ago Closed 2 years ago
Hitting Back sometimes shows old cached content
Nightly local build (set to non-e10s) Startup. Type cnn.com in urlbar, view current CNN content. Click on a story, read it. Hit Back. Browser shows content from June (3 months ago+) which was probably the last time I visited CNN in this (test) profile. I *think* I've seen this before, but can't be sure.
memory Number of entries: 428 Maximum storage size: 32768 KiB Storage in use: 3814 KiB Storage disk location: none, only stored in memory List Cache Entries disk Number of entries: 12308 Maximum storage size: 1048576 KiB Storage in use: 422948 KiB Storage disk location: /home/jesup/.cache/mozilla/firefox/jr1tyg9z.webrtc/cache2 List Cache Entries appcache Number of entries: 43 Maximum storage size: 512000 KiB Storage in use: 501 KiB Storage disk location: /home/jesup/.cache/mozilla/firefox/jr1tyg9z.webrtc/OfflineCache Nothing in the memory cache matches https://www.cnn.com (top URL) There are many CNN entries in the disk cache. However, many of them (all?) have a last-modified data *after* the Expires date. https://www.cnn.com/.a/2.77.3/assets/sprite-chrome.png 3728 bytes 8 2018-09-07 15:53:46 2018-05-20 15:29:16 https://www.cnn.com/css/2.77.3/pages/page.css 148709 bytes 10 2018-09-07 15:53:46 2018-05-20 15:29:15
Weird behavior, we should look into this. Note: jesup tells me he's saved the Firefox profile that's manifesting this, so please feel free to ask him if looking into individual files would be helpful here. Also note that he's running in non-e10s, though I can't think of any specific reason that would affect our cache algorithms here.
Assignee: nobody → michal.novotny
Priority: -- → P2
Randell, could you please provide a log of returning to cnn.com main page? Default modules in about:networking are fine. I was able to reproduce this behavior on Czech news server https://www.idnes.cz/ when cookies were blocked for that domain. The problem was introduced by bug 1480780. There are multiple cache entries with different content for this domain with various prefixes: -unique:https://praha.idnes.cz,:https://www.idnes.cz/ -unique:https://zlin.idnes.cz,:https://www.idnes.cz/ -unique:https://fotbal.idnes.cz,:https://www.idnes.cz/ -unique:https://zpravy.idnes.cz,:https://www.idnes.cz/ etc. Which one is used depends on from what section/subdomain I'm returning back. It looks like the change in bug 1480780 landed without review from necko peer. I'm not sure how to fix this and if this behavior was expected.
> It looks like the change in bug 1480780 landed without review from necko > peer. I'm not sure how to fix this and if this behavior was expected. The behavior is clearly not expected. What we want is to have separate cache entries for resources loaded by trackers. Cache entry has a prefix composed by: "-unique:" + the origin of the op-level window. This was reviewed by mayhamer in bug 1461921 (part 7). I'm trying to reproduce this issue. I keep the NI.
Wondering if we should make a unique cache key using eTLD+1 top-level origin instead of the full origin. This will fix the issue for https://www.idnes.cz.
Pretty sure 63 is also affected by this.
The "-unique:" extension has not been reviewed by a necko peer. It has landed in bug 1480780 and not 1461921. I'll walk the code to see where we are returning false exactly to understand how this works. Not sure why navigating back should consider referrer as a top level origin.
Using the test case from comment 3 (disabled cookies for "idnes.cz") I can see the following in AntiTrackingCommon::IsFirstPartyStorageAccessGrantedFor when going back from foo.idnes.cz to www.idnes.cz: aURI = https://www.idnes.cz/ channel's load info: mLoadingPrincipal = NULL mTriggeringPrincipal = mozilla::NullPrincipal mPrincipalToInherit = mTriggeringPrincipal (nullprincipal) mTopLevelPrincipal = NULL mTopLevelStorageAreaPrincipal = NULL GetChannelResultPrincipal returns mozilla::ContentPrincipal, codebase=https://www.idnes.cz/, stored to channelPrincipal eventually, we use channelPrincipal for the toplevelPrincipal, as this is a top level document load. then, for the domain (and apparently all subdomains) cookies are disabled, hence we return false and (back in nsHttpChannel::OpenCacheEntryInternal) assign top-level domain -unique prefix to the cache entry key: topWindowURI is https://foo.idnes.cz/ (the page I'm returning from), because: load context of the channel (www.idnes.cz) is associated to a window (top-level, content) with ContentPrincipal having codebase=foo.idnes.cz. This is observed at this stack, during opening of the channel: xul.dll!mozilla::net::HttpChannelChild::AsyncOpen(0x0000020c9af48e80, 0x0000000000000000) Line 2711 C++ xul.dll!mozilla::net::HttpChannelChild::AsyncOpen2(0x0000020c9af48e80) Line 2723 C++ xul.dll!nsURILoader::OpenURI(0x0000020ca36e2078, 0, 0x0000020c98941830) Line 860 C++ xul.dll!nsDocShell::DoChannelLoad(0x0000020ca36e2078, 0x0000020ca3cc17e0, false) Line 11233 C++ To sum - the top level URI of a channel during navigation is the URI of the page we are navigating from, being it a back or forward button or a link click. We don't use channel's final channel URI (which I would expect!) We do that only when getting top window URI fails (either of: no 3rd party util, no top window for the channel or no uri for that window - not the case here). The cache uniqueness approach as is now seems wrong to me... OTOH, Randal gave me his about:cache and there were no -unique entries, so his problem (the original report) could be something different.
I've had it happen again... with the same build in a Very different profile (my "main" linux profile, which has ~3000 tabs, 56 loaded (fairly recent restart)) m-c rev fbe65340170c (Aug 21) Again with cnn.com - load a page, (load another page and go back), then go back to cnn.com, and I see their homepage from yesterday (last time I visited before this). I still see cache entries with expires < last-modified: 276:https://cdn.cnn.com/cnnnext/dam/assets/180915143129-fan-bingbing-medium-tease.jpg 17616 bytes 32 2018-09-18 16:13:04 2018-09-17 16:17:25 880:https://www.cnn.com/ 169134 bytes 1 2018-09-18 16:12:53 2018-09-18 15:56:59 11555:predictor-origin:http://www.cnn.com/ 0 bytes 1 2018-09-17 15:24:55 No expiration time 11556:http://www.cnn.com/ 0 bytes 1 2018-09-17 15:24:55 2018-09-17 15:27:24 The last one might be interesting, since I browsed probably to www.cnn.com (without https). However, the awsomebar may well have sent me to the https://www.cnn.com/ address anyways due to history. I tried repeating the actions (open cnn in new tab. Read the same article. Paste WashingtonPost article URL in. Hit back. Hit back again. However, this time it went to the "correct" current CNN page. While it is an older build (3ish weeks), it also does not match any known bugs from that timeframe that I know of. This also tends to vote against profile corruption or other oddities. I have no extensions in this profile.
Did you meant Honza Bambas
(In reply to Randell Jesup [:jesup] from comment #9) > Again with cnn.com - load a page, (load another page and go back), then go > back to cnn.com, and I see their homepage from yesterday (last time I > visited before this). In theory, this might be caused by RCWN. I.e. the cache contains page from yesterday, you visit the page today, but it's not cached because cache is too slow, then when going back in history, the cache entry with older content is used.
> In theory, this might be caused by RCWN. I.e. the cache contains page from > yesterday, you visit the page today, but it's not cached because cache is > too slow, then when going back in history, the cache entry with older > content is used. Now *that* makes sense - if we RCWN and network wins, we need to still evict/replace the cache entries as normal. Especially as after the first RCWN it's probably in a buffercache, and would beat the network on Back. And for whatever reason it's not in the FBCache.
(Responding to comment 8, since there seems to be at least two bugs being discussed here.) The regression from bug 1480780, I think, is the removal of checking mIsTrackingResource here: <https://hg.mozilla.org/mozilla-central/rev/e70b1d7ba1eb#l14.67>. The problem is that before this change, that check would guarantee that the "trackerFor" cache entries would only be created for tracking channels. After that change, however, we removed that check and moved it here: <https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/toolkit/components/antitracking/AntiTrackingCommon.cpp#806>. On a closer look, that is changing semantics here though, since there are several code paths in IsFirstPartyAccessGrantedFor() where we bail our returning false before we have checked to ensure the channel comes from a tracker. That is normally fine for the consumers of this function. This particular Necko consumer, however, does care about whether channels are tracking or not. So it needs to continue to perform the check on its own. I have a patch for this.
Comment on attachment 9010338 [details] Bug 1489624 - Ensure that the Necko cache will only be partitioned for third-party tracker resources Honza Bambas (:mayhemer) has approved the revision.
Attachment #9010338 - Flags: review+
(In reply to Randell Jesup [:jesup] from comment #9) > I've had it happen again... As Michal said, this could be RCWN, but I would like to confirm. Randall, I would like to ask you one thing. Before you update your nightly to contain Ehsan's patch, can you please run it with full logging for a while? We have a file size limit (rotation) so you don't have to worry about flooding your disk. There are two ways to do it: - Using environment var, but this effects (makes to log to the same file) all Gecko products (e.g. Thunderbird) or other Firefox instances you may run in parallel - Using command line args Directions to do it can be found here: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Logging_HTTP_activity_by_manually_setting_environment_variables and here: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Start_logging_using_command_line_arguments Just use the list of default modules as are in the document. If you are willing to give up more disk space, you can change the rotate:200 argument to rotate:500 or even more. It's the limit of MB per process. When this problem occurs, just copy the logs somewhere, pack (gzip or xz) and best send to me directly as it may contain sensitive info (cookies, urls). And let me know which URL it failed for you that time. Thanks!
Flags: needinfo?(honzab.moz) → needinfo?(rjesup)
(In reply to Honza Bambas (:mayhemer) from comment #16) > (In reply to Randell Jesup [:jesup] from comment #9) > > I've had it happen again... > > As Michal said, this could be RCWN, but I would like to confirm. > > Randall, I would like to ask you one thing. Before you update your nightly > to contain Ehsan's patch, can you please run it with full logging for a > while? We have a file size limit (rotation) so you don't have to worry > about flooding your disk. I can try... note: this is HARD to repro - it's pretty random. and generally in my "main" browser, which has ~35 windows, and ~3100 tabs. Though by default it loads around 40 at startup. And I just pulled m-c and rebuilt today.... I could roll it back and rebuild Do I need to roll back?
Flags: needinfo?(rjesup) → needinfo?(honzab.moz)
We could probably test the RCWN hypothesis artificially with a hack--add an about:config pref that delays cache requests by some 1-2 seconds when set (so network wins all races). Then hit back and see what happens? I'm not 100% sure it'd work. Do we know for sure (i.e. someone who knows the code please tell me) if RCWN doesn't store the network data in the cache when the network wins? If that's true, it seems like we'll need to do something (even if it's not causing this bug). If storing the data is hard in this case, maybe just dooming the cache entry whenever the net wins might be a good enough fix for now?
Attachment #9010338 - Attachment description: Bug 1489624 - Ensure that the Necko cache will only be partitioned for third-party tracker resources; r=mayhemer → Bug 1489624 - Ensure that the Necko cache will only be partitioned for third-party tracker resources
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/70ca8b995ad1 Ensure that the Necko cache will only be partitioned for third-party tracker resources r=mayhemer
Michal, comment 18?
Flags: needinfo?(honzab.moz) → needinfo?(michal.novotny)
(In reply to Randell Jesup [:jesup] from comment #17) > Do I need to roll back? I think not. I suspect that the cause is yet something else. If you keep logging running and when the problem occurs again catch the logs quickly to not overwrite, we will very likely find out the cause. I'm not a big fan of experimenting with RCWN, I'd rather leave the settings as are and get the logs.
Comment on attachment 9010338 [details] Bug 1489624 - Ensure that the Necko cache will only be partitioned for third-party tracker resources Approval Request Comment [Feature/Bug causing the regression]: Bug 1480780 [User impact if declined]: comment 3, and comment 0 [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: It was verified manually locally [Needs manual test from QE? If yes, steps to reproduce]: Yes. 1. Create a new profile. 2. Load https://www.idnes.cz/ in a tab. 3. Open about:cache?storage=disk in a new tab. 4. Look for cache entries with prefixes such as the ones mentioned in comment 3. There should be none in a build without this bug. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: Very small surgical fix. [String changes made/needed]: None.
Attachment #9010338 - Flags: approval-mozilla-beta?
(In reply to Jason Duell [:jduell] (needinfo me) from comment #18) > Do we know for sure (i.e. someone who knows the code please tell me) if RCWN > doesn't store the network data in the cache when the network wins? If > that's true, it seems like we'll need to do something (even if it's not > causing this bug). If storing the data is hard in this case, maybe just > dooming the cache entry whenever the net wins might be a good enough fix for > now? Yes, we don't store the content. AFAICS, in very rare cases it could happen that we have cache entry when we want to initialize it for writing when network wins.
Comment on attachment 9010338 [details] Bug 1489624 - Ensure that the Necko cache will only be partitioned for third-party tracker resources P2+ minimal patch fixing a 63 regression, uplift approved for 63 beta 9, thanks.
Attachment #9010338 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
While trying to reproduce(on a build that should be affected), I could only find in the mentioned format the item bellow -unique:https://fotbal.idnes.cz:https://eus.rubiconproject.com/usync.html?&geo=eu&co=ro Is this what we're looking after? Also, on the latest nightly there where some .js files cached with the path bellow -unique:https://fotbal.idnes.cz:https://s1.adform.net/banners/scripts/video/outstream/inview.js If I understood right, the fotbal prefix should not be noted in the link; right?
I believe rubiconproject.com and adform.net are both trackers, so it is expected for such entries to be found, specially after this bug has been fixed. What we are looking for are top level pages with the `-unique:<foo.tld>:` prefix.
Thanks to Michal I was finally able to reproduce the issue after blocking all cookies (with 63.0b8 on win10). Verified the issue on 63.0b9 and 64.0a1 (2018-09-27) on win10, ubuntu16.04, macOS 10.9 and did not encounter any more prefixes like mentioned in comment 3.
You need to log in before you can comment on or make changes to this bug.