Closed Bug 1545909 Opened 1 year ago Closed 5 months ago

Hitting Back sometimes shows old cached content

Categories

(Core :: Networking: Cache, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
firefox-esr68 --- wontfix
firefox68 --- wontfix
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- fixed

People

(Reporter: jesup, Assigned: mayhemer)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1489624 +++

I'm continuing to see this behavior in 68.0a1 local-build nightly (m-c rev 6b5221949e77). I've also seen it rarely in the last few months; the place I've continued to notice it is CNN; always when I've left it there for a while (hours or a day) and it's been updating, then I read an article and hit back. Note that CNN is not bf-cacheable.

I have a dump of the about:cache data for memory and disk right after this occurred most recently, as well as a save of the page (not sure if that uses the in-memory copy, or re-downloads). I also have those cache dumps edited down to only urls that include "cnn." anywhere in them (with would include parameter strings)

I think whatever was fixed didn't fix the problem I originally reported.

Flags: needinfo?(honzab.moz)

I hit refresh on CNN, and saw current content. Clicked on a story. Hit back, went back to the same old page (from the day after the Notre Dame fire).

That happens reliably, so I'll leave it alone for now and wait for instructions on any debugging I can do with the live browser.

Flags: needinfo?(michal.novotny)

Ideally, we would have a complete log, but that might be pretty large, because it should cover initial load, loading the newer version of the page and finally hitting back and loading the old page. If the problem is that the newer version isn't cached (which might be caused e.g. by RCWN), we definitely need a log from the moment when we expect the newer version is cached.

Flags: needinfo?(michal.novotny)

Michal - so what should I do to get a log? I have a repeatable sequence of : reload, click on article, back. Likely the log would be large... but that's ok

Flags: needinfo?(michal.novotny)

Just turn logging on in about:networking as described here: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

The default log modules should be fine.

Flags: needinfo?(michal.novotny)

Shared log file on Drive with Michal and Honza

Out of curiosity, how is this bug related to bug 1480780? There is no information here that helps me figure out the connection.

Flags: needinfo?(rjesup)

(In reply to :Ehsan Akhgari from comment #7)

Out of curiosity, how is this bug related to bug 1480780? There is no information here that helps me figure out the connection.

I think it was due to it being copied from the cloned bug...

No longer blocks: 1480780
Flags: needinfo?(rjesup)
Keywords: regression
No longer regressed by: 1489624

(In reply to Randell Jesup [:jesup] (needinfo me) (PTO until 4/22) from comment #6)

Shared log file on Drive with Michal and Honza

What's the document's URL? What I see in the log for entry https://www.cnn.com/ is:

  1. There is an old cache entry present before logging was turned on, date in the header is "Sat, 20 Apr 2019 06:43:56 GMT" and the content has 174778 bytes.

  2. nsHttpChannel (0x7f6954269000) is created at 14:47:37.588424, it opens the entry and decides that the entry needs to be revalidated. We try to send conditional request, but because there is neither etag nor last-modified header, unconditional request is sent. The response's date is "Sat, 20 Apr 2019 14:47:37 GMT" and has 174262 bytes.

  3. When going back in history, nsHttpChannel (0x7f698b035000) is created at 14:48:16.283305. The content isn't revalidated due to VALIDATE_NEVER flag. The entry has the date and size of the response received in step 2.

There doesn't seem to be any problem.

Sitting on www.cnn.com (showing days-old content). Reload, see current content. Click on article. Hit back. Saw old (very old) content. Like days old (with lead articles from the day after Notre Dame)

Mentioned this to Bas today; his comment was "you've seen it too!!" He's seen this for months (at times) on CNN and I think elsewhere, including on mobile (Fennec I presume).

Michal: what could be causing this that isn't captured in the log? Or maybe there's something else hidden in there? (I haven't looked at how CNN structures the content in a page - in the main html file, or some separate sub-resource)

Flags: needinfo?(michal.novotny)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #10)

Michal: what could be causing this that isn't captured in the log? Or maybe there's something else hidden in there? (I haven't looked at how CNN structures the content in a page - in the main html file, or some separate sub-resource)

The content doesn't seem to be a part of the main html file. Looks like the content is fetched via XHR like this https://www.cnn.com/data/ocs/section/_homepage-zone-injection/index.html:homepage-injection-zone-1/views/zones/common/zone-manager.izl

Flags: needinfo?(michal.novotny)

(In reply to Michal Novotny [:michal] from comment #11)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #10)

Michal: what could be causing this that isn't captured in the log? Or maybe there's something else hidden in there? (I haven't looked at how CNN structures the content in a page - in the main html file, or some separate sub-resource)

The content doesn't seem to be a part of the main html file. Looks like the content is fetched via XHR like this https://www.cnn.com/data/ocs/section/_homepage-zone-injection/index.html:homepage-injection-zone-1/views/zones/common/zone-manager.izl

In the log Randal shared yesterday I can see just two requests for this URL: from a reload (refetched, with max-age=60) and from a nav (separated by some 30+ seconds, used from cache). Similarly when looking at https://www.cnn.com/data*.

W/o understanding how the article data is delivered we can't probably move forward. A continues logging (you can set command line args in shortcuts) with rotate:1000 (to have up to 1gb per process of logs) would be great to have. Some care on tracking the steps performed would be good too, so we know what to expect in the log.

BTW, this could well be some kind of a bug on cnn that is exposed thanks some minor difference in our caching algos against chrome/other browsers.

Flags: needinfo?(honzab.moz) → needinfo?(rjesup)

I'm still seeing this; today (also on CNN) on a different machine (Mac, running Nightly 2019-05-09). I this I hit reload (had closed it after loading CNN this afternoon), then browsed an article and hit back. Saw the content from this afternoon. browsed another article, hit back, still old content. Reload - new content. Browse article, back - new content.

Flags: needinfo?(rjesup)

The priority flag is not set for this bug.
:valentin, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)

I may have already asked for that before, but would you be willing to run a permanent rotated log? firefox -MOZ_LOG=timestamp,rotate:250,cache2:5,nsHttp:5? -MOZ_LOG_FILE=/where-there-is-enough-space/log.txt

250MB per process.

Provisionally assigning to Michal.

Question, since we hit this when navigating back, that means we load the content from the BF cache, right?(In reply to Randell Jesup [:jesup] (needinfo me) from comment #0)

+++ This bug was initially created as a clone of Bug #1489624 +++
Note that CNN is not bf-cacheable.

I am not very sure how that works, but are we certain that going back is not loaded from the BF cache?

Also, I would be interested to see how the resources appear in the devtools - if you have it open before hitting back.

Assignee: nobody → michal.novotny
Flags: needinfo?(valentin.gosu)
Priority: -- → P2
Whiteboard: [necko-triaged]

I would never have devtools open before hitting back (unless it's reproing and i try that - and it may perturb it such that it stops working, which happens)

I can try to run permanent logs, but I'm really doubting I'll be able to do so and catch it (once every few weeks...)

From experience, cnn never goes to bfcache - it has unload handlers. Also that would be "interesting" as we got new content (reload), browse to a page (which would possibly bfcache the page), then hit back. You'd expect that to restore the "new" (reloaded) content, not the old content from before reload.
Note this also seems to happen on auto-reloads (site-initiated refreshes)

I caught another instance of it in the profile, with screenshots and a network track in the profile (yes, far less info than if I was running full necko logging).
In this case, the content it took me back to I never had seen in this session, it was from perhaps a week earlier when I had last loaded cnn.com in this profile. started browser, I browsed somewhere else, fiddled with some stuff in other tabs (profiles, Addons page). Loaded cnn, saw today's content (mueller, etc). Read an article, hit back. Saw old content.
Starting profiler. Hit reload, see new content again. read an article, hit back. Old content (with the current Alert at the top about Mueller). Capture profile.
http://bit.ly/2I2Sh0a

Just checking in; this still happens consistently. Backtrack may be able to figure out why it's loading old, cached content on back instead of the content it had just had before following a link.

Backtrack is likely not the right tool for this. MOZ_LOG running from the start of the browser is.

I tried adding those env vars to a profile that had been repeatedly getting this on every "back" to CNN.com for days (always going to the same week-ish old page, fixed by reload). After adding the env vars, it stopped repeating. I'm not sure if the cache (entry) got cleared, or if the LOG requests have an unintended effect on the execution (due to races, perhaps?)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #21)

I tried adding those env vars to a profile that had been repeatedly getting this on every "back" to CNN.com for days (always going to the same week-ish old page, fixed by reload). After adding the env vars, it stopped repeating. I'm not sure if the cache (entry) got cleared, or if the LOG requests have an unintended effect on the execution (due to races, perhaps?)

If RCWN is involved then it could be, yes. You may well go just with command line args as -MOZ_LOG=rotate:400,timestamp,nsHttp:5 (no sync!) and -MOZ_LOG_FILE=... and see if it manifests then. Problem is that nsHttp:4 omits some stuff (we need to consolidate our logging better - something I want to do for a long time) while :5 also logs all h2 IO and cause substantial delays.

I managed to repro with :5; private email with link to the logs sent.

This looks like the same bug as our users on Fenix (using GeckoView Nightly) are seeing reliably and reporting.
Our reporter says:


For me, this happens 100% of the time for both nyt and hn.

Steps

  1. Open webpage
  2. Click on a link which redirects (even to same TLD)
  3. Press back button

Expect: Version of page I just saw
Actual: Previous cached version

I will note that I've seen versions of this happening for a long time, I used to see it on Fennec. Now it's consistent though.

Android Pixel 3
Nightly 191010 06:01
GV 71.0a1-20191004094656


Originally reported at: https://github.com/mozilla-mobile/fenix/issues/5869

I can also reproduce this. Seems a long standing problem we should prioritize fixing?

Flags: needinfo?(nhnguyen)

Let's try to fix this for 72.

Flags: needinfo?(nhnguyen)
Priority: P2 → P1

Problem with this bug is that it's very hard to reproduce. I have logs from Randell - with not much ambient info provided, including child logs, but I was not able after quite a few time spent looking in it decide why would load the old content, also because I actually don't know what request(s) represents the current vs obsolete articles load. The articles are loaded dynamically and I believe it also depends on geolocation what site you are looking at. In EU I'm redirected to edition.cnn.com that may technically behave differently than cnn.com in US. So it's hard to try to locally reproduce and get understanding what to actually look for.

One of my wish list items for Backtrack is to be able to pick an element/tag in the content and track back from various milestones captured on it, like DOM insertion point, inner content change, repaints etc. In this case we could at least narrow down what network request(s) are associated and look them up in logs.

W/o it I don't know what to actually look for :( There is a mixture of expired and non-expired responses in the cache, so I can't deduce which one is the responsible one.

(In reply to Vicky Chin [:vchin] from comment #25)

I can also reproduce this. Seems a long standing problem we should prioritize fixing?

Vicky, on what site and what product can you reproduce it?

None of the necko developers can reproduce the issue and as Honza already wrote in comment #27, this bug is hard to trace on sites like cnn.com or nytimes.com from the logs. But hacker news mentioned in comment #24 is very simple page, so it would be helpful if anybody who can reproduce it on HN can provide the log. To make sure this is really necko issue and it's not related to bf-cache, please add also PageCache:5,nsDocShell:5,nsDocShellLeak:5 to the MOZ_LOG, i.e. it should be timestamp,cache2:5,nsHttp:5,PageCache:5,nsDocShell:5,nsDocShellLeak:5

Flags: needinfo?(vchin)

Sorry, I missed this NI.

I was on cnn.com using Nightly.

Flags: needinfo?(vchin)

I sometimes get this bug too, on various sites. I often notice it on Hacker News. Steps:

  1. Go to https://news.ycombinator.com, get current version of the page
  2. Click on a link
  3. Go back, get an old version of the HN homepage (older than the one we got in 1)

I'll try to be careful next time to check if it happens if I stay on the same origin. As HN homepage is a simpler page than CNN, it may be easier to debug from this example.

I can reproduce this bug fairly reliably on Firefox Preview Nightly. The STR is roughly as follows.

  1. Make a new tab in Firefox Preview and navigate it to http://www.bbc.com/persian.
  2. You can have other tabs open as well, it doesn't really impact reproducing this bug.
  3. On the BBC page, read a few articles in the same tab. Each time go back to the home page and click on another article.
  4. When done, go back to the home page and go back to the Android home page.
  5. Leave Firefox Preview in the background, wait for it to be killed.
  6. Wait for about a day or so.
  7. After that, open Firefox Preview again, switch to the BBC Persian tab again. Reload the home page to make sure you're seeing the latest page.
  8. Click on one article from the home page.
  9. Spend a few minutes as if you're reading the article.
  10. Press back. At this point you will be taken back to a cached version of the home page, as it was the last time you saw it in step 3.

In my experience this is 100% reproducible, and it has been so both on Fennec and Fenix for maybe a couple of years if not more. The only reason why sometimes this bug doesn't reproduce using the STR above is if in step 6 you don't wait long enough. I've also experienced this bug on many new variants of Fenix so I don't believe it's dependent on a profile or a specific pref.

Duplicate of this bug: 1500797

There was a bug for some time that we failed to cache a resource when the channel was redirected. The channel has mChooseApplicationCache set to true, CacheStorage::AsyncOpenURI() fails at https://searchfox.org/mozilla-central/rev/a49a561af8b3314e0991919e86fc66175041d241/netwerk/cache2/CacheStorage.cpp#73 and we return early from nsHttpChannel::OpenCacheEntryInternal() at https://searchfox.org/mozilla-central/rev/a49a561af8b3314e0991919e86fc66175041d241/netwerk/protocol/http/nsHttpChannel.cpp#4158 without a cache entry.

This happened in the cache:

  1. visit bbc.com which redirects to https://www.bbc.com
  2. Loading https://www.bbc.com doesn't cache the content because of the bug
  3. Click on some article to navigate away
  4. Return back to main page, this time we load https://www.bbc.com without the redirect, so the content is cached
  5. after some time when the page has new content, visit again bbc.com which is again a redirect
  6. Loading https://www.bbc.com fetches fresh page without updating the cache because of the bug
  7. Navigate away
  8. Return back, this time we load the old entry cached in step 4.

Using mozregression I found out that the bug was introduced by https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c75bf39e389b9989cab163e313b99f1263af7776&tochange=d52a0108ed8ef2593db6fe0dafedbc2790941fd0 and was fixed by https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f748a3d2cdf108e9443fd15332efe477c7c398a9&tochange=25b533bff4051e6a8177bbc83eed49ac460e109e

Honza, I'm not familiar with the appcache logic. Could you please check what exactly happened and whether we can consider this to be really fixed now? It surprises me that none of the existing tests revealed the bug...

Flags: needinfo?(honzab.moz)

FWIW, I still see this happening with today's Fenix Nightly on my Pixel 2.

Nightly 191228 18:01 (Build #13621814)
27.0.0, d10c36024
GV: 73.0a1-20191227094418

(In reply to Michal Novotny [:michal] from comment #33)

There was a bug for some time that we failed to cache a resource when the channel was redirected. The channel has mChooseApplicationCache set to true, CacheStorage::AsyncOpenURI() fails at https://searchfox.org/mozilla-central/rev/a49a561af8b3314e0991919e86fc66175041d241/netwerk/cache2/CacheStorage.cpp#73 and we return early from nsHttpChannel::OpenCacheEntryInternal() at https://searchfox.org/mozilla-central/rev/a49a561af8b3314e0991919e86fc66175041d241/netwerk/protocol/http/nsHttpChannel.cpp#4158 without a cache entry.

This happened in the cache:

  1. visit bbc.com which redirects to https://www.bbc.com
  2. Loading https://www.bbc.com doesn't cache the content because of the bug
  3. Click on some article to navigate away
  4. Return back to main page, this time we load https://www.bbc.com without the redirect, so the content is cached
  5. after some time when the page has new content, visit again bbc.com which is again a redirect
  6. Loading https://www.bbc.com fetches fresh page without updating the cache because of the bug
  7. Navigate away
  8. Return back, this time we load the old entry cached in step 4.

Using mozregression I found out that the bug was introduced by https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c75bf39e389b9989cab163e313b99f1263af7776&tochange=d52a0108ed8ef2593db6fe0dafedbc2790941fd0

Yes, this happens when Necko changes are not reviewed by Necko peers!

and was fixed by https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f748a3d2cdf108e9443fd15332efe477c7c398a9&tochange=25b533bff4051e6a8177bbc83eed49ac460e109e

It kinda sounds like just that we hide the bug. If we still fail to open the cache entry because of the choose-app-cache flag, then we probably, because of these patch series, just don't set the choose-app-cache flag, which is a bug!

Honza, I'm not familiar with the appcache logic. Could you please check what exactly happened and whether we can consider this to be really fixed now? It surprises me that none of the existing tests revealed the bug...

OK, so when the STR was kept as you wrote it, it would be a permanent problem, with some intermittency based on expiration time and the last cached content for https://www.bbc.com, right? So, this could be one cause. Thanks, Michal, for figuring this out!

(In reply to Dirkjan Ochtman (:djc) from comment #34)

FWIW, I still see this happening with today's Fenix Nightly on my Pixel 2.

Nightly 191228 18:01 (Build #13621814)
27.0.0, d10c36024
GV: 73.0a1-20191227094418

Just to make sure: Fenix Nightly is based on mozilla-central code, up to date? It may well be that there is more than one reason of this hard to catch bug... Can you provide your exact STR? Thanks.

Flags: needinfo?(honzab.moz)

Yes, it just happened again this morning on the latest Fenix Nightly.

  1. Navigate to news.ycombinator.com, which gives me a fresh copy
  2. Click one of the links, spend a few minutes reading it
  3. Tap the back button to navigate back to HN

This time, I'll get an old copy of HN (from last night, I think). Refreshing it manually will give me the fresh copy again. I've seen similar problems with Reddit and GitHub in the past, but right now I think HN is the most common cause for me.

This is happening to me 100% of the time on a number of sites with Firefox Preview nightly. Another example is Ar sTechnica.

I just tested latest Fenix nightly with news.ycombinator.com and I still see happening what's described in comment #33. I.e. http://news.ycombinator.com redirects to https://news.ycombinator.com which is not being cached because it searches appcache for some reason. Honza, I'm assigning this bug to you to investigate appcache logic changes mentioned in comment #33.

Assignee: michal.novotny → honzab.moz
Priority: P3 → P1

Note:

# If set, use DocumentChannel with nsDocShell in place of HttpChannel.
- name: browser.tabs.documentchannel
  type: bool
#if defined(ANDROID)
  value: false  # See bug 1589982.
#else
  value: true
#endif
  mirror: always

so the bug Michal found in comment 33 is still present on Android.

The code from bug 1312646 is wrong - this is a functional and also a big perf regression from that bug that now popped up!

The patch started to set ChooseApplicationCache on the redirected HttpChannelParent here where its source is here. According this (wrong) assumption the child docshell is expected to update the flag, but it never did on the child! (current code).

HttpChannelParent does the use-appcache check here when it's being asyncope'ed. As the redirected channel wants to do the check again (the true purpose of bug 1312646) we need to copy that same code to this place.

Status: NEW → ASSIGNED
Regressed by: 1312646
Pushed by honzab.moz@firemni.cz:
https://hg.mozilla.org/integration/autoland/rev/2d6480f16286
Part 1 of 2: Recheck whether to use appcache on the parent process on the redirected top-level loading channel, r=michal
Pushed by honzab.moz@firemni.cz:
https://hg.mozilla.org/integration/autoland/rev/1b371510efa2
Part 2 of 2: Silently fail when appcache is disabled so that we have a cache entry, r=michal
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Honza: this bug still appears to happen with cnn.com. Did we verify that the patches that landed fixed known problem sites? I'm sure your fix fixes something, but perhaps it's not the cause of the problems reported here, or the cause of only some of the problems.

We should either reopen this, or clone a new bug off of it. Thanks

Flags: needinfo?(honzab.moz)

Ditto: I encountered the problem again in Fenix Nightly just a couple days ago (for the first time in weeks), on dn.no.

Please clone and provide logs and reference to URLs that manifested the problem. Thanks.

Flags: needinfo?(honzab.moz)
Blocks: 1630000

Was this supposed to also fix "undo closed tab" loading old content? I just hit this on 20200524092930.

  • loaded GitHub issue page days ago (version A)
  • loaded it today (version B)
  • added comment (version C)
  • closed tab
  • undo closed tab
  • got version A
  • reloaded for version C

Restoring a page intentionally loads the cached version, i.e. basically to pretend you never actually closed that tab; and commenting on Github doesn't seem to actually reload the page, so in combination getting version B after restoring the tab might be expected in some way, if not quite the ideal state of things with those kinds of dynamic pages.
Getting back version A on the other hand definitively doesn't sound right, though in any case it's probably best to open a new bug for that?

You need to log in before you can comment on or make changes to this bug.