Closed Bug 610417 Opened 9 years ago Closed 9 years ago

bfcache evicts the wrong entries, making it "not work" when multiple tabs are open

Categories

(Firefox for Android Graveyard :: General, defect)

defect
Not set

Tracking

(fennec2.0+)

VERIFIED FIXED
Firefox 4.0
Tracking Status
fennec 2.0+ ---

People

(Reporter: vlad, Assigned: azakai)

References

Details

Attachments

(3 files, 4 obsolete files)

I'm not sure that max_viewers = 1 means "keep 1 page back", I think it might mean "keep just 1 page" or something.  Regardless, going "back" one page should almost always just snap that page in.  Right now, when you go back, the previous page goes through what looks like a loading process to display.  This makes it much harder to browse, especially when reading articles from a blog.

Also, other browsers are horrible at "back", so this could be a pretty big win for Fennec if we can get it right.
tracking-fennec: --- → ?
The intent has always been to keep 1 extra back page, so lets investigate this and make sure it is doing what we hope.
Assignee: nobody → doug.turner
tracking-fennec: ? → 2.0+
mikek, can you take a look?
Assignee: doug.turner → mkristoffersen
I think the value "1" is correct if we want to be able to go back one page.

In order to verify it I created a set of testpages:

http://www.mikek.dk/Samples/NoCachePages1.html
http://www.mikek.dk/Samples/NoCachePages2.html
..
http://www.mikek.dk/Samples/NoCachePages5.html

(Don't mind the "NoCache" in the names)

that displays the load time for each page (min:sec), by going to page 1,2,3,4,5 and then using the forward and back buttons it looks like when setting "browser.sessionhistory.max_total_viewers" to 1 then you can return to one page without a reload, when you set it to 2 then you can return to two pages without a reload - it is however important to use the forward/back buttons not by clicking on the links.
Ok -- so perhaps that value is correct, but the actual effect is still wrong.  Load a large page, say a blog -- http://www.rockpapershotgun.com/.  Click on a link to read an article.  Then click back.  The previous page still looks like it's loading and snapping in, so I think we need to investigate why things are loading.  Are we hitting the network because we no longer have things cached?  Did we have to discard the bfcache stuff due to memory pressure? Etc.

This is I think a pretty major usability issue, if we want to make "browsing" possible in fennec.  Clicking a link and going back quickly are pretty key for that experience.
We are not reloading that page from the network when we return to it - this can be easily verified by going to that page, clicking on a link, wait for the new page to load and then go into flight-mode/off-line mode - you are still able to push the "back" key and have the page reloaded - it is true that it takes a long time (at least on the N900) to display the page again, but it is without network access.

I agree that it is slower to reload than it should be - ideally it should be like switching between tabs, I'll try to figure out what is going on...
Okay, this is "very interesting", I was asked to check if we had 1 bfcache entry per tab, or one for the whole browser (Fennec).

We have 1 for the whole browser, but...  this one doesn't seem to be shared between the different tabs! 

What I see is that if I open tab 1 I can go back-forward between two pages without reloads, but if I open tab 2 and look at more than 1 page in that tab, then tab 1 looses the ability to cache anything, I need to close tab 2 in order to get the caching ability back to tab 1.
probably what vlad was seeing.  Can you figure out what is going on mike?  The expected behavior is for us to have one per browser but shared between tabs.  We may change this (based on memory usage) to have one per tab, but as you pointed out, we think these things might be 4mb each.
Hmm, maybe what I was seeing?  I'm pretty sure that I only had one tab open (though for some reason the home screen tab really wants to stick around, I have to explicitly close it to get it to go away... even when I type something in to the url bar?)

That's very interesting data though, I wonder if that affects desktop as well..
Nja... the number of tabs doesn't seem to be important, as long as it hasn't changed its content (you only viewed one page in the tab) - but as far as this bug goes, it still takes a significant amount of time to go back to the page http://www.rockpapershotgun.com/ even its stored in the cache (I don't know if it is actually stored in the bfcache or in the file cache).

As far as FF goes, the behavior doesn't seem to be changed by the number of open tabs (tested on 3.6.12)
Attached patch Lets try again (obsolete) — Splinter Review
Attachment #493736 - Attachment is obsolete: true
If a page gets a load event when coming out from session history, then it
was *not* bfcached.
If the page gets *only* pageshow event (not load event), then it was bfcached.

If the page has unload or beforeunload event listeners, it is never
bfcached. Also pages with active sync XHRs aren't bfcached.
And if there are some active network connections.
Comment on attachment 493750 [details] [diff] [review]
Lets try again


> 
>-[scriptable, uuid(39b73c3a-48eb-4189-8069-247279c3c42d)]
>+[scriptable, uuid(bb66ac35-253b-471f-a317-3ece940f04c5)]
> interface nsISHEntry : nsIHistoryEntry
Note, you shouldn't be modifying interfaces atm.
You might be able to add the new thing to nsISHEntryInternal.


> NS_IMETHODIMP
> nsSHistory::LoadEntry(PRInt32 aIndex, long aLoadType, PRUint32 aHistCmd)
...
>+  // Remember that this entry is getting loaded at this point in the sequence
>+  nextEntry->SetLastTouched(++gTouchCounter);
>+
Could you explain why you call SetLastTouched here?
I would have expected it to be set in InitiateLoad,
though subframe traversal might not work in that case.

But in general, something like this could work.
I wonder how HistoryTracker plays along with this all.
(In reply to comment #13)
> Comment on attachment 493750 [details] [diff] [review]
> Lets try again
> 
> 
> > 
> >-[scriptable, uuid(39b73c3a-48eb-4189-8069-247279c3c42d)]
> >+[scriptable, uuid(bb66ac35-253b-471f-a317-3ece940f04c5)]
> > interface nsISHEntry : nsIHistoryEntry
> Note, you shouldn't be modifying interfaces atm.
> You might be able to add the new thing to nsISHEntryInternal.

I think the first question should be, is if there any reasonable way to do this without an interface change?

The reason I didn't use the nsISHEntryInternal in the first place, is that it was only used internally in nsSHEntry and then a single place in IndexedDatabaseManager so wasn't sure how internal it was - and I didn't think it made a difference as far as considering it "an interface change", I'll post an alternative patch where it has been moved in a few moments.
> 
> 
> > NS_IMETHODIMP
> > nsSHistory::LoadEntry(PRInt32 aIndex, long aLoadType, PRUint32 aHistCmd)
> ...
> >+  // Remember that this entry is getting loaded at this point in the sequence
> >+  nextEntry->SetLastTouched(++gTouchCounter);
> >+
> Could you explain why you call SetLastTouched here?
> I would have expected it to be set in InitiateLoad,
> though subframe traversal might not work in that case.

No particular reason, I just placed it in a place that I felt comfortable with :)  - Not that I doubt you, but is there some design reason to prefer it in InitiateLoad instead?

> 
> But in general, something like this could work.
> I wonder how HistoryTracker plays along with this all.

I can only guess at what that thing does, is there something in particular that you worry about, then I might be able to test it...
Attachment #494156 - Flags: feedback?(Olli.Pettay)
Comment on attachment 494156 [details] [diff] [review]
Same as previous, except the interface change is in the "internal" interface

>@@ -910,20 +913,21 @@ nsSHistory::EvictGlobalContentViewer()
>   // However, if somebody resets the pref value, we might occasionally
>   // need to evict more than one.
>   PRBool shouldTryEviction = PR_TRUE;
>   while (shouldTryEviction) {
>     // Walk through our list of SHistory objects, looking for content
>     // viewers in the possible active window of all of the SHEntry objects.
>     // Keep track of the SHEntry object that has a ContentViewer and is
>     // farthest from the current focus in any SHistory object.  The
>     // ContentViewer associated with that SHEntry will be evicted
>     PRInt32 distanceFromFocus = 0;
>+    PRUint32 candidateLastTouched = 0;
>     nsCOMPtr<nsISHEntry> evictFromSHE;
>     nsCOMPtr<nsIContentViewer> evictViewer;
>     PRInt32 totalContentViewers = 0;
>     nsSHistory* shist = static_cast<nsSHistory*>
>                                    (PR_LIST_HEAD(&gSHistoryList));
>     while (shist != &gSHistoryList) {
>       // Calculate the window of SHEntries that could possibly have a content
>       // viewer.  There could be up to gHistoryMaxViewers content viewers,
>       // but we don't know whether they are before or after the mIndex position
>       // in the SHEntry list.  Just check both sides, to be safe.
>@@ -934,20 +938,27 @@ nsSHistory::EvictGlobalContentViewer()
>       shist->GetTransactionAtIndex(startIndex, getter_AddRefs(trans));
>       
>       for (PRInt32 i = startIndex; i <= endIndex; ++i) {
>         nsCOMPtr<nsISHEntry> entry;
>         trans->GetSHEntry(getter_AddRefs(entry));
>         nsCOMPtr<nsIContentViewer> viewer;
>         nsCOMPtr<nsISHEntry> ownerEntry;
>         entry->GetAnyContentViewer(getter_AddRefs(ownerEntry),
>                                    getter_AddRefs(viewer));
> 
>+        nsCOMPtr<nsISHEntryInternal> entryInternal = do_QueryInterface(entry);
>+        PRUint32 entryLastTouched = 0;
>+
>+        if (entryInternal) {
>+          entryInternal->GetLastTouched(&entryLastTouched);
>+        }
>+
> #ifdef DEBUG_PAGE_CACHE
>         nsCOMPtr<nsIURI> uri;
>         if (ownerEntry) {
>           ownerEntry->GetURI(getter_AddRefs(uri));
>         } else {
>           entry->GetURI(getter_AddRefs(uri));
>         }
>         nsCAutoString spec;
>         if (uri) {
>           uri->GetSpec(spec);
>@@ -958,26 +969,26 @@ nsSHistory::EvictGlobalContentViewer()
>         // This SHEntry has a ContentViewer, so check how far away it is from
>         // the currently used SHEntry within this SHistory object
>         if (viewer) {
>           PRInt32 distance = PR_ABS(shist->mIndex - i);
>           
> #ifdef DEBUG_PAGE_CACHE
>           printf("Has a cached content viewer: %s\n", spec.get());
>           printf("mIndex: %d i: %d\n", shist->mIndex, i);
> #endif
>           totalContentViewers++;
>-          if (distance > distanceFromFocus) {
>-            
>+          if (distance > distanceFromFocus || (distance == distanceFromFocus && candidateLastTouched > entryLastTouched)) {
>+


How do you guarantee that some content viewers aren't cached too long?
I mean the case when distance and distanceFromFocus are gHistoryMaxViewers, but
entryLastTouched > candidateLastTouched. In that case some SHistory might have more than
gHistoryMaxViewers and EvictGlobalContentViewer wouldn't ever find those ones which are
outside the range [-gHistoryMaxViewers, gHistoryMaxViewers].

feedback- until the question is answered/addressed.
Attachment #494156 - Flags: feedback?(Olli.Pettay) → feedback-
(In reply to comment #16)
> 
> How do you guarantee that some content viewers aren't cached too long?
> I mean the case when distance and distanceFromFocus are gHistoryMaxViewers, but
> entryLastTouched > candidateLastTouched. In that case some SHistory might have
> more than
> gHistoryMaxViewers and EvictGlobalContentViewer wouldn't ever find those ones
> which are
> outside the range [-gHistoryMaxViewers, gHistoryMaxViewers].
> 
> feedback- until the question is answered/addressed.

I don't see that my change will influence the behavior around gHistoryMaxVierwers?  - But isn't it the responsibility of EvictContentViewers to ensure that there aren't any content viewers outside of the [-gHistoryMaxViewers, gHistoryMaxViewers] range? - Not that this is obvious or designed in a way that makes the code easy to maintain.

(please note, this patch is not the complete fix for this bug, it addresses one of the specific problems I found when looking into the bug - maybe I should split it out into its own bug and make this one depend on that bug?)
Mike + Vlad,

okay... there is alot of stuff in this bug report. So, let me just summarize what I know.  


Not all pages currently go into the bfcache.  For example, and in the case of www.rockpapershotgun.com, a documents that contains a plugin will not be put into the bf cache.  All of this filtering happens on the document and its children.  See |CanSavePresentation|.

Since flash isn't supported on mobile, I do not think that this is a worry for us.

Desktop might get a nice win if we can figure out how to cache plugins:

http://mxr.mozilla.org/mozilla-central/source/content/html/document/src/nsPluginDocument.cpp#222


Enabling #define DEBUG_PAGE_CACHE (search for them in mxr) helps you see what is going on, as well as adding printf's in the false returns from CanSavePresentation.
(In reply to comment #18)
> Not all pages currently go into the bfcache.  For example, and in the case of
> www.rockpapershotgun.com, a documents that contains a plugin will not be put
> into the bf cache. 
This is too strong. PluginDocuments won't go into bfcache.

> http://mxr.mozilla.org/mozilla-central/source/content/html/document/src/nsPluginDocument.cpp#222
Are full page plugins really that common?
Comment on attachment 494156 [details] [diff] [review]
Same as previous, except the interface change is in the "internal" interface

Ah, ok, I think I was wrong. Sorry.
Attachment #494156 - Flags: feedback- → feedback+
Could you please document the new behavior.
Attached patch Now with some more comments (obsolete) — Splinter Review
Is this what you meant by documenting it?
Attachment #493750 - Attachment is obsolete: true
Attachment #494156 - Attachment is obsolete: true
Attachment #497774 - Flags: review?(Olli.Pettay)
mike, reassigning to me.  feel free to take back the ones that you want.
Assignee: mozstuff → doug.turner
Comment on attachment 497774 [details] [diff] [review]
Now with some more comments

I think we could try this.
But if this should be taken to moz2.0, this needs a pref which is
by default disabled on FF4. (Could be enabled for Fennec.)
Attachment #497774 - Flags: review?(Olli.Pettay) → review+
Why would it need a pref? - Is there any case where we would want the old behavior?
At this point I don't see any reason to change
desktop FF4 behavior, and I want to keep the risk for regressions absolutely
minimum.
Assignee: doug.turner → azakai
Attached patch patch with prefSplinter Review
Attachment #497774 - Attachment is obsolete: true
Attachment #502646 - Flags: review?(Olli.Pettay)
mobile-browser patch that enables the new pref in Fennec.
Attachment #502648 - Attachment is patch: true
Attachment #502648 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 502646 [details] [diff] [review]
patch with pref

Why you have the pref in two .js files?
Attachment #502646 - Flags: review?(Olli.Pettay) → review+
ah, because of WINCE
Attachment #502648 - Flags: review?(mark.finkle)
Attachment #502648 - Flags: review?(mark.finkle) → review+
Keywords: checkin-needed
m-b part: http://hg.mozilla.org/mobile-browser/rev/583f1e86ca7d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Keywords: checkin-needed
Blocks: 627215
Filed followup bug 627215 to enable this pref on Firefox, post-4.0.
fwiw, this should be using |PRUint32| not |unsigned int| for the method signature, no?  This patch would fail to compile on an arch on which int is not 32-bit...
Blocks: FF2SM
Here's a patch for the PRUint32 issue (will this need review and/or approval-2.0, btw? I'm not sure of the procedure for a tiny followup like this).

One question, though - shouldn't our builds on 64-bit systems on tinderbox have failed due to this? They succeeded...
Comment on attachment 505491 [details] [diff] [review]
followup for PRUint32 issue

Sorry, that I missed this.
Attachment #505491 - Flags: review+
I'd like to push this fix, I'm not sure of the rules though. Do I need approval? Or can I push it since it's in a bug that is blocking?
the follow up is also a+.  push according to the tree rules.
Summary: bfcache needs to work better → bfcache evicts the wrong entries, making it "not work" when multiple tabs are open
(In reply to comment #35)
> One question, though - shouldn't our builds on 64-bit systems on tinderbox
> have failed due to this? They succeeded...

int on 64-bit Windows/Linux/Mac is 32 bits wide.  See http://en.wikipedia.org/wiki/64-bit#Specific_C-language_data_models
I can still see the previous page reloading when tapping the Back arrow. Is this verified or should I reopen it?
Anna, exactly what steps are you taking, and what's your browser's full user-agent string?
It is expected that in some cases the cache will not work. The cache can't work if the page is doing an XHR for example.

If the cache doesn't work 100% of the time on simple websites where it should, then there might be a problem here.
The most common way to disable bfcache is to have beforeunload or unload
event listeners in the page.
Target Milestone: --- → Firefox 4.0
I followed these steps to reproduce
> can be easily verified by going to that page, clicking on a link, wait for
> the new page to load and then go into flight-mode/off-line mode - you are
> still able to push the "back" key and have the page reloaded - it is true
> that it takes a long time (at least on the N900) to display the page again,
> but it is without network access.
> 
> I agree that it is slower to reload than it should be - ideally it should be
> like switching between tabs, I'll try to figure out what is going on...

and I can see the previous page reloading. When going back to the previous page  shouldn't it be like switching between tabs as Mike said? Since we have already loaded that page.
I've tried to reproduce this bug performing the Mike's steps from comment #5, but after I've go into off-line mode and hit the "Back" key, instead of loading a cached webpage, an error page is loaded:

"Offline mode

Firefox is currently in offline mode and can't browse the Web.
* Try again. Nightly will attempt to open a connection and reload the page.
[Try Again]"

I've tried this on Desktop side too, and it seems that's working fine there. Should I reopen this bug? Or is there another way to verify it?

--
Mozilla/5.0 (Android;Linux armv7l;rv:9.0a1)Gecko/20110911
Firefox/9.0a1 Fennec/9.0a1
Device: Samsung Galaxy S
OS: Android 2.2
Please file new bugs for the issue(s) that you are seeing. Don't reopen this bug.
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #47)
> Please file new bugs for the issue(s) that you are seeing. Don't reopen this
> bug.

Should I close this bug as Verified Fixed then?
It seems that is working fine on the latest Nightly build. I will close this issue as verified fixed.

--
Mozilla/5.0 (Android;Linux armv7l;rv:9.0a1)Gecko/20110913
Firefox/9.0a1 Fennec/9.0a1
Device: Samsung Galaxy S
OS: Android 2.2
Status: RESOLVED → VERIFIED
No longer blocks: FF2SM
You need to log in before you can comment on or make changes to this bug.