Closed Bug 1121701 Opened 5 years ago Closed 5 years ago

pageshow event has wrong .persisted value when restoring a page from bfcache. (Happens for example when going back from Google maps to Google search results.)

Categories

(Core :: Document Navigation, defect)

10 Branch
x86
All
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38

People

(Reporter: htt, Assigned: smaug)

References

Details

Attachments

(2 files, 4 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.69 Safari/537.36

Steps to reproduce:

1. Go to https://www.google.com/#q=gary+danko+sf
2. Clicks a photo that has "See photos" label on right hand side.
3. On imagery viewer, click [x] mark at top right corner.


Actual results:

Search result page gets blank, since search query gets emptied from search box. This happens on fire fox 34 only. The url is https://www.google.com/?gws_rd=ssl#q=cascal


Expected results:

Search result page shouldn't get blank with the url is https://www.google.com/?gws_rd=ssl#q=cascal
OS: Mac OS X → All
I cannot reproduce
You may go to www.google.com home page first, then type a query like "cascal", so that your search is made instantly. Check "Google Instant predictions" setting in Search settings and make sure that you enable instant results.

The issue is seen also when you open a google map and go back to the search result page using browser's back button.
(In reply to Toshi Tajima from comment #2)
> You may go to www.google.com home page first, then type a query like
> "cascal", so that your search is made instantly. Check "Google Instant
> predictions" setting in Search settings and make sure that you enable
> instant results.
> 
> The issue is seen also when you open a google map and go back to the search
> result page using browser's back button.

I can reproduce this some of the time. What actually happens is that google's JS sets visibility: hidden on all of the items. Firefox renders this correctly. There are no JS errors in the console, so it's really not clear to me what is going on, but it seems likely to be a bug on Google's side.
STR:
1. Make sure instant search enabled in Google search setting
2. Search "cascal" from  https://www.google.com
3. Click Maps on right hand side.
4. Navigation bak
5. Navigation forward 
6. Navigation bak 
7. Repeat step 2-7 several times

browser.sessionhistory.max_total_viewers = 0 seems to fix the problem in Nightly38.0a1 as well as Firefox34.


Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=39c5a324b374&tochange=ff1d493bf113

Suspect: Bug 690056
Blocks: 690056
Status: UNCONFIRMED → NEW
Component: Untriaged → Document Navigation
Ever confirmed: true
Product: Firefox → Core
Version: 34 Branch → 10 Branch
Hi, I'm a Google Search engineer.  We troubleshot this issue and found that the pageshow event doesn't set the persisted bit in these broken cases, even though the page state was preserved.  When persisted is not set, we assume that it's the initial page load (not from cache).  Is this still the way the persisted bit is supposed to work?
(In reply to daverobishaw from comment #5)
> Hi, I'm a Google Search engineer.  We troubleshot this issue and found that
> the pageshow event doesn't set the persisted bit in these broken cases, even
> though the page state was preserved.  When persisted is not set, we assume
> that it's the initial page load (not from cache).  Is this still the way the
> persisted bit is supposed to work?

This is very helpful, but I don't know the answer to your question. Boris, do you? :-)
Flags: needinfo?(bzbarsky)
The pageshow event should have the "persisted" flag true if the load came from the back/forward full-document cache.  If it just came from the HTTP cache, possibly with form state restoration, that shouldn't have the "persisted" flag set.

Olli, can you reproduce this, and if so, do you have time to take a look at it?
Flags: needinfo?(bzbarsky) → needinfo?(bugs)
Looks like I can reproduce on 64bit Linux, nightly, opt.
Looking
Ok, so this is fun. Using STR from comment 4.

(1) First we dispatch beforeunload to the documents under the maps page.
(2) One of the iframes uses beforeunload to post a message (window.postMessage()) to the main document.
(3) Then we start restoring the previous document and add its channel to loadgroup
http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp?rev=f925d9830745&mark=8106-8106#8083
(4) then we dispatch a runnable to continue restoring
http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp?rev=f925d9830745&mark=8253-8254#8236
(5) But before we handle that runnable, the event listener for the message posted earlier
sets src attribute of some <img> element and that ends up adding more stuff to the load group.
(6) Then when the runnable actually is called, there are 2 items in the loadgroup and
http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp?rev=f925d9830745&mark=8163-8165,8169-8171#8144
doesn't do what it is expected to do, and
(7) we wait until the img is loaded before dispatching pageshow/load, and of course at that point
we aren't restoring anymore.


bz, any good ideas what we could do here?

It is not quite clear to me why we need to have all that BeginRestore happening sync, but FinishRestore async.
If those both would be async, but happening withing the same task, fixing this issue would be easier.
We could cancel the loadgroup right before the restoring.
Flags: needinfo?(bugs) → needinfo?(bzbarsky)
(In reply to Olli Pettay [:smaug] from comment #9)
> (2) One of the iframes uses beforeunload to post a message
> (window.postMessage()) to the main document.
Actually, not necessarily an iframe, but could be the main page itself doing postMessage.
But that doesn't really matter here.
Attached patch wip (obsolete) — Splinter Review
Something like this
https://tbpl.mozilla.org/?tree=Try&rev=31e0301d18d4


bz, you may recall some reasoning for the current code.
(I'm still going through the comments in bug 274784 )
Attachment #8550931 - Flags: feedback?(bzbarsky)
Attached patch wip2 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=ef3c4de4ead4
Attachment #8550931 - Attachment is obsolete: true
Attachment #8550931 - Flags: feedback?(bzbarsky)
Attachment #8550937 - Flags: feedback?(bzbarsky)
Comment on attachment 8550937 [details] [diff] [review]
wip2

> It is not quite clear to me why we need to have all that BeginRestore
> happening sync, but FinishRestore async.

I _think_ this was meant to mimic a necko load: the start is sync at AsyncOpen and the end is async.

The bug here comes from two pieces: 1) the loadgroup is shared across documents and 2) the mIsRestoringDocument state is just scoped way too narrowly.

The change here is maybe plausible, though I'm not convinced about the naming of the ForgetPresentationEvent class: it's not an event, right?  And I'm pretty worried about it changing webprogress notification orderings...

>+    NS_ENSURE_STATE(currentPresentationRestoration ==
>+                    mRestorePresentationEvent.get());

So if we take the early return here... then what?

How plausible would it be to just delay the "mIsRestoringDocument = false" until we get the STATE_STOP that's supposed to fire the pageshow event?  It seems like that would most directly address the class of problems comment 9 points out.
Flags: needinfo?(bzbarsky)
Attachment #8550937 - Flags: feedback?(bzbarsky) → feedback+
(In reply to Boris Zbarsky [:bz] from comment #15)
> Comment on attachment 8550937 [details] [diff] [review]
> wip2
> 
> > It is not quite clear to me why we need to have all that BeginRestore
> > happening sync, but FinishRestore async.
> 
> I _think_ this was meant to mimic a necko load: the start is sync at
> AsyncOpen and the end is async.
> 
> The bug here comes from two pieces: 1) the loadgroup is shared across
> documents and 2) the mIsRestoringDocument state is just scoped way too
> narrowly.
> 
> The change here is maybe plausible, though I'm not convinced about the
> naming of the ForgetPresentationEvent class: it's not an event, right? 
yeah, it is not Event nor Runnable, but thing which ensures that we forget a RevocableEvent.
PresentationEventForgetter then :)

> And
> I'm pretty worried about it changing webprogress notification orderings...
Hmm, where is the ordering changed?
I can see the synchronousness is changed, but the patch add more check to see if something happened
after we begin the restoration.


> 
> >+    NS_ENSURE_STATE(currentPresentationRestoration ==
> >+                    mRestorePresentationEvent.get());
> 
> So if we take the early return here... then what?
Then there is another restorepresentationevent ongoing and that will do the restoration.
Or something has cancelled the restoration and doing some other load.


> How plausible would it be to just delay the "mIsRestoringDocument = false"
> until we get the STATE_STOP that's supposed to fire the pageshow event?  It
> seems like that would most directly address the class of problems comment 9
> points out.
How would you keep track on when "mIsRestoringDocument = false" should be set to ensure that mIsRestoringDocument == true doesn't leak to
wrong places.
And besides, IMO we really don't want to rely on previous page's network usage to restore the page.
> PresentationEventForgetter then :)

Sold.

> Hmm, where is the ordering changed?

That's a good question.  There's nothing else we fire as part of session history traversal that's not fired by begin/finish here?  If so, you're right that the ordering ahas not changed.

> Then there is another restorepresentationevent ongoing and that will do the restoration.
> Or something has cancelled the restoration and doing some other load.

Sounds good.  Worth documenting.

> IMO we really don't want to rely on previous page's network usage to restore the page.

That's fair.
Attached patch v3 (obsolete) — Splinter Review
Some comment changes and renamed the class.
Assignee: nobody → bugs
Attachment #8550937 - Attachment is obsolete: true
Attachment #8553272 - Flags: review?(bzbarsky)
Attachment #8551000 - Flags: review?(bzbarsky)
daverobishaw, if Google wants to avoid triggering the Gecko bug, don't
post such messages in beforeunload event listener, where message listener might cause triggering
loading something new from the network, in this case changing src of an <img>.
Flags: needinfo?(daverobishaw)
(it does sounds like a bug in Google's sites too that it even tries to load something so late.)
Comment on attachment 8553272 [details] [diff] [review]
v3

> +class MOZ_STACK_CLASS PresentationEventForgetter

Can you put this in an anonymous namespace?

>+    PresentationEventForgetter forget(mRestorePresentationEvent);

I'd prefer "forgetter" for the variable name.

r=me
Attachment #8553272 - Flags: review?(bzbarsky) → review+
Comment on attachment 8551000 [details] [diff] [review]
mochitest

r=me
Attachment #8551000 - Flags: review?(bzbarsky) → review+
Attached patch v4Splinter Review
Attachment #8553272 - Attachment is obsolete: true
Changing the summary a bit to indicate what is being fixed. If there is still something else to fix,
please file new bugs.
Summary: Shows blank search result pages while clicking on "x" from the "See photo" page → pageshow event has wrong .persisted value when restoring a page from bfcache. (Happens for example when going back from Google maps to Google search results.)
https://hg.mozilla.org/mozilla-central/rev/a51f428b5c54
https://hg.mozilla.org/mozilla-central/rev/7eeaad3abb70
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Flags: needinfo?(daverobishaw)
You need to log in before you can comment on or make changes to this bug.