Closed Bug 274784 (blazinglyfastback) Opened 20 years ago Closed 18 years ago

Make back and forward blazingly fast and side-effect free

Categories

(Core :: DOM: Navigation, defect, P2)

defect

Tracking


RESOLVED FIXED
mozilla1.8beta2

People

(Reporter: brendan, Assigned: bryner)

References

(Depends on 12 open bugs, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: (o72555) Set browser.sessionhistory.max_viewers to nonzero to test)

Attachments

(9 files, 6 obsolete files)

1.36 KB, patch
shaver
: review+
Details | Diff | Splinter Review
316 bytes, text/html
Details
501 bytes, text/html
Details
146.10 KB, patch
bzbarsky
: review-
bzbarsky
: superreview+
brendan
: approval1.8b2+
Details | Diff | Splinter Review
146.49 KB, patch
Details | Diff | Splinter Review
688 bytes, text/html
Details
690 bytes, text/html
Details
164.66 KB, patch
Details | Diff | Splinter Review
1.57 KB, patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
This bug is where the real work to fix bug 38486 will go on.  That bug is a pile
of noise and a useless soapbox on which free-lunch-demanders wish to stand.

/be
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla1.8alpha6
Can this bug block bug 91351 and bug 163993 as the original bug did?  I think it's
important for those tracking (though I realize it may attract people to this bug).
Bug 91351 seems fine. I'm not marking it as blocking bug 163993 since it's not known
whether this idea is a good thing yet.
Blocks: 91351
In the death-bug, I said: To do this properly, I think you have to separate the
JS global scope object from the "window" object; which is something I think we
should do for security reasons anyway. I don't see why scripting is a big deal;
if we're going back/forward to a page which has already "loaded", there is no
reason to fire onload events.

Brendan then said, "we must fire onload when reloading from history, for
backward compatibility -- so must every other browser"

And Hixie said:
<Hixie> opera doesn't do that
<Hixie> it's the source of many of our dom bugs
<Hixie> (to do with history)
<Hixie> but we consider them worth the price of lightning fast back/forward
I could see a pref for fast vs. compatible back/forward being worth the UI, but
before conceding that point, perhaps we could do some analysis of pages that
break if they don't get reloaded (from cache) on session history navigation
(back/forward).

/be
Ian, can you give a few examples of URLs that break without onload-on-Back?

/be
Not off-hand, no.
Interestingly enough, I just came across a page of discussion on this bug,
except in reverse and for Opera:
http://www.quirksmode.org/bugreports/archives/2004/11/load_and_unload.html - if
you're still looking for an example of a page broken by this functionality, I'm
sure you could ask him. It seems as though the Opera folks really haven't found
a solution either; they just went in the opposite direction.

Would it be possible to simply not load from cache if the page includes onLoad
or onUnload events? That would improve performance (storage speed
notwithstanding) for most pages, while retaining compatibility for
JavaScript-generated pages, wouldn't it?

Oh, also, does anyone have some kind of estimate regarding how much storage
this'd take? 
Is it not feasible to cache the work of the parser?  I was under the impression
the XUL cache does that.  It won't cause any back/forward quirks, and it could
be used for all page loads, not just for backing up to a previous page.

If an html page is to be loaded from the disk cache, check for an associated
parser cache entry.  If it's there, use it to load the page.
But I, at least, don't want the parser results when I go back. I want exactly
the same DOM tree that I left when I went forward. I'm pretty sure that
not firing the onload event would be the correct behavior in this case; I just
want to see what parts of the existing web this would break, and why. Perhaps we
need to implement the "onenter" event mentioned in the linked opera thread.
hm, I guess not firing onload would break things that do stuff that's not
reflected in the DOM... (window opening... alerts... confirm()... others?)
A lot of sites use onload events or top-level scripts, but most of these sites
would not be broken by DOM-storing.  For example, a site that just twiddles
element positions would not be affected, and a site that prompts for your age
while loading or in onload would be *improved* by DOM-storing.

The only onload actions that might need to be redone are actions that change
external state, such as variables in another frame/window or a cookie.  I would
expect many sites that use scripts to alter external state to have onunload
events that undo those changes.  Is that true -- do most sites broken by Opera's
DOM-storing feature use onunload?  If so, Firefox could use the following
heuristic: store the DOM in history unless the site has an onunload event.

To summarize the possibilities:
A) Store everything.  Drop support for onunload. (Opera)
B) Store unless site uses onunload. (My suggestion)
C) Store unless site uses onunload, onload, or inline scripts. (Comment 7)

(A) or (B) could be combined with adding support for onenter and onleave.
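The three policies above could be sketched as a single decision function.  This
is a hypothetical model for illustration only (all names invented), not Gecko
code:

```cpp
#include <cassert>

// Hypothetical model of the three DOM-storing policies discussed above.
enum class Policy { StoreEverything, SkipOnUnload, SkipOnAnyScriptHook };

struct PageTraits {
    bool hasOnUnload;
    bool hasOnLoad;
    bool hasInlineScripts;
};

// Decide whether to store the live DOM in session history.
bool ShouldStoreDOM(Policy policy, const PageTraits& page) {
    switch (policy) {
        case Policy::StoreEverything:      // (A): Opera's approach
            return true;
        case Policy::SkipOnUnload:         // (B): store unless onunload is used
            return !page.hasOnUnload;
        case Policy::SkipOnAnyScriptHook:  // (C): the most conservative option
            return !(page.hasOnUnload || page.hasOnLoad ||
                     page.hasInlineScripts);
    }
    return false;
}
```

Under (B), a page that only uses onload to twiddle element positions would
still be cached; only an onunload handler opts it out.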
What I might like to see is an option to have Back/Forward behave essentially
as though each page was in a separate tabbed window, because that is how I
have been working around the current behavior -- when I want to follow a link
from which I expect I may want to go Back, I open the link in a new tab, and
then use Shift+Ctrl+Tab to go Back and Ctrl+Tab to go Forward; when a page
is out of date, I do Ctrl+R.  When combined with using tabs for branches in
the browsing path/sequence, this approach is not ideal, but for plain linear
browsing, I find it works great, and so I think having the equivalent behavior
just using Back/Forward rather than tabs would be even better.

I guess the key thing is that I can be the one who decides the behavior
regarding reloading, rather than the browser having to try to figure out what
to do for me, when it may not have enough understanding of what is going on at
the web application level to make a good choice.  I would still want to have an
option to have the browser treat Back/Forward as some sort of movement along a
path of pages where only one page can be visible/visited at a time, rather than
that other model of any page which has been visited may still be visible in its
previous state, regardless of whatever pages have been visited since then, and
it is up to the user to worry about treating the set of pages as a sequence.
What do you plan to do about plugins?

Re comment 11:  I agree that (B) is reasonable.  However, that makes the
assumption that onbeforeunload handlers are idempotent while onunload handlers
are not.  The way onbeforeunload works (prompt using a returned string) suggests
that typical onbeforeunload handlers probably ought to be idempotent.  I wonder
if we could get away with making the same assumption about onunload.  It doesn't
seem entirely unreasonable.  (How widely used are both of these?)
There's another tricky case: forms that use onclick/onsubmit. These scripts
could be setting hidden values or making other modifications to form fields
before they are submitted.

OTOH, these pages are already in trouble since form fields are restored when you
go back to a page. But there are actually pages that break on this already.
(Can't come up with an example off the top of my head).

I think the best way to deal with those pages is some sort of 'onenter' or
'onreturn' event. Also, it would be useful to be able to tag a form field (or
maybe an entire form) to be automatically reset on this event. This is
useful for things like navigation selects or search fields (it's always
confusing to return to a Google search page and see the value you searched for
when _leaving_ that page, rather than the value you searched for to _get_ that
page).

In fact, this stuff could be independent of this bug. Though if we did store the
DOM it would be useful if the event indicated whether the DOM was restored or
just the form values. Something for WHATWG; I'll see if I can write up a formal
proposal...
Actually, what happens if I leave a page, resize the window, then go back to the page?  Is 
there an onload event?  Is there an onresize event?  If there is neither, how is the page to 
reposition things if it does the usual "position it all in JS" thing?

As for doing prototype stuff like XUL, it's been thought of.... it might still happen.  It 
wouldn't really solve this bug as filed, though, I suspect.
(In reply to comment #15)
> Actually, what happens if I leave a page, resize the window, then go back
> to the page?

There's an onresize event.
Whiteboard: (o72555)
Target Milestone: mozilla1.8alpha6 → mozilla1.8beta2
Blocks: 203448
I'm not gonna have time for this till 1.9, but bryner has already started
hacking on it, and we've been talking.  He's the man for this bug, so reassigning.

/be
Assignee: brendan → bryner
Status: ASSIGNED → NEW
Hixie, anyone: do any of Opera, Safari, or IE implement an onreturn or onenter
event handler as sicking proposed?  (Isn't onback the obvious handler name? 
Hmm, not if you go forward.  onhistorymove?)  It would be good to build on a
de-facto standard, if one exists and doesn't suck.

/be
Alias: blazinglyfastback
We were this close back when jband and I last messed around with this odd case.
The idea is that you have o1, you want to resolve id in it, but o1[id] doesn't
exist.  o1's class has a new-resolve hook that defines id in some o2 not == o1
and not on o1's prototype chain.  That should work, because all callers of
js_LookupProperty* (direct or via OBJ_LOOKUP_PROPERTY) respect the out params.
This patch fixes the bug that broke it.

For this bug, consider window == o1, an inner window object that's N:1 against
the window, where N is the length of session history as o2.

bryner, find me on IRC if you run into further problems.  They should all be
fixable.  Thanks,

/be
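A rough model of the resolve-hook behavior described above, with invented names
and none of the real SpiderMonkey JSNewResolveOp machinery: resolving id on o1
may define the property on an unrelated o2 and report o2 back, and the lookup
caller must honor that out-object rather than assume the property landed on o1.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified model: an object is a property map plus an optional resolve hook.
struct Obj;
using ResolveHook = Obj* (*)(Obj& self, const std::string& id);

struct Obj {
    std::map<std::string, int> props;
    ResolveHook resolve = nullptr;
};

// Lookup: if o1 lacks `id`, run its resolve hook.  The hook may define `id`
// on an object other than o1 and return it; callers must respect that
// out-object instead of assuming the property landed on o1.
Obj* LookupProperty(Obj& o1, const std::string& id) {
    if (o1.props.count(id))
        return &o1;
    if (o1.resolve)
        return o1.resolve(o1, id);  // may return an object != o1
    return nullptr;
}

// Stand-in for an inner window object (N:1 against the outer window, where
// N is the length of session history).
Obj innerWindow;

// A resolve hook in the shape described above: define `id` on the unrelated
// inner object and report that object back to the caller.
Obj* WindowResolve(Obj& /*outer*/, const std::string& id) {
    innerWindow.props[id] = 42;  // 42 is an arbitrary placeholder value
    return &innerWindow;
}
```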
Attachment #177491 - Flags: review?(shaver)
Comment on attachment 177491 [details] [diff] [review]
fix so JSNewResolveOp can return an unrelated object

In my feverish haze, that looks great! r=shaver
Attachment #177491 - Flags: review?(shaver) → review+
Comment on attachment 177491 [details] [diff] [review]
fix so JSNewResolveOp can return an unrelated object

Checked in, although this won't be needed for now, if ever, by this bug.

/be
Note bug 288462.
Depends on: 285404
bryner's got a patch, it's getting there.  Please don't spam this bug with notes
about other bugs.  Yeah, once bryner's patch lands, and if you go back within
the limit of in-memory-cached history items, we will comply with RFC 2616.  Woo.

/be
Blocks: 288462
*** Bug 288432 has been marked as a duplicate of this bug. ***
Attached patch work in progress (obsolete) — Splinter Review
This is where I'm at right now.  There are a number of known issues here:

- random crashes, often when closing windows
- plugins need to be stopped and restarted
- timeouts need to be stopped and restarted
- need to fire a new event so that password autocomplete can run when going
back
- random rendering problem with reload button on mac
- the cache size needs to be tweaked, and ideally based on available RAM
- need to handle window size changes between saving and restoring presentation
- do something with marquee
One source of crashes seems to be an iframe having focus when you close a
window -- this has to do with the non-sticky case in DocumentViewerImpl::Hide.
Working on that now.
bryner, what's the plan for loads that aren't done when we leave a document? 
Say an image is still loading?  Or the document itself is not done loading?
(In reply to comment #27)
> bryner, what's the plan for loads that aren't done when we leave a document? 
> Say an image is still loading?  Or the document itself is not done loading?

Brian and I spoke about this today.  I think we agreed that partially loaded
pages should not be put in the back cache since they might compete against the
newly loading page for network and CPU bandwidth.  Either that, or we should
find some way to suspend the loading of an incomplete page... but that sounds
very complicated.
> since they might compete against the newly loading page for network and CPU
> bandwidth.

I was thinking more in terms of sharing the same loadgroup, so affecting onload
and the like... ;)

I agree that just not putting incompletely loaded pages in the back cache is a
simple solution for the time being.
Attached patch patch for review (obsolete) — Splinter Review
I think this still has a few problems.  In fact, I'm pretty sure of it.  But I
think we should get this in, turned on, so that we can start getting crashes
isolated with Talkback, etc. and we can have this stable by Firefox 1.1.
Attachment #179734 - Attachment is obsolete: true
Attachment #181003 - Flags: superreview?(jst)
Attachment #181003 - Flags: review?(roc)
This is targeted at 1.8b3, not b2, right?
I'll do what I can but I think bz-review would be invaluable also.
I'd certainly like to see it for b2 so that we can get testing exposure in the
Firefox developer preview.  Unless it turns out to be so bad that it masks all
other problems with the developer preview, in which case we'd be best off
getting it in as soon as possible after that release.
I've been using bf-cache builds for days and not had any problems that would
warrant it being turned off. This should totally be checked in, without a pref.
> This should totally be checked in, without a pref.

Pref'd on to be sure, but without a pref to disable just in case?!?  A simple
pref that lets me disable bf-cache to see if it is the culprit in some instance
would be invaluable IMO.
Some non-substantive comments:

Shouldn't a Sanitize() failure cause the DOM to automatically not be cached?  It
looks like we just ignore the return from Sanitize()...

Also, what happens if content fires DOMPageRestore events?  Wouldn't autofill
trigger on those?  Do we need some isTrusted checks here?

I'd really rather we stopped putting our custom events in the DOM* namespace. 
How about Moz* event names instead?
mIsGoingAway is never set for a document with this patch, as far as I can see. 
That seems wrong.  It'll lead to extra work in RemoveStyleSheet() when the
document is finally destroyed.

It may also be worth seeing when we currently enter GetScriptGlobalObject()
while mIsGoingAway is true, since the behavior of those codepaths will change
with this patch (in particular, if they actually need a script global they'll
break).

I'm assuming you did some basic testing and verified that the destructors for
the documents you evict from the cache are in fact firing, right?  Removing the
code that tears down the DOM in SetScriptGlobalObject is a tad scary, but if we
can do that without effectively leaking, that's great.

Some subclasses override SetScriptGlobalObject and do nontrivial things in it. 
If you're setting it to null and then back to non-null on the same document, for
example, this should cause nsImageDocument to create a new root and call
SetRootContent on it, triggering some asserts.  Worth checking on this and
nsPluginDocument's behavior.

If a document is loaded but there are still network requests in the loadgroup
(background images, loads started _after_ the onload event, etc), we don't deal
at all well, as far as I can tell -- either they end up in the loadgroup of the
new document we're loading (which is the same exact loadgroup), or we kill them
and probably don't restart them properly on restore.  We need to figure out how
to do this right (while recalling that at the moment reflow events, say, look
like a foreground request in the loadgroup).

More comments later....
I'd vote for no pref because we're getting drowned in prefs. If this is checked
in _with_ a pref then I would say we absolutely must remove at least two other
prefs first. (I'm talking hidden prefs here of course, not UI prefs.)
Blocks: 290394
I was using this patch for two days, and had to go back to a default build
because of intermittent problems with keyboard focus. Keyboard navigation was
sometimes totally broken, and while clicking inside the window usually helped,
that trick didn't work here.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050413, GTK2/Xft
I'm using a build with this patch for 2 days without any problems. It works great.
I built Linux builds with this, and a few folks used them - also without a problem.

Today I'm going to build Windows one, to get the feedback from Win users.
Comment on attachment 181003 [details] [diff] [review]
patch for review

roc said it was fine to switch reviewer of record to bz, who has started
already.  roc's gonna have a look too.

jst is on vacation still, so I'm going to sr.  At this point, apart from overt
bugs and design mistakes that will be hard to fix the longer we wait, I'd like
to see review happen in the next couple of days, so we can get this patch in
and get wider testing.

/be
Attachment #181003 - Flags: superreview?(jst)
Attachment #181003 - Flags: superreview?(brendan)
Attachment #181003 - Flags: review?(roc)
Attachment #181003 - Flags: review?(bzbarsky)
I'm not going to be able to do a thorough review of this till early next week....
Attached patch new patch (obsolete) — Splinter Review
Issues fixed since the last patch:

- Added a null check in the accessibility code so that we don't crash if the
document has no container.  I don't actually think this is related to my
changes but I hit it while testing, and aaronl's recent checkin seems to be the
culprit.

- Added additional checking in nsDocShell::CanSavePresentation to avoid saving
the presentation when we're reloading the page.

- Added save and restore of the docshell tree (oops)

- Refactored timeout suspend/resume code and made sure to suspend and resume
for child windows as well.
Attachment #181003 - Attachment is obsolete: true
Attachment #181463 - Flags: superreview?(brendan)
Attachment #181463 - Flags: review?(bzbarsky)
Comment on attachment 181003 [details] [diff] [review]
patch for review

removing obsolete requests
Attachment #181003 - Flags: superreview?(brendan)
Attachment #181003 - Flags: review?(bzbarsky)
The patch changes the behavior on this simple testcase.  Replace the CGI with
one that does JPEG push or XMLHttpRequest activity or whatever to get an idea
of where this could become a pretty major issue.

The problem, of course, is that a page transition stops all network activity in
a docshell's loadgroup.  This is desirable, since the loadgroup belongs to the
_docshell_.  But then when we fast-back we don't restart the network requests
we killed.
I'd like to see some documentation in nsDocShell.h.  Specifically, I'd like to
see documentation as to the lifetime and ownership model of mContentViewer, and
documentation on what the three methods you're adding do, when it's OK to call
them, what happens if aURI doesn't match the URI in aSHEntry in
RestorePresentation(), what's an acceptable SHEntry to pass to RestorePresentation.

I'm still sorting out through the subframe navigation handling, so if you
actually wrote down somewhere how that works with this patch (given that cloning
the session history entries doesn't clone this cached presentation stuff), let
me know please...  I suspect it works out in some cases because we only actually
do a load when we have a different history entry, on the specific docshell the
entry is different for, but doesn't this mean that going back to the result of a
subframe navigation across a toplevel load will be slow?
More things to check on I thought of...

1)  Is there an issue with the cached prescontexts holding a backpointer to the
docshell (the "container")?  It's an nsWeakPtr, so it's not that it can start
pointing to deallocated memory or anything, but could something in the
prescontext or some prescontext consumer make a call on the docshell when the
docshell doesn't expect it?  I seem to recall anchors getting the docshell for
traversal via this pointer.  What happens if someone fires a (trusted, so say
from chrome) DOM click event on a pointer to a node in a document that we've
moved away from?

2)  Are there issues with the device context being shared by the cached
viewmanagers and the current live one?  I'm not sure what could arise here; roc
might know this part better...  So if we go back to a frameset from a non-frame
page, do we create new docshells for the subframes?  And if so, do they create
new device contexts?  Or is the device context held by the content viewer?
(In reply to comment #45)
> a docshell's loadgroup.  This is desirable, since the loadgroup belongs to the
> _docshell_.  But then when we fast-back we don't restart the network requests
> we killed.

I'm going to post a new patch in a minute that avoids this problem by not
caching the presentation if there are any active requests in the load group
(other than the request for the new document).  That way we don't have to figure
out how to restart something like an XMLHttpRequest.

(In reply to comment #46)
> them, what happens if aURI doesn't match the URI in aSHEntry in

That would be bad and should not happen.  Do you know of a case where it does?

> I'm still sorting out through the subframe navigation handling, so if you
> actually wrote down somewhere how that works with this patch (given that cloning
> the session history entries doesn't clone this cached presentation stuff), let
> me know please...  I suspect it works out in some cases because we only actually
> do a load when we have a different history entry, on the specific docshell the
> entry is different for, but doesn't this mean that going back to the result of a
> subframe navigation across a toplevel load will be slow?

I'm not entirely clear on how it does or should work either, or I would.  Do you
think just copying the nsIContentViewer pointer to the cloned entry would fix
this?  It seems like if you do that, once you restore one of the history
entries, you would need to nuke the saved presentation pointer from any other
entries that are using it.
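The no-cache condition described above -- skip saving the presentation if the
load group has active requests other than the new document's -- amounts to
something like the following sketch.  This is a simplified stand-in with
invented names, not the actual nsDocShell::CanSavePresentation code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a docshell load group entry.
struct Request {
    std::string name;
    bool active;
};

// Skip caching the presentation if any request other than the one loading the
// new document is still active (e.g. an XMLHttpRequest we could not figure
// out how to restart on restore).
bool CanSavePresentation(const std::vector<Request>& loadGroup,
                         const std::string& newDocumentRequest) {
    for (const Request& req : loadGroup) {
        if (req.active && req.name != newDocumentRequest)
            return false;
    }
    return true;
}
```

This matches the "start overly conservative" approach: any in-flight request
besides the new document's load makes the page ineligible for the cache.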
(In reply to comment #48)
> traversal via this pointer.  What happens if someone fires a (trusted, so say
> from chrome) DOM click event on a pointer to a node in a document that we've
> moved away from?
> 

The intent is that the cached document is completely dead to events.  You have
to explicitly go to session history to get a pointer to one, so I don't think we
need to plug this at every entry point.

> 2)  Are there issues with the device context being shared by the cached
> viewmanagers and the current live one?  I'm not sure what could arise here; roc
> might know this part better...  So if we go back to a frameset from a non-frame
> page, do we create new docshells for the subframes?  And if so, do they create
> new device contexts?  Or is the device context held by the content viewer?

If we go back to a frameset, the subframe docshells are cached with the toplevel
content viewer (because the subdocuments will have a reference to them).  So
they will retain their original device contexts.
> not caching the presentation if there are any active requests in the load group

What if the active request is a pending reflow?  Or a pending image error or
load event?  Same thing?

As far as that goes, what _is_ the plan for pending PLEvents of various sorts
when the presentation is cached?  Do we want them firing on the cached presentation?

> That would be bad and should not happen.

Then why do you need both the URI and the SHEntry?  If the URI is always the URI
in the SHEntry, just pass the latter?

> Do you think just copying the nsIContentViewer pointer to the cloned entry
> would fix this?

I'm not sure, for the same reasons you're not sure (ownership issues, Hide/Show
ordering, etc).

> You have to explicitly go to session history to get a pointer to one

Why?  What if you're holding a pointer to a document (say in a different frame
on the same website) and then the document gets unloaded before you do something
with it?
Attached patch new patch (obsolete) — Splinter Review
Changes since the last patch:

- Added a check for active requests in the docshell's load group, and skip
caching the presentation if there are any.
- Got rid of nsDocument::mIsGoingAway in favor of checking for null SGO.
- Fixed nsEventListenerManager::HasUnloadEvents to check for beforeunload as
well.
- Made nsPluginDocument recreate its mStreamListener when restored from session
history.
- Moved nsISecureBrowserUI ownership to the docshell, so that we can always get
at this to cache it (it formerly lived on the XUL <browser>, in a way that was
inaccessible from C++).  Added capture and restore of security state so that
alerts fire as expected when going back/forward.
- Added suspend and resume of meta-refresh loads.  To do this I ended up adding
to nsITimer so that you can retrieve the nsITimerCallback object.

This doesn't address Boris's most recent comments yet, but I wanted to get this
new code up.
Attachment #181463 - Attachment is obsolete: true
Attachment #181797 - Flags: superreview?(brendan)
Attachment #181797 - Flags: review?(bzbarsky)
(In reply to comment #51)
> What if the active request is a pending reflow?  Or a pending image error or
> load event?  Same thing?
> 
> As far as that goes, what _is_ the plan for pending PLEvents of various sorts
> when the presentation is cached?  Do we want them firing on the cached
> presentation?
> 

No, I don't really want anything to fire on the cached presentation.  If it's
not something that we can figure out how to suspend and resume, then I'd say it
should be a no-cache condition.  I'd rather start off overly-conservative and
then gradually expand the types of pages that we can cache.

> Then why do you need both the URI and the SHEntry?  If the URI is always the URI
> in the SHEntry, just pass the latter?
> 

Could... aURI originates in LoadHistoryEntry if aSHEntry is non-null, so it
looks like they definitely have to correspond.

> > You have to explicitly go to session history to get a pointer to one
> 
> Why?  What if you're holding a pointer to a document (say in a different frame
> on the same website) and then the document gets unloaded before you do something
> with it?

The document needs to act as if it has no presentation, in that case.
> - Got rid of nsDocument::mIsGoingAway in favor of checking for null SGO.

Note that a document created via the DOMImplementation should have a null SGO,
last I checked.  Not sure about documents loaded via XMLHttpRequest and such,
but that may be the case there too...

I need to think about how embedding apps are affected by the secure browser UI
change.  In general, any time we have to use nsIDocShell in our chrome that
means the functionality is not available to embeddors (since nsIDocShell is not
frozen and never will be).

> No, I don't really want anything to fire on the cached presentation.

Then we probably need to audit all places where we create and post PLEvents and
figure out what's supposed to happen for all of them....  Similar for various
timers (the caret blink timer comes to mind).

> The document needs to act as if it has no presentation, in that case.

I'm not sure this patch accomplishes that.

One other thing to watch out for -- computed style. Once someone gets a computed
style object, we need to somehow invalidate it (or something?) when the document
is cached.
(In reply to comment #54)
> I need to think about how embedding apps are affected by the secure browser UI
> change.  In general, any time we have to use nsIDocShell in our chrome that
> means the functionality is not available to embeddors (since nsIDocShell is not
> frozen and never will be).
> 

Darin and I went through this... the secure browser UI object is not accessible
via any public interface on nsWebBrowser, either.  So it should be no change.

> > No, I don't really want anything to fire on the cached presentation.
> 
> Then we probably need to audit all places where we create and post PLEvents and
> figure out what's supposed to happen for all of them....  Similar for various
> timers (the caret blink timer comes to mind).

Right, we could also try harder to suspend things that might post such events. 
Even if we did that, it might not hurt to have a global "inactive" flag for a
presentation that would kill anything like this.

> 
> > The document needs to act as if it has no presentation, in that case.
> 
> I'm not sure this patch accomplishes that.
> 

No, I don't think it does.  I'd say unless there are immediate known issues,
that could be addressed later.

> One other thing to watch out for -- computed style. Once someone gets a computed
> style object, we need to somehow invalidate it (or something?) when the document
> is cached.

It might not hurt to see what Safari does in this case (and the above case, as
well).
So just to be clear, I've somewhat accepted that we plan to check this in with
known issues.  I'd like to make sure those issues are not ones that affect the
basic architecture of the design.  Things like deciding on the ownership of
things, clear statement of our design invariants, and a plan for enforcing those
invariants are, in my mind, prerequisites for sorting out what issues can be
solved in the current design....
Attached patch new patchSplinter Review
Made sure to disable any active carets before putting a document into session
history.  Also added lots of comments in nsDocShell.h about how this all works.
Attachment #181797 - Attachment is obsolete: true
Attachment #181828 - Flags: superreview?(brendan)
Attachment #181828 - Flags: review?(bzbarsky)
Comment on attachment 181828 [details] [diff] [review]
new patch

bryner, great work.  sr+a=me with the major issues below fixed before checkin;
minor issues for your consideration only.  My marks carry over if you attach a
new patch fixing these issues shortly.

With bz agreeing, I think you should check this in as soon as possible. 
Followup bugs should be filed and fixed as usual for any big patch,
particularly on the XXX and XXXbryner issues, which should, if you follow Nat
Friedman's wise advice, be FIXME: <bug#> comments instead.  (So do file
followup bugs now).

It would be good to have a followup bug about better global-LRU memory pressure
instead of the at-most-five-by-default pref.  I'll file it if you like; let me
know.

/be

Major issues:

CopyJSPropertyArray should use the JS_GetStringChars,
JS_GetUCPropertyAttributes, JS_GetUCProperty, and JS_DefineUCProperty APIs to
handle full Unicode.

If any of those APIs fail, there was a JS error reported (OOM) or set as a
pending exception (other than OOM), so you should propagate failure from
CopyJSPropertyArray.

It looks like user-defined window-level getters and setters (which will be
enumerable) will be replaced by undefined values, which is bad.  Instead of

+    jsval propvalue;
+    if (!::JS_GetProperty(cx, aSource, propname_str, &propvalue)) {
+      NS_WARNING("property was found, but get failed");
+      continue;
+    }
+
+    PRBool res = ::JS_DefineProperty(cx, aDest, propname_str, propvalue,
+				      NULL, NULL, attrs);

you want to declare JSPropertyOp getter, setter; call
JS_GetUCPropertyAttrsGetterAndSetter (I just added this API) instead of
JS_GetPropertyAttributes; and define the (Unicode-named) property with getter
and setter passed instead of NULL and NULL.

Why don't you have to save and restore mTimeoutInsertionPoint as well as
mTimeouts?

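The getter/setter issue can be illustrated with a toy model (invented types,
not the JSAPI): if the copy flattens an accessor property to a one-time
snapshot of its value, the restored window loses the live behavior, so the
getter/setter pair has to be carried across intact.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy model of a property: either a plain value or a live getter.
struct Property {
    int value = 0;
    std::function<int()> getter;  // non-null for accessor properties
    int Get() const { return getter ? getter() : value; }
};

// Unicode-capable keys, mirroring the move to the JS_*UC* APIs above.
using PropertyMap = std::map<std::u16string, Property>;

// Copy properties from one "window" to another, preserving accessors rather
// than replacing them with a snapshot (or undefined) value.
void CopyProperties(const PropertyMap& src, PropertyMap& dst) {
    for (const auto& [name, prop] : src)
        dst[name] = prop;  // the getter is copied too, so the property stays live
}
```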
Minor issues:

Nit: use two-space indenting in nsDocument::Sanitize().

Nit: localize input and form to their respective for loop body blocks in
nsDocument::Sanitize()

Nit: SubDocEnumData *args = NS_STATIC_CAST(SubDocEnumData*, arg);
in SubDocHashEnum to avoid data->data (args->data, nice).  In light of that:
SubDocEnumArgs, not SubDocEnumData.

DOMPageRestore is ok by me -- we should standardize this via whatwg or (hah! 
I'll believe it when I see it) the re-forming w3c DOM group.

Uber-nit: in nsDocShell::CanSavePresentation,

   // difficult (i.e. XMLHttpRequest).

s/i.e./e.g./

In nsDocShell::RestorePresentation,

// Pulling the viewer out of the CachedPresentation will thaw() it.

thaw refers to some old name for what method?  open?
Attachment #181828 - Flags: superreview?(brendan)
Attachment #181828 - Flags: superreview+
Attachment #181828 - Flags: approval1.8b2+
Status: NEW → ASSIGNED
Flags: blocking1.8b2+
Oh, and bryner found what I should have remembered to mention: intervals need
mWhen updating on resume.  And we agreed timeouts and intervals should resume
with whatever time remains in the current timeout or interval period, to avoid
skewing outstanding timeouts from any related intervals.
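The remaining-time bookkeeping described above can be sketched in isolation (a minimal standalone sketch; Timeout, Suspend, and Resume here are illustrative stand-ins, not the actual nsGlobalWindow code, and plain int64_t stands in for PRIntervalTime):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a window timeout/interval record.
struct Timeout {
  int64_t mWhen;      // absolute fire time while running;
                      // time remaining while suspended
  int64_t mInterval;  // repeat period (0 for one-shot timeouts)
};

// On suspend, convert the absolute deadline into the time remaining,
// clamped at zero so an overdue timeout fires immediately on resume.
void Suspend(Timeout &t, int64_t now) {
  int64_t remaining = t.mWhen - now;
  t.mWhen = remaining > 0 ? remaining : 0;
}

// On resume, rebase the remaining time against the new clock, so the
// timeout keeps whatever was left of its current period instead of
// skewing relative to any related intervals.
void Resume(Timeout &t, int64_t now) {
  t.mWhen += now;
}
```

So a timeout suspended 50ms before its deadline still fires 50ms after resume, no matter how long the page sat in the cache.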

/be
Comment on attachment 181463 [details] [diff] [review]
new patch

patch is obsolete. Removing review-request.
Attachment #181463 - Flags: superreview?(brendan)
Attachment #181463 - Flags: review?(bzbarsky)
Comment on attachment 181797 [details] [diff] [review]
new patch

Patch is obsolete, removing review-request.
Attachment #181797 - Flags: superreview?(brendan)
Attachment #181797 - Flags: review?(bzbarsky)
Running with this, lots of windows, tabs, plugins, maps.google.com... crashed here:

#0  0x006837a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008ea624 in raise () from /lib/tls/libpthread.so.0
#2  0x080545fe in nsProfileLock::FatalSignalHandler ()
#3  <signal handler called>
#4  0xb6f00c9e in nsDocShell::CanSavePresentation ()
   from /home/brendan/src/firefox-opt/mozilla/dist/bin/components/libdocshell.so
#5  0xb6f03458 in nsDocShell::CreateContentViewer ()
   from /home/brendan/src/firefox-opt/mozilla/dist/bin/components/libdocshell.so
#6  0xb6f0c7ea in nsDSURIContentListener::DoContent ()
   from /home/brendan/src/firefox-opt/mozilla/dist/bin/components/libdocshell.so

Looking at the disassembly before $eip in frame 4, I'm thinking

    nsCOMPtr<nsIDocument> doc = do_QueryInterface(pWin->GetExtantDocument());
    if (doc->HasRequests(aNewRequest))
        return PR_FALSE;

needs a null check.  bryner?

/be
Nits: printf("Copied window property: %s\n", propname_str); won't work too well
now that propname_str is jschar *.

/be
###!!! ASSERTION: Already have a root content!  Clear out first!: '!mRootContent
', file c:/mozilla.org/trunk2/mozilla/content/base/src/nsDocument.cpp, line 1501

I'm guessing that's comment 37?  Things get real bad after that assert.
Perhaps nsGenericElement::ShouldFocus should default to false when there is no
script global?  That would make a lot more sense to me...
Comment on attachment 181828 [details] [diff] [review]
new patch

Drive-by review comments:

+class WindowStateHolder : public nsISupports
[...]
+  JSRuntime *mRuntime;

In Mozilla there's only ever one JSRuntime, and you can always get to that
through the nsIJSRuntimeService, so no need to store that here IMO. Or is this
potentially needed after the service is no longer reachable? If so, add
comments about that.

Also nsDocument::Sanitize() should use 2-space indentation.
Attachment #181828 - Flags: superreview+ → superreview?(brendan)
Comment on attachment 181828 [details] [diff] [review]
new patch

I finally built a tree with this patch and did a run with XPCOM_MEM_LEAK_LOG
set.  Just start up and shut down.  With this patch, that leaks 22 nsDocument
objects (and all sorts of other stuff that they're holding references too, of
course).  That's not so good.

Then I tried using this patch.	Loaded
http://lxr.mozilla.org/seamonkey/source/layout/base/nsCSSFrameConstructor.cpp
then clicked the "CVS Log" link.  So far so good.  Then clicked back (which was
fast).	Then clicked forward.  Then clicked back and watched
nsCSSFrameConstructor be reparsed.  I guess the problem is that I'm using
SeaMonkey, so the max_viewers pref ended up being 0.  But if it's zero, why did
we cache the viewer for nsCSSFrameConstructor the first time around?  That
should be fixed.

If I start the browser, open a new tab, load about:config in the new tab,
filter for "svg", right-click on the svg.enabled pref, toggle it, and shut down
the browser, I leak two more documents (24 instead of 22).  That may just be
because I loaded more XUL documents, of course.

I got some assertions that seem to be caused by this patch:

###!!! ASSERTION: Can't get interface requestor: 'ifreq', file
/home/bzbarsky/mozilla/xlib/mozilla/extensions/typeaheadfind/src/nsTypeAheadFin
d.cpp, line 2096

###!!! ASSERTION: failed to get the nsIScriptGlobalObject. bug 95465?:
'boundGlobal', file
/home/bzbarsky/mozilla/xlib/mozilla/content/xbl/src/nsXBLPrototypeHandler.cpp,
line 423

The former I can reproduce at will.  Start the browser, hit Ctrl-T to open a
new tab, then focus the content area and hit Ctrl-W to close the tab.  Gives
that assert and some nice frame manager warnings.  The latter assert, I'm not
quite sure how I got...  I managed to reproduce it once more while messing with
about:config for the "svg" pref, though.

With this patch, I still fail the "resuming network activity" testcase, most of
the time (more on this when I comment on the docshell changes).  When I _do_
pass it, going forward again I get:

###!!! ASSERTION: Request list is not empty.: 'mRequests.entryCount == 0', file
/home/bzbarsky/mozilla/xlib/mozilla/netwerk/base/src/nsLoadGroup.cpp, line 410
###!!! ASSERTION: Foreground URLs are active.: 'mForegroundCount == 0', file
/home/bzbarsky/mozilla/xlib/mozilla/netwerk/base/src/nsLoadGroup.cpp, line 411

On the "forms in cached document submit" testcase, if I just click the link in
the subframe and then go back (without clicking the button), I get:

###!!! ASSERTION: Unexpected root view: 'rootView ==
origView->GetViewManager()->RootViewManager()->GetRootView()', file
/home/bzbarsky/mozilla/xlib/mozilla/view/src/nsViewManager.cpp, line 4108

on every paint...  Sounds like restoring viewmanager state for subdocuments
doesn't quite work.

Moving on to the patch itself:

>Index: browser/app/profile/firefox.js

Please make a similar change to browser-prefs.js.  Or maybe the C++ code should
default to something nonzero here, so embeddors get this feature automatically
(and those who don't want it have to disable it explicitly)?

>Index: content/base/public/nsIDocument.h

>@@ -52,6 +52,7 @@
>+#include "pldhash.h"

Why?  I don't see you using it anywhere in this header...

>+  /**
>+   * Sanitize the document by resetting all input elements and forms that have
>+   * autocomplete=off to their default values.

Do we really want to be that specific?  What effect, if any, should Sanitize
have on XForms?  Might want to file a followup bug on that.  Certainly not an
initial-landing blocker.

>+  typedef PRBool (*nsSubDocEnumFunc)(nsIDocument *aDocument, void *aData);

What does the return value mean?

>Index: content/base/src/nsDocument.cpp

>@@ -1616,7 +1618,12 @@ nsDocument::RemoveStyleSheet(nsIStyleShe
>+  // If there is no script global object, we assume that either:
>+  //   1) Document teardown is in progress
>+  //   2) The document is in session history not being shown; it should not be

That's not a good assumption for data documents, etc (though they may not ever
get RemoveStyleSheet called on them; I'm not sure).  Wouldn't it make sense to
key off of mInDestructor here, if that's the only place where we remove all our
content?

>@@ -1832,7 +1839,7 @@ nsDocument::GetScriptGlobalObject() cons
>+   if (!mScriptGlobalObject) {
>      nsCOMPtr<nsIInterfaceRequestor> requestor =
>        do_QueryReferent(mDocumentContainer);

I don't see anything in this patch that changes a document's container... So it
looks to me like cached documents will point to their original docshell as the
container, and will point to the window in that docshell as their script
global.  Is that really desirable?  jst might be able to say for sure, but it
seems wrong to me...  Then again, it's no different from what we already had if
someone held a pointer to a document past the point where the window loaded a
new document...

>@@ -1847,37 +1854,6 @@ nsDocument::GetScriptGlobalObject() cons
> nsDocument::SetScriptGlobalObject(nsIScriptGlobalObject *aScriptGlobalObject)

I'll bet that the changes to this function are causing the leaks I observed,
fwiw.

>+nsDocument::Sanitize()

Is there a reason we don't have to worry about autocomplete="off" <textarea>
elements?

>+SubDocHasRequests(PLDHashTable *table, PLDHashEntryHdr *hdr,
>+                  PRUint32 number, void *arg)

>+  PRBool hasRequest = subdoc ? PR_FALSE : subdoc->HasRequests(nsnull);

That should be the other way around?

>Index: content/events/public/nsIEventListenerManager.h

>+   * Allows us to quickly determine if we have unload or beforeunload

s/if/whether/

>Index: content/events/src/nsDOMEvent.cpp

>+  "DOMPageRestore", "submit", "reset", "change", "select", "input", "paint",

I'm still not so happy with putting this in the DOM spec event name namespace. 
I guess if jst and brendan both feel this part is ok I'll let it be, though...

>Index: content/html/content/src/nsGenericHTMLElement.cpp

I still think it's odd that we plan to allow anchors in cached documents to
perform page traversals in the relevant docshell (which they get off the
prescontext).  See the "anchors in cached documents" testcase.	Note that that
testcase updates the status bar too.

As the "forms submit" testcase shows, we also allow forms to submit...	It
sounds like we need a much clearer concept of what is and what is not allowed
in these cached documents and then we need to make sure that we actually
satisfy these constraints somehow...

We also allow focus() and blur() to do something useful.  That's probably
wrong...  For Focus() it can be fixed easily, but blur() basically assumes that
as long as we have a prescontext blurring is ok.

I'm not sure we can just pretend to have no prescontext either -- there are
various DOM changes that would need to be propagated to the frame tree as
things stand (inline style changes, image URI changes, etc).

Maybe it would make sense to drop the cached DOM if it gets mutated?

>Index: content/html/document/src/nsPluginDocument.cpp

>+  } else if (!mStreamListener) {
>+    mStreamListener = new nsMediaDocumentStreamListener(this);
>+    // XXX no way to indicate OOM

I'm not sure what this is trying to do... The mStreamListener is what data is
being pumped into by necko when the document starts loading.  We really
shouldn't need it after we finish loading the document, I suspect; we certainly
shouldn't need to recreate it on fast-back...  If you're trying to make the
plugin show up again on back, that's not the way to do it; you'd have to reopen
the network stream yourself or something.  To test, load a full-page plugin
(http://www.copyright.gov/legislation/dmca.pdf is a good example), go to
another page, go back.	Notice lack of plugin after back.

I don't see changes to nsImageDocument here.   Didn't I point out that this
patch breaks it?  To test, load an image
(https://bugzilla.mozilla.org/mozilla-banner.gif will do nicely), then go
somewhere, then go back.  Note the assert.  After this operation, we have a DOM
node around that thinks it's in the DOM when it really isn't.  After that, some
finagling should probably cause it to crash.

>Index: content/xbl/src/nsXBLPrototypeHandler.cpp

>-      focusController = privateWindow->GetRootFocusController();
>+      if (privateWindow)
>+        focusController = privateWindow->GetRootFocusController();

I think if we have no privateWindow in here we should be bailing out
altogether...  I can't think of a reason we'd want to run XBL command handlers
in a cached document, in general.
Attachment #181828 - Flags: superreview?(brendan) → superreview+
As I said a few weeks ago on IRC, I don't think a cache of 5 documents
per-session-history object is the right approach.  I think we should determine
when to evict documents based on a global MRU list (perhaps around 25?).  Users
who keep lots of tabs or windows open probably don't want 5 documents hanging
around for all of them, especially ones that haven't been touched for days.
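A global MRU eviction scheme of the sort described here might look roughly like this (a simplified standalone sketch; ViewerCache is a hypothetical name, string keys stand in for session-history entries, and a real implementation would hold nsIContentViewer references and call Destroy() on eviction):

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical global MRU cache of cached presentations.  Touch() moves
// an entry to the front; when the cache exceeds its capacity, the least
// recently used entry is evicted, regardless of which tab or window it
// belongs to.
class ViewerCache {
public:
  explicit ViewerCache(size_t capacity) : mCapacity(capacity) {}

  // Record that |key| was just used (cached or revisited).
  void Touch(const std::string &key) {
    auto it = mMap.find(key);
    if (it != mMap.end())
      mList.erase(it->second);
    mList.push_front(key);
    mMap[key] = mList.begin();
    if (mList.size() > mCapacity) {
      // Evict the least recently used entry; a real implementation
      // would destroy the cached content viewer here.
      mMap.erase(mList.back());
      mList.pop_back();
    }
  }

  bool Contains(const std::string &key) const {
    return mMap.count(key) != 0;
  }

private:
  size_t mCapacity;
  std::list<std::string> mList;  // front = most recently used
  std::unordered_map<std::string,
                     std::list<std::string>::iterator> mMap;
};
```

With a global list like this, a long-idle background tab naturally loses its cached viewers to pages the user is actively navigating.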
(In reply to comment #70)
> As I said a few weeks ago on IRC, I don't think a cache of 5 documents
> per-session-history object is the right approach.  I think we should determine
> when to evict documents based on a global MRU list (perhaps around 25?).  Users
> who keep lots of tabs or windows open probably don't want 5 documents hanging
> around for all of them, especially ones that haven't been touched for days.

I'm sure everyone agrees, I mentioned darin and I discussing this last week --
but this should be a followup bug.  There's no point in adding to the hurdles
for the patch in this bug.  Please file one.

/be
> I'm sure everyone agrees, I mentioned darin and I discussing this last week --
> but this should be a followup bug.  There's no point in adding to the hurdles
> for the patch in this bug.  Please file one.

Yeah, brendan and I spoke about possibly implementing a memory pressure observer
since not all pages are equal in size and it is difficult to estimate the size
of a page.  We of course only implement the memory pressure system on Windows,
but in theory it seems like the right approach.  We monitor overall memory usage
by the browser, and then ask subsystems to reduce memory usage when we see the
overall memory usage of the browser increasing.  Of course, something simple for
bfcache in the meantime seems ideal ;-)
Comment on attachment 181828 [details] [diff] [review]
new patch

>Index: docshell/base/nsDocShell.cpp

>+#include "nsICaret.h"

I'm not really so thrilled with making docshell depend on caret...  If we can
move caret-disabling into presshell (which already knows about caret), great. 
If not, I guess I can deal.

>+nsDocShell::SetSecurityUI(nsISecureBrowserUI *aSecurityUI)
>+{
>+    mSecurityUI = aSecurityUI;

Do we want to assert here if both mSecurityUI and aSecurityUI are non-null?  Or
do we really support that correctly?

Note to self: This adds another strong ref from docshell to window.  Need to
update the map.

>+nsDocShell::SuspendRefreshURIs()

>+    for (PRUint32 i = 0;  i < n; ++i) {
>+        timer = do_QueryElementAt(mRefreshURIList, --n);

That doesn't look right (both incrementing i and decrementing n).  In
particular, say n == 3.  We put the callback from slot 2 in slot 0, then put
the callback from slot 1 in slot 1.  The callback in slot 0 got lost.

>+nsDocShell::CanSavePresentation(nsIRequest *aNewRequest)

>+    nsCOMPtr<nsPIDOMWindow> pWin = do_QueryInterface(mScriptGlobal);
>+    if (pWin && pWin->IsLoading())

Can pWin be null here?

>+    nsCOMPtr<nsIDocument> doc = do_QueryInterface(pWin->GetExtantDocument());

If so, this will crash.  I think it makes more sense to return false on null
pWin, really.

>+    if (doc->HasRequests(aNewRequest))

Unfortunately, this call is far too late... This method is being called when
we're setting up the new content viewer, but all network activity for the old
document was already stopped by the Stop() call in nsDocShell::InternalLoad. 
This is why this patch still fails the "resuming network activity" testcase...

>+    // Only save presentation for "normal" loads and link loads.  Anything else
>+    // probably wants to refetch the page, so caching the old presentation
>+    // would be incorrect.

It might be safe to also cache for LOAD_NORMAL_REPLACE, but that's followup bug
material....

>+    // If there is an unload or beforeunload listener on the window, then we do
>+    // not save the presentation into the session history cache.

What if there's an unload or beforeunload listener on a subframe of this
window?  In general, it seems to me that we want to save only if we can save
the presentation for this docshell _and_ all its descendants.

>+nsresult
>+nsDocShell::CaptureState()

This method, or something like it, needs to be called recursively on all kids
of this docshell.  We don't want to do the "save the kids in the SHEntry" part
for all descendants, and we probably don't want to save the script global
objects's properties (since we're not clearing them), but we _do_ want to
suspend all timeouts and intervals in subframes, suspend refresh timers,
capture any security UI state as needed, etc.  As you pointed out we do suspend
timeouts and intervals on the kids, but refresh timers in the kids look like
they would still fire.

>+nsDocShell::RestorePresentation(nsISHEntry *aSHEntry)

>+    if (!viewer) {
>+        return NS_ERROR_FAILURE;

I think NOT_AVAILABLE would make more sense here.

>+    nsresult rv = mContentViewer->Open();
>+    NS_ENSURE_SUCCESS(rv, rv);

On failure here we have both the SHEntry and the docshell holding a ref to the
viewer, which violates the invariant we have about it being owned by one or the
other.	Fix that?

>+    // restore the sticky state of the viewer
>+    PRBool sticky;
>+    aSHEntry->GetSticky(&sticky);
>+    mContentViewer->SetSticky(sticky);

It's a little confusing to me that we're saving this state in the viewer but
restoring it here.... I guess by this point the viewer doesn't have a ref to
the shtentry, right?  Might be worth a comment here that we're undoing what
Destroy() on the viewer did.

>+    // save the previous content viewer size
>+    nsRect oldBounds(0, 0, 0, 0);
>+    aSHEntry->GetViewerBounds(oldBounds);

Not only are we not saving anything here, oldBounds is completely unused...  So
I'm not sure what this code or the code that saved the bounds in the SHEntry is
trying to do, exactly.

>+    // Insert the new root view at the correct location in the view tree.
>+    if (rootViewParent) {
...
>+                    if (oldPresShell)
>+                        oldPresShell->InvalidateViewManagers();

Did you mean to call that on the _new_ shell?  I don't see us fixing up the
root viewmanager pointer for the new viewmanager anywhere; that's probably
causing those asserts I saw.

>+    RefreshURIFromQueue();
>+
>+    mOSHE = aSHEntry;
>+    EndPageLoad(nsnull, nsnull, NS_OK);

EndPageLoad already calls RefreshURIFromQueue, no?  So probably no need to do
it here.

I assume that this avoids firing onload from EndPageLoad because
mEODForCurrentDocument is true, right?	May be worth documenting that here.

>+    if (rootSH && (mLoadType & LOAD_CMD_HISTORY)) {

Can we get here if this is not a history load?	Perhaps it would make more
sense to assert up front in this method that |mLoadType & LOAD_CMD_HISTORY|?

>+    nsCOMPtr<nsIPresShell> shell;
>+    nsDocShell::GetPresShell(getter_AddRefs(shell));
>+
>+    if (shell) {

(when setting the new size) This is duplicating the code for the case when we
have a rootViewParent.	Perhaps we should unconditionally do some of this
earlier so we don't have two copies of this code within 20-some lines of each
other?

>+    if (mScriptGlobal) {

This method earlier asserts that it got a privWin, so mScriptGlobal isn't null
by now (or we've crashed).

>+            mScriptGlobal->HandleDOMEvent(presContext, &restoreEvent, nsnull,
>+                                          NS_EVENT_FLAG_INIT, &status);

I assume there's a reason we don't need to allocate the private data here?

>+    // Restart plugins
>+    if (shell)
>+        shell->StartPlugins();

Is it safe to do this synchronously?  That is, can't the plugins run random
script as they instantiate (if nothing else, by firing broken-plugin events)? 
If they can, we could reenter docshell code from in here.... is that something
we're OK with?

Where do we restore caret visibility?  Or do we?

>@@ -4727,7 +5209,10 @@ nsDocShell::CreateContentViewer(const ch
>+    if (savePresentation) {
>+        // The old content viewer will be saved during the call to Embed().
>+        rv = CaptureState();
>+        gSavingOldViewer = PR_TRUE;
>+        // XXX this isn't really fatal, necessarily...
>+        // should we try to fire unload if NS_FAILED(rv) ?

Indeed.  Also, if CaptureState fails we want to NOT cache the presentation
(because we may not have disabled parts of it that need to be disabled).

>+    gSavingOldViewer = PR_FALSE;

I'd be much happier with a member var here, just in case Embed causes something
else to happen in some other docshell somewhere...  If we can do that without
increasing docshell size, great.  Otherwise, I'll take bigger size in return
for safer clearer code.

>@@ -5005,6 +5499,7 @@ nsDocShell::SetupNewViewer(nsIContentVie
>+    nsCOMPtr<nsIDOMDocument> oldDocument;

That seems to be write-only.  Is it just holding a reference so the document
won't die, or something else?  If this really needs to be here, please document
why so it won't get removed?

>@@ -5627,6 +6131,20 @@ nsDocShell::InternalLoad(nsIURI * aURI,
>+        if (!CanSavePresentation(nsnull))
>+            FireUnloadNotification();
>+
>+        mFiredUnloadEvent = PR_FALSE;
>+
>+        CaptureState();

Do we really want to do that if CanSavePresentation returned false?  I
don't think we do.

>Index: docshell/base/nsDocShell.h
>@@ -395,6 +397,49 @@ protected:

>+    // Captures the state of the supporting elements of the presentation
>+    // (the "window" object, docshell tree, meta-refresh loads, and security
>+    // state) and stores them on |mOSHE|.
>+    nsresult CaptureState();

Are there non-fatal error returns?  If so, which?

>+    // Restores the presentation stored in |aSHEntry|.
>+    nsresult RestorePresentation(nsISHEntry *aSHEntry);

Again, there are basically nonfatal error returns (e.g. if we couldn't cache
the presentation for aSHEntry for some reason).

I'm still not quite sure the order of the various calls in this code is right;
I'd have to spend another day or so sorting through this stuff to make sure...
Titles are probably not updating because SetTitle is called by the content sink
on the document, which passes it on to the docshell.  So on restore we probably
want to get the title from the document and set it on ourselves.  The fact that
we're not firing any of the usual progress notifications other than
OnLocationChange probably means we're breaking some extensions.  I'd like to
see some reasoning behind why we think it's OK to not fire the other expected
web progress listener notifications here (and darin should probably check that
reasoning out, since we're basically changing the behavior of the frozen
progress listener apis).

>Index: docshell/base/nsIContentViewer.idl

It would be nice to document 

>@@ -68,4 +69,21 @@ interface nsIContentViewer : nsISupports

>+   *  (*) unless the document is currently being printed

What will happen if it _is_ being printed?

One other thing that sort of bothers me in all this code is that if we fail
somewhere we leave the SHEntry holding references to all sorts of stuff (eg
whatever properties were set on the window).  It'd really be good to clean
that up better.

>Index: dom/public/base/nsPIDOMWindow.h
>@@ -139,6 +145,12 @@ public:

>+  // Returns an object containing the window's state.
>+  virtual nsresult SaveWindowState(nsISupports **aState) = 0;

This should probably indicate that it also suspends all current timeouts, etc,
and stores them all in the state....  That is, this not only saves things into
the state but also changes the state of the window.

>Index: dom/public/coreEvents/nsIDOMLoadListener.h

>+  /**
>+   * Processes a DOMPageRestore event.  This is dispatched when a page's
>+   * presentation is restored from session history (onload does not fire
>+   * in this case).

Will this fire only for the "topmost" page being restored (unlike onload, which
fires for all subframes too)?  That's what the code looked like to me... Should
this have a behavior more like onload?

>Index: dom/src/base/nsGlobalWindow.cpp

>+  WindowStateHolder(JSContext *cx, JSObject *aObject, nsGlobalWindow *aWindow);

Document the args?

>+  JSObject* GetObject() { return mJSObj; }

Document this?	Same for other methods on this class?

>+  nsCOMPtr<nsIEventListenerManager> mListenerManager;
>+  nsCOMPtr<nsIDOMElement> mFocusedElement;
>+  nsCOMPtr<nsIDOMWindowInternal> mFocusedWindow;

I assume you've checked these for cycles and there are no problems?

>+CopyJSPropertyArray(JSContext *cx, JSObject *aSource, JSObject *aDest,

>+    if (!::JS_IdToValue(cx, props->vector[i], &propname_value) ||
>+        !JSVAL_IS_STRING(propname_value)) {
>+      NS_WARNING("Failed to copy non-string window property\n");

Can that actually happen?  If so, shouldn't we actually propagate the error and
fall back on normal loads instead of just warning and leading to broken
behavior?  Is "non-string" referring to the value of the property?  Or its
name?

I'm not really so much following this code, so I'm assuming Brendan has given
it a careful review.

>+    if (!::JS_GetProperty(cx, aSource, propname_str, &propvalue)) {
>+      NS_WARNING("property was found, but get failed");

Again, we should error out.

>+  JSObject *stateObj = ::JS_NewObject(cx, &sWindowStateClass, NULL, NULL);
>+  NS_ENSURE_TRUE(stateObj, NS_ERROR_OUT_OF_MEMORY);

So we have a newly allocated unrooted JSObject...

>+  CopyJSProperties(cx, mJSObject, stateObj);

And we make JS_DefineUCProperty calls in here.	Those can trigger GC.

>+  *aState = new WindowStateHolder(cx, stateObj, this);

Which means that stateObj may have been GCed by now.  I think we want to do all
this copying either in the WindowStateHolder or at least after we've created
the holder and rooted stateObj.

>+nsGlobalWindow::RestoreWindowState(nsISupports *aState)

>+    nsIDOMElement *focusedElement = holder->GetFocusedElement();
>+    if (focusedElement) {
>+      nsCOMPtr<nsIDOMNSHTMLElement> htmlElement =
>+        do_QueryInterface(focusedElement);
>+      if (htmlElement) {
>+        htmlElement->Focus();

Hmm... So back() will pull focus into the new document we navigate to in most
cases (since we left it by clicking a link in it, probably, so focus was in
it).  I guess that makes some sense....

Shouldn't we focus the window if the element wasn't HTML or XUL?

For that matter, don't we have APIs to focus arbitrary elements?  Shouldn't we
use them?

Also, calling Focus() on HTML elements fires the onfocus handler on that
element, as well as onblur handlers for whatever element used to have focus. 
Given the codepath into this code, is that really something we want happening
here?  Consider handlers starting another load in this very window, or handlers
closing the window, or whatever....

>+nsGlobalWindow::SuspendTimeouts()

>+  PRInt64 now = PR_IntervalNow();

>+      t->mWhen = PR_MAX(0, t->mWhen - now);

No can do.  First of all, there is no '-' operator for PRInt64, in general. 
Second of all, there is no '>' operator for it (which is needed by PR_MAX). 
Either use nsInt64, or use the LL_* macros as needed here.

>+nsGlobalWindow::ResumeTimeouts()

>+      t->mWhen += now;

Same here.

>Index: dom/src/base/nsGlobalWindow.h

>+  virtual NS_HIDDEN_(void) SuspendTimeouts();
>+  virtual NS_HIDDEN_(nsresult) ResumeTimeouts();

Why are these virtual?	And it's worth documenting what they do (esp. wrt
kids).

>@@ -215,6 +219,8 @@ public:
>+  friend class WindowStateHolder;

This will lead to build bustage on some of our platforms, since there is no
prior decl of |class WindowStateHolder| (a forward-decl would suffice, outside
the nsGlobalWindow class declaration).

>Index: layout/base/nsDocumentViewer.cpp

> DocumentViewerImpl::Destroy()

>+    // This is after Hide() so that the user doesn't see the inputs clear.
>+    if (mDocument)
>+      mDocument->Sanitize();

Sanitize() can fail.  If it does, we'll have sensitive data left sitting around
in the cached document.  Not only should we be checking the rv here, we should
be propagating it back to someplace that nukes the cached viewer (so that on
back to a document we failed to Sanitize we'll reparse it).

>@@ -1323,8 +1386,9 @@ DocumentViewerImpl::Stop(void)

>+  if (mEnableRendering && (mLoaded || mStopped) && mPresContext) {
>     mPresContext->SetImageAnimationMode(imgIContainer::kDontAnimMode);
>+  }

Note that this means that image animations in cached documents are disabled,
right?	I don't see us reenabling them anywhere...  Basic testing shows that on
back to a document with animated images they're all not animated.

>Index: layout/base/nsPresShell.cpp

>+void
>+PresShell::StartPlugins()

What about plugins in subdocuments?  I don't see us starting those...  Needs to
happen.

>+PresShell::StopPlugins()

Same issue here.

>Index: netwerk/base/public/nsISecureBrowserUI.idl

>+interface nsISecureBrowserUIState : nsISupports

Why do we need an empty interface for this?  Why not just use nsISupports?

>Index: security/manager/boot/src/nsSecureBrowserUIImpl.h

>+  nsresult HandleStateChange(PRBool aIsToplevel, nsIURI *aURI,
>+                             PRUint32 aLoadFlags, PRUint32 aProgressStateFlgs);

This seems to be unused.

>Index: toolkit/components/passwordmgr/base/nsPasswordManager.cpp

Doesn't the password manager in extensions/wallet also need updating?

Note that if we delivered the expected progress/state notifications this change
would not be necessary....

>Index: toolkit/content/widgets/browser.xml

The xpfe version also needs to be updated, no?

>Index: view/src/nsViewManager.cpp
>+nsViewManager::InvalidateHierarchy()
>+      mRootViewManager = parent->GetViewManager()->RootViewManager();
>+      NS_ADDREF(mRootViewManager);

Assert that mRootViewManager != this here, ok?

>Index: xpfe/components/shistory/public/nsISHEntry.idl

>+    /**
>+     * Saved child docshells corresponding to contentViewer.  There are weak
>+     * references since it's assumed that the content viewer's document has
>+     * an owning reference to the subdocument for each shell.

That's not a valid assumption, since the DOM of the content viewer's document
can be mutated arbitrarily by anyone who happened to grab a pointer to it
before we left the page.

In fact, the assumption that this list has anything to do with the content
viewer document is not a valid assumption as things stand.

>+    void addChildShell(in nsIDocShellTreeItem shell);
>+    nsIDocShellTreeItem childShellAt(in long index);
>+    void clearChildShells();

Documentation, please?	In particular it's not clear how one would go about
enumerating the child shells in a sane way.  If the index is out of bounds
(which bounds?), do we return null, throw, or crash?  Does addChildShell add to
the end?

>@@ -77,12 +84,21 @@ nsSHEntry::nsSHEntry(const nsSHEntry &ot
>+  , mSticky(other.mSticky)
>+  , mViewerBounds(other.mViewerBounds)

Do we really want to copy those, but none of the other restoration state, when
cloning?

>Index: xpfe/components/shistory/src/nsSHistory.cpp

>@@ -104,10 +109,10 @@ nsSHistory::Init()

Do we care that the pref we added isn't live?

And that there's no way for the user to manually flush out this cache?

>-    nsCOMPtr<nsIPrefBranch> defaultBranch;
>-    prefs->GetDefaultBranch(nsnull, getter_AddRefs(defaultBranch));

This would regress bug 126826.

>@@ -171,7 +176,10 @@ nsSHistory::AddEntry(nsISHEntry * aSHEnt

>+  EvictContentViewers(mIndex - 1, mIndex);

I suspect this happens before we store the content viewer in the entry at
mIndex - 1, which is why caching 0 viewers doesn't work (caches one or two
after all).
Comment on attachment 181828 [details] [diff] [review]
new patch

OK.  So overall design comments now:

As I mentioned earlier, I think changing what happens wrt web progress listener
notifications (at least the state ones) on session history traversal is a
really bad idea.

To do this patch right, we need to define what the behavior is for various
accesses to the cached document (which is available to various content and
chrome scripts).  What happens on DOM modification?  Calls to DOM interfaces?
There needs to be a clear idea of what codepaths are safe to execute in a
cached document.  Once we have that, we need to make sure that those code paths
are the only ones that we execute.  There further needs to be a clear concept
of how much of the cached presentation should be visible to scripts (eg via
computed style) and we need to enforce that.

Once we have this codepaths thing sorted out, we need to look at the various
ways of triggering them, including the various PLEvents we post in Gecko and
the various timers we set up (blink timer come to mind here, as do the object
frame timers, XUL listbox timers, menu timers, combobox PLEvents, restyle
events, selection scroll events, image box frame events, viewmanager invalidate
events, etc).

I would like to second David's comment about having a global cache here instead
of a per-session-history one.

I would also like to say, in case anyone actually cares to listen, that I feel
that trying to force this patch in for 1.8/1.1 is a grave mistake.  It's
nowhere close to being ready as it stands, and I suspect it will lead to a
number of hard-to-detect regressions that we won't catch before final because
of the limited testing schedule.

That is, assuming we're still planning to branch for final sometime this June. 
If we've decided to add another few months to the schedule, then maybe (but not
definitely) we could have this working by then.
Attachment #181828 - Flags: review?(bzbarsky) → review-
(In reply to comment #71)
> I'm sure everyone agrees, I mentioned darin and I discussing this last week --
> but this should be a followup bug.  There's no point in adding to the hurdles
> for the patch in this bug.  Please file one.

Since when is it the responsibility of design and code reviewers to file bugs on
the review comments that the patch author chooses to ignore?

And sorry, but I think doing something vaguely reasonable regarding eviction
does belong in this bug and should block it from landing.  Maybe the fancy
memory pressure stuff doesn't need to, but a simple change like the one I
suggested would help a lot.  I've put in quite a few weeks of work on memory
leaks in this release cycle alone, mostly cleaning up after other people's
messes.  I'd rather not see all that work shot to pieces by a patch that's IMO
too risky to land anytime in the month before the freeze, never mind nearly a
month after the freeze.

I'll say more bluntly what I've said somewhat gently before:  I am opposed to
this patch landing for 1.8.  Getting this right for all the corner cases that we
need to get right in order to be able to ship is going to take time.  We also
need to ensure that the API that we provide to web pages for the case where our
heuristics don't do the right thing is sufficient and well-documented.  And then
there are crashes, memory leaks, and catching up from lost testing on all the
other things that we're trying to stabilize for the release.  I'm tired of
planning my own work responsibly around a proposed release schedule and then
having everybody else ignore the schedule, start *building feature lists* around
or after the freeze, and leave me doing stabilization-before-a-release work for
nine months out of the year.  And I certainly have no plans to do another
leak-cleanup cycle for this release, since I already did that in late February
and late March, which was an appropriate time given the schedule that I
foolishly believed.
> Since when is it the responsibility of design and code reviewers to file bugs
> on the review comments that the patch author chooses to ignore?

It's not, I didn't imply that, and bryner hasn't chosen to ignore anything here.
It's my opinion, and I said so, that we shouldn't have to do everything in one
fell swoop.  That can limit testing and serialize work through one hacker and
one or a few reviewers, compared to the alternative of followup bugs.

I shouldn't have presumed you agreed; sorry about that.

> And sorry, but I think doing something vaguely reasonable regarding eviction
> does belong in this bug and should block it from landing.  Maybe the fancy
> memory pressure stuff doesn't need to, but a simple change like the one I
> suggested would help a lot.  I've put in quite a few weeks of work on memory
> leaks in this release cycle alone, mostly cleaning up after other people's
> messes.  I'd rather not see all that work shot to pieces by a patch that's IMO
> too risky to land anytime in the month before the freeze, never mind nearly a
> month after the freeze.

Then the only hope is that this patch goes on a branch for two months, where in
spite of lots of conflicting changes and other demands, bryner manages to keep
it alive.  That looks like a risky plan too.

There's an alternative, but it can't be done without more people helping, even
if you personally don't: we get the patch reviewed and fixed more, get it in for
an alpha (not necessarily the first alpha), and divide the resulting regression
and perf/footprint fixing among a number of people larger than just bryner.

> I'll say more bluntly what I've said somewhat gently before:  I am opposed to
> this patch landing for 1.8.  Getting this right for all the corner cases that
> we need to get right in order to be able to ship is going to take time.

I think it will take more than just time.  If it's serialized through one person
on a branch, besides the risk of conflicts and other demands leading to that
branch becoming a dead limb, the regressions that one person with some review
help will never find still must be confronted when it lands.  That may be a
better plan for everyone else hacking on the code, but it may be a worse plan
than the alternative sketched above for users.

It's hard to say, but I believe a shippably fixed version of this patch is
more valuable for users than almost any other tradeable big-ticket item.  It
may not be worth
as much as a bunch of little fixes to severe bugs, though.

There's another point I'd like to make: bz especially, and others including me,
talk about documenting things like docshell, cleaning up interfaces, reforming
bad old
code.  But that is never going to happen as an end in itself -- nor should it!
It should happen as part of an aggressive improvement to end-user function such
as this patch attempts.  That is the best way to guide trade-offs away from
diminishing returns.

I am still in favor of trying to get this patch into an alpha soon.

/be
Sorry for the typos, it's late and it has been a very long day.

In a nutshell, my last point comes down to the observation that we should not
put off reform as a separate exercise.

My next to last point, which I am not sure is a worthwhile generalization, is
that forcing ambitious plans to serialize through strong hackers on (virtual or
real) branches is limiting our ability to benefit end users and actually improve
our codebase at the same time.

I think this is likely a half-truth, but my gut says there is something real and
important to it.  I wonder if we did more pair programming, whether we could get
more done, more quickly, and in certain particular dimensions, even if it meant
fewer good results among all dimensions.

/be
> I am still in favor of trying to get this patch into an alpha soon.

pref'ed off with zero regressions in the pref'ed off case, I should have said
(it should go without saying!).

/be
At some point, I remember there being talk about getting this patch into the
tree pref'd off by default.  What happened to that idea?

This is a big patch, and a valuable one IMO.  Getting the code into the tree
will not only avoid bit-rot but also allow other people to more easily test and
contribute to the patch.  Please correct me if I'm mistaken, but it would seem
that there should be some pinch-point (or at most a couple) where we can decide
whether or not to enable this feature.

Trying to get this patch to be 100% perfect before everyone uses it by default
is the right goal, but that doesn't mean that it has to be 100% perfect before
it is checked into the tree.

This patch sort of reminds me of the HTTP/1.1 pipelining patch.  If I had not
checked in the patch -- pref'd off -- then it would never have landed.  It is
still not good enough to be enabled by default, but yet people do enable it. 
Over time, people have reported bugs, and the feature has slowly improved as a
result.  Now, in some ways pipelining is probably not the best example since it
still isn't enabled by default after all these years ;-)  That has to do with
insurmountable server compatibility issues, but I wouldn't know that if users
hadn't tried the feature and reported sites that were broken.

This isn't to say that Boris isn't correct in saying that more analysis is
needed.  Of course, the more we know about the system, the better we can ensure
that this patch "hits on all cylinders."  I'm just saying that perhaps we're
close to a point where we can learn even more from user feedback as well.

So, are we anywhere close to be able to land this pref'd off? ;-)
First, people seem to be conveniently forgetting that we're in a _beta_ Gecko
milestone.  The _second_ beta, in fact.  As David points out, we've already gone
through a bunch of post-alpha stabilization work on Gecko in the _first_ beta
cycle (most done, by David, I admit).  Just because Firefox is having an alpha
for its UI doesn't exactly make this an "alpha milestone".

Second, most of the patch is in code that really shouldn't be changing much in
beta and the following final cycle if we're doing things right.  So I think it
should be reasonable to maintain it until we reopen for 1.9a.  If desired I will
put my time where my mouth is and stick this in a tree all alone and repost the
merged-to-tip patch here when 1.9a opens.

Third, the minimal requirements for getting this in even preffed off would be
(at least; I may be missing something):

1)  Having the preffing off actually work
2)  Not leaking documents
3)  Being very very sure that the ownership model changes in document viewer are
    OK.
4)  Deciding how much we're willing to have not work with the code preffed on
    (eg, do we need to audit the code for security issues as the DOM is mutated,
    or do we clearly say that enabling this code can wipe your hard drive and
    melt your monitor?).

The main issue here, of course, is that even preffed off the patch changes some
very very fragile code in somewhat nontrivial ways.  We'd need to make sure
those changes don't break things either, basically.
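Requirement 1 ("preffing off actually works") essentially reduces to an early return gated on the pref, so the old codepath runs untouched.  A minimal sketch, assuming the browser.sessionhistory.max_viewers pref named in this bug's whiteboard (the surrounding function names and toy pref store are invented, standing in for nsIPrefBranch):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy pref store standing in for nsIPrefBranch.
struct Prefs {
  std::map<std::string, int> ints;
  int GetInt(const std::string& name, int defaultValue) const {
    auto it = ints.find(name);
    return it == ints.end() ? defaultValue : it->second;
  }
};

// With the pref at its default of 0, the save path must bail before
// touching any of the new code, leaving the old behavior untouched.
bool MaybeSavePresentation(const Prefs& prefs) {
  int maxViewers =
      prefs.GetInt("browser.sessionhistory.max_viewers", /* default */ 0);
  if (maxViewers == 0)
    return false;  // feature preffed off: behave exactly as before
  // ... CaptureState() and friends would run here ...
  return true;
}
```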
> First, people seem to be conveniently forgetting that we're in a _beta_ Gecko
> milestone.  The _second_ beta, in fact.

No, I think everyone is quite aware of that (and the complications of refocusing
the project on a different major release vehicle).  You know as well that this
talk is motivated by a desire to make Firefox 1.1 as compelling to users as it
can be.  Let's not kid ourselves: Firefox is what puts our project on the map.
Maybe we are trying for too much... OK, that is a fair argument.


> As David points out, we've already gone
> through a bunch of post-alpha stabilization work on Gecko in the _first_ beta
> cycle (most done, by David, I admit).  Just because Firefox is having an alpha
> for its UI doesn't exactly make this an "alpha milestone".

Yes, lots of great work has been done to improve Gecko for the 1.8 milestone. 
No one wants to regress Gecko or undo all that good work ;-)


> The main issue here, of course, is that even preffed off the patch changes 
> some very very fragile code in somewhat nontrivial ways.  We'd need to make 
> sure those changes don't break things either, basically.

OK, so it sounds like there is currently no way to pref it off completely. 
That's definitely a concern.
I pulled last night and built [SeaMonkey] with attachment 181884 [details] [diff] [review].  I didn't set
any prefs.

1. open browser (home page is slashdot)
2. load http://www.microsoft.com/whdc/devtools/debugging/default.mspx
3. hit back then forward
4. the URL bar shows http://www.microsoft.com/whdc/devtools/debugging/default.mspx
but the page is still slashdot, every other time
Between today and the 1.1 release there is time to fix bugs that this patch may
regress.  And having this feature for 1.1 is IMO extremely important for
marketing reasons.  There is a lot of interest around this stuff, and I believe
that bryner won't be left alone working on fixes for regressions that may happen.
re: comment 82, you also get the following asserts when you do that:

###!!! ASSERTION: User did not call nsIContentViewer::Close: '!mDocument', file
mozilla/layout/base/nsDocumentViewer.cpp, line 516
###!!! ASSERTION: User did not call nsIContentViewer::Destroy: '!mPresShell &&
!mPresContext', file mozilla/layout/base/nsDocumentViewer.cpp, line 522

This happens even with the max_viewers pref set to 5.
(In reply to comment #83)
To clarify my comment: I'm just trying to say that if we were able to focus
resources on this bug, getting it in for 1.1 would be extremely valuable IMO.
It would give us a lot of good press, and would be a key argument in the
promotion of Fx 1.1.
What I'm not trying to say is that the bugs and regressions should be ignored
and this patch checked in ASAP.
Successful software releases are built around features that help the product's
marketability. This is one such feature. I shouldn't have to remind anyone here
that Mozilla is facing a threat from Microsoft this summer. I'm not saying we
should do a poor job of any one feature, but we need to all be on the same page
here. Shipping incremental improvements to existing code as new releases is not
the way we ensure our vitality - we need at least a few high impact end user
features. Everyone *needs* to understand this, otherwise we will run up against
these problems again and again and again. I think there's a lot more we can do
here as a team that recognizes the importance of this moment in time before we
write this feature off. 

Anyhow, enough spam from me.
I filed this bug because the previous bug, bug 38486, became overloaded with
advocacy.  Let's not go there again.

The problem here is not advocacy against the feature, as a feature.  Nor is it a
difference in values about regressions (bryner doesn't want to regress Firefox
1.1).  There probably is a difference in judgment about the value of expediting
and favoring this feature in spite of the trunk "beta" freeze, but again, that
can't be the main problem, or the patch would be in more working order and we
would be arguing only about less tangible risks and release philosophies.

The problem is that the patch is not ready to land as is.  It will cause a storm
of dups, and bad user experiences at the margin that will be counterproductive
in the developer preview.  That's no good, given that bryner is probably pretty
crispy right now.

To make progress by allowing selected others to enable, test, and even help fix
regressions, the change needs to cause no regressions when landed pref'ed off. 
That's what I've talked to bryner about getting working, and that is what I will
support when landing, as I've already said.  So let's do that, or else start a
newsgroup thread about why this feature should not be expedited, if possible.

/be
bug got my vote!
just tried the feature in Opera. I'm amazed...
If we're shipping Firefox 1.1 with this turned on, we need to have the maximum
amount of testing. Checking this in preffed off won't get us adequate testing.
In fact all it will do is introduce a second codepath which will later cause
support problems (not to mention that each codepath multiplies the QA cost).

I understand that there is a Gecko cycle and that this is the wrong time to land
something like this in the Gecko cycle. But there is also a Firefox cycle and
this is the right time to land this in the Firefox cycle. The Firefox cycle is
what will ensure the Gecko engine remains relevant in the future.

Design problems should be fixed before being checked in. But it needs to go in.
No me-too/gee-whiz bugspam or else; comment 88 is the last one I expect to see.

If bryner comes up with a patch that can be pref'ed off by default without any
regression in that state, then I will make a benevolent dictator decision to
land that patch, so that bryner, myself, and anyone else willing to help (there
will be many) can work to get the design and implementation in agreement with
the (mostly hidden as usual with our codebase) requirements.

This does not mean that I will decide anything else right now, including whether
this feature will be on for 1.1.  No one can decide that right now -- we lack
the information to make an informed decision given the current schedule, and the
schedule itself is too uncertain to decide.

The pref can be temporary, but it is necessary to land this so bryner does not
give up out of frustration.  It's unreasonable to expect him to do it all; it is
also unreasonable to turn it on and coerce everyone not signed up to help to put
up with the regressions likely or known from the current patch.  So pref'ing it
off and landing it soon are the right moves.  I heard dbaron agree in person,
which is great.  I would like to win bz's agreement.

There shouldn't be mistrust that people (specifically, bryner, me, dbaron) will
do the wrong thing, (allow us to) ship an unstable release, or walk away from
unfixed design or implementation bugs.  If we have that level of mistrust, we
shouldn't even be here.

Yes, this feature is late, and it may not make it for 1.8 / Firefox 1.1.  We are
making a special case for it, to get it checked in pref'd off, to avoid risking
the work being lost, and to get more help while not breaking too many people too
soon.  That's an executive decision you can blame me for.  The buck stops here.

/be
If the leaks are fixed, I have no problem with this landing preffed off.  I
agree that that would prevent it bitrotting and would allow people who want to
work on it to do so, which is a very good thing in this case.  We have good
precedent for this sort of thing with the "scary viewmanager" of olde, which
benefited a good bit from the testing it got this way, iirc.
(In reply to comment #69)
> >+nsDocument::Sanitize()
> 
> Is there a reason we don't have to worry about autocomplete="off" <textarea>
> elements?
> 

Microsoft basically defined the behavior of this attribute, and they only define
it to be valid for form and input elements.  We don't honor autocomplete=off for
textareas for purposes of saving form data, either.

> >+nsDocShell::SetSecurityUI(nsISecureBrowserUI *aSecurityUI)
> >+{
> >+    mSecurityUI = aSecurityUI;
> 
> Do we want to assert here if both mSecurityUI and aSecurityUI are non-null?  Or
> do we really support that correctly?
> 

It's just a pointer; what can happen incorrectly?  Granted, I wouldn't expect
this situation to arise, but I don't see what bad thing happens if it does.

> >+    // If there is an unload or beforeunload listener on the window, then we do
> >+    // not save the presentation into the session history cache.
> 
> What if there's an unload or beforeunload listener on a subframe of this
> window?  In general, it seems to me that we want to save only if we can save the
> presentation for this docshell _and_ all its descendants.
> 
> >+nsresult
> >+nsDocShell::CaptureState()
> 
> This method, or something like it, needs to be called recursively on all kids
> of this docshell.  We don't want to do the "save the kids in the SHEntry" part
> for all descendants, and we probably don't want to save the script global
> objects's properties (since we're not clearing them), but we _do_ want to
> suspend all timeouts and intervals in subframes, suspend refresh timers,
> capture any security UI state as needed, etc.  As you pointed out we do suspend
> timeouts and intervals on the kids, but refresh timers in the kids look like
> they would still fire.
> 
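The recursive walk bz asks for above might be sketched like this (a hypothetical model; the Shell type and its member names are invented, not docshell's actual API): a presentation is saveable only if this shell and every descendant passes the check, and suspending timers likewise has to recurse, or subframe timers keep firing against the cached presentation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical docshell-tree node, for illustration only.
struct Shell {
  bool hasUnloadListener = false;  // unload/beforeunload present?
  bool timersSuspended = false;
  std::vector<Shell*> children;

  // A presentation can be saved only if neither this shell nor any
  // descendant has an unload/beforeunload listener.
  bool CanSavePresentation() const {
    if (hasUnloadListener) return false;
    for (const Shell* kid : children)
      if (!kid->CanSavePresentation()) return false;
    return true;
  }

  // Suspending refresh/timeout timers must also walk the whole subtree.
  void SuspendTimers() {
    timersSuspended = true;
    for (Shell* kid : children) kid->SuspendTimers();
  }
};
```

In this model, one listener anywhere in the subtree vetoes saving the whole tree, which matches the "this docshell _and_ all its descendants" rule quoted above.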

> >+            mScriptGlobal->HandleDOMEvent(presContext, &restoreEvent, nsnull,
> >+                                          NS_EVENT_FLAG_INIT, &status);
> 
> I assume there's a reason we don't need to allocate the private data here?

I had to allocate it for the load event so that HandleDOMEvent would not try to
set the event's target to the chrome event handler.  Instead, the event gets its
target from the pres context, which is correct.  For the restore event, I am
dispatching to the window object, which will set up the target correctly.

> 
> >+    // Restart plugins
> >+    if (shell)
> >+        shell->StartPlugins();
> 
> Is it safe to do this synchronously?  That is, can't the plugins run random
> script as they instantiate (if nothing else, by firing broken-plugin events)? 
> If they can, we could reenter docshell code from in here.... is that something
> we're OK with?

I'm not really sure.  Maybe jst knows.


> 
> Where do we restore caret visibility?  Or do we?

We don't, it happens as part of restoring focus.  Thinking about it, maybe we
should just blur the element instead of messing with the caret.

> >Index: docshell/base/nsDocShell.h
> >@@ -395,6 +397,49 @@ protected:
> 
> >+    // Captures the state of the supporting elements of the presentation
> >+    // (the "window" object, docshell tree, meta-refresh loads, and security
> >+    // state) and stores them on |mOSHE|.
> >+    nsresult CaptureState();
> 
> Are there non-fatal error returns?  If so, which?

I don't think so...  if anything here fails we should not store the content
viewer in history.

> 
> >+    // Restores the presentation stored in |aSHEntry|.
> >+    nsresult RestorePresentation(nsISHEntry *aSHEntry);
> 
> Again, there are basically nonfatal error returns (e.g. if we couldn't cache
> the presentation for aSHEntry for some reason).

I'm not really following you here; maybe I'm not sure what you mean by "fatal".
 Fatal in what sense?

> 
> I'm still not quite sure the order of the various calls in this code is right;
> I'd have to spend another day or so sorting through this stuff to make sure...
> Titles are probably not updating because SetTitle is called by the content sink
> on the document, which passes it on to the docshell.  So on restore we probably
> want to get the title from the document and set it on ourselves.  The fact that
> we're not firing any of the usual progress notifications other than
> OnLocationChange probably means we're breaking some extensions.  I'd like to
> see some reasoning behind why we think it's OK to not fire the other expected
> web progress listener notifications here (and darin should probably check that
> reasoning out, since we're basically changing the behavior of the frozen
> progress listener apis).
> 

Darin and I talked at length about this and decided to avoid firing progress
listener notifications, if at all possible.  This is to avoid having to create
some kind of dummy request/channel.  

> One other thing that sort of bothers me in all this code is that if we fail
> somewhere we leave the SHEntry holding references to all sorts of stuff (eg
> whatever propertties were set on the window).  It'd really be good to clean
> that up better.

Agreed; but I'm not sure where offhand.  Most everything is only relevant if
there is a content viewer set (or the content viewer is waiting to go into SH),
so maybe it makes sense to do a consistency check between the content viewer and
the history entry after we return from SavePresentation and RestorePresentation.
 If the content viewer doesn't have a pointer to the history entry, or vice
versa, all of the supporting attributes of the history entry could be nulled out.


> >Index: dom/public/coreEvents/nsIDOMLoadListener.h
> 
> >+  /**
> >+   * Processes a DOMPageRestore event.  This is dispatched when a page's
> >+   * presentation is restored from session history (onload does not fire
> >+   * in this case).
> 
> Will this fire only for the "topmost" page being restored (unlike onload, which
> fires for all subframes too)?  That's what the code looked like to me... Should
> this have a behavior more like onload?
> 

You're right, it does fire only for the topmost page.  Perhaps it should fire
for the subframes since they were restored as well.

> >+  nsCOMPtr<nsIEventListenerManager> mListenerManager;
> >+  nsCOMPtr<nsIDOMElement> mFocusedElement;
> >+  nsCOMPtr<nsIDOMWindowInternal> mFocusedWindow;
> 
> I assume you've checked these for cycles and there are no problems?

As far as I know, correct.

> 
> >+CopyJSPropertyArray(JSContext *cx, JSObject *aSource, JSObject *aDest,
> 
> >+    if (!::JS_IdToValue(cx, props->vector[i], &propname_value) ||
> >+        !JSVAL_IS_STRING(propname_value)) {
> >+      NS_WARNING("Failed to copy non-string window property\n");
> 
> Can that actually happen?  If so, shouldn't we actually propagate the error and
> fall back on normal loads instead of just warning and leading to broken
> behavior?  Is "non-string" referring to the value of the property?  Or its
> name?
> 

Brendan said this might be possible with E4X, where the property name can be an
object (?), iirc.

> >+nsGlobalWindow::RestoreWindowState(nsISupports *aState)

----
More later, I'm up to here with either answering these here or fixing the code
(minus the assertions noted, I can test for those after fixing other things up
and see if they're still around).
> Darin and I talked at length about this and decided to avoid firing progress
> listener notifications, if at all possible.  This is to avoid having to create
> some kind of dummy request/channel.  

To be clear, this applies to onStateChange, onStatusChange, and
onProgressChange.  We will still need to generate onLocationChange and
onSecurityChange events.  There are cases already where onLocationChange may be
called with a null request. As for onSecurityChange, none of the consumers in
our tree use the request that is passed to it, presumably because all of the
interesting information is available from the associated browser's
nsISecureBrowserUI object.  We decided that it would be best to try to issue
onSecurityChange events with a null request to avoid having to create a dummy
request.  I asked bryner to post a message to n.p.m.embedding to give people a
heads up about this, and I also said that this may end up requiring a dummy
request... the challenge with that however is in constructing a dummy request. 
Does it QI to nsIChannel?  If so, does it return a non-null value for
nsIChannel::securityInfo?  And, if so... what value?

Otherwise, I see no problem suppressing the other notifications including
onStateChange.  Boris: can you think of any problems with this?
> It's just a pointer, what can happen incorrectly?

Nothing per se, but I can't think of a non-broken reason it'd be getting set
twice....

> Thinking about it, maybe we should just blur the element instead of messing
> with the caret.

Would that work with caret browsing mode?  For that matter, does setting focus
(to the window, say) in caret browsing mode reenable the caret if it's been
disabled?  That sounds odd to me, but...

> I'm not really following you here, maybe I'm not sure what you mean by
> "fatal".

Fatal as in "cases where the return value doesn't indicate that you should just
bail out immediately".  That is, cases that could happen during normal browser
operation that we signal via an error nsresult.  For example, if there was no
presentation cached in aSHEntry (which is perfectly fine), this method will
throw.  It would be good to define exactly what it will throw then (and
preferably make sure it never throws it otherwise)...

> maybe it makes sense to do a consistency check between the content viewer and
> the history entry after we return from SavePresentation and
> RestorePresentation.

That sounds reasonable, yes.  It'd only have to be done on error return too...

> Perhaps it should fire for the subframes since they were restored as well.

If it's meant to serve as a general onload replacement for chrome consumption,
that might make sense... but see below.

On the web progress listener issue, talked to Darin some more.  The problem, of
course, is that of figuring out which approach breaks the smallest number of
consumers due to assumptions they make about correlation between changes in what
a docshell is showing and WPL notifications.  Some specific issues we discussed:

1)  The WPL notifications need a request.  I think this can be dealt with by
    just storing the document's channel in the document.  The HTML document
    already does, and the XML document does until end of pageload.  The latter
    could use a boolean to indicate whether the channel is still "live", and we
    could push mChannel up to nsDocument.  This isn't quite perfect, since the
    channel isn't in the state it normally would be in when STATE_START, say,
    happens (eg. Darin pointed out its cache entry is closed), but better than
    nothing.  ccing sicking and peterv in case they have thoughts against moving
    this to nsDocument...
2)  The WPL notifications are tightly tied to the firing of load and unload
    events and consumers may be expecting said events.  This is a case of which
    approach breaks fewest consumers...  If we do send WPL notifications and
    people get confused because of lack of load/unload we could even consider
    dispatching load/unload events for fastbacked documents to chrome only.
    That may also break things, of course, if said chrome munges the document
    onload in a non-idempotent way.  Have to try to figure out which breaks
    more...
3)  Which notifications to send.  I think onProgressChange makes no sense, nor
    does onStatusChange.  onStateChange with the minimal flags we can use to
    indicate completion of a full pageload (with all subframes) is probably the
    best way to go here.  If we want to be really close to what we do now, we
    could even do it for the child docshells of the docshell doing fastback....

Thoughts?  Do we have a good idea of what extensions are actually assuming and
what our UI code is assuming?  For now the one data point I've seen is that
password manager depends on STATE_DOCUMENT|STATE_STOP notifications.  Note that
it depends on getting them for subframes too, by the way.  We can fix it up with
DOMPageRestore listeners if those are fired on all subdocuments, but again I'm
worried that this is a common usage pattern for WPL.
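Option 3 above, a minimal onStateChange marking completion of a history traversal, could look roughly like this toy sketch.  The flag values mirror nsIWebProgressListener's published constants, but the Listener and NotifyFastbackComplete names are invented for illustration, and the null request follows the onLocationChange precedent darin mentioned.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as in nsIWebProgressListener, reproduced for the sketch.
constexpr uint32_t STATE_STOP        = 0x00000010;
constexpr uint32_t STATE_IS_DOCUMENT = 0x00020000;
constexpr uint32_t STATE_IS_NETWORK  = 0x00040000;
constexpr uint32_t STATE_IS_WINDOW   = 0x00080000;

struct Request {};  // stand-in for nsIRequest

// A listener in the style password manager uses: it watches for the
// completion of a full document load.
struct Listener {
  bool sawDocumentStop = false;
  void OnStateChange(Request* aRequest, uint32_t aStateFlags) {
    // With fastback, aRequest may be null; listeners must not assume
    // a live channel is available.
    if ((aStateFlags & STATE_STOP) && (aStateFlags & STATE_IS_DOCUMENT))
      sawDocumentStop = true;
  }
};

// Hypothetical helper: synthesize the minimal completion notification
// for a session-history traversal, with no request to hand out.
void NotifyFastbackComplete(Listener& aListener) {
  aListener.OnStateChange(
      /* aRequest = */ nullptr,
      STATE_STOP | STATE_IS_DOCUMENT | STATE_IS_NETWORK | STATE_IS_WINDOW);
}
```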
Other thoughts I had about this, in general:

1)  It may be worth it to have a way for embedders (or extensions) to veto
    fastback
    on a per-document basis (that is, give them a hook into CanSavePresentation).
    Future-proofing, if nothing else.  For example, XForms can then hook into
    that for whatever XForms has in the way of password inputs.
2)  Do we need a way for embedders to tell that a fastback operation is starting
    (DOMPageRestore tells them it's done)?  I guess if we implement WPL
    notifications they'll know the document in the docshell is changing.  The
    question is whether they need to know specifically that it's changing because
    of fastback.
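Point 1's veto hook might be shaped like this (purely illustrative; SaveVetoRegistry, its method names, and the Document fields are all invented): embedders and extensions register predicates, and the save path caches a presentation only if every registered hook agrees.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the document being considered for the bfcache.
struct Document {
  std::string uri;
  bool hasPasswordInput;
};

// Hypothetical veto registry: predicates consulted before a
// presentation is cached.
class SaveVetoRegistry {
 public:
  using Hook = std::function<bool(const Document&)>;  // true = OK to save

  void AddHook(Hook aHook) { mHooks.push_back(std::move(aHook)); }

  bool CanSavePresentation(const Document& aDoc) const {
    for (const Hook& hook : mHooks)
      if (!hook(aDoc)) return false;  // any single veto blocks caching
    return true;
  }

 private:
  std::vector<Hook> mHooks;
};
```

An XForms-style consumer could then register a hook that refuses to cache documents containing password inputs, without docshell knowing anything about XForms.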
The only important UI progress listener is the browser status handler. At least
in the Suite, it needs to see start/stop state, location and security changes as
these already have to be synthesized by the tabbrowser when switching tabs.
Thinking of tabbrowser, the other notification it sends when switching tabs is
the favicon, which expects to see DOMLinkAdded notifications. Otherwise it would
need some way of persisting the favicon to restore it during a DOMPageRestore.
Thinking of DOMLinkAdded notifications, the link toolbar uses a combination of
unload, load, DOMHeadLoaded, DOMLinkAdded and select events. However I think
adding support for DOMPageRestore would suffice. In fact, I'd quite like
tabbrowser to generate DOMPageRestore events instead of select events.
Oh, and I don't think Password Manager should try to restore anything.
Actually I think we can live without state changes, they're only used to force
the URL bar to update to the latest page's URL even if it's been typed into.
We must send the STATE_STOP notification to listeners; people are using it to figure
out when a page has finished loading, according to questions asked in the newsgroups
;)  a11y may depend on it too...
On a practical note, we're going to be rev'ing the extension version for FF 1.1, so extension authors will 
have to update anyway... as long as we clearly document the notification changes, I think they can 
probably deal.
Yes, but that doesn't help embedders so much.  Again, nsIWPL is a frozen
interface, so we should preserve backwards compat in its functioning as much as
we can (perfectly if we could, but I don't think we can do that with fastback...)
(In reply to comment #101)
> Yes, but that doesn't help embedders so much.  Again, nsIWPL is a frozen
> interface, so we should preserve backwards compat in its functioning as much as
> we can (perfectly if we could, but I don't think we can do that with fastback...)

Well, part of the reasoning that Darin and I had was that max_viewers would be enabled in application-
specific prefs, so embedders would have to opt-in.
Right.  I'm not saying people can't work around this and all.  I'm saying we
should do what we can to make it as easy as possible for this feature to be used
with minimal code changes in the rest of the Mozilla tree, extensions, embedding
apps, etc.
> Right.  I'm not saying people can't work around this and all.  I'm saying we
> should do what we can to make it as easy as possible for this feature to be used
> with minimal code changes in the rest of the Mozilla tree, extensions, embedding
> apps, etc.

This should have zero impact on non-firefox stuff (no workarounds required)...
that is, unless other products enable bfcache as well.  For example, pink has
said that he's interested in enabling bfcache for camino.  I'm sure he'll have
to change some things to support bfcache, but that's ok.
Ideally, every single Gecko consumer will be able to enable any Gecko feature
desired with minimal pain...  That's all I'm saying.  I'm really not so happy
with the "it works in Firefox, so that's ok" approach, and I hope that's not
what we're talking about here...
Please, nobody (least of all Darin!) is giving embedders the finger here. 
Embedders who want to use a more complex history and back/forward model will
have to deal with that more complex model.  We are giving them the _opportunity_
to use a bfcache, should they want to do the app-side work to make the user
experience whatever they desire it to be.  Nobody is forcing it on them, and I
don't think it's "ideal" to spend effort to prevent embedders from having to
write some code here if they want to take advantage of the bfcache.

But arguing about an ideal world doesn't seem to be useful here, even if we were
to agree on what "ideal" was.  If existing embeddings of Gecko can continue
unchanged when using a bfcache-capable Gecko, and observe the same behaviour,
then we're exactly in the right place; investment in making it a freer lunch for
embedders who _do_ want to use bfcache should wait until we know more about what
it means to "enable" bfcache, have some idea of what the useful knobs are, and
can spot cliches in embedder use.  Right now, we only know of one app's use of
it (Firefox), and we have far too many embedding interfaces that were frozen on
the basis of single data points.
bz, what do you think about this going in?  The only changes here that will
happen regardless of the pref being enabled are (as far as I know),

- nsDocument::SetScriptGlobalObject(nsnull) changes
- No longer releasing mDocument in DocumentViewerImpl::Close()

If you're comfortable with those changes by themselves, I think this can go in
with minimal risk.

Here's the complete set of changes since the last patch:

- Get rid of default enabled state for firefox
- Added documentation for subdocument enumeration in nsIDocument
- Added nsIDocument::Destroy(), which does what SetScriptGlobalObject(nsnull)
used to do.  This is called when the document viewer releases its reference,
and fixes the leaks bz pointed out.
- Fixed grammar for nsIEventListenerManager.h comments
- Got rid of mStreamListener recreation in nsPluginDocument, it's bogus.
- Changed nsDocShell gSavingOldViewer to be a member variable
- Fixed SuspendRefreshURIs loop copy+paste error
- Made CanSavePresentation short-circuit if max_viewers == 0
- Moved caret disabling into nsIPresShell::Freeze
- Made RestorePresentation take aCanSavePresentation argument, so that the
check can happen before Stop() is called.
- Added comments and error checking to RestorePresentation
- Merged jst event changes
- Got rid of unneeded GetDOMDocument call in SetupNewViewer
- Clarified comments in several files.
- Made CopyJSPropertyArray a little more bulletproof
- Made sure nsGlobalWindow mIsDocumentLoaded is set to true after restoring
from shistory
Attachment #182197 - Flags: review?(bzbarsky)
shaver, I'm not saying that we _have_ to do something more complicated.  Just
that it may be worth considering doing it if the win (a better and more
consistent platform) is there and is worth it.  I somewhat suspect it is, but
I'd welcome data either way.  Do we have a reasonable way yet of contacting
embedders and extension writers about changes like this?

bryner, I'll try to look at the pref'd off patch tonight, but if that doesn't
happen it'll have to be Sunday....
One change I forgot to list:

- never cache full page plugin documents
er, the interface in question
(http://lxr.mozilla.org/seamonkey/source/uriloader/base/nsIWebProgressListener.idl)
is already frozen.

what is the way to get notified that the current page changed, or is changing,
when fastback is involved (if we don't support nsIWebProgressListener for this
purpose)?
(In reply to comment #110)
> what is the way to get notified that the current page changed, or is changing,
> when fastback is involved (if we don't support nsIWebProgressListener for this
> purpose)?

The DOMPageRestore event.
biesi: WPL.onLocationChange is still called.  onLocationChange is the event that
tells you when the contents of the webprogress object have really changed. 
using onStateChange for that purpose is somewhat problematic anyway, since
STATE_START doesn't mean the thing is going to be loaded in that webprogress,
and STATE_STOP happens pretty late.  See the WPL implementation here:
http://friedfish.homeip.net/~darinf/xulrunner/content/mybrowser/mybrowser.js

I believe that that WPL would still function correctly with this bfcache.
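The pattern Darin describes, keying UI updates off onLocationChange rather than onStateChange, can be sketched like this. The dispatcher calls below are simulated by hand (there is no real nsIWebProgress here), and the flag value is the STATE_START bit from nsIWebProgressListener:

```javascript
// Sketch: a progress listener that tracks the current URI via
// onLocationChange only.  With bfcache, a restore fires
// onLocationChange but not the full STATE_START/STATE_STOP load
// sequence, so this listener keeps working unchanged.
function makeListener() {
  return {
    currentURI: null,
    loadCount: 0,
    onLocationChange(uri) { this.currentURI = uri; },
    onStateChange(flags) {
      if (flags & 0x1 /* STATE_START */) this.loadCount++;
    },
  };
}

const listener = makeListener();

// Normal load: both notifications fire.
listener.onStateChange(0x1);
listener.onLocationChange("http://example.com/page1");

// bfcache restore: only the location change fires.
listener.onLocationChange("http://example.com/page2");
```

The URL bar stays correct after the restore even though loadCount never advanced, which is the behavior this comment argues WPL consumers should rely on.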
DOMPageRestore doesn't help me if I'm watching the global nsIWebProgress.

hm, yeah, onLocationChange does, I think... let's hope it suffices...
> onLocationChange is the event that tells you when the contents of the
> webprogress object have really changed. 

That notification is not fired for initial loads in subframes, so currently if
you want to keep track of documents as they're being loaded you listen to
onStateChange...
ok, so, I probably missed it in the last two dozen comments... but what exactly
was the problem with the suggestion in comment 95? it seems like it would be an
easy way to keep the semantics of nsIWebProgressListener.
(In reply to comment #73)
> >+nsGlobalWindow::SuspendTimeouts()
> 
> >+  PRInt64 now = PR_IntervalNow();
> 
> >+      t->mWhen = PR_MAX(0, t->mWhen - now);
> 
> No can do.  First of all, there is no '-' operator for PRInt64, in general. 
> Second of all, there is no '>' operator for it (which is needed by PR_MAX). 
> Either use nsInt64, or use the LL_* macros as needed here.

It was my understanding that the need for the LL_ macros was a limitation of the
old Mac Codewarrior build, and is not necessary for any platforms we currently
compile on.

> >Index: dom/src/base/nsGlobalWindow.h
> 
> >+  virtual NS_HIDDEN_(void) SuspendTimeouts();
> >+  virtual NS_HIDDEN_(nsresult) ResumeTimeouts();
> 
> Why are these virtual?	And it's worth documenting what they do (esp. wrt
> kids).
> 

Because at one point I had them on nsPIDOMWindow... I'll fix that.

> in the cached document.  Not only should we be checking the rv here, we should
> be propagating it back to someplace that nukes the cached viewer (so that on
> back to a document we failed to Sanitize we'll reparse it).
> 
> >@@ -1323,8 +1386,9 @@ DocumentViewerImpl::Stop(void)
> 
> >+  if (mEnableRendering && (mLoaded || mStopped) && mPresContext) {
> >     mPresContext->SetImageAnimationMode(imgIContainer::kDontAnimMode);
> >+  }
> 
> Note that this means that image animations in cached documents are disabled,
> right?	I don't see us reenabling them anywhere...  Basic testing shows that on
> back to a document with animated images they're all not animated.
> 

Oops, what I really meant to do here was to leave the animations enabled, since
we want to avoid the possibility that the new document shares images (which
would also have their animations stop).  We need to do some work in imglib to
make the animations not be shared between documents.

> >Index: netwerk/base/public/nsISecureBrowserUI.idl
> 
> >+interface nsISecureBrowserUIState : nsISupports
> 
> Why do we need an empty interface for this?  Why not just use nsISupports?
> 

The idea was to avoid unnecessary QI'ing around.

> >Index: toolkit/components/passwordmgr/base/nsPasswordManager.cpp
> 
> Doesn't the password manager in extensions/wallet also need updating?
> 

I'm going to hold off, pending the outcome of the WebProgressListener discussion.

> >Index: xpfe/components/shistory/public/nsISHEntry.idl
> 
> >+    /**
> >+     * Saved child docshells corresponding to contentViewer.  There are weak
> >+     * references since it's assumed that the content viewer's document has
> >+     * an owning reference to the subdocument for each shell.
> 
> That's not a valid assumption, since the DOM of the content viewer's document
> can be mutated arbitrarily by anyone who happened to grab a pointer to it
> before we left the page.
> 
> In fact, the assumption that this list has anything to do with the content
> viewer document is not a valid assumption as things stand.
> 

Is your suggestion, then, to throw out the cached presentation when the DOM is
mutated?  Or just to hold strong references here?

> >@@ -77,12 +84,21 @@ nsSHEntry::nsSHEntry(const nsSHEntry &ot
> >+  , mSticky(other.mSticky)
> >+  , mViewerBounds(other.mViewerBounds)
> 
> Do we really want to copy those, but none of the other restoration state, when
> cloning?
> 

I'm not actually sure which things it's safe to copy (because I don't entirely
understand the circumstances where we clone a history entry).  I haven't been
thinking about this with the possibility of multiple history entries hanging on
the same content viewer (and I can imagine it causing all sorts of problems). 
So maybe we don't want to copy any of the state... what do you think?

> >Index: xpfe/components/shistory/src/nsSHistory.cpp
> 
> >@@ -104,10 +109,10 @@ nsSHistory::Init()
> 
> Do we care that the pref we added isn't live?
> 

I don't, really... there's no UI for it.

> And that there's no way for the user to manually flush out this cache?

In firefox, when you clear history it clears session history as well.  It might
be that that's too coarse, but I'd prefer to solve that by making improvements
to the eviction behavior rather than exposing something to the user.

> 
> >-    nsCOMPtr<nsIPrefBranch> defaultBranch;
> >-    prefs->GetDefaultBranch(nsnull, getter_AddRefs(defaultBranch));
> 
> This would regress bug 126826.
> 

Oh, wow, that was intentional.  I will restore it and add a comment so that
someone doesn't do this again.

> >@@ -171,7 +176,10 @@ nsSHistory::AddEntry(nsISHEntry * aSHEnt

----

up to here now, more later
> It was my understanding that the need for the LL_ macros was a limitation

It's a limitation of any platform without an 8-byte long long....  We use them
pretty consistently in our code, and I'd rather we didn't start misusing the
type.  If we want to avoid the macros, we should just use nsInt64.
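The deadline adjustment under discussion, converting each timeout's absolute 64-bit deadline into a remaining interval on suspend and re-anchoring it on resume, amounts to the following. This is a sketch using JS BigInt as a stand-in for PRInt64/nsInt64; the field and function names mirror the patch but are simplified:

```javascript
// Sketch: suspend stores time-remaining; resume re-anchors to "now".
// Mirrors t->mWhen = PR_MAX(0, t->mWhen - now) from the patch, with
// the 64-bit math done via BigInt instead of nsInt64/LL_* macros.
function suspendTimeouts(timeouts, now) {
  for (const t of timeouts) {
    const remaining = t.when - now;
    t.when = remaining > 0n ? remaining : 0n; // clamp, like PR_MAX(0, ...)
  }
}

function resumeTimeouts(timeouts, now) {
  for (const t of timeouts) {
    t.when = now + t.when; // turn relative deadlines back into absolute ones
  }
}

const timeouts = [{ when: 1500n }, { when: 900n }];
suspendTimeouts(timeouts, 1000n); // remaining: 500n, and 0n (clamped)
resumeTimeouts(timeouts, 2000n);  // re-anchored: 2500n, 2000n
```

The clamp matters: a timeout already past due at suspend time should fire immediately on resume, not be pushed into the past.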

> The idea was to avoid unnecessary QI'ing around.

But QI'ing from what to what, exactly?  Any one could implement this interface
(which is even scriptable!).  So the cast from it to a concrete class in
nsSecureBrowserUIImpl::TransitionToState is equally dangerous no matter whether
the interface being cast from is nsISupports or nsISecureBrowserUIState...

> Is your suggestion, then, to throw out the cached presentation when the DOM is
> mutated?

I think this is the only thing that makes sense in general, yes...  I don't
think we want to have DOM mutations that happen on the hidden document affect
the state of what the user fast-backs to.  The other option would be to clone
the DOM wholesale on mutation and have the clone to fast-back to while the
original is mutated.  That could take some work, though.  Then again, any
solution to this issue will take work.  :(
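The first option above, dropping the cached presentation when the hidden DOM is mutated, could be sketched as follows. This is pure JS; `CachedPresentation` and its `onMutate` hook are illustrative names, not Gecko API, and real code would observe actual DOM mutations on the document:

```javascript
// Sketch: a cached presentation that self-evicts when the hidden
// document is mutated by someone who kept a pointer to it.
class CachedPresentation {
  constructor(doc) {
    this.doc = doc;
    this.valid = true;
    // Hypothetical mutation hook standing in for real DOM observation.
    doc.onMutate = () => { this.valid = false; };
  }
  // Returns the doc if still restorable, or null if it was mutated
  // (in which case the caller falls back to reparsing the page).
  restore() {
    return this.valid ? this.doc : null;
  }
}

const doc = { title: "page", onMutate: null };
const cached = new CachedPresentation(doc);

// Someone kept a pointer to the hidden document and mutates it...
doc.title = "changed";
doc.onMutate();
```

Eviction-on-mutation trades away some fastback hits but guarantees the user never fast-backs to a state they did not leave.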

> I'm not actually sure which things it's safe to copy (because I don't entirely
> understand the circumstances where we clone a history entry)

We clone in a few cases I can think of:

1)  (Most common)  Subframe pageloads.  These clone the entire session history
    tree and then replace the subtree rooted at the relevant subframe with the
    new session history entry for that subframe.
2)  When handing out a currentDescriptor via nsIWebPageDescriptor.  That clone
    should probably have anything related to bfcache cleared in it before it's
    returned.
3)  When we do a view-source load via nsIWebPageDescriptor.  If #2 is done, this
    should be a non-issue.

Those might well be it.  For now I think copying none of the state is safest,
since it'll just mean that fastback doesn't happen sometimes when maybe it
could.  I _think_ the approach you took to eviction (getting the entry via the
transaction) will find the entries with the viewers correctly...
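The eviction referred to here, keeping at most `browser.sessionhistory.max_viewers` cached viewers and dropping the least recently used beyond that, might look like this simplified sketch (the real logic lives in nsSHistory and walks session-history transactions rather than a flat array):

```javascript
// Sketch: LRU eviction of cached content viewers, capped at maxViewers.
// maxViewers == 0 disables the bfcache entirely (the short-circuit case).
function evictContentViewers(entries, maxViewers) {
  // Entries that still hold a viewer, most recently used first.
  const withViewers = entries.filter((e) => e.viewer !== null);
  withViewers.sort((a, b) => b.lastUsed - a.lastUsed);
  // Drop viewers beyond the cap (all of them when maxViewers is 0).
  for (const e of withViewers.slice(maxViewers)) {
    e.viewer = null;
  }
}

const entries = [
  { url: "/a", viewer: {}, lastUsed: 1 },
  { url: "/b", viewer: {}, lastUsed: 3 },
  { url: "/c", viewer: {}, lastUsed: 2 },
];
evictContentViewers(entries, 1); // only the most recent viewer (/b) survives
```

Running eviction after a viewer is stored, not after the entry is created, is what makes the max_viewers == 0 and == 1 edge cases behave (a point bryner notes later in the thread).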
Comment on attachment 182197 [details] [diff] [review]
pref'd off patch, with some review comments addressed

Not repeating earlier review comments that haven't been addressed, on the
assumption that there are plans to address them.

>+  /**
>+   * Notify the document that its associated ContentViewer is being destroyed.
>+   * This releases circular references so that the document can go away.

So this isn't guaranteed to be called (since not all documents have a content
viewer), right?  That's sort of what we have now, but may be worth documenting
anyway.

And please file a followup bug on figuring out why this is needed at all.

>Index: content/base/src/nsDocument.cpp

>+nsDocument::Destroy()

>+  // XXX HACK ALERT! If the script context owner is null, the document

Fix the comment?   This has nothing to do with the script context owner
anymore.

>+  // This has to be done before we actually
>+  // set the script context owner to null so that the content elements
>+  // can remove references to their script objects.

Is that still relevant?  If so, we have a problem....  If not, it should
probably be removed.

>Index: docshell/base/nsDocShell.cpp

>+nsDocShell::CanSavePresentation(nsIRequest *aNewRequest)

>+    // If there is an unload or beforeunload listener on the window, then we do
>+    // not save the presentation into the session history cache.

Just realized that this doesn't check for such listeners on subdocuments.  It
probably needs to....

>@@ -5588,6 +6053,11 @@ nsDocShell::InternalLoad(nsIURI * aURI,

>+    // Check for saving the presentation here, before calling Stop().
>+    // This is necessary so that we can catch any pending requests.
>+    PRBool savePresentation =
>+        aSHEntry ? CanSavePresentation(nsnull) : PR_FALSE;

This looks wrong, fwiw.  aSHEntry is always null for normal loads, so we'll
always end up with a PR_FALSE here on, say, a link click.  Then we'll stop
current network activity, etc, etc.  So the checking for network activity is
still broken with this patch, if fastback is enabled.  Doesn't affect things
with it disabled, of course.

>+    // If we have a saved content viewer in history, restore and show it now.
>+    if (aSHEntry) {
>+        if (!savePresentation)
>+            FireUnloadNotification();

I don't see why we're doing this (and I should have caught this the first
time).	This will fire unload here if we can't save the presentation.  Why? 
The old code didn't use to fire unload here....  We need to fire unload if we
_restore_ a presentation in this block, but that's handled inside
RestorePresentation().	In any case, this is changing the current codepath for
history loads in a way that looks wrong to me, and does so even when fastback
is disabled.

>Index: docshell/base/nsIContentViewer.idl

>+   *  (*) unless the document is currently being printed, in which case
>+   *      it will never be saved in session history.

But we would have skipped firing unload on it anyway?  That doesn't sound so
great...  Not an issue for in-page listeners (since we check for those), but
chrome listening for unload on the content will get a little confused.	It
might even do that in the non-printing case, though, since we also skip firing
unload there (and the user may never hit back).  Not an issue for the
preffed-off patch, I guess.

>Index: layout/base/nsDocumentViewer.cpp

> DocumentViewerImpl::~DocumentViewerImpl()
>-  NS_ASSERTION(!mDocument, "User did not call nsIContentViewer::Close");

Is it normal to have an mDocument here now?  If not, perhaps we should leave a
warning here?  But I'm guessing that it is?

The rest looks ok.  Note that I didn't read any of the code that's not supposed
to execute when fastback is off.  So the only parts I really reviewed are the
change to docshell to make sure turning it off really leaves it off, the
changes to codepaths that are executed when it's off, and the ownership
stuff...  The issue with firing unload on history traversals needs to be fixed;
once that's fixed I guess I'm ok with this being checked in, preffed off, as
long as the final code we decide to enable gets reviewed before we enable it.
Attachment #182197 - Flags: review?(bzbarsky) → review-
Attached patch new pref'd off patch (obsolete) — Splinter Review
This patch addresses all of the review comments in this bug, except for a few
which I'll mention at the end.	Here are the changes from the last patch:

- Replaced nsIDocument::HasRequests with CanSavePresentation, which is also
overridden by nsPluginDocument to always return false.	This checks for pending
requests and for unload handlers, recursively.
- Change nsGenericElement::ShouldFocus to not allow focus if there is no
window.
- Fixed nsImageDocument::Destroy/SetScriptGlobalObject.
- Fixed nsXBLPrototypeHandler::ExecuteHandler to just bail if it can't get a
window.
- Made refresh URI suspend/resume recursive
- Removed special case in CanSavePresentation for max_viewers == 0, since I
fixed the problems in nsSHistory (see below).
- Changed securityState to be an nsISupports.
- Made DOMPageRestore fire on all subframes, bottom-up.
- Added a mechanism to make the saved presentation state of a history entry
self-consistent.  Call this whenever we had an error saving or restoring a
presentation.
- Made RestorePresentation return a boolean for whether the presentation was
restored; this makes it so there is no error return if we simply hadn't saved
the presentation.
- On Brendan/jst advice, get rid of saving/restoring exception state in
CopyJSPropertyArray.  Instead, use LookupUCProperty, and explicitly exclude the
location property since we prevent it from being defined.
- Clean up focus logic in RestoreWindowState -- don't special case various
element types, and don't steal focus away if the toplevel window is not active.

- Use nsInt64 where appropriate in nsGlobalWindow::SuspendTimeouts.
- Make the document viewer unset the document container and prescontext link
handler when the document stores itself into session history.  These are
restored in Open().  This makes both the form-submit and link-traversal
testcases work as expected.
- Yank the content viewer back out of history if Sanitize() fails.
- Make sure that content viewer eviction runs _after_ storing the content
viewer into history (not after creating the history entry).  This makes the
max_viewers == 0/1 cases work correctly.
- Made plugin stop/restore and caret hiding be recursive.
- Made password manager only process trusted DOMPageRestore events.
- Fixed an unrelated error in nsTypeAheadFind that I found while testing caret
browsing (missing QI entry for nsIObserver).
- Eliminated need for nsIViewManager::InvalidateHierarchy.  Just update the
pointers automatically whenever a root view is inserted or removed.  Roc
thought this would be a good approach; this case is not common at all.
- Made sure to clone none of the presentation state for history entries.

Now, the issues that I did not address:
- It may still be possible for focus and blur to have effects in cached
documents
- The presentation is not cleared out when the DOM is mutated
- I didn't do anything with respect to WebProgressListener notifications; darin
and I continue to feel that we should basically try this and see what breaks.
- I didn't do anything to address the possibility of reentering docshell code
due to plugin start.  I believe this could trivially be put off on a PLEvent
but I think a wait-and-see approach is best.
- Going back fires a focus event on the last-focused content; I still want to
test IE and Safari and copy their behavior on this.
- I did not do anything to make the cache be global instead of per
session-history instance.
- I have not yet filed a follow-up bug to determine why nsIDocument::Destroy is
needed.

All in all I think things are pretty good with this patch, both with it turned
off and with it turned on.  I'm not seeing any assertions or crashes anymore.
Attachment #182197 - Attachment is obsolete: true
Attachment #182539 - Flags: review?(bzbarsky)
That patch also doesn't address the issue with network requests being killed and
restarted.  Just try it on the very first testcase I posted in the bug.  I
pointed out in comment 73 that the current approach to dealing with the issue
simply doesn't work....

I'd really prefer that we leave the short-circuit in place, since that will make
this a lot easier to review.  I simply won't have the time to adequately review
the bulk of this patch this week.
Jesus, let's get this in and be done with it already.  I can and will fix bugs
later.  I'm not disappearing tomorrow.  If you can't review it this week, we
need to find someone who can so that this can go in for alpha.

(In reply to comment #120)
> That patch also doesn't address the issue with network requests being killed and
> restarted.  Just try it on the very first testcase I posted in the bug.  I
> pointed out in comment 73 that the current approach to dealing with the issue
> simply doesn't work....
> 
> I'd really prefer that we leave the short-circuit in place, since that will make
> this a lot easier to review.  I simply won't have the time to adequately review
> the bulk of this patch this week.
Just in case people still are not clear on my position, I want this patch to
land pref'd off without regressions, ASAP.  I trust bryner to respond to bz's
review comments and tests, and do whatever else it takes to fix this bug in the
right time frame, with some ability to rest, recuperate, get help, etc.

So I think this patch should land tomorrow.

/be
Let me rephrase.  I believe that the network-activity testcase is not an
architectural problem and can be fixed by tweaking the CanSavePresentation
logic.  I don't think that there is anything right now that should block this
from landing turned off, and very little, if anything, that should block it from
being turned on.  The short-circuit path is convenient for limiting regressions
in the short-term, but I think we really want this to just be an immediate
ejection from session history, and I think that's pretty safe.  Safe enough for
1.1 alpha, in fact.
Perhaps I need to clarify:

1)  CanSavePresentation is being called at the wrong time to possibly be able to
    fix the network-activity testcase.  If by "tweak the logic" you mean "call
    it at a few other places in the load sequence", I buy that, perhaps.  If you
    mean "edit the logic of the existing method", there's no way that will help.
2)  I believe the short-circuit is the right way to go in general, not just for
    my review convenience, since it avoids a fair amount of work for cases when
    this is disabled.  Given that this being disabled is our proposed embedding
    scenario for the near term, we should make that case reasonable.
3)  I'm getting mixed signals here.  We want this to land preffed off without
    regressions but today?  It'll take me on the order of 15-20 hours of
    reviewing time to be able to sign off on that without the short-circuit.
    With the short-circuit I can do that in 2-3 hours.

So I'm going to review this with the assumption that the short-circuit will be
put back in (which means I'll not review any code called by CanSavePresentation
or any code triggered when that returns true).  Then you can land it if you want
as far as I'm concerned, as long as that code _is_ reviewed (and tested,
dammit!) before the pref is enabled.
I'm not sending mixed messages, I'm trusting bryner to do the right thing.

If the patch lands pref'd off and we do a bunch of extra work that's thrown away
on every back and forward, we are technically not regressing correctness, but
we are wasting memory and cycles.  The short-circuit code avoids this.  But is
it really necessary before landing?

I said "without regressions".  bryner: will everyone who runs default builds
(with the pref set to 0) notice the lack of the short circuit?

/be 
It sounds like Thunderbird does not turn off session history, so without the
short circuit this patch will cause presentations to be saved and thrown away on
every message load over top of a previous message view.

That's a case for fixing Thunderbird and other such apps to disable session
history altogether in their embeddings.  But it also suggests that the short
circuit is the safer, more conservative way to handle the pref, besides being
more efficient for the pref'd off case.

So I'm in favor of the short circuit.  But I still trust bryner to do that as
an easy step after landing.

At this point followup bugs seem much better for tracking open issues.

/be
Is there a mechanism to turn off session history for a particular
docshell/<xul:browser>/whathaveyou?
It sounded like Thunderbird didn't disable history, but looking is better than
listening --
http://lxr.mozilla.org/aviary101branch/source/mail/base/content/messageWindow.xul#146.
There may be other <browser> tags that lack that attribute setting, though.

Anyway, please land and fix the short circuit in a separate patch.

/be
We should assume that there are embeddings out there that do not turn off
session history and won't have any chance to rev code before being used with
Gecko 1.8.  I'm thinking of Eclipse mainly (I have not tested it), and there may
be others.
Comment on attachment 182539 [details] [diff] [review]
new pref'd off patch

>Index: content/base/src/nsDocument.cpp
>+nsDocument::CanSavePresentation(nsIRequest *aNewRequest)

>+      if (request && request != aNewRequest) {
...
>+        return PR_TRUE;

PR_FALSE, right?

>Index: content/html/document/src/nsImageDocument.cpp

>+nsImageDocument::Destroy()

This needs to call the superclass Destroy().

>Index: docshell/base/nsDocShell.cpp

>+nsDocShell::FireRestoreEvents()
>+{
>+    // These events fire bottom-up, so call this on our children first.
>+    PRInt32 n = mChildList.Count();

Followup bug -- if event handlers modify the DOM, this can walk off the end of
the array.  In general, it's worth sorting out what should happen here if the
DOM gets mutated (shell subtrees get removed, say).

Similarly, the caller of this method needs to deal with the docshell being
removed from the docshell tree; I'm not sure it does.

>+nsDocShell::RestorePresentation(nsISHEntry *aSHEntry, PRBool aSavePresentation,
>+    // save the previous content viewer size

We're getting it, not saving it....

>Index: docshell/base/nsIDocShell.idl

>+  void suspendRefreshURIs();
>+  void resumeRefreshURIs();
>+  void fireRestoreEvents();

I really wish we didn't need to put these on a "public" scriptable API... Maybe
we can do something like the GetAsDocLoader thing we have... or something.

>Index: layout/base/nsPresShell.cpp

>+FreezeSubDocument(nsIDocument *aDocument, void *aData)

Later on, might be worth doing the recursion in document, not presshell, and
freezing all presentations, not just the one at 0.

Let's file separate bugs on the remaining issues, since they're starting to get
lost here... With the short-circuit restored, this is ok to land, I think.  I
still haven't really had a chance to trace the details of the session history
stuff, but it looks reasonable at first glance.  Thank you for bailing out on
Sanitize failure!

r=bzbarsky (with short-circuit in)
Attachment #182539 - Flags: review?(bzbarsky) → review+
Actually, there is one more serious issue.  Open a new window.  Load an image
(full-page image).  Close the window.  We crash when trying to remove event
listeners off a null event target.  This happens no matter what the pref is set to.

The problem is that mScriptGlobalObject is null by the time we get into
Destroy() there.  Setting and unsetting of listeners on the global object should
remain in SetScriptGlobalObject, not move to Destroy().
Just to clarify, the r= holds with that crash fixed.
(In reply to comment #127)
> Is there a mechanism to turn off session history for a particular
> docshell/<xul:browser>/whathaveyou?

xul:browser has a 'disablehistory' attribute whose presence prevents (or at
least ought to prevent) session history from being used.
Depends on: 292890
- Fixed inverted return value of nsDocument::HasRequests
- Fixed nsImageDocument to deal with the window event listeners in
SetScriptGlobalObject (the mImageContent listener is still unhooked in
Destroy).  Also make sure to call nsMediaDocument::Destroy().
- Restored the short-circuit in nsDocShell::CanSavePresentation.  Reordered the
checks in that method to minimize the overhead in cases where we can't save the
presentation (doc->CanSavePresentation() becomes last since it does a recursive
crawl of the subdocuments).
Attachment #182539 - Attachment is obsolete: true
That last patch looks good to go to me.
Can someone set a review flag on the last patch?
(In reply to comment #136)
> Can someone set a review flag on the last patch?

https://bugzilla.mozilla.org/show_bug.cgi?id=274784#c132
Depends on: 292903
I'm getting a build error on Win32 cygwin MinGW which seems to be due to this
bug's patch:

e:/mozilla/source/mozilla/dom/src/base/nsGlobalWindow.cpp: In function `nsresult
 CopyJSPropertyArray(JSContext*, JSObject*, JSObject*, JSIdArray*)':
e:/mozilla/source/mozilla/dom/src/base/nsGlobalWindow.cpp:5928: error: invalid s
tatic_cast from type `jschar*' to type `PRUnichar*'
e:/mozilla/source/mozilla/dom/src/base/nsGlobalWindow.cpp:5962: error: invalid s
tatic_cast from type `jschar*' to type `PRUnichar*'
make[5]: *** [nsGlobalWindow.o] Error 1
Attached patch mingw fixSplinter Review
Comment on attachment 182608 [details] [diff] [review]
pref'd off patch w/ short circuit and crash fix

- In CopyJSPropertyArray():

+    JSString *propname = JSVAL_TO_STRING(propname_value);
+    jschar *propname_str = ::JS_GetStringChars(propname);
+    NS_ENSURE_TRUE(propname_str, NS_ERROR_FAILURE);
+
+    // We exclude the "location" property because restoring it this way is
+    // problematic.  It will "just work" without us explicitly saving or
+    // restoring the value.
+
+    if (!nsCRT::strcmp(NS_STATIC_CAST(PRUnichar*, propname_str),
+			NS_LITERAL_STRING("location").get())) {
+      continue;
+    }
+
+    size_t propname_len = ::JS_GetStringLength(propname);

There is an nsDependentJSString class that you could use here, but you need
the jschar pointer too, so I'm not sure it's worth it.

Oh, and on an unrelated note, I wonder if the state-saving code could be
triggered from a timeout? If it could, then there are some nasty timer issues
to deal with. When an XPCOM timer fires, we may run more than one JS timeout if
the XPCOM timer fires late (which is not that uncommon), or if there is more
than one timeout with the same deadline. In such a case, we need to stop firing
timeouts if we end up saving state, and we need to make sure we never save
timeouts that are on the stack (if this code is triggered from a timeout there
will be a timeout in the timeout list that's not heap-allocated; look for
dummy_timeout in nsGlobalWindow::RunTimeout()).

Separate bug, I guess, but it needs fixing, I think (or are location changes
from JS *guaranteed* to be async?).
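A minimal sketch of the stop-firing-on-freeze behavior the comment asks for (names are hypothetical; this is not the actual nsGlobalWindow code): one late timer tick may have several timeouts due, and once a callback freezes the page for the bfcache, the loop must not fire the rest.

```cpp
#include <functional>
#include <vector>

// Hypothetical runner for JS timeouts that all came due on one XPCOM timer
// tick. If a callback saves the page's state (freezes it), the remaining
// due timeouts must not fire.
struct TimeoutRunner {
    bool frozen = false;                     // set when state is saved
    std::vector<std::function<void()>> due;  // timeouts sharing one deadline

    int RunDue() {
        int fired = 0;
        for (auto& cb : due) {
            if (frozen)   // state was saved mid-loop: stop firing
                break;
            cb();
            ++fired;
        }
        return fired;
    }
};
```

The check sits inside the dispatch loop rather than before it, since it is the callbacks themselves that can trigger the freeze.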
Dunno if you guys want comments in here for side effects or not.

Some side effects:
1) There seems to be a short pause before the browser starts displaying the
page; actually I think the page loads before it is displayed, except for
secure pages (see #3).
2) The favicon doesn't change when I go back and forward; the favicon of the
second page I visit stays in the address bar.
3) Back/forward is not instant with secure servers (e.g. paypal, bugzilla,
usaa.com); it also takes a long time for the page to display.
File new bugs for new problems, please.

Also, study up on this bug before reporting intentional design decisions as bugs
(and don't report the paint suppression delay as a bug, if that's what comment
141 #1 is).

/be
Component: DOM: Level 0 → History: Session
Depends on: 292926
Blocks: 292934
No longer blocks: 292923, 292933, 292934
Depends on: 292923
No longer depends on: 292926
Blocks: 292933
No longer blocks: 292933
Depends on: 292934
Depends on: 292938
Blocks: 292941
No longer blocks: 292941
Depends on: 292941
Depends on: 292945
Depends on: 292998
This was checked in on 5/4.  Leaving the bug open since this is not yet turned on.
(In reply to comment #143)
> This was checked in on 5/4.  Leaving the bug open since this is not yet turned on.

What's the pref to enable this in the nightlies? (I apologise for the bugspam if
this was mentioned earlier.)
Alex, I think it should already be enabled in the latest builds. The number of
pages cached is held in the preference browser.sessionhistory.max_entries and
defaults to 50, I think.
max_viewers and defaults to 0, actually.
Whiteboard: (o72555) → (o72555)Set browser.sessionhistory.max_viewers to nonzero to test
(In reply to comment #146)
> max_viewers and defaults to 0, actually.

If max_viewers determines the number of pages to cache and it's set to 0 by
default, does that mean that the bfcache is disabled by default, in effect?
Depends on: 293049
(In reply to comment #147)
> If max_viewers determines the number of pages to cache and it's set to 0 by
> default, does that mean that the bfcache is disabled by default, in effect?

It is currently disabled by default while testing is carried out, and as a
compromise while other, lower-risk patches that some people consider more
important are attended to.
No longer depends on: 293049
Whiteboard: (o72555)Set browser.sessionhistory.max_viewers to nonzero to test → (o72555)
Depends on: 293049
(In reply to comment #144)
> What's the pref to enable this in the nightlies?

See http://weblogs.mozillazine.org/chase/archives/008085.html

(In reply to comment #145)
> The number of pages
> cached is held in the preference browser.sessionhistory.max_entries and is
> defaulted to 50, I think.

That's the number of pages to store in the session history (Go menu,
Back/Forward menus). browser.sessionhistory.max_viewers is the relevant preference.

(In reply to comment #147)
> If max_viewers determines the number of pages to cache and it's set to 0 by
> default, does that mean that the bfcache is disabled by default, in effect?

browser.sessionhistory.max_viewers is the number of pages to store (I think
that's per window/tab right now, but it may become global later). A value of 5
would be sensible. By default, it's not set at all, so the bfcache is disabled.

<rant>Most of these questions could be answered just by reading the bug.</rant>
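For testers, a user.js fragment that turns the feature on (the pref name comes from the whiteboard; the value 5 is only an example, per the comment above):

```
// user.js sketch: enable the bfcache for testing.
// 5 is just an example; any nonzero number of cached viewers will do.
user_pref("browser.sessionhistory.max_viewers", 5);
```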
No longer depends on: 285404
This is a great feature. I'm loving it.
I would like to report some side effects I have seen.

-when pressing Back or Forward, the page view first turns grey, and then the
page loads from the cache. This isn't the normal behavior when following a
link; I think it should be the same

-animated GIFs on a page you go Back or Forward to no longer animate

-as I think I read in a previous comment, the website icon is no longer
displayed for a page that you go Back or Forward to
Please report any side effects or issues in new bugs blocking this one if you
want them to be fixed.
Depends on: 293074
Whiteboard: (o72555) → (o72555) Set browser.sessionhistory.max_viewers to nonzero to test
No longer depends on: 293074
Related crash cases: #293076
This might have caused bug 293135? The bug is reproducible _without_ enabling the pref.
Depends on: 293135
Depends on: 293076
Depends on: 293173
Depends on: 293175
Depends on: 293176
Depends on: 293179
Depends on: 293234
Depends on: 293235
Depends on: 293022
Depends on: 293270
Depends on: 293386
Depends on: 293534
Depends on: 293572
I think that this bug should not block 1.8b2 anymore.

Instead, I think it should block aviary1.1, and once the time comes, the
discussion about whether to enable or disable this feature should happen here.
Depends on: 293588
Depends on: 293698
Depends on: 293700
Depends on: 293709
(http://www.animatedgif.net/sitemessages/download/download.shtml)

1. A page with several animated GIFs. Select View Image on one of them.
2. See the GIF, then push Back.

Only the one you clicked is still animating; the others are stuck.
Filed #293797
Depends on: 293797
No longer depends on: 293797
Depends on: 293793
Depends on: 293606
Depends on: 294258
Depends on: 294231
We'll be tracking blocking regressions individually, so I'm pulling this bug
off the blocking-1.8b2 radar.
Flags: blocking1.8b2+
Depends on: 294454
Depends on: 294610
Depends on: 293754
Depends on: 295085
Blocks: majorbugs
Depends on: 295591
No longer blocks: majorbugs
Depends on: 295656
Depends on: 295931
Depends on: 296575
Depends on: 296745
Depends on: 296760
Depends on: 297122
Depends on: 292987
Depends on: 293403
Depends on: 297488
Depends on: 297887
Depends on: 297945
Depends on: 298077
Depends on: 298125
No longer depends on: 292948
Depends on: 298133
No longer depends on: 297122
No longer depends on: 297945
Depends on: 298112
Depends on: 298622
Depends on: 298794
Depends on: 297926
Depends on: 299205
Depends on: 299547
Depends on: 299556
No longer depends on: 299205
Depends on: 300411
Depends on: 299853
Depends on: 300533
Depends on: 300538
Depends on: 300642
Depends on: 300663
Depends on: 300800
Depends on: 300802
Depends on: 300863
No longer depends on: 300863
Depends on: 300572
Depends on: 300944
Depends on: 300956
Depends on: 301354
Depends on: 301358
Depends on: 301397
No longer depends on: 293175
Depends on: 293175
Depends on: 301516
Depends on: 301517
Depends on: 298377
Depends on: 302290
Depends on: 299514
Depends on: 302956
Depends on: 303013
Depends on: 303151
Depends on: 303059
Depends on: 303237
Depends on: 303255
No longer depends on: 303237
Depends on: 303311
Depends on: 303360
Depends on: 303327
Depends on: 303267
Depends on: 303637, 303638
Depends on: 303904
Depends on: 304003
Depends on: 304288
Depends on: 304764
Depends on: 304639
Depends on: 305129
Depends on: 305167
Depends on: 300756
Depends on: 304994
Depends on: 303839
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050823
Firefox/1.0+

http://www.key.ru -- DPA2 doesn't keep the page in memory; it redownloads and
rerenders it when clicking the back or forward buttons. If this is a bug, will
it be fixed?
Depends on: 305954
Depends on: 305995
Depends on: 306283
(In reply to comment #158)
> http://www.key.ru -- DPA2 doesn't put the page in memory and redownload/rerender
> it when clicking back or forward buttons. If it's a bug, will it be fixed?
It's Bug 292941.
Blocks: 304083
Depends on: 308194
Depends on: 297575
No longer blocks: 38486
*** Bug 38486 has been marked as a duplicate of this bug. ***
Blocks: 164421
Depends on: 310738
Depends on: 311791
Depends on: 312027
Depends on: 312117
Depends on: 312816
No longer blocks: 304083
Depends on: 304083
Depends on: 314400
Depends on: 307178
Depends on: 314288
Depends on: 314549
Depends on: 317380
Depends on: 319527
Depends on: 306862
Depends on: 322725
Depends on: 328722
Depends on: 328752
done.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Depends on: 333050
Depends on: 319551
Depends on: 309993
Depends on: 338542
Depends on: 314362
Depends on: 343258
Component: History: Session → Document Navigation
QA Contact: ian → docshell
Depends on: 343747