Open Bug 486921 Opened 12 years ago Updated 9 years ago

For "Save As File Only" functions, reuse the docshell code to prepare the channel to be persisted

Categories

(Firefox :: File Handling, defect)

defect
Not set
normal

Tracking

()

People

(Reporter: Paolo, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)

When a page that is already displayed in a browser window needs to be saved
as file only, the network channel that was used to fetch the original page
must be set up again, this time with instructions to read the data from
the cache, if available.

This channel setup operation is similar, but not identical, to the operation
performed when a currently displayed page is reloaded.

Currently, this logic is reimplemented (partially) by the various save
functions defined in "contentAreaUtils.js". Instead of having the logic
duplicated, the code in the docshell component could be reused.

Reproducible: Always




The "View Source" window already does something similar to save the displayed
source locally, for editing with an external editor:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/viewsource/content/viewSourceUtils.js#138

In this case, the load operation completes inside the docshell itself, then
the loaded document is saved. This code also performs character set
conversion while saving the file (line 233).

For this method to work in a browser window, without affecting the displayed
document, there should be a way for saving the raw channel data without
actually creating a content viewer. In other words, the page reload code
path should be followed until just before the channel is opened, then the
channel should be returned to the caller, still unopened.

The code path starts here [nsDocShell::LoadPage]:

http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#3625

And terminates here [aURILoader->OpenURI(aChannel, ...)]:

http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#7803

Note that some of the operations that are performed by the docshell code,
such as possibly adding the page to session history, are not needed if the
purpose is to create a channel that can be used elsewhere.
The following bugs may be related to the differences between the two
procedures that set up the channel for save and for reload. Sharing the
code may resolve or help to resolve them:

Bug 485196 - Web page generated by POST is retried as GET when Save
             Frame As used, and the page is no longer in the cache
Bug 479296 - Saving POST result page silently resends POST data if page
             no longer in cache
Bug 177329 - nsIWebBrowserPersist can't persist things whose cache
             entries have expired (maybe, now, a duplicate of 479296)
Bug 472895 - Save page, HTML only, on document.write result document
             (wyciwyg document) completely fails (saves document that
             called write() instead)
Bug 120809 - Save as function refetches data or images that are in the cache
Paolo: I've given you the Bugzilla privileges required to file bugs as NEW, and edit existing bugs.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #2)
> Paolo: I've given you the Bugzilla privileges required to file bugs as NEW, and
> edit existing bugs.

Thanks Gavin, this is helpful :-)
This patch is an experiment, to allow an easier review of the save process.
It does not reflect necessarily how I'd do the final implementation.

I created a new method named newSaveChannel in the nsIWebPageDescriptor
interface. This method basically follows the code path described in
comment 0, and results in a channel being returned to the caller.
The channel to be persisted is prepared in the same way as history
navigation does (LOAD_HISTORY type).

While replicating the code path, I copied the InternalLoad function
and created InternalLoadForNewSaveChannel_COPY_FOR_REVIEW. I removed
much of the tests done by InternalLoad; I'm interested in knowing if
the remaining checks are useful for a "Save As" operation, or if the
InternalLoad function can be bypassed completely.

Also, I'd be interested in knowing if the load flags that are set on the
channel for history reloads may be appropriate for the "Save As" case.

When this patch is present, bug 479296 is fixed since the download fails
if the page with the POST data is no longer in the cache. After a reload,
the page can be saved. Bug 485196 and bug 472895 are apparently fixed.
I don't know whether bug 120809 is affected.
So...  This only helps cases when we're saving a document that's loaded in a docshell; it doesn't help with various other cases (e.g. save image as), right?

Some of the code really doesn't look quite right to me (e.g. the CheckLoadURI check), but I'd need to dig more to see whether it's actually correct.

I think it's worth going through exactly how docshell and web browser persist differ in their channel setup, figuring out what those differences mean in practice (e.g. the former can read from an offline app cache) and then for each difference deciding which behavior is desirable for save as.  Then seeing where we stand on reusing the docshell code.
(In reply to comment #5)
> So...  This only helps cases when we're saving a document that's loaded in a
> docshell; it doesn't help with various other cases (e.g. save image as),
> right?

Yes.

> Some of the code really doesn't look quite right to me (e.g. the CheckLoadURI
> check), but I'd need to dig more to see whether it's actually correct.

Thank you!

> I think it's worth going through exactly how docshell and web browser persist
> differ in their channel setup, figuring out what those differences mean in
> practice (e.g. the former can read from an offline app cache) and then for
> each difference deciding which behavior is desirable for save as.  Then
> seeing where we stand on reusing the docshell code.

Yes, that's nearly what I intended to ask with my specific questions in
comment 4. I don't know what the desirable behavior should be; someone
with more expertise should actually answer these questions. I hope this
experimental patch helps by highlighting the code path on the docshell side,
regardless of what the final implementation will be.

Actually, a reviewer might as well proceed in the opposite direction: first
go through the docshell code, which is the most complex and complete, and
figure out what the desirable behavior for "Save As" could be; then
cross-check with the current "Save As" behavior. Maybe it would turn out
that something that is not a difference in the current channel setup should
actually be done differently.
This is an actual patch implementing the newSaveChannel method. One of the
goals of this work is also to separate the channel preparation from the
actual load in the docshell, which may be useful in general. However, this
is not the cleanest implementation that's possible; feedback is welcome :-)
I've still not analyzed the differences between the load flags that are
currently set up by the "Save As" functions and the flags that this new
method will set. This in another area where some advice would be useful.
Attachment #372907 - Attachment is obsolete: true
Attachment #391907 - Flags: review?(cbiesinger)
Comment on attachment 391907 [details] [diff] [review]
nsIWebPageDescriptor.newSaveChannel method

I like the idea of this patch, but I'm sure it is outdated by now. I'll review a new version, if someone wants to produce one.
Attachment #391907 - Flags: review?(cbiesinger)
You need to log in before you can comment on or make changes to this bug.