Open Bug 238966 Opened 20 years ago Updated 2 years ago

If saving two pages simultaneously, the save is sometimes incomplete

Categories

(Firefox :: File Handling, defect)

x86
Windows XP

Tracking


UNCONFIRMED

People

(Reporter: Time_lord, Unassigned)

Details

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7b) Gecko/20040327

(Please note: my local build ID above is based on the tarball from 20040313.)

I've modified Mozilla to save two pages consecutively (two different tabs), but
sometimes the first page isn't saved completely, or has the name of the second
page, or is not saved at all.
E.g. I have 'fred.html' open in tab 1 and 'index.htm' open in tab 2. Sometimes the
only page that is saved is index.htm. However, if I set breakpoints in the
trigger and in the callback (for the save), then it works correctly (using the JS
debugger, Venkman). Depending on where the code is paused and where breakpoints
are set, I've even seen the first page saved but with the name of the second page.

It seems the callback (that does the writing) isn't 'reentrant', or there is a
race condition.
Another way of looking at this: "Save as web page, complete" is fine, but not if
you call it programmatically, i.e. for one page then the next. It appears that if
no time is given between saving the first and second pages, the second page
clobbers the first. This is unpredictable.
I am working in mozilla\xpfe\communicator\resources\content\contentAreaUtils.js.
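
To illustrate the suspected pattern (a hypothetical sketch only, not the actual
contentAreaUtils.js code; all names here are invented):

  var gSaveData = null;              // shared, module-level state
  function triggerSave(url) {
    gSaveData = { url: url };        // a second save overwrites this...
    startAsyncCheck(url, onStop);    // ...before the async callback runs
  }
  function onStop() {
    writeFile(gSaveData.url);        // may see the second save's data
  }

If both saves are triggered before either callback fires, the first save's
data is clobbered.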


Reproducible: Sometimes
Steps to Reproduce:
Programmatically save two web pages (two tabs). A quick hack should suffice; my
modifications are more substantial, so I won't include them here.

Actual Results:  
Unpredictable

Expected Results:  
Should have same result as calling 'Save Page As' for each tab.
i.e. you are using "web page, complete" for both pages? (or, from a programmatic
point of view, nsIWebBrowserPersist?)
(In reply to comment #1)
> i.e. you are using "web page, complete" for both pages? (or, from a programmatic
> point of view, nsIWebBrowserPersist?)

Yes, I am triggering the existing code which determines (or defaults to) "web
page, complete", but I suppress the fp.show (confirmation dialog where the user
confirms directory, filename and file type).
On closer investigation, I believe the code that is triggered stores some data
which the callback accesses (in nsIRequestObserver::onStopRequest) and which is
altered when the second page is saved. This is potentially a problem when the
user manually saves two pages simultaneously.
Will decide how best to fix.
(In reply to comment #3)
That assumption I now think is false.
hmm, are you maybe using the same nsIWebBrowserPersist object for the two saves?
(In reply to comment #5)
> hmm, are you maybe using the same nsIWebBrowserPersist object for the two saves?
No, the code seems to fail in the callback before that bit of the code runs.
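
For reference, each in-flight save needs its own persist object - a minimal
sketch, assuming chrome-privileged JS:

  var persist = Components.classes["@mozilla.org/embedding/browser/nsWebBrowserPersist;1"]
                          .createInstance(Components.interfaces.nsIWebBrowserPersist);
  // one instance per save; sharing a single instance would let the second
  // saveURI/saveDocument call stomp on the first's state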
The small change is based on a SeaMonkey tarball from 20040414. In
mozilla\xpfe\communicator\resources\content, edit contentAreaUtils.js:
235,240c235,243
<   // In both cases here, we want to use cached data because the
<   // document is currently visible.
<   if (aDocument)
<     saveInternal(aDocument.location.href, aDocument, false);
<   else
<     saveInternal(_content.location.href, null, false);
---
>   var browser = getBrowser();
>   var thisBrowser = browser.browsers[0];
>   var ref = thisBrowser.contentDocument.location.href;
>   var doc = thisBrowser.docShell.document;//null;//
>   saveInternal(ref, doc, false);
>   thisBrowser = browser.browsers[1];
>   ref = thisBrowser.contentDocument.location.href;
>   doc = thisBrowser.docShell.document;//null;//
>   saveInternal(ref, doc, false);

Do a debug rebuild; then when you run the code, make sure you have two pages (two
tabs) open, then right-click and select 'Save Page As'. Two dialogs pop up (i.e.
one for each tab) - very quickly click Save in both of them. If both are
running at the same time, all sorts of errors can occur, for example an error
reporting 'code not reached' followed by an empty (and seemingly orphaned)
download dialog. Sometimes there are many errors but both pages are saved (and
sometimes not).

Whether errors occur may depend on the speed of your system, luck, and how fast
you click in the save-as dialogs (I'm running a 2.6 GHz P4 with 512 MB on XP SP1).
Of course the code I was working on has more smarts (e.g. checking whether
browser 2 exists) and actually suppresses the save-as dialog (which can make
the problem more pronounced). However, the simple change above is enough to
cause the problem.
So based on the thread in n.p.m.netlib it sounds like things are racing
_somewhere_.  I just scanned, and neither the contentAreaUtils code nor the uri
checker really use any global vars....

Brendan, any idea how to debug possible races in the contentAreaUtils.js code?
How is the saved-to filename chosen?  Is nsIFile.createUnique used?

I don't see threads involved here, but I'm not looking hard.

/be
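
For reference, a rough sketch of an nsIFile.createUnique call in chrome JS
(illustrative only - the path is made up, and this is not a claim about what
the save code currently does):

  var file = Components.classes["@mozilla.org/file/local;1"]
                       .createInstance(Components.interfaces.nsILocalFile);
  file.initWithPath("C:\\saved\\index.htm");
  // createUnique adjusts the leaf name until it is unused, then creates the
  // file, so two racing saves would end up with distinct names
  file.createUnique(Components.interfaces.nsIFile.NORMAL_FILE_TYPE, 0644);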
Hmm.... If the filepicker is used, save-to filename is chosen using the
filepicker and the file is not created till later, so things _could_ end up with
the same filename if the second filepicker uses the same filename and is
dismissed before the first download has created the file.  When not using the
filepicker, I'm not sure.  Phil?

As for threads, there are necko threads (nsURIChecker) involved.  There are also
nested event queues involved (at least if filepickers go up).  So there are
various race opportunities.
(In reply to comment #10)
> Hmm.... If the filepicker is used, save-to filename is chosen using the
> filepicker and the file is not created till later, so things _could_ end up with
> the same filename if the second filepicker uses the same filename and is
> dismissed before the first download has created the file.  When not using the
> filepicker, I'm not sure.  Phil?

In my full-blown version, after selecting the destination directory (only once,
before processing the tabs), for each tab (page) I check whether that file
already exists (and if it does, I give the page to be saved a new name). Since I
suppress the 'save-as' dialog (because I am automatically choosing the name),
any conflict caused by the user is avoided.
The potential exception to this is if the first page is about to be saved (but
nothing is yet on disk) and then the filename for the second page is checked
(!exists), but it's actually the same as the first page. So a conflict occurs
when the second page is saved. (That'll be a nasty one, but could occur with any
application.)

One major point to consider is that I think the filepicker is a red herring, and
the problem occurs before this. I.e. after the urichecker has done its thing, the
callback (onStopRequest) to the code in contentAreaUtils doesn't always fire for
both pages. So we haven't yet got to the filepicker code!

BTW, the point of doing all this is that I've added a 'save all pages (tabs)'
feature (obviously different from 'bookmark all tabs'). It works fine except for
this race condition. I'd like to submit it as a proposed enhancement.
Because I'm relatively new to XPCOM and the Moz codebase, I decided to make use
of the existing 'save page as' code (nice also to reuse code). However, I was
reluctant to rearrange the existing code to do all this... maybe I should,
although in reality not much would change.
Brendan, is the way |new nsHeaderSniffer| used at
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/contentAreaUtils.js#255
and the way it's defined at
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/contentAreaUtils.js#411
and
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/contentAreaUtils.js#440
ok?

In particular, are the assignments to this.xxxx in nsHeaderSniffer() fine, given
that those properties are not declared on the prototype?

Really, we just need to remove all this code, though. See bug 160454.
> In particular, are the assignments to this.xxxx in nsHeaderSniffer() fine, given
> that those properties are not declared on the prototype?

Why not?  Nothing's declared, and it's fine to set instance properties in a
ctor, whether or not they override proto-properties.

/be
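
In JS terms (a minimal illustration of the point; the names are made up):

  function Sniffer() {
    this.mContentType = null;                    // instance property set in ctor
  }
  Sniffer.prototype.mContentType = "text/html";  // prototype property, if any
  var s = new Sniffer();
  // s.mContentType is null: the instance property shadows the prototype's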

Hey, I'm just looking for possible reasons that onStopRequest() wouldn't get
called on the two totally different objects we're creating here....
(In reply to comment #12)
> Really, we just need to remove all this code, though.  See bug 160454

Interesting - but surely most of those things need to happen somewhere? I.e. if
the page isn't cached, check the URL and fetch the page so it can be saved. And
under what circumstances do you check the URL anyway, given it may be out of date?
A little off-topic, but I've looked at bug 160454; could you point me in the
direction of more info about the nature of the HEAD problem (email me, thanks)?
It seems resolving that will have a major bearing on this code.
After discussion with Boris & biesi a few weeks back: related to this area is a
desire to remove the header-sniffer stuff. I've done this, and will attach the
modified file (contentAreaUtils.js). I've been using these changes for several
weeks and it seems fine. I see there is some discussion on this topic at the
moment in bug 160454 (remove HEAD requests), so I should really be posting
there ;-)

To anyone looking at the code, you'll see I broke the 'internalSave' up a bit
and it has a couple of extra parameters here & there. I've done this because
I've written extra code to allow a 'save-all-tabs' option (where the user is
NOT prompted when saving all files - I've omitted that extra code here to avoid
clouding the issue). If the code meets with approval, we could also trim the
comments back a little.

Getting back to the subject of this bug, I find when I attempt to save 2 (or
more) tabs simultaneously (via my new 'save-all-tabs' option), I get an error:
###!!! ASSERTION: CachedChromeStream doesn't receive data: 'Not Reached', file
e:/MozillaBuilds/mozilla/content/xul/document/src/nsXULDocument.cpp, line 4150

This problem still occurs in the new code (i.e. the new code without all the HEAD
request stuff). After clearing the error dialog, the multiple pages are all
saved, but there is an 'orphaned' save-as dialog left on the screen which never
goes away. If I do save-all-tabs again (and again), there is no error.

I've looked at nsXULDocument.cpp, but I got a little lost... Any ideas?

Anyway, please review the new contentAreaUtils.js..... :)
Attached image: 'Code not reached' dialog
See also snapshot of 'code not reached' dialog.
(In reply to comment #16)
> Created an attachment (id=151280)
> New contentAreaUtils.js with HEAD sniffer code removed.

could you attach this as a patch?
(cvs diff -u6p filename > patchname)
Attached patch: Diff of changes
(In reply to comment #18)
> could you attach this as a patch?
> (cvs diff -u6p filename > patchname)

Yep, sorry about that, here it is
Flags: blocking1.8a2?
Comment on attachment 151287 [details] [diff] [review]
Diff of changes

patching file contentAreaUtils.js
Hunk #2 FAILED at 216.

caused by the change of _content to content, I think
Comment on attachment 151287 [details] [diff] [review]
Diff of changes

general comment about the patch... please try to limit lines to 80 characters.

-  // In both cases here, we want to use cached data because the
-  // document is currently visible.
+  // In both cases here, we want to use cached data because the 
+  // document is currently visible. 

please don't add trailing whitespace

+  var browser = getBrowser();

getBrowser() is in navigator.js... I'm not sure if depending on browser/ from
communicator/ is a good idea...

oh nevermind, this function is already used in that file. grrr.

just curious... is there a reason why you renamed saveInternal to internalSave?

+//   aURL            - The String representation of the URL of the document
+//                     being saved, according to the browser window, or
+//                     supplied with 'save-link'
+//   aUriObj         - If available, the URL of the document being saved as
+//                     an nsIURL object

hmm. I don't like this. why not use a single parameter, that is an nsIURI?

+//   aUserTypedValue - The URL as supplied by the user (aURL may not match
+//                     this if the document is not yet available, eg it is
+//                     'about:blank' because the page has not yet been cached)

I don't understand this comment. OK, say I want to save an http document, that
is cached. what would I pass here?

This comment makes it sound like this is the value of the urlbar. is it? if so,
how does it differ from the uri object?

ok, let me try to read the code to understand this parameter...

      if (aUserTypedValue && (aUserTypedValue != aURL) && (aURL == "about:blank")) {
can you explain this?

So I suspect this will cause severe problems if the user typed a new value in
the urlbar but did not yet hit enter, and then decides to save the current
page, since getBrowser().userTypedValue will be that new value of the urlbar
then.
What does this fix?

+//   aShouldBypassCache  - Like it says

"if true, the document will always be refetched from the server"

+//   aAutoChosenData - If non-null this contains pre-determined data so that
+//                     the user does not need to be prompted for a target
+//                     filename.

OK, this one is always null in the callers of this patch, so it seems this is
something you want to offer to extensions. fine with me, but you really should
document what kind of data this expects. a string indicating the target
filename? nsIFile indicating the target filename? indicating the directory?

please document this in the comment.

personally I like doxygen-style comments, like:
/**
 * Used when saving a document or URL, this method:
 * [...]
 * @param aURL	  The string representation [...]
 */

but, I guess I have to be glad that a comment even exists ;)

+// This functionality rearranged for Mozilla 1.8+ to get rid of the HeaderSniffer code.
+// May 2004.

no need to use comments for history purposes, we have a revision control system
(CVS).

+  var contentEncodingType = (aDocument ? aDocument.actualEncoding : null);

uh, document.actualEncoding is a character set (in fact, it is the very same
thing as document.characterSet, at least in mozilla). this means that this is
wrong.

+    getUriAndFileInfo(fileInfo, aURL, aUriObj, aDocument, aUserTypedValue, contentType);

why does this function not just return the fileInfo?

  //    What we should probably do is change nsWebBrowserPersist::SaveURI to download everything
I'm not sure that that's the way to go... I guess that I'd instead construct a
document in the caller, and then pass that to wbp.

-  // If we're saving a document, and are saving either in complete mode or
+  // If we're saving a document, and are saving either in complete mode or 

trailing whitespace

+    target : (!aAutoChosenData ? makeFileURI(fpParams.fp.file) : makeFileURI(aAutoChosenData.file)),

nsIFilePicker has this nice attribute:
   * Get the nsIFileURL for the file or directory.
   *
   * @return Returns the file currently selected
   */
  readonly attribute nsIFileURL fileURL;

oh, the old code didn't use it either... ah well. please change it while you're
touching this code :)

Also... can you change it to read something like:
target: aChosenData ? makeFileURI(aChosenData.file) : fp.fileURL;
i.e. don't negate aChosenData

-  else
+  else 
-
+    

I do wonder how you manage to consistently add trailing whitespace.

+      var destFileName = (!aAutoChosenData ? fpParams.fp.file : aAutoChosenData.file);

again... please aChosenData ? aChosenData.file : fp.file
(that's easier to read, imo)

-
+      
-      encodingFlags |= nsIWBP.ENCODE_FLAGS_NOFRAMES_CONTENT;
+      encodingFlags |= nsIWBP.ENCODE_FLAGS_NOFRAMES_CONTENT;	     

again...

+    dl.init((!aAutoChosenData ? fileInfo.uriObj : aAutoChosenData.uri),

again (different one than right above though :) )

+    persist.saveDocument(persistArgs.source, persistArgs.target, filesFolder, 

more trailing ws

+    dl.init((!aAutoChosenData ? source : aAutoChosenData.uri),

again, please reverse the condition

+  this.uriFileName = aUriFileName;
+  this.uriFileBaseName = aUriFileBaseName;
+  this.uriFileExt = aUriFileExt;
+  this.uriObj = aUriObj;

I'm not particularly happy with this naming.
how about: fileName, fileBaseName, fileExt, uri

+// Structure for holding info about automatically supplied parameters for the internalSave(...).

oh, you have the structure even defined here. that really should be mentioned
in the comment of internalSave.

+//   aUriAutoChosen  - This is the nsIURI object for the target

why do both have to be supplied? I mean, there's a bijective mapping from
nsIFile <-> nsIFileURL, isn't there. so internalSave, or maybe this function,
could determine the URL automatically.

+// method which utilises @mozilla.org/network/io-service;1).

I'd just mention "which uses nsIIOService", doesn't really matter though

+function getUriObj(aURL)

this method signature makes me really unhappy. just like makeURI's signature.
they are broken by design. they really need to have a charset parameter for
proper support of URLs containing non-ascii characters. (the character set can
be gotten off the document)

I'm not convinced of the need for this function. Its name is misleading, too,
since it seems redundant with makeURI.

I'd just move the QI to the top of internalSave, with the change to always pass
an nsIURI to internalSave. Or more to the point, I'd move it to
getUriAndFileInfo

(I also don't see a need for this nested try..catch)


    // Special case is user selects 'stop' before anything loaded (when parameter
    // aDocument is 'about:blank'):
ahh. that's where the userTyped thing comes from. I rather think that we should
save what's displayed though (i.e. nothing).

What if some document is displayed, I type a url, then hit stop soon enough.
how is that different from your about:blank example?

+    aFI.uriFileName	 = (uO && uO.fileName ? uO.fileName : null);

aFI.uriFileName = uO && uO.fileName;

(same for the next three lines)

I have to say, I hate the use of this uO abbrevation. imho, it makes the code
less readable, since uO is really an abbreviation that makes no sense.

+    // decide a default extension. Should we do this?

I do think we should, since people want to be able to double-click saved web
pages and open them in the browser.

+    if (aFI.uriFileName && !aFI.uriFileExt) {

hm, you only need to do this if the above if wasn't taken, since it already has
a call like the one in this if, right?
but maybe you should instead move the call to getDefaultExtension after the
first if. something like:

    // from the previous incantation of this code...
    if (!aFI.uriFileName && !aFI.uriFileExt) {
      aFI.uriFileName     = getDefaultFileName(aFI.uriFileName, aDocument, aFI.uriObj);
      aFI.uriFileBaseName = getFileBaseName(aFI.uriFileName, aFI.uriFileExt);
    }
    if (aFI.uriFileName && !aFI.uriFileExt) {
      aFI.uriFileExt      = getDefaultExtension(aFI.uriFileName, aFI.uriObj, aContentType);
    }

+  } catch (e) {
+  }

hmm. so failure to access (say) the filename from the URL will not set any
filename hint. that seems suboptimal to me. why don't you put this catch after
the uO.file* accessors?

  // Ignore the supplied aFileName... well err... the old code seemed to anyway...
that doesn't seem right... it was used in case 4 of getDefaultFileName, no?

can you explain your changes to getDefaultFileName?

manageFilePicker:
      fp.filterIndex = aFpP.prefs.getIntPref("save_converter_index");
hm... does setting the filterIndex work, when the filters haven't been appended
yet?

sigh, this would've been easier to review if you hadn't rearranged the lines in
manageFilePicker, compared to the old foundHeaderInfo


review- for now.
Attachment #151287 - Flags: review-
I'd suggest, though, to post further patches to bug 160454 since that's the main
issue this patch fixes...
(In reply to comment #21)
Thanks for the comments... sorry I don't know much about the codebase (hence my
suggestion that there may be 'better' ways to do bits) - see my responses inline

> general comment about the patch... please try to limit lines to 80 characters.
Sorry about violating that edict... I think there were already some lines longer
than this, so I was happily trying to operate within the bounds of "when in
someone else's source code, follow their coding style".

> please don't add trailing whitespace
Damn, that is one of my pet hates, so that embarrasses me :-[
I'm more likely to be caught trimming someone *else's* trailing whitespace

> just curious... is there a reason why you renamed saveInternal to internalSave?
The main reason was to highlight that the signature (parameters) was completely
different, should anyone be looking for the old name. We can use the old name if
you prefer?

> 
> +//   aURL            - The String representation of the URL of the document
> +//                     being saved, according to the browser window, or
> +//                     supplied with 'save-link'
> +//   aUriObj         - If available, the URL of the document being saved as
> +//                     an nsIURL object
> 
> hmm. I don't like this. why not use a single parameter, that is an nsIURI?
Yes, a little ugly, like denormalised data. However the nsIURI parameter in this
'standard' code is always null, whereas it is supplied in my 'save all' code.
Maybe I should rethink that.

> +//   aUserTypedValue - The URL as supplied by the user (aURL may not match
> +//                     this if the document is not yet available, eg it is
> +//                     'about:blank' because the page has not yet been cached)
> 
> I don't understand this comment. OK, say I want to save an http document, that
> is cached. what would I pass here?
> 
> This comment makes it sound like this is the value of the urlbar. is it? if so,
> how does it differ from the uri object?
Hmmm, maybe that isn't clear enough. What I found happens is (for example) you
can enter "www.google.com", but the browser is not online (e.g. you are waiting
for the PC to dial). If you choose to save at this point, the internalSave
method is presented with the URL (doc) "about:blank", and the user's typed value
("www.google.com") can be found in <browser>.userTypedValue.
Although in this scenario the context menu is not available (or at least not a
save option), you can start the save with CTRL-S.
Yes, I guess there is a urlbar to correctly retrieve this from...

> ok, let me try to read the code to understand this parameter...
> 
>       if (aUserTypedValue && (aUserTypedValue != aURL) && (aURL == "about:blank")) {
> can you explain this?
> 
> So I suspect this will cause severe problems if the user typed a new value in
> the urlbar but did not yet hit enter, and then decides to save the current
> page, since getBrowser().userTypedValue will be that new value of the urlbar
> then.
> What does this fix?
I was trying to target the above situation, where the internalSave method is
presented with the URL "about:blank", but that is in fact because the user's
requested page has not yet been retrieved.
So I guess if the user has typed a new URL, not pressed enter, and yet has chosen
save, I'd suspect that they typed the new URL for a reason?

> +//   aShouldBypassCache  - Like it says
> 
> "if true, the document will always be refetched from the server"
ok ;-) I was getting lazy... and it wasn't documented before :-)

> +//   aAutoChosenData - If non-null this contains pre-determined data so that
> +//                     the user does not need to be prompted for a target
> +//                     filename.
> 
> OK, this one is always null in the callers of this patch, so it seems this is
> something you want to offer to extensions. fine with me, but you really should
> document what kind of data this expects. a string indicating the target
> filename? nsIFile indicating the target filename? indicating the directory?
Well-noted, I can include the prototype for aAutoChosenData.
BTW, should something like 'save all open documents' be an extension or a new
option in the standard Mozilla product (as I was hoping)?

> personally I like doxygen-style comments, like:
<snip> AKA javadoc style comments - I'm happy to do that

> +  var contentEncodingType = (aDocument ? aDocument.actualEncoding : null);
> 
> uh, document.actualEncoding is a character set (in fact, it is the very same
> thing as document.characterSet, at least in mozilla). this means that this is
> wrong.
Oh? Not sure where that bit came from, will look into it... but it hasn't been
causing me any problems ;-)

> +    getUriAndFileInfo(fileInfo, aURL, aUriObj, aDocument, aUserTypedValue, contentType);
> 
> why does this function not just return the fileInfo?
Because in the early days I was having trouble creating a FileInfo object inside
the method, and it was going out of scope when the method finished. JavaScript
isn't really my cup of tea... but I'll look at it again

> nsIFilePicker has this nice attribute:
<snip>
> 138   readonly attribute nsIFileURL fileURL;
> 
> oh, the old code didn't use it either... ah well. please change it while you're
> touching this code :)
Noted

> Also... can you change it to read something like:
> target: aChosenData ? makeFileURI(aData,file) : fp.fileURI;
> i.e. don't negate aChosenData
Yep... just trying to keep the 'standard' flow of program control first in the
"if...else"

> I do wonder how you manage to consistently add trailing whitespace.
Sorry again, must be my damn microsloth editor :-) And I had my diff tool set to
ignore changes in whitespace :-[

> I'm not particularly happy with this naming.
> how about: fileName, fileBaseName, fileExt, uri
okey dokey

> +// Structure for holding info about automatically supplied parameters for the internalSave(...).
> 
> oh, you have the structure even defined here. that really should be mentioned
> in the comment of internalSave.
oops, so I do :-[

> +//   aUriAutoChosen  - This is the nsIURI object for the target
> 
> why do both have to be supplied? I mean, there's a bijective mapping from
> nsIFile <-> nsIFileURL, isn't there. so internalSave, or maybe this function,
> could determine the URL automatically.
Ok, again that's my unfamiliarity with what XPCOM can do for me (and not being
able to find enough docs on what it does ;-) )

> +function getUriObj(aURL)
> 
> this method signature makes me really unhappy. just like makeURI's signature.
> they are broken by design. they really need to have a charset parameter for
> proper support of URLs containing non-ascii characters. (the character set can
> be gotten off the document)
> 
> I'm not convinced of the need for this function. Its name is misleading, too,
> since it seems redundant with makeURI.
> 
> I'd just move the QI to the top of internalSave, with the change to always pass
> an nsIURI to internalSave. Or more to the point, I'd move it to
> getUriAndFileInfo
OK

>     // Special case is user selects 'stop' before anything loaded (when parameter
>     // aDocument is 'about:blank'):
> ahh. that's where the userTyped thing comes from. I rather think that we should
> save what's displayed though (i.e. nothing).
It does... sort of... i.e. the contents of about:blank:
"<html><head><title></title></head><body></body></html>"
(interestingly, the doctype info is missing when this is saved, but I guess this
is something that WBP does when saving)

> What if some document is displayed, I type a url, then hit stop soon enough.
> how is that different from your about:blank example?
It's all a big race really - I guess if the old doc is still displayed when this
method is triggered, then that doc is what is presented and processed, whereas if
the browser is blank (in preparation for the new URL) then the 'about:blank'
document is presented.

> I have to say, I hate the use of this uO abbrevation. imho, it makes the code
> less readable, since uO is really an abbreviation that makes no sense.
Noted. That was 'uO' for 'uri Obj' :-[

> but maybe you should instead move the call to getDefaultExtension after the
> first if. something like:
> 
>     // from the previous incantation of this code...
>     if (!aFI.uriFileName && !aFI.uriFileExt) {
>       aFI.uriFileName     = getDefaultFileName(aFI.uriFileName, aDocument, aFI.uriObj);
>       aFI.uriFileBaseName = getFileBaseName(aFI.uriFileName, aFI.uriFileExt);
>     }
>     if (aFI.uriFileName && !aFI.uriFileExt) {
>       aFI.uriFileExt      = getDefaultExtension(aFI.uriFileName, aFI.uriObj, aContentType);
>     }
> 
> +  } catch (e) {
> +  }
> 
> hmm. so failure to access (say) the filename from the URL will not set any
> filename hint. that seems suboptimal to me. why don't you put this catch after
> the uO.file* accessors?
Noted. This seemed to be a fiddle to get working in all cases, but I will look at
it again.

>   // Ignore the supplied aFileName... well err... the old code seemed to
> anyway...
> that doesn't seem right... it was used in case 4 of getDefaultFileName, no?
oops :-[ Maybe I got side-tracked by function saveDocument... will fix up.

> can you explain your changes to getDefaultFileName?
Sure. Obviously step 1 got removed because it was derived from header-sniffer
stuff. I replaced this with code stating that if aDefaultFileName is not
blank, then use it (I mean, if it's OK and it was supplied, then why not?).
Next, "if (aDocumentURI)" - i.e. why execute that block of code if aDocumentURI
is null? However, the 'catch' will not get executed either... but I'd assumed
that aDocument and aDocumentURI are related, so should the catch get executed if
aDocumentURI is null?
Similarly further down, "if (aDocumentURI)" avoids exceptions if aDocumentURI is
null.

> manageFilePicker:
>       fp.filterIndex = aFpP.prefs.getIntPref("save_converter_index");
> hm... does setting the filterIndex work, when the filters haven't been appended
> yet?
Appears to be ok (I haven't had any problems), but that is a good point - I
probably got those bits around the wrong way while moving things :-[

> sigh, this would've been easier to review if you hadn't rearranged the lines in
> manageFilePicker, compared to the old foundHeaderInfo
Not trying to make things hard; I've simply been through so many iterations of
changes that maybe I didn't leave things as tidy as possible. Will get back to
the old order of things (unless something doesn't seem right, in which case I'll
say so).


Have been snowbound, so hopefully can get on to this in the next day or so.
> So I guess if the user has typed a new URL, not pressed enter and yet has chosen
> save, I'd suspect that they typed the new URL for a reason?

When I do this, it's because I realize that I want to save the page before I
replace its content (usually a receipt of some kind; I don't save much else).
(In reply to comment #24)
> > So I guess if the user has typed a new URL, not pressed enter and yet has chosen
> > save, I'd suspect that they typed the new URL for a reason?
> 
> When I do this, it's because I realize that I want to save the page before I
> replace its content (usually a receipt of some kind; I don't save much else).

good point
(In reply to comment #23)
> Sorry about violating that edict... I think there were already some lines longer
> than this, so I was happily trying to operate within the bounds of "when in
> someone else's source code, follow their coding style"

hm, in the case of line length, I don't think improving upon current file style
is a bad thing ;)

> > please don't add trailing whitespace
> Damn, that is one of my pet hates, so that embarrasses me :-[
> I'm more likely to be caught trimming someone *else's* trailing whitespace

heh :)

> > just curious... is there a reason why you renamed saveInternal to internalSave?
> The main reason was to highlight that the signature (parameters) was completely
> different, should anyone be looking for the old name. We can use the old name if
> you prefer?

No, the new name is fine.

> Yes, a little ugly, like denormalised data. However the nsIURI parameter in this
> 'standard' code is always null, whereas it is supplied in my 'save all' code.
> Maybe I should rethink that.

Oh... hm...  Well I don't want to mark review+ on a patch that has both
parameters; please decide on one :) I'd prefer nsIURI, but I'd accept strings,
too. The problem with strings is that they are not necessarily sufficient to
identify a URI, if non-ASCII characters are involved.

> ok ;-) I was getting lazy... and it wasn't documented before :-)

have to seize the opportunity that someone is willing to write documentation :)

> BTW, should something like 'save all open documents' be an extension or a new
> option in the standard Mozilla product (as I was hoping)?

hm, I think that'd be nice for standard mozilla, but talk to neil about that

> Oh? Not sure where that bit come from, will look into it... but hasn't been
> causing me any problems ;-)

hmmmm.... I wonder if you actually need it... I suspect for currently-displayed
pages, you always want to decode the content.

On the other hand... I wonder what happens when you're saving ("save link
target") a .tar.gz file, sent with Content-Encoding:gzip

it should not be decoded, and I think the current code gets that right.

> > +//   aUriAutoChosen  - This is the nsIURI object for the target
> > 
> > why do both have to be supplied? I mean, there's a bijective mapping from
> > nsIFile <-> nsIFileURL, isn't there. so internalSave, or maybe this function,
> > could determine the URL automatically.
> Ok, again that's my unfamiliarity with what XPCOM can do for me (and not being
> able to find enough docs on what it does ;-) )

To go from nsIURI to nsIFile, you QueryInterface to nsIFileURL and get the file
attribute.
For the other way, you can use newFileURI like your patch does.
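
Concretely, a minimal sketch in chrome JS:

  // nsIURI -> nsIFile: QI to nsIFileURL and read its file attribute
  var file = uri.QueryInterface(Components.interfaces.nsIFileURL).file;

  // nsIFile -> nsIURI: ask the IO service
  var ios = Components.classes["@mozilla.org/network/io-service;1"]
                      .getService(Components.interfaces.nsIIOService);
  var fileUri = ios.newFileURI(file);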

> > can you explain your changes to getDefaultFileName?
> Sure. [...]

I'll read that after having had more than 4 hours of sleep :)

Flags: blocking1.8a2? → blocking1.8a2-
(In reply to comment #21)
> +//   aURL            - The String representation of the URL of the document
> +//                     being saved, according to the browser window, or
> +//                     supplied with 'save-link'
> +//   aUriObj         - If available, the URL of the document being saved as
> +//                     an nsIURL object
> 
> hmm. I don't like this. why not use a single parameter, that is an nsIURI?
Removed aUriObj - it was redundant.

> +  var contentEncodingType = (aDocument ? aDocument.actualEncoding : null);
> 
> uh, document.actualEncoding is a character set (in fact, it is the very same
> thing as document.characterSet, at least in mozilla). this means that this is
> wrong.
Can you give me some hints about how to do this? I can see how to get the
encoding type in the old header-sniffer code (given a channel), but I'm not sure
how to start if the doc is already cached. I've seen the page you published
(mimetypes.html on mozilla.org), but couldn't get a starting point as to what to
do for encoding types...

> +//   aUriAutoChosen  - This is the nsIURI object for the target
> 
> why do both have to be supplied? I mean, there's a bijective mapping from
> nsIFile <-> nsIFileURL, isn't there. so internalSave, or maybe this function,
> could determine the URL automatically.
For the normal code, yes; but when using the new 'save all' (where the
'auto-chosen' data is actually used), the filename to use may not come directly
from the URL. E.g. this is the case if the filename is already in use and the
'save as' code is allowed to automatically pick a new name.


The code is barely a tiny bit smaller, but the file has 2.6k more doxygen
comments :-[
And I found an obscure bug which hopefully I should fix soon...

I'll post the next patch to bug 160454...
(In reply to comment #27)
> Can you give me some hints about how to do this? I can see how to get the
> encoding type in the old header sniffer code (given a channel), but not sure how
> to start if the doc is already cached.

I don't think you need this at all... it seems to me like we always want to
apply decoding if we are currently displaying the page. I think. bz, thoughts?
That's correct.  The times we want to not apply decoding are the times when we
want to save a gzipped file because that's the "real" file (e.g. a .tar.gz
should not be decompressed).

One exception I can think of is a .ps.gz that we render with a plugin; that sort
of thing we _may_ want to save as a .ps.gz...
oh... re the getDefaultFilename changes... a "diff -w" would've really helped


I'm not sure if I mentioned this already, but:
+  if (aDefaultFileName && (aDefaultFileName != ""))

the second part is not necessary, "" tests as false in js
(In reply to comment #29)
> That's correct.  The times we want to not apply decoding are the times when we
> want to save a gzipped file because that's the "real" file (e.g. a .tar.gz
> should not be decompressed).
> 
> One exception I can think of is a .ps.gz that we render with a plugin; that sort
> of thing we _may_ want to save as a .ps.gz...

So do you mean that we should just use the outermost encoding? And is this what
document.contentType is? (I guess it may be for rendered docs, but not for
non-document objects (e.g. images); and what do we do when we are saving a link?)
(In reply to comment #30)
> oh... re the getDefaultFilename changes... a "diff -w" would've really helped
> 
> 
> I'm not sure if I mentioned this already, but:
> +  if (aDefaultFileName && (aDefaultFileName != ""))
> 
> the second part is not necessary, "" tests as false in js

<rant>
I'm not sure I'm keen on taking the odd shortcuts available in JavaScript;
I'm not overly keen on working in a weakly-typed environment like JS.
E.g. there was a suggestion to change:
    aFI.uriFileName = (uO && uO.fileName ? uO.fileName : null);
to:
    aFI.uriFileName = uO && uO.fileName;

When the filename does not exist in the uO object, the 'simplified' line yields
void, which in my book is not the same as null (call me old-fashioned).
Similarly when a method is called and one or more parameters are left out by the
caller - this can have the ugly result of (e.g.) a boolean parameter having the
value of void. Almost as bad (IMHO) in the old code is where null was passed to
a method where a boolean was expected, and vice versa...
Maybe in the world of JavaScript, void/null/false are generally all equivalent,
but I noticed subtle quirks in venkman when I took that approach...
</rant>

Maybe this isn't the best place for a rant :-[
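
The difference under discussion, concretely:

  var uO = {};                                       // no fileName property
  var a = (uO && uO.fileName ? uO.fileName : null);  // a is null
  var b = uO && uO.fileName;                         // b is undefined
  // (a == b) is true, but (a === b) is false, and typeof a != typeof b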
> So do you mean that we should just use the outermost encoding?

I'm not sure what you mean by that.

> And is this what document.contentType is?

document.contentType is a type.  It's never an encoding.

> (I guess it may be for rendered docs, but not for non-document objects (eg
> images)

document.contentType works just fine for standalone images and plugins...

> and what do we do when we are saving a link?)

We guess the type based on the extension of the URI.
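
Roughly what that guess looks like via the MIME service (a sketch, assuming
chrome JS; getTypeFromURI throws if no type can be determined):

  var mimeSvc = Components.classes["@mozilla.org/mime;1"]
                          .getService(Components.interfaces.nsIMIMEService);
  var type = mimeSvc.getTypeFromURI(linkUri);  // maps the extension to a type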

> I'm not overly keen on working in a weakly-typed environment like JS.

yeah well. me neither. let's move this code to c++ ;) (just kidding)
> > I'm not overly keen on working in a weakly-typed environment like JS.
> 
> yeah well. me neither. let's move this code to c++ ;) (just kidding)

Don't write JS as if it were C -- that is bound to get you in trouble. Learn the
language, including its idioms. Use them well. Otherwise, you should confine
yourself to C or C++, and be prepared to have patches that should have been
written in JS rejected.

Programming is not about making the problem fit your favorite language,
especially not if that language is the wrong tool for the job.

Time_lord: please report "subtle quirks in Venkman" as bugs.  JS, like many
other dynamic languages, is best used in ways that do not test (arg === false)
when (!arg) is shorter, more efficient, and more generic. 

Why is this bug still UNCONFIRMED, yet being patched?

/be
(In reply to comment #33)
> > So do you mean that we should just use the outermost encoding?
> 
> I'm not sure what you mean by that.
> 
> > And is this what document.contentType is?
> 
> document.contentType is a type.  It's never an encoding.

Ok, that all came from the comments in the old code, i.e.:

  if (!isDocument && !shouldDecode && contentEncodingType) {
    // The data is encoded, we are not going to decode it, and this is not a
    // document save so we won't be doing a "save as, complete" (which would
    // break if we reset the type here).  So just set our content type to
    // correspond to the outermost encoding so we get extensions and the like
    // right.
    contentType = contentEncodingType;

So this is where I'm seeing a blurring between the encoding type of a document
and the content type. I guess when the document shouldn't be decoded then the
content type is more closely related to the non-decoded encoding... ;-)

> 
> > (I guess it may be for rendered docs, but not for non-document objects (eg
> > images)
> 
> document.contentType works just fine for standalone images and plugins...
> 
> > and what do we do when we are saving a link?)
> 
> We guess the type based on the extension of the URI.

So what I don't understand is how we get the encoding type when we have a
document (i.e. similar to how, when we want the contentType and we have a
document, we can use document.contentType).
I.e. what document field do we access (it would be nice if there were a
document.contentEncodingType), or what XPCOM object do we use, etc.? I assume we
shouldn't ALWAYS just examine the extension of the URL?
> I guess when the document shouldn't be decoded then the content type is more
> closely related to the non-decoded encoding... ;-)

Exactly.

> So what I don't understand is how we get the encoding type when we have a
> document

If we need that information, we have to add it to documents. They don't store
that right now.
(In reply to comment #35)
> Time_lord: please report "subtle quirks in Venkman" as bugs.
Sorry, I should have been more specific and stated 'subtle behaviour I observed
in the code while I was running it in venkman'... although it is a while now
since I observed whatever it was that made me shudder...

> 
> Why is this bug still UNCONFIRMED, yet being patched?
Probably because the bug (the original topic of this bug) IS still unconfirmed
(by someone other than me, the reporter). The patch really belongs with bug
160454 (and I'll post any further patches there). The recent discussion has
become off-topic WRT this bug - shall we take the remainder of that bit offline?
> Ok, that all came from the comments in the old code, i.e

ah... yeah, well, necko code maps Content-Encoding types a server sends (e.g.
"gzip") onto content types (e.g. "application/x-gzip", iirc) before other code
sees it. So, content-encodings can be interpreted as content types; however,
they are different things.
(In reply to comment #29)
> That's correct.  The times we want to not apply decoding are the times when we
> want to save a gzipped file because that's the "real" file (eg a .tar.gz should
> not be decompressed).
> 
> One exception I can think of is a .ps.gz that we render with a plugin; that sort
> of think we _may_ want to save as a .ps.gz...

I have a stupid question - when DO we want to apply decoding??
If you're saving a file called "index.html" that claims to be of type text/html
and gzip-encoded, for example.
(In reply to comment #41)
> If you're saving a file called "index.html" that claims to be of type text/html
> and gzip-encoded, for example.

But if we're saving a document (which was originally a gzipped text/html file)
which has been passed to this method, then it has already been decoded (if
necessary) prior to rendering, right? So for the purposes of this method, why
would we care about the original encoding type? Is it because (a) the
WebBrowserPersist loads the document out of the cache and the cache stores the
original encoded file, or (b) the WebBrowserPersist may be told to reload the
document from the original source (even if we already have it cached), in which
case we want the re-retrieved document automatically decoded (just as it was
when it was first retrieved)?
If it's (a), then is this correct? Because after all we are passing the document
to the WebBrowserPersist...
If it's (b), is there any harm in assuming that decoding *may* be necessary
(will WBP handle it when it isn't necessary?), i.e. always assume we should decode:

  if (shouldDecode)
    persist.persistFlags &= ~nsIWBP.PERSIST_FLAGS_NO_CONVERSION;

Or is there something I'm missing? (Likely ;-) )

... not forgetting that we established (#29) that *generally* we don't want to
decode when doing a 'save-link'...
> Is it because (a) the WebBrowserPersist loads the document out of the cache and
> the cache stores the original encoded file

For the "save as HTML only" case this is exactly what happens.  There is no way
to reconstruct the original HTML source from the document object, in general. 
So when saving the raw HTML we reget it as binary data from the cache.

As for save link, that's actually the hard case.  We _really_ have to guess
whether to decode or not there...  The easy case is when we have a document
rendered; chances are we definitely want to decode then.
(In reply to comment #43)

So what would be the effect of always setting:
    persist.persistFlags &= ~nsIWBP.PERSIST_FLAGS_NO_CONVERSION
regardless of encoding type?

Other than that, I'm really stuck at the moment
The effect would be that sometimes you would save a file called index.html that
contained gzip data.
>If it's (a), then is this correct? Because after all we are passing the document
>to the WebBrowserPersist...

not in the case when you call SaveURI - that does not get the document.
(In reply to comment #43)
> As for save link, that's actually the hard case.  We _really_ have to guess
> whether to decode or not there...  

Or webbrowserpersist could be hacked to ask the helperappservice in
onstartrequest whether to do decoding
> Or webbrowserpersist could be hacked to ask the helperappservice

Oh, right.  That's what I wanted to do -- change applyConversion from a boolean
to a tristate: yes, no, "decide yourself as best you can".
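
A hypothetical shape for that tristate (the constant names here are invented,
not from any actual interface):

  const APPLY_CONVERSION_NO    = 0;  // never decode; save the raw bytes
  const APPLY_CONVERSION_YES   = 1;  // always decode before saving
  const APPLY_CONVERSION_GUESS = 2;  // let the persist object consult the
                                     // helper app service per content type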
(In reply to comment #47)
> Or webbrowserpersist could be hacked to ask the helperappservice in
> onstartrequest whether to do decoding

ok I'll take a look...
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
ok, responding to that auto-resolve warning. I haven't had time to come back to
this but suspect this is still a problem.
Assignee: file-handling → nobody
QA Contact: ian → file-handling
Product: Core → Firefox
Version: Trunk → unspecified
Severity: normal → S3