Closed Bug 118251 Opened 19 years ago Closed 19 years ago
whitespace formatting should not be reworked
I believe our goal for saving as "complete" (poor name, as it redoes links, etc) should be to modify the original document as little as necessary. When I save my personal page, I notice it removes the <?xml version> at the top, and compresses the doctype, html, head, and title tags all into a single row. Additionally, the </head> tag is compressed onto the line with the last <link> on it. This makes the layout quite ugly, and very difficult to notice some of the tags. Expected: whitespace formating would not be reworked. Example is attached, as long lines can't be reproduced in a bugzilla comment.
sounds like a back end bug in nsWebBrowserPersist
Assignee: law → adamlock
Yep, all those newlines before <body> is verbatim.
Notice it also affects </body></html>
I am not sure what I can do about this. The XPFE is telling webbrowserpersist to fixup and save the DOM document and it does so by feeding the loaded DOM tree into the document encoder for saving. It is very likely that any of the original formatting will be gone when it comes out the other end. Regarding the <?xml> tag being stripped. This could be an issue with webbrowserpersist as it stands. Internally it always tells the encoder to save out as text/html and this may be causing the wrong encoder to be invoked, causing odd nodes such as this to be stripped. I will changed the title of this bug to cover that issue until I know for sure.
Summary: xml/doctype/html/head whitespace compressed in save as: complete web page → Webrowserpersist hardcodes text/html for saving DOM documents
Adam, are you saying that the _DOM_ is saved? So if the page contains document.write() calls, the script is saved _and_ the written content is saved? So when loaded again the written content is present twice?
Actually, that exact thing happened to me the other day (on a differant page) and I didn't think anything of it at the time. Yep, all document.write contents was presented twice. I had 2 ad banners and 2 webtrends page counter/spy things.
Boris, yes I am. The double writing thing issue for embedding covered by bug 115328, but in Mozilla whatever code is telling the webbrowserpersist object to save a DOM needs to pass it the original DOM one not the modified one.
Bug 118792 filed.
Bug 110135 is checked in. It should now be possible to specify encoding flags and a mime type in the call to saveDocument to control how a document is saved.
FWIW, after Bens Save check-in last night, loading the saved page via: file and saving again removes most of the extra blank lines. Still some places where the formating changes though.
Webbrowserpersist still uses text/html as the default document encoder but allows the caller to specify any content type they like.
Target Milestone: --- → Future
A combination of tests to figure out the correct content type for xml, html, xhtml * QI for nsIDOMHTMLDocument * Examine doctype to see if is named "html" and whether the public ID contains the substring "xhtml" I am not sure how text documents are represented as a DOM, but if there is an obvious test (e.g. check topmost node is a text node) then it should be possible to handle that too.
How about using the nsIDOMNSDocument::contentType property? That should be somewhat more reliable than looking at the contents... Text documents have the following DOM: <html><body><pre>text</pre></body></html>
This patch uses nsIDOMNSDocument to obtain the content type of the document. It then uses that content type to instantiate the proper encoder or falls back on text/html if no encoder for the content type exists I have noticed that even with the correct content type XML documents still don't come out properly (e.g. missing <?xml version="1.0"?> at the top) so there is likely an issue in the document encoder itself. But the persist object is now doing the right thing.
Attachment #68173 - Attachment is obsolete: true
r=brade if you remove the diskdoc lines
Comment on attachment 69048 [details] [diff] [review] Updated patch after recent upload changes firstname.lastname@example.org with brade's comment addressed. Also, this: + nsAutoString defaultContentType; + defaultContentType.Assign(NS_LITERAL_STRING("text/html")); can be collapsed into a single init: + nsAutoString defaultContentType(NS_LITERAL_STRING("text/html"));
> + nsAutoString defaultContentType(NS_LITERAL_STRING("text/html")); Or better yet: NS_NAMED_LITERAL_STRING(defaultContentType, "text/html"); Which avoids a string copy and all that.
Fix is in
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
over to embed qa for testing [load-balancing]...
QA Contact: sairuh → mdunn
If this fix is actually checked in, it does nothing to clear up what I originally reported. Save as output in todays linux build is exactly the same as in attachment 1 [details] [diff] [review].
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Fix is definitely checked in but I believe the XML encoder is busted. See comment 14
Jeremy, is your personal page served as text/html or with some XML mimetype?
Content-Type: text/html <?xml version="1.0" encoding="US-ASCII"?> <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII" /> When I used an XML content-type for the header at least, three of the six browsers I tested wouldn't load it. Save-as shouldn't be dependent on this at all, as far as the user is concerned.
> Save-as shouldn't be dependent on this at all, as far as the user is concerned. Save HTML only is not. Save Full page is, only because it saves the DOM. The <? ?> syntax is XML-specific and thus does not get encoded. It's not even in the HTML DOM (not being valid HTML), so it could not _possibly_ get encoded. For XML pages, PIs _are_ in the DOM.
This is a dom->text conversion issue
Assignee: adamlock → harishd
Status: REOPENED → NEW
Component: File Handling → DOM to Text Conversion
QA Contact: mdunn → sujay
Summary: Webrowserpersist hardcodes text/html for saving DOM documents → whitespace formatting should not be reworked
Opened bug 127300 to specifically cover problems with the XML serializer. I think this bug should be closed. It is probably an issue for Ben to decide whether we should be using saveDocument to save XML files (except XHTML) since there is nothing to fix up in them anyway. The saveURI method might be a better choice.
Status: NEW → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Reporter: please verify and mark verified-fixed...thanks!
I don't know what to verify. I still get MAJOR whitespace changes when saving my page. If this bug morphed into something else, then whoever morphed it needs to verify. My problem still exsists with 2002052109 Linux 1.0 branch.
You need to log in before you can comment on or make changes to this bug.