Closed Bug 118251 Opened 19 years ago Closed 19 years ago

whitespace formatting should not be reworked


(Core :: DOM: Serializers, defect)

Not set





(Reporter: jmd, Assigned: harishd)




(2 files, 1 obsolete file)

I believe our goal for saving as "complete" (poor name, as it redoes links, etc)
should be to modify the original document as little as necessary. When I save my
personal page, I notice it removes the <?xml version> at the top, and compresses
the doctype, html, head, and title tags all into a single row. Additionally, the
</head> tag is compressed onto the line with the last <link> on it.

This makes the layout quite ugly, and very difficult to notice some of the tags.

Expected: whitespace formating would not be reworked.

Example is attached, as long lines can't be reproduced in a bugzilla comment.
sounds like a back end bug in nsWebBrowserPersist
Assignee: law → adamlock
Attached file before & after
Yep, all those newlines before <body> is verbatim.
Notice it also affects </body></html>

I am not sure what I can do about this. The XPFE is telling webbrowserpersist to
fixup and save the DOM document and it does so by feeding the loaded DOM tree
into the document encoder for saving. It is very likely that any of the original
formatting will be gone when it comes out the other end.

Regarding the <?xml> tag being stripped. This could be an issue with
webbrowserpersist as it stands. Internally it always tells the encoder to save
out as text/html and this may be causing the wrong encoder to be invoked,
causing odd nodes such as this to be stripped. I will changed the title of this
bug to cover that issue until I know for sure.
Summary: xml/doctype/html/head whitespace compressed in save as: complete web page → Webrowserpersist hardcodes text/html for saving DOM documents
Depends on: 110135
Adam, are you saying that the _DOM_ is saved?  So if the page contains
document.write() calls, the script is saved _and_ the written content is saved?
 So when loaded again the written content is present twice?
Actually, that exact thing happened to me the other day (on a differant page)
and I didn't think anything of it at the time.

Yep, all document.write contents was presented twice. I had 2 ad banners and 2
webtrends page counter/spy things.
Boris, yes I am. The double writing thing issue for embedding covered by bug
115328, but in Mozilla whatever code is telling the webbrowserpersist object to
save a DOM needs to pass it the original DOM one not the modified one.
Blocks: 115634
Bug 110135 is checked in. It should now be possible to specify encoding flags
and a mime type in the call to saveDocument to control how a document is saved.
FWIW, after Bens Save check-in last night, loading the saved page via: file and
saving again removes most of the extra blank lines. Still some places where the
formating changes though.
Webbrowserpersist still uses text/html as the default document encoder but
allows the caller to specify any content type they like.
Target Milestone: --- → Future
A combination of tests to figure out the correct content type for xml, html, xhtml

* QI for nsIDOMHTMLDocument
* Examine doctype to see if is named "html" and whether the public ID contains
  the substring "xhtml"

I am not sure how text documents are represented as a DOM, but if there is an
obvious test (e.g. check topmost node is a text node) then it should be possible
to handle that too.
How about using the nsIDOMNSDocument::contentType property?  That should be
somewhat more reliable than looking at the contents...

Text documents have the following DOM:  <html><body><pre>text</pre></body></html>
This patch uses nsIDOMNSDocument to obtain the content type of the document. It
then uses that content type to instantiate the proper encoder or falls back on
text/html if no encoder for the content type exists

I have noticed that even with the correct content type XML documents still
don't come out properly (e.g. missing <?xml version="1.0"?> at the top) so
there is likely an issue in the document encoder itself. But the persist object
is now doing the right thing.
Reviews please?
Attachment #68173 - Attachment is obsolete: true
r=brade if you remove the diskdoc lines
Keywords: helpwanted
Comment on attachment 69048 [details] [diff] [review]
Updated patch after recent upload changes with brade's comment addressed.

Also, this:

+    nsAutoString defaultContentType;
+    defaultContentType.Assign(NS_LITERAL_STRING("text/html"));

can be collapsed into a single init:

+    nsAutoString defaultContentType(NS_LITERAL_STRING("text/html"));
Attachment #69048 - Flags: superreview+
Attachment #69048 - Flags: review+
> +    nsAutoString defaultContentType(NS_LITERAL_STRING("text/html"));

Or better yet:

NS_NAMED_LITERAL_STRING(defaultContentType, "text/html");

Which avoids a string copy and all that.
Fix is in
Closed: 19 years ago
Resolution: --- → FIXED
over to embed qa for testing [load-balancing]...
QA Contact: sairuh → mdunn
If this fix is actually checked in, it does nothing to clear up what I
originally reported. Save as output in todays linux build is exactly the same as
in attachment 1 [details] [diff] [review].
Resolution: FIXED → ---
Fix is definitely checked in but I believe the XML encoder is busted. See comment 14
Jeremy, is your personal page served as text/html or with some XML mimetype?
Content-Type: text/html
<?xml version="1.0" encoding="US-ASCII"?>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII" />

When I used an XML content-type for the header at least, three of the six
browsers I tested wouldn't load it. Save-as shouldn't be dependent on this at
all, as far as the user is concerned.
> Save-as shouldn't be dependent on this at all, as far as the user is concerned.

Save HTML only is not.

Save Full page is, only because it saves the DOM.  The <? ?> syntax is
XML-specific and thus does not get encoded.  It's not even in the HTML DOM (not
being valid HTML), so it could not _possibly_ get encoded.  For XML pages, PIs
_are_ in the DOM.

This is a dom->text conversion issue
Assignee: adamlock → harishd
Component: File Handling → DOM to Text Conversion
QA Contact: mdunn → sujay
Summary: Webrowserpersist hardcodes text/html for saving DOM documents → whitespace formatting should not be reworked
Opened bug 127300 to specifically cover problems with the XML serializer.

I think this bug should be closed. It is probably an issue for Ben to decide
whether we should be using saveDocument to save XML files (except XHTML) since
there is nothing to fix up in them anyway. The saveURI method might be a better
Closed: 19 years ago19 years ago
Resolution: --- → FIXED
Reporter: please verify and mark verified-fixed...thanks!
I don't know what to verify. I still get MAJOR whitespace changes when saving my
page. If this bug morphed into something else, then whoever morphed it needs to
verify. My problem still exsists with 2002052109 Linux 1.0 branch.
You need to log in before you can comment on or make changes to this bug.