Closed Bug 118251 Opened 23 years ago Closed 23 years ago

whitespace formatting should not be reworked

Tracking

()

Status:

RESOLVED FIXED

Milestone:

Future

People

(Reporter: jmd, Assigned: harishd)

References

Details

Attachments

(2 files, 1 obsolete file)

before & after 23 years ago Jeremy M. Dolan 2.22 KB, text/plain		Details
Patch ensures correct document encoder is used for content type. 23 years ago Adam Lock 9.33 KB, patch		Details \| Diff \| Splinter Review
Updated patch after recent upload changes 23 years ago Adam Lock 9.01 KB, patch	kinmoz : review+ kinmoz : superreview+	Details \| Diff \| Splinter Review

Jeremy M. Dolan

Reporter

Description

•

23 years ago

I believe our goal for saving as "complete" (poor name, as it redoes links, etc)
should be to modify the original document as little as necessary. When I save my
personal page, I notice it removes the <?xml version> at the top, and compresses
the doctype, html, head, and title tags all into a single row. Additionally, the
</head> tag is compressed onto the line with the last <link> on it.

This makes the layout quite ugly, and very difficult to notice some of the tags.

Expected: whitespace formating would not be reworked.

Example is attached, as long lines can't be reproduced in a bugzilla comment.

Boris Zbarsky [:bzbarsky]

Comment 1

•

23 years ago

sounds like a back end bug in nsWebBrowserPersist

Assignee: law → adamlock

Jeremy M. Dolan

Reporter

Comment 2

•

23 years ago

Attached file before & after — Details

Yep, all those newlines before <body> is verbatim.

Jeremy M. Dolan

Reporter

Comment 3

•

23 years ago

Notice it also affects </body></html>

Adam Lock

Comment 4

•

23 years ago

I am not sure what I can do about this. The XPFE is telling webbrowserpersist to
fixup and save the DOM document and it does so by feeding the loaded DOM tree
into the document encoder for saving. It is very likely that any of the original
formatting will be gone when it comes out the other end.

Regarding the <?xml> tag being stripped. This could be an issue with
webbrowserpersist as it stands. Internally it always tells the encoder to save
out as text/html and this may be causing the wrong encoder to be invoked,
causing odd nodes such as this to be stripped. I will changed the title of this
bug to cover that issue until I know for sure.

Summary: xml/doctype/html/head whitespace compressed in save as: complete web page → Webrowserpersist hardcodes text/html for saving DOM documents

Adam Lock

Updated

•

23 years ago

Depends on: 110135

Boris Zbarsky [:bzbarsky]

Comment 5

•

23 years ago

Adam, are you saying that the _DOM_ is saved?  So if the page contains
document.write() calls, the script is saved _and_ the written content is saved?
 So when loaded again the written content is present twice?

Jeremy M. Dolan

Reporter

Comment 6

•

23 years ago

Actually, that exact thing happened to me the other day (on a differant page)
and I didn't think anything of it at the time.

Yep, all document.write contents was presented twice. I had 2 ad banners and 2
webtrends page counter/spy things.

Adam Lock

Comment 7

•

23 years ago

Boris, yes I am. The double writing thing issue for embedding covered by bug
115328, but in Mozilla whatever code is telling the webbrowserpersist object to
save a DOM needs to pass it the original DOM one not the modified one.

Boris Zbarsky [:bzbarsky]

Comment 8

•

23 years ago

Bug 118792 filed.

Jonas Jørgensen

Updated

•

23 years ago

Blocks: 115634

Adam Lock

Comment 9

•

23 years ago

Bug 110135 is checked in. It should now be possible to specify encoding flags
and a mime type in the call to saveDocument to control how a document is saved.

Jeremy M. Dolan

Reporter

Comment 10

•

23 years ago

FWIW, after Bens Save check-in last night, loading the saved page via: file and
saving again removes most of the extra blank lines. Still some places where the
formating changes though.

Adam Lock

Comment 11

•

23 years ago

Webbrowserpersist still uses text/html as the default document encoder but
allows the caller to specify any content type they like.

Target Milestone: --- → Future

sairuh (rarely reading bugmail)

Updated

•

23 years ago

Keywords: helpwanted

Adam Lock

Comment 12

•

23 years ago

A combination of tests to figure out the correct content type for xml, html, xhtml

* QI for nsIDOMHTMLDocument
* Examine doctype to see if is named "html" and whether the public ID contains
  the substring "xhtml"

I am not sure how text documents are represented as a DOM, but if there is an
obvious test (e.g. check topmost node is a text node) then it should be possible
to handle that too.

Boris Zbarsky [:bzbarsky]

Comment 13

•

23 years ago

How about using the nsIDOMNSDocument::contentType property?  That should be
somewhat more reliable than looking at the contents...

Text documents have the following DOM:  <html><body><pre>text</pre></body></html>

Adam Lock

Comment 14

•

23 years ago

Attached patch Patch ensures correct document encoder is used for content type. (obsolete) — Details — Splinter Review

This patch uses nsIDOMNSDocument to obtain the content type of the document. It
then uses that content type to instantiate the proper encoder or falls back on
text/html if no encoder for the content type exists

I have noticed that even with the correct content type XML documents still
don't come out properly (e.g. missing <?xml version="1.0"?> at the top) so
there is likely an issue in the document encoder itself. But the persist object
is now doing the right thing.

Adam Lock

Comment 15

•

23 years ago

Attached patch Updated patch after recent upload changes — Details — Splinter Review

Reviews please?

Attachment #68173 - Attachment is obsolete: true

Kathleen :Brade

Comment 16

•

23 years ago

r=brade if you remove the diskdoc lines

Keywords: helpwanted

kinmoz

Comment 17

•

23 years ago

Comment on attachment 69048 [details] [diff] [review]
Updated patch after recent upload changes

sr=kin@netscape.com with brade's comment addressed.

Also, this:

+    nsAutoString defaultContentType;
+    defaultContentType.Assign(NS_LITERAL_STRING("text/html"));


can be collapsed into a single init:

+    nsAutoString defaultContentType(NS_LITERAL_STRING("text/html"));

Attachment #69048 - Flags: superreview+

Attachment #69048 - Flags: review+

Boris Zbarsky [:bzbarsky]

Comment 18

•

23 years ago

> +    nsAutoString defaultContentType(NS_LITERAL_STRING("text/html"));

Or better yet:

NS_NAMED_LITERAL_STRING(defaultContentType, "text/html");

Which avoids a string copy and all that.

Adam Lock

Comment 19

•

23 years ago

Fix is in

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

sairuh (rarely reading bugmail)

Comment 20

•

23 years ago

over to embed qa for testing [load-balancing]...

QA Contact: sairuh → mdunn

Jeremy M. Dolan

Reporter

Comment 21

•

23 years ago

If this fix is actually checked in, it does nothing to clear up what I
originally reported. Save as output in todays linux build is exactly the same as
in attachment 1 [details] [diff] [review].

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Adam Lock

Comment 22

•

23 years ago

Fix is definitely checked in but I believe the XML encoder is busted. See comment 14

Boris Zbarsky [:bzbarsky]

Comment 23

•

23 years ago

Jeremy, is your personal page served as text/html or with some XML mimetype?

Jeremy M. Dolan

Reporter

Comment 24

•

23 years ago

Content-Type: text/html
<?xml version="1.0" encoding="US-ASCII"?>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII" />

When I used an XML content-type for the header at least, three of the six
browsers I tested wouldn't load it. Save-as shouldn't be dependent on this at
all, as far as the user is concerned.

Boris Zbarsky [:bzbarsky]

Comment 25

•

23 years ago

> Save-as shouldn't be dependent on this at all, as far as the user is concerned.

Save HTML only is not.

Save Full page is, only because it saves the DOM.  The <? ?> syntax is
XML-specific and thus does not get encoded.  It's not even in the HTML DOM (not
being valid HTML), so it could not _possibly_ get encoded.  For XML pages, PIs
_are_ in the DOM.

Kathleen :Brade

Comment 26

•

23 years ago

This is a dom->text conversion issue

Assignee: adamlock → harishd

Status: REOPENED → NEW

Component: File Handling → DOM to Text Conversion

QA Contact: mdunn → sujay

Summary: Webrowserpersist hardcodes text/html for saving DOM documents → whitespace formatting should not be reworked

Adam Lock

Comment 27

•

23 years ago

Opened bug 127300 to specifically cover problems with the XML serializer.

I think this bug should be closed. It is probably an issue for Ben to decide
whether we should be using saveDocument to save XML files (except XHTML) since
there is nothing to fix up in them anyway. The saveURI method might be a better
choice.

Status: NEW → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

sujay

Comment 28

•

22 years ago

Reporter: please verify and mark verified-fixed...thanks!

Jeremy M. Dolan

Reporter

Comment 29

•

22 years ago

I don't know what to verify. I still get MAJOR whitespace changes when saving my
page. If this bug morphed into something else, then whoever morphed it needs to
verify. My problem still exsists with 2002052109 Linux 1.0 branch.

You need to log in before you can comment on or make changes to this bug.