Open Bug 155475 Opened 22 years ago Updated 2 months ago

RFE: optionally store open-tag and close-tag text in a DOM node for easy round-tripping

Categories

(Core :: DOM: Core & HTML, enhancement, P5)

People

(Reporter: roc, Unassigned)

Attachments

(1 obsolete file)

Currently, loading a document into a DOM and then saving it does not exactly
preserve the source. This leads to a number of bugs, such as bug 129508, where
valid HTML gets converted to invalid HTML.

There are three kinds of source text: text that constitutes the opening of a tag,
text that constitutes the closing of a tag, and text that is neither ("the
content"). We already try to preserve the content text exactly. We don't
preserve tag-opening or tag-closing text, but we should. If we did, we could
roundtrip documents through the DOM and preserve a lot more of the exact source
text.

The proposal, then, is to AS AN OPTION store two additional strings in each DOM
node corresponding to a tag: the string that constituted the opening of the
node's tag, and the string that constituted the closing of the node's tag (or an
empty string if there was no closing tag). Any operation that changes the
attributes of the node can simply discard the string annotations. During
serialization we output these strings, if present, in lieu of the standard
generated open/close sequences.

If leading or trailing whitespace is treated as part of an open-tag or close-tag
sequence, that whitespace will need to be included in the open-tag or close-tag
string.
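
A minimal sketch of the proposal above, assuming a hypothetical Element class
with open_src/close_src fields (none of these names come from Gecko). It only
illustrates the three rules described: keep the source strings, discard them
when attributes change, and prefer them over generated tags when serializing.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Optional


@dataclass
class Element:
    """Hypothetical minimal element; not Gecko's actual node class."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Element or str (content)
    open_src: Optional[str] = None   # exact open-tag source text, if kept
    close_src: Optional[str] = None  # exact close-tag source ("" if none)

    def set_attribute(self, name: str, value: str) -> None:
        self.attrs[name] = value
        # Changing attributes invalidates the stored source text,
        # so the annotations are simply discarded.
        self.open_src = None
        self.close_src = None


def serialize(node) -> str:
    """Emit stored tag text when present, else generate standard tags."""
    if isinstance(node, str):
        return node
    if node.open_src is not None:
        open_text = node.open_src
    else:
        attrs = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in node.attrs.items()
        )
        open_text = f"<{node.tag}{attrs}>"
    inner = "".join(serialize(child) for child in node.children)
    if node.close_src is not None:
        close_text = node.close_src
    else:
        close_text = f"</{node.tag}>"
    return open_text + inner + close_text
```

With the annotations intact, a node parsed from `<P  CLASS=x>hi` (no close tag)
serializes back to exactly that text; after set_attribute the serializer falls
back to generated tags.
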
*** Bug 155474 has been marked as a duplicate of this bug. ***
I like this idea. I think we should use it for the editor. (We shouldn't use it
for the normal browser's save page feature, because that should just use the
cached copy.)
The only way I can see this being done is to add a hash (global or per document)
that maps content element to start tag string (I don't see why we'd need to
store the end tag as a string, just storing a bit that says whether there was
one or not should be enough, no?), and does so only when parsing for the editor.
The tricky part here is to make us not pay a performance hit on mutation of an
element in the case where we didn't build the DOM for the editor.
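
A rough sketch of the per-document hash idea, assuming hypothetical Document
and Element classes and a for_editor switch (all invented for illustration).
The table is created only when parsing for the editor, so other documents do
not hold start-tag strings at all.

```python
import weakref


class Element:
    """Hypothetical stand-in for a content element."""

    def __init__(self, tag: str) -> None:
        self.tag = tag


class Document:
    def __init__(self, for_editor: bool) -> None:
        # The table exists only for editor documents, so ordinary page loads
        # never keep start-tag strings in memory. Weak keys let entries go
        # away together with their elements.
        self.start_tags = weakref.WeakKeyDictionary() if for_editor else None

    def create_element(self, tag: str, start_tag_source: str) -> Element:
        element = Element(tag)
        if self.start_tags is not None:
            self.start_tags[element] = start_tag_source
        return element

    def start_tag_source(self, element: Element):
        """Return the recorded start-tag text, or None if nothing was kept."""
        if self.start_tags is None:
            return None
        return self.start_tags.get(element)
```
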
> (I don't see why we'd need to store the end tag as a string, just storing a
> bit that says whether there was one or not should be enough, no?)

I'm told that there are, in fact, many ways to write the end tag.

If we can steal a bit or two in the DOM node, then we can test that bit during
mutation and take the slow path only if the bit is set.
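
The flag-bit idea could look roughly like this. HAS_SOURCE_TEXT, the
source_text table, and remember_source_text are hypothetical names, not
existing Gecko flags or functions. Elements without the bit pay only a single
bit test on mutation.

```python
HAS_SOURCE_TEXT = 1 << 0  # hypothetical flag bit, not a real Gecko flag

# Side table from element to its original (open, close) tag text.
source_text = {}


class Element:
    """Hypothetical element with a small flags word."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.flags = 0
        self.attrs = {}

    def set_attribute(self, name: str, value: str) -> None:
        self.attrs[name] = value
        # Fast path: one bit test and no table lookup when the bit is clear.
        if self.flags & HAS_SOURCE_TEXT:
            self._drop_source_text()

    def _drop_source_text(self) -> None:
        # Slow path: forget the stored tag text and clear the bit.
        source_text.pop(self, None)
        self.flags &= ~HAS_SOURCE_TEXT


def remember_source_text(element: Element, open_src: str, close_src: str) -> None:
    """Record the original tag text and mark the element as annotated."""
    source_text[element] = (open_src, close_src)
    element.flags |= HAS_SOURCE_TEXT
```
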
The editor isn't the only case where we serialize DOM output to html.  We also
do on Save As (complete) from the browser.  Are we going to accept the loss of
this extra information in that case?
Save As Complete already screws the documents in many ways, we don't need to
worry about round-tripping in that case.
... and we will *not* pay the overhead of holding the start tag as a string in
memory for every document we load.
I'm working on making room for bits for this in our elements...
Bug 156364 will make it possible for someone (read "not me") to fix this.
Mass-reassigning bugs to dom_bugs@netscape.com
Assignee: jst → dom_bugs
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general
Assignee: general → nobody
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven’t been updated in more than 3 years and have no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5
Severity: normal → S3
Attachment #9384964 - Attachment is obsolete: true