Currently, loading a document into a DOM and then saving it does not preserve the source exactly. This leads to a number of bugs, such as bug 129508, where valid HTML gets converted to invalid HTML. There are three kinds of source text: text that constitutes the opening of a tag, text that constitutes the closing of a tag, and text that is neither ("the content"). We already try to preserve the content text exactly. We don't preserve tag-opening or tag-closing text, but we should. If we did, we could round-trip documents through the DOM and preserve a lot more of the exact source text. The proposal, then, is to store, AS AN OPTION, two additional strings in each DOM node corresponding to a tag: the string that constituted the opening of the node's tag, and the string that constituted the closing of the node's tag (or an empty string if there was no closing tag). Any operation that changes the attributes of the node can simply discard the string annotations. During serialization we output these strings, if present, in lieu of the standard generated open/close sequences. If leading or trailing whitespace is treated as part of an open-tag or close-tag sequence, that whitespace will need to be included in the open-tag or close-tag string.
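To make the proposal concrete, here is a minimal sketch in Python rather than in the actual DOM code. Everything in it is an assumption made for illustration: the `Element` class, the `source_open`/`source_close` fields, and the `serialize` function are hypothetical names, not existing APIs. It shows only the three behaviours described above: keep the original tag text, discard it when an attribute changes, and prefer it over generated markup when serializing.

```python
import html
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Element:
    """Hypothetical DOM element; names are illustrative only."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)
    # Exact source text of the open tag, or None if not preserved.
    source_open: Optional[str] = None
    # Exact source text of the close tag; "" means the source had no
    # close tag, None means nothing was preserved.
    source_close: Optional[str] = None

    def set_attribute(self, name: str, value: str) -> None:
        self.attrs[name] = value
        # The stored text no longer matches the node, so discard it.
        self.source_open = None
        self.source_close = None


def serialize(node: Union[Element, str]) -> str:
    if isinstance(node, str):
        return node  # content text is stored and emitted unchanged
    if node.source_open is not None:
        open_text = node.source_open
    else:
        attrs = "".join(
            f' {name}="{html.escape(value, quote=True)}"'
            for name, value in node.attrs.items()
        )
        open_text = f"<{node.tag}{attrs}>"
    if node.source_close is not None:
        close_text = node.source_close
    else:
        close_text = f"</{node.tag}>"
    inner = "".join(serialize(child) for child in node.children)
    return open_text + inner + close_text


# An untouched node round-trips its original tag text, including the
# odd spelling and whitespace; a mutated node falls back to generated tags.
p = Element("p", {"class": "x"}, ["hi"],
            source_open="<P CLASS=x >", source_close="</P\n>")
untouched = serialize(p)
p.set_attribute("id", "a")
mutated = serialize(p)
```

The sketch ignores void elements and optional end tags in the generated fallback path; a real serializer would keep its existing rules there and consult the stored strings first.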
*** Bug 155474 has been marked as a duplicate of this bug. ***
I like this idea. I think we should use it for the editor. (We shouldn't use it for the normal browser's save page feature, because that should just use the cached copy.)
The only way I can see this being done is to add a hash (global or per document) that maps each content element to its start tag string (I don't see why we'd need to store the end tag as a string, just storing a bit that says whether there was one or not should be enough, no?), and to do so only when parsing for the editor. The tricky part here is to avoid paying a performance hit on mutation of an element when we didn't build the DOM for the editor.
> (I don't see why we'd need to store the end tag as a string, just storing a bit
> that says whether there was one or not should be enough, no?)
I'm told that there are, in fact, many ways to write the end tag. If we can steal a bit or two in the DOM node, then we can test that bit during mutation and take the slow path only if the bit is set.
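A rough sketch of how the side hash and the flag bit could fit together, again in Python and purely illustrative. The names `HAS_SOURCE_TEXT`, `Document.preserve_source`, `note_source`, and `source_tags` are all made up for this example and are not existing APIs. The point is that mutation costs one bit test unless the element actually has stored tag text, and that nothing is stored at all unless the document was parsed with the option turned on.

```python
from typing import Dict, Tuple

HAS_SOURCE_TEXT = 1 << 0  # hypothetical flag bit stolen from the node


class Element:
    """Hypothetical element carrying only a small flags word."""
    __slots__ = ("tag", "attrs", "flags")

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.attrs: Dict[str, str] = {}
        self.flags = 0


class Document:
    def __init__(self, preserve_source: bool = False) -> None:
        # Only an editor-style parse would pass preserve_source=True.
        self.preserve_source = preserve_source
        # Per-document side table: element -> (open text, close text).
        self.source_tags: Dict[Element, Tuple[str, str]] = {}

    def note_source(self, el: Element, open_text: str, close_text: str) -> None:
        """Called by the parser; a no-op unless preservation is enabled."""
        if not self.preserve_source:
            return
        self.source_tags[el] = (open_text, close_text)
        el.flags |= HAS_SOURCE_TEXT

    def set_attribute(self, el: Element, name: str, value: str) -> None:
        el.attrs[name] = value
        # Fast path: a single bit test when no source text is stored.
        if el.flags & HAS_SOURCE_TEXT:
            # Slow path: drop the stale strings and clear the bit.
            del self.source_tags[el]
            el.flags &= ~HAS_SOURCE_TEXT


editor_doc = Document(preserve_source=True)
browser_doc = Document()
a, b = Element("p"), Element("p")
editor_doc.note_source(a, "<P >", "</P >")
browser_doc.note_source(b, "<P >", "</P >")  # ignored: option is off
```

In this layout the per-element cost for a normal browser load is the flag bit alone, and the table stays empty.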
The editor isn't the only case where we serialize DOM output to HTML. We also do so on Save As (complete) from the browser. Are we going to accept the loss of this extra information in that case?
Save As Complete already screws up documents in many ways, so we don't need to worry about round-tripping in that case.
... and we will *not* pay the overhead of holding the start tag as a string in memory for every document we load.
I'm working on making room for bits for this in our elements...
Bug 156364 will make it possible for someone (read "not me") to fix this.
Mass-reassigning bugs to firstname.lastname@example.org