Closed Bug 78061 Opened 24 years ago Closed 24 years ago

DOM contains text nodes for line breaks that, according to SGML, should be ignored.

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

VERIFIED DUPLICATE of bug 2750

People

(Reporter: val, Unassigned)

Attachments

(1 file)

From Bugzilla Helper:
User-Agent: Mozilla/4.6 [en-gb]C-CCK-MCD NetscapeOnline.co.uk  (Win98; I)
BuildID:    2001032319

According to HTML4.01 [1]:

"B.3.1 Line breaks
"SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception."

It is clear that such line breaks play no part in a conforming SGML, except to make the markup more readable, and one would therefore not expect them to be included in the DOM.

Reproducible: Always

Steps to Reproduce:
The attached HTML document contains the following code:

<body>
<div onclick="showDOM()">Click here to see DOM</div>
</body>

The only line breaks occur immediately after a start tag, and immediately before an end tag - so they should be ignored. However, Mozilla instead creates a text node for each of them.

[1] http://www.w3.org/TR/html401/appendix/notes.html#notes-line-breaks

Regarding XML:
==============
While the section 2.10 White Space Handling[2] says:

"In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.
"An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.
"A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. etc..."

Section C XML and SGML (Non-Normative)[3] also makes it clear that:

"XML is designed to be a subset of SGML, in that every XML document should also be a conforming SGML document. ..."

Regarding XHTML:
================
It is also interesting to note that 3.5. XHTML Family User Agent Conformance[4] says:

"The user agent must process white space characters in the data received from the XML processor as follows:
- All white space surrounding block elements should be removed.
- Comments are removed entirely and do not affect white space handling. One white space character on either side of a comment is treated as two white space characters.
- When the 'xml:space' attribute is set to 'preserve', white space characters must be preserved and consequently LINE FEED characters within a block must not be converted.
- When the 'xml:space' attribute is not set to 'preserve', then:
  - Leading and trailing white space inside a block element must be removed.
etc..."

Given that an XML document should also be a conforming SGML document, one might reasonably assume that 'line feed characters' immediately after the start tag, and immediately before the end tag, should be ignored in all instances, irrespective of the 'xml:space' attribute.

Some clarification of the XHTML/XML specs might be required to determine the right course here.

[2] http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-white-space
[3] http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-xml-and-sgml
[4] http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/conformance.html#s_conform_user_agent

See also bug #78059, concerning the rendering of a line break immediately following a start tag for an element with 'white-space:pre'.
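For illustration only, here is a minimal Python sketch of the HTML behaviour described in this report. It uses the standard-library html.parser, not Mozilla's parser, and it assumes the attached markup has its line breaks directly after <body> and directly before </body>. The names EventCollector, collect_events and SAMPLE are hypothetical. Like the DOM described above, this tokenizer hands each of those line breaks to the application as character data.

```python
from html.parser import HTMLParser

# Assumed layout of the attached test case: one line break after <body>
# and one before </body>. The exact placement is an assumption.
SAMPLE = '<body>\n<div onclick="showDOM()">Click here to see DOM</div>\n</body>'


class EventCollector(HTMLParser):
    """Records start tags, end tags and character data in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Whitespace-only runs arrive here too; nothing is filtered out.
        self.events.append(("data", data))


def collect_events(markup):
    """Tokenize markup and return the list of (kind, value) events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for kind, value in collect_events(SAMPLE):
        print(kind, repr(value))
```

With the assumed sample, two of the reported data events consist of a lone "\n", which corresponds to the extra text nodes this report is about.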
Re HTML: I completely agree. Note, though, that only linefeed characters are removed; no other whitespace is affected. And, as you say, only linefeeds immediately following a start tag or preceding an end tag.

Re XML: Conformance with SGML for the markup has nothing to do with what is passed to the DOM. No whitespace should be removed for generic XML.

Re XHTML: The same applies as for XML: don't remove the linefeeds, except in the instances where the entire whitespace node is removed.
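A rough sketch of the HTML rule as narrowed in the comment above: drop one line break directly after a start tag and one directly before an end tag, and leave all other whitespace alone. This is a regex approximation in Python, not how any real parser implements it. The function name strip_sgml_line_breaks and the patterns are hypothetical, and the sketch does not cope with ">" inside attribute values, comments, or CDATA sections.

```python
import re

# A single line break in any of the usual encodings.
_LINE_BREAK = r"(?:\r\n|\n|\r)"

# A start tag (name begins with a letter, so "</p>" and "<!--" are excluded)
# followed directly by one line break.
_AFTER_START_TAG = re.compile(r"(<[A-Za-z][^<>]*>)" + _LINE_BREAK)

# One line break followed directly by an end tag.
_BEFORE_END_TAG = re.compile(_LINE_BREAK + r"(?=</[A-Za-z])")


def strip_sgml_line_breaks(markup: str) -> str:
    """Remove only the line breaks adjacent to tags, as described above."""
    without_leading = _AFTER_START_TAG.sub(r"\1", markup)
    return _BEFORE_END_TAG.sub("", without_leading)


if __name__ == "__main__":
    print(repr(strip_sgml_line_breaks("<p>\nHello \n world\n</p>")))
    print(repr(strip_sgml_line_breaks("<p> \n text</p>")))
```

In the first call, only the line breaks touching the tags are dropped and the one in the middle of the text survives. In the second call, the line break is separated from the start tag by a space, so nothing is removed.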
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** Bug 78059 has been marked as a duplicate of this bug. ***
Jonas - The HTML problem that we seemed to agree upon was that, in order to conform to SGML, the specified line breaks should not be passed to the DOM. Re XML, you say that "Conformance with SGML for the markup has nothing to do with what is passed to the DOM." Why is this true for XML, but not for HTML?
*** This bug has been marked as a duplicate of 2750 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
QA Contact: lchiang → stummala
marking as verified dup
Status: RESOLVED → VERIFIED
Component: DOM: Abstract Schemas → HTML: Parser
OS: Windows 98 → All
QA Contact: stummala → parser
Hardware: x86 → All
Assignee: jst → nobody
QA Contact: parser → stummala