Closed
Bug 78061
Opened 24 years ago
Closed 24 years ago
DOM contains text nodes for line breaks that, according to SGML, should be ignored.
Categories
(Core :: DOM: HTML Parser, defect)
People
(Reporter: val, Unassigned)
Attachments
(1 file)
1.95 KB,
text/html
From Bugzilla Helper:
User-Agent: Mozilla/4.6 [en-gb]C-CCK-MCD NetscapeOnline.co.uk (Win98; I)
BuildID: 2001032319
According to HTML4.01 [1]:
"B.3.1 Line breaks
"SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately
following a start tag must be ignored, as must a line break immediately before
an end tag. This applies to all HTML elements without exception."
It is clear that such line breaks play no part in a conforming SGML document, except to
make the markup more readable, and one would therefore not expect them to be
included in the DOM.
Reproducible: Always
Steps to Reproduce:
The attached HTML document contains the following code:
<body>
<div onclick="showDOM()">Click here to see DOM</div>
</body>
The only line breaks occur immediately after a start tag, and immediately before
an end tag - so they should be ignored. However, Mozilla instead creates a text
node for each of them.
[1] http://www.w3.org/TR/html401/appendix/notes.html#notes-line-breaks
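For illustration only, here is a minimal Python sketch of the behaviour requested above. It uses the standard-library html.parser as a stand-in tokenizer (it is not Mozilla's parser), and the names TextCollector and text_runs are hypothetical. It assumes "\n" line breaks and drops at most one line break directly after a start tag and one directly before an end tag, which is one reading of B.3.1.

```python
from html.parser import HTMLParser

# The markup from the attached test case, as quoted in this report.
SOURCE = '<body>\n<div onclick="showDOM()">Click here to see DOM</div>\n</body>'


class TextCollector(HTMLParser):
    """Record start tags, end tags and character data in document order."""

    def __init__(self):
        super().__init__()
        self.events = []  # list of (kind, value) tuples

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def text_runs(source, apply_sgml_rule):
    """Return the text runs that would become text nodes.

    With apply_sgml_rule=True, a line break directly after a start tag or
    directly before an end tag is dropped (assumption: "\n" only).
    """
    parser = TextCollector()
    parser.feed(source)
    parser.close()
    events = parser.events
    runs = []
    for i, (kind, value) in enumerate(events):
        if kind != "text":
            continue
        if apply_sgml_rule:
            prev_kind = events[i - 1][0] if i > 0 else None
            next_kind = events[i + 1][0] if i + 1 < len(events) else None
            if prev_kind == "start" and value.startswith("\n"):
                value = value[1:]
            if next_kind == "end" and value.endswith("\n"):
                value = value[:-1]
        if value:  # an emptied run produces no text node at all
            runs.append(value)
    return runs


print(text_runs(SOURCE, apply_sgml_rule=False))
print(text_runs(SOURCE, apply_sgml_rule=True))
```

Without the rule, the two line breaks show up as separate text runs, which matches what is reported here. With the rule applied, only the div's own text remains.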
Regarding XML:
==============
Section 2.10, White Space Handling [2], says:
"In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines) to set apart the markup for greater readability.
Such white space is typically not intended for inclusion in the delivered
version of the document. On the other hand, "significant" white space that
should be preserved in the delivered version is common, for example in poetry
and source code.
"An XML processor must always pass all characters in a document that are not
markup through to the application. A validating XML processor must also inform
the application which of these characters constitute white space appearing in
element content.
"A special attribute named xml:space may be attached to an element to signal
an intention that in that element, white space should be preserved by
applications. etc..."
Section C XML and SGML (Non-Normative)[3] also makes it clear that:
"XML is designed to be a subset of SGML, in that every XML document should
also be a conforming SGML document. ..."
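To make the "must always pass all characters ... through to the application" requirement from section 2.10 concrete, here is a small sketch using Python's xml.dom.minidom, a non-validating processor chosen purely as an example (it says nothing about Mozilla's implementation). The sample markup is a simplified, well-formed version of the test case.

```python
from xml.dom import minidom

# Simplified version of the test case; well-formed as XML.
XML_SOURCE = "<body>\n<div>Click here to see DOM</div>\n</body>"

doc = minidom.parseString(XML_SOURCE)

# List each child of <body> as (nodeName, nodeValue).
children = [(node.nodeName, node.nodeValue)
            for node in doc.documentElement.childNodes]

for name, value in children:
    print(name, repr(value))
```

In this processor the line breaks after the start tag and before the end tag reach the DOM as "#text" nodes, on either side of the div element.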
Regarding XHTML:
================
It is also interesting to note that 3.5. XHTML Family User Agent Conformance[4]
says:
"The user agent must process white space characters in the data received from
the XML processor as follows:
- All white space surrounding block elements should be removed.
- Comments are removed entirely and do not affect white space handling. One
white space character on either side of a comment is treated as two white space
characters.
- When the 'xml:space' attribute is set to 'preserve', white space characters
must be preserved and consequently LINE FEED characters within a block must not
be converted.
- When the 'xml:space' attribute is not set to 'preserve', then:
- Leading and trailing white space inside a block element must be removed.
etc..."
Given that an XML document should also be a conforming SGML document, one might
reasonably assume that 'line feed characters' immediately after the start tag,
and immediately before the end tag, should be ignored in all instances,
irrespective of the 'xml:space' attribute.
Some clarification of the XHTML/XML specs might be required to determine the
right course here.
[2] http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-white-space
[3] http://www.w3.org/TR/2000/REC-xml-20001006.html#sec-xml-and-sgml
[4] http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/conformance.html#s_conform_user_agent
See also bug #78059, concerning the rendering of a line break immediately
following a start tag for an element with 'white-space:pre'.
Re HTML:
I completely agree. Note, though, that only linefeed characters are removed; no
other whitespace is affected. And, as you say, only linefeeds immediately
following a start tag or preceding an end tag.
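A short Python sketch of that distinction, as I read it: only a linefeed touching the tag is dropped, and spaces, tabs and further linefeeds stay. The helper name drop_adjacent_linebreak and its flags are hypothetical, and treating "\n" as the only line break is an assumption.

```python
def drop_adjacent_linebreak(text, after_start_tag, before_end_tag):
    """Remove at most one linefeed at each edge of a text run.

    after_start_tag / before_end_tag say whether the run directly
    follows a start tag or directly precedes an end tag.
    Other white space is never touched.
    """
    if after_start_tag and text.startswith("\n"):
        text = text[1:]
    if before_end_tag and text.endswith("\n"):
        text = text[:-1]
    return text


# Linefeeds touching the tags go; the inner spaces stay.
print(repr(drop_adjacent_linebreak("\n  text  \n", True, True)))
# A space sits between the start tag and the linefeed, so nothing is removed.
print(repr(drop_adjacent_linebreak(" \ntext", True, False)))
```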
Re XML:
Conformance with SGML for the markup has nothing to do with what is passed to
the DOM. No whitespace should be removed for generic XML.
Re XHTML:
The same applies as for XML: don't remove the linefeeds, except in the
instances where the entire whitespace node is removed.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Jonas -
The HTML problem that we seemed to agree upon was that, in order to conform to
SGML, the specified line breaks should not be passed to the DOM.
Re XML, you say that "Conformance with SGML for the markup has nothing to do
with what is passed to the DOM."
Why is this true for XML, but not for HTML?
Comment 5•24 years ago
*** This bug has been marked as a duplicate of 2750 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Updated•24 years ago
QA Contact: lchiang → stummala
Updated•15 years ago
Component: DOM: Abstract Schemas → HTML: Parser
OS: Windows 98 → All
QA Contact: stummala → parser
Hardware: x86 → All
Updated•15 years ago
Assignee: jst → nobody
QA Contact: parser → stummala