Closed Bug 26179 Opened 25 years ago Closed 22 years ago

Mozilla reports existence of phantom text nodes in the DOM

Categories

(Core :: DOM: Core & HTML, defect, P3)

Tracking

VERIFIED INVALID
Future

People

(Reporter: lori, Assigned: jst)

Details

(Keywords: dom1, testcase, Whiteboard: http://mozilla.org/docs/dom/technote/whitespace/)

Attachments

(2 files)

The following code, when opened in IE5, shows an almost-correct representation 
of the DOM (except for the SCRIPT and STYLE nodes having no children, and the 
comment being reported as a tag with tagName ! -- duh! ;). In M13, virtually 
every node is reported as having a text node for a child, even when none 
actually exists in the document. (Note all the Text:s followed by blank space 
rather than actual text, and you'll see what I mean.)

<html>
<head>
<title>Foo Document</title>

<script language="JavaScript">

var structWin;

function openStructWin(){
  structWin = window.open('','foo','width=600,scrollbars=yes,resizable=yes,menubar=yes');
  structWin.document.write('<' + 'html>\n<' + 'head>\n<' + 'title>Document Structure</title>\n<'
    + 'style type="text/css">\nPRE ' + '{font-size: 24pt}'
    + '\n</style>\n</head>\n\n<' + 'body bgcolor="#FFFFFF">\n\n<' + 'pre>');
}

function getNodeAndChildrenAsString(indentString,theNode){
    var structString = '';
    var nodeString;
    var theChildren = theNode.childNodes;
            
    // If theNode is an element node
    if (theNode.nodeType == '1'){
      nodeString = "<" + "b>" + theNode.tagName + "</b>";
    // Otherwise, if theNode is a text node
    }else if (theNode.nodeType == '3'){
      nodeString = "<" + "b>Text: </b>" + theNode.data;
    // In any other case (and here I'm assuming
    // that the only other case would be a
    // comment node)
    }else{
        nodeString = "<" + "b>Comment: </b>" + theNode.data;
    }

    
    structString += indentString + nodeString;
    structString += '\n';
    
    structWin.document.write(structString);
    
    for (var i=0; i < theChildren.length; i++){
      getNodeAndChildrenAsString(indentString + "  ", theChildren[i]);
    }        
}

function writeClosingTags(){
  structWin.document.write('<' + '/pre>\n\n<' + '/body>\n<' + '/html>');
}
</script>

<style type="text/css">
<!--
.red {  color: #FF0000}
-->
</style>

</head>

<body onLoad="openStructWin();getNodeAndChildrenAsString
('',document.documentElement);writeClosingTags()">
<table width="600" border="0" cellspacing="10" cellpadding="2">
<tr>
<th colspan="2" align="left">Flavors</th>
</tr>

<tr>
<td>grape</td>
<td>cherry</td>
</tr>

<tr>
<td>lemon</td>
<td>lime</td>
</tr>

<tr>
<td>orange</td>
<td>raspberry</td>
</tr>

</table>

<p>
That was a table, and this is a paragraph with some <b>bold text</b> in it.
</p>

<p>
This paragraph contains an image. <img src="images/sun.gif" width="100" 
height="100"> 
</p>

<!-- This is a comment before the text in the last paragraph. -->
<p>
This paragraph contains <b>bold text,</b> <code>code text,</code> and <span 
class="red">styled text.</span> 
</p>

</body>
</html>
The "phantom" text nodes are actually the newlines between the elements. The DOM 
spec says that these text nodes can (but don't have to) be preserved and 
represented in the DOM. We choose to do so, so that document roundtripping can 
occur. Since IE doesn't, it does unfortunately mean that you have to code around 
them. Specifically, scripts that use hardcoded childNode offsets will not work 
across browsers.
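A minimal sketch of coding around these nodes, assuming only the DOM Level 1 nodeType, data and childNodes properties (the helper names are hypothetical): filter out whitespace-only text nodes before indexing, so that an offset no longer depends on how the markup is indented.

```javascript
// DOM Level 1 node type constant for text nodes.
const TEXT_NODE = 3;

// True if node is a text node containing only XML whitespace
// (space, tab, CR, LF), i.e. one of the "phantom" nodes.
function isWhitespaceText(node) {
  return node.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(node.data);
}

// Hypothetical helper: childNodes minus whitespace-only text nodes,
// returned as a plain array so callers can index into it.
function significantChildNodes(parent) {
  const result = [];
  for (let i = 0; i < parent.childNodes.length; i++) {
    const child = parent.childNodes[i];
    if (!isWhitespaceText(child)) {
      result.push(child);
    }
  }
  return result;
}
```

With the testcase above, significantChildNodes(document.body)[0] should be the TABLE element whether or not the source has a newline after the <body> tag.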

We might consider getting rid of these small text nodes for efficiency reasons. 
For now, we've made the explicit decision to keep them. 
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Thanks for clarifying, Vidur -- that was the suspicion of one of the engineers 
here (Macromedia). This could also be what's causing the code in bug 26178 to 
act oddly, though I did check the contents of the nodes before sorting them to 
make sure I had the right ones. I'll look into it and update the status of 
26178 accordingly.
Status: RESOLVED → CLOSED
I'm reopening this one because the more I think about it, the more I think 
exposing whitespace as a text node is the wrong thing to do. Why? Because you 
can't do it consistently. Text nodes are invalid in between table rows and 
table cells, for example -- so the same newline that appears after a </td> 
cannot be represented as a text node the way it can be if it appears after a 
</p>. This harms the roundtripping argument, IMHO. Also, it seems to me that 
whitespace shouldn't affect the structure of a document; I consider an HTML 
document with no linebreaks to be structurally identical to one with all kinds 
of whitespace between tags. If newlines are represented as text nodes, the two 
documents would be structurally different.
Status: CLOSED → REOPENED
Resolution: INVALID → ---
After talking it over with Eric Krock, our decision is to Future this one. It 
means, unfortunately, that scripts need to take into account the possible 
existence of extra text nodes in element content. David Baron is probably going 
to write a set of utilities (at the very least, a tech note) to help with common 
operations. Lhylan is correct, though - we're not even including the text nodes 
consistently. 
Target Milestone: --- → Future
*** Bug 48560 has been marked as a duplicate of this bug. ***
Mass update of qa contact
QA Contact: gerardok → janc
*** Bug 62269 has been marked as a duplicate of this bug. ***
What is the DOM working group position on this? Vidur?
I've checked the DOM standard and it's not clear as to whether there should be
these phantom text nodes. I am running Windows 98 on a PC, so should the
OS/platform be updated on this bug?

IMHO, I can't see the reason for all these little text nodes. It just fills the
DOM with nonsense. The frustrating thing is the incompatibility with IE. I
thought the W3C was meant to clear this up. If Netscape and Microsoft are going
to differ on interpretation, then we are heading back to the bad old days of
browser incompatibility. Don't you guys ever talk to one another?
Tim: Not only do they talk to each other, they come up with the spec together...
The DOM spec doesn't really say, I don't think.  The SGML spec has some
provisions for ignoring newlines just inside start/end tags, but since we
already totally ignore what it says on whitespace (as have basically all
HTML browsers ever), I don't think we should take it too seriously.

Preserving the white space in the DOM is needed for things like
'white-space: pre' to work and for the editor to output HTML as it
was read in.  There are some notes on how to deal with these "extra"
text nodes at:
http://mozilla.org/docs/dom/technote/whitespace/
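The tech note above describes helper functions for this. As a rough sketch in the same spirit (these names are hypothetical, not necessarily the note's), first-child and next-sibling navigation can skip comments and whitespace-only text nodes:

```javascript
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// A node is treated as "ignorable" here if it is a comment or a text
// node made up solely of XML whitespace (space, tab, CR, LF).
function isIgnorable(node) {
  if (node.nodeType === COMMENT_NODE) return true;
  return node.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(node.data);
}

// Walk forward from node (inclusive) until a non-ignorable node is found.
function skipForward(node) {
  while (node && isIgnorable(node)) {
    node = node.nextSibling;
  }
  return node || null;
}

// Stand-ins for firstChild / nextSibling that skip ignorable nodes.
function firstSignificantChild(parent) {
  return skipForward(parent.firstChild);
}

function nextSignificantSibling(node) {
  return skipForward(node.nextSibling);
}
```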
Is bug 65658 a dup?
*** Bug 65658 has been marked as a duplicate of this bug. ***
Keywords: dom1
Component: DOM Level 1 → DOM Core
QA contact Update
QA Contact: janc → desale
Updating QA contact to Shivakiran Tummala.
QA Contact: desale → stummala
*** Bug 89782 has been marked as a duplicate of this bug. ***
I hope this can be taken care of before Moz 1.0...
Isn't this INVALID or WONTFIX? And if not, why not?
Whiteboard: WONTFIX?
*** Bug 104785 has been marked as a duplicate of this bug. ***
Attached file testcase
Keywords: testcase
OS: Mac System 8.6 → All
Hardware: Macintosh → All
Vidur, can you give an update on this? These "phantom" nodes are going 
to play havoc with folks writing DOM js code if they have to do a browser 
detect just to filter out all these empty text nodes, and things like 
childNodes are going to report very different numbers between IE and 
Mozilla.

Adding roger (dhtml apps perspective)
*** Bug 114749 has been marked as a duplicate of this bug. ***
*** Bug 118213 has been marked as a duplicate of this bug. ***
Newlines are translated into a space; is this a WONTFIX?

why? why wouldn't newlines be ignored between tags?
The more I think about this, by speaking with web app developers, and by working
on DOM samples, the more I think the current way to handle whitespace nodes is
flawed. It is however necessary to take a decision and resolve this bug before
1.0, because changing the way it works after 1.0 would be seriously bad for
backwards-compatibility. 
I think including whitespace nodes is clumsy for the following reasons:

1) It is not compatible with MSIE (and probably Opera).
2) The DOM code is heavily dependent on the markup (insert a new line somewhere
and your average childNodes[] or firstChild breaks)
3) Although I have no proof, building nodes for each whitespace node (about 2
per tag if your markup is readable) is probably a perf hit and a footprint hit
(although jst's recent mInner changes made this better)
4) It's highly non-intuitive for new developers. I find myself often wondering
why my .firstChild has no properties.
5) Those nodes carry no information (see below for exceptions)

These observations are based on my own experience, now nearly one year of
discussions with web developers, lurking in developer newsgroups and mailing
lists, and developing web apps myself. This is one of the most frequently asked
questions in the developer newsgroups and mailing lists.

The only good reasons to keep the whitespace nodes were given by David Baron in a
comment in this bug:
i) 'white-space: pre' currently needs this to work 
ii) the editor needs them to output HTML as it was read in.

This is perhaps not a good argument, but for ii), the editor already makes
enough of a mess of your source code not to worry about some extra new lines. As
far as I'm concerned it should put new lines automatically to make code more
readable :-P

Not sure about i), but perhaps layout has some "whitespace frames" that could be
used for the same purpose?

This bug escaped my attention in my quest for Mozilla 1.0 bugs, but now I
definitely include it.

Thanks for reading :-)
um, I think Sivakiran's missing the point. it's not about newlines being 
converted to spaces -- it's about N6 REPORTING THOSE SPACES AS TEXT NODES! 
because text nodes are not allowed between a TR and a TD, for example, it's 
impossible to apply the "whitespace = text node" rule consistently. so that's 
the logical argument for fixing this bug. Fabian posted a very eloquent version 
of the practical argument while I was busy changing my password. ;)
Which nodes are you proposing to eliminate?  Only the ones inside start-tags and
end-tags, or all of the ones you can (whatever that means)?  If the former, how
do you plan to handle the childNodes array for something like:

<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

I think the DOM benefits of such an approach are minimal.

If the latter, how do you plan to ensure that <span>two</span>
<span>words</span> aren't merged together?

Which does IE do?

In all cases, how do you plan to keep 'white-space: pre' working?
well how about making the actual (unparsed) source text of the document
accessible and then having the parser provide the line/column position of every
node/token inside it, so that you can locate the necessary portions of text.

not only can you get 'white-space: pre' working this way, but you can also fix
View Source not displaying the actual source of the document.
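A sketch of the bookkeeping this idea needs, under the assumption that the parser records a character offset for each node in the raw source (no such offsets are exposed today): converting an offset to a 1-based line/column pair is a single scan. The function name is hypothetical.

```javascript
// Convert a character offset into the raw document source into a
// 1-based line/column pair, the kind of position a parser could
// record for each node.
function offsetToLineColumn(source, offset) {
  if (offset < 0 || offset > source.length) {
    throw new RangeError("offset out of range");
  }
  let line = 1;
  let lineStart = 0; // offset of the first character of the current line
  for (let i = 0; i < offset; i++) {
    if (source[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}
```

Given start and end offsets for a node, source.slice(start, end) would return its exact original text, whitespace included.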
For removal of the whitespace text nodes that are not allowed to work, one
would have to check the DTD of the SGML and/or XML document. Is this
information available from the SGML/XML parser?

I like Alexey Chernyak's idea about adding the line and column position to the 
elements/nodes. This would allow the composer to maintain the layout without 
actually inserting text nodes in the DOM tree where they are not allowed.
I believe, although I'm not entirely certain, that DOM-2 Range covers resolution
down to the individual character.  I do know Mozilla supports DOM-2 Range.
the real pain for web developers is that stuff like previousSibling etc. get
practically useless... or you have to call helper functions every time you want
to use it. just stumbled across it yesterday:

<div id="div1"></div>
<div id="div2"></div>

!=

<div id="div1"></div><div id="div2"></div>

and the stupid thing is, codewise i want to have version 1., because everything
else is unreadable. js-wise i want to have version 2., because there is no
whitespace in it. but still i can't be sure that no one changes it (including my
unaware self) at a later time and breaks the code.

so what about (uuuh, i know this would have to go through the standardisation
process) making previousSibling etc. a function like
HTMLElement.previousSibling(boolean skipWhitespace) or even
HTMLElement.previousSibling(boolean skipTextNodes).
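The proposed method does not exist in any DOM specification; a rough approximation as a plain function (hypothetical name, with the two proposed flags as options) could look like this:

```javascript
const TEXT_NODE = 3;

// Hypothetical stand-in for the proposed previousSibling(skipWhitespace)
// and previousSibling(skipTextNodes); a plain function, since no such
// method exists on DOM nodes.
function previousSiblingSkipping(node, { skipWhitespace = true, skipTextNodes = false } = {}) {
  let sib = node.previousSibling;
  while (sib) {
    const isText = sib.nodeType === TEXT_NODE;
    const isWs = isText && !/[^\t\n\r ]/.test(sib.data);
    // Return the first sibling that neither flag asks us to skip.
    if (!(isText && skipTextNodes) && !(isWs && skipWhitespace)) {
      return sib;
    }
    sib = sib.previousSibling;
  }
  return null;
}
```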

another possibility would be to distinguish between whitespace and newlines, and
skip all the newlines. as far as i understand this would allow white-space:pre
to work. newlines between tags are always just for source code readability, right?
I wonder if DOM Traversal (TreeWalker (and NodeIterator, once we have it))
would be the right way to work around the problem for scripts.
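A sketch of that approach, assuming a browser with DOM-2 Traversal: a NodeFilter object whose acceptNode rejects whitespace-only text nodes. The numeric fallbacks are the spec values of FILTER_ACCEPT (1) and FILTER_REJECT (2), used only where NodeFilter is not defined; the name skipWhitespaceFilter is hypothetical.

```javascript
// NodeFilter constants from DOM Level 2 Traversal, with numeric
// fallbacks for environments that do not define NodeFilter.
const FILTER_ACCEPT = typeof NodeFilter !== "undefined" ? NodeFilter.FILTER_ACCEPT : 1;
const FILTER_REJECT = typeof NodeFilter !== "undefined" ? NodeFilter.FILTER_REJECT : 2;

// A NodeFilter that hides whitespace-only text nodes from a TreeWalker.
const skipWhitespaceFilter = {
  acceptNode(node) {
    const isWsText = node.nodeType === 3 && !/[^\t\n\r ]/.test(node.data);
    return isWsText ? FILTER_REJECT : FILTER_ACCEPT;
  },
};
```

In a browser this would be passed as document.createTreeWalker(document.body, NodeFilter.SHOW_ALL, skipWhitespaceFilter); the walker's firstChild() and nextSibling() then never stop on phantom nodes. DOM Level 2 also defines a fourth, entityReferenceExpansion argument.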
The problem of whitespace within tables and stuff is a validation issue. Mozilla
doesn't validate, so we shouldn't bother. And I read
http://www.w3.org/TR/2001/WD-DOM-Level-2-HTML-20011210/html.html#ID-6986576
to indicate that one should use .cells to get to the td or th. (If that one
would contain text nodes, that'd be bah.)
Is anything moving with regard to this bug? A solution is really important to
some of the stuff I am trying to do, so I hope so. Also, does anyone know if
there is any movement toward a fuller implementation of the DOM-2 Traversal-Range module?
nominating nsbeta1  - can we ship with all this cruft in the dom?
Keywords: nsbeta1
I still think this should be WONTFIX.  Furthermore, nobody's actually proposed
how to fix it -- that requires answering the questions in comment 29.
Maybe we should do some sort of check for "white-space: pre" before removing
superfluous white space.  There cannot be so many elements which have that
natively.  Likewise, someone would have to set a style attribute to contain that
phrase before it would take effect on an element that didn't default to that.
The DOM doesn't currently know anything about the style system, and it would be
a major architectural change if it did.  (How could we resolve style without
having DOM nodes to resolve the style on?)
:(

This is a lot like math classes on scientific notation:  which "0" digits are
significant?  000025.7000...

My opinion is, if it renders, include it.  If it doesn't render or isn't meant
to render, exclude it.

For HTML, white space between <head> and <body> tags doesn't render.  White
space between <pre> and </pre> tags does.  White space between any two tags in
general, if there is at least one space or tab between them, we typically treat
as a single space.  (We have &nbsp; for when we want a space.)  
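A minimal sketch of the collapsing rule described above, approximating what CSS white-space: normal does when text is rendered (it does not model trimming at line starts and ends; the function name is hypothetical). The DOM keeps the raw characters and only the rendering collapses them, which is why a whitespace node between two inline elements still renders as one space rather than disappearing.

```javascript
// Collapse each run of HTML whitespace (space, tab, LF, CR, FF) to a
// single space, approximating how white-space: normal renders text.
// Trimming at line starts/ends done by real layout is not modelled.
function collapseWhitespace(text) {
  return text.replace(/[ \t\n\r\f]+/g, " ");
}
```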

I haven't even begun to think about white space implications for XML DOM.

What if we did a process late in the game, before any event handlers or
scripting take over but after styling, of cleaning up the nodes?
David, the latest spec very clearly defines how white space characters should be
treated:
http://www.w3.org/TR/2001/WD-xhtml1-20011004/#uaconf

Here are relevant extracts out of that document for every-day pages:

...snip...
* All white space surrounding block elements should be removed.
* Comments are removed entirely and do not affect white space handling. One
white space character on either side of a comment is treated as two white space
characters.
...snip...
* Leading and trailing white space inside a block element must be removed.
* A sequence of white space characters without any LINE FEED characters must be
reduced to a single SPACE character.
* A sequence of white space characters with one or more LINE FEED characters
must be reduced in the same way as a single LINE FEED character.
...snip...
* The LINE FEED character must be converted into a SPACE character.
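Applied to the text content of a single block element, the string-level rules quoted above reduce to a few regular-expression passes. This is only a sketch with a hypothetical function name: it assumes the XML whitespace set (space, tab, CR, LF) and leaves out the block-boundary and comment rules, which need the surrounding tree.

```javascript
// Apply the quoted string-level white space rules to the text content
// of one block element. Block-boundary and comment rules are omitted.
function applyXhtmlWhitespaceRules(text) {
  return text
    // Leading and trailing white space inside a block element is removed.
    .replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, "")
    // A run containing a LINE FEED is reduced like a single LINE FEED.
    .replace(/[ \t\r\n]*\n[ \t\r\n]*/g, "\n")
    // A run without a LINE FEED is reduced to a single SPACE.
    .replace(/[ \t\r]+/g, " ")
    // Finally, each remaining LINE FEED is converted into a SPACE.
    .replace(/\n/g, " ");
}
```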



This very well answers your question:
<span>two</span><span>words</span> should be merged together.
while in:
<span>two</span>
<span>words</span>
LINE FEED and any other white space characters in between should be replaced
with a SPACE character.

This is a very straightforward spec. And we should be thankful to W3C for
defining it so clearly.

As for 'white-space: pre', see comment 30
Here's an example of what a tree traversal tool like DOM Inspector sees when
it looks at a website's DOM: 
(see image)
 #text nodes all over the place...
Leading and trailing newlines of a textnode should be skipped. Isn't it as
simple as that?

if i write

<tag1>

</tag1>

i obviously want a newline there. not so for 

<tag1>
</tag1>

if i write:

<pre>
 Some text here.

 Some on another line.
</pre>

i want the data of the text node to be == " Some text here.\n\n Some on another
line."
That's not how it works.

The <pre>...</pre> element means "preformatted text".  That means the browser
must copy the text to the screen, character for character.  Including the
newlines immediately adjoining the tag.  The exceptions are additional markup
tags, such as <em>...</em>, which would still apply inside the preformatted text
tags.

Whitespace inside <pre>...</pre> tags is always considered "significant",
unconditionally.  The preformatted text element is a block-level element, like the <p>...</p>
element.  Typically people do not add newline characters immediately following
the opening tag of either one, but if they do for the preformatted text element,
the browser must assume that's intentional.
As Alexey points out, the W3C recommendation is very clear on how white space
should be treated. 
Moving to jst@netscape.com's bug list. Apologies for letting it languish on mine.
Assignee: vidur → jst
Status: REOPENED → NEW
From jscript@pacbell.net aka WeirdAl
> Maybe we should do some sort of check for "white-space: pre" before removing
> superfluous white space. 

not so easy: what if I dynamically set an element's CSS white-space property to pre?
Parsing should not depend on the stylesheet I think.

From alexey@ihug.com.au
> David, latest spec very clearly defines how white space characters should be
> treated:
> http://www.w3.org/TR/2001/WD-xhtml1-20011004/#uaconf

This will only apply to xhtml served as xml when the xml:space attribute of the
element is not set to preserve. I assume that <pre> in xhtml has the xml:space
attribute set to preserve.

If the white spaces are removed in HTML based on what that doc mentions, but
without a way to revert to the current behavior, we have effectively removed
support for the css property white-space: pre; in html.

From jscript@pacbell.net aka WeirdAl
> What if we did a process late in the game, before any event handlers or
> scripting take over but after styling, of cleaning up the nodes?

Again, not so easy: what if I dynamically set an element's CSS white-space
property to pre?
As a first pass, how about just deleting #text nodes whose content is
*entirely* white space (incl. newline)?  That should eliminate most of
the problem with the phantom nodes in the DOM tree, but still leave
the information that "white-space: pre" needs.
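A sketch of this proposal on a plain-object model of the tree (not live DOM nodes, which would need removeChild; the function names are hypothetical) shows the catch raised in the reply that follows: inline content loses the spaces between its words.

```javascript
const TEXT_NODE = 3;

function isWhitespaceText(node) {
  return node.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(node.data);
}

// Return a copy of the tree with every whitespace-only text node dropped,
// which is what the proposal above would do. The input is not modified.
function withoutWhitespaceText(node) {
  if (!node.childNodes) return node;
  return {
    ...node,
    childNodes: node.childNodes
      .filter((child) => !isWhitespaceText(child))
      .map(withoutWhitespaceText),
  };
}

// Concatenate all descendant text, similar to DOM Level 3 textContent.
function textOf(node) {
  if (node.nodeType === TEXT_NODE) return node.data;
  return (node.childNodes || []).map(textOf).join("");
}
```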
> how about just deleting #text nodes whose content is *entirely* white space

<p><em>This</em> <em>should</em> <em>have</em> <em>spaces</em></p>
and it does in IE:

HTML
  HEAD
    TITLE
    SCRIPT
    STYLE
  BODY
    P
      EM
        Text: This
      Text:  
      EM
        Text: should
      Text:  
      EM
        Text: have
      Text:  
      EM
        Text: spaces

In N6, the same text nodes are there, but so are several others:

HTML
  HEAD
    TITLE
      Text: Foo Document
    Text:
    SCRIPT
      Text:
   var structWin;

function openStructWin(){
  structWin = window.open
('','foo','width=600,scrollbars=yes,resizable=yes,menubar=yes');
  structWin.document.write('<' + 'html>\n<' + 'head>\n<' + 'title>Document 
Structure\n<' + 'style type="text/css">\nPRE ' + '{font-size: 24pt}' 
+ '\n\n\n\n<' + 'body bgcolor="#FFFFFF">\n\n<' + 'pre>');
}

function getNodeAndChildrenAsString(indentString,theNode){
    var structString = '';
    var structString = '';
    var nodeString;
    var theChildren = theNode.childNodes;
            
    // If theNode is an element node
    if (theNode.nodeType == '1'){
      nodeString = "<" + "b>" + theNode.tagName + "";
    // Otherwise, if theNode is a text node
    }else if (theNode.nodeType == '3'){
      nodeString = "<" + "b>Text: " + theNode.data;
    // In any other case (and here I'm assuming
    // that the only other case would be a
    // comment node)
    }else{
        nodeString = "<" + "b>Comment: " + theNode.data;
    }

    
    structString += indentString + nodeString;
    structString += '\n';
    
    structWin.document.write(structString);
    
    for (var i=0; i < theChildren.length; i++){
      getNodeAndChildrenAsString(indentString + "  ", theChildren[i]);
    }        
}

function writeClosingTags(){
  structWin.document.write('<' + '/pre>\n\n<' + '/body>\n<' + '/html>');
}

    Text:


    
    STYLE 
      Text:    
      

  Text:   

  BODY 
    Text: 
  
    P 
      EM 
        Text: This
      Text: 
      EM 
        Text: should
      Text: 
      EM 
        Text: have
      Text: 
      EM 
        Text: spaces
      Text:   
Guess I should have said what the point of that little demonstration was... 

My point was: how is IE doing it? I'm sorry I can't propose a solution, since I 
don't know how any of the code works, but it seems like IE must be following 
rules. That is, "whitespace is allowed here, but not here" or some such. I 
would guess that one of the rules in play in the example above is that 
whitespace between tags within a paragraph is significant and must be 
preserved as text nodes. Others might be: whitespace between block-level tags 
is not significant, and therefore need not be preserved as text nodes; text 
nodes between TR and TD tags are not allowed, and therefore whitespace between 
TR and TD tags is not preserved as text nodes. 
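The rules guessed at above could be expressed roughly as follows. The tag lists and function names are hypothetical and heavily abbreviated, and treating a missing sibling (the parent's start or end tag) as a block boundary is an assumption; nothing here is taken from how IE actually works.

```javascript
// Hypothetical, abbreviated tag lists; a real implementation would
// derive these from the HTML 4 DTD rather than hard-coding them.
const NO_TEXT_CONTENT = new Set(["TABLE", "THEAD", "TBODY", "TFOOT", "TR", "UL", "OL"]);
const BLOCK_LEVEL = new Set(["P", "DIV", "TABLE", "UL", "OL", "PRE", "H1", "H2", "BLOCKQUOTE", "HR"]);

function isBlock(node) {
  return node != null && node.nodeType === 1 && BLOCK_LEVEL.has(node.tagName);
}

// Decide whether a whitespace-only text node becomes a DOM node:
//  - never where the parent may not contain text (e.g. between TR and TD);
//  - not when it only separates block-level elements (a missing sibling,
//    i.e. the parent's own start/end tag, counts as a boundary here);
//  - otherwise keep it, since it separates inline content.
function keepWhitespaceNode(parentTag, prev, next) {
  if (NO_TEXT_CONTENT.has(parentTag)) return false;
  const prevIsBoundary = prev == null || isBlock(prev);
  const nextIsBoundary = next == null || isBlock(next);
  return !(prevIsBoundary && nextIsBoundary);
}
```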

Maybe this is just really hard, or not the way you've tackled this thus far? As 
an engineer on another product, I can't stand it when people who've never seen 
the codebase say, "this should be so easy to implement!", so I won't. ;)
You know, I've looked at this, and I've changed my mind.

I said:  
>My opinion is, if it renders, include it.  If it doesn't render or isn't meant
>to render, exclude it.

But that's a cheap way to approach a fundamental question.  A document's
rendering does not necessarily correspond to how the DOM views it.  Does the DOM
see a <!DOCTYPE > tag?  Yes.  Does the user?  No.

We're dancing around the issue.  Should there or should there not be whitespace
text nodes in the DOM?  We haven't yet figured that out.

Frankly, when I restrict my perspective to that specific question, I am forced
to give those whitespace text nodes the benefit of the doubt.  What reason do we
really have to take them out?  IE's behavior, as we are all well aware, is not
necessarily the correct behavior.  Nor should convenience always dictate what
we require of Mozilla 1.0.  Sloppy coding by users is what brought people to
condemn Netscape 6.0, because Mozilla and Netscape stopped supporting layers.

I recommend WONTFIX or INVALID.
There is no question on which nodes and where to delete. The spec is very clear
on that. No point in discussing it. The question at hand is how to get CSS
property 'white-space: pre' to work after white spaces were removed from DOM.

>We're dancing around the issue.  Should there or should there not be whitespace
>text nodes in the DOM?  We haven't yet figured that out.

The way I see it, DOM is a mechanism for describing a document structure (HTML).
And HTML specification defines the rules for structure of HTML documents. The
structure described by our DOM violates those rules, so it basically can't be
called an HTML DOM.

We need a different solution for handling 'white-space: pre' from the one we
have now. Also we need different solution for ViewSource, which right now relies
on whitespaces in DOM and doesn't show the actual source of the document! 2
possible approaches are:

1. Storing discarded white space information so it can be used by "pre" or
ViewSource. This will keep ViewSource working, but will not help to show the
*actual* source.
2. Ability to retrieve and use raw portions of code. This is preferable for
ViewSource. However coordinates would have to be preserved for each element for
ViewSource colouring to work. This also would involve re-parsing for "pre", for
it may contain other tags inside of it.

The second approach is more favourable, but looks harder.

So, is this feasible?
Fixing this bug and ViewSource before 1.0 release would be really awesome.
A few comments on the XHTML spec link Alexey posted:

1)  The line in question ("All white space surrounding block elements should be
    removed.") is a "should" not a "must".
2)  By "block" I presume it means things that are declared to be blocks in the
    DTD?  That is, at http://www.zbarsky.org:8000/~bzbarsky/domTest.html the first
    two "Some text" occurrences should be on one line with no space between them
    while the second two should be on two separate lines?
In reply to comment 41: The XHTML spec is on crack. See my post in www-talk:
   http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0141.html

I was asked to make a "definitive standards statement". My opinions are my own,
and are thus not normative or anything, but: I would say this bug is a WONTFIX.
In fact I put a comment to that effect in the status whiteboard last July.
Web authors should read:
   http://www.mozilla.org/docs/dom/technote/whitespace/

I'm going to mark this WONTFIX because bz gave me the go-ahead to do so. :-)
Status: NEW → RESOLVED
Closed: 25 years ago → 22 years ago
Resolution: --- → WONTFIX
To clarify the go-ahead part....  The XHTML specification in question is a
Working Draft.  As such it is most definitely not final.

If one implements what that draft currently says then no strings anywhere in the
DOM would ever have newlines in them; an obviously ridiculous proposition when
one considers the contents of textareas or <pre> elements.  The comment handling
recommended in the draft is also completely bogus -- all the comments would be
gone from the DOM....

I feel fairly certain that this specification's whitespace handling will be
amended before it gets to be a recommendation.  (At least I hope so for the sake
of the sanity of the CSS and DOM specification authors).
Actually, the XHTML specification in question is a REC.
   http://www.w3.org/TR/xhtml1/
Boris, XHTML *is* already a recommendation. It is final. I have provided a link
to the Second Edition draft, which is much more specific on white space handling
than the first edition. The Second Edition doesn't change any of these rules; it
just makes them clearer and more precise. You can have a look at the non-draft
XHTML Recommendation as well:
http://www.w3.org/TR/xhtml1/#uaconf

Ian, Boris, are you ready to say that

<div>
</div>     and     <div></div>

are 2 completely different documents DOM-wise? If you are, I'll go with that.
But with my limited knowledge of DOM spec I am really uncomfortable with that
statement.
alexey: Yes, they are different. IMHO. Why wouldn't they be?

Regarding the XHTML spec: Like I said above, it's on crack. It is totally out of
line for the XHTML working group to be laying down rules on how the parser and
the DOM should interact when handling XML. There is no way, IMHO of course, that
we should special-case different namespaces' white-space handling.
Sorry to drag this one up again, but I think that it is very important and I
have a new suggestion. 

Would it be possible to somehow correlate the document with its DTD? If #PCDATA
is not valid in any given element then the children of all elements of this type
could be examined and, if one of its child nodes is an (all whitespace) text
node, it could be removed. In this way at least the 'illegal' text nodes could
be removed so that they don't turn up between </td> <td> etc. These whitespace
text nodes are only there because of the formatting of the source code, and have
nothing to do with whitespace: pre. Of course 

<div>
</div>     and     <div></div>   etc.

would remain different (as they should IMHO) and whitespace: pre would not be
broken which seems to be the main concern. 

Please consider this or some other means of getting rid of at least the 'illegal'
whitespace text nodes. My concern is that leaving these text nodes in the DOM
after Moz1.0 will also mess things up for scripts working on non-HTML documents
such as SVG. Helper functions are all very well, but they are a messy workaround.
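A sketch of the DTD-based idea, on a plain-object model of the tree. The set of element types whose content models allow no #PCDATA is written out by hand here as a partial HTML 4.01 list (an assumption; the proposal would read it from the DTD), and the function names are hypothetical.

```javascript
// Some HTML 4.01 element types whose DTD content models contain no
// #PCDATA; a partial, hand-written list.
const NO_PCDATA = new Set([
  "HTML", "HEAD", "TABLE", "THEAD", "TBODY", "TFOOT", "TR",
  "COLGROUP", "UL", "OL", "DL", "SELECT", "OPTGROUP",
]);

function isWhitespaceText(node) {
  return node.nodeType === 3 && !/[^\t\n\r ]/.test(node.data);
}

// Return a copy of the tree in which whitespace-only text children are
// dropped, but only under elements that cannot legally contain text.
// Everything else (e.g. <div>\n</div>) keeps its whitespace node.
function dropIllegalWhitespace(node) {
  if (!node.childNodes) return node;
  const illegalHere = node.nodeType === 1 && NO_PCDATA.has(node.tagName);
  const kids = node.childNodes
    .filter((child) => !(illegalHere && isWhitespaceText(child)))
    .map(dropIllegalWhitespace);
  return { ...node, childNodes: kids };
}
```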
This controversy could be dealt with effectively (for the XML DOM, at least) by
implementing a validating XML parser in Mozilla.  Here is the reasoning:

A validating parser would be able to read DTDs.  This would allow us to
determine which elements have xml:space="preserve" in which whitespace *must* be
preserved; we could then change our application convention for "default" to
condense whitespace.  This would mean that xml:space="preserve" elements
retained whitespace within them in the DOM, and that it would be condensed or
stripped away in xml:space="default" elements.  We would also set white-space:
pre on these elements in our html.css, so that they would *display* without
space.  The disadvantage, obviously, is that white-space: pre on other elements
would have no effect.  However, document authors wishing to make use of this
could use an external DTD subset to add the xml:space attribute to any such
elements, which would cause the parser to preserve spaces in them as well, which
they could then reflect or not reflect in display through whitespace: pre.
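A sketch of the lookup this proposal relies on, with hypothetical function names. It assumes DTD-defaulted xml:space values have already been applied as ordinary attributes (which a validating parser would do), and uses the XML 1.0 rule that xml:space applies to an element's descendants until another xml:space overrides it.

```javascript
// Effective xml:space value ("preserve" or "default") for a node, found
// by walking up the ancestor chain until an element specifies one.
function effectiveXmlSpace(node) {
  for (let n = node; n; n = n.parentNode) {
    if (n.nodeType === 1 && typeof n.getAttribute === "function") {
      const value = n.getAttribute("xml:space");
      if (value === "preserve" || value === "default") return value;
    }
  }
  return "default";
}

// Under the convention proposed above, a whitespace-only text node would
// be kept only where the effective xml:space is "preserve".
function shouldKeepWhitespace(textNode) {
  return effectiveXmlSpace(textNode) === "preserve";
}
```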
*** Bug 131169 has been marked as a duplicate of this bug. ***
Actually we can't do any of this at all anyway, otherwise you lose
round-tripping of white-space in document source. e.g. around <table> tags. This
is important for various reasons, e.g. the "generated source" bookmarklet.
*** Bug 144603 has been marked as a duplicate of this bug. ***
*** Bug 147489 has been marked as a duplicate of this bug. ***
WONTFIX?? Are you serious?

My bug report has been marked as a duplicate, but please have a look at it
because I've given suggestion.  It's bug 147489.

If you don't want to look at it, I can summarize what I said:

I don't care about how IE treats it, and I don't care about XHTML.  I just use
the (old) HTML 4.01 and DOM1.  Look at this section in W3C DOM specification:
http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/introduction.html#ID-E7C30821

By looking at the tree inside the figure, I can surely say that those guys at the
W3C implicitly agree that there are no #text siblings for block-level elements. 
HTML 4 is very clear about block-level elements and in-line elements.

So the suggestion is simple:
eliminate block-level elements' whitespace siblings,
but preserve in-line elements' whitespace siblings.
*** Bug 147487 has been marked as a duplicate of this bug. ***
Since you have obviously not read this bug... "What happens when the element's 
style is changed from block to inline?"

The problem is that HTML has this concept of block-level elements that is 
totally divorced from the CSS concept of block elements.  Thing is, Mozilla is 
a CSS browser, not an HTML browser.  So if we have to pick which concept 
of "block" we go with we sort of have to choose the CSS one.
Boris,
I've read through about 80% of this whole bug before writing my comment, but
maybe I missed something. So, which comment does your question refer to?
Re comment #54 (which is in response to comment #41):
In HTML 4 spec, there's a section on conformance (2nd paragraph):
http://www.w3.org/TR/html4/conform.html

which further refers to
http://www.ietf.org/rfc/rfc2119.txt

In summary, "should" means "recommended", "but the full implications must be
understood and carefully weighed before choosing a different course." (sic)

On the other hand, when you look up in a dictionary, "should" mostly means
"express obligation" and "must".

IMO, we have to reopen this bug and consider the issue seriously.
HTML does not give any rules for what must appear in the DOM, so the HTML spec
is irrelevant in this case.
> when you look up in a dictionary

This is what that RFC exists for.  To define exactly what those words mean in 
RFCs.  Said meaning has nothing to do with their dictionary meaning; the RFC 
authors chose to pick words that already had existing meanings but they could 
just as easily have come up with brand-new terms and defined them.  "should" 
and "must" are totally different in an RFC and in the HTML specification.

Comment 29 is the comment I was referring to.  Consider taking some <td> or 
<li> nodes and setting them to display:inline....
Re comment #72:
OK, granted! But the DOM does refer back to HTML:
http://www.w3.org/TR/REC-DOM-Level-1/introduction.html#ID-E7C30821

Looking at the last sentence in the paragraph below the figure:
"...if any two Document Object Model implementations are used to create a
representation of the same document, they will create the same structure model,
with precisely the same objects and relationships."

Since the structure model produced by Mozilla is different from what they
represent in the figure, Mozilla has a wrong DOM implementation.  QED!
To complement my previous comment, look at this (maybe you would say it's
irrelevant because it's XML, not HTML, but I would say they are the same):
http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-745549614

Do you see how they write the XML code?  It's like this:

<elementExample id="demo">
  <subelement1/>
  <subelement2><subsubelement/></subelement2>
</elementExample>  

And they said "... node for "elementExample", which contains TWO child Element 
nodes, ..."

They said TWO, not FIVE!

But with Mozilla's DOM implementation, even taking the SGML line-break rule into
account <http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.1>, we would at
least have to write the code this way:

<elementExample id="demo">
<subelement1/><subelement2><subsubelement/></subelement2>
</elementExample>  

If the SGML line-break rule isn't respected, we would even have to write the code
this way in order to produce the same structure:
<elementExample id="demo"><subelement1/><subelement2><subsubelement/></subelement2></elementExample>

So, either we say HTML, XML and DOM1 specifications are all wrong, or there's
problem in Mozilla's implementation.

As a last word: if we let this stand in Mozilla 1.0, many people won't accept it,
and this bug could be one of the main causes of its failure.
Re comment #73 :
Exactly!  And the RFC clearly states that "should" = "recommended", not "optional".

Q: And what _would_ you do if someone _recommends_ you do something?
A: I think you'd _better_ do it rather than ignoring it.

I brought up the dictionary to point out that, in its normal meaning, "should"
isn't merely "recommended" either.
> different from what they represent in the figure

The figure is informative, not normative.

> And they said "... node for "elementExample", which contains TWO child 
> Element nodes, ..."
> They said TWO, not FIVE!

Yes.  #text nodes are not Element nodes.
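The distinction drawn here (a #text child is still a child, just not an Element child) can be checked mechanically. A minimal sketch, assuming plain objects stand in for DOM nodes so it runs outside a browser; `elementChildren`, `el` and `text` are hypothetical names, not part of any DOM API:

```javascript
// DOM Level 1 node type codes (Node.ELEMENT_NODE and Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Return only the children whose nodeType is ELEMENT_NODE, skipping #text nodes.
function elementChildren(node) {
  const result = [];
  for (let i = 0; i < node.childNodes.length; i++) {
    if (node.childNodes[i].nodeType === ELEMENT_NODE) {
      result.push(node.childNodes[i]);
    }
  }
  return result;
}

// Hypothetical builders standing in for parsed DOM nodes.
const el = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });
const text = (data) => ({ nodeType: TEXT_NODE, data, childNodes: [] });

// The spec's <elementExample>, indented as printed, with white-space preserved.
const demo = el("elementExample",
  text("\n  "), el("subelement1"),
  text("\n  "), el("subelement2", el("subsubelement")),
  text("\n"));

console.log(demo.childNodes.length);       // 5 (three #text nodes, two elements)
console.log(elementChildren(demo).length); // 2 (the "TWO child Element nodes")
```

Both counts describe the same tree; they differ only in whether #text nodes are included.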

As the definition of "should" says: "but the full implications must be
understood and carefully weighed before choosing a different course".  They 
have been.  The implications of following that "should" are inconsistent 
behavior and incorrect layout when style is changed on some elements.  Having 
correct layout outweighed the dubious benefit of following a "should" in which 
the HTML specification tries to dictate how the DOM should be constructed 
(something that is outside the scope of the HTML specification).
Re comment #29 :
For the <ul> code:
<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

As I stated before, we just have to strip off block-level elements' whitespace
siblings, i.e. <li>'s whitespace siblings.  In other words, Mozilla's present
implementation gives:

UL
 +--#text
 +--LI
 +--#text
 +--LI
 +--#text

Since LI is a block element, this would give
UL
 +--LI
 +--LI
______________________________________________

For SPAN:
<span>two</span> <span>words</span>
Mozilla gives:
 |
 +--SPAN
 +--#text
 +--SPAN

We must not strip this #text node because SPAN is an in-line element, so the
#text node is kept.
______________________________________________

Lastly, for 'white-space: pre' ...... hmm, I admit that this is very delicate!
But first of all, let's look at its spec together:
http://www.w3.org/TR/REC-CSS1#white-space

Please note that it only applies to block-level elements......
So what?  I don't know yet :(  I'm in Europe and it's evening now.  Let me go
home and think of this issue during the night.

But IMO, there are far more developers manipulating DOM nodes than developers
using this CSS property.  Or maybe those whitespace siblings could be made
"invisible" in the DOM, and made visible again when 'white-space: pre' is used?
<ul style="display: inline">
  <li style="display: inline">foo</li>
  <li style="display: inline">bar</li>
</ul>

Should there be a space between "foo" and "bar"?  The point is that the 
block/inline distinction for _layout_ purposes is determined by CSS, not HTML.
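To make the question concrete: under CSS `white-space: normal`, the white-space #text node between the two inline items is what renders as the space between "foo" and "bar". A minimal sketch using a simplified model of white-space collapsing (an assumption: it collapses runs to one space but ignores the CSS rule that removes spaces at the start and end of a line); `collapse` is a hypothetical name:

```javascript
// Simplified model of CSS `white-space: normal`: any run of space, tab, CR or LF
// collapses to a single space. (Real CSS also drops spaces at the start and end
// of a line box; that part is not modeled here.)
const collapse = (s) => s.replace(/[ \t\n\r]+/g, " ");

// Text of the two inline <li> children and the white-space node between them,
// from <li style="display: inline">foo</li>\n  <li style="display: inline">bar</li>.
const withWhitespaceNode = ["foo", "\n  ", "bar"];
const withoutWhitespaceNode = ["foo", "bar"];

console.log(collapse(withWhitespaceNode.join("")));    // foo bar
console.log(collapse(withoutWhitespaceNode.join(""))); // foobar
```

Stripping the #text node changes the rendered text, which only becomes visible once CSS makes the items inline.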
I'm someone who likes to do things according to intuition and common sense, and I
feel really reluctant to pick apart every word somebody else has said, the way
lawyers do when defending their clients ...... but it seems I have no choice.

Re comment #77 :
Part 1)
I don't see anything on the page saying the figure is only informative and not
normative.  And even granting that it's informative, it is informative about the
structure described in the paragraph.

Part 2)
Good remark! ... but by exactly the same argument: they didn't talk about text
nodes!
> Looking at the last sentence in the paragraph below the figure:
> "...if any two Document Object Model implementations are used to create a
> representation of the same document, they will create the same structure 
> model, with precisely the same objects and relationships."

If you look at the equivalent sentence in the latest version of the
specification (http://www.w3.org/TR/DOM-Level-2-Core/introduction.html) you'll
see that it now says:

# One important property of DOM structure models is structural isomorphism: if 
# any two Document Object Model implementations are used to create a 
# representation of the same document, they will create the same structure 
# model, in accordance with the XML Information Set.

The XML Information Set (infoset for short) includes white-space nodes.

In fact it goes _on_ to say:

# Note: There may be some variations depending on the parser being used to build
# the DOM. For instance, the DOM may not contain whitespaces in element content
# if the parser discards them.

In other words, we are explicitly within our rights to include the white-space
nodes according to the latest version of the DOM Core Specification.


> Please note that [white-space] only applies to block-level elements......

That is also an error in the spec, and it has been corrected in the recently
published working draft of the next version of the text module. The
'white-space' property applies to all elements and generated content.
> intuition and common sense

Apply those to http://web.mit.edu/bzbarsky/www/testcases/testTextNodes.html 
where IE5.0 has the second <li> as the nextSibling of the first <li> but shows 
space between the two!  Where the hell did that space come from?  Is this the 
behavior you want from Mozilla?



*** Bug 159352 has been marked as a duplicate of this bug. ***
Reopening bug; I think this is still an open issue.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
sivarikan, would you care to explain in more detail?  I assume that you've read
the discussion on this bug and have something insightful to add?
#document
  HTML
    HEAD
      TITLE
        #text
      #text  -- not allowed
      SCRIPT
        #text
    #text
    BODY
    ...
    ...
When trying to access the script node, I actually got the text node.
We should not have a #text node as a child of the head element.
ref: http://www.w3.org/TR/1998/REC-html40-19980424/struct/global.html#h-7.4.1

This is the HTML parser we're talking about?  Or the XML parser parsing an XHTML
document?
HTML parser
OK.  Could you please file a bug on the parser module saying that text nodes
should not be created where they are illegal per the DTD?  For example, the
following "text" strings should be dropped because there is nowhere in that
document where a textnode would be valid:

<html>
<head>
text
</head>

<body>
text
<dl>
text
</dl>
</body>
</html>

Or did you want to just special-case <head> elements here?
Recommend removing the nsbeta1 keyword as well, as long as we're not certain we 
should do this even in Mozilla, much less Netscape.
Re: comment #89 -- if you do file a separate bug, can you make sure to note the 
table tags case (that text nodes are not allowed between them)? 

I'm cautiously thrilled to think that this discussion could move off of whether 
to preserve whitespace at all and on to whether to create text nodes where they 
are explicitly forbidden, since that was my original argument for reopening 
this bug (comment #3). Having said that, thanks much to everyone who's thought 
long and hard about all the issues involved here.
The last comments on this bug are INVALID. While arbitrary text is not allowed
between elements in HTML <head> blocks, text consisting of exclusively
white-space characters _is_ allowed, and no spec that I know of says that this
should not be represented in the DOM.
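The distinction being drawn, white-space-only text versus real character data, is simple to test for when walking a tree. A minimal sketch; `isAllWhitespace` is a hypothetical helper name, and treating only space, tab, CR and LF (XML 1.0's `S` production) as white-space is an assumption of the sketch:

```javascript
// True if `data` consists only of XML white-space: space, tab, CR, LF (or is empty).
// JavaScript's \s is avoided on purpose: it also matches U+00A0 (&nbsp;),
// which authors use precisely because it is NOT collapsible white-space.
function isAllWhitespace(data) {
  return !/[^\t\n\r ]/.test(data);
}

console.log(isAllWhitespace("\n  "));    // true
console.log(isAllWhitespace("  text ")); // false
console.log(isAllWhitespace("\u00A0"));  // false
```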
Status: REOPENED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
In fact, looking carefully at the spec, I'd say our behavior is mandated for
XML. (I don't think much of it for HTML, but I generally think that representing
HTML4 with DOM is as bozotic as Appendix C, a view that's unlikely to gain
political traction...)
DOM1, Interface Text: "The Text interface represents the textual content (termed
character data  in XML) of an Element or Attr."
XML 1.0, section 2.4: "All text that is not markup constitutes the character
data of the document."
(Would it have been that painful for XML to distinguish non-significant
whitespace from character data?!)
Why should it be represented in the DOM when it is not allowed?

Status: RESOLVED → REOPENED
Resolution: INVALID → ---
*sigh* HTML follows the rules of SGML, so the content model of the HTML DTD 
differentiates between whitespace (the "s" production of SGML) and actual 
character data. Whitespace can crop up in the head, even though character data is 
not allowed, because of this distinction. XML and DOM don't make the distinction; 
to them, everything that is not markup is character data. Therefore, things that 
are not character data in HTML/SGML become character data in the DOM. Again, the 
problem is that HTML is being forced into a representation (DOM) which is simply 
inadequate to represent it, and this is one of the places where it shows. (And if 
there's some way we could modify DOM Core to adequately represent SGML-based 
HTML, I'm all for it.)
Sivarikan, you never answered my question from comment 89... What should the DOM
representation of that document be in Mozilla and why?
Boris,

this is what I am expecting, and yes, I opened a bug on the parser module.
#document
  HTML
    HEAD
      TITLE
        #text
      SCRIPT
        #text

sivarikan, you must be looking at the wrong HTML... see comment 89 again.
Boris, maybe I was not clear before: <head> is just an example. In general I am
talking about all the cases where having a text node is illegal according to the DTD.
Yeah, that's my point too.  Text is not allowed in <body> by the DTD.  Nor in
<dl>.  Nor in various other places where people commonly put it.  If we made any
attempt to enforce that, large chunks of the web would start failing to render...

For that matter, why pick on this part of the DTD?  Why not use a validating
SGML parser that will completely fail to parse 99% of the pages on the web?
(Think <p> inside <font>, <a> around all sorts of crud, <form> inside <table>,
etc, etc.)

Sorry, but parsing tag soup per DTD is a lost cause.
Not to mention that, as choess and hixie pointed out, whitespace is not CDATA and
is therefore allowed in <head>...  The DTD does not say anything about the DOM
that is produced, since there is no real concept of a DOM in SGML-land, again as
choess points out.
This specific bug (whitespace nodes existing in the DOM), as described above, is
invalid. Marking as such.
Status: REOPENED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
If there was such a thing as "whitespace nodes", then perhaps this would be 
invalid. The problem is that this whitespace is being represented as TEXT NODES 
in the DOM, in places where text nodes are not allowed. That's the logical 
problem. The practical problem is that all these extra text nodes make coding 
for Mozilla/Netscape a nightmare. For those of us who thought standards would 
save the web... well, I guess we were wrong. :(

> in places where text nodes are not allowed

Why are they not allowed there exactly?  PCDATA is not allowed in the source in
those places.  If it's present, that's an error in the source.  Whitespace _is_
allowed in the source there.  The DTD has nothing to do with the DOM representation.

Standards that do not specify behavior (as the DOM standard does not here) can
be a PITA, as you noted.  This is a problem with the DOM standard that pops up
all over the place...
FWIW, I have mostly seen this "bug" exhibited throughout the web in a table
context.  A valid standard compliant workaround for tables is to replace this:

  tableElt.childNodes[0].childNodes[0].childNodes[0];

with this:

  tableElt.tBodies[0].rows[0].cells[0];

Of course there aren't workarounds for everything, but from what I've seen, this
should cure many issues.
> The problem is that this whitespace is being represented as TEXT NODES 
> in the DOM, in places where text nodes are not allowed.

No spec says text nodes aren't allowed there. The HTML spec isn't defined in
terms of the DOM, and the DOM spec isn't defined in terms of SGML DTDs.
n.b. SGML (well, HyTime) does have a DOM-like standard (closer to XML Infoset,
actually), SGML groves. But that's another story.
http://www.codingforums.com/showthread.php?s=&threadid=7028

The above link has a script or two that can be customized to remove the
whitespace nodes in a document.  A few notes:

Re comment 29, if the onus is on the webpage developer (as this bug's invalid
status suggests, and I agree with that), then the webpage developer can simply
move the space inside one of the spans.  True, a styling effect like
text-decoration:underline can spoil things a little, but other styling effects
might be able to achieve the same whitespace rendering in the document.  The
page developer simply has to be careful.
Status: RESOLVED → VERIFIED
Whiteboard: WONTFIX? → http://mozilla.org/docs/dom/technote/whitespace/
So here's a thought. Strings have a shared, common empty buffer that is used
whenever a string is empty, to guarantee that a string has a non-null value.

Could we use the same approach here? Have a single shared #text node that
represents all unstyled text nodes in the tree? Things like .parentNode might
be ugly, but maybe a simple stack-based wrapper around the real node in places
where you need to access the text node would suffice?

Does this sound feasible enough to file a bug?
*** Bug 178508 has been marked as a duplicate of this bug. ***
*** Bug 179709 has been marked as a duplicate of this bug. ***
Hmm... this bug makes for a very interesting read!

<angry swedish web developer mode>

So the solution IN REAL LIFE (=to get it to work in the 2 major browsers on
earth) is to code everything on one row, so my example from bug 179709 would
look like this:

<table border="1"><tr id="TableRow" onClick="getLastChild(this);"><td id="A">A</td><td id="B">B</td></tr></table>

Yes, Ahaa, Mmm... it really feels like a BIG STEP FORWARD folks! ;)

</angry swedish web developer mode>

I just hope there is a way to solve this "problem" (no, I don't want to have to
use helper functions), because this is really, really bad for Mozilla's acceptance
among the "I only code for IE because it's got the biggest market share" group (of
which I'm not a member).

Sorry to have taken up your time!
If you're needing this for tables only, you can use DOM HTML to your advantage.

  var myTable = document.getElementById("myTable");
  var lastTBody = myTable.tBodies[myTable.tBodies.length - 1];
  var lastRow = lastTBody.rows[lastTBody.rows.length - 1];
  var lastCell = lastRow.cells[lastRow.cells.length - 1];

Of course, not every HTML element has a similar API...
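For elements without such a collection, the same effect can be had by stepping past non-element nodes by hand. A minimal sketch; `lastElement` and the `mock` builder (a stand-in for real DOM nodes so the code runs outside a browser) are hypothetical names. The example row mirrors the one-row table from bug 179709 above, but with indentation kept:

```javascript
// Return the last child whose nodeType is 1 (Element), walking backwards from
// lastChild past #text and comment nodes; null if there is none.
function lastElement(node) {
  let child = node.lastChild;
  while (child && child.nodeType !== 1) {
    child = child.previousSibling;
  }
  return child;
}

// Hypothetical stand-in for DOM nodes: wires up lastChild and previousSibling.
function mock(props, children = []) {
  children.forEach((c, i) => { c.previousSibling = children[i - 1] || null; });
  return Object.assign(
    { previousSibling: null, lastChild: children[children.length - 1] || null },
    props);
}

// <tr id="TableRow"> with indented <td id="A"> and <td id="B"> children.
const ws = () => mock({ nodeType: 3, nodeName: "#text" });
const row = mock({ nodeType: 1, nodeName: "TR", id: "TableRow" }, [
  ws(), mock({ nodeType: 1, nodeName: "TD", id: "A" }),
  ws(), mock({ nodeType: 1, nodeName: "TD", id: "B" }),
  ws(),
]);

console.log(row.lastChild.nodeName); // #text
console.log(lastElement(row).id);    // B
```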
*** Bug 189467 has been marked as a duplicate of this bug. ***
*** Bug 196983 has been marked as a duplicate of this bug. ***
To follow up the <angry swedish web developer mode> comment here is:

<angry dutch web developer mode>

I'm slightly shocked after reading this bug from top to bottom. We have an
expression for this in Dutch which translates as "Operation successful, patient
deceased". It seems that you (the Mozilla developers) have succeeded in
defending your point of view to the point that using the DOM becomes fairly useless.

I would like to use a function like:

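function toggle(thingy) {
  // with whitespace in the source, nextSibling is a #text node, which has no .style
  var elem = thingy.nextSibling;
  // display() and hide() are the poster's own helpers, not shown here
  if(elem.style.display == "none") { display(elem); } else { hide(elem); }
}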
  
on some code like:

<h3 onclick="toggle(this)">Header</h3>
  <table>
    <tr><td class="desc">Content</td><td>Content2</td></tr>
    <tr><td class="desc">Content</td><td>Content2</td></tr>
  </table>

Which according to your Vulcan Logic (TM) does not work because there is a text
node between the </h3> and the <table>.

I'd have to write it as:

<h3 onclick="toggle(this)">Header</h3><table>
    <tr><td class="desc">Content</td><td>Content2</td></tr>
    <tr><td class="desc">Content</td><td>Content2</td></tr>
  </table>

Hurray for legibility!!!

Any web developers remember writing code like this to get a background color on
a thin table cell?:

<TD><FONT SIZE=1>&nbsp;</FONT></TD>

Or remember when

<TR>
  <TD>
    <IMG HEIGHT=10 SRC="bg.gif">
  </TD>
</TR>

Had to be written as 

<TR><TD><IMG HEIGHT=10 SRC="bg.gif"></TD></TR>

to render without extra white-space?

I thought/hoped we had left those dark ages behind us.

No matter how "right" you are, I predict many, many duplicates for this bug. IE
DOM developers are going to run, not walk, away from Mozilla, since any
next/previousSibling function is useless without a wrapper. This behaviour is
going to bite and discourage the vast majority of beginning DOM developers,
which is unfortunate, as the idea behind the DOM is really neat.

</angry dutch web developer mode>

Sorry for the rant, but I really hope you will reconsider your point of view on
this issue.
Which part of:  "Either the text nodes are there or the layout may be 
incorrect" do people fail to understand?  Yes, it sucks.  No, the CSS and DOM 
specs leave no other choice.  Please raise this point with the W3C if you don't 
like it.
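Given that the #text node between </h3> and <table> is going to stay, the toggle() idea from the Dutch comment above can still keep its indented markup by stepping over non-element siblings. A minimal sketch; `nextElement` is a hypothetical helper, setting `style.display` directly stands in for the poster's unspecified display()/hide() helpers, and the three plain objects at the end only mimic the h3, #text and table nodes so the code runs outside a browser:

```javascript
// Return the next sibling whose nodeType is 1 (Element), skipping #text and
// comment nodes; null if there is none.
function nextElement(node) {
  let sibling = node.nextSibling;
  while (sibling && sibling.nodeType !== 1) {
    sibling = sibling.nextSibling;
  }
  return sibling;
}

// Show or hide the element that follows `thingy`, ignoring white-space nodes.
function toggle(thingy) {
  const elem = nextElement(thingy);
  if (!elem) return;
  // "" falls back to the stylesheet's display value; "none" hides the element.
  elem.style.display = elem.style.display === "none" ? "" : "none";
}

// Stand-ins for <h3>, the white-space #text node, and <table>.
const table = { nodeType: 1, nodeName: "TABLE", nextSibling: null, style: { display: "" } };
const ws = { nodeType: 3, nodeName: "#text", nextSibling: table };
const h3 = { nodeType: 1, nodeName: "H3", nextSibling: ws, style: { display: "" } };

toggle(h3);
console.log(table.style.display);        // none
toggle(h3);
console.log(table.style.display === ""); // true
```

In a page, the markup stays as `<h3 onclick="toggle(this)">` with the table on its own indented lines.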
I reported this duplicate.
http://bugzilla.mozilla.org/show_bug.cgi?id=196983

I think these newlines and white-spaces are not HTML content.
It's a real problem....
*** Bug 206729 has been marked as a duplicate of this bug. ***
*** Bug 214943 has been marked as a duplicate of this bug. ***
*** Bug 217842 has been marked as a duplicate of this bug. ***
*** Bug 221364 has been marked as a duplicate of this bug. ***
*** Bug 252684 has been marked as a duplicate of this bug. ***
*** Bug 258564 has been marked as a duplicate of this bug. ***
*** Bug 263813 has been marked as a duplicate of this bug. ***
As others have pointed out, this issue will cause much grief for developers 
attempting to walk DOM trees.

My primary concern at the moment revolves around XML in an XHTML document (e.g. 
something pulled down by an XMLHttpRequest).  With no formal XML parser or DTD 
validator, these text nodes become both extraneous and erroneous.

If there is eventually to be a formal XML parser then fine, but if childNodes
and the like are to be used to walk XML as well, then we will all be severely
crippled at best.

For example, as others have pointed out, the following XML:

<myData>
  <myChilddata>
    <data1>A</data1>
    <data2>B</data2>
  </myChilddata>
</myData>


...under current circumstances is completely different from the following XML:

<myData><myChilddata><data1>A</data1><data2>B</data2></myChilddata></myData>

Something will definitely need to be done about this -- either within the 
context of this bug or a more formalized XML-oriented solution.
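Pending any change here, one way to make those two serializations compare equal in script is to skip white-space-only text nodes while walking, rather than relying on raw childNodes indexes. A minimal sketch; `structure`, `el` and `text` are hypothetical names, plain objects stand in for the nodes of an XMLHttpRequest's responseXML, and only element and text nodes are handled:

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// A text node holding only XML white-space (space, tab, CR, LF).
const isWhitespaceText = (n) => n.nodeType === TEXT_NODE && !/[^\t\n\r ]/.test(n.data);

// Describe a tree as a string, ignoring white-space-only text nodes.
// Array.prototype.filter.call works on a real NodeList as well as on an array.
function structure(node) {
  if (node.nodeType === TEXT_NODE) return JSON.stringify(node.data);
  const kids = Array.prototype.filter.call(node.childNodes, (c) => !isWhitespaceText(c));
  return node.tagName + "(" + kids.map(structure).join(",") + ")";
}

// Hypothetical builders standing in for parsed nodes.
const el = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });
const text = (data) => ({ nodeType: TEXT_NODE, data });

// The indented <myData> document, with white-space preserved by the parser.
const indented = el("myData", text("\n  "),
  el("myChilddata", text("\n    "),
    el("data1", text("A")), text("\n    "),
    el("data2", text("B")), text("\n  ")),
  text("\n"));

// The same document written on a single line.
const oneLine = el("myData",
  el("myChilddata", el("data1", text("A")), el("data2", text("B"))));

console.log(indented.childNodes.length, oneLine.childNodes.length); // 3 1
console.log(structure(oneLine)); // myData(myChilddata(data1("A"),data2("B")))
console.log(structure(indented) === structure(oneLine)); // true
```

The trade-off is the one raised elsewhere in this bug: white-space that does matter (for example under xml:space="preserve") is skipped as well.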
This is a parser versus application/render issue.
With the latest specs it is clear that this bug is INVALID.

XML parser MUST preserve white space and pass it to application according to
both XML 1.0 and XML 1.1 specs:
http://www.w3.org/TR/REC-xml/#sec-white-space
http://www.w3.org/TR/xml11/#sec-white-space

XHTML 1.0 places the burden of defining white space handling on CSS2:
http://www.w3.org/TR/xhtml1/#uaconf

Later XHTML languages don't even want to deal with white-space, leaving it all
to CSS:
http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/introduction.html#s_intro_formatting

However CSS2 does a pretty poor job at this:
http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop

But this is fixed in the upcoming CSS 2.1 standard:
http://www.w3.org/TR/CSS21/text.html#q8

And made even more complex in CSS3:
http://www.w3.org/TR/css3-text/#white-space-processing

Older HTML likewise talks about how to "render" white-space, not about how to
"parse" it:
http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks

The DOM is simply the parser's output. As such, white-space belongs to the DOM
application, and it is up to CSS to decide what to do with it.

You really can't say 2 documents are identical just because they have the same
markup. They are not. And CSS is the way to show the differences. As such DOM
should reflect this difference.
*** Bug 299108 has been marked as a duplicate of this bug. ***
*** Bug 311654 has been marked as a duplicate of this bug. ***
<angry german web developer mode>
HTML is still written by humans and has to be readable to them. 

FF's pedantic reading of the specs (which aren't that specific) won't persuade
the other browsers' developers to adopt your (exact and maybe even right) point
of view; it will instead cause web developers to ignore FF's interpretation of
their code, resulting in less comfort for FF's users.
</angry german web developer mode>

Does whitespace, and only whitespace, between tags carry so much information
that it's worth it? Worth the protectionism?
*** Bug 315938 has been marked as a duplicate of this bug. ***
My god dudes. Why are we all sitting on our hands about this?

This is far from a minor irritation. It renders nextSibling, lastChild, firstChild, and any kind of indexing useless. Not just in Firefox, but in the Web at large, since we develop applications that respect Gecko these days. If that proves too hard then we'll just develop for IE and tell our customers about its larger market share and what it would cost to additionally support Netscape, like we're used to.

The fact that a portion of the DOM doesn't make any sense when implemented like this should be enough to stop you worrying if the spec explicitly states it or not. I'll call this comment a spec if it makes you more comfortable.

I'm considering learning C++ so I can fix this, please someone beat me to it.
> I'll call this comment a spec if it makes you more comfortable.

Your comment doesn't make a good spec, since it's self-contradictory.
> Your comment doesn't make a good spec, since it's self-contradictory.

Do excuse the use of language, but I don't think many are giving this the priority it deserves. It's amazing that I try to do a bit of DOM Level 1 coding and find that Mozilla recommends I put a mass of helper functions in to circumvent a bug. Moreover, the bug has been open for ages and half the people who have looked at it don't think it's a bug. It makes some core Level 1 properties unusable in a sane way, of course it's a bug.

This is out in the wild in Firefox 1.5, it's going to become difficult to change soon and it will mean that the DOM standard failed as an interoperable system for coding. We'll have to get back to sniffing to work out which type of implementation we have and versioning the code. I got the impression that you guys were with me in wanting to see the end of that kind of pain.

Just drop the empty text nodes when parsing white-space:normal nodes. There is no functional use for them when programming, but a lot of fiddling when writing many kinds of interactive page. Having to join all the lines up on a page to get a portable DOM is exactly the kind of bug that made NS4 most developers' least favourite browser.
> It makes some core Level 1 properties unusable in a sane way, of course it's a
> bug.

No, just because the properties are less useful does NOT make the behavior a bug -- the behavior is allowed by the DOM spec and required by other W3C specs.

> it's going to become difficult to change soon

It's not going to be changed.  That's why the bug is resolved "invalid".  There's no way to change it without breaking basic CSS functionality or violating either the DOM or CSS spec (or both, as in IE).

> Just drop the empty text nodes when parsing white-space:normal nodes.

They don't have a "white-space" value while being parsed.  If you don't understand that, I suggest you actually read the DOM and CSS specs until you do.

We're as sorry as you are that IE is breaking the DOM and CSS specs this badly, but there's nothing we can do about IE bugs other than complaining to Microsoft (which I urge you to do).
Is there any definite decision how this will be handled in coming releases of Firefox?!? This behaviour and interpretation seems to be controversial.

We have customers requesting functionality that worked with Firefox 1.0 and complaining about problems because of the changed behaviour in Firefox 1.5.

We have to do fairly extensive DOM processing in our application, and we need to be sure that the changes we make for the current release won't have to be redone for coming releases.


(In reply to comment #136)

> Is there any definite decision how this will be handled

Yes -- as it is now.

> This behaviour and interpretation seems to be controversial.

It's not.

> We have customers requesting functionality that worked with Firefox 1.0 and
> complaining about problems because of the changed behaviour in Firefox 1.5.

This behavior didn't change and isn't planned to change, so I don't see how what you said is relevant.
(In reply to comment #137)
It did change from Firefox 1.0 to 1.5.

At least in reference to what I originally posted under https://bugzilla.mozilla.org/show_bug.cgi?id=320353
> https://bugzilla.mozilla.org/show_bug.cgi?id=320353

Parsing of <frameset> in particular was buggy on the 1.7 branch; that's been fixed.
*** Bug 324195 has been marked as a duplicate of this bug. ***
*** Bug 324195 has been marked as a duplicate of this bug. ***
*** Bug 326078 has been marked as a duplicate of this bug. ***
(In reply to comment #137)
> > Is there any definite decision how this will be handled
> Yes -- as it is now.

Her famous last words... :-) We (developers) had similar answers about innerHTML way ago (Netscape 6 betas).

Preserving phantom nodes *is* a mistake, and moreover one not directly required by the W3C. You'll have to change it sooner or later, and the later it happens, the more expensive it will be, so why not start sooner?

For the transition period you could (as some have suggested) add a flag, preservePhantomNodes, that one could turn on if needed but that is off by default.

You could even make preservePhantomNodes == true the default if you really want to.

But the problem needs to be addressed. Right now all your supporters have to write additional filters atop of all node-related DOM methods, which is totally abnormal IMHO. Do you really think it's just fine?
> Her famous last words... :-) We (developers) had similar answers about
> innerHTML way ago (Netscape 6 betas).

Agreed. innerHTML, document.all, the list goes on...  Is IE 7.0 going to now start preserving phantom nodes even though prior versions didn't, all in the name of the spec?  I doubt it.

I've personally handled a few hundred business cases where phantom nodes weren't wanted or needed.  I'm curious - has anyone actually run into a business case where they *did* need them to remain?

(In reply to comment #143)
> But the problem needs to be addressed. Right now all your supporters have to
> write additional filters atop of all node-related DOM methods, which is totally
> abnormal IMHO. Do you really think it's just fine?

Hear hear. I'm a jobbing web developer and getting sick of writing out loops to find the nodes I want when there are supposedly DOM mechanisms. More complicated DOM work turns into a plethora of nested loops and it gets difficult to keep track. Since any code already created is catering for both manifestations of the DOM, getting rid of this bug shouldn't break any web sites. Please please write it out.
*** Bug 329019 has been marked as a duplicate of this bug. ***
*** Bug 332821 has been marked as a duplicate of this bug. ***
I'm talking here about the HTML case; I'm not considering XML, and won't.
DOM trees are not easy to walk in Fx; I urge you to reconsider the INVALID resolution.
With so many duplicates, don't you think your position (not removing phantom nodes) is just plain wrong?

<p>

  <span>one</span>

  <span>two</span>

</p>

It's plain and simple: the P tag has ONLY 2 children, and there is no reason for Firefox to create phantom text nodes, especially since NO OTHER browser behaves this wrong way.
From my point of view, the white-space:pre issue is not a real issue. You could keep an internal DOM representation for all the CSS rendering issues you encounter, but give us the expected DOM representation without those ugly phantom text nodes. nextSibling and the like are TOTALLY unusable the way things are right now. And the solution of using some (Firefox-only) helper functions from comment #11 is wrong. We don't need those nodes, we don't want them, and worst of all we have to use loops (in Firefox ONLY) to get rid of them.

Why do you force me to use this kind of JS, or the helper functions from comment #11, every time I want to use childNodes and xxxxxSibling from the DOM?

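var DOM = DOM || {};

// Remove white-space-only text nodes from an element (or the id of one);
// descend into child elements only when recursif is true.
// Caution: this also strips white-space inside <pre> and white-space:pre elements.
DOM.cleanWhitespace = function(element, recursif)
{
  element = (typeof element == 'string') ? document.getElementById(element) : element;
  // Walk backwards so removing a node doesn't shift the indexes still to visit.
  for (var i = element.childNodes.length - 1; i >= 0; i--)
  {
    var node = element.childNodes[i];
    if (node.nodeType == 3)  // TEXT_NODE
    {
      // Only space, tab, CR and LF count, so a lone &nbsp; is kept.
      if (!/[^\t\n\r ]/.test(node.nodeValue))
      {
        element.removeChild(node);
      }
    }
    else if (node.nodeType == 1 && recursif)  // ELEMENT_NODE
    {
      DOM.cleanWhitespace(node, recursif);
    }
  }
};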

Please reconsider the invalid state. Thanks.

So many duplicate bugs (31) mean something, IMO. When 1 or 2 of my users report a bug, I consider it a specific issue and fix it for them only (helper functions), but when 31 users report the exact same issue, it's my duty to reconsider my position (even if I'm right) and fix the issue.

147 comments, 31 duplicates = 21.09% of the comments

The issue is real; the INVALID state of this bug is just wrong. I thought standards would help us to stop using browser-specific code. You are breaking that here by not resolving the bug :(

#5 Dup 48560
#7 Dup 62269
#13 Dup 65658
#16 Dup 89782
#19 Dup 104785
#23 Dup 114749
#24 Dup 118213
#63 Dup 131169
#65 Dup 144603
#66 Dup 147489
#68 Dup 147487
#83 Dup 159352
#110 Dup 178508
#111 Dup 179709
#114 Dup 189467
#115 Dup 196983
#119 Dup 206729
#120 Dup 214943
#121 Dup 217842
#122 Dup 221364
#123 Dup 252684
#124 Dup 258564
#125 Dup 263813
#128 Dup 299108
#129 Dup 311654
#131 Dup 315938
#140 Dup 324195
#141 Dup 324195
#142 Dup 326078
#146 Dup 329019
#147 Dup 332821

> The issue is real, the invalid state of this bug is just wrong. I thought
> standards would help us to stop using browser specific code. You are breaking
> it here by not resolving the bug :(

Fully, completely seconded. "Firefox vs. users" is a confrontation I would not have imagined even in my worst nightmares...

Besides the endless dups, just search the Web for relevant blogs and forums. What's wrong with marking it ACTIVE BLOCKING and adding a flag like -moz-preserve-phantom-nodes that we could set to false (leaving it true by default, no problem)? What's wrong with doing it in the next minor update rather than fighting for nothing until the last bullet?

(Though I'm not sure this whole thread isn't in a "kill file" anyway.)

> (Though I'm not sure this whole thread isn't in a "kill file" anyway.)

It pretty much is, since none of the people doing the talking, yourself included, seem to understand what's going on with the relevant standards, much less the changes they want made in the rendering engine.

I will make one more futile attempt to set people straight, however.  I advise reading closely.

What you seem to want is a fundamental rewrite of the CSS engine in Gecko so it's not DOM-based (like IE).  Then you want the DOM exposed to the web page to not reflect the actual data structures the browser has in memory but rather to depend on the CSS formatting.  All of that seems like a poor idea, especially since it would introduce a lot of IE's CSS bugs: bugs that are basically due to its internal representation NOT being a DOM.
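Why the rendering engine cannot simply drop a whitespace-only text node can be shown with a deliberately simplified model of the CSS white-space property. This is a sketch under stated assumptions: real CSS also removes collapsible spaces at line starts and ends and between block boxes, which the hypothetical renderedText function ignores.

```javascript
// Simplified model of the CSS 'white-space' property applied to the text of
// a whitespace-only text node. Real CSS also trims collapsible spaces at line
// starts/ends and between block boxes; that is ignored here.
function renderedText(data, whiteSpace) {
  if (whiteSpace === "pre") {
    // 'pre': spaces, tabs and line breaks are kept exactly as written.
    return data;
  }
  // 'normal': every run of spaces, tabs and line breaks becomes one space.
  return data.replace(/[ \t\r\n]+/g, " ");
}

// The text node between </span> and <span> in: <span>two</span>\n  <span>words</span>
var between = "\n  ";
var asNormal = renderedText(between, "normal"); // " ": the two words stay apart
var asPre = renderedText(between, "pre");       // "\n  ": line break and indent are shown
```

Under 'normal' the node still renders as a single space, which is what keeps two adjacent inline elements from merging into one word; under 'pre' it renders verbatim. Since a script or style sheet can switch an element to 'pre' at any time, the parser has no way of knowing which of these nodes will never matter.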

Note that Opera 8.5  behaves like Gecko does on the testcase in this bug and on other testcases I've tried for this behavior, in both standards and quirks mode.  So does Konqueror in standards mode (can't test it in quirks mode at the moment).  So does Safari, last I checked (I don't have a Safari build on me right now).

Given that, the "Especially since NO OTHER browser is acting this wrong way" crap in comment 148 simply indicates to me that laurent vilday likes to make claims without testing them.  In fact, every single modern browser other than IE  (we're going to consider IE a modern browser here for the sake of argument) behaves like Gecko does.

I would also like to reply to one other part of comment 148.  Specifically, the part where it says "So many Duplicate (31) bugs means something imo."  What that means is that 31 of the people out there who like coding to IE's mis-implementation of the DOM spec filed bugs requesting that we introduce the same bugs.  No more, no less.  Given the number of "IE-only" sites out there even now, this number is not really all that surprising, at least to me.
(In reply to comment #150)

> What you seem to want is a fundamental rewrite of the CSS engine in
> Gecko so it's not DOM-based (like IE).

We don't need to sound so tragic. We (counting at least 31 filed dups, I can say "we") simply want the possibility to /choose/ the most convenient way to handle the DOM.
 
The W3C box model is more than inconvenient, yet there may be people who just love it and cannot live without it. So Mozilla added -moz-box-sizing so anyone could "go to hell by his own road" :-) It did not cause the sky to fall onto the earth, did it?

In the same way, we need the same choice for the DOM tree structure: some -moz-preserve-everything flag or similar. Whoever likes phantom nodes and even sees some use for them is welcome to leave the default "true". The rest can set it to "false". The phantom nodes are dug out of the pretty-printed source by the parser anyway, so the relevant patch would take a line or two (check the flag and either throw the node away or add it to the tree).

I can imagine some circumstances where a fragment needs to be switched from parsed to <pre> state and back, or where the source must be restored byte for byte as it came from the server. But these are /occasional/ usages as opposed to mass usage, and really shouldn't be the subject of such intense preoccupation. Still, -moz-preserve-everything (or whatever) set to true takes care even of these occasional situations.

> I would also like to reply to one other part of comment 148.  
> Specifically, the part where it says "So many Duplicate (31) bugs means something imo."  
> What that means is that 31 of the people out there who like coding to IE's
> mis-implementation of the DOM spec filed bugs requesting that we introduce the
> same bugs.

Then my question is: how many bugs need to be filed before you admit that there is something rotten in the kingDOM? 310? 3,100? 31,000? I believe NN6 gave in on innerHTML after 2,000 or so requests. Do you really need another Chartist Movement :-))

Also please note that these are not "just 31 people". These are active Firefox supporters who bothered to learn about Bugzilla, open an account, prepare a testcase and file a bug properly. I could easily add behind each of them at least 100 end users who just did not have time for all that, were not aware of Bugzilla, or simply dropped Firefox. IMHighlyHO.
VK, this bug is about what getFirstChild and getNextSibling return.  Those are defined by the DOM spec and return whatever the DOM has.  If what you want are separate methods to access only parts of the DOM (similar to what the SVG 1.2 Tiny spec has -- they only see Element nodes), then feel free to file _separate_ bugs on that.  Please make sure to clearly define exactly what your proposed methods should return in all cases.  Once there's a clear need established and a clear description of what the methods should do (which was the situation with innerHTML), implementing them can actually be discussed in a reasonable way.
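A sketch of the kind of separate, element-only accessors described here. The helper names are hypothetical. firstElementChild and nextElementSibling are the properties that the W3C Element Traversal specification (modeled on the SVG Tiny 1.2 interface) later standardized; the helpers use them when the browser provides them and otherwise skip non-element nodes by hand, leaving firstChild and nextSibling with their DOM-defined meaning.

```javascript
// Element-only traversal. Uses the Element Traversal properties when the
// node has them; otherwise skips every non-element node manually.
var ELEMENT_NODE = 1;

function firstElementChildOf(parent) {
  if (parent.firstElementChild !== undefined) {
    return parent.firstElementChild;
  }
  var node = parent.firstChild;
  while (node && node.nodeType !== ELEMENT_NODE) {
    node = node.nextSibling;
  }
  return node;
}

function nextElementSiblingOf(node) {
  if (node.nextElementSibling !== undefined) {
    return node.nextElementSibling;
  }
  var next = node.nextSibling;
  while (next && next.nodeType !== ELEMENT_NODE) {
    next = next.nextSibling;
  }
  return next;
}

// Collect all element children, e.g. the <li> elements of a <ul>,
// no matter how the markup is indented.
function elementChildren(parent) {
  var result = [];
  for (var el = firstElementChildOf(parent); el; el = nextElementSiblingOf(el)) {
    result.push(el);
  }
  return result;
}
```

Keeping these as separate functions means scripts get element-only navigation without changing what childNodes reports, which is the distinction drawn in the comment above.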

Also, if you have issues with the DOM spec you may want to consider raising them with the W3C so that _all_ browsers would implement these methods you want.  Unless you plan to write script that only works in Gecko but breaks in Opera or Safari or Konqueror?
(In reply to comment #152)

Phantom nodes filed as Bug #339511

> VK, this bug is about what getFirstChild and getNextSibling return.

Not really, it is about "Mozilla reports phantom text nodes in the DOM tree", as the bug description states. Problems with the native DOM methods are just one of the outcomes.

> Those are defined by the DOM spec and return whatever the DOM has.

I spent the last two weeks (when I had free time, of course) trying to find these definitive DOM specs, and failed. It seems the same is true for all other researchers. All I see is "somebody said something, and as we did not find any better place, we just dumped it in here as empty text nodes", and I read your arguments in this thread really carefully. Yet I might have missed something vitally important.

> then feel free to file _separate_ bugs on that.

Not really a separate bug, but the same old mistake reviewed once again after six years, with new facts and in a whole new situation.
I also linked some testcases which (I think) will surprise you regarding how the "DOM specs" are applied ;-)

Phantom nodes filed as Bug #339511
*** Bug 339511 has been marked as a duplicate of this bug. ***
(In reply to comment #154)
> *** Bug 339511 has been marked as a duplicate of this bug. ***

That was a mistake, which is now corrected. Please note that bug #339511 is a feature request, not a "bug" (i.e. something contradicting the declared behavior).

*** Bug 339511 has been marked as a duplicate of this bug. ***
*** Bug 339766 has been marked as a duplicate of this bug. ***
A friend of mine from Portugal, Joao Eiras, who is a member of a W3C mailing list, gave me a valuable hint:

TO WORK WITH TABLES THE BEST APPROACH IS table.rows[y].cells[x]

I think it solves 90% of the trouble... (using the correct method)
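A minimal sketch of that approach. The table's rows collection contains only its tr elements (including those in thead, tbody and tfoot), and each row's cells collection contains only its td and th elements, so whitespace text nodes in the markup never appear in them. The function name is hypothetical, and textContent is not available in very old versions of IE.

```javascript
// Read a table into a 2-D array of cell texts via the rows/cells
// collections, which hold only <tr> and <td>/<th> elements.
function tableToArray(table) {
  var data = [];
  for (var y = 0; y < table.rows.length; y++) {
    var cells = table.rows[y].cells;
    var row = [];
    for (var x = 0; x < cells.length; x++) {
      row.push(cells[x].textContent);
    }
    data.push(row);
  }
  return data;
}
```

In a page this would be called as tableToArray(document.getElementById("prices")), where "prices" is a hypothetical id; entry [y][x] of the result is the text of table.rows[y].cells[x].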

The funniest thing is that I developed a Javascript Self Explorer (available at http://sitedosergio.sitesbr.net, under the Computings > Javascript menu) that shows me that table objects have a "rows" property (and rows a "cells" property)... but it's hard to know or remember everything... :D

Thanks, Joao!
*** Bug 364248 has been marked as a duplicate of this bug. ***
I have seen the following workaround somewhere: put your readability whitespace inside the tags, i.e. use <tag ></tag > instead of <tag> </tag>. This way the source code still contains whitespace and stays readable, but the whitespace is consumed by the tag parser and does not make it into the DOM tree. This trick, however, does not work for HTML comments, which enter the DOM tree as comment nodes. While you have complete control over where you put your comments, it is not possible to hide them from the DOM. And adding or removing an HTML comment can break your script, which is even more unexpected. That is going to happen when a future maintainer decides he needs an annotation here and there!
All in all, it seems there is no reliable way to handle this except for using helper functions.
(In reply to comment #29)
> Which nodes are you proposing to eliminate?  Only the ones inside start-tags
> and end-tags, or all of the ones you can (whatever that means)?  If the former,
> how do you plan to handle the childNodes array for something like:
> 
> <ul>
>   <li>foo</li>
>   <li>bar</li>
> </ul>
> 
> I think the DOM benefits of such an approach are minimal.
> 
> If the latter, how do you plan to ensure that <span>two</span>
> <span>words</span> aren't merged together?
> 
> Which does IE do?

IE blindly eats up white space between elements and inside elements whether it is appropriate or not:
<http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?&lang=en&cr=us&guid=&sloc=en-us&dg=microsoft.public.internetexplorer.general&p=1&tid=925c63ae-5f0b-452b-8b61-a5d5a67a6330&mid=925c63ae-5f0b-452b-8b61-a5d5a67a6330>


> 
> In all cases, how do you plan to keep 'white-space: pre' working?
> 

My problem with the current implementation relates to loading up XML documents. I can understand that you need to keep the white-space around with HTML so that "white-space: pre" works, however I don't see the need for it when loading up XML documents. Given that an XML document is a representation of data, having the white-space nodes there does not make any sense at all.

However, I'm not just raking up the old arguments again, I do have a question to ask:

I'm using:

    XmlDoc = document.implementation.createDocument("", "", null);

followed by:

    XmlDoc.load('xmlDoc.xml');

to load up an XML document. Is the load method used by the browser when it is creating the DOM of an HTML page?

If not, could the load method simply skip creating whitespace nodes, or take an additional boolean parameter to control that?

Failing that: I see that the third parameter of the document.implementation.createDocument method is not implemented yet. Could it be a boolean that switches whitespace-node creation on or off?
You should load XML data using XMLHttpRequest instead.
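A minimal sketch of that advice, assuming the XML file is served over HTTP from the same origin as the page; the function names are hypothetical. Once the request completes, responseXML holds the parsed document, and the sketch strips whitespace-only text nodes from it. That is reasonable for data-oriented XML, but it would also drop whitespace that an xml:space="preserve" section expects to keep.

```javascript
// Remove whitespace-only text nodes from an XML document or element.
// Uses an explicit stack instead of recursion so deep documents are safe.
function stripWhitespaceNodes(root) {
  var stack = [root];
  while (stack.length > 0) {
    var node = stack.pop();
    // Walk children backwards so removals don't disturb indices still to visit.
    for (var i = node.childNodes.length - 1; i >= 0; i--) {
      var child = node.childNodes[i];
      if (child.nodeType === 3 && !/[^\t\n\r ]/.test(child.nodeValue)) {
        node.removeChild(child);
      } else if (child.nodeType === 1) {
        stack.push(child);
      }
    }
  }
  return root;
}

// Fetch an XML file and hand the cleaned document to onLoad.
function loadXml(url, onLoad) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200 && xhr.responseXML) {
      onLoad(stripWhitespaceNodes(xhr.responseXML));
    }
  };
  xhr.send(null);
}
```

Usage would look like loadXml("xmlDoc.xml", function (doc) { ... }). Keeping the cleanup in script leaves the HTML page's own DOM untouched, where whitespace can still matter for rendering.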
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general