textContent ignores whitespace

RESOLVED FIXED in mozilla1.8beta3



14 years ago
13 years ago


(Reporter: jason.barnabe, Unassigned)




Firefox Tracking Flags

(Not tracked)



(2 attachments, 1 obsolete attachment)


Comment 1

14 years ago
Created attachment 177210 [details]
To be precise, the plaintext serializer in "raw" mode is what munges the
whitespace...  See

That shouldn't be happening, should it?  Quite apart from whatever the behavior
of GetTextContent() is supposed to be?
Assignee: general → dom-to-text
Component: DOM → DOM to Text Conversion
OS: Windows XP → All
QA Contact: ian
Hardware: PC → All
Yeah, I think we're backward here.  Seems the only other thing that uses
OutputRaw is .innerHTML (on the HTMLSerializer) and IE strips out all whitespace
there, but we leave it in for that.... We need to fix textContent to keep
whitespace, and innerHTML to strip it.
The HTML and plaintext serializers do quite different things with OutputRaw...

I can get the spaces to not be stripped if I use OutputPreformatted.  But then
things like the newline in:


are included in the textContent.  Should they be?  Again, the spec is a little
Created attachment 183898 [details] [diff] [review]
.textContent rewrite

I'd already rewritten it using content iterators before discovering this bug.
(I was considering writing nsNode3Tearoff::AppendTextContentTo instead).
It has the advantage of working on nodes without documents ;-)
Both this method and nsIDOMRange::ToString preserve whitespace nodes.
Comment on attachment 183898 [details] [diff] [review]
.textContent rewrite

>Index: nsGenericElement.cpp
>+  while (!iter->IsDone()) {
>+    nsIContent* content = iter->GetCurrentNode();
>+    if (content->IsContentOfType(nsIContent::eTEXT)) {
>+      nsITextContent* textContent = NS_STATIC_CAST(nsITextContent*, content);

Eep! Use a real QI here. There's no guarentee that the above will work.

r=me with that
Attachment #183898 - Flags: review+
We still want to fix the serializer brokenness, though...
True, but there's no reason to take the detour through a serializer when doing
.textContent, so that is IMHO a different bug.

Comment 9

14 years ago
From the DOM level 3 core spec:

On getting, no serialization is performed, the returned string does not contain
any markup.
No whitespace normalization is performed and the returned string does not
contain the
white spaces in element content (see the attribute
Text.isElementContentWhitespace [p.96] ).

Does that mean that getting textContent should NOT get the whitespace and we
have the correct behavior?
No, becuase we stripped the whitespace too often, not just when it was marked as
ElementContentWhitespace. We currently don't support ElementContentWhitespace,
and technically aren't required to since we're not a validating parser.
Created attachment 184581 [details] [diff] [review]
QI version

Transferring sicking's r=.
Attachment #183898 - Attachment is obsolete: true
Attachment #184581 - Flags: superreview?(jst)
Attachment #184581 - Flags: review+
Comment on attachment 184581 [details] [diff] [review]
QI version

Attachment #184581 - Flags: superreview?(jst) → superreview+


14 years ago
Attachment #184581 - Flags: approval1.8b3?
Comment on attachment 184581 [details] [diff] [review]
QI version

Attachment #184581 - Flags: approval1.8b3? → approval1.8b3+
Fix checked in.
Last Resolved: 14 years ago
Resolution: --- → FIXED

Comment 15

14 years ago
*** Bug 295771 has been marked as a duplicate of this bug. ***

Comment 16

14 years ago
The contents of the pre in the testcase are

"there is whitespace here

and here     and here
	!<!--hey, why not some comments too?-->"

and it outputs like 

"there is whitespace here

and here and here
!hey, why not some comments too?"

Is it the expected behaviour of textContent to strip the whitespace between the
two "and here"s and between the start of the last line and the exclamation mark?

Also, I've filed bug 296657 on how it's outputting the comment node. 
Neil: I didn't tell you to remove the IsContentOfType check ;)

The whitespace between the two "and here"s isn't stripped, it's just that the
alert function won't show it. Try doing just alert("hello      hello")
Resolution: FIXED → ---
Which brings the obvious question. Why do we have nsITextContent? Couldn't we
just move those functions to nsIContent? Right now QI-ing to nsITextContent is
very randomly implemented, textnodes and comments QI to it, but not PIs.
Uh, guys, comment nodes and PIs shouldn't be appearing in textContent. Please
look again at the table at:
(In reply to comment #18)
> Right now QI-ing to nsITextContent is
> very randomly implemented, textnodes and comments QI to it, but not PIs.

Only PIs don't QI to it, because that wouldn't make sense. PIs have a
target/data pair, whereas the others have just some text data.
True, though I still don't see a reason to have to QI to nsITextContent at all.
I can't think of any case where a QI to nsITextContent gives you any additional
information. In most cases you're only interested in textnodes, and in no case I
can think of you're interested in just comments or textnodes. If you're
interested in nodevalues you still have to check for PIs separatly.
Neil, is this last patch in?  Is the bug fixed?
This bug was fixed on 1st of June and bug 296657 was fixed on the 8th...
So is there a reason this is still open?


13 years ago
Last Resolved: 14 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.8beta3
You need to log in before you can comment on or make changes to this bug.