Closed Bug 285830 Opened 20 years ago Closed 19 years ago

textContent ignores whitespace

Categories

(Core :: DOM: Serializers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.8beta3

People

(Reporter: jason.barnabe, Unassigned)

References

Details

(Keywords: testcase)

Attachments

(2 files, 1 obsolete file)

Attached file testcase
To be precise, the plaintext serializer in "raw" mode is what munges the whitespace... See http://lxr.mozilla.org/seamonkey/source/content/base/src/nsGenericElement.cpp#233 That shouldn't be happening, should it? Quite apart from whatever the behavior of GetTextContent() is supposed to be?
Assignee: general → dom-to-text
Component: DOM → DOM to Text Conversion
OS: Windows XP → All
QA Contact: ian
Hardware: PC → All
Yeah, I think we're backward here. Seems the only other thing that uses OutputRaw is .innerHTML (on the HTMLSerializer) and IE strips out all whitespace there, but we leave it in for that.... We need to fix textContent to keep whitespace, and innerHTML to strip it.
The HTML and plaintext serializers do quite different things with OutputRaw... I can get the spaces to not be stripped if I use OutputPreformatted. But then things like the newline in: <body> <pre> are included in the textContent. Should they be? Again, the spec is a little vague...
Attached patch .textContent rewrite (obsolete) — Splinter Review
I'd already rewritten it using content iterators before discovering this bug. (I was considering writing nsNode3Tearoff::AppendTextContentTo instead). It has the advantage of working on nodes without documents ;-) Both this method and nsIDOMRange::ToString preserve whitespace nodes.
Comment on attachment 183898 [details] [diff] [review] .textContent rewrite >Index: nsGenericElement.cpp ... >+ while (!iter->IsDone()) { >+ nsIContent* content = iter->GetCurrentNode(); >+ if (content->IsContentOfType(nsIContent::eTEXT)) { >+ nsITextContent* textContent = NS_STATIC_CAST(nsITextContent*, content); Eep! Use a real QI here. There's no guarentee that the above will work. r=me with that
Attachment #183898 - Flags: review+
We still want to fix the serializer brokenness, though...
True, but there's no reason to take the detour through a serializer when doing .textContent, so that is IMHO a different bug.
From the DOM level 3 core spec: On getting, no serialization is performed, the returned string does not contain any markup. No whitespace normalization is performed and the returned string does not contain the white spaces in element content (see the attribute Text.isElementContentWhitespace [p.96] ). Does that mean that getting textContent should NOT get the whitespace and we have the correct behavior?
No, becuase we stripped the whitespace too often, not just when it was marked as ElementContentWhitespace. We currently don't support ElementContentWhitespace, and technically aren't required to since we're not a validating parser.
Attached patch QI versionSplinter Review
Transferring sicking's r=.
Attachment #183898 - Attachment is obsolete: true
Attachment #184581 - Flags: superreview?(jst)
Attachment #184581 - Flags: review+
Comment on attachment 184581 [details] [diff] [review] QI version sr=jst
Attachment #184581 - Flags: superreview?(jst) → superreview+
Attachment #184581 - Flags: approval1.8b3?
Comment on attachment 184581 [details] [diff] [review] QI version a=shaver
Attachment #184581 - Flags: approval1.8b3? → approval1.8b3+
Fix checked in.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 295771 has been marked as a duplicate of this bug. ***
The contents of the pre in the testcase are "there is whitespace here and here and here !<!--hey, why not some comments too?-->" and it outputs like "there is whitespace here and here and here !hey, why not some comments too?" Is it the expected behaviour of textContent to strip the whitespace between the two "and here"s and between the start of the last line and the exclamation mark? Also, I've filed bug 296657 on how it's outputting the comment node.
Neil: I didn't tell you to remove the IsContentOfType check ;) The whitespace between the two "and here"s isn't stripped, it's just that the alert function won't show it. Try doing just alert("hello hello")
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Which brings the obvious question. Why do we have nsITextContent? Couldn't we just move those functions to nsIContent? Right now QI-ing to nsITextContent is very randomly implemented, textnodes and comments QI to it, but not PIs.
Depends on: 296657
Uh, guys, comment nodes and PIs shouldn't be appearing in textContent. Please look again at the table at: http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
(In reply to comment #18) > Right now QI-ing to nsITextContent is > very randomly implemented, textnodes and comments QI to it, but not PIs. Only PIs don't QI to it, because that wouldn't make sense. PIs have a target/data pair, whereas the others have just some text data.
True, though I still don't see a reason to have to QI to nsITextContent at all. I can't think of any case where a QI to nsITextContent gives you any additional information. In most cases you're only interested in textnodes, and in no case I can think of you're interested in just comments or textnodes. If you're interested in nodevalues you still have to check for PIs separatly.
Neil, is this last patch in? Is the bug fixed?
This bug was fixed on 1st of June and bug 296657 was fixed on the 8th...
So is there a reason this is still open?
Status: REOPENED → RESOLVED
Closed: 20 years ago19 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.8beta3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: