Closed
Bug 89780
Opened 23 years ago
Closed 20 years ago
Mozilla wrongly converts space to CR+LF in innerHTML.
Categories
(Core :: DOM: Serializers, defect, P2)
Core
DOM: Serializers
Tracking
()
RESOLVED
FIXED
mozilla1.8beta1
People
(Reporter: val, Assigned: bzbarsky)
References
Details
(Keywords: testcase)
Attachments
(3 files)
1.59 KB,
text/html
|
Details | |
1.97 KB,
text/html
|
Details | |
5.60 KB,
patch
|
peterv
:
review+
peterv
:
superreview+
|
Details | Diff | Splinter Review |
From Bugzilla Helper: User-Agent: Mozilla/4.6 [en-gb]C-CCK-MCD NetscapeOnline.co.uk (Win98; I) BuildID: The innerHTML of an element appears to be 'sectioned' by Mozilla, so that every so often (roughly every 75 characters) a space is converted to a carriage-return + line-feed pair. This may make no difference in the rendering of the HTML, but in string processing one would normally expect the original text to be faithfully represented in the innerHTML. Reproducible: Always Steps to Reproduce: The attached testcase shows (eventually) some text, and the character codes of the innerHTML of that text. Expected Results: The innerHTML should reflect the original whitespace.
I've finally managed to isolate just what has been causing me problems here. Whilst Mozilla inially inserts CR+LF pairs in the original innerHTML, when that innerHTML is copied to another element, the CR characters are dropped, causing serious inconsistencies in the two strings. A second attached testcase illustrates this.
Assignee | ||
Updated•23 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 4•23 years ago
|
||
Future.
OS: Windows 98 → All
Hardware: PC → All
Target Milestone: --- → Future
Updated•23 years ago
|
Priority: -- → P5
Comment 5•22 years ago
|
||
this seems to work now. Val: could you check if this works for you in a recent build? I don't understand the test case very well. Marking WORKSFORME pending feedback.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
Hi Ian - looks like one of those badly written bug reports that should have been separated into two. First problem was that Mozilla was inserting CR+LF pairs in the innerHTML of an element, as described in the summary, and as shown in the first testcase. But the real problem I was having turned out to be that when the innerHTML was copied, the CR characters were then dropped, so the two strings were no longer identical, as shown in the second testcase. The second problem gave rise to serious implications for string handling, but seems to be resolved in that the CR characters are no longer dropped when copying the innerHTML, so the innerHTML and the copy of the innerHTML are now the same. So that looks like a WORKSFORME too (Mozilla 1.2b). But since CR+LF are still being inserted in the innerHTML, then strictly speaking the problem described in the summary still remains. I'm guessing it's required by the editor or some such, so I wouldn't be surprised if it turns out to be a WONTFIX. ... but I still think it's bad form to tamper with the innerHTML of an element ;-)
Comment 7•22 years ago
|
||
Ah, the first testcase. Ok, reopening. This does indeed seem wrong. I can't work out why we would do this.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 8•21 years ago
|
||
Mass-reassigning bugs to dom_bugs@netscape.com
Assignee: jst → dom_bugs
Status: REOPENED → NEW
Assignee | ||
Comment 9•20 years ago
|
||
This is a serializer bug. As far as I can tell, there is no way I can keep the HTML serializer from inserting these line breaks. Setting the "raw" output flag is ignored in AppendText unless I also set the "do formatting" flag! And if I do that, some other functions check the latter but not the former. That's why the newlines are inserted at all. As for why they're CRLF on windows, that's because the "use CR for newline" flag is NOT passed to the document encoder by GetInnetHTML. That should probably be fixed....
Assignee: general → dom-to-text
Component: DOM: Mozilla Extensions → DOM to Text Conversion
Comment 10•20 years ago
|
||
If the serializer is inserting linebreaks at 70-odd char intervals, that generally means someone is telling it to use line-wrapping mode. Where is the serializer being called in this code path? Maybe the caller just needs to stop setting nsIDocumentEncoder::OutputWrap.
Assignee | ||
Comment 11•20 years ago
|
||
(In reply to comment #10) > Where is the serializer being called in this code path? Maybe the caller just > needs to stop setting nsIDocumentEncoder::OutputWrap. The caller is at: http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsGenericHTMLElement.cpp#850 > If the serializer is inserting linebreaks at 70-odd char intervals, that > generally means someone is telling it to use line-wrapping mode. You must be thinking of the plaintext serializer, which is like sane and all. The HTML serializer is not at all like that... Analysis follows. In nsHTMLContentSerializer::AppendText we have the following code: 190 if (mPreLevel > 0) { 191 AppendToStringConvertLF(data, aStr); 192 } 193 else if (!mDoFormat) { 195 PRBool hasLongLines = HasLongLines(data, lastNewlineOffset); 196 if (hasLongLines) { 198 AppendToStringWrapped(data, aStr, PR_FALSE); 201 } 202 else { 203 AppendToStringConvertLF(data, aStr); 204 } 205 } 206 else if (mFlags & nsIDocumentEncoder::OutputRaw) { 208 AppendToString(data, aStr); 211 } 212 else { 213 AppendToStringWrapped(data, aStr, PR_FALSE); 214 } where I've left out some lines that aren't really relevant to the basic logic. Now let's see here: 1) mPreLevel > 0 tests false (since we're not inside a pre/style/script). 2) mDoFormat is false (since we did not pass in the nsIDocumentEncoder::OutputFormatted flag). So we follow the !mDoFormat path. HasLongLines() returns true (since we do), so we end up in AppendToStringWrapped. Further notes: A) nsHTMLContentSerializer never looks at the nsIDocumentEncoder::OutputWrap flag. B) The code has been this way at least since the noXIF landing C) The fact that you can't even get to the nsIDocumentEncoder::OutputRaw branch without setting mDoFormat to true(!) is a regression from bug 108024 Fixing note C and using the "output raw" flag for innerHTML _may_ work. I suspect it will flub the entity conversion we want (which it probably does anyway, since aTranslateEntities in AppendToString()'s arguments defaults to false, no caller ever passes it that I can tell, and the entity conversion flags are ignored if it's false).
Comment 12•20 years ago
|
||
Ugh.
I think nsIDocumentEncoder::OutputRaw is what should be used here -- and yes,
it's not well documented at all in nsIDocumentEncoder.
Needing to set mDoFormat in order to get raw output seems like a bug. Is that
filed as a regression anywhere? Should we handle it here?
If entity conversion changes as a result of using raw, that's not intentional,
or at least it wasn't the original intent. So if that's true (and it may be)
then that's also a bug and worth fixing at the same time.
> nsHTMLContentSerializer never looks at the nsIDocumentEncoder::OutputWrap
> flag.
I just noticed a few weeks ago that composer has no way to set the wrap column,
either (a related problem). I could swear that it used to check a pref called
editor.htmlWrapColumn, and in fact nsEditor::GetWrapWidth does, but nsHTMLEditor
never actually calls that method. We're all screwed up there (I've been meaning
to file a bug, and maybe fix it, or track down whether it used to work and
regressed). But as you point out, the serializer would have to be fixed before
the editor could do anything about it.
Composer has a problem with wrapping, anyway, because there's no way of marking
text nodes with a flag to indicate which are already wrapped/formatted and which
have just been added by composer and so need to be wrapped. That's what the
HasLongLines stuff is about, trying to guess what's already been formatted for
composer's "don't reformat existing html" case. It's all a mess, and that's
probably why none of the html serializer wrapping stuff ever got properly tested.
Enough history. I would suggest:
- change nsGenericHTMLElement.cpp to add nsIDocumentEncoder::OutputRaw
- Fix nsHTMLContentSerializer to not require Formatted in order to do raw
(can we just reorder those two clauses? It looks like the only user of
OutputRaw is nsGenericElement, and that should have the same needs as this bug
so it shouldn't break anything there)
- Fix the comment in nsIDocumentEncoder to state clearly that it means no
formatting and no wrapping, and should not change entity conversion
Then test, making sure entities and the nsGenericElement code both still work.
Meanwhile, the wrapping stuff is still broken, but that's not needed for this
bug and I (or someone) should file it as two separate bugs, one on the
serializer and one on the editor.
Assignee | ||
Comment 13•20 years ago
|
||
Akkana, that sounds like an excellent plan of action. Yes, I think we should just handle the outputraw issue (and just reordering sounds great to me) in this bug, as well as handling the entity conversion issue (if any). For general wrapping, a new bug is definitely the way to go. I have some thoughts on the problems that may need some discussion.... ;)
Assignee | ||
Comment 14•20 years ago
|
||
*** Bug 280579 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 15•20 years ago
|
||
I was wrong about entities. Those are handled fine, since we call AppendTextData (which handles them), _then_ start messing with the line-breaking. So this fixes the bug and doesn't break anything that I can see. akkana, if you have time to look at this that would be great, but I'm not requesting your review in case you don't....
Attachment #173053 -
Flags: superreview?(peterv)
Attachment #173053 -
Flags: review?(peterv)
Comment on attachment 173053 [details] [diff] [review] Patch per comment 12 >+ // XXXbz this doesn't seem to be used by all serializers... Right, because only the HTML serializer needed stuff like pretty printing and space preservation for Composer. Now, Nvu needs it in XML too so...
Assignee | ||
Comment 17•20 years ago
|
||
> >+ // XXXbz this doesn't seem to be used by all serializers...
>
> Right, because only the HTML serializer needed stuff like pretty printing
The option that comment is on is not used by the HTML serializer, as it happens.
It's used by the plaintext one only.
Hence the need for a comment. Someone needs to actually document this interface
(and perhaps decide whether we really need so many mutually exclusive options in
a bitfield, or whether we should be actually using bits with combinations of
bits meaning something).
Updated•20 years ago
|
Attachment #173053 -
Flags: superreview?(peterv)
Attachment #173053 -
Flags: superreview+
Attachment #173053 -
Flags: review?(peterv)
Attachment #173053 -
Flags: review+
Assignee | ||
Comment 18•20 years ago
|
||
Fixed. Akkana, do you want to file those followup bugs?
Assignee: dom-to-text → bzbarsky
Assignee | ||
Updated•20 years ago
|
Status: NEW → RESOLVED
Closed: 22 years ago → 20 years ago
Priority: P5 → P2
Resolution: --- → FIXED
Target Milestone: Future → mozilla1.8beta
Comment 19•20 years ago
|
||
Filed: bug 281407 on documenting the rest of the serializer flags (like those XXXbz comments). bug 281409 on setting wrap width in the editor. Did I miss any? I didn't file that second wrap-width bug I said should be filed on the serializer; I'm not clear at the moment on exactly what the bug should say, so that can be filed later or grouped into any work that goes into bug 281409. I won't be working on these right away since I don't think many people care, but at least they're in the system now. Thanks for the reminder, bz, and the work on this more important bug.
You need to log in
before you can comment on or make changes to this bug.
Description
•