89780 - Mozilla wrongly converts space to CR+LF in innerHTML.

Reporter

Description

•

23 years ago

From Bugzilla Helper:
User-Agent: Mozilla/4.6 [en-gb]C-CCK-MCD NetscapeOnline.co.uk  (Win98; I)
BuildID:    

The innerHTML of an element appears to be 'sectioned' by Mozilla, so that every 
so often (roughly every 75 characters) a space is converted to a carriage-return 
+ line-feed pair.

This may make no difference in the rendering of the HTML, but in string 
processing one would normally expect the original text to be faithfully 
represented in the innerHTML.



Reproducible: Always
Steps to Reproduce:
The attached testcase shows (eventually) some text, and the character codes of 
the innerHTML of that text.


Expected Results:  The innerHTML should reflect the original whitespace.

Val Sharp

Reporter

Comment 1

•

23 years ago

Attached file testcase - CR+LF replaces space in innerHTML — Details

Val Sharp

Reporter

Comment 2

•

23 years ago

I've finally managed to isolate just what has been causing me problems here.

Whilst Mozilla inially inserts CR+LF pairs in the original innerHTML, when that 
innerHTML is copied to another element, the CR characters are dropped, causing 
serious inconsistencies in the two strings.

A second attached testcase illustrates this.

Val Sharp

Reporter

Comment 3

•

23 years ago

Attached file testcase 2 - CR, of CR+LF pair, dropped on copying innerHTML — Details

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

23 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Dylan Schiemann

Updated

•

23 years ago

Keywords: testcase

Johnny Stenback (:jst)

Comment 4

•

23 years ago

Future.

OS: Windows 98 → All

Hardware: PC → All

Target Milestone: --- → Future

Fabian Guisset

Updated

•

23 years ago

Priority: -- → P5

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

22 years ago

Blocks: 117617

timeless

Updated

•

22 years ago

QA Contact: lchiang → ian

Hixie (not reading bugmail)

Comment 5

•

22 years ago

this seems to work now. Val: could you check if this works for you in a recent
build? I don't understand the test case very well.

Marking WORKSFORME pending feedback.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → WORKSFORME

Val Sharp

Reporter

Comment 6

•

22 years ago

Hi Ian - looks like one of those badly written bug reports that should have been
separated into two.

First problem was that Mozilla was inserting CR+LF pairs in the innerHTML of an
element, as described in the summary, and as shown in the first testcase.

But the real problem I was having turned out to be that when the innerHTML was
copied, the CR characters were then dropped, so the two strings were no longer
identical, as shown in the second testcase.

The second problem gave rise to serious implications for string handling, but
seems to be resolved in that the CR characters are no longer dropped when
copying the innerHTML, so the innerHTML and the copy of the innerHTML are now
the same. So that looks like a WORKSFORME too (Mozilla 1.2b).

But since CR+LF are still being inserted in the innerHTML, then strictly
speaking the problem described in the summary still remains. I'm guessing it's
required by the editor or some such, so I wouldn't be surprised if it turns out
to be a WONTFIX.

... but I still think it's bad form to tamper with the innerHTML of an element ;-)

Hixie (not reading bugmail)

Comment 7

•

22 years ago

Ah, the first testcase.

Ok, reopening. This does indeed seem wrong. I can't work out why we would do this.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Johnny Stenback (:jst)

Comment 8

•

21 years ago

Mass-reassigning bugs to dom_bugs@netscape.com

Assignee: jst → dom_bugs

Status: REOPENED → NEW

Boris Zbarsky [:bzbarsky]

Assignee

Comment 9

•

20 years ago

This is a serializer bug.  As far as I can tell, there is no way I can keep the
HTML serializer from inserting these line breaks.  Setting the "raw" output flag
is ignored in AppendText unless I also set the "do formatting" flag!  And if I
do that, some other functions check the latter but not the former.

That's why the newlines are inserted at all.  As for why they're CRLF on
windows, that's because the "use CR for newline" flag is NOT passed to the
document encoder by GetInnetHTML.  That should probably be fixed....

Assignee: general → dom-to-text

Component: DOM: Mozilla Extensions → DOM to Text Conversion

Akkana Peck

Comment 10

•

20 years ago

If the serializer is inserting linebreaks at 70-odd char intervals, that
generally means someone is telling it to use line-wrapping mode.

Where is the serializer being called in this code path?  Maybe the caller just
needs to stop setting nsIDocumentEncoder::OutputWrap.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 11

•

20 years ago

(In reply to comment #10)
> Where is the serializer being called in this code path?  Maybe the caller just
> needs to stop setting nsIDocumentEncoder::OutputWrap.

The caller is at:

http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsGenericHTMLElement.cpp#850

> If the serializer is inserting linebreaks at 70-odd char intervals, that
> generally means someone is telling it to use line-wrapping mode.

You must be thinking of the plaintext serializer, which is like sane and all. 
The HTML serializer is not at all like that...  Analysis follows.

In nsHTMLContentSerializer::AppendText we have the following code:

190   if (mPreLevel > 0) {
191     AppendToStringConvertLF(data, aStr);
192   }
193   else if (!mDoFormat) {
195     PRBool hasLongLines = HasLongLines(data, lastNewlineOffset);
196     if (hasLongLines) {
198       AppendToStringWrapped(data, aStr, PR_FALSE);
201     }
202     else {
203       AppendToStringConvertLF(data, aStr);
204     }
205   }
206   else if (mFlags & nsIDocumentEncoder::OutputRaw) {
208     AppendToString(data, aStr);
211   }
212   else {
213     AppendToStringWrapped(data, aStr, PR_FALSE);
214   }

where I've left out some lines that aren't really relevant to the basic logic. 
Now let's see here:

1)  mPreLevel > 0 tests false (since we're not inside a pre/style/script).
2)  mDoFormat is false (since we did not pass in the
    nsIDocumentEncoder::OutputFormatted flag).

So we follow the !mDoFormat path.  HasLongLines() returns true (since we do), so
we end up in AppendToStringWrapped.

Further notes:

A) nsHTMLContentSerializer never looks at the nsIDocumentEncoder::OutputWrap
   flag.
B) The code has been this way at least since the noXIF landing
C) The fact that you can't even get to the nsIDocumentEncoder::OutputRaw branch
   without setting mDoFormat to true(!) is a regression from bug 108024

Fixing note C and using the "output raw" flag for innerHTML _may_ work.  I
suspect it will flub the entity conversion we want (which it probably does
anyway, since aTranslateEntities in AppendToString()'s arguments defaults to
false, no caller ever passes it that I can tell, and the entity conversion flags
are ignored if it's false).

Akkana Peck

Comment 12

•

20 years ago

Ugh.
I think nsIDocumentEncoder::OutputRaw is what should be used here -- and yes,
it's not well documented at all in nsIDocumentEncoder.

Needing to set mDoFormat in order to get raw output seems like a bug.  Is that
filed as a regression anywhere?  Should we handle it here?

If entity conversion changes as a result of using raw, that's not intentional,
or at least it wasn't the original intent.  So if that's true (and it may be)
then that's also a bug and worth fixing at the same time.

> nsHTMLContentSerializer never looks at the nsIDocumentEncoder::OutputWrap 
> flag.

I just noticed a few weeks ago that composer has no way to set the wrap column,
either (a related problem).  I could swear that it used to check a pref called
editor.htmlWrapColumn, and in fact nsEditor::GetWrapWidth does, but nsHTMLEditor
never actually calls that method.  We're all screwed up there (I've been meaning
to file a bug, and maybe fix it, or track down whether it used to work and
regressed).  But as you point out, the serializer would have to be fixed before
the editor could do anything about it.

Composer has a problem with wrapping, anyway, because there's no way of marking
text nodes with a flag to indicate which are already wrapped/formatted and which
have just been added by composer and so need to be wrapped.  That's what the
HasLongLines stuff is about, trying to guess what's already been formatted for
composer's "don't reformat existing html" case.  It's all a mess, and that's
probably why none of the html serializer wrapping stuff ever got properly tested.

Enough history.  I would suggest:
- change nsGenericHTMLElement.cpp to add nsIDocumentEncoder::OutputRaw
- Fix nsHTMLContentSerializer to not require Formatted in order to do raw
  (can we just reorder those two clauses?  It looks like the only user of
OutputRaw is nsGenericElement, and that should have the same needs as this bug
so it shouldn't break anything there)
- Fix the comment in nsIDocumentEncoder to state clearly that it means no
formatting and no wrapping, and should not change entity conversion

Then test, making sure entities and the nsGenericElement code both still work.

Meanwhile, the wrapping stuff is still broken, but that's not needed for this
bug and I (or someone) should file it as two separate bugs, one on the
serializer and one on   the editor.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

20 years ago

Akkana, that sounds like an excellent plan of action.  Yes, I think we should
just handle the outputraw issue (and just reordering sounds great to me) in this
bug, as well as handling the entity conversion issue (if any).

For general wrapping, a new bug is definitely the way to go.  I have some
thoughts on the problems that may need some discussion.... ;)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 14

•

20 years ago

*** Bug 280579 has been marked as a duplicate of this bug. ***

Boris Zbarsky [:bzbarsky]

Assignee

Comment 15

•

20 years ago

Attached patch Patch per comment 12 — Details — Splinter Review

I was wrong about entities.  Those are handled fine, since we call
AppendTextData (which handles them), _then_ start messing with the
line-breaking.

So this fixes the bug and doesn't break anything that I can see.  akkana, if
you have time to look at this that would be great, but I'm not requesting your
review in case you don't....

Attachment #173053 - Flags: superreview?(peterv)

Attachment #173053 - Flags: review?(peterv)

Daniel Glazman (:glazou) (not active in Mozilla any more)

Comment 16

•

20 years ago

Comment on attachment 173053 [details] [diff] [review]
Patch per comment 12

>+    // XXXbz this doesn't seem to be used by all serializers...

Right, because only the HTML serializer needed stuff like pretty printing
and space preservation for Composer.
Now, Nvu needs it in XML too so...

Boris Zbarsky [:bzbarsky]

Assignee

Comment 17

•

20 years ago

> >+    // XXXbz this doesn't seem to be used by all serializers...
> 
> Right, because only the HTML serializer needed stuff like pretty printing

The option that comment is on is not used by the HTML serializer, as it happens.
 It's used by the plaintext one only.

Hence the need for a comment.  Someone needs to actually document this interface
(and perhaps decide whether we really need so many mutually exclusive options in
a bitfield, or whether we should be actually using bits with combinations of
bits meaning something).

Peter Van der Beken [:peterv]

Updated

•

20 years ago

Attachment #173053 - Flags: superreview?(peterv)

Attachment #173053 - Flags: superreview+

Attachment #173053 - Flags: review?(peterv)

Attachment #173053 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 18

•

20 years ago

Fixed.   Akkana, do you want to file those followup bugs?

Assignee: dom-to-text → bzbarsky

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

20 years ago

Status: NEW → RESOLVED

Closed: 22 years ago → 20 years ago

Priority: P5 → P2

Resolution: --- → FIXED

Target Milestone: Future → mozilla1.8beta

Akkana Peck

Comment 19

•

20 years ago

Filed:
bug 281407 on documenting the rest of the serializer flags (like those XXXbz
comments).
bug 281409 on setting wrap width in the editor.
Did I miss any?

I didn't file that second wrap-width bug I said should be filed on the
serializer; I'm not clear at the moment on exactly what the bug should say, so
that can be filed later or grouped into any work that goes into bug 281409.

I won't be working on these right away since I don't think many people care, but
at least they're in the system now.  Thanks for the reminder, bz, and the work
on this more important bug.

testcase - CR+LF replaces space in innerHTML 23 years ago Val Sharp 1.59 KB, text/html		Details
testcase 2 - CR, of CR+LF pair, dropped on copying innerHTML 23 years ago Val Sharp 1.97 KB, text/html		Details
Patch per comment 12 20 years ago Boris Zbarsky [:bzbarsky] 5.60 KB, patch	peterv : review+ peterv : superreview+	Details \| Diff \| Splinter Review