Closed Bug 107927 Opened 23 years ago Closed 21 years ago

Composer added <cr> to html code, leading to unwanted spaces

Categories

(Core :: Layout, defect, P2)

defect

Tracking

()

RESOLVED DUPLICATE of bug 2750
Future

People

(Reporter: cowwoc2020, Assigned: kmcclusk)

References

()

Details

(Whiteboard: [needs r/sr=][needs-work/testing][ADT3])

Attachments

(4 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:0.9.5) Gecko/20011016
BuildID:    2001101616

Mozilla interprets a carriage return as a space.

When opening http://bbs.darktech.org/scp.html and saving it, Composer inserts
<cr> characters at the end of each line, leading to unwanted spaces in the title
(fading from one color to the next).

Composer shouldn't add <cr> characters.

Reproducible: Always
Steps to Reproduce:
1. Open http://bbs.darktech.org/scp.html in composer
2. Save file to disk using composer
3. Hit saved (local) file using Mozilla browser and you will notice unwanted
spaces in the title

Actual Results:  Unwanted spaces appear all over the file (most noticably in title)

Expected Results:  Don't insert <cr> characters, thereby avoiding unwanted spaces

If http://bbs.darktech.org/scp.html is down, try again later. It's up most of
the time..
Moving to All. not an Os/2 only problem.

Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: OS/2 → All
Hardware: PC → All
Feels like a data loss bug, we should not be causing change merely by opening up
a file and saving it immediately. Joe, is this worth EDITORBASE? Do we make some
transformation on the content (the DOM) when we load a file into composer?
Assignee: syd → jfrancis
Component: Editor: Composer → Editor: Core
This is not actually the editor, perse.  It is the serializer, after you are
done editing and when the doc is written out (saved).  Over to serializer...
Assignee: jfrancis → harishd
Component: Editor: Core → DOM to Text Conversion
--> peterv
Assignee: harishd → peterv
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
This bug also introduces unwanted spaces in link text.  For example, if there are
two adjacent images that are used as links:

[A HREF="foo.html"][IMG SRC="foo.jpg"][/A]
[A HREF="bar.html"][IMG SRC="bar.jpg"][/A]

The images are displayed next to each other with a space inbetween them due to
the carriage return after the first line, but this is acceptable since the space
isn't really visible.  However, when Composer "pretty prints" the HTML, it
reformats the HTML and adds a carriage return like so:

[A HREF="foo.html"][IMG SRC="foo.jpg"]
[/A][A HREF="bar.html"][IMG SRC="bar.jpg"]
[/A]

The carriage return at the end of the line is interpretted as a space and
because it is part of the link, it becomes and underlined gap between the
images.  Oops.  Not cool.

This could be a potential bug for any tag that displays text between the start
and end delimiter.  The pretty print feature should be smart enough determine
which tags it is okay to add a carriage return after and which it is not.
Bug seems to be resolved with the same patch that I attached for Bug#85184. 
Second part of the bug which 'Brahm' mentioned above, looks different from the 
original bug. 
Removing "(aName == nsHTMLAtoms::img)" from 
nsHTMLContentSerializer::LineBreakAfterOpen() [line# 830] solves the second 
part of the bug but is there any specification based on which we add a line 
break after these tags?

In general, I feel that we should not add the line breaks just before/after the 
tag unless we reach to the next white space or new line, as these added 
characters are interpreted as 'space' by HTML. Though when added with an 
existing space/new line character, there is no effect in the original HTML  
looks.
--> myself.
Assignee: peterv → tmutreja
Instead of removing the "img" tag from the list of tags that need line break
after the opening tag, patch modifies the code in the following manner:
LineBreakAfterOpen() will still return TRUE for "img" but instead of actually
adding a linebreak based on the true value of LineBreakAfterOpen(), we are
setting a flag "isLineBreakDue" to be True. This flag indicates that a line
break is due and we need to add it as soon as we get a right place. Right place
here is the immediate occurrence of a white space or a new line character. 
So we move ahead to deal with coming tag(s) and try to find the right place and
only then we add a line break.
Considering the fact that chances of not getting a white space/ new line for
quite a long (good enough to make the view source real ugly) are really rare, I
feel that we should break the things at right place to retain the correctness
rather than a pretty view source.

Additionally, when combined this patch with the one for 85184 and 103207, hope
we would be having a nice enough view source too.
> PRBool isLineBreakDue = PR_FALSE;

This should be a new member variable, PRPackedBool mIsLineBreakDue in
nsHTMLContentSerializer.h.  What if two files are being serialized at once, and
one sets this flag and then the other checks it?

I also notice that somehow a PRInt32 got inserted in between the list of
PRPackedBools in the .h file (not your fault, it happened some time earlier). 
PRPackedBools should all be declared together so that the size of the object
ends up being smaller.  Could you add your new flag before or after the other
PRPackedBool members in the .h file, and move mPreLevel to either before or
after the list of packed bools?  Thanks!

>    spaceIndex = tempdata.FindChar(PRUnichar(' '), 0);
>    newLineIndex = tempdata.FindChar(PRUnichar('\n'), 0);
    
nsString has FindCharInSet, which would look for more than one character at the
same time, which would be a little more efficient (you would only have to search
as far as the first of the two characters).

At line 178 it looks like you're assuming that PR_MIN will do the right thing if
they're both kNotFound.  Could that cause problems?  (Though that logic will
probably go away if you use FindCharInSet, so it may not matter.)

I don't understand why you're only doing this for img tags.  The new logic seems
like the right thing to do -- shouldn't we be doing it everywhere that we
consider inserting a line break?  In fact, shouldn't it be setting
isLineBreakDue everywhere that it's currently calling AppendLineBreak?
Patch still deals with "img" tag only. I think we can extend the same logic in
following cases:
1. LineBreakAfterOpen : link , map and area tags.
2. LineBreakBeforeOpen : script, link, style and select tags.
3. LineBreakAfterClose: td, p, select, map and div tags.

Once we have consensus about it, I'll incorporate those changes too.
Comment on attachment 67569 [details] [diff] [review]
Incorporating Akkana's comments...

r=akkana pending replacing the special img tag code with something more
general.  I think we should always do the same thing if LineBreakAfterOpen, etc
...  no need for special-casing for certain tags.
Attachment #67569 - Flags: review+
*** Bug 90257 has been marked as a duplicate of this bug. ***
Keywords: patch
Whiteboard: [needs r/sr=]
Comment on attachment 68320 [details] [diff] [review]
Incorporating Akkana's suggestions for applying the same logic to all tags.

>+  if (mIsLineBreakDue) {
>+    PRInt32 indx = 0;
>+    PRInt32 length = 0;
>+    indx = tempdata.FindCharInSet(" \t\n\r", 0);
>+
>+    if (indx != kNotFound) {
>+        length = tempdata.Length();
>+        tempdata.Left(data, indx);
>+        AppendToString(data, aStr);
>+        AppendLineBreak(aStr);
>+        tempdata.Right(data, length - (indx + 1));
>+    }
>+    else
>+        data.Append(tempdata);
>+  }
>+  else
>+      data.Append(tempdata);
>+

Be consistent with the indentation here, use 2 spaces, not 4.

sr=jst if you fix that.
Attachment #68320 - Flags: superreview+
Attachment #68320 - Flags: review+
Ship it!
Comment on attachment 69450 [details] [diff] [review]
fixed indentation...

I like what you're doing in the code here ... but unfortunately when I try it
on a real page, it's dropping a lot of linebreaks that it should be putting
out.  All the linebreak after close tags are being dropped, possibly others as
well.

To see the problem, try visiting a fairly simple page like
http://shallowsky.com/index.html in the editor, and click on the source
formatting tab; try both with and without the patch, and note the difference.
Attachment #69450 - Flags: needs-work+
nsbeta1+, let's get this in.
Keywords: nsbeta1+
Whiteboard: [needs r/sr=] → [needs r/sr=][needs-work/testing]
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.0
*** Bug 128695 has been marked as a duplicate of this bug. ***
Akkana, it looks tough to find a solution which may resolve this bug without 
ruining nice view source and indentation. 

Basically the bug looks like in Parser. Isn't that the parser is supposed to 
ignore CRLF immediately after an opening tag and before a closing tag. 
 
Only thing I can think to make the patch "a bit" better is "not" to apply the 
patch for all tags e.g. script, style, select, table, tbody may have line break 
after opening tag. Though it would append a line break(replaced to a space 
later) yet the chances of getting a wrong view is quite less in fact nil for 
some tags like script and style. 
Marking it as depending on 15738. I still feel that if we can follow the HTML 
specification http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks 
that 
"The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>", 

This bug will automatically get resolved with no side effects on "View Source" 
or "Indentation", that means all the new lines that we add here before/after 
tags (opening/closing) essentially those are going to get neglected. 
Depends on: 15378
ADT3 per ADT triage.
Whiteboard: [needs r/sr=][needs-work/testing] → [needs r/sr=][needs-work/testing][ADT3]
Tanu is right.  According to the W3C spec, we shouldn't be showing those line
breaks immediately after open tags or immediately before close tags as spaces.

I don't think this is actually a parser problem; the parser needs to preserve
that information, so that we'll have it at output time if we're trying to
preserve the original source formatting of the document.  So I think it has to
be up to layout to follow that W3C specification.

Cc'ing Kin and Joe for layout advice, and Simon because he might have useful advice.
I've asked around on #mozilla and nobody is sure whether we're deliberately
ignoring the W3C spec on newlines before/after tags or not.

Cc Kevin, the expert on what our intent is for layout.  Kevin, are we doing this
deliberately, or is it a bug that we might fix?

Also cc Beth, who knows everything about W3C specs and may have been privy to
earlier discussions regarding our handling of this part of the spec.
Timeless thinks there's already a bug on this, suggests choess may know about it.
Sounds like bug 2750.
Yes Akkana is right, Tanu is right, Hixie is right, even rickg was right.  It's a 
bug, but in order to preserve round-tripping of the source, we have to fix it in 
layout rather than parser.
Depends on: 2750
Moving to next milestone-> mozilla1.0.1
Target Milestone: mozilla1.0 → mozilla1.0.1
I've written up a little guide to whitespace handling and how it (supposedly)
interacts among the different standards and specifications we follow; see
<URL:http://www.stwing.upenn.edu/~choess/mozilla/whitespace.html>.
Component->layout, reassigned->Nisheeth.
Assignee: tmutreja → nisheeth
Status: ASSIGNED → NEW
Component: DOM to Text Conversion → Layout
-> Layout team's manager, Kevin Mccluskey.
Assignee: nisheeth → kmcclusk
clearing nsbeta1+, nominating for EDITORBASE. We need to discuss this in the
EDITORBASE triage.
Keywords: nsbeta1+nsbeta1
Whiteboard: [needs r/sr=][needs-work/testing][ADT3] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE]
Minusing. EDITORBASE triage
Keywords: nsbeta1nsbeta1-
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE-]
In Mozilla 1.1a+
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1a+) Gecko/20020707,

it still happens in my past 1.01 composer, but only in the <style> tag in the
<head> and the <body> looks fine except for the <script> tag where it happens too.
I don't know if this is the same bug. But empty line spaces are added when
editing and saving a document in the <HTML>Source tab.

Editing and saving in the Normal tab preserves the document. But once you start
editing from the Source tab, everytime you switch to Normal or save the document
an empty line will be added in the head and on different parts of the document.
That's bug 145196 and you don't need to use the HTML source view, only reload
and save again.
Target Milestone: mozilla1.0.1 → ---
Priority: P3 → P2
Target Milestone: --- → Future
editorbase?
QA Contact: sujay → beppe
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE-] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE?]
removing editorbase
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE?] → [needs r/sr=][needs-work/testing][ADT3]
Could someone attach a testcase for this "layout bug"?  We should most certainly
not be ignoring linefeeds in cases like:

<a href="whatever">
text
</a>

The problem, most likely, is that the HTML spec can't make up its little mind on
whitespace handling, and all it says is pretty much incompatible with CSS.  So
being a CSS browser, we go with the CSS way....
From 9.1 of the spec:

In order to avoid problems with SGML line break rules and inconsistencies among
extant implementations, authors should not rely on user agents to render white
space immediately after a start tag or immediately before an end tag. Thus,
authors, and in particular authoring tools, should write:

  <P>We offer free <A>technical support</A> for subscribers.</P>

and not:

  <P>We offer free<A> technical support </A>for subscribers.</P>

The SGML part refers to Appendix B of the spec, which says:

SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately
following a start tag must be ignored, as must a line break immediately before
an end tag. This applies to all HTML elements without exception.

The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>

So must the following two examples:

<A>My favorite Website</A>

<A>
My favorite Website
</A>


Seems pretty clear to me.
> authors should not rely

All other browsers render said whitespace, and authors do rely on it.  Life is
tough.  Furthermore, what happens when white-space:pre is set?

The point is, what the HTML spec says is not what any actual tag-soup UA does. A
and since we are a CSS browser at heart, what the CSS spec says about layout
supercedes what the HTML spec says.
CSS white-space properties do not bear on this issue. "white-space: pre"
operates on whatever whitespace exists in the "document tree" after parsing. A
proper SGML parse will remove these spaces before the document tree is created.
If bug 2750 is deemed WONTFIX, so be it, but there is no intrinsic conflict
between the requirements of HTML/SGML and CSS here.
See comment 28.  The argument being made is that the parser has to keep the
whitespace "to preserve round-tripping" on parse and editor wants to produce the
whitespace on output to make the source look pretty and that we should hack
layout to somehow deal.  But at that point the whitespace is in the DOM and CSS
applies all the way.
bz: can you point me at the conflicting part of css?  Are you just referring to
the pre-formatted case? 
For whitespace?  Not only that, no.  As far as CSS is concerned all whitespace
between inline boxes is the same and should be rendered as whitespace even when
white-space is not pre (but collapsed to a single whitespace, if white-space is
not pre).
Ok, so if people are arguing that the HTML spec has anything to say here, then
this is just a dup of bug 2750.  Note that we already skip the first newline
inside some tags (eg <pre>).  See
http://lxr.mozilla.org/seamonkey/source/htmlparser/src/CNavDTD.cpp#1077.  It
should not be difficult to adapt that code to skip the first newline in any HTML
tag, if desired.  Probably regress some pages, though.  Doing something with the
last newline is much harder.

*** This bug has been marked as a duplicate of 2750 ***
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: