Closed
Bug 107927
Opened 22 years ago
Closed 20 years ago
Composer added <cr> to html code, leading to unwanted spaces
Categories
(Core :: Layout, defect, P2)
Core
Layout
Tracking
()
Future
People
(Reporter: cowwoc2020, Assigned: kmcclusk)
References
()
Details
(Whiteboard: [needs r/sr=][needs-work/testing][ADT3])
Attachments
(4 files)
5.16 KB,
patch
|
Details | Diff | Splinter Review | |
5.23 KB,
patch
|
akkzilla
:
review+
|
Details | Diff | Splinter Review |
5.33 KB,
patch
|
jst
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
5.31 KB,
patch
|
Details | Diff | Splinter Review |
From Bugzilla Helper: User-Agent: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:0.9.5) Gecko/20011016 BuildID: 2001101616 Mozilla interprets a carriage return as a space. When opening http://bbs.darktech.org/scp.html and saving it, Composer inserts <cr> characters at the end of each line, leading to unwanted spaces in the title (fading from one color to the next). Composer shouldn't add <cr> characters. Reproducible: Always Steps to Reproduce: 1. Open http://bbs.darktech.org/scp.html in composer 2. Save file to disk using composer 3. Hit saved (local) file using Mozilla browser and you will notice unwanted spaces in the title Actual Results: Unwanted spaces appear all over the file (most noticably in title) Expected Results: Don't insert <cr> characters, thereby avoiding unwanted spaces If http://bbs.darktech.org/scp.html is down, try again later. It's up most of the time..
Comment 1•22 years ago
|
||
Moving to All. not an Os/2 only problem.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: OS/2 → All
Hardware: PC → All
Feels like a data loss bug, we should not be causing change merely by opening up a file and saving it immediately. Joe, is this worth EDITORBASE? Do we make some transformation on the content (the DOM) when we load a file into composer?
Assignee: syd → jfrancis
Component: Editor: Composer → Editor: Core
Comment 3•22 years ago
|
||
This is not actually the editor, perse. It is the serializer, after you are done editing and when the doc is written out (saved). Over to serializer...
Assignee: jfrancis → harishd
Component: Editor: Core → DOM to Text Conversion
Updated•22 years ago
|
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
Comment 5•22 years ago
|
||
This bug also introduces unwanted spaces in link text. For example, if there are two adjacent images that are used as links: [A HREF="foo.html"][IMG SRC="foo.jpg"][/A] [A HREF="bar.html"][IMG SRC="bar.jpg"][/A] The images are displayed next to each other with a space inbetween them due to the carriage return after the first line, but this is acceptable since the space isn't really visible. However, when Composer "pretty prints" the HTML, it reformats the HTML and adds a carriage return like so: [A HREF="foo.html"][IMG SRC="foo.jpg"] [/A][A HREF="bar.html"][IMG SRC="bar.jpg"] [/A] The carriage return at the end of the line is interpretted as a space and because it is part of the link, it becomes and underlined gap between the images. Oops. Not cool. This could be a potential bug for any tag that displays text between the start and end delimiter. The pretty print feature should be smart enough determine which tags it is okay to add a carriage return after and which it is not.
Comment 6•22 years ago
|
||
Bug seems to be resolved with the same patch that I attached for Bug#85184. Second part of the bug which 'Brahm' mentioned above, looks different from the original bug.
Comment 7•22 years ago
|
||
Removing "(aName == nsHTMLAtoms::img)" from nsHTMLContentSerializer::LineBreakAfterOpen() [line# 830] solves the second part of the bug but is there any specification based on which we add a line break after these tags? In general, I feel that we should not add the line breaks just before/after the tag unless we reach to the next white space or new line, as these added characters are interpreted as 'space' by HTML. Though when added with an existing space/new line character, there is no effect in the original HTML looks.
Comment 9•22 years ago
|
||
Instead of removing the "img" tag from the list of tags that need line break after the opening tag, patch modifies the code in the following manner: LineBreakAfterOpen() will still return TRUE for "img" but instead of actually adding a linebreak based on the true value of LineBreakAfterOpen(), we are setting a flag "isLineBreakDue" to be True. This flag indicates that a line break is due and we need to add it as soon as we get a right place. Right place here is the immediate occurrence of a white space or a new line character. So we move ahead to deal with coming tag(s) and try to find the right place and only then we add a line break. Considering the fact that chances of not getting a white space/ new line for quite a long (good enough to make the view source real ugly) are really rare, I feel that we should break the things at right place to retain the correctness rather than a pretty view source. Additionally, when combined this patch with the one for 85184 and 103207, hope we would be having a nice enough view source too.
Comment 10•22 years ago
|
||
> PRBool isLineBreakDue = PR_FALSE; This should be a new member variable, PRPackedBool mIsLineBreakDue in nsHTMLContentSerializer.h. What if two files are being serialized at once, and one sets this flag and then the other checks it? I also notice that somehow a PRInt32 got inserted in between the list of PRPackedBools in the .h file (not your fault, it happened some time earlier). PRPackedBools should all be declared together so that the size of the object ends up being smaller. Could you add your new flag before or after the other PRPackedBool members in the .h file, and move mPreLevel to either before or after the list of packed bools? Thanks! > spaceIndex = tempdata.FindChar(PRUnichar(' '), 0); > newLineIndex = tempdata.FindChar(PRUnichar('\n'), 0); nsString has FindCharInSet, which would look for more than one character at the same time, which would be a little more efficient (you would only have to search as far as the first of the two characters). At line 178 it looks like you're assuming that PR_MIN will do the right thing if they're both kNotFound. Could that cause problems? (Though that logic will probably go away if you use FindCharInSet, so it may not matter.) I don't understand why you're only doing this for img tags. The new logic seems like the right thing to do -- shouldn't we be doing it everywhere that we consider inserting a line break? In fact, shouldn't it be setting isLineBreakDue everywhere that it's currently calling AppendLineBreak?
Comment 11•22 years ago
|
||
Patch still deals with "img" tag only. I think we can extend the same logic in following cases: 1. LineBreakAfterOpen : link , map and area tags. 2. LineBreakBeforeOpen : script, link, style and select tags. 3. LineBreakAfterClose: td, p, select, map and div tags. Once we have consensus about it, I'll incorporate those changes too.
Comment 12•22 years ago
|
||
Comment on attachment 67569 [details] [diff] [review] Incorporating Akkana's comments... r=akkana pending replacing the special img tag code with something more general. I think we should always do the same thing if LineBreakAfterOpen, etc ... no need for special-casing for certain tags.
Attachment #67569 -
Flags: review+
Comment 13•22 years ago
|
||
This fixes bug#90257 too.
Comment 14•22 years ago
|
||
*** Bug 90257 has been marked as a duplicate of this bug. ***
Comment 15•22 years ago
|
||
Comment on attachment 68320 [details] [diff] [review] Incorporating Akkana's suggestions for applying the same logic to all tags. >+ if (mIsLineBreakDue) { >+ PRInt32 indx = 0; >+ PRInt32 length = 0; >+ indx = tempdata.FindCharInSet(" \t\n\r", 0); >+ >+ if (indx != kNotFound) { >+ length = tempdata.Length(); >+ tempdata.Left(data, indx); >+ AppendToString(data, aStr); >+ AppendLineBreak(aStr); >+ tempdata.Right(data, length - (indx + 1)); >+ } >+ else >+ data.Append(tempdata); >+ } >+ else >+ data.Append(tempdata); >+ Be consistent with the indentation here, use 2 spaces, not 4. sr=jst if you fix that.
Attachment #68320 -
Flags: superreview+
Attachment #68320 -
Flags: review+
Comment 16•22 years ago
|
||
Comment 17•22 years ago
|
||
Ship it!
Comment 18•22 years ago
|
||
Comment on attachment 69450 [details] [diff] [review] fixed indentation... I like what you're doing in the code here ... but unfortunately when I try it on a real page, it's dropping a lot of linebreaks that it should be putting out. All the linebreak after close tags are being dropped, possibly others as well. To see the problem, try visiting a fairly simple page like http://shallowsky.com/index.html in the editor, and click on the source formatting tab; try both with and without the patch, and note the difference.
Attachment #69450 -
Flags: needs-work+
Updated•22 years ago
|
Whiteboard: [needs r/sr=] → [needs r/sr=][needs-work/testing]
Updated•22 years ago
|
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.9 → mozilla1.0
Comment 20•22 years ago
|
||
*** Bug 128695 has been marked as a duplicate of this bug. ***
Comment 21•22 years ago
|
||
Akkana, it looks tough to find a solution which may resolve this bug without ruining nice view source and indentation. Basically the bug looks like in Parser. Isn't that the parser is supposed to ignore CRLF immediately after an opening tag and before a closing tag. Only thing I can think to make the patch "a bit" better is "not" to apply the patch for all tags e.g. script, style, select, table, tbody may have line break after opening tag. Though it would append a line break(replaced to a space later) yet the chances of getting a wrong view is quite less in fact nil for some tags like script and style.
Comment 22•22 years ago
|
||
Marking it as depending on 15738. I still feel that if we can follow the HTML specification http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks that "The following two HTML examples must be rendered identically: <P>Thomas is watching TV.</P> <P> Thomas is watching TV. </P>", This bug will automatically get resolved with no side effects on "View Source" or "Indentation", that means all the new lines that we add here before/after tags (opening/closing) essentially those are going to get neglected.
Depends on: 15378
Comment 23•22 years ago
|
||
ADT3 per ADT triage.
Whiteboard: [needs r/sr=][needs-work/testing] → [needs r/sr=][needs-work/testing][ADT3]
Comment 24•22 years ago
|
||
Tanu is right. According to the W3C spec, we shouldn't be showing those line breaks immediately after open tags or immediately before close tags as spaces. I don't think this is actually a parser problem; the parser needs to preserve that information, so that we'll have it at output time if we're trying to preserve the original source formatting of the document. So I think it has to be up to layout to follow that W3C specification. Cc'ing Kin and Joe for layout advice, and Simon because he might have useful advice.
Comment 25•22 years ago
|
||
I've asked around on #mozilla and nobody is sure whether we're deliberately ignoring the W3C spec on newlines before/after tags or not. Cc Kevin, the expert on what our intent is for layout. Kevin, are we doing this deliberately, or is it a bug that we might fix? Also cc Beth, who knows everything about W3C specs and may have been privy to earlier discussions regarding our handling of this part of the spec.
Comment 26•22 years ago
|
||
Timeless thinks there's already a bug on this, suggests choess may know about it.
Comment 27•22 years ago
|
||
Sounds like bug 2750.
Comment 28•22 years ago
|
||
Yes Akkana is right, Tanu is right, Hixie is right, even rickg was right. It's a bug, but in order to preserve round-tripping of the source, we have to fix it in layout rather than parser.
Comment 29•22 years ago
|
||
Moving to next milestone-> mozilla1.0.1
Target Milestone: mozilla1.0 → mozilla1.0.1
Comment 30•22 years ago
|
||
I've written up a little guide to whitespace handling and how it (supposedly) interacts among the different standards and specifications we follow; see <URL:http://www.stwing.upenn.edu/~choess/mozilla/whitespace.html>.
Comment 31•22 years ago
|
||
Component->layout, reassigned->Nisheeth.
Assignee: tmutreja → nisheeth
Status: ASSIGNED → NEW
Component: DOM to Text Conversion → Layout
Assignee | ||
Comment 33•22 years ago
|
||
clearing nsbeta1+, nominating for EDITORBASE. We need to discuss this in the EDITORBASE triage.
Comment 34•22 years ago
|
||
Minusing. EDITORBASE triage
Comment 35•21 years ago
|
||
In Mozilla 1.1a+ Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1a+) Gecko/20020707, it still happens in my past 1.01 composer, but only in the <style> tag in the <head> and the <body> looks fine except for the <script> tag where it happens too.
Comment 36•21 years ago
|
||
I don't know if this is the same bug. But empty line spaces are added when editing and saving a document in the <HTML>Source tab. Editing and saving in the Normal tab preserves the document. But once you start editing from the Source tab, everytime you switch to Normal or save the document an empty line will be added in the head and on different parts of the document.
Comment 37•21 years ago
|
||
That's bug 145196 and you don't need to use the HTML source view, only reload and save again.
Assignee | ||
Updated•21 years ago
|
Priority: P3 → P2
Target Milestone: --- → Future
Comment 38•21 years ago
|
||
editorbase?
QA Contact: sujay → beppe
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE-] → [needs r/sr=][needs-work/testing][ADT3][EDITORBASE?]
Comment 39•21 years ago
|
||
removing editorbase
Whiteboard: [needs r/sr=][needs-work/testing][ADT3][EDITORBASE?] → [needs r/sr=][needs-work/testing][ADT3]
![]() |
||
Comment 40•20 years ago
|
||
Could someone attach a testcase for this "layout bug"? We should most certainly not be ignoring linefeeds in cases like: <a href="whatever"> text </a> The problem, most likely, is that the HTML spec can't make up its little mind on whitespace handling, and all it says is pretty much incompatible with CSS. So being a CSS browser, we go with the CSS way....
Comment 41•20 years ago
|
||
From 9.1 of the spec: In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write: <P>We offer free <A>technical support</A> for subscribers.</P> and not: <P>We offer free<A> technical support </A>for subscribers.</P> The SGML part refers to Appendix B of the spec, which says: SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception. The following two HTML examples must be rendered identically: <P>Thomas is watching TV.</P> <P> Thomas is watching TV. </P> So must the following two examples: <A>My favorite Website</A> <A> My favorite Website </A> Seems pretty clear to me.
![]() |
||
Comment 42•20 years ago
|
||
> authors should not rely
All other browsers render said whitespace, and authors do rely on it. Life is
tough. Furthermore, what happens when white-space:pre is set?
The point is, what the HTML spec says is not what any actual tag-soup UA does. A
and since we are a CSS browser at heart, what the CSS spec says about layout
supercedes what the HTML spec says.
Comment 43•20 years ago
|
||
CSS white-space properties do not bear on this issue. "white-space: pre" operates on whatever whitespace exists in the "document tree" after parsing. A proper SGML parse will remove these spaces before the document tree is created. If bug 2750 is deemed WONTFIX, so be it, but there is no intrinsic conflict between the requirements of HTML/SGML and CSS here.
![]() |
||
Comment 44•20 years ago
|
||
See comment 28. The argument being made is that the parser has to keep the whitespace "to preserve round-tripping" on parse and editor wants to produce the whitespace on output to make the source look pretty and that we should hack layout to somehow deal. But at that point the whitespace is in the DOM and CSS applies all the way.
Comment 45•20 years ago
|
||
bz: can you point me at the conflicting part of css? Are you just referring to the pre-formatted case?
![]() |
||
Comment 46•20 years ago
|
||
For whitespace? Not only that, no. As far as CSS is concerned all whitespace between inline boxes is the same and should be rendered as whitespace even when white-space is not pre (but collapsed to a single whitespace, if white-space is not pre).
![]() |
||
Comment 47•20 years ago
|
||
Ok, so if people are arguing that the HTML spec has anything to say here, then this is just a dup of bug 2750. Note that we already skip the first newline inside some tags (eg <pre>). See http://lxr.mozilla.org/seamonkey/source/htmlparser/src/CNavDTD.cpp#1077. It should not be difficult to adapt that code to skip the first newline in any HTML tag, if desired. Probably regress some pages, though. Doing something with the last newline is much harder. *** This bug has been marked as a duplicate of 2750 ***
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•