Closed Bug 564737 Opened 14 years ago Closed 14 years ago

[regression] First line indented by one character (extra space) for messages sent as plain text and composed in HTML mode

Categories

(Core :: DOM: Serializers, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla2.0b7
Tracking Status
blocking1.9.2 --- -
status1.9.2 --- .11-fixed

People

(Reporter: fehe, Assigned: jik)

References

()

Details

(Keywords: regression, Whiteboard: [tb31needed][tb32wanted][gs][workaround comment 93][read comment 151])

Attachments

(5 files, 8 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a5pre) Gecko/20100509

When sending a plain text message, Lanikai and Shredder indent the first line by one character.  Does not happen with HTML messages.

This is obviously a regression, but I can't search for it now.


Reproducible: Always

Steps to Reproduce:
1. Compose a plain text message, making sure you use the first line and there's no indentation
2. Send the message
3. Check the message in your "Sent" folder.  Notice the indentation.
Version: unspecified → Trunk
This only seems to happen when composing in HTML mode without any formatting applied, then it is down-converted to plain text for sending. This includes a blank space as the first character even though it wasn't typed.

Adjusted steps to reproduce:
1. Compose a message in HTML mode, do not apply any formatting.
2. Make sure Options > Format is set to Auto-Detect.
3. Send message, it will be down-converted to plain text.
4. Plain text includes an inserted space in the first line.

Thus, either an Editor (as filed) or a DOM/Serializer issue.

Confirmed on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a5pre) Gecko/20100504 SeaMonkey/2.1a1pre; first observed in incoming messages with UA-string Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.2pre) Gecko/20100206 Lanikai/3.1b1pre.

I'm tentatively confirming this as I can't find any duplicate either.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Sent plain text messages have the first line indented by one character → Sent plain text messages (composed in HTML mode) have the first line indented by one character
The following is the best I can do for a regression range (Apr 19 to May 8), because there are no usable Windows builds from April 20 to May 8, inclusive:

Works with: http://hg.mozilla.org/mozilla-central/rev/11f68e169ed4
Fails with: http://hg.mozilla.org/mozilla-central/rev/baf06abf0af4

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=11f68e169ed4&tochange=baf06abf0af4

Someone with Linux or Mac could probably narrow it down further.
Those are 2009 dates.
I wonder if this could be a regression from one of the following?: bug 423233, bug 488799, and bug 490747

Someone with Linux or Mac needs to confirm.
Per comment #1, Mac is also affected. I've got the earliest mail showing the issue with a December 2009 Lanikai build, didn't get any earlier ones though
as apparently people were still using the TB 3.0pre nightly builds back then.

This would also reconfirm a Core problem (3.1 is affected whereas 3.0 isn't).
OS: Windows XP → All
Hardware: x86 → All
Summary: Sent plain text messages (composed in HTML mode) have the first line indented by one character → Sent plain text messages (composed in HTML mode) have the first line indented by one character (extra space)
Gary can you hunt that one ?
Maybe the revision history of the plain-text serializer helps to narrow down - http://hg.mozilla.org/mozilla-central/log/067e44b89712/content/base/src/nsPlainTextSerializer.cpp
(In reply to comment #2)
> The following is the best I can do for a regression range (Apr 19 to May 8),
...except that there was no check-in for it during that time frame. :-\
These links might help:

http://hg.mozilla.org/mozilla-central/rev/f6d27eb05ec1

https://bugzilla.mozilla.org/show_bug.cgi?id=422403
https://bugzilla.mozilla.org/show_bug.cgi?id=524975
https://bugzilla.mozilla.org/show_bug.cgi?id=549295

when we are serializing the text node in
nsXHTMLContentSerializer::AppendText(), mDoFormat is true, so we call
AppendToStringFormatedWrapped() which inserts the leading whitespace.

Laurent, any suggestions?
@seth : no suggestion for the moment, I don't see what is wrong. In fact, I would like to have a test case which call the document encoder with an example of a document, because I don't know if I really understood the issue. However, if the origin of the issue is mDoFormat, the question could be: why mDoFormat is true? Are nsIDocumentEncoder flags set correctly?

>it will be down-converted to plain text.

What does it mean? You say that the HTML serializer is called a first time, and then generated html elements are removed from the resulting string? Why do not call directly the plain text serializer? Where is the code which do this convertion in comm-central ?
Attached file Minimal test case
I can only provide the result after sending, but this message (forced to have both plain-text and HTML parts for comparison) contains two lines/paragraphs without any formatting applied and composed in HTML. As can be seen, the first line is indented in the plain-text part whereas it is not in the HTML version (displayed as typed).
Ok, it appears to be caused by the indentation of the first line after <body>. Modified HTML code of the test case:

<html>
<head>

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
First line<br>
Second line<br>
</body>
</html>

results in no indentation of the first line. If I add back some space before "First" and "Second" without other changes, the "First" will be indented.

I've tested that scenario by sending an HTML-only message like this, then using View > Message Body As > Plain Text to force serialization to plain text.

E-mails sent with current stable Thunderbird 3.0.5 does not exhibit the leading spaces, the generated HTML code looks like the one pasted above, thus the result looks ok on either branch. Thus, somewhere the HTML generation code in either serializer or editor added the indentation levels for the HTML, thus exposing the issue on the serializer side.
Thanks for your test rsx11m, I will do some additional unit tests only with the serializer and the flag nsIDocumentEncoder::OutputFormatted, so I could verify if the issue is really inside the serializer. (If I'm not wrong, the initial call is here http://mxr.mozilla.org/comm-1.9.2/source/mailnews/compose/src/nsMsgCompose.cpp#1072 )
Whiteboard: [tb31wants]
I investigated and here my thought.

First, the HTML serializer is ok for me. Existing unit tests show that serialization is ok with the flag nsIDocumentEncoder::OutputFormatted. Changes I made are improvements on the indentation, this is why we have additionnal spaces in the html in the attachement 447194.

Second, it seems that the nsMsgCompose::sendMsg() method (if this is really this method which is called when we send a message) doesn't call the html serializer, but the plain text serializer, since it calls the document encoder with the mime type "text/plain" (well, in fact it calls nsEditor::OutputToString which then calls the encoder). So my work on the HTML serializer is not (at least directly) responsible of the bug.

So I think that the bug is inside the plain text serializer. Then, why this bug appears after my work on the HTML serializer? I guess that the HTML serializer is called some times (when and where, I don't know, maybe by the editor), and the result is probably re-parsed, so we certainly have a new DOM with these additionnals white spaces (for indendation). But the plain text serializer probably doesn't support very well these additionnal spaces and add some spaces in the result when nsMsgCompose::sendMsg() call it.

I'm not familiar at all with the mail code, so perhaps I'm saying wrong things. Don't hesitate to correct my hypothesis if they are wrong.

I could see inside the plain text serializer if there is really a bug, but not before the next week.
Laurent, thanks for looking into this. I'm not really sure where in the MailNews code the conversion is actually performed, nsMsgCompose::sendMsg() seems to cover only one of several cases. However, given that looking at an HTML-only message in Plain Text mode (comment #14) exhibits the problem as well, any issue of accidental double-serializing would have to occur on both ends. It appears to be a combination of the HTML serializer now indents the code nicely with the plain-text serializer having problems with such indentation and appears to treat it as an &nbsp; where it isn't. Thus, the second issue has been there all along and was just exposed by the new indentation behavior on the HTML side.
rsx11m.pub@gmail.com and Laurent, thanks for looking into this.

to expand on comment #10, here's the (partial) stack to where the leading space is inserted:

...
nsXHTMLContentSerializer::AppendToStringFormatedWrapped()
nsXHTMLContentSerializer::AppendText()
nsDocumentEncoder::SerializeNodeStart()
nsDocumentEncoder::SerializeToStringRecursive()               nsDocumentEncoder::SerializeToStringRecursive()
nsDocumentEncoder::SerializeToStringRecursive()
nsDocumentEncoder::SerializeToStringRecursive()
nsDocumentEncoder::EncodeToString()
nsMsgComposeAndSend::GetBodyFromEditor()
nsMsgComposeAndSend::Init()
nsMsgComposeAndSend::CreateAndSendMessage()
...
Is there any reason why this bug lives in editor?
It was filed that way, feel free to move it to Serializers instead, which appears to be the more proper component given the discussion here...
Component: Editor → Serializers
QA Contact: editor → dom-to-text
Summary: Sent plain text messages (composed in HTML mode) have the first line indented by one character (extra space) → [regression] First line indented by one character (extra space) for messages sent as plain text and composed in HTML mode
Whiteboard: [tb31wants] → [tb31wants][gs]
Anybody having an idea where to go from here? This is obviously gaining visibility based on the duplicate count...
Whiteboard: [tb31wants][gs] → [tb31wants][tb32wants][gs]
Whiteboard: [tb31wants][tb32wants][gs] → [tb31needs][tb32wants][gs]
Assignee: nobody → laurent
Status: NEW → ASSIGNED
Just confirming that I've also seen this for the first time, beginning with 3.1.  It is 100% repeatable.
Same here. Since 3.1 (Win7 32Bit) I have this annoying space character at the beginning of every written mail.
Just updated to Thunderbird 3.1.1 and the bug is still there. It seems that very basis functionality is getting neglected, which is not good.
Attached patch potential fix (obsolete) — Splinter Review
I've analyzed this problem and I am attaching a proposed fix. It works for me and seems to be correct, but this is the first time I've ever attempted to change anything in the innards of Mozilla / Thunderbird code, so I don't claim to be an expert and assume that someone more experienced than I will have to vet the change.

Here's the story...

Now that the HTML in the body is being pretty-printed, there are two new tokens immediately after the <body> tag being fed to nsPlainTextSerializer::DoAddLeaf -- the newline following the <body> tag, and then the whitespace indenting the first line of the body. Either of these was enough to cause a space to be inserted at the beginning of the converted plaintext message. The newline would cause a space to be inserted whenever <pre> was not in use (wrong -- shouldn't happen at the start of the body), and the indentation whitespace would cause a space to be inserted whenever not in whitespace (wrong -- shouldn't happen at the start of the body) or when OutputSelectionOnly was set (OK, but not if the selection includes the start of the body).

I don't know if I'm doing a great job of explaining this. If not, forget about my explanation, look at the eHTMLTag_whitespace and eHTMLTag_newline branches of nsPlainTextSerializer::DoAddLeaf before my patch, and observe what will happen there when whitespace or newline is passed into this function, which they will be at the start of the body.

My patch addresses all of these issues by adding another boolean flag, mSawBodyTag, which if true means that we're cutting and pasting from the start of the body and therefore initial whitespace needs to be ignored even if OutputSelectionOnly is true, and by treating newline more similarly to how other whitespace is treated.

I've got to say, working in this code is quite challenging. ;-)
Comment on attachment 458852 [details] [diff] [review]
potential fix

Let's pick Ben as a reviewer for this patch.

Jonathan, I have to say, that picking this particular problem as the first place to fix a mailnews bug is quite impressive!

(as an aside, your name rings a bell, likely from my peripheral involvement in netnews at the tail end of the last century).

(I don't know how doable it is, but figuring out an automated test to accompany the patch would be awesome).
Attachment #458852 - Flags: review?(ben.bucksch)
I fixed it because it was annoying me. ;-)

The logic of a test case should be easy, but I hope someone more familiar than I with the test infrastructure can figure out how to set one up.
Great to see a patch here, thanks!

> (comment #38) Let's pick Ben as a reviewer for this patch.

But, is Ben a "licensed" Serializer reviewer?
This is Core code, not MailNews Core...
Comment on attachment 458852 [details] [diff] [review]
potential fix

My bad, I thought this was mailnews code.  Jst, any thoughts about who should review this patch?
Attachment #458852 - Flags: review?(ben.bucksch) → review?(jst)
Attached file HTML doc testcase
Here's a testcase of an HTML document, showing how linebreaks after tags and indented lines should be rendered.
I take Gecko HTML rendering (i.e. open in Firefox) as reference.

- All whitespace (linebreaks and indention) in the source code is collapsed.
  (As the HTML spec, section 9.1 Whitespace <http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1> specifies.)
- The whitespace after the tag is ignored.
  (The above section explicitly specifies this as unspecified, but says
  that implementation MAY not render this whitespace, and that
  "authors, and in particular authoring tools," SHOULD NOT output it.

Therefore, I think it's a bug in both the pretty-printer and the plaintext serializer.

1) The pretty-printer violates this explicit advise in the HTML spec to not put this whitespace around tags
2) The plaintext serializer should not output whitespace immediately after or before tags, unless in <pre> mode.

(I assume we are not in a <pre>, because we use the HTML editor and just decide later to send as plaintext.)
Comment on attachment 458852 [details] [diff] [review]
potential fix

I don't know whether akk (Akkana Peck) is still around, but she's about the only reviewer for this code. Daniel Bratell wrote most of this class, esp. the whitespace code, but he's not an official reviewer.

Jonathan, awesome work for the first patch of Mozilla! Thanks a lot!

> I've got to say, working in this code is quite challenging. ;-)

Oh yes, esp. the whitespace code is super-tricky and highly fragile, partially because of the 2 input modes (HTML flow text and <pre>) and 3 output modes (format=flowed, normal plaintext document, selection/copy&paste). That's why this patch is risky.

I also think a different patch based on different logic is better, given my last comment.
Attachment #458852 - Flags: review?(jst) → review?(akkzilla)
Attachment #458852 - Flags: superreview?(jst)
Attachment #458852 - Flags: review?(bratell)
You may indeed be correct that the pretty-printer could be more conformant, but not the use of MAY and SHOULD in the text you quoted rather than MUST in either case. Note, also, that if you render this in firefox 3.6:

<html>
<body>
<p>This is<a> a test </a>of the word-breaking system.</p>
</body>
</html>

Both the space after <a> and the space before </a> are rendered. Whoever owns the firefox rendering engine therefore seems to have decided that the logic I implemented is, at least for the time being, correct. I would rather have the plaintext serializer behave like the graphical HTML renderer than like the spec.
Sorry, I meant "note the use" rather than "not the use".

Also, replying to David's earlier comment about using this as my first foray into the Mozilla code, that's not entirely true. I warmed up for it by first taking over maintenance of an old add-on that had been abandoned by its author (https://addons.mozilla.org/en-US/thunderbird/addon/195275/). It has been quite a learning experience.

Now, if any of the Mozilla folks here happen to have influence over the order in which new add-ons are reviewed, and would like to employ that influence to get my add-on reviewed faster in gratitude for the time I spent researching this issue, I wouldn't complain ;-).
> rendering engine therefore seems to have decided that the logic I
> implemented is ... correct

From what I understand, the rendering engine has a different logic.
From the comments, you only intend to suppress whitespace before the body. Is that correct?
While the HTML parser (which PlaintextSerializer uses) may already collapse whitespace for us, it seems it doesn't collapse newlines with whitespace, and I see no provision in the PlaintextSerializer to do that.

Firefox seems to collapse all whitespace including newlines around tags, both before and after the tag, treating both of them as one. (With "collapse", I mean "treat as one", not "treat as nothing".) From my testcase, I cannot infer a special treatment of the <body> tag in FF.
(In reply to comment #46)
> While the HTML parser (which PlaintextSerializer uses) may already collapse
> whitespace for us, it seems it doesn't collapse newlines with whitespace, and I
> see no provision in the PlaintextSerializer to do that.

I'm not sure I understand your point here. As far as I can tell, there is certainly code in the area I was working in that suppresses output of more than one space for a sequence of any number of spaces, tabs, and newlines in the input HTML. Note that in the area of code I was working in, unless output is being treated as preformatted, a space is only output when mInWhitespace is false (or the other special case mentioned below).

> Firefox seems to collapse all whitespace including newlines around tags, both
> before and after the tag, treating both of them as one. (With "collapse", I
> mean "treat as one", not "treat as nothing".) From my testcase, I cannot infer
> a special treatment of the <body> tag in FF.

You're correct that Firefox doesn't treat <body> differently from any other tag, but the serializer explicitly does. Note this comment in the region of the code I changed: "One exception: at the very beginning of a selection, we want to preserve whitespace." My special treatment of the body was intended to preserve that functionality explicitly specified by the previous author of the code.

Firefox doesn't have to deal with this case because "selections" aren't rendered in Firefox windows.
Jonathan:

> but note the use of MAY and SHOULD in the text you
> quoted rather than MUST in either case.

If you're going to start standards lawyering, make sure you're dead certain you know what these terms mean in the W3C specs first. The spec (http://www.w3.org/TR/REC-html40/conform.html) references RFC 2119, which defines "SHOULD" as:

   SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

"Understood and carefully weighed" isn't the same as "ignored at the implementor's leisure." I don't think this careful weighing actually went into the decision in this case. I'm pretty certain this is just an oversight.
It's entirely possible that it's just an oversight; I don't know, since I didn't write the code.

What I do know is that as far as I can tell (and no one has said any different here), my patch makes the behavior of the code no *worse* than it was before, and in some respects it makes it better.

So you can leave my patch out and hope that someone comes along who groks the entirety of the HTML renderer and plaintext serializer and submits a much more complex patch which solves the problem in the more comprehensive way, or you can install my patch, with a big comment if you want indicating that it is a temporary fix pending a more thorough reworking of the logic, and fix a regression that is impacting many people in a visible way on a daily basis.
Jonathan, please don't take comment 48 personally. I'm sure you know what SHOULD in RFCs mean. Can I persuade you to try to create a patch that collapses all whitespace (spaces and newlines), and ignores whitespace at the beginning and end of block elements including <body>?
Attached patch better potential fix (obsolete) — Splinter Review
Ignoring whitespace at the end of block elements is hard, since it would require doing lookahead, something that the code doesn't do now, and would therefore be a rather expensive change. I'm not really convinced that it's worth the effort, considering that since whitespace is being collapsed, the worst-case scenario for rendered whitespace at the end of a block element is a single extra space output there in the plaintext version, which is hardly the end of the world.

In any case, the attached path collapsed all whitespace (my last patch did as well, but just to be clear, the new one does as well) and ignores whitespace at the beginning of block elements including body as requested. I've removed the special treatment of the body tag.

Please let me know your thoughts
Attachment #459749 - Flags: superreview?(jst)
Attachment #459749 - Flags: review?(akkzilla)
Attachment #459749 - Flags: review?(bratell)
Attachment #458852 - Flags: superreview?(jst)
Attachment #458852 - Flags: review?(bratell)
Attachment #458852 - Flags: review?(akkzilla)
I don't know which of the two fixes are better yet. The first one did pass try server tests, the second one should be pushed too, if that's the way to go here.

Either way, this needs tests, badly, so that we don't regress this further going forward, and that will be a requirement for taking this change in mozilla-central.
Does the plain text serializer have any tests to build on? It seems like it would be fairly simple to write an xpcshell test for it.
I honestly don't know if there are any. It *might* be easier to write the tests in mochitest since constructing a DOM tree in an xpcshell test might be more work than getting to the plaintext serializer in a mochitest.
There's /content/base/test/TestPlainTextSerializer.cpp but I think that depends on the old parser which is being phased out...
(In reply to comment #54)
> I honestly don't know if there are any. It *might* be easier to write the tests
> in mochitest since constructing a DOM tree in an xpcshell test might be more
> work than getting to the plaintext serializer in a mochitest.

Is nsIParser going away? To the untrained eye, that .cpp test looks like it could be an xpcshell test fairly easily.
I don't think nsIParser is going away, but the implementation that is the old HTML parser is likely going away, and I don't know what that means in terms of driving the plain text serializer directly from the parser etc (as opposed to serializing a DOM tree etc).
So, what do we do here? Can we just use/extend the current test?
Please note that this is a high-profile bug in Thunderbird, and Thunderbird is the main (only) user of the class.
> Thunderbird is the main (only) user of the class.

Correction, not true: It's also used for Copy to clipboard in FF.
I was thinking of just adding one more test case to TestPlainTextSerializer - it's not going to add much work to whoever fixes the other test cases to deal with testing the serialization of this particular input string...
sounds reasonable, that's what I thought, too. jst, are you OK with that?
Yup, ok with me given that there's no obviously better and/or easy way to add a test here.
this adds a test case for the original regression (or, rather, the existing bug exposed by the pretty printing change). I'm open to suggestions for additional html to add serialization tests for...
Attachment #460354 - Flags: review?(jst)
This bug is titled "First line indented by one character (extra space) for messages sent as plain text and composed in HTML mode".

However in my experience I have always seen *two* spaces at the beginning of the message.

In the normal message view, I can see *one* space, but when in the message source view, I see *two* spaces.
Yeah, it's actually two spaces, not one. The first space is caused by the newline after the <body> tag in the HTML, and the second space is caused by the whitespace following the newline in the HTML, because the old code (before I fixed it) didn't treat newlines as whitespace.
Comment on attachment 459749 [details] [diff] [review]
better potential fix

   // Flow
   mEmptyLines = 1; // The start of the document is an "empty line" in itself,
-  mInWhitespace = PR_TRUE;
+  // This will get set when the first block tag is reached
+  // mInWhitespace = PR_TRUE;
   mPreFormatted = PR_FALSE;
   mStartedOutput = PR_FALSE;

This code is in the nsPlainTextSerializer constructor and does need to set mInWhitespace to something, PR_FALSE presumably, or you'll get an undefined value in mInWhitespace until it happens to be set the first time.
 
@@ -853,6 +854,13 @@

Next time, please generate diffs with the -p option so that the diff says which function a given change is in on lines like the above one (makes patches *much* easier to review), or set up hg to do the right thing for you per the configuration section on https://developer.mozilla.org/en/Mercurial_FAQ.

Do we know which bug actually regressed this? sr- due to the uninitialized member...
Attachment #459749 - Flags: superreview?(jst) → superreview-
> Do we know which bug actually regressed this?

In comment 17, Laurent Jouanneau talked about "my work on the HTML serializer". If that's the pretty-printing, that's probably the trigger for the regression.
I'm not using Hg -- I'm building from RPM source and doing development / patching in /usr/src/redhat/BUILD/thunderbird-*. Gross but effective.

Thanks for catching that issue. I am attaching an updated patch. It initializes mInWhitespace to PR_FALSE as you suggest.

And yes, it was the HTML pretty-printing changes that caused this bug, not a change to the plaintext serializer. As others have pointed out, the HTML pretty-printer is actually buggy right now, because it's generating whitespace after the <body> tag, and the spec says that authors are not supposed to insert such whitespace.
Attached patch potential fix (obsolete) — Splinter Review
Attachment #458852 - Attachment is obsolete: true
Attachment #459749 - Attachment is obsolete: true
Attachment #459749 - Flags: review?(bratell)
Attachment #459749 - Flags: review?(akkzilla)
> As others have pointed out, the HTML pretty-printer is actually buggy right now, because it's generating whitespace after the <body> tag, and the spec says that authors are not supposed to insert such whitespace.

I don't find, in the spec of HTML4 or HTML5, where it says that spaces are not allowed before or after a <body> tag when the DOM is serialized (into HTML). Or let me know where it is. For me, the HTML serializer does its job.

The plaintext serializer should take care about the content of *all* child nodes of <body>. And if the first child node is a text node with 0 or many whitespaces at the begining, it should treat them as it should do. No matter if this text node is generated by the HTML parser or not, or if it has been added dynamically for any reason during the live of the DOM.

(I'm very sorry to not help more at this time, unfortunately, I have some higher professionnal priorities :-( )
Comment on attachment 460730 [details] [diff] [review]
potential fix

If I'm understanding the comments correctly, then this should be good enough for re-superreview and maybe even review?
Attachment #460730 - Flags: superreview?(jst)
(In reply to comment #71)
> I don't find, in the spec of HTML4 or HTML5, where it says that spaces are not
> allowed before or after a <body> tag when the DOM is serialized (into HTML). Or
> let me know where it is. For me, the HTML serializer does its job.

The specs I've found are ambiguous.

They talk about collapsing whitespace, and they talk about ignoring a newline immediately after a block tag. They also talk about how whitespace surrounding a tag should be put outside the tag rather than inside it because of "inconsistencies among extant implementations," thus suggesting that different implementations do different things with non-newline whitespace that appears immediately after a block start tag or immediately before a block end tag.

Given all this, it seems risky for me for the serializer to be inserting indentation right after the <body> tag and other block tags.

But like I said, I can't find anywhere definitive that says that it's wrong per se.
(In reply to comment #73)
> The specs I've found are ambiguous.
> 
> They talk about collapsing whitespace, and they talk about ignoring a newline
> immediately after a block tag. They also talk about how whitespace surrounding
> a tag should be put outside the tag rather than inside it because of
> "inconsistencies among extant implementations," thus suggesting that different
> implementations do different things with non-newline whitespace that appears
> immediately after a block start tag or immediately before a block end tag.

This is about the parsing, not the serializing, isn't it ? So this part of the spec concerns the HTML parser, not the HTML serializer.
The plaintext serializer works on the DOM, not on a string. The DOM is generated by the parser and eventually modified by some code which could add any text nodes and whitespaces anywhere (the editor for example). So the plaintext serializer should take care correctly about whitespaces in text nodes, regardless of how these whitespaces have been added and by who :)
(In reply to comment #74)
> This is about the parsing, not the serializing, isn't it ?

No. As noted above in comment 42, the HTML 4 spec, in section 9.1, says that AUTHORS, i.e., the people generating HTML, should not put space immediately after start tags or before end tags because the correct interpretation of said space is ambiguous and different HTML parsers will do different things with it.

I wish I could find a spec somewhere which says what the HTML parsers are SUPPOSED to do with it, but like I said, I can't. All I can find is the warning that it might not behave the way you think it should so you shouldn't do it.

Simply put, this:

<blockquote>
foo
</blockquote>

is NOT syntactically the same as this:

<blockquote>
  foo
</blockquote>

The reason these two may not be syntactically the same (like I said, I'm not certain) is that although the spec requires the newline after the opening tag to be *ignored*, it only required the indentation in the second case to be *collapsed*, not *ignored*.

I want to believe that there's something in a spec somewhere that says that all whitespace at the beginning or end of a block element should be ignored, because I believe that's the right behavior, but I just can't find it written down anywhere, which is why your pretty-printing is making me nervous.
this patch breaks our xpcshell unit test, base/test/unit/test_getMsgTextFromStream.js - so the good news is we have a test that covers this code :-) I'm not sure if the test is hard-coded to believe the buggy behavior is correct. I'll dig into it.
I believe the issue has to do with whitespace collapsing in the base64-3 test. I'll attach a diff for the test case in a bit, once I verify this.
(In reply to comment #78)
> Created attachment 461705 [details] [diff] [review]
> fix mailnews unit test

So in your estimation, it's a bug in the unit test, not something I have to fix in my patch?

Phew! After staying up until something like 3am fixing the problem you found in my other patch, I'm glad this one turned out to be not my problem ;-).

Thanks for all the work you're doing on the unit tests, btw. I'm grateful that there's someone helping me out with getting these patches through the system and into the source tree. I couldn't possibly do it by myself.
(In reply to comment #79)
> 
> So in your estimation, it's a bug in the unit test, not something I have to fix
> in my patch?
I believe so - the test I changed assumed that multiple spaces in html wouldn't get folded into one, and your patch makes sure the spaces get folded, so I just changed the data for the result we expect to have two spaces folded into one.
> 
> Thanks for all the work you're doing on the unit tests, btw. I'm grateful that
> there's someone helping me out with getting these patches through the system
> and into the source tree. I couldn't possibly do it by myself.

No problem - we really appreciate you working on these problems...and our test requirements make contributing a patch that much more daunting. Which reminds me, I have to go try the new mimalt patch.
> if the test is hard-coded to believe the buggy behavior is correct.

FWIW, this is pretty common in the tests for this class. They were created by just making a whole HTML file, running it through the serializer, and then putting the result in the unit test. I.e. fix the behavior as-is. This pretty much garantees that any bugfix in the serializer breaks the tests. (Good example of how not to write tests. Thankfully, the test that bienvenu added here is minimal and good.)
Confirmed on Mozilla/5.0 (Windows; U; Windows NT 6.0; nl; rv:1.9.2.7) Gecko/20100713 Lightning/1.0b2 Thunderbird/3.1.1. Always reproducible.
Requesting blocking for the next gecko release - this is a highly visible bug for Thunderbird users and we need to be pushing this into core.
blocking1.9.2: --- → ?
(In reply to comment #83)
> Requesting blocking for the next gecko release - this is a highly visible bug
> for Thunderbird users and we need to be pushing this into core.

You requested blocking 1.9.2.  Did you instead mean to requests blocking 2.0?
> You requested blocking 1.9.2.  Did you instead mean to requests blocking 2.0?

No, 1.9.2 is correct, TB 3.1 bases on Gecko 1.9.2.
I'm not going to put this as blocking, as I don't think we would hold up security fixes for this bug. I will set it as wanted though, as we do want the fix when it is ready. When a safe fix has been reviewed, please nominate and we will get it in the next point release.
blocking1.9.2: ? → -
Hey folks, what's the status here? Are we still waiting for review from jst?

I'd really like to be getting this fix in soon.
This bug looks very simple but why it takes too long to be fixed? What is the problem here ---- a 3-month-old bug.
@jst: Ping for reviews, this is a hot topic for Thunderbird users as you can see from the number of duplicates. Any chances to get this in for 1.9.2.9?

For reference, Thunderbird's 3.1.3 code freeze is currently scheduled for Friday next week, August 20.
By the way, here are some workarounds until the fix makes it into a release:

1. Install userChromeJS <http://userchromejs.mozdev.org/> and
put this into your userChrome.js file:

function DetermineConvertibility() {
    if (!gMsgCompose.composeHTML)
        return nsIMsgCompConvertible.Plain;
    return nsIMsgCompConvertible.No;
}

This will prevent Thunderbird from converting HTML messages to
plaintext-only when sending them. It'll instead send messages
with both plain-text and HTML versions. The plain-text version
will still have the extra spaces at the beginning of it, but
most email recipients nowadays read the HTML version, not the
plaintext version, so they'll never see it.

2. Put some HTML formatting into your signature (e.g., bold,
italic, font size change, whatever), which will also prevent
Thunderbird from converting your message to plaintext-only,
with the same result as above.

3. (Gross but works) Write your message and send it with Send
Later (Ctrl-Shift-Enter) instead of sending it directly. This
will put the message into your Outbox. Move the message from
your Outbox into your Drafts folder, double-click on it to
edit it, remove the extra spaces, and send it for real this
time. This works because since the message has already been
converted into plaintext, Thunderbird won't convert it again,
so when you delete the extra spaces, they'll stay deleted.
(In reply to comment #92)
> @jst: Ping for reviews, this is a hot topic for Thunderbird users as you can
> see from the number of duplicates. Any chances to get this in for 1.9.2.9?

plus http://gsfn.us/t/16h1u is getting duplicates
Whiteboard: [tb31needs][tb32wants][gs] → [tb31needs][tb32wants][gs][workaround comment 93]
What up with the bugfix update?

"For reference, Thunderbird's 3.1.3 code freeze is currently scheduled for
Friday next week, August 20."
Comment on attachment 460730 [details] [diff] [review]
potential fix

- In nsPlainTextSerializer::DoOpenContainer():

+  /* Container elements are always block elements, so we shouldn't
+     output any whitespace immediately after the container tag even if
+     there's extra whitespace there because the HTML is pretty-printed
+     or something. To ensure that happens, tell the parser we're

s/parse/serializer/.

- In nsPlainTextSerializer::DoAddLeaf():

>-    else if(!mInWhitespace ||
>-            (!mStartedOutput
>-             && mFlags | nsIDocumentEncoder::OutputSelectionOnly)) {
+    else if(!mInWhitespace) {
       mInWhitespace = PR_FALSE;

Now that we only end up in her if mInWhitespace is false there's no need to set it to false again.

sr=jst with that.
Attachment #460730 - Flags: superreview?(jst) → superreview+
Attachment #460354 - Flags: review?(jst) → review+
<confused>
Do I need to make the changes you suggested and resubmit the patch, or are you saying that whoever, commits the fix will make them?
</confused>
Jonathan: You need to make the changes and provide a new patch. I'm also not sure if this patch needs standard review as well.
I've just run attachment 460730 [details] [diff] [review] and attachment 460354 [details] [diff] [review] through the Try server.

I'm seeing failures across all platforms.

The make check test fails with:

TEST-PASS | HTML to text serialization test
TEST-PASS | HTML to ASCII text serialization with format=flowed; delsp=yes
TEST-PASS | HTML to CJK text serialization with format=flowed; delsp=yes
NEXT ERROR TEST-UNEXPECTED-FAIL | Wrong prettyprinted html to text serialization
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /builds/slave/tryserver-macosx-debug/build/content/base/test/TestPlainTextSerializer.cpp, line 188

Does the test in attachment 460354 [details] [diff] [review] not work with the latest version of the patch?

Also, mochitest 1 fails:

27851 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " bla\n\n    foo\n    bar\n\n", expected "bla\n\n    foo\n    bar\n\n"
27857 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " mozilla\n\n    foo\n    bar\n\n", expected "mozilla\n\n    foo\n    bar\n\n"
27863 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " mzla\n\n    foo\n        bazzinga!\n    bar\n\n", expected "mzla\n\n    foo\n        bazzinga!\n    bar\n\n"
27869 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " Tt t t ", expected "Tt t t "
27876 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " T ", expected "T "

Example log: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282825931.1282828189.2099.gz

The test is here: http://mxr.mozilla.org/comm-central/source/mozilla/content/base/test/test_copypaste.html?force=1
I've also created some Thunderbird try builds using trunk Thunderbird, they can be found here if anyone wants to test:

http://ftp.mozilla.org/pub/mozilla.org/thunderbird/tryserver-builds/bugzilla@standard8.plus.com-d960a9ecbad2/
Attachment #460730 - Attachment is obsolete: true
Attachment #469455 - Flags: superreview?
Attachment #469455 - Flags: review?
Attachment #469455 - Flags: superreview?(jst)
Attachment #469455 - Flags: superreview?
Attachment #469455 - Flags: review?(jst)
Attachment #469455 - Flags: review?
Comment on attachment 469455 [details] [diff] [review]
revised potential fix based on jst review

Does this patch pass the failing tests mentioned above?
(In reply to comment #104)
> Does this patch pass the failing tests mentioned above?

No, because I believe the tests, not my changes, are at fault (I believe this was noted above), and (as you can see from the comments above) other people than I have been dealing with the test issues related to this change.
Dam. Shredder is not compatible with Lightning 1.0b2 :(
(In reply to comment #105)
> 
> No, because I believe the tests, not my changes, are at fault (I believe this
> was noted above), and (as you can see from the comments above) other people
> than I have been dealing with the test issues related to this change.

I fixed a mailnews test, but the core code has a different test suite, and those tests need to be fixed as well. I'm less (read completely un) familiar with the core code test suite, but I can try to come up with a patch for them this afternoon.
oh, hmm, yeah, the TestPlainTextSerializer.cpp test worked with the original patch, at least on Windows. I'll have to try it with the latest version of the patch.
It fails with the latest patch on Windows, so I'll have to see what's getting output by the serializer
the test fails because the original bug is back - there's a leading space in the plain text output...
(In reply to comment #110)
> the test fails because the original bug is back - there's a leading space in
> the plain text output...

Um. There is a mystery here. I just installed Thunderbird with exactly the patch I submitted earlier today, and it is not exhibiting the bug. If I send a message to myself with nothing but the word "testing" on the first line, then a line break, then "one two three" on the second line, the first line is clearly not indented, either when the message is displayed in Thunderbird or when I do View > Message Source.
Jonathan, if you go to objdir/mozilla/dist/bin, and run TestPlainTextSerializer.exe, does it work for you? I ran it and looked in the debugger, and saw a leading space in the output.
Hi,

First, I confirm that the bug on the textplain serializer was there before the new HTML serializer. David's test fails on the gecko 1.9.1 branch.

Second, I confirm that the Jonathan's patch doesn't solve the problem :-( There is still a space at the first line.

And I think that if some tests about copy/paste fail, it is probably because Jonathan removed stuff about the selection:

{{{
-    else if(!mInWhitespace ||
-            (!mStartedOutput
-             && mFlags | nsIDocumentEncoder::OutputSelectionOnly)) {
}}}

@Jonathan: Is there a reason for this remove, or is it just a mistake ?

An other thing: you cannot ask a review and a superreview to the same person. It doesn't make sens ;-)

@david : your test should take care about line breaks. On some plateforms it is only \n.

I'm trying to provide a new combined patch.
(In reply to comment #113)
> Hi,
> 
> First, I confirm that the bug on the textplain serializer was there before the
> new HTML serializer. David's test fails on the gecko 1.9.1 branch.
That's because my test is testing the new behavior, without pretty printing. My test failing is not evidence of a bug in 1.9.1
> An other thing: you cannot ask a review and a superreview to the same person.
> It doesn't make sens ;-)

Yes, we can do that for smallish patches. In this case, the patch doesn't actually require sr, as I understand the current rules.
@Laurent (comment #113): The special-casing for OutputSelectionOnly is not necessary with the current logic of my patch.

There was in fact a bug in the most recent version of my patch. I will attach a fixed version.

Even with that fix, the test will fail on Linux because it has hard-coded \r\n line breaks in it and on Linux the pretty-printed text uses \n instead of \r\n for line breaks. So either the test needs to be updated to know the platform it's running on and use the correct line breaks (\r for Mac, \n for Linux, \r\n for Windows), or (easier) it should compare the output against a constant string containing each of the three line break types and succeed if any of the matches.
Attached patch yet another try at the patch (obsolete) — Splinter Review
Attachment #469455 - Attachment is obsolete: true
Attachment #470227 - Flags: review?(jst)
Attachment #469455 - Flags: superreview?(jst)
Attachment #469455 - Flags: review?(jst)
(In reply to comment #115)
> Even with that fix, the test will fail on Linux because it has hard-coded \r\n
> line breaks in it and on Linux the pretty-printed text uses \n instead of \r\n
Mac is the same as Linux ('\n'), and there's an NS_LINEBREAK macro that I could use to make the test cross-platform.
(In reply to comment #117)
> Mac is the same as Linux ('\n'),

Not convinced, for two reasons:

1) In my experience, Mac uses \r, not \n (although most programs on the Mac understand \n just fine).

2) This code in nsPlainTextSerializer::Init in nsPlainTextSerializer.cpp would seem to agree with me:

  else if (mFlags & nsIDocumentEncoder::OutputCRLineBreak) {
    // Mac                                                                      
    mLineBreak.Assign(PRUnichar('\r'));
  }

There are similar blocks of code for \r\n for Windows and \n for Linux.
Attached patch test diffs (obsolete) — Splinter Review
Attachment #460354 - Attachment is obsolete: true
Attachment #470233 - Flags: review?(jst)
(In reply to comment #118)

> 1) In my experience, Mac uses \r, not \n (although most programs on the Mac
> understand \n just fine).
Mac used to, but since the move to OS/X, Mozilla uses \n on the Mac.
> 
> 2) This code in nsPlainTextSerializer::Init in nsPlainTextSerializer.cpp would
> seem to agree with me:
> 
>   else if (mFlags & nsIDocumentEncoder::OutputCRLineBreak) {

You won't see anyone setting ENCODE_FLAGS_CR_LINEBREAKS, I don't believe.

In any case, I've used NS_LINEBREAK in the test. I'll try the test on the Mac tomorrow.
> The special-casing for OutputSelectionOnly is not
necessary with the current logic of my patch.

No, it is necessary, as per comment before the original test. The remove breaks all tests on copy paste.

An other problem with your patch, is you forgot to add mInWhitespace = PR_TRUE at line 644, before the end of the process on eHTMLTag_body.  the " mInWhitespace = PR_TRUE;" at the end of DoOpenContainer is never reach when the serializer is called for <body>. This is why the David's test fails.

@David : in fact, for such tests with the serializer, just force the line break, by passing the corresponding flag, so it is not dependant to a plateform ;-)

> That's because my test is testing the new behavior, without pretty printing. My
test failing is not evidence of a bug in 1.9.1

Yes it is. I (and the plaintext serializer too) don't care if the DOM was constructed directly from the content of a html file, or generated/modified by the editor, by the mail editor or by any other scripts. The fact is, if we give a DOM like you wrote in your test to the plaintext serializer (and your test is good), the plaintext serializer adds a space where it should not. It is a bug, and it exists in old branches.

The new behavior of the HTML serializer just reveals this bug.

By studying the Jonathan's patch, I found that the test

    else if(!mInWhitespace ||
            (!mStartedOutput
             && mFlags | nsIDocumentEncoder::OutputSelectionOnly)) {

is always true when mStartedOutput is false, because mFlags | nsIDocumentEncoder::OutputSelectionOnly *always* returns true. The operator should be &, not |. If I only change this operator, and without the Jonathan's patch, the bug is fixed, or, at least, the David's test succeed.

So the Jonathan's work seems do not fix this bug, but probably fixes an other bug, bug 386613 ;-)
Attached patch Inferior fix (obsolete) — Splinter Review
Here is my proposal, with the David's test and two other tests. In mozilla central and comm-1.9.2 (and mozilla 1.9.1 :-) ), the test succeed.

I still have to verify that it really fix also the mail bug in comm-1.9.2. I have to compile a thunderbird.

Note that we have this bug in the plaintext serializer at least since gecko 1.9.0 ! (according to hg annotate http://hg.mozilla.org/mozilla-central/annotate/414ff9016349/content/base/src/nsPlainTextSerializer.cpp#l1090 )
Attachment #470227 - Attachment is obsolete: true
Attachment #470233 - Attachment is obsolete: true
Attachment #470227 - Flags: review?(jst)
Attachment #470233 - Flags: review?(jst)
(In reply to comment #121)
> No, it is necessary, as per comment before the original test. The remove breaks
> all tests on copy paste.

Does it? With the fix I posted last night, "make check" succeeds in my Thunderbird build tree in object/mozilla/content. And from reading the code, I would expect it to.
 
> An other problem with your patch, is you forgot to add mInWhitespace = PR_TRUE
> at line 644, before the end of the process on eHTMLTag_body.  the "
> mInWhitespace = PR_TRUE;" at the end of DoOpenContainer is never reach when the
> serializer is called for <body>. This is why the David's test fails.

Yes, this is what I fixed in the updated patch I submitted last night.

> By studying the Jonathan's patch, I found that the test
> 
>     else if(!mInWhitespace ||
>             (!mStartedOutput
>              && mFlags | nsIDocumentEncoder::OutputSelectionOnly)) {
> 
> is always true when mStartedOutput is false, because mFlags |
> nsIDocumentEncoder::OutputSelectionOnly *always* returns true. The operator
> should be &, not |. If I only change this operator, and without the Jonathan's
> patch, the bug is fixed, or, at least, the David's test succeed.

But I do not believe your fix is adequate, because it won't cause whitespace to be ignored at the beginning of all container tags, which is the correct behavior and what Ben asked me to do in comment #50. Your fix also preserves the incorrect behavior of treating newline and whitespace differently when they should be treated exactly the same except when preformatting is in effect.
By the way, Laurent, since I can't convince myself that my patch has the problem you claim it does with selection cut-and-paste, and since as noted above I believe your patch is less correct than mine, I don't think it was appropriate for you to mark my patch obsolete, but I can't figure out how to undo that.

I also think it might be interpreted as somewhat, well, impolite, to come along after someone else has been trying to develop a correct patch for over a month after you said you didn't have the time to do it, submit your own, different patch out of the blue, and mark theirs obsolete.

To the others participating in fixing this bug, I want to be clear: I DO NOT THINK LAURENT'S FIX IS CORRECT FOR THE REASONS OUTLINED IN MY LAST COMMENT, and I believe that attachment #470227 [details] [diff] [review], which Laurent marked obsolete, is in fact the correct fix.
Comment on attachment 470227 [details] [diff] [review]
yet another try at the patch

Ok, no problem. Sorry for the obsolete status.

So we have two solutions. Probably your's is better. However, your solution is still not good. Tests on copy paste still fail with your patch. Leading whitespaces are removed from the selection when a copy-paste occurs, whereas it shouldn't (or some unwanted spaces are added).

25973 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " bla

    foo
    bar

", expected "bla

    foo
    bar

"
25974 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | text/unicode value in the clipboard - got "bla

    foo
    bar

", expected " bla

    foo
    bar

"
25976 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | value of the textarea after the paste - got "bla

    foo
    bar

", expected " bla

    foo
    bar

"
25979 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " mozilla

    foo
    bar

", expected "mozilla

    foo
    bar

"
25980 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | text/unicode value in the clipboard - got "mozilla

    foo
    bar

", expected " mozilla

    foo
    bar

"
25982 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | value of the textarea after the paste - got "mozilla

    foo
    bar

", expected " mozilla

    foo
    bar

"
25985 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " mzla

    foo
        bazzinga!
    bar

", expected "mzla

    foo
        bazzinga!
    bar

"
25986 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | text/unicode value in the clipboard - got "mzla

    foo
        bazzinga!
    bar

", expected " mzla

    foo
        bazzinga!
    bar

"
25988 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | value of the textarea after the paste - got "mzla

    foo
        bazzinga!
    bar

", expected " mzla

    foo
        bazzinga!
    bar

"
25991 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " Tt t t ", expected "Tt t t "
25992 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | text/unicode value in the clipboard - got "Tt t t ", expected " Tt t t "
25995 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | value of the textarea after the paste - got "Tt t t ", expected " Tt t t "
25998 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | Selection.toString - got " T ", expected "T "
25999 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | text/unicode value in the clipboard - got "T ", expected " T "
26002 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_copypaste.html | value of the textarea after the paste - got "T ", expected " T "
Attachment #470227 - Attachment is obsolete: false
Attachment #470227 - Flags: review-
Note I ran that tests with mozilla-central sources (gecko 2.0), not gecko 1.9.2, since a patch should work on the trunk before to be accepted on branches. And unfortunately, all tests about copy paste are not in the 1.9.2  branch, this is probably why all tests in your tree succeeded.

You really should work with hg and comm-central/mozilla-central sources ;-)
Status: ASSIGNED → NEW
Assignee: laurent → nobody
I can't speak to all of the tests, but it appears to me that some of them are simply wrong, given the behavior I was told to implement in comment #50.

For example, according to this code in the "alist" test:

testClipboardValue("text/html", "<div id=\"alist\">\n    bla\n    <ul>\n      <li>foo</li>\n      \n      <li>bar</li>\n    </ul>\n  </div>");

The HTML that gets copied into the clipboard is enclosed in a <div> tag. <div> is a block-level tag, therefore whitespace immediately following it is supposed to be stripped (according to comment #50, as well as (as far as I could tell) according to the relevant specs. But the test is expecting the whitespace to *not* be stripped:

testClipboardValue("text/unicode", " bla\n\n    foo\n    bar\n\n");

So if comment #50 is correct about the way things are supposed to behave, then it's the test that's wrong, not my code.

I'm not going to bother spending time analyzing the issues with the other tests until this confusion is resolved. I don't think I'm the right person to resolve it.
> Leading whitespaces are removed from the selection when a copy-paste occurs,
> whereas it shouldn't

Can you or somebody elaborate why? Most of the time that whitespace would be unwanted, no? (It's particularly annoying when e.g. pasting into IRC.)
Assignee: nobody → jik
(In reply to comment #128)
> > Leading whitespaces are removed from the selection when a copy-paste occurs,
> > whereas it shouldn't
> 
> Can you or somebody elaborate why? Most of the time that whitespace would be
> unwanted, no? (It's particularly annoying when e.g. pasting into IRC.)

I believe the idea is that what is copied or cut should be pasted verbatim. If the user highlights the whitespace when copying / cutting, then that whitespace should be included when pasting. The application accepting the paste may choose to strip it, but the decision is up to that application, I think.
That's one theory. My theory is that the user most likely did not intend to select that leading and trailing whitespace, and thus doesn't belong there. It's also not significant. So, unless somebody has a good argument why it's important and that's overriding all the users who accidently select some space before the word, I think removing it (which, coincidentally, your patch does) is the right action.
(In reply to comment #130)
> That's one theory. My theory is that the user most likely did not intend to
> select that leading and trailing whitespace, and thus doesn't belong there.

Well, this is clearly not the theory of whoever originally wrote the code, because they put an explicit comment in indicating that they were preserving whitespace at the start of the selection on purpose. Or trying to, at least.

> It's also not significant. So, unless somebody has a good argument why it's
> important and that's overriding all the users who accidently select some space
> before the word, I think removing it (which, coincidentally, your patch does)
> is the right action.

But my patch *doesn't*. If you run Thunderbird with my patch applied, and copy a space and the word following it, and then click the Write button with the Shift key depressed so the editor pops up in plaintext mode, and then paste with ctrl-v, both the space and the word following it will be pasted. The space is also preserved if, e.g., you paste into an Emacs buffer.

Whether this is the correct behavior is certainly up for discussion, but my patch doesn't actually change it, at least not as far as I can tell. It's certainly possible that I'm wrong, but as I said above, I think it's the tests that are wrong, not the code, since they wrap the HTML in <div>, which is a block element, and thus the whitespace *should* be stripped.
(In reply to comment #130)
> That's one theory. My theory is that the user most likely did not intend to
> select that leading and trailing whitespace, and thus doesn't belong there.
> It's also not significant. So, unless somebody has a good argument why it's
> important and that's overriding all the users who accidently select some space
> before the word, I think removing it (which, coincidentally, your patch does)
> is the right action.

For example I often select a string with a leading space when I move pieces of text inside the same sentence. 

IMO making assumptions on "what users want" is absolutely wrong: if I select a trailing whitespace the software is expected to maintain that (and this bug is exactly based on the wrong behavior of adding extra content to the original text), you can't consider users unable to do a simple task like a proper selection.
If this code path is used for copy&paste within a browser textfield or within the Thunderbird composers (HTML and plaintext), then I agree, it's important to preserve the leading and trailing space. I'm not sure it is, though: IIRC, we also put the HTML version of the text in the clipboard, if you copy from HTML. This class downconverts HTML to plaintext, so it /should/ not be used when pasting from HTML editor to HTML editor nor from plaintext editor to plaintext editor.
Is it possible to run the mochitests in a Thunderbird build tree on windows? Running mochitests in a thunderbird tree doesn't do much. Or do I need to do a Firefox build?
I built a firefox tree and can run mochitests now. jst, before I go tweak test_copypaste.html too much, your opinion on this issue of whitespace in selection would be helpful!
Attached patch fixes mochitestsSplinter Review
This gets the content mochitests to pass, by switching what we expect w.r.t. leading spaces for some of the tests. But, I really don't know if this is the right thing to do for these tests.
I don't really want to complicate the resolution of this bug, but if I use "Shift Write Button" to compose a message in plain-text, the extraneous leading whitespace does not appear.

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100825 Lightning/1.0b2pre Thunderbird/3.1.3

I'd like to express my thanks to all of you who are digging into this issue.
(In reply to comment #138)
> I don't really want to complicate the resolution of this bug, but if I use
> "Shift Write Button" to compose a message in plain-text, the extraneous leading
> whitespace does not appear.

In my experience, sometimes the leading whitespace is pasted and sometimes it isn't. I think it depends on exactly what HTML the ctrl-c copies, and of course you can't see that.
(In reply to comment #138)
> I don't really want to complicate the resolution of this bug, but if I use
> "Shift Write Button" to compose a message in plain-text, the extraneous leading
> whitespace does not appear.

The people here know that, but it's not the point.

Default and comfortable option in Thunderbird is sending HTML-Mails. The bug regards to those mails which don't have any HTML-Code, so Thunderbird send them as plain-text-mails.
Jonathan: I was viewing the source for the resulting messages, and was comparing the body portion to determine whether the extra whitespace was present.

allblue: I just wasn't sure if any of the plumbing for the messages being sent explicitly as plain-text overlapped with the messages that were composed as HTML, but sent as plain-text. For example, I didn't know if nsPlainTextSerializer was involved with the message sent explicitly as plain-text. That's also why I wasn't sure if I should chime in or not.
(In reply to comment #141)
> Jonathan: I was viewing the source for the resulting messages, and was
> comparing the body portion to determine whether the extra whitespace was
> present.

I think I misunderstood what you were saying.

I thought you were saying that if you compose a message in plaintext format and then paste copied text with leading whitespace into it, the leading whitespace is not included in the paste. In response to that, I was pointing out that whether the leading whitespace is pasted in that situation depends on the exact format of the HTML being copied, but you can't see what because it's "hidden" in the clipboard and not easily accessible.

Looking at what you wrote again, I know think that what you were saying is that when you compose a message in plaintext format and then send it, there is no leading whitespace appended to the beginning of the message.

If that is what you are saying, then yes, you are correct. If you compose the message in plaintext format, then it never has to go through the HTML to plaintext serializer, so the bug in the serializer does not manifest. But as allblue@gmx.de pointed out in comment 140, it's really beside the point.
Not resolved, alas, in version 3.13 update (just now)

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100825 Thunderbird/3.1.3
Sorry, but I really can't understand why this bug is taking so long to be fixed. I think it is a high priority issue very noticeable for users.
yep, not solved in 3.1.3.

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.9) Gecko/20100825 Thunderbird/3.1.3
We know it's not solved in the latest release. We didn't say it was. Please stop commenting. This will be fixed when the status of the bug changes to FIXED see https://bugzilla.mozilla.org/page.cgi?id=fields.html.
Whiteboard: [tb31needs][tb32wants][gs][workaround comment 93] → [tb31needs][tb32wants][gs][workaround comment 93][read comment 246]
Pinging jst and Jonathan for comments on my patch for fixing the mochi-tests. I basically had to switch the tests arounds, in the sense that the test cases that used to expect whitespace no longer got it, and the tests that didn't used to expect whitespace now got it, which was a bit odd.
> We know it's not solved in the latest release. We didn't say it was.
> Please stop commenting. This will be fixed when the status of the bug
> changes to FIXED see https://bugzilla.mozilla.org/page.cgi?id=fields.html.

It won't be fixed until it's fixed :-)

OK, how about this:

I will offer a bounty of GBP 5.00 (~ USD 7.70), payable by PayPal, to whoever can convince me that they were the key person responsible for getting a working version available for me to download, before 2010-09-23T17:30:00+0100 (that's 14 days from now).

This should not be a special build but a general release in the normal stable release stream available for Windows 32-bit, under either English-US or English-GB.

And no "funny business" like saying "Thunderbird 3.0.6; payout please". This about getting the bug fixed properly, not playing tricks.

I know this is not much money, but if other people feel the same way and also offer a bounty then it could make an interesting chunk of cash to get someone motivated.

How do you know I'm going to pay out? I guess you'll just have to trust me. But I *promise* that this is a bona fide offer, cross my heart, hope to die, etc.
If anyone else wishes to pledge a bounty, I could maintain a register of pledges.

See http://duckbill-stuff2.blogspot.com/2010/09/thunderbird-31x-adding-spaces-bounty.html
Here's 5 euros from me. Same scheme.
A bounty is NOT going to get this bug fixed any faster.

Restricting comments only to the etiquette will:

https://bugzilla.mozilla.org/page.cgi?id=etiquette.html

The main point being: Unless you are actually helping to fix the bug please do not comment, useless comments make it harder for developers to communicate and slow the whole process down.

We know all the issues in this bug, it needs a bit of time to ensure we have the right fix and we'll get it done as soon as we can.
Back to Topic: if the bug is deeper and not so easy to fix, maybe you can add a quick fix for it in the next TB Version. I am no c++ developper, but in java we have .trim() method. Maybe a quickfix is to .trim() (use the c++ equivalent) the mailcontent before going on with the mailing process.

With this quick fix you have time to found the source of this error and thousends of admin over the world get no more e-amisl from there customers with "this error reports".
I don't see how offering a bounty breaches the etiquette. I'm not trying to oblige anyone to do anything --- that is the job of despots and dictators.

I am merely offering a reward. No-one *has* to do anything.

I also disagree that a bounty will have no effect.

Let's say I offered USD 10 million to have this fixed by the end of tomorrow. You can bet that it would get done by that time.

It is like the old joke "Do you think I am a prostitute? / We've established that, we're just haggling over price.". This seems to me to be eminent domain for offering a bounty --- it's not as if the task is sordid or immoral :-)

I have no intention of managing the bounty on this bug, which is why I posted a link off-bug.
Whiteboard: [tb31needs][tb32wants][gs][workaround comment 93][read comment 246] → [tb31needs][tb32wants][gs][workaround comment 93][read comment 151]
Suggestion: Since it appears that the current HTML pretty-printing implementation is either in violation of recommendations and/or triggers one or more bugs and/or otherwise undesirable consequences in other parts of the system, how about rolling back the HTML pretty-printing changes as a fast, pragmatic, tactical solution until there is an agreeable strategic solution to the aforementioned undesirable consequences?

In the interim, those concerned about pretty-printing the HTML can reformat the ugly HTML after the fact, working on a private copy, and not affecting hundreds of thousands of users.

Here's another quasi-work-around, with a trade-off: Since the bug only affects the first line, leaving the first line blank and starting your text on the second will tend to hide the indentation issue (the trade-off being the obvious extra vertical space, which you may or may not find more palatable).
>how about rolling back the HTML pretty-printing changes as a fast,
pragmatic, tactical solution

It is not a fast and pragmatic solution because it is difficult to do it, since many changes have been done since these changes. And the patch for the HTML pretty-printing was also about the pretty printing in XHTML and in XML. pretty printing in XHTML and in XML didn't exist before that patch, so if we roll back the change, you remove some features needed and already used by other XulRunner applications.

A solution is to remove the flag "pretty printing" when the HTML serializer is called in thunderbird (no need to have HTML pretty printing for mail). But i don't know where or when the serializer is called.

Or an other solution is my little patch, which fix the plaintext serializer, even if it doesn't fix all case. But it fixes at least our bug.
Comment on attachment 470264 [details] [diff] [review]
Inferior fix

I ask a review for my patch. It fixes the bug. Contrary to other propositions, there are less chance to have regression since changes are minimal, and since all current tests passes with it without changes, and it doesn't change the behavior of copy/paste.

I think this is the most pragmatical solution to apply in the 1.9.2 branch. We could later work on a better solution like the Jonathan's one, and only for mozilla-central. IMHO, the Jonathan's patch isn't ready yet since it breaks the behavior of copy/paste, and we need more tests since changes are more important.
Attachment #470264 - Flags: review?(jst)
Comment on attachment 470567 [details] [diff] [review]
fixes mochitests

bienvenu, you should mark the bug as review?, to get jst's attention.
Attachment #470567 - Flags: superreview?(jst)
Comment on attachment 470264 [details] [diff] [review]
Inferior fix

Laurent Jouanneau, you admit yourself that the other patch fixes more cases. In the interest of a good solution, and also to not frustrate a new contributor, let's first look at the patch provided. Please see comment 123 and 124. I totally understand his frustration.

What Jonathan and bienvenu need are directions on which behavior is wanted. I think we should *first* get this information before throwing in another, entirely different and inferior patch.
Attachment #470264 - Attachment description: Potential fix → Inferior fix
Attachment #470264 - Flags: review?(jst)
As I have already said, as far as I can tell, my patch does not break copy/paste. As I have already said, I believe that it is the tests, and not my patch, that were incorrect. As I have already said, I was asked to change my patch to make it remove whitespace at the beginning of all block elements, which is what the HTML spec appears to require, and that is what I did, and that is whi they mochi tests started failing. As I have already said, someone authoritative needs to offer an opinion about whether the behavior I was asked to implement and indeed implemented in my patch is correct. (Ben already pointed out much of this in comment 160, but I am reiterating to make sure is it is perfectly clear).

Here's something to try. Save this HTML, newlines and all, in a file and then display it in Firefox:

---cut here---
<html>
<body>
foo
<div>



Is there whitespace before this line?
</div>
</body>
</html>
---cut here---

You will see that there is no extra whitespace before the question, which means that the browser is stripping whitespace after the <div> tag, which means that as far as I can tell, my change to make the plaintext serializer do the same thing is correct, and the tests that expect whitespace to be preserved after <div> (which David Bienvenu changed) are wrong.

Who needs to decide this? What do we need to do to make that decision happen?
Comment on attachment 470264 [details] [diff] [review]
Inferior fix

My apologies Jonathan. I rebuilt from scratch mozilla-central and comm-1.9.2 binaries with your patch and try many copy/paste in different context. It seems there isn't any regression on the copy/paste behaviors. I don't know why I saw regressions in my previous builds (and I don't understand why original unit tests are bad) :-(.

I remove my patch. Let's continue with your patch.
Attachment #470264 - Attachment is obsolete: true
Can we turn off html pretty printing in mailnews compose? If so, is there any reason we wouldn't want to do that?
Someone just asked on getsatisfaction.com how (in essence) to revert from Thunderbird 3.1.x to to 3.0.x.

I advised them (in essence) to use the 3.0.x installer via http://www.mozillamessaging.com/en-US/thunderbird/all-older.html

I noticed though on that page it says "Download Thunderbird 3

Thunderbird 3.0.x will be maintained with security and stability updates for a short period of time. All users are strongly encouraged to upgrade to Thunderbird 3.1"

Can it be made for this bug to be a blocker on the dropping of support for Thunderbird 3.0.x ?
(In reply to comment #147)
> Pinging jst and Jonathan for comments on my patch for fixing the mochi-tests. I
> basically had to switch the tests arounds, in the sense that the test cases
> that used to expect whitespace no longer got it, and the tests that didn't used
> to expect whitespace now got it, which was a bit odd.

Sorry for the delay in responding. I think your changes are correct.
Comment on attachment 470227 [details] [diff] [review]
yet another try at the patch

> (comment #162) I remove my patch. Let's continue with your patch.

Laurent, what is the review status of Jonathan's patch? If it has been
reinstated after you retracted your own patch, you should either reconsider
your r-, or remove it and restore the original r? request to jst.
Comment on attachment 470227 [details] [diff] [review]
yet another try at the patch

No, I don't reconsider my review on the attachement 470227 because the patch doesn't follow guide lines for patches, and because it doesn't fix unit tests. The patch is in fact obsolete since David have been updated it with a good formating and unit tests fix. (it doesn't remove Jonathan's credits of course :))
Attachment #470227 - Attachment is obsolete: true
Comment on attachment 470567 [details] [diff] [review]
fixes mochitests

Let's say it's ok, although I would prefer more tests (with different flags) to verify there isn't any regressions
Attachment #470567 - Flags: review+
Thanks for the clarification, I was confused by the forth and back. So, unless the sr+ on attachment 460730 [details] [diff] [review] (comment #98) can be carried forward to
attachment 470567 [details] [diff] [review], we are waiting for jst's superreview again.
Emailed jst directly and asked for his help getting this resolved.
Attachment #470567 - Flags: superreview?(jst) → superreview+
Woohoo! Reviewed and superviewed. How do we get the change pushed into the right branch for baking, which I think is the next step? Somebody was kind enough to assign this ticket to me, but I don't know how to the commit and I'm fairly certain that I don't have the necessary access rights. Who here can tell us what the next step is? Does somebody with the necessary rights want to take ownership of the ticket and shepherd it through the rest of the process?
Attachment #470567 - Flags: approval2.0?
Attachment #470567 - Flags: approval1.9.2.11?
Thanks for your persistence, Jonathan; much appreciated!  To drivers, who need to handle the approved flag: this fixes a very visible bug in Thunderbird.  I don't know what the cost/benefit is for Firefox, but as jst was the superreviewer here, I suspect he's the best person to make that call.
Keywords: checkin-needed
Comment on attachment 470567 [details] [diff] [review]
fixes mochitests

a=jst for trunk. Let's get this in ASAP to get as much testing as we can...
Attachment #470567 - Flags: approval2.0? → approval2.0+
thx very much, Laurent. And of course, thx very much to Jonathan!
(In reply to comment #78)
> Created attachment 461705 [details] [diff] [review]
> fix mailnews unit test

Just got this error locally: TEST-UNEXPECTED-FAIL | mozilla/_tests/xpcshell/mailnews/base/test/unit/test_getMsgTextFromStream.js | The tags should be stripped in this case! No, seriously. == The tags should be  stripped  in this case! No, seriously.

So this fix is still needed.
I don't know if I have enough access rights on the comm-central repository, to land this patch in mailnews. Anyway, I don't know rules to check in this tree. Should we have a specific approval flag or keyword or... ?
Comment on attachment 461705 [details] [diff] [review]
fix mailnews unit test - cecked in.

Is this the missing piece on the MailNews end which yet has to be reviewed?
yeah - I can land it. I'll just make sure it's ok with the tree closed and all.
Attachment #461705 - Attachment description: fix mailnews unit test → fix mailnews unit test - cecked in.
Attachment #461705 - Flags: review+
That helped, tests pass now on Windows and Mac OSX.
Attachment #470567 - Flags: approval1.9.2.11? → approval1.9.2.11+
The 1.9.2 version of this patch suffered a bit of bitrot (due to other changes not being on the branch) in the test cases only. Therefore I've fixed them up and will check this in in a bit.
1.9.2 patch checked in:

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/5ed7a764288b
http://hg.mozilla.org/releases/comm-1.9.2/rev/136f1217a4aa
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [tb31needs][tb32wants][gs][workaround comment 93][read comment 151] → [tb31needed][tb32wanted][gs][workaround comment 93][read comment 151]
Target Milestone: --- → mozilla2.0b7
Verified fixed using:
comm-1.9.2 cset: 136f1217a4aacomm-central cset: 320b3705916f

Thanks to all for getting this resolved.
Status: RESOLVED → VERIFIED
Hmm.  A new bug?  Line 2 of comment 186 was two separate lines, when I submitted.
Hi all - really glad to have found this thread. I'm a new thunderbird user and have been loving it, but have no idea how to actually use this patch to fix the extra space error.  I know this is probably incredibly obvious, and apologize, but how do I actually implement this patch?

Thanks!
(In reply to comment #189)
> Hi all - really glad to have found this thread. I'm a new thunderbird user and
> have been loving it, but have no idea how to actually use this patch to fix the
> extra space error.  I know this is probably incredibly obvious, and apologize,
> but how do I actually implement this patch?

If you don't know how, the best way is to wait a little under two weeks and you'll get a patched version via the automatic update system.
Fantastic, thanks! And thanks to all that worked on this bug to get it fixed.
Could bug 509008 be a dupe of this bug? Bug 509008 was actually active in the oldest binaries of Firefox I could test in August 2009. (Probably Firefox 1.x or 2.x.)
I see Thunderbird 3.1.5 was released on Tuesday (19 Oct 2010).

Does 3.1.5 contain a fix to this problem?
(In reply to comment #194)
> Does 3.1.5 contain a fix to this problem?

Yes
Depends on: 730295
You need to log in before you can comment on or make changes to this bug.