Encoding problem when encoding long unicode string. Can be reproduced with Western European characters like á or CJK characters.

RESOLVED FIXED in Thunderbird 45.0

Status

MailNews Core
MIME
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: Jorg K (GMT+2), Assigned: Jorg K (GMT+2))

Tracking

Trunk
Thunderbird 45.0
Dependency tree / graph

Firefox Tracking Flags

(firefox45 affected)

Details

Attachments

(2 attachments, 12 obsolete attachments)

16.94 KB, patch
Jorg K (GMT+2)
: review+
Details | Diff | Splinter Review
5.18 KB, patch
Magnus Melin
: review+
Details | Diff | Splinter Review
(Assignee)

Description

2 years ago
Steps to reproduce:

Create a HTML message with 600 characters á. The á is encoded as c3a1 (2 bytes) in UTF-8.

Set delivery options to "HTML" only and encoding to UTF-8.

Send the message, you can just send it to the outbox with <Crtl><Shift><Enter>.

The resulting message is encoded as 8bit, but the line is over-long, 2 * 600 > 998.
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit
It shows breakage:
áá� �áá

This is due to poor logic here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#612
It should be encoded as base64. 

If a plaintext part is sent with the message, it's a little better, because here
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#652
we pick a more appropriate encoding which replaces the badly chosen body encoding here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#727

If we force the encoding to base64 here
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#613
we get a better result.

Note:
Of course 600 characters á in a row are unrealistic.

This problem can also be easily reproduced with CJK characters. Especially in Japenese no spaces are used, so if you string about 400 CJK characters together, which are typically encoded as 3 bytes in UTF-8, you easily run into this problem.
Currently CJK are illegally wrapped. Once the bad wrapping is fixed in bug 1225864, you will be able to reproduce it with CJK characters. In the meantime, you need to visit
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1507
and change
uint32_t  flags = nsIDocumentEncoder::OutputFormatted  | nsIDocumentEncoder::OutputNoFormattingInPre;
to
uint32_t  flags = nsIDocumentEncoder::OutputRaw
in order to reproduce the problem with CJK characters. Or just take my word for it and stick with the á ;-)
(Assignee)

Updated

2 years ago
Blocks: 653342
(Assignee)

Comment 1

2 years ago
Looking into this further, it seems correct that the logic at
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#613
should be improved.

I've added some code to determine the longest line and was surprised that I only ever received 997 bytes in 'm_attachment1_body', even for my 2*600 byte long test string.

Turns out that there are choppers on the way which cut the original HTML (M-C returns the full 600 characters unchopped!) into bits and they don't even shirk from cutting through UTF-8 characters. Great stuff. The sharp but blind knifes are here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1637
It even says in the comment: risk of breaking UTF-8 pairs in half

And here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1665

It says:
  // XXX TODO
  // march backwards and determine the "best" place for the linebreak
  // for example, we don't want <a hrLINEBREAKref=""> or <bLINEBREAKr>
  // or "MississLINEBREAKippi"
These original authors were so smart, yet so lazy!! Now with UTF-8 and CJK this comes back to haunt us.

Also, if it's a solid long string, we just CANNOT cut through it at all and must just adjust the encoding accordingly.
(Assignee)

Comment 2

2 years ago
Created attachment 8689189 [details] [diff] [review]
Patch for discussion (v1).

OK, here is my first stab at this.

First and foremost, I killed this horrible beast of EnsureLineBreaks().
Then, whenever we have long line, we prefer base64 encoding. That's basically all.

Sorry about the mess I seemingly caused in nsMsgAttachmentHandler.cpp.
I fixed a lot of comments and formatting, the only code change is this:

+  // According to RFC 821 we must always have lines shorter than 998 bytes.
+  if (m_max_column > 900)
+    m_encoding = ENCODING_BASE64;

Very quick feedback/review would be appreciated since this is part of getting the CJK issue fixed. And the cut-off date is the 14th of Dec.

Who can review this? Kent, who will review anything? ;-)
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Attachment #8689189 - Flags: feedback?(rkent)
Attachment #8689189 - Flags: feedback?(mkmelin+mozilla)
Attachment #8689189 - Flags: feedback?(Pidgeot18)
(Assignee)

Comment 3

2 years ago
And when you have a moment, do look at bug 67334 and its half-hearted fix.

Now, the code I killed was introduced to fix bug 83381 and bug 84261. There is a test message (attachment 36831 [details] or attachment 38649 [details]).

I grabbed this message, hit reply, types some text, hit save, changed the recipient, hit send, all works.

This message has a max-column of 173, so I it's not a problem at all, in fact, it's not even encoded base64 due to the rather short lines.

But hey, you know why this message caused an issue back then? The message has a rather large <pre> block. And back in 2001, Gecko stored newlines in contenteditable <pre> blocks as <br>. So serialising the large block resulted in a very long line. In 2010 Gecko changed the storage of <pre> blocks and now stores newlines (\n) (bug 551704, see attachment 488569 [details] [diff] [review]). So the motivation for the existence of the EnsureLineBreaks() has disappeared. It's ready to die.

Comment 4

2 years ago
Comment on attachment 8689189 [details] [diff] [review]
Patch for discussion (v1).

Review of attachment 8689189 [details] [diff] [review]:
-----------------------------------------------------------------

All looks good to me. I do wonder where the 900 magic number comes from (but that's in the existing code too).
Attachment #8689189 - Flags: feedback?(mkmelin+mozilla) → feedback+
(Assignee)

Comment 5

2 years ago
Magnus, thanks for the quick feedback (it won't help the frustration though since I'm waiting for 7 reviews/feedbacks).

The 900, yes, excellent observation. If comes from
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp#318

I believe some people who weren't sure thought: Well, it's meant to be 998, so we'd better take 10% off and put 900.

If we select base64, we know we can encode anything, since the encoding itself cuts lines to 76 bytes.

"Quoted printable" according to (https://en.wikipedia.org/wiki/Quoted-printable) also limits the line length to 76 bytes. (So funny that the whole world still thinks FORTRAN!)

The threshold for switching to another encoding really doesn't matter, but for my taste we could up that number to, say 990, just to be on the safe side, haha.
(Assignee)

Comment 6

2 years ago
More importantly, how do I get an r+ onto this? Just remove the printf and that's it?

I talked to Joshua on IRC yesterday and he repeated what we all know, that he has little time. From experience that might mean that he won't look at it until it has landed and then complain. So what is the way forward? The earlier we land this, the more it will ride the trains.
(Assignee)

Comment 7

2 years ago
I've sent this off to a try run:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=06c47fc7e75e

I don't see (additional) test failures. Even the test-charset-edit.js::test_no_mojibake thing is a known problem, see bug 1202401 (you've gotta love defaulting to ASCII, haha).
(Assignee)

Updated

2 years ago
Duplicate of this bug: 355209
(Assignee)

Comment 9

2 years ago
Created attachment 8689653 [details] [diff] [review]
Patch for discussion (v2).

Carrying forward Magnus' f+.
Fixed fatal error where data has no newlines or where last line doesn't end with newline. Oops ;-)
Attachment #8689189 - Attachment is obsolete: true
Attachment #8689189 - Flags: feedback?(rkent)
Attachment #8689189 - Flags: feedback?(Pidgeot18)
Attachment #8689653 - Flags: feedback?(rkent)
Attachment #8689653 - Flags: feedback?(Pidgeot18)
(Assignee)

Comment 10

2 years ago
There is something not right here. I sent a long message with Japanese characters and encoding ISO-2022-JP. The message got sent as Content-Transfer-Encoding: 7bit despite having set base64 based on the very long line of 3747 bytes. Hmm. So something in there clobbered the setting. Needs more looking at.
(Assignee)

Comment 11

2 years ago
Something else wrong here. If we sent plaintext only, the plaintext encoding is determined in PickEncoding(). Since no HTML is sent, it makes no set the encoding from the HTML body. Boy, this is twisted. So the next patch will address these two issues.
(Assignee)

Comment 12

2 years ago
Created attachment 8689731 [details] [diff] [review]
Patch for discussion (v3).

Carrying forward Magnus' f+.

Fixed problems from comment #10 and comment #11.
More explanation in the following comment.
Attachment #8689653 - Attachment is obsolete: true
Attachment #8689653 - Flags: feedback?(rkent)
Attachment #8689653 - Flags: feedback?(Pidgeot18)
Attachment #8689731 - Flags: feedback?(rkent)
Attachment #8689731 - Flags: feedback?(Pidgeot18)
(Assignee)

Comment 13

2 years ago
If you're not interested in how the problems from comment #10 and comment #11 got fixed, skip to "READ HERE".

Otherwise please look at the interdiff between v1 and v3:
https://bugzilla.mozilla.org/attachment.cgi?oldid=8689189&action=interdiff&newid=8689731&headers=1

I had to cater for long lines which have no newline at all in two places. These happen using ISO-2022-JP and a hack to work around bug 1225864 to produce these long lines (hack not shown in the patch).

The problem in comment #11 is a logic error in v1. If we're sending plaintext only, we must always override the encoding we *would have* chosen for the HTML body, since it's irrelevant. My additional logic was wrong and I reinstated the original state.

The problem from comment #10 happened like this:
Due to the missing check for the long line without newline, the longest column was 0, so the attachment processing, which is used for the plain text part, used 7bit. The main body had the right length (since the "long line with no newline check" was already in v2 for the body) and therefore the main body wanted base64. The plaintext only didn't override this, so we encoded base64, while the attachment, the plain body, had 7bit which got written out to the header.

Don't worry if you don't follow the details, it's all good now. I just wrote it down to explain to myself why the code produced what it produced. Strange things that go unexplained will bite you later.

=============== READ HERE: ==================
Magnus, apart from the one print statement, this should be good for review. I tested various variations, including the ones we will be interested in when bug 1225864 lands, that is long lines which cannot be broken.

So how do we proceed to get r+?
Flags: needinfo?(mkmelin+mozilla)
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #10)
> There is something not right here. I sent a long message with Japanese
> characters and encoding ISO-2022-JP. The message got sent as
> Content-Transfer-Encoding: 7bit despite having set base64 based on the very
> long line of 3747 bytes. Hmm. So something in there clobbered the setting.
> Needs more looking at.

Maybe it is intentional because RFC 1468 suggests to use CTE:7bit for iso-2022-jp messages.
(Assignee)

Comment 15

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #14)
> Maybe it is intentional because RFC 1468 suggests to use CTE:7bit for
> iso-2022-jp messages.

It was a logic error which I fixed, details in comment #13. See bug 653342 comment #173 for an ISO-2022-JP message with CTE base64.

Once again, if I want to encode long lines, I must use base64 or I get into the problem of splitting and the "delsp=yes" which I'd rather avoid.

If you use 600 á's in a row the data you need to process is no different to CJK data (with the only difference that the M-C serialiser has the urgent need to chop right through it whereas European strings are left alone).

I can see (https://www.ietf.org/rfc/rfc1468.txt) that CTE 7bit is suggested:
===
The ISO-2022-JP encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in current JUNET software.
===

As it stands, ISO-2022-JP can be used for HTML encoding although it's only designed for plain text. In HTML I don't have "format=flowed; delsp=yes;" so I must encode base64 if there are long lines.

To fully adhere to the *suggestion* of the RFC 1468 (which is from 1993!), I'd have to do the following:
1) implement the "delsp=yes;" option and cut long CJK strings for plain text only.
2) not ever use ISO-2022-JP for HTML parts.

Well, I wonder whether this is necessary. If so, this would be done in another bug, in fact, the original "delsp=yes" bug. Here we're fixing bad code that chops multi-byte characters in half ;-)

While we're talking: Is there anything I should be aware of regarding Shift JIS? Does that perhaps object to base64 encoding and insists on 8bit?
(Assignee)

Comment 16

2 years ago
Sorry, the original "delsp=yes" bug is bug 26734.
Comment on attachment 8689731 [details] [diff] [review]
Patch for discussion (v3).

Review of attachment 8689731 [details] [diff] [review]:
-----------------------------------------------------------------

Tests for this code are an absolute necessity. The encoding code is an absolute mess, and my mental recollection is that there is way too much code duplication of basic stuff here or other shenanigans like "let's decide the encoding to use and how to set the cte in two different places!" for me to place much trust in code without tests.

Ultimately, this is also a sign of really poor design in the compose backend. The mime assembly phase should in theory be seeing its input as a bunch of octets to exactly reproduce, but the code tries its darnedest to do transformations (e.g., magic EOL conversion) that the world decided was a bad idea over a decade ago. And unfortunately, the very messiness of the code makes it hard to actually replace it with correct code without having to rewrite thousands of lines of code at once.

::: mailnews/compose/src/nsMsgAttachmentHandler.cpp
@@ +369,4 @@
>    if (needsB64)
>      m_encoding = ENCODING_BASE64;
>  
> +  // According to RFC 821 we must always have lines shorter than 998 bytes.

This is possibly going to do bad things if the message/ or multipart/ type parts happen to violate this condition. I can't think of any cases off the top of my head where multipart/ would be involved (I'll admit, though, that is the kind of code that you try to purge from long-term memory), but message/rfc822 is probably going to hit this path if you try to include a .eml file from the disk as an attachment.

::: mailnews/compose/src/nsMsgSend.cpp
@@ +2926,5 @@
>    body.Trim(" ", false, true);
>  
>    if (body.Length() > 0)
>    {
> +     m_attachment1_body = PL_strdup(body.get());

ToNewCString(body).

@@ +2928,5 @@
>    if (body.Length() > 0)
>    {
> +     m_attachment1_body = PL_strdup(body.get());
> +     if (!m_attachment1_body) {
> +      return NS_ERROR_OUT_OF_MEMORY;

insufficient indent.
Attachment #8689731 - Flags: feedback?(Pidgeot18) → feedback+

Comment 18

2 years ago
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #13)
> So how do we proceed to get r+?

With some tests it should be pretty good to go.
Flags: needinfo?(mkmelin+mozilla)
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #15)
> (In reply to Masatoshi Kimura [:emk] from comment #14)
> Once again, if I want to encode long lines, I must use base64 or I get into
> the problem of splitting and the "delsp=yes" which I'd rather avoid.
>
> If you use 600 á's in a row the data you need to process is no different to
> CJK data (with the only difference that the M-C serialiser has the urgent
> need to chop right through it whereas European strings are left alone).

It's OK to use base64 for messages containing an overly long line. RFC 1468
discourages to use long lines, so those messages do not adhere the RFC anyway.

Quoting from the RFC:
   The human user (not implementor) should try to keep lines within 80
   display columns, or, preferably, within 75 (or so) columns, to allow
   insertion of ">" at the beginning of each line in excerpts.

Currently Thunderbird automatically wrap the line to avoid such long lines.
Is this feature preserved?

> To fully adhere to the *suggestion* of the RFC 1468 (which is from 1993!),
> I'd have to do the following:
> 1) implement the "delsp=yes;" option and cut long CJK strings for plain text
> only.
> 2) not ever use ISO-2022-JP for HTML parts.

Sounds good.

> Well, I wonder whether this is necessary. If so, this would be done in
> another bug, in fact, the original "delsp=yes" bug. Here we're fixing bad
> code that chops multi-byte characters in half ;-)

Currently Thunderbird does not use "format=flowed" for Japanese plain-text
mail messages, because of bug 26734 :)
If plain-text iso-2022-jp mails are automatically wrapped and sent with
CTE:7bit, I'm fine.

> While we're talking: Is there anything I should be aware of regarding Shift
> JIS? Does that perhaps object to base64 encoding and insists on 8bit?

No, Japanese mail messages should be encoded with iso-2022-jp or utf-8.
We should not use any other encodings for Japanese mail messages.
(I noticed Thunderbird 38 uses the same encoding menu for both composing
 and viewing. This is very bad. Thunderbird should not provide so many
 options for composing.)
(Assignee)

Comment 20

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #19)
Let me first thank you for your continued interest in this matter, because it's great to have feedback from someone who knows about Japanese encoding.

I'd like to clarify what I mean by "long lines". A long line is a string of characters which does not contain white-space (blanks, tabs, newlines). That long line can consist of Western European characters, like á, or Korean characters, or a long string of Japanese characters forming one paragraph. Japanese writing seems to cause long lines since paragraphs can be written without any spaces. That typically does not happen in European or Korean writing, where spaces are used to separate words. I'm not familiar with Chinese.

Currently there are two problems with processing long lines:
1) The M-C serialiser does not wrap/chop long lines consisting of European characters, but it
   does chop through CJK strings causing additional spaces at random: Bug 1225864.
2) When a long line is encoded as UTF-8 or any other multi-byte encoding, the sending logic
   cuts those lines at 997 bytes and hence potentially destroys the UTF-8 encoding and
   also adds an additional space. This is going to be addressed in this bug.

Lines which contain white-space will continue to work as before. This bug is about guaranteeing the unaltered transmission of long lines. See duplicate bug 355209 for a different description of the same problem.

There is nothing at all specific about Japanese characters in this bug. It is only the "habit" of Japanese writers not to separate words with blanks that cause long lines. I can create a long line with (meaningless) 600 á's resulting in a 1200 byte Unicode string.

The solution will simply be to use CTE base64 regardless of the character encoding, so for UTF-8 but also ISO-2022-JP.

Now to the issues regarding Japanese:

> It's OK to use base64 for messages containing an overly long line. RFC 1468
> discourages to use long lines, so those messages do not adhere the RFC
> anyway.
> 
> Quoting from the RFC:
>    The human user (not implementor) should try to keep lines within 80
>    display columns, or, preferably, within 75 (or so) columns, to allow
>    insertion of ">" at the beginning of each line in excerpts.

Frankly, this standard from 1993 seems outdated. Surely the software must guarantee transmission of any entered test regardless of what the user strings together.

> Currently Thunderbird automatically wrap the line to avoid such long lines.
> Is this feature preserved?

Please refer to the initial explanation. Lines will continued to be wrapped if they contain white-space. But no enforced chopping will take place that destroys multi-byte encoding and/or inserts unwanted space into CJK strings.

> > To fully adhere to the *suggestion* of the RFC 1468 (which is from 1993!),
> > I'd have to do the following:
> > 1) implement the "delsp=yes;" option and cut long CJK strings for plain text
> >    only.
> > 2) not ever use ISO-2022-JP for HTML parts.
> 
> Sounds good.

We may do this one day in a follow-up to bug 26734, please see bug 26734 comment #135 and following. Please continue any related discussion there.

> Currently Thunderbird does not use "format=flowed" for Japanese plain-text
> mail messages, because of bug 26734 :)

Correct, I can see this here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1747
However, your reasoning is not correct. We can use "format=flowed" on all plaintext messages as long as we don't wrap through strings at the wrong place inserting undesired spaces. So once bug 1225864 is fixed, we can allow "format=flowed" for *all* character encodings, see bug 653342 comment #174.

The current state is terrible. I started with an e-mail that contained one line, "(space)" has one space to the left and right:
これは長い日本語のテキストですので、行が折れると思います。 (space) これ は長い日本語のテキストですので、行が折れると思います。

Sending UTF-8 I get:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
これは長い日本語のテキストですので、行が折れると思います。 (space) [*]これ[*]は長い日本語のテキストですので、行が折れると思います。
(I marked the additional space with [*]).

Sending ISO-2022-JP I get:
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
これは長い日本語のテキストですので、行が折れると思います。 (space) これ
は長い日本語のテキストですので、行が折れると思います。
with no additional space but an additional newline, potentially right through a word.

This is all caused by bug 1225864. The serialiser chops right through the CJK string.

> If plain-text iso-2022-jp mails are automatically wrapped and sent with
> CTE:7bit, I'm fine.

They will be wrapped as well as possible if they contain white-space. Long lines, see above, will be sent with CTE base64 for now, until bug 26734 is implemented. I don't want to discuss this here any further.

> Japanese mail messages should be encoded with iso-2022-jp or utf-8.
> We should not use any other encodings for Japanese mail messages.

Thank you for this confirmation.
OS: Unspecified → All
Hardware: Unspecified → All
(Assignee)

Comment 21

2 years ago
Correction:
Original message:
これは長い日本語のテキストですので、行が折れると思います。 (space) これは長い日本語のテキストですので、行が折れると思います。

Current result UTF-8:
Either same as input or with additional space as marked with [*] depending on other factors:
これは長い日本語のテキストですので、行が折れると思います。 (space) これ[*]は長い日本語のテキストですので、行が折れると思います。

Current result ISO-2022-JP: as above (with additional line break). All due to bug 1225864.

Comment 22

2 years ago
Comment on attachment 8689731 [details] [diff] [review]
Patch for discussion (v3).

It seems to me that you have received competent feedback on this already, so it does not make sense for me to dive in when there are already people who have responded.

I agree with others that this really needs some sort of test. My first approach to giving feedback would be to look at the test and understand it, but with no test of course I can't do that.

Are there any existing tests that test this code? If not, I might be able to help you with that (which might require making changes to the IDL to hook into this. I know from my own experience this code is really hard to manipulate from JS without modification).
Attachment #8689731 - Flags: feedback?(rkent)
(Assignee)

Comment 23

2 years ago
Kent, thanks for the feedback, this way or another. Yes, I got support from Joshua and Magnus.

As for the tests, I was hoping that attachment 766581 [details] [diff] [review] (which is a test, I didn't give it the confusing name) can be moulded into what I want.

The test will be simple:
Send a message that contains 600 á's encoded as UTF-8, that's 1200 bytes, and make sure it doesn't get broken but instead encoded as base64. Do this for HTML and plaintext. Plaintext exercises the attachment processing, HTML exercises the new code I added.
(Assignee)

Comment 24

2 years ago
Created attachment 8690517 [details] [diff] [review]
Proposed solution (v4)

Here is the proposed solution. Fixes nits from comment #17.
Also one change: If we're already using "quoted printable" there is no need to switch to base64.

Test will follow in a separate patch.
Attachment #8689731 - Attachment is obsolete: true
Attachment #8690517 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 25

2 years ago
Created attachment 8690520 [details] [diff] [review]
WIP: Tests

OK, here comes the test. As per comment #23 I adapted Hiro's test from attachment attachment 766581 [details] [diff] [review]. It was basically all there already, thanks Hiro, you work is not going to waste!

On IRC Kent mentioned that we're phasing out asynchronous tests and switching to promises. Being unfamiliar with both concepts, that doesn't mean much to me.

However, I removed
  async_run_tests(tests);
from Hiro's original and put
  tests.forEach(add_task);
  run_next_test();
instead.

HTLM and plaintext each run these five checks:
Line 29:   do_check_neq(transferEncoding, null);
Line 30:   do_check_eq(transferEncoding.length, 2);
Line 78:   do_check_eq(transferEncoding, "base64");
Line 82:   do_check_eq(expectedBody.length, body.length);
Line 83:   do_check_eq(expectedBody, body);
and it all passes.

The output is:
--- Start Output ----------------------------------
 0:02.29 TEST_END: Thread-1 Harness PASS. Subtests passed 10/10. Unexpected 0
 0:02.29 LOG: MainThread INFO INFO | Result summary:
 0:02.29 LOG: MainThread INFO INFO | Passed: 1
 0:02.29 LOG: MainThread INFO INFO | Failed: 0
 0:02.30 LOG: MainThread INFO INFO | Todo: 0
 0:02.30 LOG: MainThread INFO INFO | Retried: 0
 0:02.30 SUITE_END: MainThread
Summary
=======

Ran 11 tests (1 parents, 10 subtests)
Expected results: 11
Unexpected results: 0

OK
--- End Output ------------------------------------

I guess that's OK, one parent (?) and 10 subtests.

I didn't know what to do with:
function OnStopCopy(aStatus) {
  dump("OnStopCopy says 'Hi!' - previously contained 'async_driver();'\n");
}

async_driver() had to go, but somehow OnStopCopy() is needed.

I'd appreciate help and guidance to finish this off as soon as possible since I want to move on to bug 1225904 which the key to solving the CJK problem. The branch date (14th Dec.) is looming.

Joshua, can you please indicate whether you want to be involved in the review.
Flags: needinfo?(Pidgeot18)
Attachment #8690520 - Flags: feedback?(rkent)
Attachment #8690520 - Flags: feedback?(mkmelin+mozilla)
(Assignee)

Comment 26

2 years ago
Created attachment 8690688 [details] [diff] [review]
Proposed solution (v4b)

Mac and Linux compilers are very picky. I hope they are happier now.
Attachment #8690517 - Attachment is obsolete: true
Attachment #8690517 - Flags: review?(mkmelin+mozilla)
Attachment #8690688 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 27

2 years ago
For a try build refer to bug 653342 comment #177.

Comment 28

2 years ago
Comment on attachment 8690520 [details] [diff] [review]
WIP: Tests

Review of attachment 8690520 [details] [diff] [review]:
-----------------------------------------------------------------

I only comment here on the issues associated with using promises here.

You really did not implement promises. I gave some notes, but it would be helpful if you tried to understand how async tests work with promises.

::: mailnews/compose/test/unit/test_longLines.js
@@ +1,2 @@
> +load("../../../resources/logHelper.js");
> +// load("../../../resources/asyncTestUtils.js");

Remove this completely, and import PromiseTestUtils instead.

@@ +19,5 @@
> +
> +  return line;
> +}
> +
> +function OnStopCopy(aStatus) {

Normally in PromiseTestUtils, we would create a standard listener that includes this call by default. You could create it locally instead if you want. But that method will replace OnStopCopy() here

@@ +120,5 @@
> +                               null,
> +                               null,
> +                               null,
> +                               null,
> +                               copyListener,

This copyListener is defined in head_compose.js and is an odd item. It is defined in the idl as an nsIMsgSendListener, but then gets QI'd internally to an nsIMsgCopyServiceListener item, and notifications sent out on that. The only method that is actually used is OnStopCopy, which generates the async callback.

PromiseTestUtils does not currently support such as listener. But it is really straightforward to create one using the examples of other listeners.

When you do, prior to calling createAndSendMessage you will need:

var copyListener = new PromiseTestUtils.PromiseSendListener(wrapIfNeeded);
msgSend.createAndSend...

you should probably have save_message return a promise, which would be return copyListener.promise;

Then methods that use save_message would have:

yield save_message(...);

@@ +130,5 @@
> +function createLocalFolders() {
> +  localAccountUtils.rootFolder.createLocalSubfolder("Drafts");
> +}
> +
> +function test_html_message_save_as_draft() {

because this will be a generator, function* test_html_message_save_as_draft() And not yield false, but yield save_message(...)

ditto for the other function calling save_message
Attachment #8690520 - Flags: feedback?(rkent) → feedback-
(Assignee)

Comment 29

2 years ago
(In reply to Kent James (:rkent) from comment #28)
> You really did not implement promises. I gave some notes, but it would be
> helpful if you tried to understand how async tests work with promises.

How would I do that? Is there documentation, examples, can I attend a course? I think the answer is "no" to all of these.

I think I've done a good job hunting down the problem that since at least 2008 (creation date of bug 448139, duplicate of bug 653342) no one could fix. Let me not mention bug 26734 from 2000 which is also part of this ghastly mess. It took me a week, and I still need to tidy up or fight in bug 1225864 got get M-C sorted out for our "Asian crisis" fix. So how about you let me watch the master at work while *you* contribute the test to this?

I know absolutely nothing about promises or async tests. You wanted to champion/lobby/promote this, so here's your chance. It's teamwork after all, right?
Flags: needinfo?(rkent)
(Assignee)

Comment 30

2 years ago
I found a good starting point for documentation:
https://www.promisejs.org/generators/ (and references further on the page).

I still think we get the best usage of everyone's time if Kent (or anyone who's done it before) helps us here. The test works and the "conversion" is supposedly easy for those who know.
<https://dxr.mozilla.org/comm-central/source/mailnews/compose/test/unit/test_messageHeaders.js> is the canonical test I refer to for aide about using promise-based tests in compose code.

For example, I already have a method richCreateMessage that any test in compose/test/unit/ can use to save a message to the draft folder. See, e.g., <https://dxr.mozilla.org/comm-central/source/mailnews/compose/test/unit/test_messageHeaders.js#310> for how to use this. Also note how checkDraftHeaders and checkMessageHeaders can be used to confirm the header values without handwriting specific MIME parser checks.
Flags: needinfo?(Pidgeot18)
(Assignee)

Comment 32

2 years ago
Thanks Joshua, you've pointed me to this example before on IRC, but I didn't look at it properly; my mistake.

With the "building blocks" provided there it shouldn't be too hard to refactor my test. I'll give it a go.
Flags: needinfo?(rkent)
Comment hidden (obsolete)
Comment hidden (obsolete)
(Assignee)

Updated

2 years ago
Attachment #8691620 - Attachment is obsolete: true

Comment 35

2 years ago
Looking at progressListener in head_compose.js, I would complain that there is no rejection of the promise if the aStatus field indicates an error. That can lead to passing as success what was actually a failure.

But I did not look at the full range of issues, it is just normal design to reject on error.
(In reply to Kent James (:rkent) from comment #35)
> Looking at progressListener in head_compose.js, I would complain that there
> is no rejection of the promise if the aStatus field indicates an error. That
> can lead to passing as success what was actually a failure.

IIRC, we never call that callback with anything other than NS_OK.
(Assignee)

Comment 37

2 years ago
Created attachment 8691900 [details] [diff] [review]
WIP: test (v3)

OK, I started with a copy of test_messageHeaders.js and removed 90% of it.

I basically want to send two messages, one HTML and one plain.

Looks like one is sent HTML, the other one is sent HTML+plain and my test fails. Here's some debug output:

First HTML, works fine, body length 1200 + the length of the tags:
PROCESS_OUTPUT: Thread-1 (pid:6524) "********* Processing HTML part: 1226"
TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 34]
"text/html; charset=UTF-8" == "text/html; charset=UTF-8"
TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 34]
"base64" == "base64"

Plain doesn't work, the body is used for the HTML part, the plain part is empty:
PROCESS_OUTPUT: Thread-1 (pid:6524) "********* Processing HTML part: 1200"
PROCESS_OUTPUT: Thread-1 (pid:6524) "********* Processing PLAIN part: 0"
TEST_STATUS: Thread-1 testBodyWithLongLine PASS [testBodyWithLongLine : 34]
"text/plain; charset=UTF-8; format=flowed" == "text/plain; charset=UTF-8; format=flowed"
TEST_STATUS: Thread-1 testBodyWithLongLine FAIL [testBodyWithLongLine :  34]
"7bit" == "base64"

Looks like I'm using the toolbox incorrectly. Joshua, can you please tell me what I'm doing wrong.

And the next question. I need to check the body. Is there something already I can use?

In Hiro's original we find:
function split_message_into_headers_and_body()
function check_message()
and he actually encoded the original message with base64 himself in
base64_encode_with_wrap()
See attachment 8690520 [details] [diff] [review] for details.

Is there something better? Instead of encoding the original body and comparing it to the message body it seems more logical to decode the message body and compare that.
(Assignee)

Updated

2 years ago
Flags: needinfo?(Pidgeot18)
Comment hidden (obsolete)
(Assignee)

Comment 39

2 years ago
Created attachment 8692129 [details] [diff] [review]
WIP: working test (v3a) checking headers.

OK, I worked it out. Joshua's toolbox *can* produce plaintext messages, if I add one line to nsMsgSend.cpp:
mOriginalHTMLBody = ToNewCString(attachment1 [details] [diff] [review]_body);

Now, what is missing is checking the body. For this I repeat from comment #34:

Is there something already I can use?

In Hiro's original we find:
function split_message_into_headers_and_body()
function check_message()
and he actually encoded the original message with base64 himself in
base64_encode_with_wrap()
See attachment 8690520 [details] [diff] [review] for details.

Is there something better? Instead of encoding the original body and comparing it to the message body it seems more logical to decode the message body and compare that.

Joshua, please give me a hint, or I keep battling on, most probably grabbing ideas from Hiro's original.
Attachment #8691900 - Attachment is obsolete: true
Flags: needinfo?(rkent)
Flags: needinfo?(mkmelin+mozilla)
(Assignee)

Updated

2 years ago
Attachment #8691900 - Attachment description: WIP: test → WIP: test (v3)
(Assignee)

Updated

2 years ago
Attachment #8692129 - Attachment description: WIP: working test (!!) checking headers. → WIP: working test (v3a) checking headers.
(Assignee)

Comment 40

2 years ago
Created attachment 8692488 [details] [diff] [review]
Test (v4)

Here is the test. Print statements in code will be removed in the final version.
This is the final piece to solve the CJK issues.
Attachment #8692129 - Attachment is obsolete: true
Flags: needinfo?(Pidgeot18)
Attachment #8692488 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 41

2 years ago
Oops, looks there is a charset problem in a comment:
// Create a line with 600 letters ...
I'll fix that together with other issues from the review.

Comment 42

2 years ago
Comment on attachment 8692488 [details] [diff] [review]
Test (v4)

Review of attachment 8692488 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/test/unit/test_longLines.js
@@ +7,5 @@
> +Components.utils.import("resource:///modules/mailServices.js");
> +Components.utils.import("resource:///modules/mimeParser.jsm");
> +
> +var CompFields = CC("@mozilla.org/messengercompose/composefields;1",
> +                    Ci.nsIMsgCompFields);

I see you copied this, but I still don't understand how CC works. (Should be Cc, no?)

@@ +63,5 @@
> +  let line = "";
> +
> +  // Create a line with 600 letters รก, encoded as c3a1 in Unicode.
> +  for (let i = 0; i < 600; i++)
> +    line = line + "á";

for loops should always have braces.

but better written just let line = "á".repeat(600);

... so please just inline the whole function.

Comment 43

2 years ago
Comment on attachment 8690688 [details] [diff] [review]
Proposed solution (v4b)

Review of attachment 8690688 [details] [diff] [review]:
-----------------------------------------------------------------

r=mkmelin
Please make the commit message more descriptive though.
Attachment #8690688 - Flags: review?(mkmelin+mozilla) → review+
(Assignee)

Comment 44

2 years ago
Created attachment 8693085 [details] [diff] [review]
Proposed solution (v4b) with better checkin message.

Carrying over Magnus' r+.

Thanks!
Better like this? Otherwise please tell me what you'd like, so I don't have to guess.
Test is coming in a minute.
Attachment #8690688 - Attachment is obsolete: true
Attachment #8693085 - Flags: review+
(Assignee)

Comment 45

2 years ago
Created attachment 8693088 [details] [diff] [review]
Proposed solution (v4b) with better checkin message.

Damn, attached the wrong patch.
Attachment #8693085 - Attachment is obsolete: true
(Assignee)

Comment 46

2 years ago
Comment on attachment 8693088 [details] [diff] [review]
Proposed solution (v4b) with better checkin message.

Carrying over Magnus' r+, this time to the correct patch.
Attachment #8693088 - Flags: review+
(Assignee)

Comment 47

2 years ago
Created attachment 8693093 [details] [diff] [review]
Test (v4b).

I fixed the "á".repeat(600).

About the CC vs Cc: No idea. Cc ain't working at all:

 0:04.28 LOG: Thread-1 ERROR TypeError: Cc is not a function at c:/mozilla-sourc
e/comm-central/obj-i686-pc-mingw32/_tests/xpcshell/mailnews/compose/test/unit/te
st_longLines.js:10
Attachment #8692488 - Attachment is obsolete: true
Attachment #8692488 - Flags: review?(mkmelin+mozilla)
Attachment #8693093 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 48

2 years ago
Created attachment 8693094 [details] [diff] [review]
Test (v4c).

Changed a comment.
Attachment #8693093 - Attachment is obsolete: true
Attachment #8693093 - Flags: review?(mkmelin+mozilla)
Attachment #8693094 - Flags: review?(mkmelin+mozilla)

Comment 49

2 years ago
Comment on attachment 8693094 [details] [diff] [review]
Test (v4c).

Review of attachment 8693094 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/test/unit/test_longLines.js
@@ +6,5 @@
> +Components.utils.import("resource://gre/modules/Services.jsm");
> +Components.utils.import("resource:///modules/mailServices.js");
> +Components.utils.import("resource:///modules/mimeParser.jsm");
> +
> +var CompFields = CC("@mozilla.org/messengercompose/composefields;1",

ah, Components.Constructor!
Attachment #8693094 - Flags: review?(mkmelin+mozilla) → review+
(Assignee)

Comment 50

2 years ago
Please land this first, then bug 653342.
Please apply the "proposed solution", then the test.
Keywords: checkin-needed

Comment 51

2 years ago
https://hg.mozilla.org/comm-central/rev/7e4f38edda2ff97ad32c99d68b8c387708940911
Bug 1225904 - Use CTE base64 when lines get long in message bodies. Removed EnsureLineBreaks(). r=mkmelin

https://hg.mozilla.org/comm-central/rev/e541ebcb5292c8167996854d2600ee6e7fd05564
Bug 1225904 - Use CTE base64 when lines get long in message bodies. Removed EnsureLineBreaks(). Test. r=mkmelin

Updated

2 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0
Depends on: 1255071
(Assignee)

Updated

a year ago
Duplicate of this bug: 1262742
You need to log in before you can comment on or make changes to this bug.