Don't wrap/chop through long CJK strings when serialising

RESOLVED FIXED in Firefox 45

Status

()

Core
Serializers
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: Jorg K (GMT+2), Assigned: Jorg K (GMT+2))

Tracking

Trunk
mozilla45
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox45 fixed)

Details

Attachments

(5 attachments, 2 obsolete attachments)

(Assignee)

Description

2 years ago
Created attachment 8689008 [details]
Test showing wrapping of CJK characters when rendered onto the screen.

Don't wrap/chop through long CJK strings when serialising:

nsEditor::OutputToString() and nsContentUtils::ConvertToPlainText() wrap/chop long (UTF-8) strings containing CJK characters (and no white-space) when called with nsIDocumentEncoder::OutputFormatted.

They must not do so. No string must be broken into multiple output lines.

For details see bug 653342 comment #161, bug 653342 comment #162 and bug 653342 comment #170.

Long strings containing Western European characters like á (and no white-space) are not split, so it's not clear why CJK strings are split.

Obviously screen rendering of CJK strings is a different story (see attachment), but serialising causes problems since white-space is added where there was none.
(Assignee)

Updated

2 years ago
Blocks: 653342
(Assignee)

Comment 1

2 years ago
Oops, that's nsPlaintextEditor::OutputToString():
http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/nsPlaintextEditor.cpp#1333

The other call is from nsContentUtils::ConvertToPlainText():
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsContentUtils.cpp#4529

In the end both call nsDocumentEncoder::EncodeToString():
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocumentEncoder.cpp#1085
Comment hidden (obsolete)
Comment hidden (obsolete)
(Assignee)

Comment 4

2 years ago
Another observation: We might have a little fight about this issue. The author of this test
https://dxr.mozilla.org/mozilla-central/source/dom/base/test/TestPlainTextSerializer.cpp#78
thought that cutting through a CJK string is a good thing. Take a look, expecting to get cut after 36 characters.

This was implemented in bug 553540 to support the delsp=yes stuff from bug 26734, which never landed since the authors abandoned it.

Let's read up on RFC 3676 which *obsoletes* RFC 2646:
https://www.ietf.org/rfc/rfc3676.txt
4.1.  Interpreting Format=Flowed
 If the line is flowed and DelSp is "yes", the trailing space
 immediately prior to the line's CRLF is logically deleted.

Finally the whole thing is dawning on me: The authors of bug 26734 and bug 553540 tried to suppress the spaces caused by the serialiser chopping through CJK strings by specifying "format=flowed;delsp=yes". This as a *desperate* measure to remove the erroneous spaces which would only ever have worked in plaintext e-mail. But we also have problem of the extra spaces in HTML e-mail. That's why we need to fix it here. No intelligence in the words will be able to remove spaces that got inserted "randomly".
(Assignee)

Updated

2 years ago
Duplicate of this bug: 650206
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
(Assignee)

Comment 11

2 years ago
Created attachment 8690632 [details]
Result of serialisation with line breaking (current state)

Please look for spaces, you'll see three spurious spaces. Take a look at the HTML to see that they are caused by line breaking.
(Assignee)

Comment 12

2 years ago
Created attachment 8690633 [details]
Result of serialisation with line breaking (proposed new)

This was serialised without line breaking. No extra spaces.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
(Assignee)

Comment 13

2 years ago
Created attachment 8690636 [details] [diff] [review]
Proposed solution (v1)

(Hi Ehsan, how was your holiday? Are you taking reviews at the moment?)
I made this bug as streamlined as possible by obsoleting the conversation I had with a Japanese user.

The problem is that when we serialise DOM to text (e-mail body) we get extra spaces. Many have tried to deal with them, and all have failed, because they need to be removed at their source.

I know that serialisers are un-owned and un-maintained and a little broken here and there. This small tweak won't do any harm and we really need it badly.

I haven't done a try run. I can't imagine that this will break anything, since the new flag is obviously not used anywhere.

How can we proceed? I'd like to get this landed quickly, since I need to land the depending Thunderbird bugs. Due to this bug, TB has basically been broken for all CJK users for years.

I can offer to do a try run and I can offer to add a test for the new feature. There is already
  dom/base/test/unit/test_xmlserializer.js
where some test cases could easily be added.
Attachment #8690636 - Flags: review?(ehsan)
(Assignee)

Comment 14

2 years ago
For a try build refer to bug 653342 comment #177.
Comment on attachment 8690636 [details] [diff] [review]
Proposed solution (v1)

It's not immediately obvious to me where the bug comes from, and whether this is the right fix.  I think Jonathan and/or Masayuki should be able to evaluate what the issue here is and whether this is the right approach.

Note that whatever the underlying bug is, it may manifest itself in other places such as line-breaking preformatted text, or textarea's wrapping support, etc, so the bug may need to be fixed at a different level...
Attachment #8690636 - Flags: review?(masayuki)
Attachment #8690636 - Flags: review?(jfkthame)
Attachment #8690636 - Flags: review?(ehsan)
(Assignee)

Comment 16

2 years ago
Thanks for the quick reply. All I can say it that for obvious reasons we need the DOM converted to attachment 8690633 [details] and not attachment 8690632 [details].

While I was debugging it yesterday I ended up in
http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/nsJISx4051LineBreaker.cpp#801
JIS might stand for Japanese Industry Standard, I don't know.
However, in my test case, as you can see, I was using Korean characters, the character representing the syllable "an", which is the first syllable of "Hello"/"Good Day" in Korean. I'm not even aware that Korean can be broken anywhere you want, since the delimit their words by white-space, at least in contemporary writing, unlike the Chinese or Japanese. Be that as it may, for the purpose of converting DOM to e-mail body, we need to treat the German word
Hottentottenstottertrottelmutterattentäterlattengitterwetterkotterbeutelrattenfangprämie
and this Japanese line これは長い日本語のテキストですので、行が折れると思います。これは長い日本語のテキストですので、行が折れると思います。
the same and *must* not inject random spaces. It's quite unimportant how this is achieved ... as long as it happens before the next branch date ;-)
Comment hidden (off-topic)
(In reply to Jorg K (GMT+1) from comment #16)
> JIS might stand for Japanese Industry Standard, I don't know.

Right. E.g., some specs of ISO is redefined by JIS for internal usage in Japan.

> これは長い日本語のテキストですので、行が折れると思います。これは長い日本語のテキストですので、行が折れると思います。

Perhaps, you may know about Japanese email manners (still) in these days but I should explain about that:

1. Japanese language isn't a language whose words are separated by whitespaces or something. Only when new paragraphs cause line breaks at the end of previous paragraphs. Therefore, a lot of long lines always exist in Japanese emails.
2. Plaintext mails are used basically because a lot of Japanese people don't like HTML mails.
3. ISO-2022-JP is still used mainly only for emails. However, UTF-8 is already de facto standard also in Japan at creating new web contents.
4. Japanese people believe that long lines are automatically broken by MUA because most of them believe breaking each lines every 72 or similar characters is one of email manners.

We know Japanese email manners do not make sense especially in technical reasons. For example, it depends on devices of readers how much characters per line is good to read. But changing manners is much difficult...

So, anyway, I don't see complain about ISO-2022-JP emails. However, I also use UTF-8 for Japanese emails personally. In this case, the unexpected spaces are indeed a big problem. I guess I must not follow your comments in other bugs enough. Do you try to change ISO-2022-JP's email format too?


And ccing some other guys who are familiar with Japanese email manners.
Comment on attachment 8690636 [details] [diff] [review]
Proposed solution (v1)

>diff --git a/dom/base/nsIDocumentEncoder.idl b/dom/base/nsIDocumentEncoder.idl
>--- a/dom/base/nsIDocumentEncoder.idl
>+++ b/dom/base/nsIDocumentEncoder.idl
>@@ -236,16 +236,23 @@ interface nsIDocumentEncoder : nsISuppor
> 
>   /**
>    * Include ruby annotations and ruby parentheses in the output.
>    * PlainText output only.
>    */
>   const unsigned long OutputRubyAnnotation = (1 << 26);
> 
>   /**
>+   * Disallow breaking of long character strings. This is important
>+   * for serialising e-mail which contains CJK strings. These must

s/serialising/serializing ?

>+   * not be broken just as "normal" longs strings aren't briken.

s/briken/broken ??

>   // Only create a linebreaker if we will handle wrapping.
>-  if (MayWrap()) {
>+  if (MayWrap() && !(mFlags & nsIDocumentEncoder::OutputDisallowLineBreaking)) {

You should create a helper inline method which checks mFlags like MayWrap(). I saw a lot of simple mistakes checking flags directly, so, I really don't like this.

>diff --git a/dom/base/nsXMLContentSerializer.cpp b/dom/base/nsXMLContentSerializer.cpp
>--- a/dom/base/nsXMLContentSerializer.cpp
>+++ b/dom/base/nsXMLContentSerializer.cpp
>@@ -107,16 +107,18 @@ nsXMLContentSerializer::Init(uint32_t aF
>   }
> 
>   mDoRaw = !!(mFlags & nsIDocumentEncoder::OutputRaw);
> 
>   mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted && !mDoRaw);
> 
>   mDoWrap = (mFlags & nsIDocumentEncoder::OutputWrap && !mDoRaw);
> 
>+  mAllowLineBreaking = !(mFlags & nsIDocumentEncoder::OutputDisallowLineBreaking);

I wonder, do you really need new bool member here? It looks like that inline method which checks mFlags is enough.

>diff --git a/dom/base/nsXMLContentSerializer.h b/dom/base/nsXMLContentSerializer.h
>--- a/dom/base/nsXMLContentSerializer.h
>+++ b/dom/base/nsXMLContentSerializer.h
>@@ -350,16 +350,19 @@ class nsXMLContentSerializer : public ns
> 
>   // true = no formatting,(OutputRaw flag)
>   // no newline convertion and no rewrap long lines even if OutputWrap is set.
>   bool mDoRaw;
> 
>   // true = wrapping should be done (OutputWrap flag)
>   bool mDoWrap;
> 
>+  // true = we can break lines (OutputDisallowLineBreaking flag)
>+  bool mAllowLineBreaking;
>+
>   // number of maximum column in a line, in the wrap mode
>   uint32_t   mMaxColumn;
> 
>   // current indent value
>   nsString   mIndent;

off topic, though... these order doesn't think the memory alignment... (not your fault and not need to fix it)

Otherwise, looks good to me.
Attachment #8690636 - Flags: review?(masayuki) → review-
DelSp=Yes is correct way for plain text into mail body, if CJT and you don't want break between character.  Document encoder already has this option.
If about HTML formatter, thunderbird's compose editor will use format option instead of raw option.  If using raw option, it doesn't use line break.

Of course, if this is issue for HTML formatter instead of plain text formatter, this option useful for tb's compose editor.
(Assignee)

Comment 22

2 years ago
Masayuki and Makoto, thank you for your comments. Let me summarise:
Japanese users use long lines, they don't use (ASCII) white-space. They prefer plain text and use ISO-2022-JP a lot. Delsp=yes is the preferred method for ISO-2022-JP encoded mail so that CTE 7bit can be used.

There is one thing I don't understand:
> 4. Japanese people believe that long lines are automatically broken by MUA [mail client]
> because most of them believe breaking each lines every 72 or similar characters
> is one of email manners.

This I don't understand. There are very intricate "breaking" rules, see:
https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages
Another Japanese user told me in comment #7 that for example the word "Japan" shouldn't be broken.
So why would it be good to break every 72 characters, perhaps right in the middle of words.
Korean users most certainly would not appreciate that at all.

Now let me summarise my approach to get rid of the extra spaces:
1) In this bug here we fix the serialiser so that extra spaces are not created any more
   if an additional flag is set.
2) In bug 1225904 we fix the Thunderbird sending mechanism so that it can transmit
   lines of any length using base64 encoding when lines are long and cannot be broken.
   This will lead to ISO-2022-JP being transmitted as base64 in some cases.
   I know that RFC 1468 (from 1993!) says that CTE 7bit *should* be used.
   There is a sample message in attachment 8689687 [details] that has ISO-2022-JP and CTE base64
   and works perfectly well.
3) In bug 653342 we tie it altogether. We allow format=flowed for all character encodings
   and set the flag that won't break the lines any more from this bug here.

Please try the binary from bug 653342 comment #177.

Then one day we fix bug 26734 to implement delsp=yes, but only for ISO-2022-JP encoded messages. Sadly the patches proposed for this bug would have implemented delsp=yes for *all* flowed e-mail and that is not correct.

Personally I think whether ISO-2022-JP messages are using CTE 7bit or base64 makes no difference at all to the end user.

Masayuki, I will address the review comments in the next patch coming up soon. Thanks for the review!
(Assignee)

Comment 23

2 years ago
(In reply to Makoto Kato [:m_kato] from comment #21)
> If about HTML formatter, thunderbird's compose editor will use format option
> instead of raw option.  If using raw option, it doesn't use line break.
Yes, there are no spaces created with the "raw" option. But using "raw", the HTML is not "pretty-printed" any more. For plain text, the "raw" option loses a lot of information. See bug 653342 comment #170. I considered using "raw", but having another flag is the better solution.
(Assignee)

Comment 24

2 years ago
Created attachment 8692439 [details] [diff] [review]
Proposed solution (v1b)

Addressing review issues:
"to serialise" is British/Australian English (I am Australian).
"to serialize" is American English. I used that now.

I have a new inline method MayBreakLines() in class nsPlainTextSerializer.

This cannot be used in nsXMLContentSerializer so I maintained the bool flag mAllowLineBreaking. Or do you prefer another inline method?
Attachment #8690636 - Attachment is obsolete: true
Attachment #8690636 - Flags: review?(jfkthame)
Attachment #8692439 - Flags: review?(masayuki)

Comment 25

2 years ago
(In reply to Jorg K (GMT+1) from comment #22)
 ...

> There is one thing I don't understand:
> > 4. Japanese people believe that long lines are automatically broken by MUA [mail client]
> > because most of them believe breaking each lines every 72 or similar characters
> > is one of email manners.
> 
> This I don't understand. There are very intricate "breaking" rules, see:
> https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages
> Another Japanese user told me in comment #7 that for example the word
> "Japan" shouldn't be broken.

Jorg, " Another Japanese user told me in comment #7 that for example the word
> "Japan" shouldn't be broken." is from me. 

I have to stress that this particular requirement is at a more higher-level of formatting rules that must be implemented at the formating programs that
operate even at the higher level than TeX, TROFF, and other formating programs and meant for producing magazine layout, newspaper layout, etc.
I once had to code such a special routine so that the names of baseball players won't be broken in the middle when their names appear below scoreboard in TV sport news segment. 

But expecting such higher-level line breaking rules (not even mentioned in the
URL you quoted) often found only in the realm of publisher's in-house rules enforced only by meticulous editors is too much for TB. The particular characteristics I mentioned for line-breaking rules may not be even understood outside of (paper) publishing industry.

So let's ignore that particular mention of preferring not to break proper nouns at this moment. I don't believe more then 50% of TB users in Japan care about this subtle issue.

As long as TB passes the user input text data WITHOUT inserting whitespace, 
the text data can be fed into formating program to produce good layout.
Extra whitespace may not be removed by formating program since it cannot tell
whether the user intended an extra whitespace to be inserted. So that *IS* a problem.

> So why would it be good to break every 72 characters, perhaps right in the
> middle of words.
> Korean users most certainly would not appreciate that at all.

I think whether the user agrees with the forced breaking at 72 (the constant could be different and often settable by the user) is today moot.

In the old days, when an earlier RFC had to warn the readers, that there is
a chance that a long line may be chopped off by the transmission path somewhere due
to various limitations of mail gateways, and such,
this line breaking was accepted with a sigh or something.
I didn't like it very much and so set the right margin at a slightly larger place (but not exceeding large, say, 78) and many times I created the original
mail body using EMACS with my OWN short right margin around 74 or so, and 
basically overruled the line breaking position of my own to begin with.
I simply used EMACS's fill-paragpraph mode to fill the paragraph with
line breaking and only when I notice undesirable breaking like the product name
being broken over two lines, etc. I fixed them manually.
Since I set my mailer's right margin at larger value than my Emacs's setting
I know that the particular line breaking layout I had won't be disturbed
by the mailer. 

But again, this is MY OWN PARTICULAR experience, and I have a doubt that you can NOT satisfy everybody.

Just NOT inserting whitespace would be a better approach today.
Young users won't tolerate strange whitespace or forced line breaks.
(Assignee)

Comment 26

2 years ago
(In reply to ISHIKAWA, Chiaki from comment #25)
> As long as TB passes the user input text data WITHOUT inserting whitespace, 
> the text data can be fed into formating program to produce good layout.
> Extra whitespace may not be removed by formating program since it cannot tell
> whether the user intended an extra whitespace to be inserted. So that *IS* a
> problem.
I agree 100%. Thank you for your support.

> Just NOT inserting whitespace would be a better approach today.
> Young users won't tolerate strange whitespace or forced line breaks.
I agree 100%. In the future the e-mail will be received exactly as entered.

The problem is already solved, you can try a Windows build at
https://bugzilla.mozilla.org/show_bug.cgi?id=653342#c177

The three bugs involved, this one and bug 1225904 and bug 653342 all have patches proposed for review which will land on Thunderbird 45. Kent James is committed to fixing this problem once and for all.

The review of this bug here was positive (comment #19: Otherwise, looks good to me). I've addressed the issues so I'm positive that this will land soon.
If the old compatibility does not matter, why don't you just drop non-UTF-8 support entirely? (I already proposed this.)
(Assignee)

Comment 28

2 years ago
This is a different issue. Additional spaces are added regardless of the encoding.
Comment on attachment 8692439 [details] [diff] [review]
Proposed solution (v1b)

(In reply to Jorg K (GMT+1) from comment #24)
> Addressing review issues:
> "to serialise" is British/Australian English (I am Australian).
> "to serialize" is American English. I used that now.

Oh, I didn't know the difference, thank you for the information!

> This cannot be used in nsXMLContentSerializer so I maintained the bool flag
> mAllowLineBreaking. Or do you prefer another inline method?

I like another inline method better, but up to you because at least for now, the new bool doesn't increase the size of the instance because it's allocated in existing padding area.


And I realized that you don't change uuid of nsIDocumentEncoder. Please change it and request super review (to smaug?) because of changing nsIDocumentEncoder. With those changes, r=msasayuki.
Attachment #8692439 - Flags: review?(masayuki) → review+
(In reply to Jorg K (GMT+1) from comment #22)
> There is one thing I don't understand:
> > 4. Japanese people believe that long lines are automatically broken by MUA [mail client]
> > because most of them believe breaking each lines every 72 or similar characters
> > is one of email manners.
> 
> This I don't understand. There are very intricate "breaking" rules, see:
> https://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages
> Another Japanese user told me in comment #7 that for example the word
> "Japan" shouldn't be broken.
> So why would it be good to break every 72 characters, perhaps right in the
> middle of words.

The magic number may depend on people, though (as Ishikawa-san said). At breaking each line, of course, Kinsoku-Syori should be applied (nsJISx4051LineBreaker.cpp implements the rule with browser specific additional rules). So, for example, words of English shouldn't be broken middle of them. A word reaches and over a line limitation, the word should be put the head of next line.

> Korean users most certainly would not appreciate that at all.

As my understanding, Korean text should be treated as Western languages since each word is separated by ASCII whitespace. But I don't familiar with Korean text layout rules.

> Now let me summarise my approach to get rid of the extra spaces:
> 1) In this bug here we fix the serialiser so that extra spaces are not
> created any more
>    if an additional flag is set.

Sure.

> 2) In bug 1225904 we fix the Thunderbird sending mechanism so that it can
> transmit
>    lines of any length using base64 encoding when lines are long and cannot
> be broken.
>    This will lead to ISO-2022-JP being transmitted as base64 in some cases.
>    I know that RFC 1468 (from 1993!) says that CTE 7bit *should* be used.
>    There is a sample message in attachment 8689687 [details] that has
> ISO-2022-JP and CTE base64
>    and works perfectly well.

I'm not sure if this is right thing. But it should be discussed in the bug ;-)

> 3) In bug 653342 we tie it altogether. We allow format=flowed for all
> character encodings
>    and set the flag that won't break the lines any more from this bug here.
> 
> Please try the binary from bug 653342 comment #177.

Thank you, but the most important thing is interoperability between any MUAs. At least in Japan, I guess, about 10 or more MUAs are still being used (including web mails like Gmail). We cannot test it only by our hands. So, I strongly recommend that you should add a preference which can disable the new behavior for ability to disable the new behavior even after the release. If we've reported some trouble with some other MUAs, we should release new build which disables the new behavior with the pref.

> Then one day we fix bug 26734 to implement delsp=yes, but only for
> ISO-2022-JP encoded messages. Sadly the patches proposed for this bug would
> have implemented delsp=yes for *all* flowed e-mail and that is not correct.

I think that UTF-8 emails should be Japanese text layout aware.

> Personally I think whether ISO-2022-JP messages are using CTE 7bit or base64
> makes no difference at all to the end user.

If no MUAs didn't support encoded body for ISO-2022-JP, you're right. But I'm afraid that there are some MUAs not supporting our new behavior...

Comment 31

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #27)
> If the old compatibility does not matter, why don't you just drop non-UTF-8
> support entirely? (I already proposed this.)

(In reply to Jorg K (GMT+1) from comment #28)
> This is a different issue. Additional spaces are added regardless of the
> encoding.

I know this is a different issue, but I could not help chiming in.

Established businesses don't want to break status quo. They are VERY conservative.
And when it comes to conservative business practices, the banking industry comes to my mind.

Sure enough, when I checked e-mails I receive from banks and other financial institutions, they are encoded in iso-2022-jp:
Content-type: header contained the following.
text/plain; charset=iso-2022-jp
text/plain; charset=ISO-2022-JP
text/html; charset="ISO-2022-JP"

I did not realize the name are case-insensitive and we can quote the name in double quotes. Wait. Who is sending me HTML mail? It is very unlike the banking industry!

I am afraid that we have to live with non-UTF-8 mail and the "mess" (for lack of better words) for a long time...
The fact there are widely used different character code systems in a country is the source of practical problems. But there are already large user base (how large is relative) and, in some sectors, they can't be ignored.

Of course, dropping non-utf-8 support for *WRITING*, I presume, 
means TB has to forget about penetrating stable
business sector, which probably is not TB's strong field anyway (?)
[We have to receive and display non-utf-8 e-mails for obvious reasons. Imagine the 
15 years worth of e-mail archives which contain messages in many encodings.]

Sorry for different topic, but I could not resist. But given the relative lack of voices from the Japanese communities, I think sometimes an opinion from there, even skewed to one's particular experience, may be in order. 

TIA

Comment 32

2 years ago
Yes we're only talking about dropping support for *sending* non-utf-8 (gmail has also done that recently and seem to get away with it). Reading/responding to mails of other encoding would still work.
(Assignee)

Comment 33

2 years ago
Comment on attachment 8692439 [details] [diff] [review]
Proposed solution (v1b)

Thanks for the quick review.
> And I realized that you don't change uuid of nsIDocumentEncoder. Please
> change it and request super review (to smaug?) because of changing
> nsIDocumentEncoder. With those changes, r=msasayuki.
I was told in another bug that I shouldn't do the "./mach update-uuids" if something new gets added but nothing changes, which is the case here.

Olli and Ehsan: Who can "superreview" this?
Attachment #8692439 - Flags: superreview?(bugs)
(Assignee)

Comment 34

2 years ago
Comment on attachment 8692439 [details] [diff] [review]
Proposed solution (v1b)

Oops, Bugzilla didn't accept two people in the superreview field.
Attachment #8692439 - Flags: review?(ehsan)
This is off-topic, though, some Japanese people still need to send emails ISO-2022-JP if the receiver uses feature phone.

According to this article (written in Japanese),
https://www.marketingbank.jp/special/cat01/51.php
user share of feature phone is about 50% at March, 2015.

Although, we can guess that the number of users sending emails from Thunderbird to feature phone are not majority, but we cannot ignore them at least in Japan since the number of Thunderbird users in Japan is No.2 in the world. So, even if they were minority than other users, we shouldn't ignore them since the actual number of such users may not so few.
(In reply to Jorg K (GMT+1) from comment #33)
> Comment on attachment 8692439 [details] [diff] [review]
> Proposed solution (v1b)
> 
> Thanks for the quick review.
> > And I realized that you don't change uuid of nsIDocumentEncoder. Please
> > change it and request super review (to smaug?) because of changing
> > nsIDocumentEncoder. With those changes, r=msasayuki.
> I was told in another bug that I shouldn't do the "./mach update-uuids" if
> something new gets added but nothing changes, which is the case here.

You're adding a constant value. It seems that it's necessary to change the UUID but not sure.

> Olli and Ehsan: Who can "superreview" this?

Ehsan isn't a super reviewer at least according to this document...
https://www.mozilla.org/en-US/about/governance/policies/reviewers/
But he should be the best person to check the interface.
(Assignee)

Comment 37

2 years ago
This bug has become a discussion forum about general Japanese encoding issues when it's about adding an encoder flag. I hope the superreviewers won't get confused. ;-)

(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #30)
> So, I
> strongly recommend that you should add a preference which can disable the
> new behavior for ability to disable the new behavior even after the release.
> If we've reported some trouble with some other MUAs, we should release new
> build which disables the new behavior with the pref.
Yes, that's easily done in Thunderbird.
Here we want to land the fix that enables us to have the choice ... and fix it at least for Korean users.

(In reply to ISHIKAWA, Chiaki from comment #31)
> And when it comes to conservative business practices, the banking industry
> comes to my mind.
Yes, the banks are excellent at breaking the rules, see attachment 8687928 [details] from bug 1224426. There they stuff ISO-2022-JP into a subject header which *must* be RCF 2047 encoded.
(Assignee)

Comment 38

2 years ago
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #35)
> This is off-topic, though, some Japanese people still need to send emails
> ISO-2022-JP if the receiver uses feature phone.
Thanks for the hint. I personally didn't plan to remove ISO-2022-JP. The question is: Can this "feature phone" deal with ISO-2022-JP and CTE base64? Anyway, I will comment on bug 653342 and propose the option you asked for.
(In reply to Jorg K (GMT+1) from comment #38)
> (In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #35)
> > This is off-topic, though, some Japanese people still need to send emails
> > ISO-2022-JP if the receiver uses feature phone.
> Thanks for the hint. I personally didn't plan to remove ISO-2022-JP. The
> question is: Can this "feature phone" deal with ISO-2022-JP and CTE base64?

I'm not sure. Makoto-san, do you know this or can you test this?
Flags: needinfo?(m_kato)
I'm arguing about the encoding here because RFC 1468 is a big reason why we are using ISO-2022-JP for outgoing Japanese mail messages. If we use ISO-2022-JP, we should follow the entire RFC 1468 instead of picking some convenient parts of the RFC.
(Assignee)

Comment 41

2 years ago
Created attachment 8692965 [details] [diff] [review]
Test (v1)

Did I hear anyone say "test" ;-) - Here it is.
I don't know who wants to review this.
Attachment #8692965 - Flags: review?(ehsan)
Attachment #8692965 - Flags: review?(bugs)
(Assignee)

Comment 42

2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #40)
> I'm arguing about the encoding here because RFC 1468 is a big reason why we
> are using ISO-2022-JP for outgoing Japanese mail messages. If we use
> ISO-2022-JP, we should follow the entire RFC 1468 instead of picking some
> convenient parts of the RFC.
This discussion is in the wrong bug. This bug here is about not breaking long lines without any control. The discussion about how to best send ISO-2022-JP is happening in Thunderbird bug 653342.

Can we please stop any discussion about application/encoding issues so that the core reviewers don't get confused. The change proposed here, that is, having control about not braking the lines is *essential*. It is also needed for HTML serialisation.

Let's continue the discussion in the other bug 653342. In bug 653342 comment #185 I proposed a preference that will the allow the user to maintain the existing status quo: Cut the plaintext up and encode ISO-2022-JP with 7bit. That will honour the entire RFC 1468.
Comment on attachment 8692439 [details] [diff] [review]
Proposed solution (v1b)

Review of attachment 8692439 [details] [diff] [review]:
-----------------------------------------------------------------

superreviews aren't really a thing any more.  Masayuki's review is enough here.  But you should rev the uuid still.

(Also please don't ask review from more than one person when you don't actually want all of them to look at the patch, that's kind of spammy!)
Attachment #8692439 - Flags: superreview?(bugs)
Attachment #8692439 - Flags: review?(ehsan)
Comment on attachment 8692965 [details] [diff] [review]
Test (v1)

Review of attachment 8692965 [details] [diff] [review]:
-----------------------------------------------------------------

I think Masayuki should review this.
Attachment #8692965 - Flags: review?(ehsan)
(Assignee)

Updated

2 years ago
Attachment #8692965 - Flags: review?(bugs) → review?(masayuki)
(Assignee)

Comment 45

2 years ago
Created attachment 8693121 [details] [diff] [review]
Proposed solution (v1c).

Carrying forward Masayuki Nakano'r r+

Updated UUID.
Attachment #8692439 - Attachment is obsolete: true
Attachment #8693121 - Flags: review+
(Assignee)

Comment 46

2 years ago
Cancelling the NI for Makoto. In bug 653342 we decided to maintain the status quo for ISO-2022-JP. Further discussion in that bug, please.
Flags: needinfo?(m_kato)
Attachment #8692965 - Flags: review?(masayuki) → review+
(Assignee)

Comment 47

2 years ago
Dear Sheriff,

there are two patches here which can be applied in any order.

Note that the change is minor and the possibility for breakage is close to zero.

So please, to avoid unnecessary testing, combine this with some other suitable patches when landing.

Thanking you in advance,
Jörg.
Keywords: checkin-needed
(Assignee)

Comment 48

2 years ago
Dear Sheriff, please add "r=masayuki" to the Test patch. Thanks!

Comment 49

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e3af3ef46f0
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca8916bb80cf
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0e3af3ef46f0
https://hg.mozilla.org/mozilla-central/rev/ca8916bb80cf
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox45: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Depends on: 1274837
You need to log in before you can comment on or make changes to this bug.