Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward inline, without converting data to the charset)

RESOLVED FIXED in Thunderbird 47.0

Status

MailNews Core
Composition
RESOLVED FIXED
6 years ago
5 months ago

People

(Reporter: raal, Assigned: Jorg K (GMT+2))

Tracking

Trunk
Thunderbird 47.0
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments, 5 obsolete attachments)

(Reporter)

Description

6 years ago
Created attachment 586351 [details]
sample e-mail

User Agent: Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Build ID: 20111221202246

Steps to reproduce:

My settings:
Tested on Windows XP, TB 9;Ubuntu , TB 11.0a2
Preferences - Display - Formatting - Advanced 
Characters Encodings: incoming & outgoing mail - Central European (ISO-8859-2)

View  - Character encoding - Autodetect - Off


Steps to reproduce:
1- Open sample e-mail
2 - Forward 


Actual results:

Actual Results:  
Message is displayed in wrong encoding, see attachment.


Expected results:

Expected Results:  
Correct encoding.
(Reporter)

Comment 1

6 years ago
Created attachment 586352 [details]
printscreen, forwarded message with bad characters
(Reporter)

Updated

6 years ago
OS: Linux → All
Attachment #586351 - Attachment mime type: application/octet-stream → message/rfc822

Comment 2

6 years ago
Seems like TB choices wrong encoding when forwarding. I seen worst case scenarios with Cyrillic. Can you attach forwarded message too?
(Reporter)

Comment 3

6 years ago
Created attachment 586374 [details]
forwarded message

Comment 4

4 years ago
This is caused by attachments with different encoding. The testcase message has a body encoded in ISO-8859-2 and attachments encoded in UTF-8. I've confirmed with other cases with these combinations:

 1. * Message body: ISO-2022-JP
    * Attachment: plaintext file in Shift_JIS
 2. * Message body: ISO-2022-JP
    * Attachment: plaintext file in EUC-JP
 3. * Message body: ISO-2022-JP
    * Attachment: plaintext file in UTF-8 

The combination ISO-2022-JP message body and ISO-2022-JP attachment doesn't cause this problem.

And I've found out why this problem happens.

When Thunderbird generates a message body for forwarding, nsMsgCompose module guesses the character encoding of the original message from the cached information in the topmost window.

nsMsgCompose::CreateMessage()
http://hg.mozilla.org/comm-central/file/781c3bd778c5/mailnews/compose/src/nsMsgCompose.cpp#l1847
=> GetTopmostMsgWindowCharacterSet()
http://hg.mozilla.org/comm-central/file/781c3bd778c5/mailnews/compose/src/nsMsgCompose.cpp#l134

GetTopmostMsgWindowCharacterSet() reads the cached character encoding information via nsIMsgWindow#mailCharacterSet, however, it returns the character encoding of the last attachment, instead of original message body's one.

As the result, nsMsgCompose::CreateMessage() applies wrong character encoding to the forwarded message body and broken body appears for me.
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 5

4 years ago
Just a workaround, this addon solves this problem in most cases. However, of course it should be fixed by Thunderbird itself.
https://github.com/clear-code/correct-encoding
Duplicate of this bug: 716983
Component: Message Compose Window → Composition
Product: Thunderbird → MailNews Core
Hardware: x86 → All
Version: 11 → Trunk
Created attachment 8345019 [details] [diff] [review]
bug715823.patch

Fix for forwarding incline case.

This patch does solve the wrong encoding in case of incline forward. But I am not 100% sure this patch does not cause any regression. So please test the binary with this patch.

The binary can be downloaded at:
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/hiikezoe@mozilla-japan.org-85309fa29a41/
Created attachment 8345021 [details] [diff] [review]
An xpcshell test
For the record, try server results with the patches are:
https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=85309fa29a41

Updated

4 years ago
QA Contact: hiikezoe
bug 716983 looks closed as dup of this bug. Is bug 701818 dup too?
The patch I attached in comment 7 solves bug 701818 too. So we can close the bug as a duplicate one.
Duplicate of this bug: 701818
Assignee: nobody → hiikezoe
Status: NEW → ASSIGNED

Comment 13

4 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #7)
> This patch does solve the wrong encoding in case of incline forward. But I
> am not 100% sure this patch does not cause any regression. So please test
> the binary with this patch.

I've tested the build and confirmed this bug was away.

However, by the patch, Thunderbird seems to ignore user-chosen character encoding for forwarded messages. If the original message has wrong encoding information in its Content-Type header, or if auto-detection is failed, you will have to choose different encoding from the "View" menu. Then the forwarded message should inherit the user-chosen encoding. After this patch, this behavior is also removed...

Comment 14

4 years ago
Created attachment 8348527 [details]
testcase with ISO-2022-JP body and Shift_JIS attachment

Steps to reproduce the regression:

1. Exit Thunderbird.
2. Put the file "test-folder" under your local folder account, for example:
   "%AppData%\Thunderbird\Profiles\xxx.default\Mail\Local Folders"
3. Start Thunderbird. Then a new "test-folder" folder appears under
   your local folder account.
4. Disable auto-detection.
   View => Character Encoding => Auto-Detect => (Off)
5. Choose the "test-folder" folder.
6. The folder contains just a message, so choose the message.
7. The message is loaded with wrong encoding. (ex. "Western (Windows-1252)")
8. Choose the character encoding "ISO-2022-JP" via:
   View => Character Encoding => More Encodings =>  East Asian
     => Japanese (ISO-2022-JP)
   Then the message body is shown correctly.
9. Forward the message as "Inline".

Actual result on the test build:
The forwarded body is broken like the step 7, it is shown with wrong encoding.

Expected result (and the actual result on the currently released Thunderbird):
The forwarded body is shown correctly, as an ISO-2022-JP encoded text.
FYI.
Overriding wrong charset in mail is done by;
- Per folder "Default Character Encoding" with "[x] Aplly default to all messages..."
  (developer is trying to change name to "Override Character Encoding")
- Per mail/Ad-hoc View/Character Encoding
Can we get reviews going ?
(In reply to YUKI "Piro" Hiroshi from comment #14)
> Created attachment 8348527 [details]
> testcase with ISO-2022-JP body and Shift_JIS attachment
> 
> Steps to reproduce the regression:
> 
> 1. Exit Thunderbird.
> 2. Put the file "test-folder" under your local folder account, for example:
>    "%AppData%\Thunderbird\Profiles\xxx.default\Mail\Local Folders"
> 3. Start Thunderbird. Then a new "test-folder" folder appears under
>    your local folder account.
> 4. Disable auto-detection.
>    View => Character Encoding => Auto-Detect => (Off)
> 5. Choose the "test-folder" folder.
> 6. The folder contains just a message, so choose the message.
> 7. The message is loaded with wrong encoding. (ex. "Western (Windows-1252)")
> 8. Choose the character encoding "ISO-2022-JP" via:
>    View => Character Encoding => More Encodings =>  East Asian
>      => Japanese (ISO-2022-JP)
>    Then the message body is shown correctly.
> 9. Forward the message as "Inline".
> 
> Actual result on the test build:
> The forwarded body is broken like the step 7, it is shown with wrong
> encoding.
> 
> Expected result (and the actual result on the currently released
> Thunderbird):
> The forwarded body is shown correctly, as an ISO-2022-JP encoded text.

Thanks for the testing and investigations.
Then, we need a new attribute in nsIMsgWindow.idl to preserve the charset of original message body or to represent a chosen charset by user.

When I started to fix this issue, I tried to add such new attribute but I cloud not have a good name for it because there is already "mailCharacterSet", and "mailCharacterSet" should be preserved for addon compatibility.  So I decided to take another solution without the new attribute in the patch.

Anyway, as far as I understand, the new attribute is necessary to fix this issue now. "charset" or "characterSet" are very confusable to represent the original charset. I'd propose "forceMailCharacterSet" to represent the chosen charset by user. Piro, what do you think?
Flags: needinfo?(yuki)
(In reply to Ludovic Hirlimann [:Usul] (away dadying) from comment #16)
> Can we get reviews going ?

Not yet. The patch should be revised.

Comment 19

4 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #17)
> Anyway, as far as I understand, the new attribute is necessary to fix this
> issue now. "charset" or "characterSet" are very confusable to represent the
> original charset. I'd propose "forceMailCharacterSet" to represent the
> chosen charset by user. Piro, what do you think?

Basically I think that "mailCharacterSet" should return the character encoding of the body, not the last part. The attribute was introduced by the bug 28869 (http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsIMsgWindow.idl&branch=1.27&root=/cvsroot&subdir=mozilla/mailnews/base/public&command=DIFF_FRAMESET&rev1=1.12&rev2=1.13) and I couldn't find the reason why it should return the character encoding of the last part, not the body. The main reason is written as:

> When libmime decide a main body charset to use, the charset name to be feed 
> backed to the user by putting a mark on the charset menu item.

The menu cannot indicate all of used charsets for mixed charset messages - the UI seems to be designed just for messages with single charset. Because we can indicate only one charset for a message, we have to decide that what is the point of the "Character Encoding" menu. And, I think the menu should reflect the state of the body, not attachments. The current behavior of the mailCharacterSet attribute seems wrong for me.

However, we should not modify existing behavior. Yes, it is true. So, my another plan is: a new attribute "mailBodyCharacterSet" of the interface nsIMsgWindow. It sounds better than "forceMailCharacterSet", for me.
Flags: needinfo?(yuki)
(In reply to YUKI "Piro" Hiroshi from comment #19)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #17)
> > Anyway, as far as I understand, the new attribute is necessary to fix this
> > issue now. "charset" or "characterSet" are very confusable to represent the
> > original charset. I'd propose "forceMailCharacterSet" to represent the
> > chosen charset by user. Piro, what do you think?
> 
> Basically I think that "mailCharacterSet" should return the character
> encoding of the body, not the last part.

Yes, indeed.

> (http://bonsai.mozilla.org/cvsview2.
> cgi?diff_mode=context&whitespace_mode=show&file=nsIMsgWindow.idl&branch=1.
> 27&root=/cvsroot&subdir=mozilla/mailnews/base/
> public&command=DIFF_FRAMESET&rev1=1.12&rev2=1.13) and I couldn't find the
> reason why it should return the character encoding of the last part, not the
> body. 

IIRC, it is used to view source multipart message. And as you know, I am afraid of breaking addons.

> > When libmime decide a main body charset to use, the charset name to be feed 
> > backed to the user by putting a mark on the charset menu item.
> 
> The menu cannot indicate all of used charsets for mixed charset messages -
> the UI seems to be designed just for messages with single charset. Because
> we can indicate only one charset for a message, we have to decide that what
> is the point of the "Character Encoding" menu. And, I think the menu should
> reflect the state of the body, not attachments. The current behavior of the
> mailCharacterSet attribute seems wrong for me.
> 
> However, we should not modify existing behavior. Yes, it is true. So, my
> another plan is: a new attribute "mailBodyCharacterSet" of the interface
> nsIMsgWindow. It sounds better than "forceMailCharacterSet", for me.

Thanks for the better name! The name is really really better one!
I had tried a build with revised patch sets on try server before:

https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=9fe6a3a486ea

but unfortunately there is a bug that try server build for Windows fails with mozilla-central patches.
See bug 953226.
Depends on: 953226
Summary: Forward message, wrong encoding → Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward/edit as new, without converting data to the charset)
As Callek says, I don't think there's anything for releng to do here.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → INCOMPLETE
Oops, meant to close the releng dep.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Duplicate of this bug: 1075436

Comment 25

3 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #21)
> I had tried a build with revised patch sets on try server before:
> 
> https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=9fe6a3a486ea
> 
> but unfortunately there is a bug that try server build for Windows fails
> with mozilla-central patches.
> See bug 953226.

Thank you for working on this.

Has the latest patch been tested in working binary?
Is the patch attached in this entry the latest?
(Oh, I see, I think I need to use the following patch?
https://hg.mozilla.org/try-comm-central/rev/ba72c8bbdb86
)

BTW, in the duped bug 1075436, I uploaded the screen shots
which may be useful to explain the situation in
case 1 in comment 4 to non-Japanese users.

> 1. * Message body: ISO-2022-JP
>    * Attachment: plaintext file in Shift_JIS

https://bugzilla.mozilla.org/attachment.cgi?id=8498194
SCREEN: The problematic message received. Visible OK, but shown encoding (Shift_JIS) does not match the main body's (ISO-2022-JP). 

https://bugzilla.mozilla.org/attachment.cgi?id=8498198
SCREEN: When I tried to forward the message, the mail body text is garbled.

and the message

https://bugzilla.mozilla.org/attachment.cgi?id=8499388
message that triggers garbled mail body text when forwarded. ISO-2022-JP encoding, the attachment is Shift_JIS plain file. 

TIA
(In reply to ISHIKAWA, Chiaki from comment #25)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #21)
> > I had tried a build with revised patch sets on try server before:
> > 
> > https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=9fe6a3a486ea
> > 
> > but unfortunately there is a bug that try server build for Windows fails
> > with mozilla-central patches.
> > See bug 953226.
> 
> Thank you for working on this.
> 
> Has the latest patch been tested in working binary?
> Is the patch attached in this entry the latest?
> (Oh, I see, I think I need to use the following patch?
> https://hg.mozilla.org/try-comm-central/rev/ba72c8bbdb86
> )

And this:
https://hg.mozilla.org/try-comm-central/rev/294dd9f2cbf8

I am not sure those patches can be applied to the current trunk cleanly, but apparently the xpcshell test should be rewritten with Promise.
Created attachment 8500254 [details] [diff] [review]
Adapt to the latest  trunk
Attachment #8345019 - Attachment is obsolete: true
Created attachment 8500255 [details] [diff] [review]
The xpcshell test rewritten with Promise
Attachment #8345021 - Attachment is obsolete: true
Please test the binary with the above patches.

http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/hiikezoe@mozilla-japan.org-f7fb7b284413/

Try run:
https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm-central&revision=f7fb7b284413

Comment 30

3 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #29)
> Please test the binary with the above patches.
> 
> http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/
> hiikezoe@mozilla-japan.org-f7fb7b284413/
> 
> Try run:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm-
> central&revision=f7fb7b284413

Hi, on my local build and local test
the following test succeeded:

TEST-PASS | /REF-OBJ-DIR/objdir-tb3/_tests/xpcshell/mailnews/compose/test/unit/test_detectAttachmentCharset.js | test passed (time: 1002.004ms)

Also, sending an e-mail with main body UTF-8 and attaching a Japanese plain text file
encoded in Shift-JIS resulted in an e-mail
 - that did not garble Japanese characters in mail body text inside the
   compose window when it is to be forwarded, but
 - if the attachment (Japanese plain text file in Shift-JIS encoding) is shown in line,
   then the display of the attachment part IS GARBLED UNFORTUNATELY.

I am now trying to figure out how to set outcoing character set to be ISO-2022-JP to
recreate the original questions in duped bug of Bug 1075436
Is the interface to do this gone?

TIA

Comment 31

3 years ago
(In reply to ISHIKAWA, Chiaki from comment #30)
> I am now trying to figure out how to set outcoing character set to be
> ISO-2022-JP 

Compose window | Options | Character encoding
(In reply to ISHIKAWA, Chiaki from comment #30)
>  - if the attachment (Japanese plain text file in Shift-JIS encoding) is
> shown in line,
>    then the display of the attachment part IS GARBLED UNFORTUNATELY.

In the composition window?
I am guessing it's the message pane. In case of the message pane attachment 8500254 [details] [diff] [review] tries to use the charset of message body instead of attachment charset. (if I am not wrong)
Given that the charset of message body is ISO-2022-JP and attachment file is Shift-JIS, which charset should be used by default?

Comment 33

3 years ago
Created attachment 8502270 [details]
Mail sent (and received) by C-C TB: mail body (ISO-2022-JP), and attachment is Shift_JISIS

Hi,
thank you for the experimental patches.

To be clear, what data is being produced by the patched
C-C TB, I am uploading an e-mail
with ISO-2022-JP main body, and a plain text Japanese attachment in Shift_JIS encoding.

Note that 
 - for some reason, the subject line is encoded in UTF-8 (!)
   This is different from the current released binary, I think.
   See that attachment that shows ISO-2022-JP encoding in subject line:
   https://bugzilla.mozilla.org/attachment.cgi?id=8499388
   [ Created from the currently released windows version of TB ]
   from
   Bug 1075436 - Trying to forward a Japanese e-mail (ISO-2022-JP encoding for mail main body) with certain attachment files cause a garbled main text in mail composite window 

So here is what happens when I send a Japanese mail (in ISO-2022-JP charset encoding) with Japanese subject
line and attaching a plain Japanese text file (encoded in Shift_JIS).

The received e-mail has the MIME information as below:
-------+------------------+-------------------+---------------  
       | Main Text        |  Attachment       |   Subject
       | charset          |  charset          |  charset 
       | Specified as     |  Specified        |  encoding
-------+------------------+-------------------+---------------
Current| ISO-2022-JP      |  SJIS             |  ISO-2022-JP
Release|
-------+------------------+-------------------+---------------
Patched| ISO-2022-JP      |  None             |  UTF-8
C-C    |                  |                   |
-------+------------------+-------------------+---------------

That said,
yes, the garbling of attachment shown as "inline" occurs in the
message window where the received message is displayed.
(In composition window, the attachment seems to be always shown as a filename
in the attachment file window, and so the content is not shown.)

This garbling occurs when the main mail message is encoded in UTF-8 and in 
ISO-2022-JP (both cases).

I think the garbling occurs because the lack of
charset specification on the side of the sender (patched TB).

I don't know if there is a clear solution here.

I don't understand the nature of the patch, but
it tried to fix the problem of using incorrect decoding of
ISO-2022-JP mail text using the incorrect Shift_JIS decoder setup (that seems to be left by the charset specification of an attachment file that 
came with Shift_JIS charset specificatin.

Is it possible to modify TB to
use the incoming charset specification (done by the sender) to
render the main text,
AND also uses the charset specification for attachment
[which was lacking in the patched C-C TB, but was Shift_JIS in the current release),
AND still let the main text to be correctly rendered
by making sure that the decoder is properly initialized
for each part of multi-part mime data?
If it is possible, by keeping the sender-side Shift_JIS encoding info
for the attachment, the receiver side can probably show the Shift_JIS attachment
intact.

Although, I am talking of Shift_JIS vs ISO-2022-JP, etc.,
I think the same issue of garbled inlined attachment display would occur
as in the original reporter's case.

(Due to the previous issues experienced over the years
with inline attachment display, I have disabled the inline display feature, but
I noticed that some colleagues of mine uses Mac and its mailer
handles inline attachment display rather well, and so some people with
good experience of seeing inline attachment "just work" may complain
if the currently proposed  patch has this issue of garbled display of attachment.

Well, we may need more feedback from the user, but I think programming-wise,
making sure that the decoder is initialized properly 
for each part of multi-part is very important
since the sender is NOT TB alone, and there may be other mailers
which send attachment with charset encoding.
(I don't know the behavior of every mailer in the wild, though.
At least
Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
does, and there are people out there who don't upgrade their programs for extended amount of time.)

If TB can't handle the charset information right and leave a decoder for
the main mail text in an incorrect state,
we will still have this garbled mail message in composition window
when the user tries to forward such e-mail. Correct?

TIA

PS: That the subject is encoded in UTF-8 even though I would like
the e-mail to go out in ISO-2022-JP is a little surprise.
I am not sure you may know or not, the current trunk Thunderbird does not detect the charset of attachment files if intl.charset.detector is not set. See bug 844115 and bug 1074585.

Anyway if you just sent a message with attachments, not forwarding, we should open a new bug for it.

And I'd suggest you should test against the current trunk without patches for comparison.

Comment 35

3 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #34)
> I am not sure you may know or not, the current trunk Thunderbird does not
> detect the charset of attachment files if intl.charset.detector is not set.
> See bug 844115 and bug 1074585.
> 
> Anyway if you just sent a message with attachments, not forwarding, we
> should open a new bug for it.
> 
> And I'd suggest you should test against the current trunk without patches
> for comparison.

1. No, I did not know the bug 844115 and bug 1074585 ("did not understand" it was more lke it).

>Anyway if you just sent a message with attachments, not forwarding, we should open a new bug for it.
I am not sure what "it" refers to.
The garbling of inline attachment (in Shift_JIS) display after I send it myself with the mail body
in ISO-2022-JP and receive it (both in TB)?

>And I'd suggest you should test against the current trunk without patches for comparison.
Will do.

TIA

Comment 36

3 years ago
I insert "no. 2", and "no. 3" below to make it clear which part of the
discussion is touched.
(In reply to ISHIKAWA, Chiaki from comment #35)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #34)

No.1 :
> > I am not sure you may know or not, the current trunk Thunderbird does not
> > detect the charset of attachment files if intl.charset.detector is not set.
> > See bug 844115 and bug 1074585.
> > 

No. 2:
> > Anyway if you just sent a message with attachments, not forwarding, we
> > should open a new bug for it.
> > 

No. 3:
> > And I'd suggest you should test against the current trunk without patches
> > for comparison.
> 
> 1. No, I did not know the bug 844115 and bug 1074585 ("did not understand"
> it was more lke it).
> 

No. 2:
> >Anyway if you just sent a message with attachments, not forwarding, we should open a new bug for it.
> I am not sure what "it" refers to.
> The garbling of inline attachment (in Shift_JIS) display after I send it
> myself with the mail body
> in ISO-2022-JP and receive it (both in TB)?
> 

No. 3:
> >And I'd suggest you should test against the current trunk without patches for comparison.
> Will do.
> 
> TIA

Re the third point, I did, and I got a very confusing result.
Unpatched C-C, that is without the patch proposed here, produced a binary of TB, as of the version fetched in the last couple of days,today,
produced an exactly the same type of e-mail message:
To recap,

The received e-mail has the MIME information as below:
--------+------------------+-------------------+---------------  
        | Main Text        |  Attachment       |   Subject
        | charset          |  charset          |  charset 
        | Specified as     |  Specified        |  encoding
--------+------------------+-------------------+---------------
Current | ISO-2022-JP      |  SJIS             |  ISO-2022-JP
Release |
--------+------------------+-------------------+---------------
Patched | ISO-2022-JP      |  None             |  UTF-8
C-C     |                  |                   |
patched with the patches proposed here.
--------+------------------+-------------------+---------------
Unpachted| ISO-2002-JP      |  None             | UTF-8
C-C     |                  |                   |
--------+------------------+-------------------+---------------

I had to re-check if this was correct and unless I made a screw-up somewhere,
the above is what I obtained.

So the message sent out thusly from TB, when received, and when
the attachment is tried to viewed inline, the attachment part is garbled (I think
lacking the charset information, TB tries to decode the attachment as ISO-2022-JP:
maybe TB can be instructed to try detection, but that seems to be gone, right?.
Oh, OK, if we set some preferendces, it will be correctly visible. I should re-try.)
What is your preference value of intl.charset.detector?
You need to set it to 'ja_parallel_state_machine'.
Even if you set the value surely, attachment charset was not detected, please open a new bug, we have to fix it.

Comment 38

3 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #37)
> What is your preference value of intl.charset.detector?

It was default: empty.

> You need to set it to 'ja_parallel_state_machine'.

I did.

> Even if you set the value surely, attachment charset was not detected,
> please open a new bug, we have to fix it.

Now, with the setting of the above variable, to ja_parallel_state_machine,
TB can now correctly render the content of 
of Shift_JIS plain Japanese text when I toggle the
view attachment inline to ON.
Thank you for the info.

However, I wonder why this is not set to ja_parallel_state_machine
by default. 

In the next round of release,
Mozilla community may have unnecessary load of answering/dealing with
Japanese users who face garbled text display of mismatched 
decoding when some plain text attachment came (and its encoding is not 
the same as the main mail text body.)

TIA
(In reply to ISHIKAWA, Chiaki from comment #38)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #37)
> > What is your preference value of intl.charset.detector?
> 
> It was default: empty.
> 
> > You need to set it to 'ja_parallel_state_machine'.
> 
> I did.
> 
> > Even if you set the value surely, attachment charset was not detected,
> > please open a new bug, we have to fix it.
> 
> Now, with the setting of the above variable, to ja_parallel_state_machine,
> TB can now correctly render the content of 
> of Shift_JIS plain Japanese text when I toggle the
> view attachment inline to ON.
> Thank you for the info.
> 
> However, I wonder why this is not set to ja_parallel_state_machine
> by default. 

Because the value is not only for Japanese.

The value is set in localization.
http://hg.mozilla.org/l10n-central/ja/file/6ce31c3a1c68/toolkit/chrome/global/intl.properties#l49

Comment 40

3 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #39)
> (In reply to ISHIKAWA, Chiaki from comment #38)
 [...]
> > 
> > However, I wonder why this is not set to ja_parallel_state_machine
> > by default. 
> 
> Because the value is not only for Japanese.
> 
> The value is set in localization.
> http://hg.mozilla.org/l10n-central/ja/file/6ce31c3a1c68/toolkit/chrome/
> global/intl.properties#l49

Thank you for the pointer.


    43 # LOCALIZATION NOTE (intl.charset.detector):
    44 # This preference controls the initial setting for the character encoding
    45 # detector. Valid values are ja_parallel_state_machine for Japanese, ruprob
    46 # for Russian and ukprob for Ukrainian and the empty string to turn detection
    47 # off. The value must be empty for locales other than Japanese, Russian and
    48 # Ukrainian.
    49 intl.charset.detector			= ja_parallel_state_machine

So it is Japanese, Russian and Ukrainian.

Mozilla may have to think about the
support cost (questions caused by possible garbled inline attachments)
and want to either
- No. 1: write something about setting intl.charset.detector to one of values
  to avoid the garbled inline display of attachment text files
  when one's e-mail has large number of Japanese, Russian and Ukrainian content.

  or (and?)
- (hmm), or maybe rewrite the code to set this variable automatically to
  one of the proper values if the default display character set for incoming e-mail is
  set to one of the charaset used for Japanese, Russian and Ukrainian content.
  But this may be too fragile setup.

Anyway, once the next version is out, I think there will be many questions from people who suddenly see garbled inline attachment display every so often, so
release note should contain at least the caution in No. 1 item above.

TIA

  

I wonder
Please open a new bug about removal of universal charset detector. I know it will be a problem for some of users.
but it's totally unrelated this issue.
Thank you,

Comment 42

3 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #41)
> Please open a new bug about removal of universal charset detector. I know it
> will be a problem for some of users.
> but it's totally unrelated this issue.
> Thank you,

I will. Thank you for the suggestion.

Typical Japanese, Russian and Ukrainian users will never understand that
the "removal of universal charaset detector" in C-C TB means that
they may get garbled inline display of attachment plain text files in the future.

TIA

Comment 43

3 years ago
Filed
Bug 1081693 - (Call for Better Documentation) Please mention that Japanese, Russian and Ukrainian users of TB should set intl.charset.detector by hand in the next release document.

Comment 44

2 years ago
Comment on attachment 8500254 [details] [diff] [review]
Adapt to the latest  trunk

What do you think of this guys?
Attachment #8500254 - Flags: review?(mkmelin+mozilla)
Attachment #8500254 - Flags: review?(Pidgeot18)

Comment 45

2 years ago
I think this should go in to help those who
need to handle occasional multi-language attachments (after adaption to the latest source files.)
This solved my issues with proper setting of int.charset.detector in bug 1081693.

TOA

Comment 46

2 years ago
> TOA
Itchy finger did this.

TIA

Comment 47

2 years ago
Comment on attachment 8500254 [details] [diff] [review]
Adapt to the latest  trunk

Review of attachment 8500254 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/public/nsIMsgWindow.idl
@@ +68,1 @@
>    attribute ACString mailCharacterSet;

nit: star on each documentation line.

documentation is good, but if that's what it does, does anybody really want such a beast? (and if they do, rename it to something reasonable)
Comment on attachment 8500254 [details] [diff] [review]
Adapt to the latest  trunk

Review of attachment 8500254 [details] [diff] [review]:
-----------------------------------------------------------------

It seems clear, judging from a DXR search, that nearly every use of mailCharacterSet was replaced with the newly-added mailBodyCharacterSet. A few were missing, and in reading the history of this bug, it doesn't seem justified why these should be. The bug appears to be that the mail charset is replaced with the attachment charset if there is a textual attachment--which suggests that the bug is in the setting of this field from attachments anyways. Replacing the attribute but keeping the old one feels like a mistake, or at least something that requires particular justification for such an action, justification that is as of yet forthcoming.

So either fix mailCharacterSet to only consider body parts instead of attachment parts in its computation, or justify why its current behavior is needed by at least some clients.
Attachment #8500254 - Flags: review?(mkmelin+mozilla)
Attachment #8500254 - Flags: review?(Pidgeot18)
Attachment #8500254 - Flags: review-

Comment 49

2 years ago
(In reply to Joshua Cranmer [:jcranmer] from comment #48)
> Comment on attachment 8500254 [details] [diff] [review]
> Adapt to the latest  trunk
> 
> Review of attachment 8500254 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It seems clear, judging from a DXR search, that nearly every use of
> mailCharacterSet was replaced with the newly-added mailBodyCharacterSet. A
> few were missing, and in reading the history of this bug, it doesn't seem
> justified why these should be. The bug appears to be that the mail charset
> is replaced with the attachment charset if there is a textual
> attachment--which suggests that the bug is in the setting of this field from
> attachments anyways. Replacing the attribute but keeping the old one feels
> like a mistake, or at least something that requires particular justification
> for such an action, justification that is as of yet forthcoming.
> 
> So either fix mailCharacterSet to only consider body parts instead of
> attachment parts in its computation, or justify why its current behavior is
> needed by at least some clients.

Sorry, I am not a native English user, and maybe I missed something.

But is is the case that 
the above comment suggests that aceman's patch may have left a few 
use of mailCharacterSet incorrectly instead of using mailBodyCharacterSet.
If so, I hope the patch can be made more complete to fix some remaining problematic areas to handle the issue.

I have hit upon the mangled character set issue when I tried to forward some e-mails recently, and realized that this bug is still there :-)
I hope this bug will be eradicated soon.

Comment 50

2 years ago
Created attachment 8625732 [details] [diff] [review]
Possible fix

When opening a multipart message with different encoding, nsMsgWindow::SetMailCharacterSet is called for each part contening a charset.
Function MimeMultipart_notify_emitter (file: mailnews/mime/src/mimemult.cpp) call SetMailCharacterSetToMsgWindow
for each body part that contain a charset:

  if (ct && (obj->options->notify_nested_bodies ||
             MimeObjectIsMessageBody(obj))) {
    char *cset = MimeHeaders_get_parameter(ct, "charset", NULL, NULL);

MimeObjectIsMessageBody should return true only for body parts. This is not the case.

In function MimeObjectIsMessageBodyNoClimb the is_body variable is never set to true. 
Initial value is set to false. So the value never change.

The patch 'patch_715823.txt' set the initial value to true.

With applying the patch, SetMailCharacterSet is only called for body parts.
Transfering the sample email seem to be good (see sample_email_patch_715823.png).

Comment 51

2 years ago
Created attachment 8625733 [details]
sample_email_patch_715823.png

Comment 52

2 years ago
https://developer.mozilla.org/en/docs/Getting_your_patch_in_the_tree#Getting_the_patch_reviewed

Have you tested that the xpcshell test works? https://bugzilla.mozilla.org/attachment.cgi?id=8500255 (And if so, please include it in your patch.)

Comment 53

2 years ago
(In reply to Magnus Melin from comment #52)
[...]
> Have you tested that the xpcshell test works?
> https://bugzilla.mozilla.org/attachment.cgi?id=8500255 (And if so, please
> include it in your patch.)

Test fail:
0:02.21 TEST_STATUS: Thread-1 check_charset FAIL [check_charset : 95] "ISO-2022-JP" == "Shift_JIS"

But with setting:
charsetOverride: false,
=> test ok
0:01.99 TEST_STATUS: Thread-1 check_charset PASS [check_charset : 96] "ISO-2022-JP" == "ISO-2022-JP"

The value of charsetOverride is it exact for the test?

Comment 54

2 years ago
Does bug 597369 sound similar? I found that disabling preview pane F8 helps as workaround (you have to restart TB before see actual result)

Comment 55

2 years ago
(In reply to Nikolay Shopik from comment #54)
> Does bug 597369 sound similar? I found that disabling preview pane F8 helps
> as workaround (you have to restart TB before see actual result)

That bug seems to be fixed (if the latest posting is correct) by a change in Java code.

Here, it seems the problem lies in the c++ code somewhere according to the discussion posts above.

So there may be a superficial similarity, but the root cause, i.e., why the incorrect code set is chosen seems very different.

Updated

2 years ago
Duplicate of this bug: 1188792

Comment 57

2 years ago
(In reply to Nikolay Shopik from comment #54)
> Does bug 597369 sound similar? I found that disabling preview pane F8 helps
> as workaround (you have to restart TB before see actual result)

Thanks for the workaround!
(Assignee)

Comment 58

a year ago
Magnus, can you explain what the issue is here.

I imported the message from attachment 8502270 [details] into today's Daily (which I'm using in production).

In the preview, the body looks good (UTF-8) and the text file viewed inline looks garbled. That's understandable. When I open the attachment, it's displayed fine in Notepad++ on Windows.

Now I press forward. The message content looks fine, still in UTF-8, the attachment is present and can be opened.

I also tried attachment 8348527 [details]. This test case is faulty, it's missing charset=ISO-2022-JP in the content type. After correcting that, it works fine.

The summary says:
Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward/edit as new, without converting data to the charset)

I have tested two cases and can't reproduce the problem. Where is it?
Flags: needinfo?(mkmelin+mozilla)
(Assignee)

Comment 59

a year ago
Created attachment 8737925 [details]
ISO-2022-JP body and Shift_JIS attachment (corrected attachment 8348527 [details])

I've corrected the test case from attachment 8348527 [details].
(Assignee)

Updated

a year ago
Attachment #8625732 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8500254 - Attachment is obsolete: true
(Assignee)

Comment 60

a year ago
I can reproduce the problem with attachment 8348527 [details] in TB 38.
I cannot reproduce the problem in a TB 46 (Earlybird back then) compiled 2016-03-05.

This was most likely fixed by either bug 1239658 or bug 597369. I've backed out the fix to bug 1251783 locally, that didn't make a difference.

I am closing this bug with target milestone TB 47 since this is where the other bugs landed. All have been uplifted to TB 46 and TB 45. I am also assigning myself, since I fixed the aforementioned bugs.

Whoever wants this bug reopened must present a reproducible case for TB 45 or later.
Assignee: hiikezoe → mozilla
Status: REOPENED → RESOLVED
Last Resolved: 3 years agoa year ago
Flags: needinfo?(mkmelin+mozilla)
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 47.0
(Assignee)

Updated

a year ago
Attachment #8500255 - Attachment is obsolete: true
(Assignee)

Updated

6 months ago
Attachment #8502270 - Attachment mime type: message/rfc822 → text/plain
(Assignee)

Comment 61

6 months ago
I looked at attachment 8737925 [details] again. The message body is in ISO-2022-JP, the attachment says windows-1252.

Forwarding maintains ISO-2022-JP, reply and "edit as new" switches to windows-1252. However, the message content isn't garbled.

To be continued in bug 1026989.
Blocks: 597821
(Assignee)

Comment 62

6 months ago
I've looked at this yet again. The message in attachment 8737925 [details] forwards garbled with TB 38 and it forwards OK in TB 45. So something did get fixed and I closed the bug. That there are still aspects that don't work will be fixed in bug 1026989.
(In reply to Jorg K (GMT+1) from comment #62)
> it forwards OK in TB 45. 

But "pick up charset from last attachment" phenomenon is still observed when "Forward As/Attachment" in Thunderd 45.5.1 on Windows.

I believe that main issue in all reports of bug 597821, bug 604628, bug 618465, bug 633217, bug 701818, bug 716983, were "pick up charset from last attachment" phenomenon instead of garbled display upon Forward(Inline), except this bug 715823.
If main issue of this bug was "pick up charset from last attachment", why patch proposed in this bug, which was recently fortunately dig out by bug 1026989, were hidden in this bug for so long time? For aging?

Anyway, Jorg K, please provide solution for "pick up charset from last attachment" issue in bug 1026989 instead of old this bug.
I believe aging of the patch was sufficiently done by this bug :-)
(Assignee)

Comment 64

6 months ago
(In reply to WADA from comment #63)
> Anyway, Jorg K, please provide solution for "pick up charset from last
> attachment" issue in bug 1026989 instead of old this bug.
I will. In fact, that bug already has a working fix. It's just missing a test which I will add today.

I've just revisited this bug to understand why I closed it:
I closed it because the inline forward case got fixed between TB 38 and TB 45 and because I made a mistake and got confused by the poor test cases. In attachment 8502270 [details] and attachment 8348527 [details] the MIME part representing the attachment does NOT even have a charset specified. So naturally no charset can be picked up from those attachments. Only attachment 8737925 [details] has a valid test case.
Depends on: 1026989
Because problem in "Forward Inline" case was actually disappeared while this bug was being processed, including both "wrong charset from attachment" and "garbled display", problem reported by comment #0 was surely resolved by unknown patch in unknown bug as written in comment #62.
Removing "edit as new" from bug summary to avoid further confusions, because it was not involved in original report of this bug. It was from comment added to this bug. Sorry for confusing bug summary change.
Jorg K, it's never your mistake. It's confusing bug summary change.
Summary: Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward/edit as new, without converting data to the charset) → Forward message, wrong encoding (if different charset is used in part under multipart/mixed, charset of last part is used in forward inline, without converting data to the charset)
Depends on: 1323377
No longer depends on: 1323377
You need to log in before you can comment on or make changes to this bug.