Open Bug 824894 Opened 13 years ago Updated 3 years ago

Garbled characters in inline-forwarded messages (HTML2text converter fails to process BOM of UTF-8(0xEFBBBF) in UTF-8 mail text well upon Forward, when BOM of UTF-8 is inserted in text/html part at some places where mail sender likes.)

Categories

(Thunderbird :: Message Compose Window, defect)

17 Branch
x86
Windows XP
defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: mwu4, Unassigned)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0
Build ID: 20121128204232
Steps to reproduce:
We want to forward a mail inline to others.
Actual results:
The forwarded message is garbled.
Expected results:
The message should remain the same as it was before forwarding.
Comment on attachment 695919
20120112-forward_inline_garbledd-25192000.eml
A sample mail file that reproduces the phenomenon.
Attachment #695919 - Attachment mime type: application/octet-stream → text/plain; charset="utf-8"
(In reply to mwu4 from comment #0)
> Steps to reproduce:
> We want to forward a mail inline to others.
> Actual results:
> The forwarded message is garbled.
When is it garbled?
(a) No problem in the Sent mail copy, garbled at the recipient side
(b) Garbled in Sent mail, but no problem in the composition window for forwarding
(c) Garbled from the very start of composition for forwarding
Is it (c), with garbage like the following? (After the quoted headers, at the top of the quoted message text)
(Forward as text mail)
> eMarketer
>
> ***èª æ¯éè«æ¨å ±å***
(Forward as HTML mail)
>
>
> èª æ¯éè«æ¨å ±å
If so, I could see the phenomenon in Tb 17.0.1 on Japanese MS Win-XP (system charset=Shift_JIS).
The following binary data is contained in the attached HTML mail data.
Line-No=37 == Empty line, the separator between mail headers and mail payload.
Line-No=38, Length=21
  EFBBBF EFBBBF EFBBBF 3C68746D6C3E 3C686561643E
  ? ? ?  ? ? ?  ? ? ?  < h t m l >  < h e a d >
Line-No=41, Length=39
  <body bgcolor="#FFFFFF" text="#000000">
Line-No=42, Length=4
  EFBBBF 20
  ? ? ?  <space>
Within five lines of mail data, four UTF-8 BOMs appear. Can a BOM be placed anywhere in a mail data stream?
See the following documents, easily found by a Google search for "unicode bom utf-8".
> http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
> http://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom
Because the BOM for UTF-8 (0xEFBBBF) is displayed like "ï»¿" when interpreted as iso-8859-1, and the charset shown as the "composing charset" is correctly UTF-8, Tb perhaps fails internally to convert such data properly.
Confirming, based on my reproduction test.
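The byte-level observation above can be checked with a short Python sketch. The byte string is rebuilt from the hex dump of line 38 rather than read from the attachment, and the variable names are illustrative only.

```python
# Bytes modelled on line 38 of the hex dump: three UTF-8 BOMs, then markup.
line38 = bytes.fromhex("EFBBBF" * 3) + b"<html><head>"

# Decoded as UTF-8, each 3-byte BOM becomes the single character U+FEFF.
# (Python's plain "utf-8" codec keeps U+FEFF; it does not strip it.)
as_utf8 = line38.decode("utf-8")
bom_count = as_utf8.count("\ufeff")

# Decoded as ISO-8859-1, each BOM becomes the three visible characters
# U+00EF, U+00BB, U+00BF, which is the garbage seen in the forwarded text.
as_latin1 = line38.decode("iso-8859-1")

print(len(line38))   # 21, matching "Length=21" in the dump
print(bom_count)     # 3
print(as_latin1)
```

A decoder that treats the body as a single-byte charset at any stage would therefore leak three garbage characters per BOM, which is consistent with the symptom reported here, although the sketch does not show where Tb actually goes wrong.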
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to WADA from comment #2)
> (In reply to mwu4 from comment #0)
> > Steps to reproduce:
> > We want to forward a mail inline to others.
> > Actual results:
> > The forwarded message is garbled.
>
> When is it garbled?
> (a) No problem in the Sent mail copy, garbled at the recipient side
> (b) Garbled in Sent mail, but no problem in the composition window for forwarding
> (c) Garbled from the very start of composition for forwarding
>
Yes, it is garbled from the very start of composition for inline forwarding, and the garbled message cannot be restored to normal Traditional Chinese characters simply by changing the character encoding.
Because the mail was sent by another organization, we cannot control how it is composed.
Is there anything we can do to solve this problem, or do we have to forward this mail as an attachment?
There is no problem if we reply to this mail; however, a reply drops the attachments of the original mail.
Thank you for your reply.
Best Regards.
(In reply to mwu4 from comment #3)
> Because it is a mail sent from other organization, we cannot control how the mail is composed of.
> Is there anything that we can do to solve this problem, (snip)
Please ask the mail sender, or the developer of the mailer/mail system that the sender uses, not to put a "BOM for UTF-8", which is never officially/consistently defined by the Unicode spec, at any place in the mail data stream that they send.
Summary: Garbled characters in inline-forwarded messages → Garbled characters in inline-forwarded messages (Tb fails to process BOM for UTF-8(0xEFBBBF) in UTF-8 encoded mail data well upon Forward, when BOM for UTF-8(0xEFBBBF) is inserted in message body of text/html mail at a place where mail sender likes.)
(In reply to WADA from comment #4)
> (In reply to mwu4 from comment #3)
> > Because it is a mail sent from other organization, we cannot control how the mail is composed of.
> > Is there anything that we can do to solve this problem, (snip)
>
> Please ask the mail sender, or the developer of the mailer/mail system that
> the sender uses, not to put a "BOM for UTF-8", which is never
> officially/consistently defined by the Unicode spec, at any place in the
> mail data stream that they send.
Because this is a government organization and this kind of mail is received only occasionally, we don't think there will be any further response.
Thank you, WADA, for your sincere assistance. This bug can be closed with the appropriate status.
Thank you once again and regards.
(In reply to mwu4 from comment #5)
> Because this is a government organization and the kind of mail is received
> occasionally, we don't think there will be any further response.
If so, for this mail, "forward as attachment" (Message / Forward As / Attachment in the menu) is the simplest and most effective workaround.
But a sender of malformed mail tends to send even worse mail.
Please watch out for problems like bug 463129 / bug 523796 / bug 611666, which produce the phenomenon of bug 326303 when you use "forward as attachment".
(In reply to WADA from comment #6)
> If so, as for the mail, "forward as attachment"(Message/Forward
> As/Attachment of menu) is simplest/easiest and effective workaround.
> But sender of not-well-formed mail tends to send worse mail.
> Please be careful on problem like bug 463129/bug 523796/bug 611666 which
> produces phenomenon of bug 326303 when you use "forward as attachment".
Thank you, WADA, for your kind reminder.
Best regards.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
WORKSFORME at bugzilla.mozilla.org is *defined* as "a bug in Tb (a flaw in the code) actually existed but was already resolved by an unknown patch".
The following points are unclear.
- Is a "BOM of UTF-8" in an HTML or plain text part permitted in mail data, even though the charset is defined by the higher-level Content-Type: header?
- Even if permitted, is a "BOM of UTF-8 anywhere in the HTML/plain text part of a mail" permitted? (The BOM of UTF-16 is defined as the first byte sequence of a file or data stream.)
Because U+FEFF is currently defined as a character in Unicode (UTF-16, UTF-8), I think Tb's HTML2text converter had better always convert the "BOM of UTF-8 == UTF-8 representation of U+FEFF (also the BOM in UTF-16)" to nothing (i.e. ignore it) at any place in a UTF-8 text part of mail data.
So, re-opening.
Note:
- The "BOM of UTF-8" is simply the UTF-8 encoding of U+FEFF, which is the "BOM in UTF-16".
- U+FEFF is defined as a code point in Unicode, but U+FFFE does not exist as a character in Unicode (this value is perhaps reserved so that the BOM of UTF-16LE can be recognized).
- A "UTF-8 byte sequence corresponding to the BOM byte order of UTF-16LE (0xFFFE)" doesn't exist in UTF-8, because U+FFFE is not defined as a character.
- Somehow "0xFEFF, the byte order of the BOM in UTF-16BE" is also defined as U+FEFF, the Unicode character 'ZERO WIDTH NO-BREAK SPACE', and can also be written as the HTML character reference &#xFEFF; because it is defined in Unicode. This may have been done to make it easy to ignore the BOM in environments where UTF-16LE won't be produced, or to cope with a "BOM" wrongly placed in the middle of text.
> http://blogs.msdn.com/b/michkap/archive/2005/01/20/357028.aspx
> Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)
> http://www.fileformat.info/info/unicode/char/feff/index.htm
> http://en.wikipedia.org/wiki/Unicode
> http://en.wikipedia.org/wiki/Byte_Order_Mark
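The proposal above (ignore U+FEFF wherever it appears in a UTF-8 text part) can be sketched in Python. The function name `strip_bom_anywhere` is hypothetical and this is not Tb's HTML2text converter code; it only illustrates the idea, assuming the body has already been identified as UTF-8.

```python
def strip_bom_anywhere(body: bytes) -> str:
    """Decode a UTF-8 body and drop every U+FEFF, wherever it occurs.

    Hypothetical helper for illustration; not taken from Tb's source.
    """
    text = body.decode("utf-8")
    return text.replace("\ufeff", "")


# U+FEFF is one code point; only its byte representation differs per encoding.
assert "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"
assert "\ufeff".encode("utf-16-be") == b"\xfe\xff"
assert "\ufeff".encode("utf-16-le") == b"\xff\xfe"

# Sample modelled on the attachment: one BOM at the start, one mid-stream.
sample = b"\xef\xbb\xbf<body>\n\xef\xbb\xbf text</body>"
cleaned = strip_bom_anywhere(sample)
print(repr(cleaned))
```

Dropping the character after decoding, rather than searching for the raw bytes EF BB BF, avoids touching bytes in a part that is not actually UTF-8.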
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Even if the old name of U+FEFF is "byte order mark", and even if use of U+FEFF as 'ZERO WIDTH NO-BREAK SPACE' is already deprecated and that role has shifted to 'WORD JOINER' (U+2060), as long as U+FEFF is defined in Unicode and the "UTF-8 representation of U+FEFF" is defined in UTF-8, I believe Tb should treat a U+FEFF character anywhere in UTF-8 message body text as the "Unicode character U+FEFF".
And I think Tb had better be tolerant of the "BOM of UTF-8" in as many places as possible, because MS's notepad.exe always writes the "BOM of UTF-8" when text is saved as a UTF-8 file.
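For comparison, a small sketch of why tolerating only a leading BOM would not be enough for the mail in this bug. It uses Python's standard `utf-8-sig` codec, which removes a single BOM at the very start of the data and leaves any other U+FEFF in place. The sample byte strings are made up for illustration.

```python
import codecs

# A file as notepad.exe is described to save it: one BOM, then the text.
notepad_style = codecs.BOM_UTF8 + "text".encode("utf-8")
leading_only = notepad_style.decode("utf-8-sig")

# A BOM in the middle of the data is not removed by "utf-8-sig".
mid_stream = b"a" + codecs.BOM_UTF8 + b"b"
mid_decoded = mid_stream.decode("utf-8-sig")

# With several BOMs at the start, only the first one is removed.
repeated = codecs.BOM_UTF8 * 3 + b"<html>"
repeated_decoded = repeated.decode("utf-8-sig")

print(repr(leading_only))      # 'text'
print(repr(mid_decoded))       # 'a\ufeffb'
print(repr(repeated_decoded))  # '\ufeff\ufeff<html>'
```

So a decoder with start-of-stream handling alone would still pass the remaining U+FEFF characters on to later processing.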
Summary: Garbled characters in inline-forwarded messages (Tb fails to process BOM for UTF-8(0xEFBBBF) in UTF-8 encoded mail data well upon Forward, when BOM for UTF-8(0xEFBBBF) is inserted in message body of text/html mail at a place where mail sender likes.) → Garbled characters in inline-forwarded messages (HTML2text converter fails to process BOM of UTF-8(0xEFBBBF) in UTF-8 mail text well upon Forward, when BOM of UTF-8 is inserted in text/html part at some places where mail sender likes.)
Severity: normal → S3