Thunderbird 102.0 irrecoverably destroys message subjects if they are Windows-1251 (or otherwise non-Unicode) encoded
Categories
(MailNews Core :: Database, defect, P1)
Tracking
(Not tracked)
People
(Reporter: andreshko, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: dataloss, testcase)
Attachments
(4 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0
Steps to reproduce:
Upgraded from 91.11 to 102.0
Actual results:
All messages with Windows-1251 encoded subjects (and possibly all messages with non-Unicode encoded subjects) were irrecoverably converted: the Cyrillic characters in their subjects were replaced with Unicode replacement characters (U+FFFD). Readable subjects are no longer available. The subjects stay lost even after downgrading to Thunderbird 91.11, which keeps showing them as malformed by Thunderbird 102.0.
Original message subject preserved until and including Thunderbird 91.11:
групажен контейнерен сервиз от Китай
Same message subject after upgrading to Thunderbird 102:
Same message subject in a reply to message:
ÇÒÕÐÁÖÅÎ ËÏÎÔÅÊÎÅÒÅÎ ÓÅÒ×ÉÚ ÏÔ ëÉÔÁÊ
Expected results:
Message subjects and content should not be destroyed by an e-mail client.
Comment 1•3 years ago
Thanks Andrey!
From reporter's presentation, this looks like a dealbreaking bug and possibly dataloss.
Tentatively marking P1/S1.
Comment 2•3 years ago
Reporter, can you provide a test message.eml after removing your private data from the source?
Reporter | ||
Comment 3•3 years ago
(In reply to Thomas D. (:thomas8) from comment #2)
> Reporter, can you provide a test message.eml after removing your private data from the source?
Sure.
https://1drv.ms/u/s!AuHoSmLluBwy9GECH2LLqVJLDNrs?e=9vPnrj
Comment 4•3 years ago
I'm not the right person for this one. I suspect the badly decoded subject is being stored in the .msf file and that's why it still appears broken on downgrade. But I don't know about where or how that happens.
Andrey, the request was for the actual email file, not a picture of it. With the message loaded in Thunderbird, press Ctrl+S to save as a file.
Reporter | ||
Comment 5•3 years ago
(In reply to Geoff Lankow (:darktrojan) from comment #4)
> I'm not the right person for this one. I suspect the badly decoded subject is being stored in the .msf file and that's why it still appears broken on downgrade. But I don't know about where or how that happens.
> Andrey, the request was for the actual email file, not a picture of it. With the message loaded in Thunderbird, press Ctrl+S to save as a file.
Yes, it is the actual file, not a picture of it. Only the addresses of the sender and recipient are substituted and the body is truncated. When opened, the message subject and the real subject in the reply block can be seen.
The .eml file got its name after <Ctrl+S> and all Cyrillic characters are substituted by Unicode replacement characters (U+FFFD) exactly as it appears in Thunderbird's messages list.
Anyway, here is the message source (to reproduce exactly, the text file must be saved with PC line endings, not Unix, and ANSI encoded, not Unicode):
X-Mozilla-Status: 0011
X-Mozilla-Status2: 00400000
X-Mozilla-Keys:
Reply-To: "ME" <recipient@example.com>
From: "ME" <recipient@example.com>
To: "THEM" <sender@example.org>
Subject: Re: ЗТХРБЦЕО ЛПОФЕКОЕТЕО УЕТЧЙЪ ПФ лЙФБК
Date: Tue, 23 May 2006 09:01:24 +0300
Organization: Trans World Europe Ltd.
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_000C_01C67E47.77943FE0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
Disposition-Notification-To: "ME" <recipient@example.com>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
This is a multi-part message in MIME format.
------=_NextPart_000_000C_01C67E47.77943FE0
Content-Type: text/plain;
    charset="koi8-r"
Content-Transfer-Encoding: quoted-printable
=FA (truncated)
----- Original Message -----=20
From: THEM=20
To: recipient@example.com=20
Sent: Wednesday, May 17, 2006 2:53 PM
Subject: =C7=D2=D5=D0=C1=D6=C5=CE =CB=CF=CE=D4=C5=CA=CE=C5=D2=C5=CE =
=D3=C5=D2=D7=C9=DA =CF=D4 =EB=C9=D4=C1=CA
(truncated)
Updated•3 years ago
The subject is KOI8-R (or KOI8-U) encoded (try it in Notepad++) and the patch in bug 1739609 lets you display the subject correctly. We can't see any dataloss.
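The three renderings quoted in this bug can be reproduced from the same bytes. The sketch below, in Python and not Thunderbird code, takes the quoted-printable subject from the message source in comment 5 and decodes the resulting bytes under three charsets. The names `QP_SUBJECT`, `raw_bytes` and `decoded` are made up for this illustration.

```python
import quopri

# Quoted-printable subject copied from the message source in comment 5.
# The trailing "=" on the first line is a soft line break.
QP_SUBJECT = (
    b"=C7=D2=D5=D0=C1=D6=C5=CE =CB=CF=CE=D4=C5=CA=CE=C5=D2=C5=CE =\n"
    b"=D3=C5=D2=D7=C9=DA =CF=D4 =EB=C9=D4=C1=CA"
)

# Undo the quoted-printable transfer encoding to get the raw 8-bit bytes.
raw_bytes = quopri.decodestring(QP_SUBJECT)

# Interpret the very same bytes under three different single-byte charsets.
decoded = {
    charset: raw_bytes.decode(charset)
    for charset in ("koi8-r", "windows-1251", "latin-1")
}

for charset, text in decoded.items():
    print(f"{charset:>12}: {text}")
```

Only the KOI8-R interpretation gives the readable Bulgarian subject. Windows-1251 gives the text seen in the Subject header of the pasted source, and Latin-1 gives the text quoted in the original report.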
Comment 8•3 years ago
So the subject is displayed to me as "koi8-r".
In case of a single-part mail, the charset is taken from the body.
In a multi-part mail, this could get tricky if the parts have different charsets.
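A minimal sketch of that fallback idea, written in Python with the standard `email` package. It is not Thunderbird's implementation: the helpers `fallback_charset` and `raw_subject` and the `SAMPLE` message are hypothetical, and the rule "use the body charset only if all text parts agree" is an assumption made for the example.

```python
import email

def fallback_charset(raw_message: bytes):
    """Return the charset shared by all text parts, or None if they disagree."""
    msg = email.message_from_bytes(raw_message)
    charsets = {
        part.get_content_charset()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    }
    charsets.discard(None)  # parts that declare no charset do not vote
    return charsets.pop() if len(charsets) == 1 else None

def raw_subject(raw_message: bytes) -> bytes:
    """Return the undecoded Subject bytes (simplified: ignores folded headers)."""
    for line in raw_message.splitlines():
        if not line:  # blank line ends the header block
            break
        if line.lower().startswith(b"subject:"):
            return line[len(b"subject:"):].strip()
    return b""

# Hypothetical test message: raw KOI8-R bytes in the Subject header,
# one text part that declares charset="koi8-r".
SAMPLE = (
    b"Subject: Re: " + "групажен".encode("koi8-r") + b"\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"\r\n"
    b"body\r\n"
    b"--b--\r\n"
)

charset = fallback_charset(SAMPLE)
subject = raw_subject(SAMPLE).decode(charset or "utf-8", errors="replace")
print(charset, subject)
```

When the text parts declare different charsets, `fallback_charset` returns `None` and the sketch falls back to UTF-8 with replacement, which is the tricky case described above.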
Comment 9•3 years ago
(In reply to Andrey Marinov from comment #5)
Please attach the message to bugzilla as an .eml
The case from comment 8 shows correctly for me.
Of course, the Subject is supposed to be encoded (RFC 2047 encoded words), not sent as raw bytes in an arbitrary charset. But I don't recall what would have changed between 91 and 102.
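For comparison, here is a sketch of what an encoded Subject looks like, using Python's standard `email.header` module. This only illustrates RFC 2047 encoding in general. The variable names are made up for the example, and the choice of KOI8-R as the declared charset is an assumption based on the test message.

```python
from email.header import Header, decode_header, make_header

subject = "групажен контейнерен сервиз от Китай"

# Encode as RFC 2047 encoded word(s): the header value becomes pure ASCII
# and carries its own charset label.
encoded = Header(subject, charset="koi8-r").encode()
print("Subject:", encoded)

# A receiving client can decode it without guessing the charset.
round_trip = str(make_header(decode_header(encoded)))
print(round_trip)
```

Because the charset travels inside the header itself, the client does not need a default charset or the body's charset to display the subject.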
Comment 10•3 years ago
I don't think the behavior is new. The 10 emails all contain the same 6 unencoded hex values. Shown here with an old TB88.0a1.
The last change was the removal of the default-charset setting. With that setting, one could still control the display of unencoded characters.
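The replacement characters in the report are what one would expect if raw 8-bit bytes are decoded as UTF-8 with no fallback charset. The Python sketch below only illustrates that mechanism. Whether Thunderbird does exactly this, and whether the result is what gets stored in the .msf file, is an assumption and not something confirmed in this bug.

```python
# Raw, unencoded KOI8-R bytes as they would appear in the Subject header.
raw = "групажен".encode("koi8-r")

# Decoding as UTF-8 fails for every byte, so each one becomes U+FFFD.
lossy = raw.decode("utf-8", errors="replace")
print(lossy)

# Re-encoding the replaced text does not give the original bytes back:
# once only the decoded string is kept, the subject cannot be recovered.
restored = lossy.encode("utf-8")
print(restored == raw)

# With the right charset the same bytes decode cleanly.
print(raw.decode("koi8-r"))
```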
Comment 13•3 years ago
-> INVALID. The subject needs to be encoded, not raw.