Closed Bug 1779070 Opened 3 years ago Closed 3 years ago

102.0 irrecoverably destroys messages subject if Windows-1251 (or not Unicode) encoded

Categories

(MailNews Core :: Database, defect, P1)

Thunderbird 102

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: andreshko, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, testcase)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0

Steps to reproduce:

Upgraded from 91.11 to 102.0

Actual results:

All messages with Windows-1251 encoded subjects (and possibly all messages with non-Unicode encoded subjects) now show Unicode replacement characters (U+FFFD) in place of the Cyrillic characters. The readable subjects are no longer available, and the damage appears to be irrecoverable: downgrading to Thunderbird 91.11 still shows the malformed subjects produced by Thunderbird 102.0.

Original message subject preserved until and including Thunderbird 91.11:
групажен контейнерен сервиз от Китай

Same message subject after upgrading to Thunderbird 102:

Same message subject in a reply to message:
ÇÒÕÐÁÖÅÎ ËÏÎÔÅÊÎÅÒÅÎ ÓÅÒ×ÉÚ ÏÔ ëÉÔÁÊ

Expected results:

Message subjects and content should not be destroyed by an e-mail client.

Thanks Andrey!

From reporter's presentation, this looks like a dealbreaking bug and possibly dataloss.
Tentatively marking P1/S1.

Severity: -- → S1
Component: Untriaged → Database
Flags: needinfo?(geoff)
Keywords: dataloss
Priority: -- → P1
Product: Thunderbird → MailNews Core

Reporter, can you provide a test message.eml after removing your private data from the source?

Flags: needinfo?(andreshko)
Keywords: testcase-wanted

(In reply to Thomas D. (:thomas8) from comment #2)


Sure.
https://1drv.ms/u/s!AuHoSmLluBwy9GECH2LLqVJLDNrs?e=9vPnrj

Flags: needinfo?(andreshko)

I'm not the right person for this one. I suspect the badly decoded subject is being stored in the .msf file and that's why it still appears broken on downgrade. But I don't know where or how that happens.

Andrey, the request was for the actual email file, not a picture of it. With the message loaded in Thunderbird, press Ctrl+S to save as a file.

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(geoff)
Flags: needinfo?(andreshko)

(In reply to Geoff Lankow (:darktrojan) from comment #4)


Yes, it is the actual file, not a picture of it. Only the addresses of the sender and recipient are substituted and the body is truncated. When opened, the message subject and the real subject in the reply block can be seen.
The .eml file got its name after <Ctrl+S> and all Cyrillic characters are substituted by Unicode replacement characters (U+FFFD) exactly as it appears in Thunderbird's messages list.
Anyway, here is the message source. To reproduce exactly, save the text file with PC (CRLF) line endings, not Unix, and with ANSI encoding, not Unicode:

X-Mozilla-Status: 0011
X-Mozilla-Status2: 00400000
X-Mozilla-Keys:
Reply-To: "ME" <recipient@example.com>
From: "ME" <recipient@example.com>
To: "THEM" <sender@example.org>
Subject: Re: ЗТХРБЦЕО ЛПОФЕКОЕТЕО УЕТЧЙЪ ПФ лЙФБК
Date: Tue, 23 May 2006 09:01:24 +0300
Organization: Trans World Europe Ltd.
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_000C_01C67E47.77943FE0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
Disposition-Notification-To: "ME" <recipient@example.com>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869

This is a multi-part message in MIME format.

------=_NextPart_000_000C_01C67E47.77943FE0
Content-Type: text/plain;
	charset="koi8-r"
Content-Transfer-Encoding: quoted-printable

=FA (truncated)

----- Original Message -----=20
From: THEM=20
To: recipient@example.com=20
Sent: Wednesday, May 17, 2006 2:53 PM
Subject: =C7=D2=D5=D0=C1=D6=C5=CE =CB=CF=CE=D4=C5=CA=CE=C5=D2=C5=CE =
=D3=C5=D2=D7=C9=DA =CF=D4 =EB=C9=D4=C1=CA

(truncated)

Flags: needinfo?(andreshko)
Blocks: tb102found
Attached image Russian message.png

The subject is KOI8-R (or KOI8-U) encoded (try it in Notepad++) and the patch in bug 1739609 lets you display the subject correctly. We can't see any dataloss.
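The KOI8-R diagnosis can be checked outside Thunderbird. The following is a minimal Python sketch, not Thunderbird code: it takes the quoted-printable bytes of the subject from the "Original Message" block in comment 5 and decodes the same bytes under several charsets. The variable names are made up for illustration.

```python
import quopri

# Quoted-printable bytes of the subject, copied from the quoted
# "Original Message" block in the message source (soft line break removed).
QP_SUBJECT = (
    b"=C7=D2=D5=D0=C1=D6=C5=CE =CB=CF=CE=D4=C5=CA=CE=C5=D2=C5=CE "
    b"=D3=C5=D2=D7=C9=DA =CF=D4 =EB=C9=D4=C1=CA"
)

raw = quopri.decodestring(QP_SUBJECT)

# Same bytes, four interpretations.
as_koi8r = raw.decode("koi8_r")    # readable Cyrillic subject
as_cp1251 = raw.decode("cp1251")   # what the raw Subject: header looks like as "ANSI"
as_latin1 = raw.decode("latin-1")  # what the reply block looks like as Western
as_utf8 = raw.decode("utf-8", errors="replace")  # invalid UTF-8 -> U+FFFD

for label, text in [
    ("koi8-r", as_koi8r),
    ("windows-1251", as_cp1251),
    ("iso-8859-1", as_latin1),
    ("utf-8", as_utf8),
]:
    print(f"{label:>12}: {text}")
```

Only the KOI8-R reading gives the subject quoted in the report. The Windows-1251 reading gives the uppercase Cyrillic string seen in the raw Subject: header, the Latin-1 reading gives the accented string from the reply block, and the UTF-8 reading gives replacement characters. The bytes themselves are identical in all four cases.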

So the subject is displayed to me as "koi8-r".

In case of a single-part mail, the charset is taken from the body.

In a multi-part mail, this could get tricky if the parts have different charsets.
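To illustrate the fallback described here, below is a rough Python sketch of "take the charset from the body, unless the parts disagree". It is an assumption about the general idea only, not Thunderbird's actual implementation; `decode_raw_subject` and the sample messages are hypothetical.

```python
from email import message_from_bytes


def decode_raw_subject(raw_subject, message_bytes):
    """Decode raw 8-bit subject bytes using the charset of the text parts.

    Returns None when no charset is declared or when the text parts
    declare different charsets (the "tricky" multi-part case).
    """
    msg = message_from_bytes(message_bytes)
    charsets = {
        part.get_content_charset()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    }
    charsets.discard(None)
    if len(charsets) != 1:
        return None  # nothing declared, or ambiguous
    return raw_subject.decode(charsets.pop(), errors="replace")


# Hypothetical sample messages for illustration.
SINGLE_PART = (
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"\r\n"
    b"body\r\n"
)
MIXED_PARTS = (
    b'Content-Type: multipart/alternative; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"\r\n"
    b"x\r\n"
    b"--b\r\n"
    b'Content-Type: text/html; charset="windows-1251"\r\n'
    b"\r\n"
    b"y\r\n"
    b"--b--\r\n"
)

raw_subject = bytes.fromhex("cfd4 20 ebc9d4c1ca".replace(" ", ""))  # KOI8-R bytes
print(decode_raw_subject(raw_subject, SINGLE_PART))
print(decode_raw_subject(raw_subject, MIXED_PARTS))
```

In the single-part case the body charset settles the question. In the mixed case the sketch gives up rather than guessing, which is one possible policy among several.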

(In reply to Andrey Marinov from comment #5)
Please attach the message to bugzilla as an .eml

The case from comment 8 shows correctly for me.

Then of course, the Subject is supposed to be encoded (RFC 2047), not raw bytes in some arbitrary charset. But I don't recall what would have changed in that respect between 91 and 102.

I don't think the behavior is new. The 10 emails all contain the same 6 unencoded hex values. Shown here with an old TB88.0a1.

The last change was the removal of the default charset setting. With that setting, one could still control how unencoded characters were displayed.

Agreed.

Flags: needinfo?(mkmelin+mozilla)

-> INVALID. The subject needs to be encoded, not raw.
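For reference, "encoded" here means an RFC 2047 encoded-word, which names its charset explicitly so the reader does not have to guess. A minimal Python sketch of what a sending client could emit for this subject, using the standard library `email.header` module (the variable names are illustrative):

```python
from email.header import Header, decode_header, make_header

subject = "групажен контейнерен сервиз от Китай"

# Encode the subject as an RFC 2047 encoded-word that declares koi8-r.
encoded = Header(subject, charset="koi8-r").encode()
print("Subject:", encoded)

# Any client can now decode it without a default-charset guess.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The encoded header is pure ASCII and round-trips to the original text regardless of the reader's locale or default charset.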

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID
Duplicate of this bug: 1835272