Closed Bug 1779070 (opened 3 years ago, closed 3 years ago)

102.0 irrecoverably destroys messages subject if Windows-1251 (or not Unicode) encoded

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: andreshko, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, testcase)

Attachments

(4 files)

Testcase 1: 20060523-Re_�� -2.eml, 3 years ago, Thomas D. (:thomas8), 1.12 KB, message/rfc822
Russian message.png, 3 years ago, b1, 6.39 KB, image/png
Same email as single part mail, 3 years ago, Alfred Peters [:infofrommozilla], 607 bytes, text/plain
Different test mails shown with TB88.0a1, 3 years ago, Alfred Peters [:infofrommozilla], 57.46 KB, image/png

Andrey Marinov

Reporter

Description

3 years ago

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0

Steps to reproduce:

Upgraded from 91.11 to 102.0

Actual results:

All messages with Windows-1251 encoded subjects (and possibly all messages with non-Unicode encoded subjects) now show Unicode replacement characters (U+FFFD) in place of the Cyrillic characters. The readable subjects are no longer available and appear to be irrecoverably lost: downgrading to Thunderbird 91.11 does not restore them, and the subjects stay as malformed as Thunderbird 102.0 left them.

Original message subject preserved until and including Thunderbird 91.11:
групажен контейнерен сервиз от Китай

Same message subject after upgrading to Thunderbird 102:

Same message subject in a reply to message:
ÇÒÕÐÁÖÅÎ ËÏÎÔÅÊÎÅÒÅÎ ÓÅÒ×ÉÚ ÏÔ ëÉÔÁÊ

Expected results:

Message subjects and content should not be destroyed by an e-mail client.
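For reference, U+FFFD is what a UTF-8 decoder substitutes for bytes it cannot interpret. The Python sketch below is an illustration only: it assumes the subject is stored as raw single-byte Cyrillic (tried here as both Windows-1251 and KOI8-R) and is then decoded as UTF-8 with replacement. Whether Thunderbird 102 takes this exact path is not established in this report, and `decode_as_utf8` is a hypothetical helper name.

```python
# Illustration only: raw 8-bit Cyrillic bytes are not valid UTF-8, so a
# decoder that substitutes on error yields U+FFFD for each offending byte.
word = "групажен"  # first word of the subject quoted in the report


def decode_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Try both single-byte Cyrillic encodings mentioned in this bug.
results = {
    codec: decode_as_utf8(word.encode(codec))
    for codec in ("cp1251", "koi8_r")
}

for codec, shown in results.items():
    print(codec, [f"U+{ord(ch):04X}" for ch in shown])
```

In both cases every character comes out as U+FFFD. The original bytes are untouched by this kind of decoding; only the decoded text loses the information.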

Thomas D. (:thomas8)

Comment 1

3 years ago

Thanks Andrey!

From reporter's presentation, this looks like a dealbreaking bug and possibly dataloss.
Tentatively marking P1/S1.

Severity: -- → S1

tracking-firefox103: --- → ?

tracking-firefox103: ? → ---

tracking-firefox104: --- → ?

tracking-firefox104: ? → ---

Component: Untriaged → Database

Flags: needinfo?(geoff)

Keywords: dataloss

Priority: -- → P1

Product: Thunderbird → MailNews Core

Thomas D. (:thomas8)

Comment 2

3 years ago

Reporter, can you provide a test message.eml after removing your private data from the source?

Flags: needinfo?(andreshko)

Keywords: testcase-wanted

Andrey Marinov

Reporter

Comment 3

3 years ago

(In reply to Thomas D. (:thomas8) from comment #2)

> Reporter, can you provide a test message.eml after removing your private data from the source?

Sure.
https://1drv.ms/u/s!AuHoSmLluBwy9GECH2LLqVJLDNrs?e=9vPnrj

Flags: needinfo?(andreshko)

Geoff Lankow (:darktrojan)

Comment 4

3 years ago

I'm not the right person for this one. I suspect the badly decoded subject is being stored in the .msf file and that's why it still appears broken on downgrade. But I don't know about where or how that happens.

Andrey, the request was for the actual email file, not a picture of it. With the message loaded in Thunderbird, press Ctrl+S to save as a file.

Flags: needinfo?(mkmelin+mozilla)

Flags: needinfo?(geoff)

Flags: needinfo?(andreshko)

Andrey Marinov

Reporter

Comment 5

3 years ago

(In reply to Geoff Lankow (:darktrojan) from comment #4)

> I'm not the right person for this one. I suspect the badly decoded subject is being stored in the .msf file and that's why it still appears broken on downgrade. But I don't know about where or how that happens.
>
> Andrey, the request was for the actual email file, not a picture of it. With the message loaded in Thunderbird, press Ctrl+S to save as a file.

Yes, it is the actual file, not a picture of it. Only the addresses of the sender and recipient are substituted and the body is truncated. When opened, the message subject and the real subject in the reply block can be seen.
The .eml file got its name from Ctrl+S: all Cyrillic characters in the name were replaced by Unicode replacement characters (U+FFFD), exactly as the subject appears in Thunderbird's message list.
Anyway, here is the message source (to reproduce exactly, save the text file with PC (CRLF) line endings, not Unix, and ANSI encoded, not Unicode):

X-Mozilla-Status: 0011
X-Mozilla-Status2: 00400000
X-Mozilla-Keys:
Reply-To: "ME" <recipient@example.com>
From: "ME" <recipient@example.com>
To: "THEM" <sender@example.org>
Subject: Re: ЗТХРБЦЕО ЛПОФЕКОЕТЕО УЕТЧЙЪ ПФ лЙФБК
Date: Tue, 23 May 2006 09:01:24 +0300
Organization: Trans World Europe Ltd.
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_000C_01C67E47.77943FE0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2869
Disposition-Notification-To: "ME" <recipient@example.com>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869

This is a multi-part message in MIME format.

------=_NextPart_000_000C_01C67E47.77943FE0
Content-Type: text/plain;
	charset="koi8-r"
Content-Transfer-Encoding: quoted-printable

=FA (truncated)

----- Original Message -----=20
From: THEM=20
To: recipient@example.com=20
Sent: Wednesday, May 17, 2006 2:53 PM
Subject: =C7=D2=D5=D0=C1=D6=C5=CE =CB=CF=CE=D4=C5=CA=CE=C5=D2=C5=CE =
=D3=C5=D2=D7=C9=DA =CF=D4 =EB=C9=D4=C1=CA

(truncated)

Flags: needinfo?(andreshko)

Thomas D. (:thomas8)

Comment 6

3 years ago

Attached file: Testcase 1: 20060523-Re_�� -2.eml

This is reporter's original testcase from the link in comment 3.

Thomas D. (:thomas8)

Updated

3 years ago

Keywords: testcase-wanted → testcase

Wayne Mery (:wsmwk)

Updated

3 years ago

Blocks: tb102found

b1

Comment 7

3 years ago

Attached image: Russian message.png

The subject is KOI8-R (or KOI8-U) encoded (try it in Notepad++) and the patch in bug 1739609 lets you display the subject correctly. We can't see any dataloss.
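The KOI8-R reading can be checked against the quoted-printable subject quoted in the message body in comment 5. The Python sketch below decodes those same bytes under three charsets; it uses only the standard library, and the variable names are illustrative.

```python
import quopri

# Quoted-printable subject line copied from the message body in comment 5.
QP_SUBJECT = (
    b"=C7=D2=D5=D0=C1=D6=C5=CE =CB=CF=CE=D4=C5=CA=CE=C5=D2=C5=CE =\n"
    b"=D3=C5=D2=D7=C9=DA =CF=D4 =EB=C9=D4=C1=CA"
)

raw = quopri.decodestring(QP_SUBJECT)  # the underlying 8-bit bytes

as_koi8r = raw.decode("koi8_r")    # charset declared by the text/plain part
as_cp1251 = raw.decode("cp1251")   # what an "ANSI" Cyrillic editor shows
as_latin1 = raw.decode("latin-1")  # what a Western European default shows

print(as_koi8r)   # групажен контейнерен сервиз от Китай
print(as_cp1251)  # ЗТХРБЦЕО ЛПОФЕКОЕТЕО УЕТЧЙЪ ПФ лЙФБК
print(as_latin1)  # ÇÒÕÐÁÖÅÎ ËÏÎÔÅÊÎÅÒÅÎ ÓÅÒ×ÉÚ ÏÔ ëÉÔÁÊ
```

The KOI8-R result matches the original subject from the description, the Windows-1251 result matches the Subject line as pasted in comment 5, and the Latin-1 result matches the garbled subject shown in the reply block. The same bytes produce all three, which is consistent with the bytes themselves being intact.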

Alfred Peters [:infofrommozilla]

Comment 8

3 years ago

Edited

Attached file: Same email as single part mail

So for me the subject is displayed as if it were "koi8-r" encoded.

In case of a single-part mail, the charset is taken from the body.

In a multi-part mail, this could get tricky if the parts have different charsets.
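To make the single-part versus multi-part distinction concrete, here is a rough Python sketch of one possible fallback rule: use a body charset for raw header bytes only when all text parts agree. The helper `candidate_header_charset` and both sample messages are hypothetical, and this is not a description of Thunderbird's actual logic.

```python
import email


def candidate_header_charset(raw_message: bytes):
    """Return the one charset shared by all text parts, or None if ambiguous.

    Hypothetical helper for illustration; not Thunderbird's implementation.
    """
    msg = email.message_from_bytes(raw_message)
    charsets = {
        part.get_content_charset()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    }
    charsets.discard(None)  # text parts without a charset parameter
    return charsets.pop() if len(charsets) == 1 else None


# Made-up single-part message: one unambiguous body charset.
SINGLE_PART = (
    b"Subject: test\r\n"
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"\r\n"
    b"body\r\n"
)

# Made-up multi-part message whose parts disagree on the charset.
MIXED_MULTIPART = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"\r\n"
    b"plain\r\n"
    b"--b\r\n"
    b'Content-Type: text/html; charset="windows-1251"\r\n'
    b"\r\n"
    b"<p>html</p>\r\n"
    b"--b--\r\n"
)

print(candidate_header_charset(SINGLE_PART))      # koi8-r
print(candidate_header_charset(MIXED_MULTIPART))  # None
```

Returning None in the mixed case reflects the "tricky" situation: there is no single body charset to borrow, so any choice would be a guess.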

Magnus Melin [:mkmelin]

Comment 9

3 years ago

(In reply to Andrey Marinov from comment #5)
Please attach the message to bugzilla as an .eml

The case from comment 8 shows correctly for me.

Then of course, the Subject is supposed to be encoded (RFC 2047), not raw bytes in some arbitrary charset. But I don't recall what would have changed there between 91 and 102.

Alfred Peters [:infofrommozilla]

Comment 10

3 years ago

Edited

Attached image: Different test mails shown with TB88.0a1

I don't think the behavior is new. The 10 emails all contain the same 6 unencoded hex values. Shown here with an old TB88.0a1.

The last relevant change was the removal of the default-charset setting. With that setting, one could still control how unencoded characters were displayed.

Magnus Melin [:mkmelin]

Comment 12

3 years ago

Agreed.

Flags: needinfo?(mkmelin+mozilla)

Magnus Melin [:mkmelin]

Comment 13

3 years ago

-> INVALID. The subject needs to be encoded, not raw.
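For comparison, a properly encoded Subject carries its charset label inside the header as an RFC 2047 encoded-word. The sketch below uses Python's standard `email.header` module to show what that looks like for the subject from this bug; it illustrates the header format only and says nothing about how any particular mail client generates it.

```python
from email.header import Header, decode_header, make_header

subject = "Re: групажен контейнерен сервиз от Китай"

# Encode as an RFC 2047 encoded-word; the charset label travels with the header.
encoded = Header(subject, charset="koi8-r").encode()
print(encoded)

# A reader can recover the text from the label, without guessing a charset.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The encoded form is pure ASCII and begins with `=?koi8-r?b?`, so nothing depends on a default charset or on the charset of a body part.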

Status: UNCONFIRMED → RESOLVED

Closed: 3 years ago

Resolution: --- → INVALID

Alfred Peters [:infofrommozilla]

Updated

2 years ago

Duplicate of this bug: 1835272


Categories

(MailNews Core :: Database, defect, P1)
