Closed Bug 1738000 Opened 3 years ago Closed 3 years ago

Encoding detection not run for messages that lack the Content-Type header

Categories

(MailNews Core :: Internationalization, defect)

Thunderbird 91
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1734361

People

(Reporter: hsivonen, Unassigned)

References

Details

(Keywords: regression)

Attachments

(1 file)

According to a user (I haven't confirmed this myself), an email that lacks a Content-Type MIME header and contains windows-1252 plain text that chardetng detects as windows-1252 is decoded as UTF-8 in Thunderbird 91. This is a regression in Thunderbird 91.

In contrast to bug 1713786, which is about manually overriding the encoding of messages that declare the wrong encoding, this is about a failure to run chardetng when the message doesn't declare an encoding at all.

Steps to reproduce

  1. Generate some text in a windows-1252 language, add a From line and enough headers (but not Content-Type) to make it an email in the mbox format, and save it as windows-1252-encoded bytes with a .txt extension.
  2. Load the .txt file from a file: URL in Firefox and verify that it is detected as windows-1252.
  3. Import the file as an mbox file into Thunderbird.
  4. Open the email in Thunderbird.

Actual results

The non-ASCII characters show up as REPLACEMENT CHARACTERs due to decoding as UTF-8.
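The symptom can be reproduced outside Thunderbird. This is only an illustration using Python's built-in codecs, not Thunderbird's decoding path; the sample word is an assumption.

```python
# "ñ" is the single byte 0xF1 in windows-1252.
raw = "mañana".encode("windows-1252")

# Decoding as UTF-8: 0xF1 starts a multi-byte sequence, but the next byte
# is ASCII "a", so the decoder substitutes U+FFFD REPLACEMENT CHARACTER.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in restores the text.
as_w1252 = raw.decode("windows-1252")

print(repr(as_utf8))
print(repr(as_w1252))
```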

Expected results

Expected the message to be decoded as windows-1252 thanks to detection using chardetng in the absence of a character encoding declaration.

I wonder if this affects nightly, since https://hg.mozilla.org/comm-central/rev/d0af6cc5fe02dd6e5b27ec24322c02491d8d7990 restored autodetection in a case where we had stopped autodetecting.

Attached file w1252-no-charset.eml

This displays fine in our fork and should display fine in TB Daily with the change that was pointed out. However, if you install https://addons.thunderbird.net/en-GB/thunderbird/addon/charset-menu, you can see that this is detected as windows-1250 which leads to bug 1737245 when forwarded. Henri, why is that not detected as windows-1252?

(In reply to Rachel Martin from comment #3)

> Henri, why is that not detected as windows-1252?

The byte pairs for the accented characters happen to be even more plausible for windows-1250 than for windows-1252. This particular input happens to be unlucky. Fortunately, the visible failure mode is remarkably benign: Spanish ñ getting replaced with the Polish letter for that sound: ń.
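To see why the misdetection is so mild, compare how the two encodings map the bytes for the Spanish accented letters. This sketch uses Python's built-in codecs and says nothing about how chardetng scores the candidates; the letter list and sample word are assumptions chosen for illustration.

```python
# Lowercase Spanish letters outside ASCII.
spanish_letters = "áéíóúüñ"

# Collect the letters whose windows-1252 byte decodes to a different
# character under windows-1250.
differing = []
for letter in spanish_letters:
    byte = letter.encode("windows-1252")
    if byte.decode("windows-1250") != letter:
        differing.append(letter)

# The same bytes read as windows-1250 instead of windows-1252.
misread = "España".encode("windows-1252").decode("windows-1250")

print(differing)
print(misread)
```

Only the byte 0xF1 changes meaning (ñ in windows-1252, ń in windows-1250), so the rest of the text stays readable.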

(In reply to Rachel Martin from comment #2)

More precisely here: https://hg.mozilla.org/comm-central/rev/d0af6cc5fe02dd6e5b27ec24322c02491d8d7990#l3.18

Indeed, it looks a lot like this is a duplicate of bug 1734361, but I'll let someone who has actually verified things make the determination.

Tested this locally on nightly and from what I can tell nightly is no longer affected. Thus duping.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE