Encoding detection not run for messages that lack the Content-Type header
Categories
(MailNews Core :: Internationalization, defect)
Tracking
(Not tracked)
People
(Reporter: hsivonen, Unassigned)
Details
(Keywords: regression)
Attachments
(1 file: 1.65 KB, text/plain)
According to a user (I haven't confirmed this myself), an email that lacks a Content-Type MIME header and contains windows-1252 plain text that chardetng detects as windows-1252 is decoded as UTF-8 in Thunderbird 91. This is a regression in Thunderbird 91.
In contrast to bug 1713786, which is about manually overriding the encoding of messages that declare the wrong encoding, this bug is about a failure to run chardetng when the message doesn't declare an encoding at all.
Steps to reproduce
- Generate some text in a windows-1252 language, add a From line and enough headers (but not Content-Type) to make it an email in the mbox format, and save it as windows-1252-encoded bytes with a .txt extension.
- Load the .txt file from a file: URL in Firefox and verify that it is detected as windows-1252.
- Import the file as an mbox file into Thunderbird.
- Open the email in Thunderbird.
Actual results
The non-ASCII characters show up as REPLACEMENT CHARACTERs due to decoding as UTF-8.
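For illustration only, the same symptom can be seen outside Thunderbird. This Python sketch assumes a short Spanish sample word and uses Python's own codecs, not Thunderbird's decoder: a windows-1252 byte for an accented letter is not valid UTF-8 on its own, so a lenient UTF-8 decode substitutes U+FFFD.

```python
# "año" encoded as windows-1252: the ñ becomes the single byte 0xF1
raw = "año".encode("cp1252")

# Decoding those bytes as UTF-8 with replacement of malformed sequences
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in
as_1252 = raw.decode("cp1252")

print(repr(raw))      # the raw bytes
print(repr(as_utf8))  # contains U+FFFD where ñ should be
print(repr(as_1252))  # the original text
```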
Expected results
Expected the message to be decoded as windows-1252 thanks to detection using chardetng in the absence of a character encoding declaration.
Comment 1•3 years ago
I wonder if this affects nightly, since https://hg.mozilla.org/comm-central/rev/d0af6cc5fe02dd6e5b27ec24322c02491d8d7990 re-added autodetection for a case where we had stopped autodetecting.
Comment 2•3 years ago
Comment 3•3 years ago
This displays fine in our fork and should display fine in TB Daily with the change that was pointed out. However, if you install https://addons.thunderbird.net/en-GB/thunderbird/addon/charset-menu, you can see that this is detected as windows-1250 which leads to bug 1737245 when forwarded. Henri, why is that not detected as windows-1252?
Comment 4•3 years ago (Reporter)
(In reply to Rachel Martin from comment #3)
> Henri, why is that not detected as windows-1252?
The byte pairs for the accented characters happen to be even more plausible for windows-1250 than for windows-1252. This particular input happens to be unlucky. Fortunately, the visible failure mode is remarkably benign: Spanish ñ getting replaced with the Polish letter for that sound: ń.
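A small Python sketch of the visible effect described above. It says nothing about how chardetng scores byte pairs; it only assumes a sample word and shows what the same byte means in each code page, using Python's cp1252 and cp1250 codecs.

```python
# windows-1252 bytes for a Spanish word containing ñ (byte 0xF1)
raw = "niño".encode("cp1252")

# The same bytes interpreted as windows-1250, as the detector guessed
as_1250 = raw.decode("cp1250")

# The same bytes interpreted as windows-1252, as intended
as_1252 = raw.decode("cp1252")

print(as_1250)  # ń appears in place of ñ
print(as_1252)  # original spelling
```

Byte 0xF1 is ñ (U+00F1) in windows-1252 and ń (U+0144) in windows-1250, which is why the misdetection stays readable.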
Comment 5•3 years ago (Reporter)
(In reply to Rachel Martin from comment #2)
> More precisely here: https://hg.mozilla.org/comm-central/rev/d0af6cc5fe02dd6e5b27ec24322c02491d8d7990#l3.18
Indeed, this looks a lot like a duplicate of bug 1734361, but I'll let someone who has actually verified things make the determination.
Comment 6•3 years ago
Tested this locally on nightly and from what I can tell nightly is no longer affected. Thus duping.