Replied-To Message coerced into message composition encoding in composer window

RESOLVED DUPLICATE of bug 254868

Status

MailNews Core
Internationalization
14 years ago
8 years ago

People

(Reporter: Eyal Rozenberg, Assigned: smontagu)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3) Gecko/20040817
Build Identifier: 

Here's the current logic for determining which encoding to use when reading a message:
1. infer the 'reported' encoding from the message headers (this is done with the
rather borked libmime, which needs a rewrite, but that is bug 248846)
2. if no encoding is named in the headers, then use the default
3. if 'Apply Default to All Messages' is set, then ignore 1. and 2. and use the
default
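
The three steps above can be sketched roughly as follows. This is a hypothetical Python illustration, not libmime's actual code, and the function and parameter names are made up. Step 3 is checked first because it overrides the other two.

```python
from typing import Optional


def choose_read_charset(header_charset: Optional[str],
                        default_charset: str,
                        apply_default_to_all: bool) -> str:
    """Hypothetical sketch of the current 3-step charset selection."""
    # Step 3: 'Apply Default to All Messages' wins over everything else.
    if apply_default_to_all:
        return default_charset
    # Step 1: use the charset reported in the message headers, if any.
    if header_charset:
        return header_charset
    # Step 2: nothing named in the headers, so fall back to the default.
    return default_charset
```

For example, choose_read_charset("UTF-8", "windows-1256", True) returns "windows-1256" even though the header correctly says UTF-8. That forced choice is the coercion that also carries over into the reply window.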

(this is unless I am mistaken and the encoding auto-detection for web pages is
also applied to 


There are numerous problems with this scheme. I can name some of them; others
could probably report more:

- After this logic is applied, if you make a manual choice of encoding using the
View->Character Encoding menu, it does not persist. Thus if you move to another
message and back, two bad things happen: first, all of the headers are parsed
again (although this is again a problem with libmime and the fact that no
internal representation of messages seems to be constructed), and second, the
3-step logic above is applied again, so you get the same wrong choice of encoding
you had to manually override
- The coercion of the default encoding also carries over to a 'reply-to'
composer window. For instance, if you've chosen 'Apply Default to All Messages'
with Windows-1256 as your default codepage and are replying to a UTF-8 message
containing characters in some Asian script, those characters may be forced into
the same gibberish you see when reading the message as Windows-1256
- The current coercion scheme is not the most effective 'cheap' coercion
possible: even without checking whether the selected encoding matches the
message body, it would give better results if the coercion option were not
"always coerce to the default encoding" but rather "coerce to the default
encoding whenever the headers say nothing or name a default-like charset,
e.g. ISO-8859-1 or US-ASCII". This is because it is extremely rare for a
message to arrive with, say, "charset=windows-1255" in the Content-Type header
while actually being neither Windows-1255 nor plain English in ASCII but rather,
say, UTF-8 or Arabic in Windows-1256. I don't think this has ever happened.
- The message body needs (subject to a pref) to be considered when deciding the
encoding. If it is not already done, it would be beneficial to apply the
encoding auto-detection to mail messages as well as to documents shown in the
browser. Of course, that would be rather useless (at least AFAIAC), since it
doesn't detect Hebrew (bug 86999), which means it will also mis-detect some
Hebrew messages as other encodings, e.g. Cyrillic Windows-1251. A simpler
alternative is some logic for deciding when the coercion was wrong: e.g. if you
coerce text into Windows-1255 but get lots of repeated sequences of punctuation
marks without letters, or many occurrences of characters that are completely
unused or very rare in Windows-1255 (superscript 3, inverted exclamation mark,
double dagger, etc.), then the coercion is probably a mistake and should be undone.
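
A minimal sketch of how the suggestions above might fit together. All names are hypothetical and nothing here is actual Mozilla code. It combines a per-message store that remembers the manual View->Character Encoding choice, coercion only when the declared charset is missing or generic, and a crude plausibility check that undoes the coercion. The set of "generic" charsets and both thresholds in looks_plausible are guesses, not values taken from this report.

```python
from typing import Dict, Optional

# ASSUMPTION: header charsets treated as "saying nothing" about the real
# encoding; the report gives ISO-8859-1 and US-ASCII as examples.
NON_INFORMATIVE = {"", "us-ascii", "iso-8859-1"}

# Characters the report calls unused or very rare in Windows-1255 text:
# superscript three, inverted exclamation mark, double dagger.
RARE_MARKS = set("\u00b3\u00a1\u2021")


def looks_plausible(text: str, max_rare_ratio: float = 0.01,
                    max_punct_run: int = 5) -> bool:
    """Crude gibberish check; both thresholds are illustrative guesses."""
    if not text:
        return True
    rare = sum(1 for ch in text if ch in RARE_MARKS)
    if rare / len(text) > max_rare_ratio:
        return False
    run = 0
    for ch in text:
        # Count consecutive characters that are not letters, digits or spaces.
        run = 0 if (ch.isalnum() or ch.isspace()) else run + 1
        if run > max_punct_run:
            return False
    return True


def choose_charset(raw: bytes, message_id: str, header_charset: Optional[str],
                   default_charset: str, coerce_to_default: bool,
                   overrides: Dict[str, str]) -> str:
    """Hypothetical sketch of the proposed selection order."""
    # 1. A manual menu choice, remembered per message, always wins.
    if message_id in overrides:
        return overrides[message_id]
    declared = (header_charset or "").lower()
    default = default_charset.lower()
    # 2. Coerce only when the header names nothing or a generic charset.
    coerced = coerce_to_default and declared in NON_INFORMATIVE
    candidate = default if (coerced or not declared) else declared
    # 3. Undo the coercion if the body decodes to implausible text.
    if coerced and declared and declared != default:
        try:
            text = raw.decode(candidate)
        except (LookupError, UnicodeDecodeError):
            return declared
        if not looks_plausible(text):
            return declared
    return candidate
```

With header_charset="ISO-8859-1" and a body that decodes to a long run of punctuation, the sketch falls back to "iso-8859-1". A message declared as UTF-8 is left alone even when coerce_to_default is set, which would avoid the reply-window problem described above.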


Reproducible: Always
Steps to Reproduce:

Comment 1

14 years ago
This sounds awfully familiar. At least part of it is a dupe of Bug 208917, and
I'm sure most other issues are dealt with in other bugs. You might want to go
over bug 254868 (which was recently fixed) and other bugs that are linked from
the tracking Bug 254868.

Prog.
Blocks: 254868
Whiteboard: DUPEME

Comment 2

14 years ago
Correction: The recently fixed bug is Bug 227265.

Sorry for the spam,

Prog.
(Reporter)

Comment 3

14 years ago

*** This bug has been marked as a duplicate of 254868 ***
Status: UNCONFIRMED → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → DUPLICATE

Comment 4

14 years ago
Eyal, since Bug 254868 is for tracking other bugs, please move your analysis and
suggestions (in comment 0) to another bug, such as Bug 208917. There's no reason
to have this content lost in dupelivion.

Prog.
Product: MailNews → Core
Product: Core → MailNews Core
Cleanup *dupeme* whiteboard flag from bugs that are marked as Resolved Duplicate!
Whiteboard: DUPEME