Open Bug 260728 Opened 20 years ago Updated 2 years ago

Encoding coercion to the default encoding should only happen for 'standard'/unspecified encodings

Categories

(MailNews Core :: Internationalization, defect)

Tracking

(Not tracked)

People

(Reporter: eyalroz1, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intl, Whiteboard: [patchlove][has draft patch][needs new assignee?])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3) Gecko/20040817
Build Identifier: 

(this is part of a split-up of dupe bug 260706 into non-dupe pieces to be
tracked from bug 254868)

The current default encoding coercion scheme is not the most effective 'cheap'
coercion possible. Even without checking whether the selected encoding matches
the message body, it would give better results if the coercion option were not
"always coerce to default encoding" but rather "coerce to default encoding
whenever the headers say nothing or say the default, e.g. ISO-8859-1 or
US-ASCII". This is because it is extremely rare for a message to arrive with,
say, "charset=windows-1255" in the Content-Type header when its content is
neither windows-1255 nor plain English in ASCII but rather, say, UTF-8 or
Arabic in Windows-1256. In fact, I don't think this has ever happened to me.
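
A minimal sketch of the rule proposed above, in Python rather than the C++ of the mail code. The function name `charset_for_display` and the `UNINFORMATIVE` set are hypothetical; the set contains only the two labels named in this report, not a list taken from any patch.

```python
# Labels that many mail clients emit by default, whatever the body
# actually contains. Assumption: only the two examples from the report.
UNINFORMATIVE = {"iso-8859-1", "us-ascii"}


def charset_for_display(declared, default):
    """Pick the charset to decode with under the proposed rule.

    declared: charset label from the Content-Type header, or None.
    default:  the user's configured default encoding.
    """
    # Header says nothing: fall back to the user's default.
    if declared is None or not declared.strip():
        return default
    label = declared.strip()
    # Header says one of the 'standard' labels: treat it as unspecified.
    if label.lower() in UNINFORMATIVE:
        return default
    # Header names a specific charset: trust it.
    return label


print(charset_for_display("ISO-8859-1", "windows-1255"))
print(charset_for_display("utf-8", "windows-1255"))
```

With a default of windows-1255, a message labelled ISO-8859-1 would be coerced, while one labelled utf-8 or windows-1256 would be left alone.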

Reproducible: Always
Steps to Reproduce:
Blocks: 254868
Status: UNCONFIRMED → NEW
Ever confirmed: true
What's extremely rare for you may not necessarily be very rare for other people.
For instance, Japanese and Russian users may have different experiences
(especially with Usenet news postings). 
Keywords: intl
So what Jungshik (I hope that's the first name) is saying is that this should be
controlled by a pref.
Attached patch: 'draft' patch
Here's a working, albeit quite ugly, patch.
Assignee: smontagu → eyalroz
Status: NEW → ASSIGNED
Comment on attachment 160489 [details] [diff] [review]
'draft' patch

I don't expect a review+, but I want some input on how to un-uglify the code.
Specifically, there has to be a more elegant way to determine whether a charset
is one of the charsets commonly used by mail clients which don't know any
better (rather than the current use of a new function written by me which does
a few strcasecmp's).
Attachment #160489 - Flags: review?(smontagu)
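
One possible alternative to a chain of strcasecmp's is to resolve each label through an alias table and test membership in a small set. The sketch below illustrates the idea in Python, using the standard `codecs.lookup` alias resolution as a stand-in for whatever charset alias service the mail code has; it is not the code from the draft patch, and `is_standard_charset` is a hypothetical name.

```python
import codecs

# Canonical codec names for the labels treated as 'standard'.
# Assumption: just US-ASCII and ISO-8859-1, as in the report.
_STANDARD = {codecs.lookup(name).name for name in ("us-ascii", "iso-8859-1")}


def is_standard_charset(label):
    """Return True if label is an alias of US-ASCII or ISO-8859-1."""
    try:
        # codecs.lookup is case-insensitive and resolves known aliases.
        return codecs.lookup(label.strip()).name in _STANDARD
    except LookupError:
        # Unknown label: not one of the standard charsets.
        return False


print(is_standard_charset("latin1"))
print(is_standard_charset("windows-1255"))
```

Resolving aliases first means spellings such as "latin1" need no comparison of their own.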
It's not clear to me what problem is being addressed by this bug, other than a 
stated lack of an "effective, 'cheap' coercion."
The problem is the following: some people send me e-mail with charset
windows-1255 whose headers say they are iso-8859-1 or us-ascii; some people send
me charset windows-1255 messages whose headers say they are windows-1255; and
some people send me messages in utf-8. Now, if I choose coersion to
windows-1255, the utf-8 messages are displayed incorrectly, but if I don't, some
of the windows-1255 messages are displayed incorrectly (because it is assumed
they are iso-8859-1 or us-ascii).
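
The dilemma can be reproduced with a few bytes. This is an illustrative Python sketch, not Thunderbird code: the sample word and the `render` helper are made up for the example, and the three messages stand for the three kinds of mail described above.

```python
HEBREW = "שלום"  # sample text, chosen only for illustration

# (body bytes, charset label in the header)
messages = [
    (HEBREW.encode("cp1255"), "iso-8859-1"),    # windows-1255, mislabelled
    (HEBREW.encode("cp1255"), "windows-1255"),  # windows-1255, labelled correctly
    (HEBREW.encode("utf-8"), "utf-8"),          # utf-8, labelled correctly
]


def render(body, declared, force=None):
    """Decode body with the forced charset if given, else the declared one."""
    return body.decode(force or declared, errors="replace")


# Policy 1: always coerce to windows-1255. The utf-8 message breaks.
always = [render(body, label, force="windows-1255") for body, label in messages]

# Policy 2: never coerce. The mislabelled message breaks.
never = [render(body, label) for body, label in messages]

for name, results in (("always coerce", always), ("never coerce", never)):
    print(name, [text == HEBREW for text in results])
```

Each policy gets two of the three messages right and a different one wrong, which is the situation the comment describes.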
(In reply to comment #6)
> Now, if I choose coercion to windows-1255, the utf-8 messages are displayed
> incorrectly, but if I don't, some of the windows-1255 messages are displayed
> incorrectly (because it is assumed they are iso-8859-1 or us-ascii).

Displayed incorrectly in the compose window, or in the received message?
Displayed incorrectly as received messages, of course. Now that bug 260728 has
been fixed, once I correct the display and press 'reply', there's no problem (I
think).
(In reply to comment #8)
> Now that bug 260728 has been fixed, once I correct the display and
> press 'reply', there's no problem (I think).

Um, *this* is bug 260728.  :)    Which one do you mean has been fixed?
Bug 234958?

Uh, sorry, I meant to say now that bug 260725 has been fixed.
Product: MailNews → Core
Product: Core → MailNews Core
Not clear to me if Eyal is thinking this works.  Thoughts?

reset QA (was empty)
QA Contact: i18n
The patch now suffers from bit rot, I suppose, plus it was never ready for a review+ (e.g. I hard-coded the charsets for which to apply coercion). But I'm not sure I understand your question, Wayne.

In general, this is still something I believe should be done, although my extension (BiDi Mail UI) works around this issue by detecting the actual charset itself from the message content, even allowing for multiple charsets within the same message:

https://addons.mozilla.org/en/thunderbird/addon/310

It's pretty slow, though, being JS code which needs to inspect the message body and apply a bunch of regexes to it.
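
For a rough idea of what content-based detection involves, here is a minimal Python sketch. It is not the extension's code: it swaps the regex approach for strict decode attempts, handles only a single charset per message, and the name `guess_charset` is hypothetical.

```python
def guess_charset(body, fallback):
    """Guess a charset for raw body bytes.

    Tries pure ASCII first, then strict UTF-8, and otherwise returns
    the caller's fallback (e.g. the user's default encoding).
    """
    try:
        body.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8,
        # which most text in legacy 8-bit charsets is not.
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


print(guess_charset("שלום".encode("utf-8"), "windows-1255"))
print(guess_charset("שלום".encode("cp1255"), "windows-1255"))
```

A check like this cannot tell one 8-bit charset from another; it only separates ASCII and UTF-8 from everything else.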
Attachment #160489 - Flags: review?(smontagu)
Whiteboard: [patchlove][has draft patch][needs new assignee?]
Assignee: eyalroz → nobody
Status: ASSIGNED → NEW
Is this bug still relevant after the changes made to this area in the last couple of years?
Well, unless one of those changes did what I suggested, then yes. Frankly, though, I have not been following the code for a while already.
Severity: normal → S3