Closed
Bug 136664
Opened 22 years ago
Closed 10 years ago
charset in header should use lowest common denominator charset
Categories
(MailNews Core :: MIME, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bobj, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: intl)
Attachments
(1 file)
patch, 9.07 KB
When sending email in Windows-1252 encoding, the charset in the Content-Type header should use the "lowest common denominator" charset:
- If the mail contains only ASCII text, use "US-ASCII". Example: "abcde"
- If there are non-ASCII characters, but all are within the ISO-8859-1 charset, use "ISO-8859-1". Example: "àbcdê"
- If there are non-ISO-8859-1 characters (e.g., smart quotes, Euro), use "Windows-1252". Example: "‘àbcd€’"

Currently, Mozilla sends all three cases above as "Windows-1252".

Excerpt from http://www.ietf.org/rfc/rfc2046.txt:

> In general, composition software should always use the "lowest common denominator" character set possible. For example, if a body contains only US-ASCII characters, it SHOULD be marked as being in the US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859 family of character sets, is a superset of US-ASCII. More generally, if a widely-used character set is a subset of another character set, and a body contains only characters in the widely-used subset, it should be labelled as being in that subset. This will increase the chances that the recipient will be able to view the resulting entity correctly.

We should probably look into this for GB18030, GBK and GB2312 too.
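The three cases above can be sketched in a few lines of Python. This is only an illustration of the proposed selection rule, not Mozilla's code: the function name `lowest_common_charset`, the candidate ordering, and the UTF-8 fallback are assumptions made for the example.

```python
# Candidate labels ordered from narrowest to widest (assumed for this sketch).
CANDIDATES = ["us-ascii", "iso-8859-1", "windows-1252"]


def lowest_common_charset(text, candidates=CANDIDATES):
    """Return the first (narrowest) charset that can encode all of text."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character is outside this charset; try a wider one
        return charset
    # Hypothetical fallback when no candidate fits; not part of the bug report.
    return "utf-8"


if __name__ == "__main__":
    for sample in ["abcde", "\u00e0bcd\u00ea", "\u2018\u00e0bcd\u20ac\u2019"]:
        print(repr(sample), "->", lowest_common_charset(sample))
```

Note that Python's ISO-8859-1 codec also accepts the C1 control range (U+0080 to U+009F), so a real implementation might want a stricter test for that step.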
Comment 1•22 years ago
It would be a simple change for the windows-1252 case. http://lxr.mozilla.org/seamonkey/source/mailnews/compose/src/nsMsgCompUtils.cpp#788 I think it could also be applied to other charsets, but 7-bit charsets like ISO-2022-JP need to be handled differently (bug 86255).
Comment 2•22 years ago
So.. is this a duplicate of bug 86255 (sure sounds like it). Is it a dependency?
This bug is more general. Bug 86255 is specific to iso-2022-jp, so if anything that one should be made a dup of this bug. But note the comment from that bug, http://bugzilla.mozilla.org/show_bug.cgi?id=86255#c6: If we fix this we should beware of pitfalls when one encoding character set is almost a subset of another, but not quite. See bug 4238 for an example -- 0x5C in Japanese charsets represents Unicode U+00A5, not ASCII 0x5C.
There are actually 2 code points in the 7-bit range which map differently depending on whether the charset is Japanese:

CODE POINT  ASCII      JIS X 0201
==========  =========  ==========
0x5C        backslash  yen sign
0x7E        tilde      overline

CHARACTER  UNICODE VALUE
=========  =============
backslash  0x005C
yen sign   0x00A5
tilde      0x007E
overline   0x203E

So if we test for ASCII before converting from Unicode, then testing for < 0x007F will work. If we test after converting from Unicode, then we have to special-case 0x5C and 0x7E for Japanese.
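A rough sketch of the two checks, for illustration only. The function names are hypothetical, and the post-conversion check assumes plain single-byte data in the JIS X 0201 Roman set; it does not try to parse ISO-2022-JP escape sequences. The Unicode-side test uses `< 0x80`, which covers the whole 7-bit range.

```python
# Bytes whose meaning differs between ASCII and JIS X 0201 Roman.
JIS_X_0201_DIFFERENCES = {0x5C, 0x7E}


def is_ascii_before_conversion(text):
    """Check on the Unicode string: yen sign (U+00A5) and overline (U+203E)
    are already outside the ASCII range, so no special cases are needed."""
    return all(ord(ch) < 0x80 for ch in text)


def is_ascii_after_conversion(data, japanese):
    """Check on the converted bytes: for Japanese charsets, 0x5C and 0x7E
    may stand for yen sign and overline, so they cannot be treated as ASCII."""
    for byte in data:
        if byte >= 0x80:
            return False
        if japanese and byte in JIS_X_0201_DIFFERENCES:
            return False
    return True
```

For example, `is_ascii_before_conversion("\u00a5")` is false without any special handling, while the byte 0x5C passes the post-conversion check only when `japanese` is false.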
Another case we should consider: GB18030 -> GB2312. Currently GB2312 is more commonly used and is likely to be supported by any mail app that supports Simplified Chinese. Most Simplified Chinese text in messages today is probably covered by GB2312.
Comment 7•22 years ago
Reassign to nhotta. We probably need a mechanism to specify editable lists (e.g. pref, property) for the charsets which need the mapping. To implement this, we need to convert more than once, depending on the content of the header/body. We may use the fallback charset mechanism to try converting to multiple charsets (e.g. us-ascii, gb2312, gb18030). But that would be slow, because a Chinese message usually cannot be converted to us-ascii. The us-ascii check may be substituted by a 7-bit check for the Chinese case, but not for the Japanese case (ISO-2022-JP is a 7-bit encoding). An alternative approach would be to do the check while the user is editing. This may work for the body.
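The fallback idea can be sketched as follows. This is an assumption-laden illustration, not the Mozilla converter: `encode_with_fallback` is a hypothetical name, and the cheap 7-bit pre-check on the Unicode text stands in for the us-ascii conversion attempt.

```python
def encode_with_fallback(text, charsets):
    """Try each charset in order and return (label, bytes) for the first
    one that can represent the whole text."""
    # Cheap pre-check on the Unicode text, so a Chinese message does not
    # pay for a doomed us-ascii conversion attempt.
    is_7bit = all(ord(ch) < 0x80 for ch in text)
    for charset in charsets:
        if charset == "us-ascii" and not is_7bit:
            continue
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            continue  # fall through to the next, wider charset
    raise ValueError("no charset in the fallback list can encode the text")


if __name__ == "__main__":
    fallbacks = ["us-ascii", "gb2312", "gb18030"]
    for sample in ["hello", "\u4e2d\u6587", "\u0e01"]:
        label, _ = encode_with_fallback(sample, fallbacks)
        print(repr(sample), "->", label)
```

Because the pre-check runs on the Unicode string, it does not hit the ISO-2022-JP problem described above, where the converted output is 7-bit even for non-ASCII text.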
Assignee: ducarroz → nhotta
Comment 8•22 years ago
> The us-ascii check may be substituted by a 7-bit check for the Chinese case, but
> not for the Japanese case (ISO-2022-JP is a 7-bit encoding).
We may do the check while the body is Unicode and no special cases are needed.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1beta
This link: http://www.w3.org/TR/japanese-xml/#ambiguity_of_yen "Ambiguities in conversion from Shift-JIS to Unicode (Non-Normative)" provides good info on the yen sign, etc. ambiguity problem.
Comment 10•22 years ago
The patch moves the ASCII check to right before we convert Unicode to a mail charset. By doing this, we can skip the convert manager overhead if the body is ASCII only. After the conversion, compFields remembers the result so it can be used later in the code. Additional changes are needed to actually label any 7-bit-only body as us-ascii regardless of the mail charset. That part is not included in this patch.
Comment 11•22 years ago
The ASCII check is also needed after the Unicode conversion because the 8-bit string may turn into entities like &aacute; or &euro; in the case of HTML mail.
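A small Python illustration of why the second check matters. It is only an analogy for the mailer's behaviour: Python's `xmlcharrefreplace` error handler emits numeric references (such as `&#233;`) rather than named entities, and the helper names are hypothetical.

```python
def is_7bit_text(text):
    """ASCII check on the Unicode string, before conversion."""
    return all(ord(ch) < 0x80 for ch in text)


def convert_html_body(text, charset):
    """Convert an HTML body, replacing unencodable characters with
    numeric character references instead of failing."""
    return text.encode(charset, errors="xmlcharrefreplace")


def is_7bit_bytes(data):
    """ASCII check on the converted bytes, after conversion."""
    return all(byte < 0x80 for byte in data)


if __name__ == "__main__":
    body = "caf\u00e9"
    converted = convert_html_body(body, "us-ascii")
    # The first check sees a non-ASCII character; the second sees only 7-bit bytes.
    print(is_7bit_text(body), converted, is_7bit_bytes(converted))
```

The body fails the pre-conversion check but is pure 7-bit afterwards, so only a check on the converted output can tell that it may be labelled us-ascii.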
Comment 12•22 years ago
I realized that the current patch is actually for bug 86255, "MIME charset header is incorrect when msg contains only ASCII characters", so I put a new patch on that bug. For this bug, we can do the mapping when we set a charset. For example, if the user chooses GB18030, then we can map it to GB2312. Later, when we convert, it may fail depending on the text contents. We can supply a fallback list for that case.
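One way to picture the mapping plus fallback list, as a sketch under assumptions: the table `PREFERRED`, its entries, and the function `charset_label` are all hypothetical and only mirror the GB18030 -> GB2312 example given here.

```python
# Hypothetical mapping from the charset the user chose to the ordered list
# of labels to try: the narrower, widely supported charset comes first and
# the chosen charset itself is the fallback.
PREFERRED = {
    "gb18030": ["gb2312", "gb18030"],
    "windows-1252": ["iso-8859-1", "windows-1252"],
}


def charset_label(text, chosen):
    """Return the label to send for text, or None if nothing in the list fits."""
    for charset in PREFERRED.get(chosen.lower(), [chosen]):
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # conversion failed for this content; try the fallback
        return charset
    return None
```

With this table, `charset_label("\u4e2d\u6587", "GB18030")` would yield `"gb2312"`, while text containing a character outside GB2312 falls back to `"gb18030"`.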
Updated•22 years ago
Target Milestone: mozilla1.1beta → ---
Comment 13•20 years ago
This behaviour causes compatibility problems with other MUAs. See bug 247958 for details. In summary, OE silently discards information when replying. The patch at bug 247958 disables this behaviour (added by bug 86255) by default and adds a pref that lets a user re-enable it if desired.
Updated•20 years ago
Product: MailNews → Core
Updated•16 years ago
Assignee: nhottanscp → nobody
Status: ASSIGNED → NEW
QA Contact: ji → mime
Updated•16 years ago
Product: Core → MailNews Core
Comment 14•10 years ago
The world moved on. When in doubt use UTF-8 and everyone will be happy. -> WONTFIX
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX