Closed Bug 86255 Opened 24 years ago Closed 23 years ago

MIME charset header is incorrect when msg contains only ASCII characters

Categories

(MailNews Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.1beta

People

(Reporter: momoi, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(1 file, 2 obsolete files)

** Observed with 6.1 PR1 & 6/15/2001 Win32 trunk builds **
Our spec calls for marking msgs that contain only ASCII characters as "charset=US-ASCII" when we send them out. It used to work, I believe, but currently for some reason it is not working.
1. Compose either an HTML or a plain text mail msg.
2. Choose something like ISO-2022-JP as the encoding, i.e. an encoding which normally contains non-ASCII characters.
3. Send it out and view the MIME charset info.
4. You will find that the charset name used is ISO-2022-JP, etc.
Please fix this for 0.9.2.
Keywords: intl
Target Milestone: --- → mozilla0.9.3
future
Target Milestone: mozilla0.9.3 → Future
Status: NEW → ASSIGNED
Nominating for nsbeta1.
Keywords: nsbeta1
nsbeta1- enhancement, no dataloss, no display problem
Keywords: nsbeta1 → nsbeta1-
*** Bug 137850 has been marked as a duplicate of this bug. ***
As a note, there _is_ a display problem. The ISO-2022-JP fonts have incredibly ugly glyphs for ASCII text.
If we fix this we should beware of pitfalls when one character set is almost a subset of another, but not quite. See bug 4238 for an example -- 0x5C in Japanese charsets represents Unicode U+00A5, not ASCII 0x5C.
Can anyone give me assurance that these messages, which contain no Japanese characters, will not trigger the font-downloading dialog?
I think font download is triggered by language info which is generated from Content-Type charset (not actual content of the body) for message viewing.
I have an ascii message which always triggers the font download dialog (then the dialog goes away as reported in another bug). Here's the charset info on the message: Content-Type: text/plain; charset=GB2312
So why are we not fixing this bug for the next release? We are proliferating msgs which contain only ASCII characters but are marked with one of the CJK encodings. It is **irresponsible** to have a font downloading dialog mechanism and then not create accurate content-type charset info. This could inconvenience many users who receive English msgs from Japan, China or Korea sent with our mail program. These users may not want to install CJK fonts. Nor should they be bothered with the dialog if the message does not require additional fonts. I am raising this to nsbeta1 again.
Keywords: nsbeta1- → nsbeta1
I think the font download dialog problem is a separate issue. It can be caused by messages sent by Mozilla, but we cannot prevent other mailers from sending such messages. 4.x and OE do not re-label an ISO-2022-JP message as us-ascii even when the body contains ASCII only.
My point is that we should not be creating headers that contain incorrect information no matter what other mailers are doing. We can at least prevent our own problems and reduce the chances of the font download dialog coming up unnecessarily. We should also change the font download dialog so that users can disable it if they have no plan to install fonts.
nsbeta1-. We will fix the font download dialog and let the user install the font. Once that is fixed, if the user still refuses to download and install the font but continues to receive mail messages labeled with such a charset, the font download dialog will still show up, just as it does in IE today.
Keywords: nsbeta1 → nsbeta1-
Target Milestone: Future → mozilla1.1beta
I would like to renominate this for nsbeta1 to be fixed for the next major release for the following 2 reasons:
1. When a message contains only ASCII characters, it should be displayed with a Western font rather than by a Japanese font.
2. I can't find the reference at the moment, but the best practice in this type of case is to specify the name of the smallest charset available for the characters used in the mail, which in this case would be US-ASCII.
About the point smontagu raised: if the original mail contained the 0x5C character in Japanese, I would not convert it to US-ASCII and would instead send it out as ISO-2022-JP. There is no way to tell whether the original 0x5C was meant as the yen sign or the backslash equivalent in Japanese.
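The 0x5C caveat above could be expressed as a guard along these lines. This is only a sketch under stated assumptions, not what any patch on this bug does: the function name CanRelabelAsUsAscii, the short list of Japanese charset names, and the exact-match comparison are all hypothetical. It checks the Unicode text before conversion, since an ISO-2022-JP encoded body is 7-bit even when it contains Japanese.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true when every UTF-16 code unit is in the ASCII range.
bool TextIsAsciiOnly(const std::u16string& text) {
  return std::all_of(text.begin(), text.end(),
                     [](char16_t c) { return c < 0x80; });
}

// Hypothetical policy sketch: relabel as US-ASCII only when the text is pure
// ASCII and, for Japanese charsets, contains no 0x5C (which may have been
// meant as a yen sign rather than a backslash).
bool CanRelabelAsUsAscii(const std::u16string& text,
                         const std::string& charset) {
  if (!TextIsAsciiOnly(text)) {
    return false;
  }
  // Illustrative list only; exact, case-sensitive match is an assumption.
  const bool isJapanese = charset == "ISO-2022-JP" ||
                          charset == "Shift_JIS" ||
                          charset == "EUC-JP";
  const bool has5C = text.find(u'\\') != std::u16string::npos;
  return !(isJapanese && has5C);
}
```

With this guard, an English-only body selected as ISO-2022-JP would be relabeled, while a body containing a 0x5C character would keep the ISO-2022-JP label.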
Keywords: nsbeta1- → nsbeta1
>1. When a message contains only ASCII characters, it should be displayed with a Western font rather than by a Japanese font.
If I send out a message from one machine using ISO-2022-JP with ASCII-only data and then view the sent message, should I see it in the same font as when I composed it? If yes, then we should use the Japanese font, right?
>2. in this type of case is to specify the name of the smallest charset available for the characters used in the mail, which in this case would be US-ASCII.
Agree, but is this strong enough for us to fix it now? How strong is the requirement to do this? Technically speaking, labeling a US-ASCII mail with ISO-2022-JP does not violate any standard. It also does not create any bad side effects. The current behavior has no competitive disadvantage (or advantage) compared with other mailers: N4.x, Eudora, and OE do not do that either. nsbeta1- until I see a strong reason in terms of 1) standards compliance, 2) competitive disadvantage/advantage, 3) user experience, 4) technical issues, or 5) interoperability with other software.
Keywords: nsbeta1 → nsbeta1-
>>1. When a message contains only ASCII characters, it should be displayed with a Western font rather than by a Japanese font.
>If I send out the message in one machine by using ISO-2022-JP with ASCII only data and view the sent message, should I see that in the same font when I compose it ? If yes, then we should use the japanese font, right ?
I don't think so. A message should be viewed in the best font available for its language or script. If an ISO-2022-JP msg contains nothing but ASCII characters, then that msg is most likely in English. Such a message is best viewed in a Western font. I think this is one of the strongest arguments for changing the current behavior.
>>2. in this type of case is to specify the name of the smallest charset available for the characters used in the mail, which in this case would be US-ASCII.
>Agree, but is this strong enough for us to fix it now? How strong is the requirement to do this ?
My position is consistently on the side of best practice. I believe that using the smallest charset available is "best practice". There are other mailers which send the US-ASCII charset name. Regarding standards, we don't implement just standards. When we implement certain logic, we should try to find the best way to implement it. There is nothing strange about asking Mozilla to behave as correctly as possible. In fact, that is the principle we should be following. I would rather not be sloppy in one area just because something is not a standard. I think that is the wrong way to think about this particular case. Let's see if we can get a volunteer to take this bug on so that at least the Mozilla trunk will implement the best practice.
<momoi>
>About the point smontagu raised, if the original mail contained the 0x5C character in Japanese, I would not convert it to US-ASCII and send it out instead as ISO-2022-JP. There is no way to tell whether the original 0x5C was meant as the yen sign or the back slash equivalent in Japanese.
I agree on this specific example, but I was trying to make a more general point: if we fix this we will have to do careful research and testing to be sure that we have covered all the special cases that may arise, of which the yen in Japanese charsets is one example.
<ftang>
>technically speaking, labeling a US-ASCII mail with ISO-2022-JP does not violate any standard.
What about RFC 2046, as quoted in bug 136664?
With this patch, mapping to "us-ascii" is applied to any charset instead of to ISO-8859-1 only.
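As a rough illustration of the mapping described here (not the code in the attached patch), the charset label could be chosen as below. The names BodyIsAsciiOnly and ChooseCharsetLabel are hypothetical, and the check is assumed to run on the Unicode body before it is converted to the selected encoding.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true when the Unicode body uses only ASCII code points.
bool BodyIsAsciiOnly(const std::u16string& body) {
  return std::all_of(body.begin(), body.end(),
                     [](char16_t c) { return c < 0x80; });
}

// Hypothetical sketch: label an ASCII-only body as "us-ascii" regardless of
// the charset the user selected; otherwise keep the selected charset.
std::string ChooseCharsetLabel(const std::u16string& body,
                               const std::string& selectedCharset) {
  return BodyIsAsciiOnly(body) ? std::string("us-ascii") : selectedCharset;
}
```

Under these assumptions, an English-only body composed with ISO-2022-JP selected would go out as us-ascii, and a body containing any non-ASCII character would keep the selected label.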
Attached patch: Also remove old 7bit check code. (obsolete)
Attachment #89427 - Attachment is obsolete: true
Comment on attachment 89819: Also remove old 7bit check code. R=ducarroz
Attachment #89819 - Flags: review+
+NS_IMETHODIMP nsMsgCompFields::GetBodyIsAsciiOnly(PRBool *_retval)
+{
+  *_retval = m_bodyIsAsciiOnly;
+  return NS_OK;
+}
You need an NS_ENSURE_ARG_POINTER for _retval, since this can be called from JS, right? Other than that, looks fine.
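For readers unfamiliar with the pattern being requested, here is a self-contained sketch of a getter that rejects a null out-parameter before writing through it. It does not use the Mozilla headers: Status, ENSURE_ARG_POINTER and CompFields are stand-ins invented for this example, loosely mirroring nsresult, NS_ENSURE_ARG_POINTER and nsMsgCompFields.

```cpp
#include <cstdint>

// Stand-in result codes for this sketch (not the real nsresult values).
enum class Status : std::uint32_t { Ok = 0, InvalidPointer = 1 };

// Stand-in for an argument check macro: return early on a null pointer.
#define ENSURE_ARG_POINTER(arg)                   \
  do {                                            \
    if (!(arg)) return Status::InvalidPointer;    \
  } while (0)

// Stand-in class holding the "body is ASCII only" flag.
class CompFields {
 public:
  explicit CompFields(bool bodyIsAsciiOnly)
      : m_bodyIsAsciiOnly(bodyIsAsciiOnly) {}

  // Checks the out-parameter before dereferencing it.
  Status GetBodyIsAsciiOnly(bool* _retval) const {
    ENSURE_ARG_POINTER(_retval);
    *_retval = m_bodyIsAsciiOnly;
    return Status::Ok;
  }

 private:
  bool m_bodyIsAsciiOnly;
};
```

The point of the check is that a caller passing a null pointer gets an error code back instead of a crash on the dereference.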
Attachment #89819 - Attachment is obsolete: true
Comment on attachment 89832: Added arg pointer check for GetBodyIsAsciiOnly. sr=bienvenu
Attachment #89832 - Flags: superreview+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I disagree with this patch. FYI, bug 247958 is now about backing this behaviour out. The mailer should not automatically change the user's character set selection. Sending an ASCII-only email to someone in an ASCII-superset character set violates no RFC. If the fonts you use to view it are ugly, get better fonts. The mailer cannot know what the user wants to do and so should leave this as is.
Product: MailNews → Core
Product: Core → MailNews Core