247958 - labelling 'ASCII only messages' as in the US-ASCII charset leads to an interoperability problem with MS OE (shouldn't downgrade to ASCII)

Description

21 years ago

Currently, Mozilla labels ASCII-only messages with 'Content-Type: text/*; charset=US-ASCII' even if the user has selected a non-ASCII 8-bit character encoding. Arguably, this is the 'right' thing to do (well, it's not quite a matter of right vs. wrong). However, it causes a rather serious incompatibility with MS OE.

MS OE replaces all characters not representable in the currently selected character encoding with question marks. When an MS OE user replies to a message sent with Mozilla/TB and labelled as US-ASCII (e.g. a Chinese user sends an email in English to her friend, who uses MS OE and replies in Chinese), MS OE turns every non-ASCII character in the reply into a question mark without any warning whatsoever. The result is an irreversible loss of information. If MS OE is set up to send both text/plain and text/html, the content can be recovered, because in text/html MS OE uses NCRs (numeric character references) to represent characters not representable in the current character encoding. However, if text/plain is not accompanied by text/html, there's no way to recover the content other than asking the sender to resend it, which may not be possible in some situations.

Therefore, I think it's better by default to leave the user-selected MIME charset (character encoding) alone even if the message body contains only ASCII characters. In case some users want Mozilla/TB to behave differently, we can add a pref, 'mail.label_ascii_only_mail_as_us_ascii', which is false by default.
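The difference between the lossy text/plain case and the recoverable text/html case can be sketched in a few lines. This is only an illustration: Python's codecs stand in for the mail client's converter, and the sample reply text is made up.

```python
import html

# Hypothetical reply typed by the MS OE user (sample text, not from the bug).
reply = "Thanks! 감사합니다"

# text/plain labelled US-ASCII: each unrepresentable character becomes '?'.
plain_part = reply.encode("ascii", errors="replace").decode("ascii")

# text/html: numeric character references (NCRs) keep the code points.
html_part = reply.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Only the NCR form can be turned back into the original text.
recovered = html.unescape(html_part)

print(plain_part)
print(html_part)
print(recovered == reply)
```

Once the question marks are in the text/plain part, nothing in the message records which characters they replaced.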

Jungshik Shin

Assignee

Comment 1

21 years ago

Attached patch (obsolete)

I added a pref. to control the behavior. It's off by default as I wrote in comment #0.
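A minimal sketch of the decision the pref controls, assuming the logic is as simple as comment #0 describes. The function name and signature are hypothetical and are not taken from the patch; only the pref name comes from this bug.

```python
def choose_charset_label(body: str,
                         user_charset: str,
                         label_ascii_only_mail_as_us_ascii: bool = False) -> str:
    """Pick the charset parameter for the Content-Type header.

    The keyword argument mirrors the proposed pref
    'mail.label_ascii_only_mail_as_us_ascii' (false by default).
    """
    if label_ascii_only_mail_as_us_ascii and body.isascii():
        # Old behavior: downgrade ASCII-only bodies to US-ASCII.
        return "US-ASCII"
    # New default: leave the user-selected charset alone.
    return user_charset


print(choose_charset_label("Hello", "EUC-KR"))
print(choose_charset_label("Hello", "EUC-KR", label_ascii_only_mail_as_us_ascii=True))
```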

Jungshik Shin

Assignee

Comment 2

21 years ago

Comment on attachment 151365 (patch): asking for r/sr.

Attachment #151365 - Flags: superreview?(bienvenu)

Attachment #151365 - Flags: review?(sspitzer)

Christian :Biesinger (don't email me, ping me on IRC)

Comment 3

21 years ago

> + nsCOMPtr<nsIPref> prefs(do_GetService(kPrefCID, &rv));

Can you make this use nsIPrefService/nsIPrefBranch while you're here?

Brodie

Comment 4

21 years ago

Do we really need *another* pref? This is the right thing to do, we should always send the email using the user selected charset.

Jungshik Shin

Assignee

Comment 5

21 years ago

Attached patch (using nsIPrefService/nsIPrefBranch) (obsolete)

per cbie's suggestion, I now use nsIPrefService/nsIPrefBranch. nsIPref was used elsewhere in the function being patched. In other places in the file, nsIPref is used, but that has to be a separate bug.

Attachment #151365 - Attachment is obsolete: true

Jungshik Shin

Assignee

Comment 6

21 years ago

Attached patch v3 (using nsIPrefBranch only)

I realized that this is not perf. critical at all so that reducing the code size is more important (although the difference would be very small).

Attachment #151422 - Attachment is obsolete: true

Jungshik Shin

Assignee

Updated

21 years ago

Attachment #151365 - Flags: superreview?(bienvenu)

Attachment #151365 - Flags: review?(sspitzer)

Jungshik Shin

Assignee

Updated

21 years ago

Attachment #151423 - Flags: superreview?(bienvenu)

Attachment #151423 - Flags: review?(sspitzer)

Brodie

Comment 7

21 years ago

My use-case is similar... I send out the same English email in UTF-8 encoding to my Korean, Japanese and German translators. They all just reply and type the text into the email and send. Since most mailers will by default reply to an email using the characterset of the original email, I get the replies in a readable charset. If I forget to set UTF-8 for the original or it is undone by this bug in our mailer, then I will often get garbage. None of my translators use OE.

Thus this shouldn't be labelled as an interop problem with OE. It is actually a 100% legitimate bug in the mailer behaviour. The mailer should *ALWAYS* respect the characterset that the user has selected. There is no time that it should be automatically downgrading it to anything else even if it appears to fit.

The patch should be to just remove this behaviour altogether. The comment about OE isn't required and there should be NO pref (this project is plagued by prefs).
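The reply scenario above can be modelled roughly as follows. The sketch assumes a mailer that reuses the charset label of the original message and replaces whatever does not fit; `encode_reply` and the sample replies are hypothetical.

```python
def encode_reply(reply_text: str, original_charset: str) -> bytes:
    """Encode a reply in the charset of the message being answered.

    Hypothetical model: characters outside that charset are replaced by '?'.
    """
    return reply_text.encode(original_charset, errors="replace")


# Made-up replies from the three translators.
replies = {"German": "Grüße", "Korean": "감사합니다", "Japanese": "ありがとう"}

for language, text in replies.items():
    downgraded = encode_reply(text, "us-ascii").decode("us-ascii")
    kept = encode_reply(text, "utf-8").decode("utf-8")
    print(f"{language}: US-ASCII label -> {downgraded!r}, UTF-8 label -> {kept!r}")
```

With the original labelled UTF-8, every reply survives; with the label downgraded to US-ASCII, only the ASCII letters do.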

Brodie

Comment 8

21 years ago

This 'feature' was originally added for bug 86255.

Jungshik Shin

Assignee

Comment 9

21 years ago

(In reply to comment #7)
> text into the email and send. Since most mailers will by default reply to an
> email using the characterset of the original email, I

Pine doesn't. Neither does mutt. Mozilla used to, and still does by default, but that can be changed now (it even has a UI for that).

> It is actually a 100% legitimate bug in the mailer behaviour.

Not everyone agrees with you, as you already found out in bug 86255. For one, I'd NOT change the behavior if there weren't an interoperability problem with stupid mail programs like MS OE (and others your correspondents use) which silently convert characters to question marks __behind the back of their users__. (Mozilla always warns users of the problem in such a case.)

> The mailer should
> *ALWAYS* respect the characterset that the user has selected. There is no time
> that it should be automatically downgrading it to anything else even if it
> appears to fit.

Pine automatically downgrades to US-ASCII, too (it has done that for the last 10 years if not longer). Anyway, with the patch, downgrading is off by default and only die-hard 'purists' would bother to turn it on. Most users wouldn't realize there is such a pref, so it doesn't matter except for a small increase in the code size.

David :Bienvenu

Updated

21 years ago

Attachment #151423 - Flags: superreview?(bienvenu) → superreview+

Jungshik Shin

Assignee

Comment 10

21 years ago

Comment on attachment 151423 (patch v3, using nsIPrefBranch only): asking mscott for r. If you don't find any problem, can you check it into the aviary-1.0 branch when giving r?

Attachment #151423 - Flags: review?(sspitzer) → review?(mscott)

Jungshik Shin

Assignee

Comment 11

21 years ago

*** Bug 248794 has been marked as a duplicate of this bug. ***

Alan Tam

Comment 12

21 years ago

Thanks for the patch! Hope it is in aviary soon as well.

Scott MacGregor

Comment 13

21 years ago

Comment on attachment 151423 (patch v3, using nsIPrefBranch only): jshin, this seems like a pretty obscure pref. Seems like we should either do this by default or not bother. I don't see our target user going out of their way to set this pref by hand in prefs.js. My two cents after glancing at this bug.

Alan Tam

Comment 14

21 years ago

I think we have bug 86255 because some Japanese users want otherwise, owing to the specific needs of Japanese charsets and/or fonts. But the rest of the world may not see it fit. So probably there is a need for such an option.

Jungshik Shin

Assignee

Comment 15

21 years ago

(In reply to comment #13)
> (From update of attachment 151423)
> jshin, this seems like a pretty obscure pref. Seems like we should either do
> this by default or not bother. I don't see our target user going out of their
> way to set this pref by hand in prefs.js.

It's a tough call. For sure, it's obscure, which is why we'd never bother to make a UI for it (well, I know you think this is so obscure that it doesn't even deserve a pref entry).

One of the rationales for downgrading to the smallest MIME charset must have been that some mail clients (especially web-mail clients) may alert users that 'this message is in a foreign character encoding... blahblah' even though the entire content is in US-ASCII, because they rely on the value of the 'charset' parameter in the C-T header. Or some MUAs may try to invoke an external program (or iconv()-like functions) for the encoding conversion, only to find that the converter is not available on the system (a number of commercial Unix installations in the US and Europe don't have converters for non-European encodings installed by default), even though there's no need because the entire content is in US-ASCII. For those who need to correspond with such users, this pref can be useful. However, those users are rare and we may just as well remove the pref and do without the downgrading (to improve interoperability with the major MUAs in the market).

So, what to do? If you feel strongly against adding the pref, I'll go without it, which will reduce the code size slightly.

Brodie

Comment 16

21 years ago

RFC2046 states that we SHOULD downgrade the charset to the lowest common denominator (see last paragraph of page 10): http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10 bug 136664 is specifically about following this RFC and downgrading; the general case of bug 86255 (which is about Japanese). I've changed my opinion from that stated above since reading the RFC and the bug, and now feel that we should at least give users the option of following the RFC. However, due to the stated MUA compatibility problems we should NOT enable this by default (i.e. by default use the charset that the user selected). However, we should have the pref which should be generalized such that it can be used by bug 136664 if/when it ever gets implemented for the general case. How about we name the preference "mail.auto_use_simplest_charset" (or "mail.auto_use_best_charset"), default to false.

Jungshik Shin

Assignee

Comment 17

21 years ago

(In reply to comment #16)
> RFC2046 states that we SHOULD downgrade the charset to the lowest common
> denominator (see last paragraph of page 10):
> http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10

I forgot about RFC 2046. Thanks for the reminder. However, we have to note that it was written quite a while ago, and the best practice then is not necessarily the best practice now (it may or may not be). Nonetheless, I agree with its spirit and am inclined to keep the pref.

> bug 136664 is specifically about following this RFC and downgrading; the general
> case of bug 86255 (which is about Japanese).

Actually, bug 86255 is not just about Japanese. Our current implementation is not generic enough in that downgrading paths other than to US-ASCII (from virtually all character encodings) are not supported, such as Windows-1252 -> ISO-8859-1 (and its Greek equivalent), Windows-874 -> ISO-8859-11 -> TIS-620, and GB18030 -> GBK -> GB2312. Over the last few years, its usefulness has diminished significantly, though. Perhaps ill-I18Nized Eudora and web mail users would be beneficiaries of this feature.

> used by bug 136664 if/when it ever gets implemented for the general case. How
> about we name the preference "mail.auto_use_simplest_charset" (or
> "mail.auto_use_best_charset"), default to false.

We can change the pref name when we 'fix' bug 136664. For now, we can just use what I have.
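A generalized downgrade along the paths named above could look roughly like this. It is a sketch under stated assumptions, not what Mozilla implements: the table and the function are hypothetical, Python codec names stand in for the converters, and only the Western and Chinese chains are modelled (the Thai chain is left out).

```python
# Each chain runs from the smallest charset to the user-selected one.
# Entries are (MIME label, Python codec name); the table is illustrative.
DOWNGRADE_CHAINS = {
    "windows-1252": [("US-ASCII", "ascii"),
                     ("ISO-8859-1", "latin-1"),
                     ("windows-1252", "cp1252")],
    "gb18030": [("US-ASCII", "ascii"),
                ("GB2312", "gb2312"),
                ("GBK", "gbk"),
                ("GB18030", "gb18030")],
}


def smallest_charset(text: str, user_charset: str) -> str:
    """Return the first charset in the chain that can encode the whole text."""
    chain = DOWNGRADE_CHAINS.get(user_charset.lower())
    if chain is None:
        # No known chain: only the US-ASCII downgrade is possible.
        return "US-ASCII" if text.isascii() else user_charset
    for label, codec in chain:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        return label
    return user_charset


print(smallest_charset("cafe", "windows-1252"))
print(smallest_charset("café", "windows-1252"))
print(smallest_charset("€5", "windows-1252"))
```

With the pref off, none of this runs and the user-selected charset is used as the label.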

Boris 'pi' Piwinger

Comment 18

21 years ago

Sorry for being late, but I believe the reasoning in comment 0 is wrong. Here is my understanding: OE does not understand MIME and is not capable of producing it in any proper way. Now the problem is that OE, if not properly configured, does not set any MIME header whatsoever (so in almost all cases it does not). This means that you are not able to send any non-ASCII message from OE. This is what will look like question marks on the receiver side if the charset is not guessed "correctly". That means it is not important whether we set the proper charset or not.

I have seen millions of postings from OE which answer US-ASCII messages with non-ASCII charsets. The problem was always an unconfigured OE which did not declare the charset, and never that OE misbehaved because of the charset US-ASCII. So please double-check carefully what happens before making Mozilla as bad as OE with regard to MIME.

pi

Jungshik Shin

Assignee

Comment 19

21 years ago

(In reply to comment #18)
> Sorry for being late, but I believe the reasoning in comment 0 is wrong. Here is
> my understanding: OE does not understand MIME and is not capable of producing it
> in any proper way. Now the problem is that OE if not properly configured does

snip...

Sorry to snip your 'analysis', but you got it wrong. MS OE is capable of producing and interpreting MIME. The loss of information (with characters not covered by the current MIME charset converted to question marks) occurs even when MS OE is configured to produce MIME-compliant messages. Besides, it's not just MS OE but other MUAs that have similar problems.

Jungshik Shin

Assignee

Comment 20

21 years ago

(In reply to comment #16)
> RFC2046 states that we SHOULD downgrade the charset to the lowest common
> denominator (see last paragraph of page 10):
> http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10

I forgot about the provision in RFC 2046. Thanks for the reminder. However, we have to note that it was written quite a while ago, and the best practice then is not necessarily the best practice now (it may or may not be). Nonetheless, I agree with its spirit and am inclined to keep the pref.

> bug 136664 is specifically about following this RFC and downgrading; the general
> case of bug 86255 (which is about Japanese).

Our current implementation is not generic enough in that downgrading paths other than to US-ASCII (from virtually all character encodings) are not supported, such as Windows-1252 -> ISO-8859-1 (and its Greek equivalent), Windows-874 -> ISO-8859-11 -> TIS-620, and GB18030 -> GBK -> GB2312. Over the last few years, its usefulness has diminished significantly, though. Perhaps ill-I18Nized Eudora and web mail users would be beneficiaries of this feature.

> used by bug 136664 if/when it ever gets implemented for the general case. How
> about we name the preference "mail.auto_use_simplest_charset" (or
> "mail.auto_use_best_charset"), default to false.

We can change the pref name when we 'fix' bug 136664. For now, we can just use what I have, I guess.

Status: NEW → ASSIGNED

Boris 'pi' Piwinger

Comment 21

21 years ago

> Sorry to snip your 'analysis', but you got it wrong. MS OE is capable of
> producing and interpreting MIME.

It does not produce those by default and it fails badly in many situations to handle them (actually this is what this bug is about).

> The loss of information (with characters not
> covered by the current MIME charset converted to question marks) occurs even
> when MS OE is configured to produce MIME-compliant messages.

Can you be more specific? I have seen uncountable broken OE messages, but not this one. Which characters exactly should be shown as question marks? The ones some user of OE types in an answer (as I said OE does not understand MIME)? But that is something I have seen to work in all cases (ignoring the fact that OE does not by default declare them correctly). So there must be much more to this problem. Can you provide examples (should be easily found in usenet)? Who sees the question marks? Already the typing OE user?

> Besides, it's not
> just MS OE but other MUAs that have similar problems.

Never seen such problems. What exactly goes wrong with which reader?

pi

Jungshik Shin

Assignee

Comment 22

21 years ago

(In reply to comment #21)
> > Sorry to snip your 'analysis', but you got it wrong. MS OE is capable of
> > producing and interpreting MIME.
>
> It does not produce those by default and it fails badly in many situations to

This is not the place to debate about MS OE, but it has gone a long way since mid-1990's. I can't help wondering what version of MS OE you use.

> handle them (actually this is what this bug is about).

I reported this bug and you think I don't know what this bug is about.

> > The loss of information (with characters not
> > covered by the current MIME charset converted to question marks) occurs even
> > when MS OE is configured to produce MIME-compliant messages.
>
> Can you be more specific? I have seen uncountable broken OE messages, but not

I don't know how to be more specific. The above sentence should be more than enough. If not, see comment #0 and comment #7. You're supposed to read them before commenting here. Have you?

Alan Tam

Comment 23

21 years ago

In fact I really see three different behaviors from MS OE when the charset chosen is not a superset of the text typed:

1. Warn you and ask if you want to change to UTF-8, just like what Mozilla does.
2. Convert all the characters brute force to '?'.
3. Send the message verbatim without any MIME information.

I suspect some are version problems and some are configuration problems, either OE or Windows. After all, no matter how OE responds, if we send in UTF-8, then the precondition of this problem does not exist, hence no problem. Behavior 3 may still exist, but it is beyond the scope of this bug.
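The three behaviors can be modelled in a short sketch. Everything here is a hypothetical stand-in: `send_reply` is not an MS OE interface, and `system_codec` (cp949 in this example) is an assumed Windows code page for the verbatim case.

```python
from typing import Optional, Tuple


def send_reply(text: str, selected_charset: str, behavior: int,
               system_codec: str = "cp949") -> Tuple[Optional[str], bytes]:
    """Return (charset label or None, body bytes) for the observed behaviors."""
    try:
        # The selected charset covers the text: nothing special happens.
        return selected_charset, text.encode(selected_charset)
    except UnicodeEncodeError:
        pass
    if behavior == 1:
        # Behavior 1: the user accepts the switch to UTF-8.
        return "UTF-8", text.encode("utf-8")
    if behavior == 2:
        # Behavior 2: unrepresentable characters are turned into '?'.
        return selected_charset, text.encode(selected_charset, errors="replace")
    if behavior == 3:
        # Behavior 3: raw bytes in the system code page, no MIME label.
        return None, text.encode(system_codec)
    raise ValueError("behavior must be 1, 2 or 3")


print(send_reply("감사", "us-ascii", 2))
print(send_reply("감사", "utf-8", 2))
```

The last call illustrates the point about UTF-8: when the original charset already covers the reply, the lossy branch is never reached.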

Brodie

Comment 24

21 years ago

Tam & Boris: The point is that this fixes a bug seen by customers. No functionality is being lost, it is just fixing a bug. The reason the pref is being added is so that people like yourselves can continue to use the original behaviour if you wish.

Alan Tam

Comment 25

21 years ago

I understand Boris's concern because we are changing Mozilla's default behavior. This can solve problems but can also create problems. My explanation was that we do not solve all problems, but we are not creating more. For sure I understand the bug - I am the reporter of bug 248794.

Boris 'pi' Piwinger

Comment 26

21 years ago

>> > Sorry to snip your 'analysis', but you got it wrong. MS OE is capable of
>> > producing and interpreting MIME.
>>
>> It does not produce those by default and it fails badly in many situations to
>
> This is not the place to debate about MS OE, but it has gone a long way since
>mid-1990's. I can't help wondering what version of MS OE you use.

I have seen many versions, including the most up-to-date. Not a single one produces MIME headers by default. Not a single version can handle quoted-printable (it will not set quoting symbols) and all ignore MIME headers completely (if it believes it sees UUencode). So there is very good reason to believe that this is not something for Mozilla to change.

>> handle them (actually this is what this bug is about).
>
> I reported this bug and you think I don't know what this bug is about.

It is about the cause of problems. It is still not fully clear what the conditions are which do cause exactly which problem. Especially there is no bug in Mozilla. So before we make Mozilla worse we need a very good reason to do so.

>> Can you be more specific? I have seen uncountable broken OE messages, but not
>
> I don't know how to be more specific.

Just answer my questions:
1) Where do the question marks appear for the first time, and under which conditions?
2) Give example messages of a) a Mozilla message and b) the OE reply to a).

>If not, see comment #0 and comment #7. You're supposed to read them
>before commenting here. Have you?

I also referred to it, if you read my comments. Once more: I have seen thousands of messages where OE users answer messages in US-ASCII and use characters in other charsets.

pi

Jungshik Shin

Assignee

Comment 27

21 years ago

re comment #26: If comment #0, comment #7, bug 248794 (that includes what I forgot to mention in comment #0, namely a serious problem in the message header) and comment #23 didn't do the job for you (you keep believing that Tam and I made up a problem that doesn't exist or saying as if you did), I don't think anything more would be just wasting my, your and others' time. Do you have any idea what attachment 151423 does? If so, please tell me why you are against the change.

Jungshik Shin

Assignee

Comment 28

21 years ago

(In reply to comment #27)
> that doesn't exist or saying as if you did), I don't think anything more would
> be just wasting my, your and others' time.

s/don't//

Sorry for spamming.

Boris 'pi' Piwinger

Comment 29

21 years ago

>If comment #0, comment #7, bug 248794 (that includes what I forgot to mention in
>comment #0, namely a serious problem in the message header)

I cannot see a real problem there. For many years many people have sent ASCII and it works well with people using OE and answering in non-ASCII. So if there is a problem it must be way more specific than claimed.

>and comment #23
>didn't do the job for you (you keep believing that Tam and I made up a problem
>that doesn't exist or saying as if you did),

I have seen the problem not to exist in uncountable cases. So the problem cannot be as general as claimed.

>I don't think anything more would
>be just wasting my, your and others' time.

Why don't you just give an example, so we can look at it?

>Do you have any idea what attachment 151423 does? If so, please tell me why
>you are against the change?

It is best practice (and has ever been) to declare US-ASCII as the charset if this is what the message is. This has worked ever since. Mozilla is not special in behaving this way. So I ask for exact instructions to reproduce. The patch claims this would be 'a major 'interoperability problem' with MS OE'. It is more than strange that we don't have lots of really old bugs about that; there was never any report of that kind of problem in the various mail and news reader newsgroups I follow, including those about OE and Mozilla.

pi

Jungshik Shin

Assignee

Comment 30

21 years ago

What you have seen is case 3 in comment #23 (MS OE probably does that when it's configured to send out messages not compliant to MIME) and you've been insisting that that's __all__ MS OE ever produces in replies to messages labelled with 'US-ASCII'. What I and Tam have been receiving is case 2 in comment #23 (that's how MS OE behaves when it's set to send out messages more or less compliant to MIME). Now got it?

> Why don't you just give an example, so we can look at it?

Why do I have to prove it to you? You're the only skeptic here. Why don't you try it yourself? When you try, make sure to configure your MS OE to use an encoding for outgoing emails other than 'Western(ISO)' and 'Western(Windows)' (for instance, set it to Simplified Chinese or Korean and include SC or Korean in your reply to a message labelled with ASCII).

> Mozilla is not special in behaving this way.

See comment #9. I'm well aware of that. I use Pine more often than Mozilla and I've had exactly the same problem with Pine, too.

> It is best practice (and has ever been) to declare US-ASCII as the charset
> if this is what the message is.

Is this all you have? This argument is pretty weak. I've already given a few reasons to abide by that 'practice' in this very bug, but when weighed against the problem we have with other MUAs, they're rather weak, too, IMHO, which is why I want to go with the patch. Besides, there's a pref you can turn on, as mentioned a few times before.

Boris 'pi' Piwinger

Comment 31

21 years ago

>What you have seen is case 3 in comment #23 (MS OE probably does that when it's
>configured to send out messages not compliant to MIME)

Yes, this is the default behavior. Even if it is configured to produce correct messages according to MIME standards, it will go to case 1. I could not find a single situation where that case 2 applies. The suggested solution in comment 23 would be nice if everybody would use and understand Unicode which is not the case, OE and Mozilla can, others cannot.

>and you've been insisting
>that that's __all__ MS OE ever produces in replies to messages labelled with
>'US-ASCII'. What I and Tam have been receiving is case 2 in comment #23 (that's
>how MS OE behaves when it's set to send out messages in more or less compliant
>to MIME). Now got it?

Not fully. The main question is what are the conditions to cause this behavior and how exactly does it happen. There are different things to consider: The charset used to display the message (AFAICS this cannot be ASCII, only an extension, no matter what the encoding says); this is normally used for sending (declared or not). For composing a message the Windows system charset will be used in general. Later, for sending, it might be converted. The question is what exactly happens if the system charset does not suffice.

Now what does the OE user see? Does he see the Chinese (or whatever language he uses) characters during composition? Is this changed (if so, how) upon sending? What MIME headers does OE set (if any)? Will OE be able to display that message when receiving? This is where I would like to see an example attached to this bug.

>> Why don't you just give an example, so we can look at it?
>
> Why do I have to prove it to you?

Why can't you?

> You're the only skeptic here.

And I gave you the reason. If this were such a major problem there would have been many reports for a long time. Not only for Mozilla but for many other readers who send (and declare) US-ASCII by default. I have been involved in such projects for many years and haven't seen those reports.

>Why don't you try it yourself?

I am waiting for exact steps to reproduce.

>> It is best practice (and has ever been) to declare US-ASCII as the charset
>> if this is what the message is.
>
> Is this all you have? This argument is pretty weak.

I find it very strong not to change standard behavior for a very broken reader where we even cannot clearly say which conditions do cause the problem.

pi

Jungshik Shin

Assignee

Comment 32

21 years ago

(In reply to comment #31)
> >What you have seen is case 3 in comment #23 (MS OE probably does that when it's
> >configured to send out messages not compliant to MIME)
>
> Yes, this is the default behavior.

It's not the default behavior at least in the Korean version of MS OE. By default it sends out MIME-compliant email messages (except for mail headers) in multipart/alternative (text/plain + text/html).

> would be nice if everybody would use and understand Unicode which is not the
> case, OE and Mozilla can, others cannot.

Others? It's 2004 and the world has changed a lot. Many MUAs (including text-terminal-based clients like mutt and the I18Nized version of Pine as included in SuSE Linux, and Solaris' default mail client dtmail, let alone other GUI-based mail clients on Mac OS X, Linux/Unix and Windows) are now capable of handling multiple character encodings including UTF-8. Eudora-Windows may be one of the few 'major MUAs' that still don't support MIME and I18N. And there are stupid web-mail services like hotmail, yahoo mail, etc. Some open-source web mail programs can handle MIME and multiple character encodings well (now), but commercial services lag behind.

> >that that's __all__ MS OE ever produces in replies to messages labelled with
> >'US-ASCII'. What I and Tam have been receiving is case 2 in comment #23 (that's
> >how MS OE behaves when it's set to send out messages in more or less compliant
> >to MIME). Now got it?
>
> Not fully. The main question is what are the conditions to cause this behavior

Nothing short of a very detailed step-by-step instruction seems to work for you. Here's one.

1. Send an ASCII-only message to yourself with Mozilla.
2. Configure your MS OE (in options | tools | send | international setting, set the default character encoding to 'Korean' [1]. You also have to select 'MIME' in send format, which is default in my version of MS OE, and check 'plain text'), read the message you sent with Mozilla and reply to it with Korean characters in the message body and the message subject (go to http://www.yahoo.co.kr and copy and paste a few Korean characters [1]). When you press the 'send' button, here's the fun part:

> and how exactly does it happen. There are different things to consider:

Ok. I admit that I overestimated the 'intelligence' of my correspondents. When an outgoing message has characters outside the character repertoire of the currently selected MIME charset, MS OE 6 prompts users to select one of the following three options: 1. send in Unicode, 2. send as is, 3. cancel and go back. Apparently, all of my correspondents chose the second option _despite_ the warning message in the dialog box (some characters will be lost irreversibly). I thought MS OE made that choice behind their backs without warning them. So, at least MS OE 6 is smarter than I thought (MS OE 5 was not this smart and almost certainly used question marks for unrepresentable characters as far as I remember). Still, it'd be even better if it didn't offer the second choice at all (Mozilla offers only the first and the third choices in the same situation. It used to offer the second - data loss case - and the third choice, but I changed the behavior in bug 233361 and bug 194862). As you wrote and my correspondents' careless choice showed, most users are ignorant about Unicode or other character encodings so that we need to protect us (Mozilla users) from the 'misbehavior' and the wrong choice of users of MS OE and other mail clients (Brodie's correspondents use), which is what my patch is about.

> Now what does the OE user see? Does he see the Chinese (or whatever
> language he uses) characters during composition?

As I wrote below, MS OE uses Unicode internally and, during composition, any Unicode character is visible as long as support for it is installed on Windows.

> Will OE be able to display that message when receiving?

Of course not. How can it figure out what characters were intended by the author when all it has are question marks? Neither can Mozilla. It can't do magic. (If it's text/plain alone, there's absolutely no way to reverse the information loss. If it's multipart/alternative with text/html and text/plain parts, the text/html part is **information-preserving**. See comment #0 as to why.)

As for your musing about the way MS OE works (I'm sorry to say this, but your speculation about MS OE and use of the term 'ASCII extension' indicate that you're not so well-versed about I18N issues as you think), MS OE is fully Unicode-capable and internally uses Unicode exclusively (needless to say, I don't have access to MS OE source code, but MS is not such a fool as not to use internally the Unicode that they have championed for the last 15 years). The character encoding conversion only occurs when it communicates with the external world. The same is true of Mozilla-mail/TB.

>> Is this all you have? This argument is pretty weak.
> I find it very strong not to change standard behavior for a very broken reader

Firstly, MS OE is not so broken as you think. Secondly, a lot of people use it, so to allure some of its users to convert to Mozilla/TB, we should try to keep interoperability with it if that doesn't violate the standard downright. My change doesn't make Mozilla produce a non-standard message by any means. I am the last person to advocate a change that would violate the MIME standard. Thirdly, I've already made much stronger cases for keeping the old behavior than you made and reasoned that even those are not strong enough in 2004 (when we have a lot more widespread MIME and I18N support in MUAs than in the mid-1990's), which is why your case is pretty weak. You have to begin with the rationale behind the provision in RFC 2046 (comment #16).

> where we even cannot clearly say which conditions do cause the problem

Please, s/we/I/.

[1] I explicitly asked (and am doing it again in this comment) you to try with Korean or Chinese (in my previous comment) instead of Latin1 (Western(ISO) in MS OE) or Windows-1250 (Western in MS OE) because MS OE is likely to use ISO-8859-1 even when replying to a message labelled as US-ASCII, which will make it impossible to reproduce the bug.

Boris 'pi' Piwinger

Comment 33

21 years ago

Thanks for this explanation, now things become much more understandable. >> >What you have seen is case 3 in comment #23 (MS OE probably does that when >> >it's configured to send out messages not compliant to MIME) >> >> Yes, this is the default behavior. > > It's not the default behavior at least in the Korean version of MS OE. OK, so then this is different, thanks for the information. >By default it sends out MIME-compliant email messages (except for mail >headers) in multipart/alternative (text/plain + text/html) How about text/plain alone? >> would be nice if everybody would use and understand Unicode which is not the >> case, OE and Mozilla can, others cannot. > > Others? It's 2004 and the world has changed a lot. With that argument OE would behave properly;-) Anyhow, the point being is that if we want to make special rules for special readers we most likely hurt others. >> Not fully. The main question is what are the conditions to cause this behavior > > Nothing short of a very detailed step-by-step instruction seems to work for >you. Here's one. I'll come back to that. >> and how exactly does it happen. There are different things to consider: > > Ok. I admit that I overestimated the 'intelligence' of my coorespondents. When >an outgoing message has characters outside the character repertoire of the >curerntly selected MIME charset, MS OE 6 prompts users to select one of the >following three options; 1. send in Unicode, 2. send as is 3. cancel and go >back. Apparently, all of my correspondents chose the second option _despite_ the >warning message in the dialog box (some characters will be lost irreversibly). OK, this is consistent with my information. So actually it is not our fault, but a user error (not even an OE bug). >Still, it'd be even better if it didn't offer the second choice at all Agreed, but we cannot change it. 
>As you wrote and my >correspondents' careless choice showed, most users are ignorant about Unicode or >other character encodings so that we need to protect us (Mozilla users) from the >'misbehavior' and the wrong choice of users of MS OE and other mail clients >(Brodie's correspondents use), which is what my patch is about. This seems to be the core question of the bug. I am glad our long discussion finally identified the real problem. You won't be surprised I draw a different conclusion. Let me give you a slightly different example: Say you have the same correspondence as originally described by you, but you -- the Mozilla user -- normally work under some different setting (like Russian or some European language). If the Chinese writer does not declare the charset (that question from above for text/plain), then you are also lost. Of course, this is also a user failure (combined with braindead OE defaults), but we cannot take care of that. So the question is for which problems we should. >> Will OE be able to display that message when receiving? > > Of course not. That's what I understood, I just wanted to be sure. So this makes a strong argument, that it is really OE's fault, and we are out of the business. >How can it figure out what characters were intended by the >author when all it has are question marks? I was not sure, if they are actually question marks or that would only be displayed because of some settings. But now we know it is something the user chose to happen. pi

Jungshik Shin

Assignee

Comment 34

•

21 years ago

(In reply to comment #33) > >By default it sends out MIME-compliant email messages (except for mail > >headers) in multipart/alternative (text/plain + text/html) > > How about text/plain alone? Unless you explicitly turn off 'MIME' (which virtually no ordinary user is aware exists, let alone changing it), it's still MIME-compliant by default like this:
Subject: =?EUC-KR?B?..........==?=
MIME-Version: 1.0
Content-Type: text/plain; charset="EUC-KR"
Content-Transfer-Encoding: base64
Why don't you explore MS OE's configuration panels yourself? If you had, you wouldn't have asked the question like the above. > >> would be nice if everybody would use and understand Unicode which is not the > >> case, OE and Mozilla can, others cannot. > > > > Others? It's 2004 and the world has changed a lot. > > With that argument OE would behave properly;-) Anyhow, the point being is that It (the newest one, MS OE 6) behaves properly giving three options (although one of them had better not be there) as you wrote below. > a user error (not even an OE bug). I'm pretty sure, though, that old MS OEs had this bug of silently resorting to question marks. > >other character encodings so that we need to protect us (Mozilla users) from the > >'misbehavior' and the wrong choice of users of MS OE and other mail clients > >(Brodie's correspondents use), which is what my patch is about. > > This seems to be the core question of the bug. I am glad our long discussion > finally identified the real problem. It was very clear from the beginning to everyone except for you. > You won't be surprised I draw a different > conclusion. Let me give you a slightly different example: Say you have the same > correspondence as originally described by you, but you -- the Mozilla user -- > normally work under some different setting (like Russian or some European > language). If the Chinese writer does not declare the charset (that question > from above for text/plain), then you are also lost. 
What do you mean by not declaring charset? Anyway, I don't see what you're up to here and as such why you think your scenario can be used to build a case against my patch. Actually, you could have made a better case (although I can easily refute it, too) with Chinese (GB2312) vs Russian (KOI8-R), but you didn't. > >> Will OE be able to display that message when receiving? > > > > Of course not. > > That's what I understood, I just wanted to be sure. So this makes a strong > argument, that it is really OE's fault, and we are out of the business. How could it be OE's fault that it's not able to conjure up something out of nothing (question marks)? Neither can Mozilla. Neither can I. Nobody can violate 'the second law of thermodynamics'. > >How can it figure out what characters were intended by the > >author when all it has are question marks? > > I was not sure, if they are actually question marks or that would > only be displayed because of some settings. I suspected you're not despite the fact I clearly mentioned that there's _irreversible_ loss of information. (When I wrote 'irreversible', I really meant 'irreversible'. I wrote a simple program in the early 1990s to recover the original content from MSB-stripped Korean emails in EUC-KR. That's possible because in EUC-KR, bytes with MSB should always come in pairs. There's a slight increase in entropy when MSBs are stripped off, but most of the information is there.) You thought I couldn't tell real question marks in the message from question marks that are rendered in place of some undecodable byte sequences? The fact that you're not sure only shows that you're not familiar with how character encodings work. Go to http://www.yahoo.co.kr and set the character encoding (in View) to ISO-8859-1 manually and see how many question marks come up.
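The difference described above, between literal question marks written into a message and garbage that only appears because a viewer picked the wrong charset, can be sketched in Python with the standard codecs. This is only an illustration: the sample strings and the helper names `send_as_is` and `view_with_wrong_charset` are made up here and say nothing about how MS OE is implemented.

```python
# Illustration only: sample text and helper names are hypothetical.

def send_as_is(text: str, charset: str) -> bytes:
    """Encode text, substituting '?' for anything the charset cannot represent."""
    return text.encode(charset, errors="replace")


def view_with_wrong_charset(raw: bytes, wrong: str = "iso-8859-1") -> str:
    """Decode bytes with the wrong charset; ISO-8859-1 maps every byte to a character."""
    return raw.decode(wrong)


korean = "한글 메일"  # four Hangul syllables and one space

# Case 1: real '?' characters are written into the outgoing message.
lossy = send_as_is(korean, "us-ascii")
# Unrelated texts collapse to the same bytes, so nothing can be recovered.
collides = send_as_is("日本 語文", "us-ascii") == lossy

# Case 2: the bytes are intact and only the viewer's charset is wrong.
raw = korean.encode("euc-kr")
garbled = view_with_wrong_charset(raw)
# The display is garbage, but the original bytes can be rebuilt and decoded.
recovered = garbled.encode("iso-8859-1").decode("euc-kr")
```

In the first case the replacement happens before the message leaves the sender, which is why no receiving client can undo it. In the second case choosing the right encoding in the viewer is enough.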

Boris 'pi' Piwinger

Comment 35

•

21 years ago

>Why don't you explore MS OE's configuration panels yourself? If you had, you >wouldn't have asked the question like the above. Simply because defaults are very different for other language versions. There, no MIME headers are set by default. >> >other character encodings so that we need to protect us (Mozilla users) from the >> >'misbehavior' and the wrong choice of users of MS OE and other mail clients >> >(Brodie's correspondents use), which is what my patch is about. >> >> This seems to be the core question of the bug. I am glad our long discussion >> finally identified the real problem. > > It was very clear from the beginning to everyone except for you. Well, it took all the discussion to find it is a user error. Before you stated, it happens by OE itself. >> You won't be surprised I draw a different >> conclusion. Let me give you a slightly different example: Say you have the same >> correspondence as originally described by you, but you -- the Mozilla user -- >> normally work under some different setting (like Russian or some European >> language). If the Chinese writer does not declare the charset (that question >> from above for text/plain), then you are also lost. > > What do you mean by not declaring charset? Giving the charset in Content-Type. >Anyway, I don't see what you're up >to here and as such why you think your scenario can be used to build a case >against my patch. Once more: There are other situations where OE fails which we cannot influence. So why deal with the particular problem where users ignore warnings? >> >> Will OE be able to display that message when receiving? >> > >> > Of course not. >> >> That's what I understood, I just wanted to be sure. So this makes a strong >> argument, that it is really OE's fault, and we are out of the business. > > How could it be OE's fault that it's not able to conjure up something out of >nothing (question marks)? 
The original description suggested OE would just produce question marks, which it doesn't as we know now. That would have been OE's fault. >> >How can it figure out what characters were intended by the >> >author when all it has are question marks? >> >> I was not sure, if they are actually question marks or that would >> only be displayed because of some settings. > > I suspected you're not despite the fact I clearly mentioned that there's >_irreversible_ loss of information. Yes, you also mentioned that things happen automatically which they don't. I had reasons to doubt the description. >The fact that >you're not sure only shows that you're not familiar with how character encodings >work. Why do you constantly try to insult me? Don't you have real arguments? Fact is: Your original assessment was wrong. pi

Boris 'pi' Piwinger

Comment 36

•

21 years ago

>Unless you explicitly turn off 'MIME' (which virtually no ordinary user is >aware exists, letting alone changing it), it's still MIME-compliant by default I just got additional information from an OE expert I asked about this problem. He suggests that this might also be different between mail and news. pi

Cheng Yuk Pong

Comment 37

•

21 years ago

I have made a Flash movie to demonstrate this bug: http://www.sdiz.net/moz/ Warning: this file is over 6 MiB, and it needs a resolution of at least 1024x768 to view. FYI, all those demo messages were posted to netscape.public.test.

Jungshik Shin

Assignee

Comment 38

•

21 years ago

re comment #35: > >> This seems to be the core question of the bug. I am glad our long discussion > >> finally identified the real problem. > > > > It was very clear from the beginning to everyone except for you. > > Well, it took all the discussion to find it is a user error. Before you stated, > it happens by OE itself. You think that's important. Fine. However, that's not so important as the fact that I (and other non-Western-European users) frequently receive emails in which information is _not_ preserved, which can be easily avoided if we just label ASCII-only messages as in the character encoding selected by a Mozilla user *regardless of the cause of the problem* (whether it is a user error or MUA misbehavior). Moreover, as I found out yesterday with more experiments with MS OE and the flash movie by Cheng Yuk Pong (thanks !!) demonstrated, this also happens _automatically_ by MS OE when MIME is turned off. Apparently, you haven't observed this (instead you have seen numerous messages without C-T and C-T-E header fields but with characters from the GR part of ISO-8859-1 in the body which were replies to messages labelled as in US-ASCII.) because MS OE uses ISO-8859-1 when replying to messages labelled as in US-ASCII (comment #32, comment #30). This behavior of MS OE would not expose the problem being dealt with in this bug as long as you use characters within ISO-8859-1. However, characters outside ISO-8859-1 are all turned to question marks without any question asked (when MIME is turned off in MS OE). In short, MS OE users can inadvertently turn all characters outside ISO-8859-1 to question marks in two ways, 1) by selecting 'send as is' (for which end-users are responsible, but most users are clueless about this kind of stuff so that we can't blame them too much) with MIME turned on 2) by turning off MIME (which is also their fault in a sense but MS OE should alert users in this case as well and MS OE should have a more sensible default for news. 
In case of mail, it's also users' fault because they must have turned off MIME on purpose.). One scenario in which this patch wouldn't help is that I send an ASCII-only message to my Russian friend with charset set to a Korean legacy encoding (EUC-KR) and he replies in Russian (with some Cyrillic letters not covered by EUC-KR. Quite a large subset of Cyrillic letters are covered by legacy CJK character sets) and chooses 'send as is'. Those Cyrillic letters (not representable in EUC-KR) are all lost even with this patch applied. However, I don't use EUC-KR any more and I always use UTF-8 (except when sending to some stupid web mail clients that don't handle UTF-8 well) so that in the above scenario, there's no problem with the patch applied. Without the patch applied, I would lose not just a few Cyrillic letters but all Cyrillic letters (because ISO-8859-1 that would be used by MS OE when replying to a message labelled as in US-ASCII doesn't cover any Cyrillic letter). As I mentioned before, there are potential problems with some web mail services and some old MUAs (comment #15) and rendering - not critical - issues (bug 86255). However, I think they're not as critical as the data-loss problem.
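The Russian-reply scenario above can be checked roughly with Python's standard codecs. The sketch below counts how many characters of a reply would be turned into question marks under each charset a replying client might end up using. The helper name `lost_characters` and the sample strings are assumptions made for illustration, and Python's `euc-kr` codec is only a stand-in for whatever conversion tables MS OE actually uses.

```python
def lost_characters(text: str, charset: str) -> int:
    """Count the characters in text that charset cannot represent."""
    lost = 0
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            lost += 1
    return lost


reply = "Привет"   # six basic Russian letters
extended = "ї"      # a Cyrillic letter outside the basic Russian alphabet

# Charset a reply might inherit from the original message's label.
for label in ("euc-kr", "iso-8859-1", "utf-8"):
    print(label, lost_characters(reply, label), lost_characters(extended, label))
```

With these codecs the basic Russian letters survive under EUC-KR and UTF-8 but not under ISO-8859-1, while the extended letter survives only under UTF-8, which matches the argument that the label on the original message decides how much of the reply can be preserved.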

Alan Tam

Comment 39

•

21 years ago

Any way we could get the patch reviewed faster? It sounds strange to me to have sr+ but r?.

Scott MacGregor

Comment 40

•

21 years ago

Comment on attachment 151423 [details] [diff] [review] patch v3 (using nsIPrefBranch only) Thanks for clarifying jshin.

Attachment #151423 - Flags: review?(mscott) → review+

Scott MacGregor

Comment 41

•

21 years ago

patch v3 checked into the aviary 1.0 branch

Whiteboard: fixed-aviary1.0

Jungshik Shin

Assignee

Comment 42

•

21 years ago

thanks for review. fix checked into the trunk

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Alan Tam

Comment 43

•

21 years ago

WFM in Thunderbird aviary 20040707. Thanks a lot for all your contributions!

Jungshik Shin

Assignee

Comment 44

•

21 years ago

Comment on attachment 151423 [details] [diff] [review] patch v3 (using nsIPrefBranch only) asking for a 1.7branch. This was already checked into aviary 1.0 branch and should be safe.

Attachment #151423 - Flags: approval1.7.1?

Jungshik Shin

Assignee

Comment 45

•

21 years ago

re comment #38: a bit more clarification for the record. Although it was mentioned in earlier comments, in comment #38 it was not mentioned. _Regardless of_ whether MIME is on or not, MS Outlook (and I guess MS Outlook Express, too) does not prompt users _even if_ one or more of the message header fields (e.g. Subject, From, To) have characters outside the current character repertoire and blindly replaces them with question marks as long as the message body doesn't have any unrepresentable character.

Myk Melez [:myk] [@mykmelez]

Updated

•

21 years ago

Product: MailNews → Core

Asa Dotzler [:asa]

Comment 46

•

21 years ago

Comment on attachment 151423 [details] [diff] [review] patch v3 (using nsIPrefBranch only) a=asa for 1.7.x checkin.

Attachment #151423 - Flags: approval1.7.x? → approval1.7.x+

Arkadiusz 'Black Fox' Artyszuk

Comment 47

•

21 years ago

*** Bug 279530 has been marked as a duplicate of this bug. ***

Mike Cowperthwaite

Comment 48

•

20 years ago

*** Bug 226175 has been marked as a duplicate of this bug. ***

Christian :Biesinger (don't email me, ping me on IRC)

Updated

•

20 years ago

Summary: labelling 'ASCII only messages' as in US-ASCII leads to an interoperability problem with MS OE → labelling 'ASCII only messages' as in the US-ASCII charset leads to an interoperability problem with MS OE (shouldn't downgrade to ASCII)

Philip Prindeville

Comment 49

•

20 years ago

(In reply to comment #17) > (In reply to comment #16) > > RFC2046 states that we SHOULD downgrade the charset to the lowest common > > denominator (see last paragraph of page 10): > > http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10 > > I forgot about RFC 2046. Thanks for the reminder. However, we have to note > that it's written quite a while ago and the best practice then is not > necessarily the best practice now (may or may not be). Nonetheless, I agree with > its spirit and am inclined to keep the pref. Until a new RFC comes out that obsoletes RFC-2046, we should stick with it. > > bug 136664 is specifically about following this RFC and downgrading; the general > > case of bug 86255 (which is about Japanese). > > Actually, bug 86255 is not just about Japanese. Our current implementation is > not generic enough in that other downgrading paths (other than to US-ASCII from > virtually all character encodings) such as Windows-1252-> ISO-8859-1 (and its > Greek equivalent), Windows-874->ISO-8859-11->TIS-620, GB18030->GBK->GB2312 are > not supported. Over the last few years, its usefulness has diminished > significantly, though. Perhaps, ill-I18Nized Eudora and web mail users would be > beneficiaries of this feature. > > > used by bug 136664 if/when it ever gets implemented for the general case. How > > about we name the preference "mail.auto_use_simplest_charset" (or > > "mail.auto_use_best_charset"), default to false. > > We can change the pref. name when we 'fix' bug 136664. For now, we can just > use what I have. I think that the default should agree with the recommendation of RFC-2046. I also think that "fixing" any third party software as a work-around to a known bug in MS OE is broken. In any case, downgrading content to the smallest inclusive character set is a valuable tool in Spam-fighting. 
If someone whose locale is ZH or TR or JP sends a message in English that can be downgraded into "us-ascii" or "iso-8859-1" (as appropriate) when posting to an English-language mailing list (as a lot of Internet mailing lists are, since it's often the lingua franca of Internet discussions), then correctly identifying the content as English-language content is a win: it minimizes the chances of the email being rejected based on being in the wrong language/locale for a given audience.
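The trade-off argued over in this bug, downgrading the label to the smallest charset that covers the body versus keeping the charset the user selected, can be sketched as follows. This is a minimal sketch in Python, not the Mozilla patch: the function names, the two-step ladder, and the `downgrade` flag (standing in for the pref discussed in this bug) are all assumptions made for illustration.

```python
def simplest_charset(body: str, user_charset: str,
                     ladder=("us-ascii", "iso-8859-1")) -> str:
    """Return the first charset in ladder that can encode body, else user_charset."""
    for candidate in ladder:
        try:
            body.encode(candidate)
            return candidate
        except UnicodeEncodeError:
            continue
    return user_charset


def label_for(body: str, user_charset: str, downgrade: bool) -> str:
    """Pick the MIME charset label; downgrade is a hypothetical stand-in for the pref."""
    return simplest_charset(body, user_charset) if downgrade else user_charset


# An ASCII-only body written by a user whose selected charset is EUC-KR.
print(label_for("See you tomorrow.", "euc-kr", downgrade=True))
print(label_for("See you tomorrow.", "euc-kr", downgrade=False))
```

With `downgrade=True` the ASCII-only body is labelled `us-ascii`, which is the behavior comment 49 argues for on RFC 2046 and spam-filtering grounds. With `downgrade=False` the label stays `euc-kr`, which is what the checked-in patch makes the default so that replies are composed in a charset that can carry the correspondent's language.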

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core

patch 21 years ago Jungshik Shin 4.88 KB, patch
patch (using nsIPrefService/nsIPrefBranch) 21 years ago Jungshik Shin 7.34 KB, patch
patch v3 (using nsIPrefBranch only) 21 years ago Jungshik Shin 6.94 KB, patch	mscott : review+ Bienvenu : superreview+ asa : approval1.7.5+