Closed Bug 247958 Opened 20 years ago Closed 20 years ago

labelling 'ASCII only messages' as in the US-ASCII charset leads to an interoperability problem with MS OE (shouldn't downgrade to ASCII)

Categories

(MailNews Core :: Composition, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

References

Details

(Keywords: intl, Whiteboard: fixed-aviary1.0)

Attachments

(1 file, 2 obsolete files)

Currently, Mozilla labels ASCII-only messages with 'Content-Type: text/*;
charset=US-ASCII' even if the user selects a non-ASCII 8-bit character encoding.
Arguably, this is the 'right' thing to do (well, it's not quite a matter of
right vs. wrong). However, this causes a rather serious 'incompatibility' with
MS OE, which replaces all characters not representable in the currently selected
character encoding with question marks.

When an MS OE user replies to a message sent with Mozilla/TB and labelled as
US-ASCII (e.g. a Chinese user sends an email in English to her friend, who uses
MS OE and replies in Chinese), MS OE turns any non-ASCII characters in the reply
into question marks without any warning whatsoever. As a result, there's an
irreversible loss of information. If MS OE is set up to send both text/plain and
text/html, the content can be recovered, because in text/html MS OE uses numeric
character references (NCRs) to represent characters not representable in the
current character encoding. However, if text/plain is not accompanied by
text/html, there's no way to recover the content other than asking the sender to
resend it, which may not be possible in some situations.
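The difference between the two paths described above can be sketched in C++. This is a minimal illustration, not MS OE's or Mozilla's actual code: the function names are hypothetical, and the target repertoire is simplified to US-ASCII (as when replying to a message labelled US-ASCII). The text/plain path replaces each unrepresentable character with '?', while the text/html path writes a decimal NCR; escaping of '&' and '<' in the HTML path is deliberately omitted.

```cpp
#include <string>

// Hypothetical repertoire test, simplified to US-ASCII (U+0000..U+007F).
bool InUsAsciiRepertoire(char32_t cp) { return cp < 0x80; }

// Lossy conversion for a text/plain part: every character outside the
// repertoire becomes '?', so the original character cannot be recovered.
std::string EncodePlainLossy(const std::u32string& text) {
  std::string out;
  for (char32_t cp : text) {
    out += InUsAsciiRepertoire(cp) ? static_cast<char>(cp) : '?';
  }
  return out;
}

// Information-preserving conversion for a text/html part: characters outside
// the repertoire become decimal numeric character references (&#NNNN;).
// Assumption: markup escaping of '&' and '<' is left out of this sketch.
std::string EncodeHtmlWithNcr(const std::u32string& text) {
  std::string out;
  for (char32_t cp : text) {
    if (InUsAsciiRepertoire(cp)) {
      out += static_cast<char>(cp);
    } else {
      out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
    }
  }
  return out;
}
```

For the Korean syllable U+D55C, EncodePlainLossy(U"Hi \uD55C") yields "Hi ?", while EncodeHtmlWithNcr yields "Hi &#54620;", from which the original character can still be restored.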

Therefore, I think it's better, by default, to leave the user-selected MIME
charset (character encoding) alone even if the message body contains only ASCII
characters. In case some users want Mozilla/TB to behave differently, we can add
a pref, 'mail.label_ascii_only_mail_as_us_ascii', which is false by default.
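A minimal sketch of the proposed labelling logic, in C++ to match the patch. ChooseMimeCharset and IsAsciiOnly are hypothetical names, and the boolean parameter merely stands in for the value of the pref 'mail.label_ascii_only_mail_as_us_ascii'; the real patch reads the pref through Mozilla's pref service, which is not reproduced here.

```cpp
#include <string>

// Returns true if every code point in the body is in US-ASCII (U+0000..U+007F).
bool IsAsciiOnly(const std::u32string& body) {
  for (char32_t cp : body) {
    if (cp >= 0x80) return false;
  }
  return true;
}

// Hypothetical sketch of the proposed behavior. labelAsciiAsUsAscii stands in
// for the pref 'mail.label_ascii_only_mail_as_us_ascii' (false by default).
std::string ChooseMimeCharset(const std::u32string& body,
                              const std::string& userCharset,
                              bool labelAsciiAsUsAscii = false) {
  if (labelAsciiAsUsAscii && IsAsciiOnly(body)) {
    return "US-ASCII";  // old behavior: downgrade the label for ASCII-only text
  }
  return userCharset;   // proposed default: keep the user-selected charset
}
```

With the default (pref off), an ASCII-only body composed with EUC-KR selected stays labelled EUC-KR, so a reply from a client that reuses the original charset can still carry Korean text.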
Attached patch patch (obsolete)
I added a pref to control the behavior. It's off by default, as I wrote in
comment #0.
Comment on attachment 151365
patch

asking for r/sr.
Attachment #151365 - Flags: superreview?(bienvenu)
Attachment #151365 - Flags: review?(sspitzer)
+  nsCOMPtr<nsIPref> prefs(do_GetService(kPrefCID, &rv)); 

can you make this use nsIPrefService/nsIPrefBranch while you're here?
Do we really need *another* pref? This is the right thing to do; we should
always send the email using the user-selected charset.
Attached patch patch (using nsIPrefService/nsIPrefBranch) (obsolete)
Per cbie's suggestion, I now use nsIPrefService/nsIPrefBranch. nsIPref was used
elsewhere in the function being patched.

In other places in the file, nsIPref is still used, but changing those has to be
a separate bug.
Attachment #151365 - Attachment is obsolete: true
I realized that this is not performance-critical at all, so reducing the code
size is more important (although the difference would be very small).
Attachment #151422 - Attachment is obsolete: true
Attachment #151365 - Flags: superreview?(bienvenu)
Attachment #151365 - Flags: review?(sspitzer)
Attachment #151423 - Flags: superreview?(bienvenu)
Attachment #151423 - Flags: review?(sspitzer)
My use case is similar: I send out the same English email in UTF-8 encoding to
my Korean, Japanese and German translators. They all just reply, type their
text into the email and send it. Since most mailers by default reply to an
email using the character set of the original email, I get the replies in a
readable charset. If I forget to set UTF-8 for the original, or it is undone by
this bug in our mailer, then I often get garbage. None of my translators use
OE, so this shouldn't be labelled as an interop problem with OE.

It is actually a 100% legitimate bug in the mailer's behaviour. The mailer should
*ALWAYS* respect the character set that the user has selected. There is never a
time when it should automatically downgrade it to anything else, even if it
appears to fit. The patch should just remove this behaviour altogether. The
comment about OE isn't required, and there should be NO pref (this project is
plagued by prefs).
This 'feature' was originally added for bug 86255.
(In reply to comment #7)

> text into the email and send. Since most mailers will by default reply to an
> email using the characterset of the original email, I

  Pine doesn't. Neither does mutt. Mozilla used to, and still does by default,
but that can now be changed (there's even a UI for it).

> It is actually a 100% legitimate bug in the mailer behaviour. 

   Not everyone agrees with you, as you already found out in bug 86255. For one,
I would NOT change the behavior if there weren't an interoperability problem with
stupid mail programs like MS OE (and the others your correspondents use), which
silently convert characters to question marks __behind the backs of their
users__. (Mozilla always warns users of the problem in such a case.)

> The mailer should
> *ALWAYS* respect the characterset that the user has selected. There is no time
> that it should be automatically downgrading it to anything else even if it
> appears to fit.

  Pine automatically downgrades to US-ASCII, too (it has done so for the last
10 years, if not longer). Anyway, with the patch, downgrading is off by default,
and only the die-hard 'purists' would bother to turn it on. Most users wouldn't
realize there is such a pref, so it doesn't matter except for a small increase
in code size.
Attachment #151423 - Flags: superreview?(bienvenu) → superreview+
Comment on attachment 151423
patch v3 (using nsIPrefBranch only)

asking mscott for r.
If you don't find any problem, can you check it into the aviary-1.0 branch when
giving r?
Attachment #151423 - Flags: review?(sspitzer) → review?(mscott)
*** Bug 248794 has been marked as a duplicate of this bug. ***
Thanks for the patch! Hope it is in aviary soon as well.
Comment on attachment 151423
patch v3 (using nsIPrefBranch only)

jshin, this seems like a pretty obscure pref. Seems like we should either do
this by default or not bother. I don't see our target user going out of their
way to set this pref by hand in prefs.js. My two cents after glancing at this
bug.
I think we have bug 86255 because some Japanese users want otherwise, due to
the specific needs of Japanese charsets and/or fonts. But the rest of the world
may not find it fitting. So there is probably a need for such an option.
(In reply to comment #13)
> (From update of attachment 151423)
> jshin, this seems like a pretty obscure pref. Seems like we should either do
> this by default or not bother. I don't see our target user going out of there
> way to set this pref by hand in prefs.js.

It's a tough call. For sure, it's obscure, which is why we'd never bother to
make UI for it (well, I know you think it's so obscure that it doesn't even
deserve a pref entry). One of the rationales for downgrading to the smallest
MIME charset must have been that some mail clients (especially web-mail clients)
may alert users that 'this message is in a foreign character encoding...
blahblah' even though the entire content is in US-ASCII, because they rely on
the value of the 'charset' parameter in the Content-Type header. Or some MUAs
may try to invoke an external program (or iconv()-like functions) for the
encoding conversion, only to find that the converter is not available on the
system (a number of commercial Unix installations in the US and Europe don't
have converters for non-European encodings installed by default), even though
there's no need because the entire content is in US-ASCII. For those who need to
correspond with users of such MUAs, this pref can be useful. However, those
users are rare, and we may just as well remove the pref and do without the
downgrading (to improve interoperability with the major MUAs in the market).

So, what to do? If you feel strongly against adding the pref., I'll go without
it, which will reduce the code size slightly.
RFC 2046 states that we SHOULD downgrade the charset to the lowest common
denominator (see the last paragraph of page 10):
http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10

bug 136664 is specifically about following this RFC and downgrading; it is the
general case of bug 86255 (which is about Japanese).

Since reading the RFC and the bug, I've changed my opinion from the one stated
above, and now feel that we should at least give users the option of following
the RFC. However, due to the stated MUA compatibility problems, we should NOT
enable this by default (i.e. by default, use the charset that the user selected).

However, the pref should be generalized so that it can be used by bug 136664
if/when that is ever implemented for the general case. How about we name the
preference "mail.auto_use_simplest_charset" (or "mail.auto_use_best_charset"),
defaulting to false?
(In reply to comment #16)
> RFC2046 states that we SHOULD downgrade the charset to the lowest common
> denominator (see last paragraph of page 10):
> http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10

  I forgot about RFC 2046. Thanks for the reminder. However, we have to note
that it was written quite a while ago, and the best practice then is not
necessarily the best practice now (it may or may not be). Nonetheless, I agree
with its spirit and am inclined to keep the pref.
 
> bug 136664 is specifically about following this RFC and downgrading; the general
> case of bug 86255 (which is about Japanese).

Actually, bug 86255 is not just about Japanese. Our current implementation is
not generic enough, in that downgrading paths other than to US-ASCII (from
virtually all character encodings), such as Windows-1252 -> ISO-8859-1 (and its
Greek equivalent), Windows-874 -> ISO-8859-11 -> TIS-620 and GB18030 -> GBK ->
GB2312, are not supported. Over the last few years, its usefulness has
diminished significantly, though. Perhaps users of ill-I18Nized Eudora and web
mail would be the beneficiaries of this feature.
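One of the downgrading paths listed above, Windows-1252 -> ISO-8859-1 -> US-ASCII, can be sketched as follows. This is an illustrative assumption, not Mozilla's implementation: DowngradeFromWindows1252 is a hypothetical name, and it assumes the body is already representable in Windows-1252 (the user-selected charset). Under that assumption, a body with no code point above U+00FF fits ISO-8859-1, because the printable characters Windows-1252 places at bytes 0x80-0x9F (e.g. the euro sign, U+20AC) all lie above U+00FF.

```cpp
#include <string>

// Hypothetical sketch of one generalized downgrade path:
//   windows-1252 -> ISO-8859-1 -> US-ASCII
// Precondition (assumed): every code point in `body` is representable in
// windows-1252, i.e. in the user-selected charset.
std::string DowngradeFromWindows1252(const std::u32string& body) {
  bool ascii = true;
  bool latin1 = true;
  for (char32_t cp : body) {
    if (cp >= 0x80) ascii = false;
    // ISO-8859-1 covers U+0000..U+00FF. The printable characters that
    // windows-1252 maps to bytes 0x80..0x9F (euro sign, curly quotes, ...)
    // all have code points above U+00FF, so they rule out ISO-8859-1.
    if (cp > 0xFF) latin1 = false;
  }
  if (ascii) return "US-ASCII";
  if (latin1) return "ISO-8859-1";
  return "windows-1252";
}
```

The same shape would extend to the other chains mentioned (Windows-874 -> ISO-8859-11 -> TIS-620, GB18030 -> GBK -> GB2312), but those need real repertoire tables rather than simple code point ranges.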

> used by bug 136664 if/when it ever gets implemented for the general case. How
> about we name the preference "mail.auto_use_simplest_charset" (or
> "mail.auto_use_best_charset"), default to false.

 We can change the pref name when we 'fix' bug 136664. For now, we can just
use what I have.

Sorry for being late, but I believe the reasoning in comment 0 is wrong. Here is
my understanding: OE does not understand MIME and is not capable of producing it
in any proper way. The problem is that OE, if not properly configured, does not
set any MIME headers whatsoever (so in almost all cases it does not). This means
that you cannot properly send any non-ASCII message from OE: it will show up as
question marks on the receiver side if the charset is not guessed "correctly".
That means it does not matter whether we set the proper charset or not. I have
seen millions of postings from OE that answer US-ASCII messages in non-ASCII
charsets. The problem was always an unconfigured OE that did not declare the
charset, never OE misbehaving because of the US-ASCII charset.

So please double-check carefully what happens before making Mozilla as bad as OE
with regard to MIME.

pi
(In reply to comment #18)
> Sorry for being late, but I believe the reasoning in comment 0 is wrong. Here is
> my understanding: OE does not understand MIME and is not capable of producing it
> in any proper way. Now the problem is that OE if not properly configured does
 snip...

Sorry to snip your 'analysis', but you got it wrong. MS OE is capable of
producing and interpreting MIME. The loss of information (with characters not
covered by the current MIME charset converted to question marks) occurs even
when MS OE is configured to produce MIME-compliant messages. Besides, it's not
just MS OE; other MUAs have similar problems.
(In reply to comment #16)
> RFC2046 states that we SHOULD downgrade the charset to the lowest common
> denominator (see last paragraph of page 10):
> http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10

  I forgot about the provision in RFC 2046. Thanks for the reminder. However, we
have to note that it was written quite a while ago, and the best practice then is
not necessarily the best practice now (it may or may not be). Nonetheless, I
agree with its spirit and am inclined to keep the pref.
 
> bug 136664 is specifically about following this RFC and downgrading; the general
> case of bug 86255 (which is about Japanese).

Our current implementation is not generic enough, in that downgrading paths
other than to US-ASCII (from virtually all character encodings), such as
Windows-1252 -> ISO-8859-1 (and its Greek equivalent),
Windows-874 -> ISO-8859-11 -> TIS-620 and GB18030 -> GBK -> GB2312, are not
supported. Over the last few years, its usefulness has diminished significantly,
though. Perhaps users of ill-I18Nized Eudora and web mail would be the
beneficiaries of this feature.

> used by bug 136664 if/when it ever gets implemented for the general case. How
> about we name the preference "mail.auto_use_simplest_charset" (or
> "mail.auto_use_best_charset"), default to false.

 We can change the pref name when we 'fix' bug 136664. For now, we can just
use what I have, I guess.

Status: NEW → ASSIGNED
> Sorry to snip your 'analysis', but you got it wrong. MS OE is  capable of
> producing and interpreting MIME.

It does not produce those by default and it fails badly in many situations to
handle them (actually this is what this bug is about).

> The loss of information (with characters not
> covered by the current MIME charset converted to question marks) occurrs even
> when MS OE is configured to produce MIME-compliant messages.

Can you be more specific? I have seen countless broken OE messages, but not
this one. Which characters exactly are supposed to be shown as question marks?
The ones an OE user types in a reply (as I said, OE does not understand MIME)?
But I have seen that work in all cases (ignoring the fact that OE does not
declare them correctly by default). So there must be much more to this
problem. Can you provide examples (they should be easy to find on Usenet)? Who
sees the question marks? Already the OE user who is typing?

> Besides, it's not
> just MS OE but other MUAs that have similar problems.

Never seen such problems. What exactly goes wrong with which reader?

pi
(In reply to comment #21)
> > Sorry to snip your 'analysis', but you got it wrong. MS OE is  capable of
> > producing and interpreting MIME.
> 
> It does not produce those by default and it fails badly in many situations to
  
  This is not the place to debate MS OE, but it has come a long way since the
mid-1990s. I can't help wondering what version of MS OE you use.

> handle them (actually this is what this bug is about).

 I reported this bug and you think I don't know what this bug is about.

> > The loss of information (with characters not
> > covered by the current MIME charset converted to question marks) occurrs even
> > when MS OE is configured to produce MIME-compliant messages.
> 
> Can you be more specific? I have seen uncountable broken OE messages, but not

 I don't know how to be more specific. The above sentence should be more than
enough. If not, see comment #0 and comment #7. You're supposed to read them
before commenting here. Have you? 

In fact I really see three different behaviors from MS OE when the charset
chosen is not a superset of the text typed:
1. Warn you and ask if you want to change to UTF-8, just like what Mozilla does.
2. Convert all the characters brute force to '?'.
3. Send the message verbatim without any MIME information.

I suspect some are version problems and some are configuration problems, either
OE or Windows.

After all, no matter how OE responds, if we send in UTF-8, then the precondition
of this problem does not exist, hence no problem. Behavior 3 may still exist,
but it is beyond the scope of this bug.
Tam & Boris: 
The point is that this fixes a bug seen by customers. No functionality is being
lost, it is just fixing a bug. The reason the pref is being added is so that
people like yourselves can continue to use the original behaviour if you wish.
I understand Boris's concern because we are changing Mozilla's default behavior.
This can solve problems but can also create problems. My explanation was that we
do not solve all problems, but we are not creating more.

For sure I understand the bug - I am the reporter of bug 248794.
>> > Sorry to snip your 'analysis', but you got it wrong. MS OE is  capable of
>> > producing and interpreting MIME.
>> 
>> It does not produce those by default and it fails badly in many situations to
>  
>  This is not the place to debate about MS OE, but it has gone a long way since
>mid-1990's. I can't help wondering what version of MS OE you use. 

I have seen many versions, including the most up-to-date one. Not a single one
produces MIME headers by default. Not a single version can handle
quoted-printable (it will not set quoting symbols), and all ignore MIME headers
completely (if it believes it sees UUencode). So there is very good reason to
believe that this is not something for Mozilla to change.

>> handle them (actually this is what this bug is about).
>
> I reported this bug and you think I don't know what this bug is about.

It is about the cause of problems. It is still not fully clear which
conditions cause exactly which problem. In particular, there is no bug in
Mozilla. So before we make Mozilla worse, we need a very good reason to do so.

>> Can you be more specific? I have seen uncountable broken OE messages, but not
>
> I don't know how to be more specific. 

Just answer my questions:

1) Where do the question marks appear for the first time under which conditions?

2) Give example messages of
a) Mozilla message
b) OE reply to a)

>If not, see comment #0 and comment #7. You're supposed to read them
>before commenting here. Have you? 

I also referred to it, if you read my comments. Once more: I have seen thousands
of messages where OE users answer messages in US-ASCII using characters from
other charsets.

pi
re comment #26: 
If comment #0, comment #7, bug 248794 (that includes what I forgot to mention in
comment #0, namely a serious problem in the message header) and comment #23
didn't do the job for you (you keep believing that Tam and I made up a problem
that doesn't exist or saying as if you did), I don't think anything more would
be just wasting my, your and others' time. 

Do you have any idea what attachment 151423 does? If so, please tell me why
you are against the change.
(In reply to comment #27)

> that doesn't exist or saying as if you did), I don't think anything more would
> be just wasting my, your and others' time. 

  s/don't//  

sorry for spamming.

>If comment #0, comment #7, bug 248794 (that includes what I forgot to mention in
>comment #0, namely a serious problem in the message header) 

I cannot see a real problem there. For many years, many people have sent ASCII,
and it has worked well with people using OE and answering in non-ASCII. So if
there is a problem, it must be far more specific than claimed.

>and comment #23
>didn't do the job for you (you keep believing that Tam and I made up a problem
>that doesn't exist or saying as if you did),

I have seen the problem not exist in countless cases. So the problem cannot
be as general as claimed.

>I don't think anything more would
>be just wasting my, your and others' time. 

Why don't you just give an example, so we can look at it?

>Do you have any idea what attachment 151423 does?  If so, please tell me why are
>you are against the change? 

It is best practice (and always has been) to declare US-ASCII as the charset if
that is what the message is. This has worked ever since. Mozilla is not special
in behaving this way. So I ask for exact instructions to reproduce. The patch
claims this would be 'a major 'interoperability problem' with MS OE'. It is
more than strange that we don't have lots of really old bugs about that; there
was never any report of that kind of problem in the various mail and news reader
newsgroups I follow, including those about OE and Mozilla.

pi
What you have seen is case 3 in comment #23 (MS OE probably does that when it's
configured to send out messages not compliant with MIME), and you've been
insisting that that's __all__ MS OE ever produces in replies to messages
labelled with 'US-ASCII'. What Tam and I have been receiving is case 2 in
comment #23 (that's how MS OE behaves when it's set to send out messages more or
less compliant with MIME). Now got it?

> Why don't you just give an example, so we can look at it?

 Why do I have to prove it to you? You're the only skeptic here. Why don't you
try it yourself? When you try, make sure to configure your MS OE to use an
encoding for outgoing emails other than 'Western (ISO)' and 'Western (Windows)'
(for instance, set it to Simplified Chinese or Korean and include SC or Korean
in your reply to a message labelled as ASCII).

> Mozilla is not special in behaving this way. 

   See comment #9. I'm well aware of that. I use Pine more often than Mozilla
and I've had exactly the same problem with Pine, too. 

>  It is best practice (and has ever been) to declare US-ASCII as the charset 
> if this is what the message is.

  Is this all you have? This argument is pretty weak. I've already given a few
reasons to abide by that 'practice' in this very bug, but when weighed against
the problem we have with other MUAs, they're rather weak, too, IMHO, which is
why I want to go with the patch. Besides, there's a pref. you can turn on as
mentioned a few times before. 

 
>What you have seen is case 3 in comment #23 (MS OE probably does that when it's
>configured to send out messages not compliant to MIME) 

Yes, this is the default behavior. Even if it is configured to produce correct
messages according to the MIME standards, it will go to case 1. I could not find
a single situation where case 2 applies. The solution suggested in comment 23
would be nice if everybody used and understood Unicode, which is not the case:
OE and Mozilla can, others cannot.

>and you've been insisting
>that that's  __all__ MS OE ever produces in replies to messages labelled with
>'US-ASCII'. What I and Tam have been receiving is case 2 in comment #23 (that's
>how MS OE behaves when it's set to send out messages in more or less compliant
>to MIME).  Now got it?

Not fully. The main question is what are the conditions to cause this behavior
and how exactly does it happen. There are different things to consider:

The charset used to display the message (AFAICS this cannot be ASCII, only an
extension of it, no matter what the declared encoding says); this is normally
used for sending (declared or not).

For composing a message, the Windows system charset will generally be used.
It might be converted later, for sending. The question is what exactly happens
if the system charset does not suffice.

Now what does the OE user see? Does he see the Chinese (or whatever language he
uses) characters during composition? Is this changed (if so how) upon sending?
What MIME headers does OE set (if any)? Will OE be able to display that message
when receiving? This is where I would like to see an example attached to this bug.

>> Why don't you just give an example, so we can look at it?
>
> Why do I have to prove it to you?

Why can't you?

> You're the only skeptic here. 

And I gave you the reason. If this were such a major problem, there would have
been many reports for a long time, not only for Mozilla but for many other
readers that send (and declare) US-ASCII by default. I have been involved in
such projects for many years and haven't seen those reports.

>Why don't you try it yourself? 

I am waiting for exact steps to reproduce.

>>  It is best practice (and has ever been) to declare US-ASCII as the charset 
>> if this is what the message is.
>
>  Is this all you have? This argument is pretty weak. 

I find it a very strong argument against changing standard behavior for the
sake of a badly broken reader when we cannot even clearly say which conditions
cause the problem.

pi
(In reply to comment #31)
> >What you have seen is case 3 in comment #23 (MS OE probably does that when it's
> >configured to send out messages not compliant to MIME) 
> 
> Yes, this is the default behavior. 

  It's not the default behavior at least in the Korean version of MS OE. By
default it sends out MIME-compliant email messages (except for mail headers) in
multipart/alternative (text/plain + text/html) 

> would be nice if everybody would use and understand Unicode which is not the
> case, OE and Mozilla can, others cannot.

  Others? It's 2004 and the world has changed a lot. Many MUAs (including
text-terminal-based clients like mutt, the I18Nized version of Pine included in
SuSE Linux, and Solaris' default mail client dtmail, let alone other GUI-based
mail clients on Mac OS X, Linux/Unix and Windows) are now capable of handling
multiple character encodings, including UTF-8. Eudora for Windows may be one of
the few 'major MUAs' that still don't support MIME and I18N. And there are
stupid web-mail services like Hotmail, Yahoo Mail, etc. Some open-source web-mail
programs can (now) handle MIME and multiple character encodings well, but
commercial services lag behind.

> >that that's  __all__ MS OE ever produces in replies to messages labelled with
> >'US-ASCII'. What I and Tam have been receiving is case 2 in comment #23 (that's
> >how MS OE behaves when it's set to send out messages in more or less compliant
> >to MIME).  Now got it?
> 
> Not fully. The main question is what are the conditions to cause this behavior

   Nothing short of a very detailed step-by-step instruction seems to work for
you. Here's one:
1. Send an ASCII-only message to yourself with Mozilla.
2. Configure your MS OE: in Tools | Options | Send | International Settings,
set the default character encoding to 'Korean' [1]. You also have to select
'MIME' as the send format (the default in my version of MS OE) and check
'plain text'.
3. Read the message you sent with Mozilla and reply to it with Korean characters
in the message body and the message subject (go to http://www.yahoo.co.kr and
copy and paste a few Korean characters [1]).
4. When you press the 'send' button, here's the fun part:
  
> and how exactly does it happen. There are different things to consider:

 OK. I admit that I overestimated the 'intelligence' of my correspondents. When
an outgoing message has characters outside the character repertoire of the
currently selected MIME charset, MS OE 6 prompts users to select one of the
following three options: 1. send in Unicode, 2. send as is, 3. cancel and go
back. Apparently, all of my correspondents chose the second option _despite_ the
warning message in the dialog box (some characters will be lost irreversibly).
I thought MS OE made that choice behind their backs without warning them. So, at
least MS OE 6 is smarter than I thought (MS OE 5 was not this smart and, as far
as I remember, almost certainly used question marks for unrepresentable
characters). Still, it'd be even better if it didn't offer the second choice at
all. (Mozilla offers only the first and the third choices in the same situation.
It used to offer the second, data-losing choice and the third, but I changed
the behavior in bug 233361 and bug 194862.) As you wrote, and as my
correspondents' careless choice showed, most users are ignorant about Unicode and
other character encodings, so we need to protect ourselves (Mozilla users) from
the 'misbehavior' and wrong choices of users of MS OE and other mail clients
(such as the ones Brodie's correspondents use), which is what my patch is about.
 
> Now what does the OE user see? Does he see the Chinese (or whatever 
> language he uses) characters during composition?

  As I wrote below, MS OE uses Unicode internally, and during composition any
Unicode character is visible as long as support for it is installed on Windows.

> Will OE be able to display that message when receiving? 

  Of course not. How can it figure out what characters the author intended
when all it has are question marks? Neither can Mozilla. It can't do magic. (If
it's text/plain alone, there's absolutely no way to reverse the information loss.
If it's multipart/alternative with text/html and text/plain parts, the text/html
part is **information-preserving**; see comment #0 as to why.)
As for your musing about the way MS OE works (I'm sorry to say this, but your
speculation about MS OE and your use of the term 'ASCII extension' indicate that
you're not as well-versed in I18N issues as you think), MS OE is fully
Unicode-capable and internally uses Unicode exclusively. (Needless to say, I
don't have access to the MS OE source code, but MS is not so foolish as to avoid
using internally the Unicode it has championed for the last 15 years.) Character
encoding conversion occurs only when it communicates with the external world.
The same is true of Mozilla Mail/TB.



>>  Is this all you have? This argument is pretty weak. 

> I find it very strong not to change standard behavior for a very broken reader

  Firstly, MS OE is not as broken as you think. Secondly, a lot of people use
it, so to lure some of its users over to Mozilla/TB, we should try to keep
interoperability with it as long as doing so doesn't downright violate the
standard. My change doesn't make Mozilla produce a non-standard message by any
means. I am the last person to advocate a change that would violate the MIME
standard. Thirdly, I've already made a much stronger case for keeping the old
behavior than you have, and reasoned that even that is not strong enough in 2004
(when MIME and I18N support in MUAs is far more widespread than in the
mid-1990s), which is why your case is pretty weak. You have to begin with the
rationale behind the provision in RFC 2046 (comment #16).

> where we even cannot clearly say which conditions do cause the problem

  Please, s/we/I/. 

[1] I explicitly asked you in my previous comment (and am doing so again in this
comment) to try with Korean or Chinese instead of Latin-1 (Western (ISO) in MS
OE) or Windows-1252 (Western (Windows) in MS OE), because MS OE is likely to use
ISO-8859-1 even when replying to a message labelled as US-ASCII, which will make
it impossible to reproduce the bug.
Thanks for this explanation, now things become much more understandable.

>> >What you have seen is case 3 in comment #23 (MS OE probably does that when
>> >it's configured to send out messages not compliant to MIME) 
>> 
>> Yes, this is the default behavior. 
>
>  It's not the default behavior at least in the Korean version of MS OE.

OK, so then this is different, thanks for the information.

>By default it sends out MIME-compliant email messages (except for mail 
>headers) in multipart/alternative (text/plain + text/html) 

How about text/plain alone?

>> would be nice if everybody would use and understand Unicode which is not the
>> case, OE and Mozilla can, others cannot.
>
>  Others? It's 2004 and the world has changed a lot.

With that argument OE would behave properly ;-) Anyhow, the point is that if we
make special rules for special readers, we will most likely hurt others.

>> Not fully. The main question is what are the conditions to cause this behavior
>
>   Nothing short of a very detailed step-by-step instruction seems to work for
>you. Here's one. 

I'll come back to that.

>> and how exactly does it happen. There are different things to consider:
>
> Ok. I admit that I overestimated the 'intelligence' of my correspondents. When
>an outgoing message has characters outside the character repertoire of the
>currently selected MIME charset, MS OE 6 prompts users to select one of the
>following three options; 1. send in Unicode, 2. send as is 3. cancel and go
>back. Apparently, all of my correspondents chose the second option _despite_ the
>warning message in the dialog box (some characters will be lost irreversibly). 
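
A minimal Python sketch of the three dialog choices, to make the 'send as is' outcome concrete. The function `compose_reply` and its `choice` values are hypothetical (this is not how MS OE is implemented); Python's `errors="replace"` encoder, which emits `?` for every unrepresentable character, is assumed to approximate what 'send as is' produces.

```python
from typing import Optional


def compose_reply(text: str, charset: str, choice: str) -> Optional[bytes]:
    """Hypothetical model of the three MS OE 6 options (not a real OE API)."""
    try:
        # Everything fits in the selected charset: no dialog would be shown.
        return text.encode(charset)
    except UnicodeEncodeError:
        if choice == "unicode":
            # 1. Send in Unicode: switch to UTF-8, nothing is lost.
            return text.encode("utf-8")
        if choice == "as_is":
            # 2. Send as is: each unrepresentable character becomes '?'.
            return text.encode(charset, errors="replace")
        # 3. Cancel and go back: nothing is sent.
        return None


reply = "Thanks! 谢谢"
print(compose_reply(reply, "iso-8859-1", "as_is"))    # b'Thanks! ??'
print(compose_reply(reply, "iso-8859-1", "unicode"))  # UTF-8 bytes, lossless
```

Once the bytes `b'Thanks! ??'` have been sent, no receiver can tell which two characters were meant.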

OK, this is consistent with my information. So actually it is not our fault, but
a user error (not even an OE bug).

>Still, it'd be even better if it didn't offer the second choice at all 

Agreed, but we cannot change it.

>As you wrote and my
>correspondents' careless choice showed, most users are ignorant about Unicode or
>other character encodings so that we need to protect us (Mozilla users) from the
>'misbehavior' and the wrong choice of users of MS OE and other mail clients
>(Brodie's correspondents use), which is what my patch is about.

This seems to be the core question of the bug. I am glad our long discussion
finally identified the real problem. You won't be surprised that I draw a different
conclusion. Let me give you a slightly different example: Say you have the same
correspondence as originally described by you, but you -- the Mozilla user --
normally work under some different setting (like Russian or some European
language). If the Chinese writer does not declare the charset (see my question
above about text/plain), then you are also lost. Of course, this is also a
user failure (combined with braindead OE defaults), but we cannot take care of
that. So the question is which problems we should take care of.

>> Will OE be able to display that message when receiving? 
>
>  Of course not. 

That's what I understood; I just wanted to be sure. So this makes a strong
argument that it is really OE's fault, and we are out of the business.

>How can it figure out what characters were intended by the
>author when all it has are question marks? 

I was not sure whether they are actually question marks or are only displayed
as such because of some settings. But now we know it is something the user chose.

pi
(In reply to comment #33)

> >By default it sends out MIME-compliant email messages (except for mail 
> >headers) in multipart/alternative (text/plain + text/html) 
> 
> How about text/plain alone?

  Unless you explicitly turn off 'MIME' (an option that virtually no ordinary
user knows exists, let alone changes), it's still MIME-compliant by default,
like this:

Subject: =?EUC-KR?B?..........==?=
MIME-Version: 1.0
Content-Type: text/plain; charset="EUC-KR"
Content-Transfer-Encoding: base64
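
Python's standard `email` package can produce a message of the same shape (single-part text/plain, EUC-KR, base64 body). This only illustrates the MIME structure and says nothing about what MS OE does internally; the Korean sample text is made up, and Python writes the charset name in lower case (`euc-kr`).

```python
from email.mime.text import MIMEText

# Sample Korean body; MIMEText encodes it to EUC-KR bytes and
# uses base64 as the Content-Transfer-Encoding for this charset.
msg = MIMEText("안녕하세요", "plain", "euc-kr")
msg["Subject"] = "test"
print(msg.as_string())
```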
 
Why don't you explore MS OE's configuration panels yourself? If you had, you
wouldn't have asked a question like the one above.


> >> would be nice if everybody would use and understand Unicode which is not the
> >> case, OE and Mozilla can, others cannot.
> >
> >  Others? It's 2004 and the world has changed a lot.
> 
> With that argument OE would behave properly;-) Anyhow, the point being is that

  It (the newest one, MS OE 6) behaves properly, giving three options (although
one of them had better not be there), as you wrote below.

> a user error (not even an OE bug).

  I'm pretty sure, though, that older versions of MS OE had this bug of silently
resorting to question marks.


> >other character encodings so that we need to protect us (Mozilla users) from the
> >'misbehavior' and the wrong choice of users of MS OE and other mail clients
> >(Brodie's correspondents use), which is what my patch is about.
> 
> This seems to be the core question of the bug. I am glad our long discussion
> finally identified the real problem. 

 It was very clear from the beginning to everyone except for you. 

> You won't be surprised I draw a different
> conclusion. Let me give you a slightly different example: Say you have the same
> correspondence as originally described by you, but you -- the Mozilla user --
> normally work under some different setting (like Russian or some European
> language). If the Chinese writer does not declare the charset (that question
> from above for text/plain), then you are also lost. 

  What do you mean by not declaring the charset? Anyway, I don't see what you're
getting at here, and therefore why you think your scenario can be used to build a
case against my patch. Actually, you could have made a better case (although I
could easily refute it, too) with Chinese (GB2312) vs Russian (KOI8-R), but you didn't.
    
> >> Will OE be able to display that message when receiving? 
> >
> >  Of course not. 
> 
> That's what I understood, I just wanted to be sure. So this makes a strong
> argument, that it is really OE's fault, and we are out of the business.

  How could it be OE's fault that it's not able to conjure up something out of
nothing (question marks)? Neither can Mozilla. Neither can I. Nobody can violate
'the second law of thermodynamics'.

> >How can it figure out what characters were intended by the
> >author when all it has are question marks? 
> 
> I was not sure, if they are actually question marks or that would 
> only displayed because of some settings. 

  I suspected you weren't, despite the fact that I clearly mentioned that there's
_irreversible_ loss of information. (When I wrote 'irreversible', I really meant
'irreversible'. In the early 1990s I wrote a simple program to recover the
original content from MSB-stripped Korean emails in EUC-KR. That's possible
because in EUC-KR, bytes with the MSB set always come in pairs. There's a slight
increase in entropy when MSBs are stripped off, but most of the information is
still there.) You thought I couldn't tell real question marks in the message from
question marks that are rendered in place of undecodable byte sequences? The fact
that you're not sure only shows that you're not familiar with how character
encodings work. Go to http://www.yahoo.co.kr, set the character encoding (in View)
to ISO-8859-1 manually, and see how many question marks come up.
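
The difference between the two kinds of damage can be sketched in Python. The helper names are hypothetical, and the sketch assumes a 7-bit transport that simply clears the high bit of each byte. It only handles the easy case of an all-Hangul string; recovering mixed ASCII/Korean text, as the program mentioned above did, needs the byte-pairing heuristic and is not attempted here.

```python
def to_question_marks(text: str, charset: str) -> bytes:
    # Sender-side substitution: every unrepresentable character becomes b'?'.
    return text.encode(charset, errors="replace")


def strip_msb(data: bytes) -> bytes:
    # 7-bit damage: clear the most significant bit of every byte.
    return bytes(b & 0x7F for b in data)


def restore_all_hangul(stripped: bytes) -> str:
    # Only valid when every original byte had its MSB set (pure EUC-KR Hangul):
    # set the bit again and decode.
    return bytes(b | 0x80 for b in stripped).decode("euc-kr")


a, b = "안녕", "한국"
# Two different words collapse to the same bytes: nothing is left to recover.
print(to_question_marks(a, "us-ascii"), to_question_marks(b, "us-ascii"))  # b'??' b'??'
# MSB-stripped EUC-KR keeps the words distinct, so the text can be rebuilt.
print(restore_all_hangul(strip_msb(a.encode("euc-kr"))))  # 안녕
```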

>Why don't you explore MS OE's configuration panels yourself? If you had, you
>wouldn't have asked the question like the above. 

Simply because the defaults are very different in other language versions. There,
no MIME headers are set by default.

>> >other character encodings so that we need to protect us (Mozilla users) from the
>> >'misbehavior' and the wrong choice of users of MS OE and other mail clients
>> >(Brodie's correspondents use), which is what my patch is about.
>> 
>> This seems to be the core question of the bug. I am glad our long discussion
>> finally identified the real problem. 
>
> It was very clear from the beginning to everyone except for you. 

Well, it took all this discussion to find out that it is a user error. Before,
you stated that OE does it by itself.

>> You won't be surprised I draw a different
>> conclusion. Let me give you a slightly different example: Say you have the same
>> correspondence as originally described by you, but you -- the Mozilla user --
>> normally work under some different setting (like Russian or some European
>> language). If the Chinese writer does not declare the charset (that question
>> from above for text/plain), then you are also lost. 
>
>  What do you mean by not declaring charset? 

I mean not giving the charset in the Content-Type header.

>Anyway, I don't see what you're up
>to here and as such why you think your scenario can be used to build a case
>against my patch. 

Once more: there are other situations where OE fails which we cannot influence.
So why deal with the particular problem where users ignore warnings?

>> >> Will OE be able to display that message when receiving? 
>> >
>> >  Of course not. 
>> 
>> That's what I understood, I just wanted to be sure. So this makes a strong
>> argument, that it is really OE's fault, and we are out of the business.
>
>  How could it be OE's fault that it's not able to conjure up something out of
>nothing (question marks)?

The original description suggested OE would just produce question marks, which
it doesn't as we know now. That would have been OE's fault.

>> >How can it figure out what characters were intended by the
>> >author when all it has are question marks? 
>> 
>> I was not sure, if they are actually question marks or that would 
>> only displayed because of some settings. 
>
>  I suspected you're not despite the fact I clearly mentioned that there's
>_irreversible_ loss of information.

Yes, you also said that things happen automatically, which they don't. I had
reasons to doubt the description.

>The fact that
>you're not sure only shows that you're not familiar with how character encodings
>work. 

Why do you constantly try to insult me? Don't you have real arguments?

Fact is: your original assessment was wrong.

pi
>Unless you explicitly turn off 'MIME' (which virtually no ordinary user is
>aware exists, letting alone changing it), it's still MIME-compliant by default

I just got additional information from an OE expert I asked about this
problem. He suggests that this might also differ between mail and news.

pi
I have made a Flash movie to demonstrate this bug:
   http://www.sdiz.net/moz/

Warning: this file is over 6 MiB and needs a resolution of at least 1024x768 to
view.

FYI, all those demo messages were posted to netscape.public.test.
re comment #35:

> >> This seems to be the core question of the bug. I am glad our long discussion
> >> finally identified the real problem. 
> >
> > It was very clear from the beginning to everyone except for you. 
> 
> Well, it took all the discussion to find it is a user error. Before you stated,
> it happens by OE itself.

  You think that's important. Fine. However, that's not as important as the fact
that I (and other non-Western-European users) frequently receive emails in
which information is _not_ preserved, which can easily be avoided if we just
label ASCII-only messages as in the character encoding selected by the
Mozilla user *regardless of the cause of the problem* (whether it is a user
error or MUA misbehavior).
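
The labelling rule argued for here fits in a few lines. This Python sketch is an illustration only, not the C++ code in the patch; the function and parameter names are hypothetical, and the boolean stands in for the pref `mail.label_ascii_only_mail_as_us_ascii` proposed in the bug description (the name in the final patch may differ).

```python
def outgoing_charset(body: str, selected_charset: str,
                     label_ascii_only_as_us_ascii: bool = False) -> str:
    """Hypothetical model of the charset label for an outgoing message body.

    `label_ascii_only_as_us_ascii` stands in for the proposed pref
    'mail.label_ascii_only_mail_as_us_ascii', which is false by default.
    """
    if body.isascii() and label_ascii_only_as_us_ascii:
        # Old behavior: an ASCII-only body is relabelled as US-ASCII.
        return "US-ASCII"
    # Proposed default: keep the user's charset, even for ASCII-only text.
    return selected_charset


print(outgoing_charset("See you tomorrow", "GB2312"))        # GB2312
print(outgoing_charset("See you tomorrow", "GB2312", True))  # US-ASCII
```

Keeping the user's label costs nothing for ASCII-only text (US-ASCII is a subset of GB2312, EUC-KR, UTF-8 and the like), while it gives a replying MUA a charset that can hold the reply.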

Moreover, as I found out yesterday through more experiments with MS OE, and as
the Flash movie by Cheng Yuk Pong (thanks!!) demonstrated, this also happens
_automatically_ in MS OE when MIME is turned off. Apparently, you haven't
observed this because MS OE uses ISO-8859-1 when replying to messages labelled
as in US-ASCII (comment #32, comment #30). Instead, you have seen numerous
messages without C-T and C-T-E header fields but with characters from the GR
part of ISO-8859-1 in the body, which were replies to messages labelled as in
US-ASCII. This behavior of MS OE would not expose the problem being dealt with
in this bug as long as you use characters within ISO-8859-1. However, characters
outside ISO-8859-1 are all turned into question marks without any question
asked (when MIME is turned off in MS OE).

  In short, MS OE users can inadvertently turn all characters outside
ISO-8859-1 into question marks in two ways: 1) by selecting 'send as is' with
MIME turned on (for which end users are responsible, but most users are
clueless about this kind of stuff, so we can't blame them too much); 2) by
turning off MIME (which is also their fault in a sense, but MS OE should alert
users in this case as well, and MS OE should have a more sensible default for
news. In the case of mail, it's also the users' fault because they must have
turned off MIME on purpose).

 One scenario in which this patch wouldn't help is that I send an ASCII-only
 message to my Russian friend with the charset set to a Korean legacy encoding
 (EUC-KR) and he replies in Russian (with some Cyrillic letters not
 covered by EUC-KR; quite a large subset of Cyrillic letters is covered
 by legacy CJK character sets) and chooses 'send as is'. Those Cyrillic
 letters (not representable in EUC-KR) are all lost even with this patch
 applied. However, I don't use EUC-KR any more and I always use UTF-8
 (except when sending to some stupid web mail clients that don't handle
 UTF-8 well), so in the above scenario there's no problem with
 the patch applied. Without the patch applied, I would lose not just
 a few Cyrillic letters but all Cyrillic letters (because ISO-8859-1,
 which MS OE would use when replying to a message labelled as in
 US-ASCII, doesn't cover any Cyrillic letter).
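
The scenario above can be tallied with a small Python sketch. Both helpers are hypothetical; `oe_reply_charset` encodes the behavior reported in this bug's comments (MS OE replies to a US-ASCII-labelled message in ISO-8859-1 and otherwise reuses the label), which is an assumption about OE, not a documented rule.

```python
def lost_characters(text: str, charset: str) -> int:
    """Count the characters that 'send as is' in `charset` would turn into '?'."""
    lost = 0
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            lost += 1
    return lost


def oe_reply_charset(original_label: str) -> str:
    # Assumption taken from this bug's comments: MS OE answers a message labelled
    # US-ASCII in ISO-8859-1 and otherwise reuses the original label.
    return "iso-8859-1" if original_label.lower() == "us-ascii" else original_label


reply = "Привет"  # a short Russian reply: 6 Cyrillic letters
for label in ("us-ascii", "euc-kr", "utf-8"):
    charset = oe_reply_charset(label)
    print(f"{label:>8} -> {charset:<10} lost: {lost_characters(reply, charset)}")
```

With the US-ASCII label (no patch) all six letters are lost; with a UTF-8 label none are. The EUC-KR line depends on which Cyrillic letters the reply happens to use, which is the partial-loss case described above.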

As I mentioned before, there are potential problems with some web mail services
and some old MUAs (comment #15), as well as some non-critical rendering issues
(bug 86255). However, I think they're not as critical as the data-loss problem.

Any way we could get the patch reviewed faster? It seems strange to me to have
sr+ but r?.
Comment on attachment 151423
patch v3 (using nsIPrefBranch only)

Thanks for clarifying jshin.
Attachment #151423 - Flags: review?(mscott) → review+
patch v3 checked into the aviary 1.0 branch
Whiteboard: fixed-aviary1.0
Thanks for the review. Fix checked into the trunk.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
WFM in Thunderbird aviary 20040707. Thanks a lot for all your contributions!
Comment on attachment 151423
patch v3 (using nsIPrefBranch only)

Asking for approval on the 1.7 branch.

This was already checked into aviary 1.0 branch and should be safe.
Attachment #151423 - Flags: approval1.7.1?
re comment #38: a bit more clarification for the record

Although it was mentioned in earlier comments, it was not mentioned in comment
#38: _Regardless of_ whether MIME is on or not, MS Outlook (and I guess MS
Outlook Express, too) does not prompt users _even if_ one or more of the message
header fields (e.g. Subject, From, To) have characters outside the current
character repertoire, and blindly replaces them with question marks as long as
the message body doesn't have any unrepresentable character.
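
Header fields carry non-ASCII text as RFC 2047 encoded-words, the `=?EUC-KR?B?...?=` form shown in the Subject line earlier in this thread. This Python sketch, using the standard `email.header` module with a made-up Korean subject, shows that an encoded-word round-trips cleanly, while a header already reduced to question marks has nothing left to decode.

```python
from email.header import Header, decode_header

subject = "회의 일정"  # sample Korean subject
# RFC 2047 encoded-word for the header field.
encoded = Header(subject, "euc-kr").encode()
print(encoded)

# A receiving client can reverse the encoded-word...
(raw, charset), = decode_header(encoded)
assert raw.decode(charset) == subject

# ...but a header already collapsed to '?' cannot be reversed.
print(subject.encode("iso-8859-1", errors="replace"))  # b'?? ??'
```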
Product: MailNews → Core
Comment on attachment 151423
patch v3 (using nsIPrefBranch only)

a=asa for 1.7.x checkin.
Attachment #151423 - Flags: approval1.7.x? → approval1.7.x+
*** Bug 279530 has been marked as a duplicate of this bug. ***
*** Bug 226175 has been marked as a duplicate of this bug. ***
Summary: labelling 'ASCII only messages' as in US-ASCII leads to an interoperability problem with MS OE → labelling 'ASCII only messages' as in the US-ASCII charset leads to an interoperability problem with MS OE (shouldn't downgrade to ASCII)
(In reply to comment #17)
> (In reply to comment #16)
> > RFC2046 states that we SHOULD downgrade the charset to the lowest common
> > denominator (see last paragraph of page 10):
> > http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2046.html#page-10
> 
>   I forgot about RFC 2046. Thanks for the reminder. However, we have to note
> that it's written quite a while ago and the best practice then is not
> necessarily the best practice now(may or may not be). Nonetheless, I agree with
> its spirit and am inclined to keep the pref. 

Until a new RFC comes out that obsoletes RFC-2046, we should stick with it.


> > bug 136664 is specifically about following this RFC and downgrading; the general
> > case of bug 86255 (which is about Japanese).
> 
> Actually, bug 86255 is not just about Japanese. Our current implementation is
> not generic enough in that other downgrading paths (other than to US-ASCII from
> virtually all character encodings) such as Windows-1252-> ISO-8859-1 (and its
> Greek equivalent), Windows-874->ISO-8859-11->TIS-620, GB18030->GBK->GB2312 are
> not supported. Over the last few years, its usefulness has diminished
> significantly, though. Perhaps, ill-I18Nized Eudora and web mail users would be
> beneficiaries of this feature. 
> 
> > used by bug 136664 if/when it ever gets implemented for the general case. How
> > about we name the preference "mail.auto_use_simplest_charset" (or
> > "mail.auto_use_best_charset"), default to false.
> 
>  We can change the pref. name when we 'fix' bug 136664. For the now, we can just
> use what I have.

I think that the default should agree with the recommendation of RFC-2046.

I also think that "fixing" any third-party software as a workaround for a known bug in MS OE is broken.

In any case, downgrading content to the smallest inclusive character set is a valuable tool in Spam-fighting.

If someone whose locale is ZH or TR or JP sends a message in English that can be downgraded into "us-ascii" or "iso-8859-1" (as appropriate) when posting to an English-language mailing list (as a lot of Internet mailing lists are, since English is often the lingua franca of Internet discussions), then correctly identifying the content as English-language content is a win: it minimizes the chances of the email being rejected for being in the wrong language/locale for a given audience.
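
The RFC 2046 "lowest common denominator" recommendation amounts to picking the most restrictive charset that can still represent the text. This Python sketch is not Mozilla's implementation (see bug 136664 and bug 86255); the function name, the chain lists and the UTF-8 fallback are hypothetical, with the non-ASCII steps loosely following the downgrade paths listed in comment #17.

```python
from typing import Sequence


def simplest_charset(text: str, chain: Sequence[str]) -> str:
    """Return the first (most restrictive) charset in `chain` that can encode `text`."""
    for charset in chain:
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    return "utf-8"  # hypothetical fallback when nothing in the chain fits


# Hypothetical chains, ordered from the smallest repertoire to the largest.
WESTERN = ["us-ascii", "iso-8859-1", "windows-1252"]
CHINESE = ["us-ascii", "gb2312", "gbk", "gb18030"]

print(simplest_charset("Hello, list!", CHINESE))  # us-ascii
print(simplest_charset("café", WESTERN))          # iso-8859-1
print(simplest_charset("5 €", WESTERN))           # windows-1252
print(simplest_charset("你好", CHINESE))          # gb2312
```

Note that this is exactly the downgrade that the patch above turns off by default for the US-ASCII step: the trade-off in this bug is between this kind of precise labelling and keeping a label that a replying MUA can write back into.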


Product: Core → MailNews Core