Closed Bug 109342 Opened 23 years ago Closed 22 years ago

Euro symbol turned into "EUR" in sent mail (plain text)

Categories

(MailNews Core :: Composition, defect, P4)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: adamlock, Assigned: nhottanscp)


Details

(Whiteboard: [adt2 rtm])

Attachments

(2 files, 8 obsolete files)

Write an email containing a Euro symbol (?) and it's translated into "EUR" for
the recipient.

When you hit the send button Moz complains "The message you composed contains
characters not found in the selected character coding so your message become
unreadable after you send it."

This appears to be to do with the default character set, ISO-8859-1. If, as is
likely, we should be using ISO-8859-15, then the mailnews.js default prefs need to be
updated to specify that as the default. Most people, especially in Europe
haven't the first clue about character encoding so it is important that this is
the default unless there's a good reason to the contrary.
Ack, the question mark in the brackets is actually a Euro symbol. Perhaps I
should raise a bug on the browser part as well.
> Most people, especially in Europe haven't the first clue about 
> character encoding so it is important that this is the default ..

Adam, please provide the source for this opinion. 
Let me also provide some of the reasons why we did not choose
ISO-8859-15 as the default.

1. ISO-8859-15 might cause a backward compatibility problem. For example, 
   Comm 4.x supports 8859-15 only on the Unix platform.
2. ISO-8859-15 may not be supported by Windows platform clients such as
   Outlook Express and others. They have Windows-1252 and that will do
   the job.
3. We have not seen any widespread use of ISO-8859-15 and hesitate to
   set that as the default mail encoding.
4. We have manually selected the default mail encoding for each language or
   language group we support. We have an NS-internal document which spells 
   out the specs. (Hopefully I will clean it up soon so that it can
   be published externally.)

Ideally we should have a language preference choice for users when they
create a new profile. We should then set all encoding related defaults
based on that. There is no automatic way of setting these values -- we
need to have a manual table.
I realize that as Europe adopts the Euro currency officially next
year we need a way to deal with this. For mail, I am inclined to
move to Unicode (UTF-8) for this. This is something that is promoted as 
the standard charset for a variety of web standards, and there is also a
recommendation from the IMC (Internet Mail Consortium). See this
page:

http://www.imc.org/mail-i18n.html
By the way, this should be a problem only in plain text mail. HTML mail
will use NCRs for this character.
Holger Metzger suggested in the Mozilla MailNews ML that
ISO-8859-15 would be most backward compatible. 

I am not sure if this is true. In my opinion, ISO-8859-15
has not spread widely. I don't think there is a single
encoding, Windows-1252, UTF-8, or ISO-8859-15, which will not
cause backward compatibility problems for one client or another.
It seems to me that we should probably look forward to the
future now on this issue.
I would rather we move to UTF-8 which is what most suggest 
is the future standard. 

One suggestion would be to explicitly suggest UTF-8 when
a mail msg cannot deal with the EURO and other characters
in a chosen mail encoding. This way, users will keep on using
ISO-8859-1 for most cases, and we will have a partial 
transition to UTF-8 msgs when they contain special characters.

I say this because I don't see firm evidence that ISO-8859-15 is
going to be adopted as the Euro standard. Rather than moving
to an interim standard, we can stay put and use UTF-8 sparingly
for special cases.
I think to use UTF-8 per default when the characters used exceed the US-ASCII
limit  is dangerous. There are still many mail/newsreaders out there which do
not understand UTF-8 (Xnews on the Windows platform, for example). IIRC
iso-8859-15 is not a problem for Outlook Express - it uses windows-1252 in any
case, because in most cases OE doesn't check the headers anyway. Mozilla should
be conservative in what it sends out and play nicely with other mail/newsreaders. :-)
Best thing would be to ask the user...

"Your characters used extend the US-ASCII limit, which character set do you want
to use to send the message?"

And then a dropdown list, maybe with the recommendations on top.
> I think to use UTF-8 per default when the characters used 
> exceed the US-ASCII limit  is dangerous.

I'm not suggesting this at all. I am suggesting that we use
UTF-8 only when the characters exceed ISO-8859-1 limit.
UTF-8 is becoming more and more prevalent and so 
we should recommend its support. If your newsreader or
other mail program cannot deal with it, I think it is time
to move to a new program. We should not hold progress
for the lowest common denominator.

BTW, there is nothing wrong in using EUR, the official 
abbreviation for the Euro symbol.

Please see this explanation from the EC:

http://europa.eu.int/euro/html/rubrique-cadre5.html?pag=rubrique-defaut5.html|lang=5|rubrique=221|chap=15

The limitation appears only when plain text mail is used.
Should we really change the mail standard to ISO-8859-15 when
all we need to do is use "EUR" in that case? If you want to
use the real character, you can use Win-1252, UTF-8 or
ISO-8859-15, whichever you think the recipient will 
appreciate. In the absence of a real standard which has
been registered as such, I hesitate to use ISO-8859-15. If there
is an RFC or some other proposal which has 8859-15 as the mail
standard for Western scripts, please let us know.

HTML mail uses NCRs and so will pose no problem at present.

So my real preference might be to do nothing in this case and let the
user decide to use UTF-8, ISO-8859-15 or ISO-8859-1, etc.


> I'm not suggesting this at all. I am suggesting that we use
> UTF-8 only when the characters exceed ISO-8859-1 limit.

ah, ok, now that's better. I think it should be more like this:
US-ASCII --> iso-8859-1 --> iso-8859-15 --> UTF-8

> If your newsreader or
> other mail program cannot deal with it, I think it is time
> to move to a new program. We should not hold progress
> for the lowest common denominator.

Some people don't have the choice to simply upgrade their program.

> BTW, there is nothing wrong in using EUR, the official 
> abbreviation for the Euro symbol.

Of course not. You can also substitute umlauts to be on the safe side, or
convert all extended characters back to US-ASCII, now that would be really safe. :-)

Holger
If UTF-8 supports the special characters of most countries and if UTF-8 has been
designated by the powers that be to become the desired standard, then we should
support this forward-looking approach. I suggest bringing up a window when there
are special characters that asks/informs (incl. "[ ]ask next time") about using UTF-8.

The only two reasons against this that were presented were:

1. Some mail readers don't handle UTF-8.
Why not? Is it difficult to implement? If not, then tough sh**t. The programmers
should update their software pronto or lose those clients who want to send/read
special characters.

2. Some people don't have a choice to upgrade their software?
Huh! Who? Employees of companies with paranoid & deaf IT personnel (they exist)?
Then the employees need to tell the IT what they need. The IT is there to serve
the needs of the employees, not the other way around. 

I don't think this will be a problem if the solution (a UTF-8-capable mail
proggy) is sufficiently publicised.

BTW. How badly mangled would the text be if it is sent as UTF-8 and received by an
iso-8859-1-only reader? If the answer is "not much", then let's move *forward*.
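A minimal Python sketch of the worst case asked about above, assuming the reader ignores the UTF-8 label and simply decodes the raw body bytes as ISO-8859-1 (real readers may do something else). The helper name is hypothetical.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Hypothetical helper: encode as UTF-8, then decode those bytes as
    ISO-8859-1, roughly what a reader that ignores the charset label shows."""
    return text.encode("utf-8").decode("iso-8859-1")


for sample in ["Käse", "10 €"]:
    utf8_bytes = sample.encode("utf-8")
    print(f"{sample!r}: {len(utf8_bytes)} UTF-8 bytes -> "
          f"{misread_utf8_as_latin1(sample)!r}")
```

Under this assumption each umlaut turns into two characters ("Käse" shows as "KÃ¤se"), and the Euro sign turns into three: "â", an invisible C1 control character, and "¬".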

PS. Netscape can always bring out NC4.79 with UTF-8 support ;)
I defer to other people's expert opinion on the best way to support the Euro
symbol, but I believe it boils down to:

1. Picking a sensible default character set from the locale when a profile is
first created.
2. Checking for unencodable symbols during composition and giving the user a
message in plain English explaining how they may proceed.
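Point 2 needs a way to find the symbols that the chosen charset cannot encode, so the message can name them. A minimal Python sketch using only the standard codecs; the helper name is hypothetical and this is not Mozilla's implementation:

```python
def unencodable_chars(text: str, charset: str) -> list[str]:
    """Return the distinct characters of `text` that `charset` cannot
    encode, in order of first appearance."""
    bad: list[str] = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


body = "Der Preis beträgt 10 € … inkl. MwSt."
print(unencodable_chars(body, "iso-8859-1"))    # the Euro sign and the ellipsis
print(unencodable_chars(body, "windows-1252"))  # nothing missing
```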
I think the best way to deal with this is to use the best "minimal" encoding
possible, like Forte Agent does: Ascending Charsets.
1.) US-ASCII. The minimal charset. That's the basic charset.
2.) iso-8859-1 --> already necessary: a line like "Just my 2 ¢" needs it.
3.) iso-8859-15 (or windows-1252) --> extended iso-8859-1 with euro support.
Compatible with MUAs/NUAs that don't understand utf-8, but have no problem with
basic iso-8859-1. They exist, and they will exist for quite some time. Be
conservative in what you send. You don't know what the recipient uses. We
shouldn't use a complex charset just because it's available and some readers
know how to deal with it. Most Windows readers don't have a clue about it
anyway, they use windows-1252 for displaying a message, and utf-8 looks
horrible in raw text. :-)
4.) utf-8, as a last resort.
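The ascending-charset idea can be sketched in a few lines of Python. This assumes "can every character be encoded?" is the only criterion; the function name and default list are hypothetical, not taken from Forte Agent or Mozilla:

```python
# Hypothetical default order, following the "ascending charsets" list above.
ASCENDING_CHARSETS = ("us-ascii", "iso-8859-1", "iso-8859-15", "utf-8")


def pick_charset(text: str, candidates=ASCENDING_CHARSETS) -> str:
    """Return the first charset in `candidates` that can encode all of `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character is missing; try the next, larger charset
        return charset
    raise ValueError("no candidate charset can encode this text")


print(pick_charset("Just my 2 ¢"))
print(pick_charset("10 €"))
print(pick_charset("10 € “quoted”",
                   ("us-ascii", "iso-8859-1", "windows-1252", "utf-8")))
```

Passing a different list, e.g. with windows-1252 in place of iso-8859-15, changes the fallback order without changing the logic. Text containing both ¤ and € skips ISO-8859-15, because that charset drops ¤ to make room for €.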
Adam, thanks for summarizing the issues.

> 1. Picking a sensible default character set from the locale 
> when a profile is first created.

This is a much broader issue than this bug and it involves 
not just this particular case but many others and so
we should deal with this in an overall context of 
setting the defaults based on the user's lang choice.
I will try to create a new bug for it with some specs.

In the mean time, it might be useful to publish a page
which lists default charsets for a variety of languages.
(I can work on this list and publish it on the International
Users page, reachable from the "Help | Help & Support Center" menu.)

> 2. Checking for unencodable symbols during composition 
> and giving the user a message in plain English explaining 
> how they may proceed.

We can try to make the current warning better. Let's see how
much improvement we can make here. It might get too wordy for
a dialog. We might want to create a help item that explains
how the user should proceed. Accounting for the many different
ways in which this warning comes up will not be that easy.
I don't think we can just special-case the Euro example.
Accepting but not sure I am the right person! Naoki, is it for you?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
I'll take this, but the reported behavior is expected for plain text mail
because there is no way to encode the Euro in ISO-8859-1. Sending as another charset (e.g.
windows-1252, ISO-8859-15) may not be understood by other mail clients.
The same is true for smart quotes, bullets, ellipses, trademark signs, etc.
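A quick Python check of the symbols listed above against three single-byte charsets, using the standard codecs (the dictionary of sample symbols is just an illustration):

```python
# Sample symbols from the comment above (illustrative selection).
SYMBOLS = {
    "euro sign": "\u20ac",
    "left single quote": "\u2018",
    "right single quote": "\u2019",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "bullet": "\u2022",
    "ellipsis": "\u2026",
    "trade mark sign": "\u2122",
}
CHARSETS = ("iso-8859-1", "iso-8859-15", "windows-1252")


def can_encode(ch: str, charset: str) -> bool:
    """True if `charset` has a code for character `ch`."""
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def coverage() -> dict[str, dict[str, bool]]:
    """Map each symbol name to {charset: encodable?}."""
    return {name: {cs: can_encode(ch, cs) for cs in CHARSETS}
            for name, ch in SYMBOLS.items()}


for name, row in coverage().items():
    print(f"{name:20}", "  ".join(f"{cs}={'yes' if ok else 'no'}"
                                  for cs, ok in row.items()))
```

With Python's codecs, none of these symbols is in ISO-8859-1, ISO-8859-15 adds only the Euro sign, and windows-1252 covers all of them.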
Assignee: ducarroz → nhotta
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.9 → ---
It would be nice if you could send as HTML.
Jean-Francois, is that possible to convert a plain text message and send as HTML?
Summary: Euro symbol turned into "EUR" in sent mail. → Euro symbol turned into "EUR" in sent mail (plain text)
Sure, it could be possible, but what for? The goal of using plain text is to make
sure every mail reader will be able to display it correctly!
Right, but offering it as an option would not be bad, for the case like this.
The user might understand the plain<->html difference better than different charset
names.
Marina,
If this is not in your area please reassign to the right person.  This is not my 
bug. 
QA Contact: sheelar → marina
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
I really don't think, we should suggest HTML just because of the Euro symbol.
HTML has many severe backward-compatibility problems, many more than iso-8859-15.

I also don't think that we should just ask the user. Charsets are something that
very few - if any - European users understand, since we have a very limited
character set and expect it to just work.

I think we should use ascii or iso-8859-1 when possible. If there is a Euro
symbol, use iso-8859-15, utf-8 or convert to EUR, I don't know.
But I'd do it transparently in any case. Imagine you'd get a(n additional)
dialog box whenever you use "$".
Now I want to make a simple statement. I cannot see something as a problem that
functions fine in most of the cases - if that is so. Mozilla does not have the
standing to allow for over-controlled behaviour. So possibly one suggestion
would be to leave encoding/interpretation as it is in NN4 until someone finds the
guts to switch to a newer encoding standard. Why not? It works!

Second, one thing must be clear: translating to "EUR" may not be a final
solution but only an intermediate one. Otherwise it would be an affront to the
European currency - imagine "$" being translated into "USD"! It also hampers
the graphical design of a text, which could be undesirable in some cases.

Third, ok, if you insist on purity: avoid warnings that the user probably
doesn't understand. That's how it is now. It's awful! If the user is in a hurry
this could mean a heart attack! I'd recommend a two-level transformation
mechanism. First, give an option somewhere in preferences "automatically switch
to appropriate encoding standard when sending messages" and tick it TRUE by
default. Then do what is necessary in the mails without bothering the user.
Second, if that option is unticked, bring up a warning before sending - just
as it is now - *but with qualified options*, for example: "We cannot
send this as is, but you have the options a) send with encoding iso-8859-15 (or
whatever), b) translate symbol x into the literal "EUR" (or whatever it is), c) go
back to edit." Additionally, say which option is recommended.

That means someone would have to work out typical conversion cases and make a
table of suggestions from them.

- Wolf 
Sample email that I sent to myself from the Mac OS X Mail application. Note it
uses quoted-printable to embed Euro symbols as =80.
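The "=80" in that sample is the quoted-printable form of byte 0x80, which is where windows-1252 puts the Euro sign; ISO-8859-15 puts it at 0xA4 instead. A minimal Python sketch with the standard quopri module (the helper name is hypothetical):

```python
import quopri


def qp_body(text: str, charset: str) -> bytes:
    """Encode `text` in `charset`, then apply quoted-printable encoding."""
    return quopri.encodestring(text.encode(charset))


print(qp_body("Preis: 10 €", "windows-1252"))  # Euro is byte 0x80 here
print(qp_body("Preis: 10 €", "iso-8859-15"))   # Euro is byte 0xA4 here
```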
Nice test! Thing is, people can only see the message as plain text (encoding) in
the attachment, not as it appears in the mailer. But I suppose it displays well, as
the charset is WINDOWS-1252. Interesting! Is that an indication against the
theory of heavy cross-platform mismatch (of encoding 0x80)? I have no idea what
the relations are. Does the Mac now use WINDOWS-1252 normally? An analysis or
well-founded estimate of how many clients might mismatch 0x80
might help us reach a decision on an interim solution for the Euro transmission.

- Wolf
Comment on attachment 65892
Sample email containing euro symbols

Fixing mimetype. Try to open it again.
Attachment #65892 - Attachment mime type: text/plain → message/rfc822
Err, sorry to butt in, but as far as I understand it the default charset for
European locales is already ISO-8859-1.

As the difference between ISO-8859-1 and ISO-8859-15 is only the Euro symbol,
and most mail readers understand ISO-8859-1, why not just switch transparently
to ISO-8859-15 when a message containing the euro symbol is sent? It wouldn't
even be necessary to warn the user...

If the reader understands ISO-8859-15 the symbol will display correctly, else it
will just display one character of garbage. A European seeing the garbage
character would probably interpret it correctly as meaning Euro based on
context, which is probably better than converting it to EUR.

In contrast, in UTF-8 the euro symbol becomes three special characters, which is
probably harder for a human to interpret as a euro symbol.
> If the reader understands ISO-8859-15 the symbol will display correctly, else it
> will just display one character of garbage.

Not necessarily. Maybe it notices that it doesn't support ISO-8859-15 and freaks
out. The fact that ISO-8859-1 and ISO-8859-15 are almost identical is a
"coincidence" - ISO-8859-14 is completely different, I guess.

> In contrast, in UTF-8 the euro symbol becomes three special characters

Worse yet, every umlaut becomes 2 special chars.
My preference would be to make it more obvious to the user to choose the right
charset for their locale in the first place (e.g. a page in the New Mail/News
Account wizard), and secondly to settle on a way to deal with old email readers.

My gut reaction is to find out which mail readers have problems, and if there are only
a few, then say to hell with them. Most email readers nowadays *should* be able
to cope with charsets, and if not then they'll see an odd character where
there's supposed to be a euro. Time to upgrade.
Another strategy would be wrapping the message in different encodings in a 
MIME multipart/alternative message.

This solution also has its drawbacks, but I think it deserves mentioning.
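A minimal sketch of that idea with Python's standard email package. The charset list and its order are assumptions for illustration; RFC 2046 says the parts of multipart/alternative go from least to most preferred, so the preferred version goes last. A reader that does not handle multipart/alternative may show all three copies, which is one of the drawbacks.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative(text: str, charsets: list[str]) -> MIMEMultipart:
    """Wrap the same plain-text body once per charset in multipart/alternative.
    Per RFC 2046 the parts run from least to most preferred."""
    msg = MIMEMultipart("alternative")
    for charset in charsets:
        # MIMEText encodes the text in `charset` and chooses a
        # Content-Transfer-Encoding from the email package's charset table.
        msg.attach(MIMEText(text, "plain", charset))
    return msg


msg = build_alternative("Preis: 10 €", ["iso-8859-15", "windows-1252", "utf-8"])
print(msg.as_string())
```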
I used windows-1252 because it is a superset of ISO-8859-1, and it also covers
more than the Euro symbol. I think mail clients which understand MIME charsets can
also handle windows-1252.
IMHO, using ISO-8859-1 by default, and ISO-8859-15 if and only if a Euro symbol
is encountered is the correct solution.

Since every ISO-8859 charset is a superset of US-ASCII, all conformant
mailers that do not recognize ISO-8859-15 must at least be able to
show the characters in the US-ASCII set (RFC 2046/2049).

Furthermore, more and more mailers support or will support ISO-8859-15.

This can and should be handled transparently.
> IMHO, using ISO-8859-1 by default, and ISO-8859-15 if and only
> if a Euro symbol is encountered is the correct solution.
> [...]
> This can and should be handled transparently.

I quite agree. Why should we use Windows-1252 instead of ISO-8859-15?
I used windows-1252 because it will cover more characters.
I filed bug 124198 for enhancement of charset fallback.
I guess we are not going to settle this issue on which 
encoding to use as the 1st fallback, 8859-15, windows-1252, or utf-8. 
If you're going to implement an automatic fallback, I would like to 
suggest the following:

1. Automatically fall back to ISO-8859-15 if the Euro and other characters
   are covered by it.
2. If there are characters outside of 8859-15, then fall back to 
   Windows-1252 next.

So this is a 2-step process. Even if we implement something like this,
I think we should review this in the near future and see if the
right thing may not be moving to UTF-8. I heard a rumor that 
Outlook Express may move to UTF-8 as the non-ASCII standard. I have
not confirmed it but it is something to think about for the future.
I can implement more than one fallback for ISO-8859-1 if that is necessary (but
I prefer to do it as part of bug 124198).
Why is ISO-8859-15 preferred over windows-1252? Is it supported by more mail
clients? Does OE support ISO-8859-15?
> Why ISO-8859-15 is preferred over windows-1252?

Because the former is a standard, while the latter is controlled by Microsoft.
I am concerned about backward compatibility like others.
Let me provide some test results on Windows:

Receiving Euro in plain text mail body:

Client             8859-15                    Win-1252  UTF-8  8859-1
Communicator 4.79  NO on Win/Mac, OK on Unix  OK        OK     OK
Eudora 5.1         NO                         OK        NO     OK
Outlook Express    OK (Latin 9)               OK        OK     NO
Mozilla/NS 6       OK                         OK        OK     OK

** You can tell from this that Mozilla is the most tolerant viewer of
   the Euro character.

The header display support is somewhat weaker than this, since
Comm 4.79, Eudora 5.1 and Outlook Express 5.5 all depend on the
system default charset for display. Thus Mozilla is the only
mail client that displays the Euro in headers as is under any of the 
4 encodings on any language version of an OS, e.g. Japanese
Windows.

You can draw your own conclusion, but Windows-1252 is the only
encoding that works with all of the above for plain text body 
display. Like others, I don't think it is good to spread Win-1252,
since it contains Windows-only characters in the 0x80-0x9F range,
but for the Euro it is not a bad choice. 

If there were an RFC that said 8859-15 is the new mail standard for 
Western languages, then I would be all for it. But there is no declared
standard; in practice, the standard is ISO-8859-1. 

I would like to ask others if Win-1252 as the fallback would
break Mail programs on Mac or Unix. If so, we may have a 
case for an option to choose which encoding the user prefers as
the 1st fallback.
Test results using MacOS X 10.1.

Outlook Express 5.02: 8859-15 (OK), Win-1252 (OK), UTF-8 (OK)
Mac OSX mail 1.1: 8859-15 (OK), Win-1252 (OK), UTF-8 (OK)

Both clients show body and header correctly. They do not offer ISO-8859-15,
so they cannot send in ISO-8859-15. For replies, OE forces me to use UTF-8, while Mac mail
silently changes the charset to windows-1252.
I do not have Eudora.
Keywords: nsbeta1
Sylpheed (GTK+)  utf8: no (complete line with Euro char not readable) / yes
(line with umlauts), iso-8859-15: yes (as currency char), win-1252: same as utf8.

Before using Windows-1252, I'd rather use "EUR".
We might entertain forking the code for Unix vs Win/Mac given
the facts so far:

Unix: fallback on ISO-8859-15
Win/Mac: fallback on Windows-1252
Then again, maybe not. We have no idea who is receiving 
what Mozilla creates -- even if the creator is on Unix, 
the recipient may not be.
Attached patch: More clean-up for the old code. (obsolete)
Attachment #69360 - Attachment is obsolete: true
I am going to ask for reviews and try to check in for 0.9.9.
Target Milestone: mozilla1.0 → mozilla0.9.9
Note that CP1252 is a superset of iso8859-1 while iso8859-15 (aka Latin9) is NOT.
There are a few characters in iso8859-15 that replace iso-8859-1 characters
(besides the Euro symbol, which replaces the international currency symbol).

See http://czyborra.com/charsets/iso8859.html:
   The new Latin9 nicknamed Latin0 aims to update Latin1 by replacing the less
   needed symbols ¦¨´¸¼½¾ with forgotten French and Finnish letters and placing
   the U+20AC Euro sign in the cell =A4 of the former international currency
   sign ¤.
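The difference can be checked directly with Python's codecs. This sketch (helper name hypothetical) compares the upper half 0xA0-0xFF, where ISO-8859-1 has printable characters; windows-1252 differs from ISO-8859-1 only in 0x80-0x9F, where ISO-8859-1 has C1 control codes.

```python
def differing_bytes(cs_a: str, cs_b: str, lo: int = 0xA0, hi: int = 0xFF):
    """List (byte, char in cs_a, char in cs_b) for every byte in lo..hi
    that the two single-byte charsets decode differently."""
    diffs = []
    for b in range(lo, hi + 1):
        ch_a = bytes([b]).decode(cs_a)
        ch_b = bytes([b]).decode(cs_b)
        if ch_a != ch_b:
            diffs.append((b, ch_a, ch_b))
    return diffs


for b, latin1, latin9 in differing_bytes("iso-8859-1", "iso-8859-15"):
    print(f"0x{b:02X}: {latin1} -> {latin9}")

print(differing_bytes("iso-8859-1", "windows-1252"))
```

It reports eight differing cells between ISO-8859-1 and ISO-8859-15 (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE) and none between ISO-8859-1 and windows-1252 in this range.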

The tests cited in previous comments only tested for the Euro symbol.  Sending
as iso8859-15 will break these other characters.  I don't know how many
people this would affect.  Do we care?
If the data contains both '¤' and the Euro symbol then it is not going to be
sent as ISO-8859-15, so we don't have the problem. I assume that the user would
be alerted (i.e. same as the current behavior).
> The tests cited in previous comments only tested for the 
> Euro symbol.  Sending as iso8859-15 will break these other 
> characters.  I don't know how many people this would affect.  
> Do we care?

As reported above for Windows, the Euro with 8859-15 breaks
Win & Mac Communicator 4 mail, which does not have
support for it.

The latest patch sends the Euro in ISO-8859-15 (implementing comment #32). 
If any other behavior is desired, please propose it in the bug, thanks.
The review is on hold now. I will wait one more day for any input.
I am very sorry but I am going to withdraw the proposal
in comment #32. I don't think it is a good idea to break 
Communicator 4.x users. (I have no idea why we didn't
implement 8859-15 for Win/Mac Communicator 4 but I don't 
want to hear complaints from those people who will potentially
upgrade to Mozilla or Netscape 6 that we again ignored
them.)

So, here is my new proposal:

1. Automatically fall back to Windows-1252 if the Euro is present
in plain text mail body, or headers. (Make this the default.)

2. Offer a prefs.js option to choose what is proposed in

http://bugzilla.mozilla.org/showattachment.cgi?attach_id=69368

as the default behavior instead.

Ben and others, who prefer ISO-8859-15, I'm sorry but
I don't want to screw loyal users of Communicator 4.
They are the users who potentially might move to the new
Gecko-based clients. 

Hopefully, those who feel strongly will choose to turn on the
option to send ISO-8859-15 instead.

As I said before, I think we should re-evaluate this in
the future and possibly move to 8859-15, UTF-8 or Win-1252
by studying the prevailing condition at that time.
> Ben and others, who prefer ISO-8859-15, I'm sorry but
> I don't want to screw loyal users of Communicator 4.

That's completely reasonable, given how much market share 4.x still has. But,
I'd rather convert to EUR than switch to a Microsoft charset.
Windows-1252 is a non-standard, MS charset, but it's the one that seems to work
most consistently between platforms.

It seems the most logical charset to fall back to if only the Euro is needed,
because it's a superset of ISO-8859-1. So if you fall back from ISO-8859-1 to
Windows-1252 you only gain the Euro and change nothing, whereas if you fall back
to ISO-8859-15 you also change a few characters.

I thought ISO-8859-15 was the right choice before, but now I'm not so sure.
> Windows-1252 is a non-standard, MS charset, but it's the 
> one that seems to work most consistently between platforms.

From the Internet charset name registry point of view,
this is not correct. Windows-1252 along with other Windows-12xx
names are registered for use in the Internet. It has the same
status as ISO-8859-15.

See IANA Charset Registry:

http://www.iana.org/assignments/character-sets

Windows-1252 was registered following the procedures 
specified in RFC 2978:

ftp://ftp.isi.edu/in-notes/rfc2978.txt

The question here really is not about the standard but
what would be the best practice for our users given that
every one of the choices we have has some drawback.

nsbeta1+ per triage meeting
Keywords: nsbeta1 → nsbeta1+
Gentlemen, 

Actually codepage 8859-15 makes many things look weird, see this bug for ref:
http://bugzilla.mozilla.org/show_bug.cgi?id=122455

Depending on the web page you're browsing, you might see every instance of
English ownership ("'s" suffix) incorrectly, ditto for many other "grammar"
characters. Also, for whatever reason, I didn't see yen char and others the -15
codepage inserts, just question marks.

I suppose sending transliterated text plus a fancy MIME/HTML version might not be the
worst idea ever.
OS: Windows 2000 → All
Hardware: PC → All
Target Milestone: mozilla0.9.9 → mozilla1.0
The proposal in comment #48,

2. Offer a prefs.js option to choose what is proposed in
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=69368
as the default behavior instead.

Is the pref supposed to select whether sending as ISO-8859-15 or windows-1252?
I assume no UI is needed. But is that necessary? It would be easier for the user
to set ISO-8859-15 as a default charset than tweaking the pref value.
> Is the pref supposed to select whether sending as 
> ISO-8859-15 or windows-1252? I assume no UI is needed. 
> But is that necessary? It would be easier for the user 
> to set ISO-8859-15 as a default charset than tweaking 
> the pref value.

This is true. I think what we need is a dialog somewhere
saying that we offer Windows-1252 as the default in this case,
but that the user may try other choices -- with the risks understood --
UTF-8 or ISO-8859-15. If we can say this somewhere, we would not
need any pref option. Should we show something like this when
the user gets into this logic for the first time? Any other
idea?

momoi, you mean a popup dialog as we have today (just a custom one with better
options)?
I thought the whole point of this bug was to not bother the user with that
issue. It's not important enough. No harm is done, if we convert to "EUR".
Ben, I think several hundred million potential Mozilla users would disagree with
you  :) 

People using the Euro symbol (i.e. Europeans) will frequently hit this problem
so it is important enough to justify its own dialog to describe what the issue
is and what solutions are available to solve it. 
Adam, I *am* German, but I don't understand you.

> People using the Euro symbol (i.e. Europeans) will frequently hit this problem
> so it is important enough to justify its own dialog

The frequency with which they "hit this problem" is exactly one of the major
reasons *not* to show a dialog.

Also, please elaborate why sending "EUR" (E-U-R) and not € (the Euro char) is a
problem.
The dialog can have a "never show me this again" checkbox, but we do need a
dialog and it needs to give the user some meaningful choices.
> The dialog can have a "never show me this again" checkbox

I'm not a UI expert, but I guess that most users are scared to check that or
don't even see it.
I know that UI experts hate popup dialogs.

> but we do need a dialog

You gave no reason why. Again: Why not just "EUR"?

> it needs to give the user some meaningful choices.

There are no meaningful choices for users, because they do not understand the
issues. If *we* don't even know what to do, how can the user ever?

It's not a question of user preference; and in almost all cases, the user also
does not have more information than we do, but less. In other words, if anybody
can make a meaningful decision, then it's us. (The occasional user who does know
the issues can still select a charset via the menu or prefs.)

What do you want to tell them? "If the recipient uses Netscape 4.x, use that; if
the recipient uses mutt, use that; and if the recipient uses OEMac, use that"?

As mpt recently said, 'if you cannot decide, it's completely unfair to offload
that decision on the user.'
What I am trying to do here is to let the user send a message without the
alert if possible. I think the user does not care about the charset when sending
symbols like the Euro or smart quotes. I would like to avoid adding UI for this.
My 2 cents' worth:

> Also, please elaborate why sending "EUR" (E-U-R) and
> not € (the Euro char) is a problem.

I think this is a problem because it would make Mozilla look shabby compared to
other mail programs.

I think a dialog could do a lot to clean up any confusion that may arise in the
user's mind. The dialog could be something like:

"You have sent a message containing the Euro symbol. Some older email/news
readers may not be able to display the symbol correctly and will display a
currency symbol (¤) instead. What would you like to do:

(RADIO BUTTON) Send the Euro character as is (€).
(RADIO BUTTON) Convert to the official abbreviation (EUR).

(CHECKBOX, default checked) Remember this decision for future messages I send.

In the future, you may change this setting in the 'Advanced' section of the
Mozilla preferences."

This does add an extra pref, but it would make Europeans feel that Mozilla
actively supports Europe and the Euro.
Further to that, a dialog could refer the user to a page of help that discusses
the issue in more depth and solutions to the problem.
> "You have sent a message containing the Euro symbol. Some older email/news
> readers may not be able to display the symbol correctly and will display a
> currency symbol (¤) instead. What would you like to do:
> 
I think the behavior (display a currency symbol) is true if ISO-8859-15 is used.
I thought it was agreed to use windows-1252.
It would be interesting to know just /how/ exactly the popular readers fail, not
just that they do. I can imagine the following failure modes when encountering a
message containing the euro code in an unknown charset:

(a) Refuse to display the message outright.
(b) Display the message as if it were sent in a known charset, possibly warning
the user that some characters may come out differently than intended by the sender.

(b) is clearly superior to (a) in many cases (e.g. popular western charsets that
overlap at least in the ASCII range), and not worse than (a) even if the
charsets are completely disjoint (the user will see garbage instead of nothing
at all). Therefore my suspicion is that many authors chose (b).

If that is correct, ISO-8859-15 may still be the better choice. With it, agents
failing in mode (b) will probably display the generic currency symbol instead of
the Euro. Confronted with a windows-1252 Euro code they could display almost
anything (including, by chance, the right glyph), as the code is undefined in
the older network standards.
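Failure mode (b) can be simulated in Python under the assumption that the reader falls back to ISO-8859-1 (the function name is hypothetical). The ISO-8859-15 Euro byte 0xA4 becomes the generic currency sign ¤, while the windows-1252 byte 0x80 becomes an unnamed C1 control character, so what appears on screen then depends on how the reader renders control codes.

```python
import unicodedata

EURO = "\u20ac"


def latin1_view(charset: str) -> str:
    """Encode the Euro sign in `charset`, then decode that byte as
    ISO-8859-1, i.e. what a reader falling back to ISO-8859-1 gets (assumption)."""
    return EURO.encode(charset).decode("iso-8859-1")


for charset in ("iso-8859-15", "windows-1252"):
    seen = latin1_view(charset)
    print(charset, EURO.encode(charset), repr(seen),
          unicodedata.name(seen, "<unnamed control>"),
          unicodedata.category(seen))
```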

In another vein, would it be possible to override the setting on a
per-address-book-entry basis? So if I am sure that person X will not be able to
view messages sent out with the default Euro charset I can choose something
different for her.

If it wants to get really smart-alec Mozilla could even sniff the User-Agent in
messages from a mail partner, and base the charset decision on that.
>I think the behavior (display a currency symbol) is true if ISO-8859-15 is used.
>I thought it was agreed to use windows-1252.

Ok, the currency symbol only appears if the mail is sent in ISO-8859-15.

But something wrong must happen if you send the Euro in a Windows-1252 encoded
message and the reader does not support it. Does anybody know what happens? We
might tell the user. :)
I agree that ideally, when sending the Euro currency symbol in a plain
text email, all recipients should be able to view this as the glyph
for the Euro currency symbol.  But with the current state of mail readers
(aka MUAs), this is not possible as has been cited in previous comments.

In the future, it looks like there will be a migration of all MUAs to
support UTF-8.  That evolution appears to be occurring now.

The solution which works NOW for all MUAs is the string "EUR" which is the
official abbreviation for the Euro currency symbol (although not the most
elegant representation).  See the EU Euro website:
http://europa.eu.int/euro/html/rubrique-cadre5.html?pag=rubrique-defaut5.html|lang=5|rubrique=221|chap=15
   The graphic symbol for the euro looks like an E with two clearly marked,
   horizontal parallel lines across it.
   ...
   The official abbreviation for the euro is 'EUR'.  It has been registered
   with the International Standards Organisation (ISO), and will be used
   for all business, financial and commercial purposes, just as the terms
   'FRF' (French franc), 'DEM' (Deutschmark), 'GBP' (pound sterling) and
   'BEF' (Belgian franc) are used today.

Adding dialogs and preferences to provide special case handling of
the Euro in iso-8859-1 email will add confusion and usability problems
and will still not work in all cases as cited in previous comments.

I think this issue will resolve itself.  For several years the Netscape
mail client sent un-encoded Latin1 headers because we discovered in early
Beta tests that many MUAs did not support MIME encoded headers.
We added a pref to enable MIME-compliant headers but the default was off.
After a few years Netscape switched the default as most MUAs had finally
added the MIME support.  I believe this will be the case for UTF-8 support
as well.

And remember that there are no problems with rich-text (HTML) mail because
we can use the HTML entity for the Euro, "&euro;".  This only affects
plain text.

Internet mail strives for interoperability.  I'm not convinced the proposed
solutions are really improvements since they make things less interoperable.

My 2 cents.
bobj: would you like the dollar symbol to be converted to USD if some mail
readers didn't support the $ character?

I say the best thing is to send the glyph (in Windows-1252 or ISO-8859-15, it
doesn't matter). If the receiving MUA chokes it will display a garbage
character, but any human reading the message will probably figure out, based on
context, that it's meant to be a euro symbol.

If a MUA doesn't understand the Euro symbol, then the Euro symbol won't display
in that MUA. But that does not mean Mozilla should not send it.
Priority: -- → P4
>If the the receiving MUA chokes it will display a garbage
>character, but any human reading the message will probably figure 
>out, based on context, that it's meant to be an euro symbol.
Users could ALSO figure out that "EUR" means the euro symbol based on the context.

>If a MUA doesn't understand the Euro symbol, then the Euro symbol won't
>display in that MUA.
If the sender chooses to send out as HTML, or both HTML AND PlainText and the MUA
doesn't understand the HTML, then the Euro symbol won't display in that MUA. 
109342 Euro symbol turned into "EUR" in sent mail (plain text)
Impact Summary
Impact Platform: ALL
Impact language users: 560 M 100%
Probability of hitting the problem: High
Severity if the problem is hit, in the worst case: the Euro sign will be converted to
three characters "EUR"
Way to recover after hitting the problem: the user can send out as HTML mail instead, or
send out as UTF-8 mail instead.
Risk of the fix: Medium
Potential benefit of fixing this problem: Unknown
ADT3
Whiteboard: adt3
Gentlemen. 

I was using lookout for a while since mozilla mail couldn't handle IMAP mail
attachments (fixed now). What Outlook does is ask "Do you want to
use utf-8 in your message because it contains characters we can't encode using
the current code page?"

It does *not* give you an option to "Yeah, and do so from now on" or "No, and do
not bother me about it anymore". So, if mozilla mail asks the user *and* gives
the user the option to do it either way in future, it's doing things better than
the industry standard app does. Since that query is in there, you can bet you're
going to see a *lot* of those utf-8 messages floating around. So everyone will
have to cope, sooner or later. 

Maybe we'll also have a lot of people using outlook that get tired of being
nagged about it and start to look for alternatives. You never know. If they at
least learn to open the "options" menu, everyone wins.
My recommendation is to settle on a solution for the
next commercial release and Mozilla 1.0. European users
deserve a better solution than EUR for plain text mail that
they use often.

If you want a compromise, I have a proposal ready to go.
If you want a no pref option solution, go with Windows-1252
or UTF-8. Either should work for all major email programs. 
(Win-1252 works better for Eudora.) 

Let's make a decision.
> bobj: would you like the dollar symbol to be converted to USD if some mail
> readers didn't support the $ character?

The question is not whether I want to display USD or $.  The question is
whether the receiver of my email will see $ or garbage. Although the Euro
currency symbol displays correctly while a plain text message is composed in
Mozilla, Mozilla cannot determine whether the receiver of the message will
render it correctly or as garbage.

So the options are
 (A) Send Euro currency symbol (without UI). Optimal for some receivers
 (B) Send "EUR" (without UI). Sub-optimal, but works for all receivers
 (C) UI Dialog to allow sender to decide between (A) or (B)
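Option (B) amounts to a transliteration step before encoding. A minimal C++ sketch, assuming the message text is held as UTF-32 (std::u32string); the function name transliterateEuro is hypothetical and this is not Mozilla's implementation:

```cpp
#include <cassert>
#include <string>

// Option (B): replace every EURO SIGN (U+20AC) with the ISO 4217 code "EUR"
// so the text fits in ISO-8859-1. All other characters are left unchanged.
std::u32string transliterateEuro(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        if (c == U'\u20AC') {
            out += U"EUR";
        } else {
            out += c;
        }
    }
    return out;
}
```

Because "EUR" is plain ASCII, the result is representable in every mail charset, which is why (B) works for all receivers at the cost of the glyph.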

Option (C) brings with it lots of usability issues
 - Many users probably won't understand the choice or how to choose
 - Some users will want the choice to be sticky and others will not.
 - Should stickiness be per receiver or per sender?  If I send to a
   particular user, I may or may not know if that user can render Euro.
 - What about mail to multiple addresses or posting to newsgroups? 
This is what I propose (reasoning follows):

1. User sends a plain text message containing the euro character €.
2. Mozilla prompts the user as proposed in comment 62 above.
3. If the user chooses to send the euro character as is, Mozilla sends the
message in Windows-1252.
3a. If the receiver's MUA supports the euro symbol, he will see it.
3b. If the receiver does not support the euro symbol, he will see exactly one
character of garbage. If the receiver is European, very probably he will deduce
that the garbage character is a euro symbol based on context.

The reasoning is as follows:

1. Why Windows-1252 instead of ISO-8859-15 or UTF-8:
 - Windows-1252 is a superset of ISO-8859-15, whereas ISO-8859-15 is not.
 - UTF-8 is poorly supported by mail readers (worse than Windows-1252)
 - If the receiver's MUA does not support the charset, with Windows-1252
   or ISO-8859-15 the message is readable and there is only one character
   of garbage, whereas with UTF-8 the whole message becomes unreadable.

2. Why the prompt:
 - It calls attention to Mozilla's support of the Euro, making European
   users feel that Mozilla supports them and the Euro (good user
   experience).
 - It explains the situation clearly to users.
 - It explains to them how to change the setting if they need to.

3. Why we should not automatically fall back to E-U-R:
 - It makes Mozilla look shabby compared to other mail programs. To a
   non-savvy user, it might seem that Mozilla does not support the
   Euro at all!
 - For a European, it is very easy to figure out that the garbage
   character is in fact meant to be a Euro symbol based on context,
   even if his MUA does not support the Euro.
 - Sending as is takes advantage of more recent MUAs, including
   Mozilla, that support the Euro. Why should we fall back to the
   lowest common denominator if the only negative consequence on
   old MUAs is a garbage character, which is easily interpreted as €
   based on context?
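The first reason above compares how much of a message a reader that does not know the declared charset gets wrong. A rough C++ sketch, assuming such a reader falls back to ISO-8859-1 and that the text only contains characters windows-1252 can encode: ASCII stays readable in every case, each windows-1252 character outside the ISO-8859-1 range (such as the euro) becomes one wrong character, and every non-ASCII character sent as UTF-8 turns into two to four wrong characters. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Wrong characters a Latin-1 fallback reader shows for one code point sent
// as windows-1252. Assumes cp is encodable in windows-1252.
std::size_t wrongCharsWindows1252(char32_t cp) {
    if (cp < 0x80) return 0;                 // ASCII: identical bytes
    if (cp >= 0xA0 && cp <= 0xFF) return 0;  // same byte, same meaning in ISO-8859-1
    return 1;  // windows-1252 extras in 0x80-0x9F, e.g. the euro at 0x80
}

// Wrong characters for one code point sent as UTF-8: every byte of a
// multi-byte sequence is shown as a separate ISO-8859-1 character.
std::size_t wrongCharsUtf8(char32_t cp) {
    if (cp < 0x80) return 0;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Total wrong characters for a whole message under a per-character rule.
std::size_t wrongChars(const std::u32string& text, std::size_t (*rule)(char32_t)) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        total += rule(cp);
    }
    return total;
}
```

For "Le café coûte 2 €." the windows-1252 version shows one wrong character, the UTF-8 version seven (2 for é, 2 for û, 3 for €).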
> If the receiver's MUA does not support the charset, with Windows-1252
> or ISO-8859-15 the message is readable and there is only one character
> of garbage

Wrong. As you can see in comment 37, there are mailers that fail worse with
unknown charsets. In that case, the mailer omitted the whole line (I consider
this a severe bug). Other mailers might not display the msg at all.
>1. Why Windows-1252 instead of ISO-8859-15 or UTF-8:
1.a Why NOT windows-1252?
- windows-1252 is neither an international standard nor a national standard;
both ISO-8859-15 and UTF-8 ARE.
(Should we send out as x-mac-roman if the sender is on a Mac?)

>- Windows-1252 is a superset of ISO-8859-15, whereas ISO-8859-15 is not.
Yea, but UTF-8 is a superset of windows-1252, while Windows-1252 is NOT



I suggest we do the following
1. nsbeta1- this bug 
2. for any build localized for a European country, change the default mail charset
to windows-1252, UTF-8 or ISO-8859-15, and the business owner of the European
localization of the shipping product should decide which one is used
in their countries. 
Ben: 
I may be wrong, of course, but the fact that sylpheed fails to display a whole
line when it fails to draw a character sounds like a bug in the mailer to me.
Also, as it cannot display the euro *whatever* character set is used, perhaps it
does not support the euro at all and might benefit from a fix. Have you
considered this possibility?

Frank:

>- Windows-1252 is a superset of ISO-8859-15, whereas ISO-8859-15 is not.
>Yea, but UTF-8 is a superset of windows-1252, while Windows-1252 is NOT

Sorry, I meant to say "Windows-1252 is a superset of *ISO-8859-1*, whereas
ISO-8859-15 is not". The more similar the charsets are, the less likely they are
to cause problems. In my opinion, UTF-8 is likely to break on many more mailers
than the other two, and if it doesn't work the results are probably worse than
both the alternatives, because it's the most different from the standard
ISO-8859-1 and straight ASCII.
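For reference, ISO-8859-15 differs from ISO-8859-1 in exactly eight byte positions, while windows-1252 keeps all of ISO-8859-1's printable characters in 0xA0-0xFF and adds the euro (and other characters) in 0x80-0x9F. The C++ sketch below encodes that table; encodeIso885915 is a hypothetical helper that returns std::nullopt for the ISO-8859-1 characters that ISO-8859-15 can no longer represent, which is what makes ISO-8859-15 not a pure superset of ISO-8859-1.

```cpp
#include <array>
#include <cassert>
#include <optional>

// One byte position where ISO-8859-15 differs from ISO-8859-1.
struct CharsetDiff {
    unsigned char byte;
    char32_t inIso88591;   // character at this byte in ISO-8859-1
    char32_t inIso885915;  // character at this byte in ISO-8859-15
};

constexpr std::array<CharsetDiff, 8> kDiffs = {{
    {0xA4, 0x00A4, 0x20AC},  // currency sign   -> euro sign
    {0xA6, 0x00A6, 0x0160},  // broken bar      -> S with caron
    {0xA8, 0x00A8, 0x0161},  // diaeresis       -> s with caron
    {0xB4, 0x00B4, 0x017D},  // acute accent    -> Z with caron
    {0xB8, 0x00B8, 0x017E},  // cedilla         -> z with caron
    {0xBC, 0x00BC, 0x0152},  // one quarter     -> OE ligature
    {0xBD, 0x00BD, 0x0153},  // one half        -> oe ligature
    {0xBE, 0x00BE, 0x0178},  // three quarters  -> Y with diaeresis
}};

// Encode one code point as ISO-8859-15, or return std::nullopt if it has no slot.
std::optional<unsigned char> encodeIso885915(char32_t cp) {
    for (const CharsetDiff& d : kDiffs) {
        if (cp == d.inIso885915) return d.byte;        // one of the eight new characters
        if (cp == d.inIso88591) return std::nullopt;   // dropped from ISO-8859-1
    }
    if (cp <= 0xFF) return static_cast<unsigned char>(cp);  // unchanged positions
    return std::nullopt;                                    // outside the repertoire
}
```

For example, encodeIso885915(0x00BD) (the ½ sign, valid in ISO-8859-1 mail) yields std::nullopt, while the euro maps to 0xA4.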

> windows-1252 are not international standard nor national standard,
> both ISO-8859-15 and UTF-8 ARE.

I am not an expert on this, I was basing myself on Katsuhiko's views (comment
#51 and onwards). I was in favour of ISO-8859-15 before, but the fact that it's
not a pure superset of ISO-8859-1 but breaks some characters put me off. Also,
according to the results we have here Windows-1252 is the most compatible with
different MUAs (including Netscape 4.x, and apart from sylpheed which doesn't
work with any charset).

I don't really think it's very important exactly which character set is used, as
I am going to be using Mozilla mail.

But I feel that it's not right to fall back to E-U-R just because some mailers
don't support the euro symbol.

Why should we europeans not be able to send mail containing our currency symbol?
After all, $ is in standard ascii... :-)
> Why should we europeans not be able to send mail containing our currency symbol?

Because it's new and new standards take a loong time to propagate, esp. in
email? It took ages (10 years?) for MIME to propagate, and that is clearly very
useful for everyone.
>> Why should we europeans not be able to send mail containing our currency symbol?
1. you could if you change the mail charset to "ISO-8859-15" by hand
2. you could if you set the default mail charset in your preference to
"ISO-8859-15" by hand
3. you could if you send out HTML mail
4. you could if the localization language pack (for German, French, or other
European languages) uses "ISO-8859-15" as the default mail charset. And it is up
to the localization to decide that. 


>Because it's new and new standards take a loong time to propagate, esp. in
>email? It took ages (10 years?) for MIME to propagate, and that is clearly very
>useful for everyone.

And not fixing this bug gives people one more reason to adopt HTML mail
and/or ISO-8859-15, right?

nsbeta1-
Keywords: nsbeta1+ → nsbeta1-
Ben: according to the sylpheed web page, it supports the euro symbol (through
ISO-8859-15) in version 0.7.3 or later.
>> Why should we europeans not be able to send mail containing
>> our currency symbol?
> 1. you could if you change [...]

This is exactly what could give Mozilla a bad image in Europe. The average user
will not know how to change this, so all messages he sends with Mozilla will
have € converted to EUR. The average user probably won't understand why, and
will probably be surprised that his email messages are being silently changed by
mozilla (I would).

However, the average user *will* understand that all the emails he receives from
people using Outlook, Eudora, Mac OS X mail, and other mailers (except Mozilla)
*do* have the euro symbol. So what will he think? Probably "hmm, the euro symbol
doesn't work right in Mozilla: Mozilla is behind the times".

There is the opportunity to do it right for 1.0. Why pass it up?

> And by not fixing this bug it make people one more reason to adopt
> html mail and/or ISO-8859-15, right ? 

This is a step in the opposite direction. HTML probably fails on more
mail readers than ISO-8859-15 or Windows-1252 do. OK, we can send as both HTML and
plain text, but the euro symbol won't work there either, because those readers
ignore the HTML and just display the text...

This is not a case of ISO-8859-15 vs Windows-1252, but a case of € vs EUR. Why
should we convert to EUR if only about 2% of recipients (and no Mozilla users)
will see a garbage char?

Re-reading the comments posted here, it seems that I am not alone in thinking
along these lines. What is the opinion of the other participants in the
discussion? What is the opinion of the bug owner? Wouldn't it be better if we
discussed the matter instead of just dropping it like this? Especially as there
is a patch ready...
Although this is not a discussion forum, let me answer. I'm from Germany and
thus a European, too.

> Re-reading the comments posted here, it seems that I am not alone in thinking
> along these lines.

No, you aren't.

> What is the opinion of the other participants in the discussion?

I believe the best way is the following: When someone sends a mail as ISO 8859-1
or anything else (other than UTF-8, ISO 8859-15 or Windows-1252!), he is warned
in a modal dialog that the specified charset doesn't support the Euro symbol. He
is given the option of a) sending the mail as <popup menu letting the user
choose any of the three>, b) sending the mail with our infamous "EUR"
replacement or c) returning to the compose window and editing the mail.
My opinion is I want to see and send the €. I defer the implementation to others
but I feel converting it to EUR because a small and ever diminishing percentage of
users have antiquated mail/news clients is pretty lame really.
The only reason this bug is not getting resolved is because
we can't agree on a solution. Cutting out all the repeats of discussions that
went on earlier, I think we should go with
one fallback default without asking the user. Windows-1252
despite some drawbacks will have fewest problems with
other programs and platforms. Windows-1252 is registered in
IANA list and so has an official Internet status.
We can debate this later and make other changes later.
For Mozilla 1.0, can we agree on something now?  
I can subscribe to that. That's certainly better than the current behaviour, IMO.
Sounds good to me. It's important to get this right by 1.0.

If we see that it causes lots of problems, we can always change the behaviour
later...
You have my vote.
I am going to renominate this bug for nsbeta1 for the following
reasons.

1. For a large percentage of users in Europe and also in other
countries, plain text mail is the preferred format for messages.
We tried to change this by setting the default to HTML mail, but
this has not completely succeeded. For the majority 
of users in many countries, plain text mail format is 
the preferred way. 

We need to deal with the Euro currency issue in the best
compromise we can find for Mozilla 1.0/NS6.

2. A way to send Euro currency character in plain text
mail should be available to users of all language versions of
Mozilla/NS6. If we adopt Frank Tang's approach to leave this
matter to localizers for Latin 1 countries, this leaves out
users whose default encodings are Latin2, Baltic, Greek, 
Cyrillic but preferring to write in ISO-8859-1 for business
and other types of international communication.

This also leaves out users of other encodings in East Asia whose
encodings are not usually ISO-8859-1 but for business and other
communications needs may be using ISO-8859-1 mail.

An easy way to send the Euro currency character in plain 
text mail should exist for any users trying to use ISO-8859-1,
which is the most widely used mail encoding. The Euro currency 
is an international currency and its use is wide spread now
in business situations, and not having a good fallback for this
in ISO-8859-1 is a big hurdle for Mozilla/Netscape 6 users. 

3. I suggest for now (Mozilla 1.0 and the next Netscape 6
client) we go with what nhotta proposed in:

http://bugzilla.mozilla.org/showattachment.cgi?attach_id=68195

This adds fallback to Windows-1252 when ISO-8859-1 mail contains
characters that ISO-8859-1 cannot encode but
Windows-1252 can. This is not the perfect solution
but is probably better than sending "EUR".
For those users who want the current behavior, it would be
nice if we can leave that as a prefs.js option as discussed
somewhere above.

4. We can reassess this approach post Mozilla 1.0. I have a 
more detailed proposal with various options built in for 
handling the Euro currency symbol. ( I will not attach it
to this bug for fear that it would derail the current discussion.)

In summary, the above plan is workable and the patch already
exists. We should review the situation post Mozilla 1.0 and 
come up with a better plan. I suspect that within the next
year or so, this will become a lot clearer.
I also do not think it would be a good idea to fragment
our way to deal with the Euro currency too much. Localizers
into different languages should not be carrying the burden of
choosing which encoding is to be the default for Latin1 mail.
We really should have a consistent approach for all European
localizations. All that the localizer-based solution is doing
is passing the buck to them because we cannot decide here. 

For the goodness of Mozilla and Netscape 6 **products** in
Europe, we should agree on something and go forward even if
it is not perfect. Let's not pass the responsibility/burden
to each localizer. We want more consistent behavior
from our products.
Let's also not forget why this bug was filed in the first place.
That is because people want a common solution that applies
to all Mozilla-based products if at all possible. That
need is now even more keenly felt since the Euro currency
became official in Europe this year. 

We need to be Euro ready. I request that we reconsider this
bug for nsbeta1 and Mozilla 1.0.
Keywords: nsbeta1- → nsbeta1
This is a high-impact bug for many average
users. Not being able to send the Euro currency symbol
in some encoding when the users choose the 8859-1 mail encoding
makes our Mail much less competitive with other mailers for
European users also. I suggest adt1 or adt2 for classification.
>I also do not think it would be a good idea to fragment
>our way to deal with the Euro currency too much. 
It is not our choice to decide "fragmentation" or not, because the decision was
made in 1985, when ISO-8859-1 was published, and in 199x, when ISO-8859-15 was
published. Internet mail used in Europe will be fragmented simply because
ISO-8859-1 cannot encode the Euro sign and ISO-8859-15 is NOT backward compatible
with ISO-8859-1. No matter what we decide to do, other mailers will
"fragment" Internet email usage outside our decision/control.

>Localizers
>into different languages should not be carrying the burden of
>choosing which encoding is to be the default for Latin1 mail.

Application developers implementing different mailers should not be carrying
the burden of choosing which encoding is to be the default for Latin1 mail either.
The ISO standards body already chose for us: ISO-8859-15, right?


>We really should have a consistent approach for all European
>localizations. All that the localizer-based solution is doing
>is passing the buck to them because we cannot decide here. 

We really should have a consistent approach with all other mailers for
all European users.

If we want to implement such a "try ISO-8859-1, if that fails try another"
approach, then we should consider the fallback as one of
1. ISO-8859-1 then windows-1252, or
2. ISO-8859-1 then UTF-8, or
3. ISO-8859-1 then ISO-8859-15, or
4. ISO-8859-1 then ISO-8859-15, then windows-1252, or
5. ISO-8859-1 then ISO-8859-15, then UTF-8, or
6. ISO-8859-1 then windows-1252, then UTF-8

I personally think we should NOT send out windows-1252, so 1, 4, and 6 are bad
choices for me.
I prefer we do 5 because it will promote ISO-8859-15 (ISO-8859-15 IS a published
ISO standard. It is the other mailers' fault not to implement ISO-8859-15 if
they want to support European users.) and, in case we hit characters that
ISO-8859-15 does not encode, for example the characters it removed from
ISO-8859-1, we then fall back to UTF-8.
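All six options are the same mechanism with different lists: try each charset in order and use the first one that can represent every character in the message. A minimal C++ sketch, assuming only the four charsets discussed here and a simplified windows-1252 repertoire (ISO-8859-1 plus the euro; its other 0x80-0x9F characters are omitted). canEncode and pickCharset are hypothetical helpers, not Mozilla functions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Characters ISO-8859-15 drops from ISO-8859-1, and the ones it adds.
const std::u32string kDroppedIn885915 = U"\u00A4\u00A6\u00A8\u00B4\u00B8\u00BC\u00BD\u00BE";
const std::u32string kAddedIn885915   = U"\u20AC\u0160\u0161\u017D\u017E\u0152\u0153\u0178";

// Simplified repertoire check for the charsets discussed in this bug.
bool canEncode(const std::string& charset, char32_t cp) {
    if (cp < 0x80 || charset == "UTF-8") return true;  // ASCII is common to all
    const bool latin1 = cp >= 0xA0 && cp <= 0xFF;
    if (charset == "ISO-8859-1") return latin1;
    if (charset == "windows-1252") return latin1 || cp == 0x20AC;  // simplified
    if (charset == "ISO-8859-15") {
        if (kAddedIn885915.find(cp) != std::u32string::npos) return true;
        return latin1 && kDroppedIn885915.find(cp) == std::u32string::npos;
    }
    return false;  // charsets not modelled here
}

// Return the first charset in `chain` that can encode all of `text`.
std::optional<std::string> pickCharset(const std::u32string& text,
                                       const std::vector<std::string>& chain) {
    for (const std::string& charset : chain) {
        bool ok = true;
        for (char32_t cp : text) {
            if (!canEncode(charset, cp)) {
                ok = false;
                break;
            }
        }
        if (ok) return charset;
    }
    return std::nullopt;  // nothing fits: the caller must warn or transliterate
}
```

With option 5, text containing only the euro ends up as ISO-8859-15, while text containing both ½ and the euro falls through to UTF-8; with option 1 the euro ends up as windows-1252.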

OK, here is the impact summary for this bug

Impact Summary
Impact Platform: ALL
Impact language users: all users living in the EU (European languages: 192.3M (33.9%
of total internet), and 31M (5.53% of total internet) UK users (they don't use
the Euro currency right now but they probably will send mail about it)) plus
people who want to have business/personal communication about European financial
information. So 223.3M (39.43%) Internet users will have a chance to hit
this, probably daily, and the rest will hit this problem every so often.

Probability of hitting the problem: HIGH for communication about financial
information
Severity if the problem is hit, in the worst case: the Euro sign won't be sent as the
SIGN itself in plain text mail or subject. Users will be warned by a conversion error
dialog box.
Way to recover after hitting the problem: users can change their encoding and resend.
The problem is that users may not know what to change to.
Risk of the fix: TBD
Potential benefit of fixing this problem: TBD


Keywords: nsbeta1 → nsbeta1+
Whiteboard: adt3 → [adt2]
> If we want to implement such "try ISO-8859-1, if failed try the other" 
> approach, then we should consider the other as
> 1. ISO-8859-1 then windows-1252, or
> 2. ISO-8859-1 then UTF-8, or
> 3. ISO-8859-1 then ISO-8859-15, or
> 4. ISO-8859-1 then ISO-8859-15, then windows-1252, or
> 5. ISO-8859-1 then ISO-8859-15, then UTF-8, or
> 6. ISO-8859-1 then windows-1252, then UTF-8

Although ISO-8859-15 is an ISO standard, I don't believe it is being widely
accepted or used because of its incompatibility with ISO-8859-1.  If there
is a switch from ISO-8859-1, I think many people feel it should be to UTF-8.

So, I would oppose any option (3, 4 and 5) that includes ISO-8859-15.

I don't like supporting non-standard encodings, but the reality is that
windows-1252 support is more widespread than UTF-8.  (Being registered
in IANA does not make it a standard encoding.)  For the short-term, I agree
that the current proposal to send as windows-1252 (option 1) is a better 
solution for most users.  (For people that want to send UTF-8, they can
do so manually, but it won't happen automatically.)

In the future (hopefully soon), we could switch to UTF-8 (option 2).

Option 6 (ISO-8859-1 then windows-1252, then UTF-8) is not needed for this
bug about Euro currency symbol.  But we could silently send as UTF-8 if
the contents cannot be successfully converted to windows-1252.  We could
have a pref to do this automatically or pop-up the warning dialog as we
do now.  How does Outlook behave?  But this should be a different bug...

I wish people would read the discussion in the report
before committing their views. Both ISO-8859-15 and UTF-8 
have more compatibility problems with major e-mail programs
reported above in comment #35.

The reality is that 8859-15 is not likely to be the
standard mail encoding no matter what ISO says. In practice,
it will be either UTF-8 or Win-1252 for msgs which include
Euro character. And ISO-8859-1 or Windows-1252 when it 
does not include Euro and other Windows only characters. 

Whether or not 8859-1/windows-1252/utf-8 is a standard
is not very helpful in dealing with real issues.

By the way,Outlook Express does not face this problem
primarily because its default European encoding is Windows-1252
and if users choose 8859-1, it will send out either UTF-8 or 
Windows-1252.
 
A question about the patch: What happens, if the user explicitly chose
ISO-8859-1 (in the Composer, in contrast to the default), but uses an Euro char?
I hope, it won't be sent in the fallback encoding.
Once the fallback is enabled, it's always applied for that charset (i.e. no way
to send "EUR" when the composing mail's charset is ISO-8859-1).
Attachment #78801 - Attachment is obsolete: true
Frank, after you review, please ask Jean-Francois Ducarroz to review the changes
for mail.
Summary of the changes for mail code: 
* Added fallback charset arguments to some functions; if the argument is non-null,
the charset is relabeled to the fallback one.
* In the conversion code, retry the conversion with the pref-specified charset in
case the initial conversion did not succeed because of an unmapped character.
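Relabeling means the charset name written into the message changes while the bytes come from the fallback converter: the body gets "Content-Type: text/plain; charset=windows-1252", and a Subject containing the euro is written as an RFC 2047 encoded-word naming windows-1252. A small C++ sketch of the header side, assuming Q encoding (Mozilla may well use B/base64 instead) and input bytes that are already windows-1252; qEncodedWord is a hypothetical helper. It conservatively hex-encodes every byte that is not a letter or digit, which RFC 2047 permits.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build an RFC 2047 "Q"-encoded word for header bytes already in `charset`.
// Spaces become '_'; every byte that is not an ASCII letter or digit is
// written as =XX (uppercase hex), which is always allowed in Q encoding.
std::string qEncodedWord(const std::string& charset, const std::string& bytes) {
    std::string out = "=?" + charset + "?Q?";
    for (unsigned char b : bytes) {
        if (b == ' ') {
            out += '_';
        } else if ((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                   (b >= '0' && b <= '9')) {
            out += static_cast<char>(b);
        } else {
            char hex[4];
            std::snprintf(hex, sizeof hex, "=%02X", b);
            out += hex;
        }
    }
    out += "?=";
    return out;
}
```

For example, the windows-1252 subject bytes "Price 5 " followed by 0x80 become =?windows-1252?Q?Price_5_=80?=, where =80 is the windows-1252 euro byte.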

Comment on attachment 78964 [details] [diff] [review]
Modified nsISaveAsCharset and related code after ftang's review.

r=ftang for nsISaveAsCharset.idl
about nsSaveAsCharset.h:
the lifetime of the return value of const char * GetNextCharset(); will last
until the destruction of the object or the next time Init is called. I think
this is ok since this is a "protected function".


about nsSaveAsCharset.cpp:

nsSaveAsCharset::GetCharset(char * *aCharset)


please read http://www.mozilla.org/scriptable/faq.html point 9. Should we use
nsMemory::Clone instead?
Attachment #78964 - Flags: needs-work+
Attachment #79738 - Attachment is obsolete: true
Comment on attachment 79742 [details] [diff] [review]
Correcting a typo in the change for all.js.

r=ftang
Attachment #79742 - Flags: review+
Now we have a patch; please assess the risk again.
>Risk of the fix: Medium
Risk of the fix: Medium - the change is for error handling and does not affect
the usual message send, but the change to the intl component is generic, so medium
risk
Comment on attachment 79742 [details] [diff] [review]
Correcting a typo in the change for all.js.

r=ducarroz for the mailnews part.
Whiteboard: [adt2] → [adt2] need 'sr'
some nits:

NULL should be nsnull

nsCRT::strdup(charset) should be just strdup

You might want to look at your uses of NS_ERROR_FAILURE - I've always found that
to be a very uninformative error code and if you get one, you have to go looking
through all the code that returns NS_ERROR_FAILURE. If there's a more
accurate/informative error code, you should use that.

The other thing to look at is all the uses of NS_ENSURE_SUCCESS(rv, rv) - are
you sure in all those cases that you want to completely bail instead of
continuing on? 

I can't answer either of those questions, so if you say there's no more
meaningful error code and the ensure_success calls are all correct, then I'll
take your word for it and just have two nits above. let me know, and I'll stamp
an sr.
1)

+    if (attr_EntityBeforeCharsetConv == MASK_ENTITY(mAttribute)) {
+      if (NULL == mEntityConverter) return NS_ERROR_FAILURE;

NULL, don't you want nsnull?

or better yet:

if (!mEntityConverter)
  return NS_ERROR_FAILURE;

2)

+      PRUnichar *entity = NULL;

nsnull, right?

+      // do the entity conversion first
+      rv = mEntityConverter->ConvertToEntities(inString, mEntityVersion, &entity);
+      if(NS_SUCCEEDED(rv)) {
+        if (NULL == entity) return NS_ERROR_OUT_OF_MEMORY;

Seems weird to me that ConvertToEntities can succeed, but not return an entity.
Why is it written that way?

3)

+  while (*p1) {
+    for (; *p1 && (*p1 != ',') && (*p1 != ' '); p1++) ;
+
+    charset.Assign(p2, p1 - p2);
+    mCharsetList.AppendCString(charset);
+
+    for (; *p1 && ((*p1 == ',') || (*p1 == ' ')); p1++) ;
+    p2 = p1;
+  }

see nsCStringArray::ParseString()

// Parses a given string using the delimiter passed in and appends item
// parsed to the array.
void
nsCStringArray::ParseString(const char* string, const char* delimiter)

4)

Index: mozilla/modules/libpref/src/init/all.js
+pref("intl.fallbackCharsetList.ISO-8859-1", "windows-1252");

forgive my ignorance, but does that do the right thing on non-Windows
platforms? Or is that a Windows-only font?
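For item 3) above, the hand-written loop can be expressed with standard-library search functions. The sketch below is a plain C++ stand-in for what nsCStringArray::ParseString() would provide, not that API; parseCharsetList is a hypothetical name. Unlike the quoted loop, it also appears to handle a leading separator correctly: the loop would assign a zero-length range and append an empty first entry, while this version skips it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a pref value such as "windows-1252, UTF-8" into charset names.
// Commas and spaces are both separators; runs of them (including leading
// and trailing ones) never produce empty entries.
std::vector<std::string> parseCharsetList(const std::string& value) {
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    while (pos < value.size()) {
        const auto start = value.find_first_not_of(", ", pos);
        if (start == std::string::npos) break;  // only separators remain
        auto end = value.find_first_of(", ", start);
        if (end == std::string::npos) end = value.size();
        result.push_back(value.substr(start, end - start));
        pos = end;
    }
    return result;
}
```

Given the all.js value "windows-1252" the result is a single entry; given "windows-1252, UTF-8" it is two.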
* NS_ENSURE_TRUE - those are used when the functions are called incorrectly,
e.g. "need to call init first", "charset list is empty", I can add assertions too.

* NS_ENSURE_SUCCESS - The functions are for charset conversion, so any
conversion failure needs to bail out. Others, like failing to get the pref service,
also bail out. The one for nsITextTransform can probably be ignored, so I will
change it.

* nsCStringArray::ParseString() - let me try that

* +pref("intl.fallbackCharsetList.ISO-8859-1", "windows-1252");
This is a charset list. Based on the investigation in comment #35 and the
discussions, it was decided to use "windows-1252". It is supported on non-Windows
platforms, and where it is not, it is most likely treated as "ISO-8859-1" (a subset
of "windows-1252").

>Seems weird to me that ConvertToEntities can succeed, but not return an entity.
>Why is it written that way?
I found the check is not necessary; an error will be returned in case the output
is empty. I will remove the check.
nsCRT::strdup does not wrap strdup but have its own implementation.
http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsCRT.cpp#249
Is strdup available on Mac? It does not compile on my Mac build.
Attachment #79742 - Attachment is obsolete: true
Comment on attachment 80978 [details] [diff] [review]
Includes super reviewers' suggestions

sr=sspitzer
Attachment #80978 - Flags: superreview+
yes, sorry about the misinformation about strdup - I think we're supposed to use
PL_strdup now.

how about NS_ERROR_INVALID_ARG instead of NS_ERROR_FAILURE here?
+  if (!charsetList[0])
+    return NS_ERROR_FAILURE;
just a suggestion...the patch looks ok to me, I'll let Seth make sure his
comments were addressed.
adjusting status whiteboard.
Whiteboard: [adt2] need 'sr' → [adt2]
Comment on attachment 80978 [details] [diff] [review]
Includes super reviewers' suggestions

R=ducarroz for the mailnews change
>how about NS_ERROR_INVALID_ARG instead of NS_ERROR_FAILURE here?
yes, I will change it before check in
Comment on attachment 80978 [details] [diff] [review]
Includes super reviewers' suggestions

in nsSaveAsCharset::GetCharset, please add
NS_ENSURE_TRUE(mCharsetListIndex >=0, NS_ERROR_FAILURE);
after 
NS_ENSURE_ARG(aCharset);

r=ftang
Attachment #80978 - Flags: review+
Attachment #80978 - Attachment is obsolete: true
Comment on attachment 81059 [details] [diff] [review]
change to address comment #114 and #118

copy r/sr
Attachment #81059 - Flags: superreview+
Attachment #81059 - Flags: review+
checked in to the trunk

Please test carefully, covering the following cases.
Test 4) is needed to verify this bug; the other cases are needed to check for
regressions.

1) format
compose as plain and send as plain
compose as html and send as html
compose as html and send as plain
compose as html and send as both plain and html

2) charset
send as ISO-8859-1
send as ISO-8859-15
send as ISO-2022-JP
send as UTF-8

3) characters (both subject and body)
send ASCII only
send European characters (e.g. a-acute)
send Japanese
send Chinese
send symbols, trademark, smartquotes

4) test for Euro
put Euro in subject, send as ISO-8859-1 -> check that header charset is windows-1252
put Euro in body, send as plain text ISO-8859-1 -> check that body charset is
windows-1252
put Euro in subject, send as ISO-2022-JP -> check that it is transliterated as "EUR"
put Euro in body, send as plain text ISO-2022-JP -> check that it is transliterated
as "EUR"
put Euro in subject, send as UTF-8 -> check that header charset is UTF-8
put Euro in subject, send as ISO-8859-15 -> check that header charset is
ISO-8859-15

5) special cases
put Japanese text in header send as ISO-8859-1 -> make sure you get the charset
alert
send Japanese text as plain ISO-8859-1 -> make sure you get the charset alert
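The Euro expectations in test 4) boil down to a small rule, sketched here so a tester can see at a glance which outcome to look for. The struct and function names are hypothetical; the mapping follows the checklist, and the treatment of charsets not listed there is an assumption.

```cpp
#include <cassert>
#include <string>

// What test 4) expects for a Euro sign in the subject or a plain-text body.
struct EuroOutcome {
  std::string charset;  // charset label expected in the sent message
  bool transliterated;  // true if the Euro is replaced by the text "EUR"
};

// Hypothetical summary of the checklist, not Mozilla's implementation.
EuroOutcome ExpectedEuroOutcome(const std::string& sendCharset) {
  // These charsets contain the Euro sign, so the message keeps its charset.
  if (sendCharset == "UTF-8" || sendCharset == "ISO-8859-15") {
    return {sendCharset, false};
  }
  // ISO-8859-1 has no Euro; the intl.fallbackCharsetList.ISO-8859-1 pref
  // makes the message go out as windows-1252 instead.
  if (sendCharset == "ISO-8859-1") return {"windows-1252", false};
  // Assumption: any other charset without a Euro (ISO-2022-JP in the
  // checklist) keeps its charset and transliterates the Euro to "EUR".
  return {sendCharset, true};
}
```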
The change is included in today's trunk.
Marina, please verify so this can go in to the branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
tested all scenarios Naoki suggested: the behavior is correct, as expected,
verified on the trunk. Will test on the branch when the build will come. 
I think this is too late to take for nsbeta1, but we definitely should consider
this for rtm. 
It is too risky for nsbeta1 now. Still, I'm nominating it for adt1.0.0 so adt
will see it; I suggest marking it adt1.0.0- but taking it for rtm.
Keywords: adt1.0.0
Whiteboard: [adt2] → [adt2][adt rtm]
> It is too risky for nsbeta1 now.
Can you quantify the risks?

I thought we hoped to get this into the trunk, so we can get more user
feedback on this change in behavior?
Bob, this is already checked in to the trunk
Sorry, I meant to write BRANCH not TRUNK in my previous comment:

   > It is too risky for nsbeta1 now.
   Can you quantify the risks?

   I thought we hoped to get this into the BRANCH, so we can get more user
   feedback on this change in behavior?

Then we can get user feedback, in case we find people complaining that
we are sending email encoded as cp1252...
Risks:
* The change added two things to nsISaveAsCharset.
1) Added a flag indicating that conversion should fall back to other charsets
on a conversion error.
2) Changed it to take a list of charsets (the list may contain just one charset).
These changes touched the implementation, which may affect non-fallback cases
(e.g. a simple conversion from Unicode to ISO-8859-1 without a Euro) as well.

* The diff is relatively large (638 lines).
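The first risk can be pinned down with a regression check: with the fallback flag off, the list-based code must behave exactly like the old single-charset path. A minimal sketch, assuming a hypothetical flag kFallbackToListedCharsets (not the real nsISaveAsCharset attribute name) and a caller-supplied CanEncodeFn in place of real converters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical flag; not the real nsISaveAsCharset attribute name.
constexpr unsigned kFallbackToListedCharsets = 1u << 0;

// Caller-supplied check: can `charset` encode code point `cp`?
using CanEncodeFn = std::function<bool(const std::string&, char32_t)>;

// Returns the index of the charset used, or -1 on a conversion error.
// Without the flag only the first entry is tried, which must reproduce the
// old single-charset behavior exactly.
int ChooseCharsetIndex(const std::u32string& text,
                       const std::vector<std::string>& charsets,
                       unsigned flags, const CanEncodeFn& canEncode) {
  const std::size_t limit = (flags & kFallbackToListedCharsets)
                                ? charsets.size()
                                : std::min<std::size_t>(charsets.size(), 1);
  for (std::size_t i = 0; i < limit; ++i) {
    const bool ok = std::all_of(text.begin(), text.end(), [&](char32_t cp) {
      return canEncode(charsets[i], cp);
    });
    if (ok) return static_cast<int>(i);
  }
  return -1;
}
```

A plain Latin-1 conversion without the flag should return index 0, just as before the change; only Euro text with the flag set should reach the second entry.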
*** Bug 141419 has been marked as a duplicate of this bug. ***
good work on getting this one fixed, but we think it is too risky to take on the
branch right now. adt1.0.0-/adt2RTM. 
Keywords: adt1.0.0 → adt1.0.0-
Whiteboard: [adt2][adt rtm] → [adt2 rtm]
Blocks: 141008
marina@netscape.com:
please mark this bug as verified if the trunk build is verified without problems.
We need that in order to ask adt to consider taking it for rtm.
I am still seeing the Euro symbol replaced by "EUR" when it's part of the
message subject! Maybe we should open a new bug for that case.
#132
Special characters such as the Euro sign in a *subject* (or in any other
header line) are never a good idea, because many mail servers still have
problems with them. So maybe we should leave it as it is.
mark it as VERIFIED based on the following:

>------- Additional Comment #123 From marina@netscape.com 2002-04-26 15:58 -------

>tested all scenarios Naoki suggested: the behavior is correct, as expected,
>verified on the trunk. Will test on the branch when the build will come. 
Status: RESOLVED → VERIFIED
Removing the minus from adt1.0.0- and renominating for the 1.0 branch.
Blocks: 143047
Keywords: adt1.0.0- → adt1.0.0, approval
Whiteboard: [adt2 rtm] → [adt2 rtm] [Needs a=]
adding adt1.0.0+.  Please get drivers approval and then check into the 1.0 branch.
Keywords: adt1.0.0 → adt1.0.0+
changing to adt1.0.1+ for checkin to the 1.0 branch.  Please get drivers
approval before checking in.
Keywords: adt1.0.0+ → adt1.0.1+
Keywords: mozilla1.0.1
Comment on attachment 81059 [details] [diff] [review]
change to address comment #114 and #118

please check into the 1.0.1 branch ASAP. once landed remove the 
mozilla1.0.1+ keyword and add the fixed1.0.1 keyword
Attachment #81059 - Flags: approval+
Target Milestone: mozilla1.0 → mozilla1.0.1
checked in to 1.0.1
Keywords: fixed1.0.1
Blocks: 146292
No longer blocks: 141008
Keywords: mozilla1.0.1+
I have read MOST of the comments, but I still can't figure out whether this bug
has to do with the Euro symbol in HTML mail being sent as '€' instead of '&euro;'.
When using HTML, wouldn't numeric character references be the best and most
compatible way?
Lapo wrote:

> Using HTML

The summary says "(plain text)": this bug applies only if the mail is sent as
plain text.
Lapo, did you put Euro in the subject too? In that case, the message is sent as
windows-1252 because entities cannot be used in message headers.
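Because entities cannot appear in headers, a non-ASCII subject has to travel as a MIME encoded-word (RFC 2047), which carries its own charset label; that label is what test 4) checks. Below is a minimal sketch of the "Q" form, assuming the subject has already been converted to windows-1252 bytes. The function name EncodeQWord is hypothetical, and a real mailer would also split words longer than 75 characters and may prefer the "B" (base64) form.

```cpp
#include <cassert>
#include <string>

// Minimal RFC 2047 "Q" encoded-word for an unstructured header such as
// Subject. Conservative: letters and digits are copied literally, space
// becomes '_', and every other byte becomes =XX (uppercase hex).
// Long-word splitting is not handled in this sketch.
std::string EncodeQWord(const std::string& charset, const std::string& bytes) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out = "=?" + charset + "?Q?";
  for (unsigned char c : bytes) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9')) {
      out += static_cast<char>(c);
    } else if (c == ' ') {
      out += '_';
    } else {
      out += '=';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out + "?=";
}
```

In windows-1252 the Euro sign is byte 0x80, so a subject "Price 5 €" comes out as `=?windows-1252?Q?Price_5_=80?=`.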
marina: please verify this as fixed, then replace "fixed1.0.1" with
"verified1.0.1". Thanks!
Whiteboard: [adt2 rtm] [Needs a=] → [adt2 rtm]
verified as fixed with the branch build
Product: MailNews → Core
Product: Core → MailNews Core