Closed Bug 109342 Opened 23 years ago Closed 23 years ago

Euro symbol turned into "EUR" in sent mail (plain text)

Categories

(MailNews Core :: Composition, defect, P4)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: adamlock, Assigned: nhottanscp)

References

Details

(Whiteboard: [adt2 rtm])

Attachments

(2 files, 8 obsolete files)

Write an email containing a Euro symbol (?) and it's translated into "EUR" for the recipient. When you hit the Send button, Moz complains "The message you composed contains characters not found in the selected character coding so your message become unreadable after you send it." This appears to have to do with the default character set, ISO-8859-1. If, as is likely, we should be using ISO-8859-15, then the mailnews.js default prefs need to be updated to specify that as the default. Most people, especially in Europe, haven't the first clue about character encoding, so it is important that this is the default unless there's a good reason to the contrary.
Ack, the question mark in the brackets is actually a Euro symbol. Perhaps I should raise a bug on the browser part as well.
> Most people, especially in Europe haven't the first clue about > character encoding so it is important that this is the default .. Adam, please provide the source for this opinion. Let me also give some of the reasons why we did not choose ISO-8859-15 as the default. 1. ISO-8859-15 might cause a backward compatibility problem. For example, Comm 4.x supports 8859-15 only on the Unix platform. 2. ISO-8859-15 may not be supported by Windows platform clients such as Outlook Express and others. They have Windows-1252, and that will do the job. 3. We have not seen any widespread use of ISO-8859-15 and hesitate to set it as the default mail encoding. 4. We have manually selected the default mail encoding for each language or language group we support. We have an NS-internal document which spells out the specs. (Hopefully I will clean it up soon so that it can be published externally.) Ideally, users should get a language preference choice when they create a new profile, and we should then set all encoding-related defaults based on it. There is no automatic way of setting these values -- we need a manual table.
I realize that as Europe adopts the Euro currency officially next year, we need a way to deal with this. For mail, I am inclined to move to Unicode (UTF-8). It is promoted as the standard charset for a variety of web standards, and there is also a recommendation from the IMC (Internet Mail Consortium). See this page: http://www.imc.org/mail-i18n.html
By the way, this should be a problem only in plain text mail. HTML mail will use NCRs for this character.
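A numeric character reference (NCR) is how HTML can carry the Euro independently of the message charset. The sketch below uses Python's standard `xmlcharrefreplace` error handler to imitate that behaviour for a given body charset. It illustrates the idea only and is not Mozilla's HTML serializer.

```python
def to_html_ncr(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode text in `charset`; any character the charset lacks becomes &#NNNN;.

    This mimics how an HTML body can carry the Euro sign (U+20AC = 8364)
    even when the declared charset is ISO-8859-1.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(to_html_ncr("Price: 10 €"))  # b'Price: 10 &#8364;'
```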
Holger Metzger suggested in the Mozilla MailNews ML that ISO-8859-15 would be the most backward compatible. I am not sure this is true. In my opinion, ISO-8859-15 has not spread widely. I don't think there is a single encoding, whether Windows-1252, UTF-8, or ISO-8859-15, that will not cause a backward compatibility problem for one client or another. It seems to me that we should look to the future on this issue. I would rather we move to UTF-8, which most people suggest is the future standard. One option would be to explicitly suggest UTF-8 when a message contains the Euro or other characters that the chosen mail encoding cannot represent. This way, users will keep using ISO-8859-1 in most cases, and we will have a partial transition to UTF-8 for messages that contain special characters. I say this because I don't see firm evidence that ISO-8859-15 is going to be adopted as the Euro standard. Rather than moving to an interim standard, we can stay put and use UTF-8 sparingly for special cases.
I think to use UTF-8 per default when the characters used exceed the US-ASCII limit is dangerous. There are still many mail/newsreaders out there which do not understand UTF-8 (Xnews on the Windows platform, for example). IIRC iso-8859-15 is not a problem for Outlook Express - it uses windows-1252 in any case, because in most cases OE doesn't check the headers anyway. Mozilla should be conservative in what it sends out and play nicely with other mail/newsreaders. :-) The best thing would be to ask the user: "The characters you used extend beyond the US-ASCII range. Which character set do you want to use to send the message?" And then a dropdown menu, maybe with recommendations on top.
> I think to use UTF-8 per default when the characters used > exceed the US-ASCII limit is dangerous. I'm not suggesting this at all. I am suggesting that we use UTF-8 only when the characters exceed the ISO-8859-1 limit. UTF-8 is becoming more and more prevalent, so we should recommend its support. If your newsreader or other mail program cannot deal with it, I think it is time to move to a new program. We should not hold back progress for the lowest common denominator. BTW, there is nothing wrong with using EUR, the official abbreviation for the Euro symbol. Please see this explanation from the EC: http://europa.eu.int/euro/html/rubrique-cadre5.html?pag=rubrique-defaut5.html|lang=5|rubrique=221|chap=15 The limitation appears only when plain text mail is used. Should we really change the mail standard to ISO-8859-15 when all we need to do is use "EUR" in that case? If you want to use the real character, you can use Win-1252, UTF-8 or ISO-8859-15, whichever you think the recipient will appreciate. In the absence of a real standard that has been registered as such, I hesitate to use ISO-8859-15. If there is an RFC or some other proposal which has 8859-15 as the mail standard for Western scripts, please let us know. HTML mail uses NCRs and so will pose no problem at present. So my real preference might be to do nothing in this case and let the user decide to use UTF-8, ISO-8859-15, ISO-8859-1, etc.
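The "just use EUR" option amounts to transliterating on encode. Below is a minimal sketch using Python's `codecs.register_error`. The handler name `eur-fallback` and the "?" substitution for every other unencodable character are assumptions made for this illustration. They do not describe what Mozilla actually does.

```python
import codecs

# Characters we know how to spell out in plain ASCII (illustrative table).
_TRANSLITERATIONS = {"\u20ac": "EUR"}


def _eur_fallback(err):
    """Codec error handler: replace unencodable chars, spelling the Euro as EUR."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join(_TRANSLITERATIONS.get(ch, "?") for ch in bad)
    # Resume encoding right after the unencodable run.
    return replacement, err.end


# "eur-fallback" is a hypothetical handler name chosen for this sketch.
codecs.register_error("eur-fallback", _eur_fallback)


def encode_plain_text(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode a plain-text body, transliterating the Euro only when needed."""
    return text.encode(charset, errors="eur-fallback")
```

The handler runs only for characters the target charset cannot hold, so a Euro sent in windows-1252 still goes out as the real sign (byte 0x80).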
> I'm not suggesting this at all. I am suggesting that we use > UTF-8 only when the characters exceed the ISO-8859-1 limit. Ah, OK, now that's better. I think it should be more like this: US-ASCII --> iso-8859-1 --> iso-8859-15 --> UTF-8 > If your newsreader or > other mail program cannot deal with it, I think it is time > to move to a new program. We should not hold progress > for the lowest common denominator. Some people don't have the choice to simply upgrade their program. > BTW, there is nothing wrong in using EUR, the official > abbreviation for the Euro symbol. Of course not. You could also substitute umlauts to be on the safe side, or convert all extended characters back to US-ASCII; now that would be really safe. :-) Holger
If UTF-8 supports the special characters of most countries, and if UTF-8 has been designated by the powers that be to become the desired standard, then we should support this forward-looking approach. I suggest bringing up a window when there are special characters that asks/informs (incl. "[ ]ask next time") about using UTF-8. The only two reasons against this that were presented were: 1. Some mail readers don't handle UTF-8. Why not? Is it difficult to implement? If not, then tough sh**t. The programmers should update their software pronto or lose those clients who want to send/read special characters. 2. Some people don't have a choice to upgrade their software? Huh! Who? Employees of companies with paranoid & deaf IT personnel (they exist)? Then the employees need to tell IT what they need. IT is there to serve the needs of the employees, not the other way around. I don't think this will be a problem if the solution (a UTF-8-capable mail proggy) is sufficiently publicised. BTW, how badly mangled would the text be if it were sent as UTF-8 and received by an iso-8859-1-only reader? If the answer is "not much", then let's move *forward*. PS. Netscape can always bring out NC4.79 with UTF-8 support ;)
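The "how badly mangled" question can be answered concretely for a reader that ignores the declared charset. The sketch assumes the reader decodes the bytes as ISO-8859-1 or windows-1252, and then shows what it would put on screen.

```python
def misread_utf8(text: str, reader_charset: str = "iso-8859-1") -> str:
    """Show what a reader assuming `reader_charset` displays for a UTF-8 body.

    Every non-ASCII character turns into 2 or more single-byte characters;
    plain ASCII is unaffected because UTF-8 and both Latin charsets share it.
    """
    return text.encode("utf-8").decode(reader_charset)


if __name__ == "__main__":
    for sample in ("10 €", "Grüße"):
        print(sample, "->", misread_utf8(sample, "windows-1252"))
```

In a windows-1252 reader the Euro comes out as the well-known three-character sequence "â‚¬", and each umlaut turns into two characters.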
I defer to other people's expert opinion on the best way to support the Euro symbol, but I believe it boils down to: 1. Picking a sensible default character set from the locale when a profile is first created. 2. Checking for unencodable symbols during composition and giving the user a message in plain English explaining how they may proceed.
I think the best way to deal with this is to use the best "minimal" encoding possible, the way Forte Agent does it: ascending charsets. 1.) US-ASCII. The minimal charset; that's the basic charset. 2.) iso-8859-1 --> already necessary: a line like "Just my 2 ¢" needs it. 3.) iso-8859-15 (or windows-1252) --> extended iso-8859-1 with Euro support. Compatible with MUAs/NUAs that don't understand utf-8 but have no problem with basic iso-8859-1. They exist, and they will exist for quite some time. Be conservative in what you send; you don't know what the recipient uses. We shouldn't use a complex charset just because it's available and some readers know how to deal with it. Most Windows readers don't have a clue about it anyway; they use windows-1252 to display a message, and utf-8 looks horrible in raw text. :-) 4.) utf-8, as a last resort.
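Here is a minimal sketch of the "ascending charsets" selection just described. It uses Python's codec registry, and the order of the candidate list is the one given in the comment. How Forte Agent actually implements this is not known here.

```python
# Candidate charsets, from most minimal to most general (order from the comment above).
ASCENDING_CHARSETS = ["us-ascii", "iso-8859-1", "iso-8859-15", "utf-8"]


def pick_minimal_charset(text: str, candidates=ASCENDING_CHARSETS) -> str:
    """Return the first charset in `candidates` that can encode all of `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character is missing; try the next, larger charset
        return charset
    raise ValueError("no candidate charset can encode the text")
```

Every string is encodable in UTF-8, so with this list the function always returns and the `ValueError` only matters for custom lists. A message holding both "€" and "¼" skips ISO-8859-15 (Latin-9 dropped "¼") and lands on UTF-8.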
Adam, thanks for summarizing the issues. > 1. Picking a sensible default character set from the locale > when a profile is first created. This is a much broader issue than this bug. It involves not just this particular case but many others, so we should deal with it in the overall context of setting defaults based on the user's language choice. I will try to create a new bug for it with some specs. In the meantime, it might be useful to publish a page which lists the default charsets for a variety of languages. (I can work on this list and publish it on the International Users page, under the "Help | Help & Support Center" menu.) > 2. Checking for unencodable symbols during composition > and giving the user a message in plain English explaining > how they may proceed. We can try to make the current warning better. Let's see how much improvement we can make here. It might get too wordy for a dialog, so we might want to create a help item that explains how the user should proceed. Accounting for the many different ways in which this warning comes up will not be easy. I don't think we can just special-case the Euro example.
Accepting but not sure I am the right person! Naoki, is it for you?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
I'll take this, but the reported behavior is expected for plain text mail because there is no way to encode the Euro in ISO-8859-1. Sending in another charset (e.g. windows-1252, ISO-8859-15) may not be understood by other mail clients. The same applies to smart quotes, bullets, ellipses, the trademark sign, etc.
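The symbols listed above are handled differently by the three candidate charsets. A quick sketch with Python's codecs makes the difference visible. The symbol list just mirrors the comment.

```python
# Symbols mentioned in the comment above, by Unicode code point.
SYMBOLS = {
    "euro": "\u20ac",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "bullet": "\u2022",
    "ellipsis": "\u2026",
    "trademark": "\u2122",
}
CHARSETS = ["iso-8859-1", "iso-8859-15", "windows-1252"]


def encodable(ch: str, charset: str) -> bool:
    """True if `charset` has a code for character `ch`."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def coverage_table() -> dict:
    """Map each symbol name to {charset: encodable?}."""
    return {name: {cs: encodable(ch, cs) for cs in CHARSETS}
            for name, ch in SYMBOLS.items()}


if __name__ == "__main__":
    for name, row in coverage_table().items():
        print(f"{name:20}", row)
```

ISO-8859-1 has none of these symbols. ISO-8859-15 adds only the Euro. Windows-1252 covers all of them in its 0x80-0x9F block, which is the "covers more characters" argument made later in the thread.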
Assignee: ducarroz → nhotta
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.9 → ---
It would be nice if you could send as HTML. Jean-Francois, is it possible to convert a plain text message and send it as HTML?
Summary: Euro symbol turned into "EUR" in sent mail. → Euro symbol turned into "EUR" in sent mail (plain text)
Sure, it could be possible, but what for? The goal of using plain text is to make sure every mail reader will be able to display it correctly!
Right, but offering it as an option would not be bad for cases like this. The user might understand the plain<->HTML difference better than different charset names.
Marina, If this is not in your area please reassign to the right person. This is not my bug.
QA Contact: sheelar → marina
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
I really don't think we should suggest HTML just because of the Euro symbol. HTML has many severe backward-compatibility problems, many more than iso-8859-15. I also don't think that we should just ask the user. Charsets are something that very few - if any - European users understand, since we have a very limited character set and expect it to just work. I think we should use ASCII or iso-8859-1 when possible. If there is a Euro symbol, use iso-8859-15, utf-8 or convert to EUR, I don't know. But I'd do it transparently in any case. Imagine you'd get a(n additional) dialog box whenever you use "$".
Now I want to make a simple statement. I cannot see something as a problem if it works fine in most cases. Mozilla does not have the standing to allow for over-controlled behaviour. So one suggestion would be to leave encoding/interpretation as it is in NN4 until someone finds the guts to switch to a newer encoding standard. Why not? It works! Second, one thing must be clear: translating to "EUR" may not be a final solution, only an intermediate one. Otherwise it would be an offence against the European currency - imagine "$" being translated into "USD"! It also hampers the graphical design of a text, which could be undesirable in some cases. Third, OK, if you insist on purity: avoid warnings that the user probably doesn't understand. That's how it is now, and it's awful! If the user is in a hurry, this could cause a heart attack! I'd recommend a two-level transformation mechanism. First, put an option somewhere in the preferences, "automatically switch to appropriate encoding standard when sending messages", and tick it by default. Then do what is necessary in the mails without bothering the user. Second, if that option is unticked, bring up a warning before sending - similar to the current one *but with qualified options*, i.e. say: "We cannot send as is, but you have these options: a) send with encoding iso-8859-15 (or whatever), b) translate symbol x into the literal "EUR" (or whatever it is), c) go back to editing." Additionally, say which option is recommended. That means someone would have to work out typical conversion cases and make a table of suggestions. - Wolf
Sample email that I sent to myself from the Mac OS X Mail application. Note it uses quoted-printable to embed Euro symbols as =80.
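The "=80" in that sample is the quoted-printable form of byte 0x80, which is where windows-1252 puts the Euro. Python's standard `quopri` module reproduces the encoding step. This shows the mechanism only and is not a copy of what Mac OS X Mail does internally.

```python
import quopri


def encode_body_qp(text: str, charset: str = "windows-1252") -> bytes:
    """Encode text in `charset`, then apply quoted-printable transfer encoding."""
    return quopri.encodestring(text.encode(charset))


def decode_body_qp(data: bytes, charset: str = "windows-1252") -> str:
    """Undo quoted-printable, then decode the bytes with `charset`."""
    return quopri.decodestring(data).decode(charset)


if __name__ == "__main__":
    encoded = encode_body_qp("Price: 10 €")
    print(encoded)  # the Euro appears as =80
    print(decode_body_qp(encoded))
```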
Nice test! Thing is, people can only see the message as plain text (raw encoding) in the attachment, not as it appears in the mailer. But I suppose it displays well since the charset is WINDOWS-1252. Interesting! Is that an indication against the theory of heavy cross-platform mismatch (of encoding 0x80)? I have no idea what the relations are. Does the Mac now use WINDOWS-1252 normally? An analysis, or a well-founded estimate, of how often a mismatch on 0x80 would actually occur might help us decide on an interim solution for transmitting the Euro. - Wolf
Comment on attachment 65892 (Sample email containing euro symbols): Fixing mimetype. Try to open it again.
Attachment #65892 - Attachment mime type: text/plain → message/rfc822
Err, sorry to butt in, but as far as I understand it, the default charset for European locales is already ISO-8859-1. As the difference between ISO-8859-1 and ISO-8859-15 is only the Euro symbol, and most mail readers understand ISO-8859-1, why not just switch transparently to ISO-8859-15 when a message containing the Euro symbol is sent? It wouldn't even be necessary to warn the user... If the reader understands ISO-8859-15, the symbol will display correctly; otherwise it will just display one character of garbage. A European seeing the garbage character would probably interpret it correctly as meaning Euro based on context, which is probably better than converting it to EUR. In contrast, in UTF-8 the Euro symbol becomes three special characters, which are probably harder for a human to interpret as a Euro symbol.
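The byte-level claim in that comment is easy to check. The Euro takes one byte in ISO-8859-15 and in windows-1252, but three bytes in UTF-8. Every Latin-1 letter, umlauts included, takes two bytes in UTF-8. The sketch below uses standard Python codecs only.

```python
def euro_bytes(charsets=("iso-8859-15", "windows-1252", "utf-8")) -> dict:
    """Return the byte sequence used for the Euro sign in each charset."""
    return {cs: "\u20ac".encode(cs) for cs in charsets}


def encoded_length(text: str, charset: str) -> int:
    """Number of bytes `text` occupies in `charset`."""
    return len(text.encode(charset))


if __name__ == "__main__":
    for cs, raw in euro_bytes().items():
        print(f"{cs:12} {raw!r}")
    print(encoded_length("Grüße", "iso-8859-15"), encoded_length("Grüße", "utf-8"))
```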
> If the reader understands ISO-8859-15 the symbol will display correctly, else it > will just display one character of garbage. Not necessarily. Maybe it notices that it doesn't support ISO-8859-15 and freaks out. The fact that ISO-8859-1 and ISO-8859-15 are almost identical is a "coincidence" - ISO-8859-14 is completely different, I guess. > In contrast, in UTF-8 the euro symbol becomes three special characters Worse yet, all umlauts become 2 or 3 special chars.
My preference would be to make it more obvious to users that they should choose the right charset for their locale in the first place (e.g. a page in the New Mail/News Account wizard), and secondly to settle on a way to deal with old email readers. My gut reaction is to find out which mail readers have problems, and if there are only a few, say to hell with them. Most email readers nowadays *should* be able to cope with charsets, and if not, they'll see an odd character where there's supposed to be a Euro. Time to upgrade.
Another strategy would be wrapping the message in different encodings in a MIME multipart/alternative message. This solution also has its drawbacks, but I think it deserves mentioning.
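Here is one way that could look, using Python's standard `email` package. The choice of two parts is an assumption for illustration: an ASCII part with the Euro spelled "EUR", and an ISO-8859-15 part with the real sign. RFC 2046 orders multipart/alternative parts from least to most preferred, so the richer part goes last. The sketch assumes the body is representable in ISO-8859-15.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative(body: str) -> MIMEMultipart:
    """Wrap one plain-text body in two charsets inside multipart/alternative.

    Assumes `body` fits in ISO-8859-15. In the ASCII fallback the Euro
    becomes "EUR" and any other non-ASCII character becomes "?".
    """
    msg = MIMEMultipart("alternative")
    # Least preferred alternative first: pure US-ASCII, Euro spelled out.
    ascii_body = body.replace("\u20ac", "EUR").encode("ascii", "replace").decode("ascii")
    msg.attach(MIMEText(ascii_body, "plain", "us-ascii"))
    # Preferred alternative last (RFC 2046): the text with the real Euro sign.
    msg.attach(MIMEText(body, "plain", "iso-8859-15"))
    return msg


if __name__ == "__main__":
    print(build_alternative("Total: 25 €").as_string())
```

One drawback shows up right away: a reader that picks the wrong part, or shows both, presents the text twice.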
I used windows-1252 because it is a superset of ISO-8859-1, and it also covers more than the Euro symbol. I think mail clients that understand MIME charsets can also handle windows-1252.
IMHO, using ISO-8859-1 by default, and ISO-8859-15 if and only if a Euro symbol is encountered, is the correct solution. Since all charsets in the ISO-8859 series are supersets of US-ASCII, all conformant mailers that do not recognize ISO-8859-15 must at least be able to show the characters in the US-ASCII set (RFC 2046/2049). Furthermore, more and more mailers support or will support ISO-8859-15. This can and should be handled transparently.
> IMHO, using ISO-8859-1 by default, and ISO-8859-15 if and only > if a Euro symbol is encountered is the correct solution. > [...] > This can and should be handled transparently. I quite agree. Why should we use Windows-1252 instead of ISO-8859-15?
I used windows-1252 because it covers more characters. I filed bug 124198 for an enhancement of charset fallback.
I guess we are not going to settle the question of which encoding to use as the first fallback: 8859-15, windows-1252, or utf-8. If you're going to implement an automatic fallback, I would like to suggest the following: 1. Automatically fall back to ISO-8859-15 if the Euro and other characters are covered by it. 2. If there are characters outside of 8859-15, fall back to Windows-1252 next. So this is a 2-step process. Even if we implement something like this, I think we should review it in the near future and see whether the right thing would be to move to UTF-8. I heard a rumor that Outlook Express may move to UTF-8 as its non-ASCII standard. I have not confirmed it, but it is something to think about for the future.
I can implement more than one fallback for ISO-8859-1 if that is necessary (but I prefer to do it as part of bug 124198). Why is ISO-8859-15 preferred over windows-1252? Is it supported by more mail clients? Does OE support ISO-8859-15?
> Why is ISO-8859-15 preferred over windows-1252? Because the former is a standard, while the latter is controlled by Microsoft.
I am concerned about backward compatibility, like others. Let me provide some test results on Windows for receiving the Euro in a plain text mail body:

Client              ISO-8859-15                Windows-1252  UTF-8  ISO-8859-1
Communicator 4.79   NO on Win/Mac, OK on Unix  OK            OK     OK
Eudora 5.1          NO                         OK            NO     OK
Outlook Express     OK (Latin 9)               OK            OK     NO
Mozilla/NS 6        OK                         OK            OK     OK

You can tell from this that Mozilla is the most tolerant viewer of the Euro character. Header display support is somewhat weaker, since Comm 4.79, Eudora 5.1 and Outlook Express 5.5 all depend on the system default charset for display. Thus Mozilla is the only mail client that displays the Euro in headers as is under any of the 4 encodings on any language version of an OS, e.g. Japanese Windows. You can draw your own conclusions, but Windows-1252 is the only encoding that works with all of the above for plain text body display. Like others, I don't think it is good to spread Win-1252, since it contains Windows-only characters in the 0x80 - 0x9F range, but for the Euro it is not a bad choice. If there were an RFC saying that 8859-15 is the new mail standard for Western languages, I would be all for it. But there is no declared standard except, by practice, ISO-8859-1. I would like to ask others whether Win-1252 as the fallback would break mail programs on Mac or Unix. If so, we may have a case for an option to choose which encoding the user prefers as the first fallback.
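The conclusion drawn from these test results can be checked mechanically. The sketch below copies the four rows into a Python dict and lists the encodings that every tested client handled. Communicator's ISO-8859-15 result counts as a failure here because the tests were run on Windows.

```python
# Body-display results transcribed from the Windows test report above.
RESULTS = {
    "Communicator 4.79": {"iso-8859-15": False, "windows-1252": True, "utf-8": True, "iso-8859-1": True},
    "Eudora 5.1": {"iso-8859-15": False, "windows-1252": True, "utf-8": False, "iso-8859-1": True},
    "Outlook Express": {"iso-8859-15": True, "windows-1252": True, "utf-8": True, "iso-8859-1": False},
    "Mozilla/NS 6": {"iso-8859-15": True, "windows-1252": True, "utf-8": True, "iso-8859-1": True},
}


def universally_ok(results: dict) -> list:
    """Return the encodings that displayed the Euro correctly in every client."""
    charsets = next(iter(results.values())).keys()
    return [cs for cs in charsets if all(row[cs] for row in results.values())]


if __name__ == "__main__":
    print(universally_ok(RESULTS))  # ['windows-1252']
```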
Test results using MacOS X 10.1: Outlook Express 5.02: 8859-15 (OK), Win-1252 (OK), UTF-8 (OK). Mac OS X Mail 1.1: 8859-15 (OK), Win-1252 (OK), UTF-8 (OK). Both clients show the body and header correctly. They do not offer ISO-8859-15, so they cannot send in ISO-8859-15. For replies, OE forces me to use UTF-8, and Mac Mail silently changes the charset to windows-1252. I do not have Eudora.
Keywords: nsbeta1
Sylpheed (GTK+) utf8: no (complete line with Euro char not readable) / yes (line with umlauts), iso-8859-15: yes (as currency char), win-1252: same as utf8. Before using Windows-1252, I'd rather use "EUR".
We might entertain forking the code for Unix vs Win/Mac given the facts so far: Unix: fallback on ISO-8859-15 Win/Mac: fallback on Windows-1252
Then again, maybe not. We have no idea who is receiving what Mozilla creates -- even if the creator is on Unix, the recipient may not be.
Attached patch: More clean up for the old code. (obsolete)
Attachment #69360 - Attachment is obsolete: true
I am going to ask reviews and try to check in for 0.9.9.
Target Milestone: mozilla1.0 → mozilla0.9.9
Note that CP1252 is a superset of iso8859-1, while iso8859-15 (aka Latin9) is NOT. There are a few characters in iso8859-15 that replace iso-8859-1 characters (besides the Euro symbol replacing the international currency symbol). See http://czyborra.com/charsets/iso8859.html: The new Latin9 nicknamed Latin0 aims to update Latin1 by replacing the less needed symbols ¦¨´¸¼½¾ with forgotten French and Finnish letters and placing the U+20AC Euro sign in the cell =A4 of the former international currency sign ¤. The tests cited in previous comments only tested the Euro symbol. Sending as iso8859-15 will break these other characters. I don't know how many people this would affect. Do we care?
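Both claims in this comment can be verified from Python's built-in codec tables. The sketch lists every byte that decodes differently in ISO-8859-1 and ISO-8859-15. It then checks that windows-1252 agrees with ISO-8859-1 on all printable Latin-1 positions. In windows-1252 the 0x80-0x9F range holds extra glyphs instead of C1 controls.

```python
def latin9_changes() -> dict:
    """Return {byte: (latin1_char, latin9_char)} for positions that differ."""
    changes = {}
    for b in range(256):
        old = bytes([b]).decode("iso-8859-1")
        new = bytes([b]).decode("iso-8859-15")
        if old != new:
            changes[b] = (old, new)
    return changes


def cp1252_keeps_latin1_graphics() -> bool:
    """True if windows-1252 maps every printable Latin-1 byte to the same char."""
    graphic = [*range(0x20, 0x7F), *range(0xA0, 0x100)]
    return all(bytes([b]).decode("iso-8859-1") == bytes([b]).decode("windows-1252")
               for b in graphic)


if __name__ == "__main__":
    for b, (old, new) in sorted(latin9_changes().items()):
        print(f"0x{b:02X}: {old} -> {new}")
```

This yields exactly eight changed positions (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE), matching the czyborra quote.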
If the data contains both '¤' and the Euro symbol, it is not going to be sent as ISO-8859-15, so we don't have the problem. I assume that the user would be alerted (i.e. the same as the current behavior).
> The tests cited in previous comments only tested for the > Euro symbol. Sending as iso8859-15 will break these other > characters. I don't know how many people this would affect. > Do we care? As reported above for Windows, the Euro with 8859-15 breaks Win & Mac Communicator 4 mail, which does not have support for it.
The latest patch sends the Euro in ISO-8859-15 (implementing comment #32). If any other behavior is desired, please propose it in the bug, thanks.
The review is on hold now. I will wait one more day for any input.
I am very sorry, but I am going to withdraw the proposal in comment #32. I don't think it is a good idea to break Communicator 4.x users. (I have no idea why we didn't implement 8859-15 for Win/Mac Communicator 4, but I don't want to hear complaints from people who might upgrade to Mozilla or Netscape 6 that we ignored them again.) So, here is my new proposal: 1. Automatically fall back to Windows-1252 if the Euro is present in the plain text mail body or headers. (Make this the default.) 2. Offer a prefs.js option to choose what is proposed in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=69368 as the default behavior instead. Ben and others who prefer ISO-8859-15: I'm sorry, but I don't want to screw loyal users of Communicator 4. They are the users who might move to the new Gecko-based clients. Hopefully, those who feel strongly will turn on the option to send ISO-8859-15 instead. As I said before, I think we should re-evaluate this in the future and possibly move to 8859-15, UTF-8 or Win-1252 after studying the prevailing conditions at that time.
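Here is a sketch of the proposal's decision logic, under stated assumptions. The `prefer_latin9` flag is a hypothetical stand-in for the proposed prefs.js option, not a real Mozilla preference name. When the option is on, the sketch applies the two-step order from comment #32 (ISO-8859-15, then Windows-1252). A `None` result means the caller falls back to today's warning.

```python
BASE_CHARSET = "iso-8859-1"


def choose_send_charset(text: str, prefer_latin9: bool = False):
    """Pick the charset for a plain-text message, or None if nothing fits.

    `prefer_latin9` is a hypothetical flag standing in for the proposed
    prefs.js option; it is not an actual Mozilla preference.
    """
    if prefer_latin9:
        fallbacks = ["iso-8859-15", "windows-1252"]  # two-step order from comment #32
    else:
        fallbacks = ["windows-1252"]  # proposed default
    for charset in [BASE_CHARSET, *fallbacks]:
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            pass  # try the next fallback
    return None  # caller keeps the current behaviour: alert the user
```

The pref only changes the order of the fallbacks. Text that ISO-8859-1 already covers is never relabeled.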
> Ben and others, who prefer ISO-8859-15, I'm sorry but > I don't want to screw loyal users of Communicator 4. That's completely reasonable, given how much market share 4.x still has. But, I'd rather convert to EUR than switching to a Microsoft charset.
Windows-1252 is a non-standard, MS charset, but it's the one that seems to work most consistently between platforms. It seems the most logical charset to fall back to if only the Euro is needed, because it's a superset of ISO-8859-1. So if you fall back from ISO-8859-1 to Windows-1252 you only gain the Euro and change nothing, whereas if you fall back to ISO-8859-15 you also change a few characters. I thought ISO-8859-15 was the right choice before, but now I'm not so sure.
> Windows-1252 is a non-standard, MS charset, but it's the > one that seems to work most consistently between platforms. From the point of view of the Internet charset name registry, this is not correct. Windows-1252, along with the other Windows-12xx names, is registered for use on the Internet. It has the same status as ISO-8859-15. See the IANA Charset Registry: http://www.iana.org/assignments/character-sets Windows-1252 was registered following the procedures specified in RFC 2978: ftp://ftp.isi.edu/in-notes/rfc2978.txt The real question here is not about the standard but about what the best practice would be for our users, given that every one of the choices we have has some drawback.
nsbeta1+ per triage meeting
Keywords: nsbeta1nsbeta1+
Gentlemen, actually codepage 8859-15 makes many things look weird; see this bug for reference: http://bugzilla.mozilla.org/show_bug.cgi?id=122455 Depending on the web page you're browsing, you might see every instance of English ownership ("'s" suffix) displayed incorrectly, and the same for many other "grammar" characters. Also, for whatever reason, I didn't see the yen char and others the -15 codepage inserts, just question marks. I suppose doing transliterated text + a fancy MIME/HTML version might not be the worst idea ever.
OS: Windows 2000 → All
Hardware: PC → All
Target Milestone: mozilla0.9.9 → mozilla1.0
The proposal in comment #48, 2. Offer a prefs.js option to choose what is proposed in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=69368 as the default behavior instead. Is the pref supposed to select whether sending as ISO-8859-15 or windows-1252? I assume no UI is needed. But is that necessary? It would be easier for the user to set ISO-8859-15 as a default charset than tweaking the pref value.
> Is the pref supposed to select whether sending as > ISO-8859-15 or windows-1252? I assume no UI is needed. > But is that necessary? It would be easier for the user > to set ISO-8859-15 as a default charset than tweaking > the pref value. This is true. I think what we need is a dialog somewhere saying that we offer Windows-1252 as the default in this case, but that the user may try other choices -- with the risks understood -- such as UTF-8 or ISO-8859-15. If we can say this somewhere, we would not need any pref option. Should we show something like this when the user gets into this logic for the first time? Any other ideas?
momoi, you mean a popup dialog as we have today (just a custom one with better options)? I thought the whole point of this bug was to not bother the user with that issue. It's not important enough. No harm is done, if we convert to "EUR".
Ben I think several hundred million potential Mozilla users would disagree with you :) People using the Euro symbol (i.e. Europeans) will frequently hit this problem so it is important enough to justify its own dialog to describe what the issue is and what solutions are available to solve it.
Adam, I *am* German, but I don't understand you. > People using the Euro symbol (i.e. Europeans) will frequently hit this problem > so it is important enough to justify its own dialog The frequency with which they "hit this problem" is exactly one of the major reasons *not* to show a dialog. Also, please elaborate why sending "EUR" (E-U-R) and not € (the Euro char) is a problem.
The dialog can have a "never show me this again" checkbox, but we do need a dialog and it needs to give the user some meaningful choices.
> The dialog can have a "never show me this again" checkbox I'm not a UI expert, but I guess that most users are scared to check that or don't even see it. I know that UI experts hate popup dialogs. > but we do need a dialog You gave no reason why. Again: Why not just "EUR"? > it needs to give the user some meaningful choices. There are no meaningful choices for users, because they do not understand the issues. If *we* don't even know what to do, how can the user ever know? It's not a question of user preference, and in almost all cases the user does not have more information than we do, but less. In other words, if anybody can make a meaningful decision, then it's us. (The occasional user who does know the issues can still select a charset via the menu or prefs.) What do you want to tell them? "If the recipient uses Netscape 4.x, use that; if the recipient uses mutt, use that; and if the recipient uses OEMac, use that"? As mpt recently said, 'if you cannot decide, it's completely unfair to offload that decision on the user.'
What I am trying to do here is to let the user send a message without the alert if possible. I think the user does not care about the charset when sending symbols like the Euro or smart quotes. I would like to avoid any additional UI for this.
My 2 cents' worth: > Also, please elaborate why sending "EUR" (E-U-R) and > not € (the Euro char) is a problem. I think this is a problem because it would make Mozilla look shabby compared to other mail programs. I think a dialog could do a lot to clear up any confusion that may arise in the user's mind. The dialog could be something like: "You have sent a message containing the Euro symbol. Some older email/news readers may not be able to display the symbol correctly and will display a currency symbol ( ) instead. What would you like to do: (RADIO BUTTON) Send the Euro character as is (€). (RADIO BUTTON) Convert to the official abbreviation (EUR). (CHECKBOX, default checked) Remember this decision for future messages I send. In the future, you may change this setting in the 'Advanced' section of the Mozilla preferences." This does add an extra pref, but it would make Europeans feel that Mozilla actively supports Europe and the Euro.
Further to that, a dialog could refer the user to a page of help that discusses the issue in more depth and solutions to the problem.
> "You have sent a message containing the Euro symbol. Some older email/news > readers may not be able to display the symbol correctly and will display a > currency symbol ( ) instead. What would you like to do: > I think the behavior (display a currency symbol) is true if ISO-8859-15 is used. I thought it was agreed to use windows-1252.
It would be interesting to know just /how/ exactly the popular readers fail, not just that they do. I can imagine the following failure modes when a reader encounters a message containing the Euro code in an unknown charset: (a) Refuse to display the message outright. (b) Display the message as if it were sent in a known charset, possibly warning the user that some characters may come out differently than the sender intended. (b) is clearly superior to (a) in many cases (e.g. popular Western charsets that overlap at least in the ASCII range), and not worse than (a) even if the charsets are completely disjoint (the user will see garbage instead of nothing at all). Therefore my suspicion is that many authors chose (b). If that is correct, ISO-8859-15 may still be the better choice. With it, agents failing in mode (b) will probably display the generic currency symbol instead of the Euro. Confronted with a windows-1252 Euro code, they could display almost anything (including, by chance, the right glyph), as the code is undefined in the older network standards. On another note, would it be possible to override the setting on a per-address-book-entry basis? Then, if I am sure that person X will not be able to view messages sent with the default Euro charset, I could choose something different for her. If it wants to get really smart-aleck, Mozilla could even sniff the User-Agent in messages from a mail partner and base the charset decision on that.
> I think the behavior (display a currency symbol) is true if ISO-8859-15 is used.
> I thought it was agreed to use windows-1252.

OK, the currency symbol only appears if the mail is sent in ISO-8859-15. But something must go wrong if you send the Euro in a Windows-1252 encoded message and the reader does not support it. Does anybody know what happens? We might tell the user. :)
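To make the question concrete, here is a minimal Python sketch of the bytes each candidate charset uses for the Euro sign, and of what a reader would get if it ignored the charset label and decoded everything as ISO-8859-1. Python's standard codecs stand in for real mail readers here, which is an assumption. The sketch only models failure mode (b) from the previous comment. Readers that refuse the message or drop whole lines behave differently.

```python
EURO = "\u20ac"


def can_encode(text: str, charset: str) -> bool:
    """True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def euro_bytes(charset: str) -> bytes:
    """Bytes used for the Euro sign in `charset` (raises if it is unmapped)."""
    return EURO.encode(charset)


def as_seen_by_latin1_reader(charset: str) -> str:
    """Decode the Euro bytes the way a reader that assumes ISO-8859-1 would."""
    return euro_bytes(charset).decode("iso-8859-1")
```

With these definitions, ISO-8859-15 uses byte 0xA4, which such a reader shows as "¤" (U+00A4 CURRENCY SIGN). That matches the "currency symbol" behavior described above. Windows-1252 uses 0x80, which is a C1 control code in ISO-8859-1, so what appears depends on the reader. UTF-8 uses three bytes that come out as "â", an invisible control code, and "¬". ISO-8859-1 itself cannot encode the sign at all.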
I agree that ideally, when sending the Euro currency symbol in a plain text email, all recipients should be able to view it as the glyph for the Euro currency symbol. But with the current state of mail readers (aka MUAs), this is not possible, as has been cited in previous comments. In the future, it looks like all MUAs will migrate to supporting UTF-8. That evolution appears to be occurring now.

The solution which works NOW for all MUAs is the string "EUR", which is the official abbreviation for the Euro currency (although not the most elegant representation). See the EU Euro website:
http://europa.eu.int/euro/html/rubrique-cadre5.html?pag=rubrique-defaut5.html|lang=5|rubrique=221|chap=15

"The graphic symbol for the euro looks like an E with two clearly marked, horizontal parallel lines across it. ... The official abbreviation for the euro is 'EUR'. It has been registered with the International Standards Organisation (ISO), and will be used for all business, financial and commercial purposes, just as the terms 'FRF' (French franc), 'DEM' (Deutschmark), 'GBP' (pound sterling) and 'BEF' (Belgian franc) are used today."

Adding dialogs and preferences to provide special-case handling of the Euro in ISO-8859-1 email will add confusion and usability problems, and will still not work in all cases, as cited in previous comments.

I think this issue will resolve itself. For several years the Netscape mail client sent unencoded Latin1 headers because we discovered in early Beta tests that many MUAs did not support MIME-encoded headers. We added a pref to enable MIME-compliant headers, but the default was off. After a few years Netscape switched the default, as most MUAs had finally added MIME support. I believe this will be the case for UTF-8 support as well.

And remember that there are no problems with rich-text (HTML) mail, because we can use the HTML entity for the Euro, "&euro;". This only affects plain text.

Internet mail strives for interoperability.
I'm not convinced the proposed solutions are really improvements, since they make things less interoperable. My 2 cents.
bobj: would you like the dollar symbol to be converted to USD if some mail readers didn't support the $ character? I say the best thing is to send the glyph (in Windows-1252 or ISO-8859-15, it doesn't matter). If the receiving MUA chokes, it will display a garbage character, but any human reading the message will probably figure out, based on context, that it's meant to be a euro symbol. If a MUA doesn't understand the Euro symbol, then the Euro symbol won't display in that MUA. But that does not mean Mozilla should not send it.
Priority: -- → P4
> If the receiving MUA chokes it will display a garbage
> character, but any human reading the message will probably figure
> out, based on context, that it's meant to be an euro symbol.

Users could ALSO figure out that "EUR" means the euro symbol, based on context.

> If a MUA doesn't understand the Euro symbol, then the Euro symbol
> won't display in that MUA.

If the sender chooses to send out as HTML, or as both HTML AND plain text, and the MUA doesn't understand the HTML, then the Euro symbol won't display in that MUA.
109342 Euro symbol turned into "EUR" in sent mail (plain text)
Impact Summary
Impact Platform: ALL
Impact language users: 560 M 100%
Probability of hitting the problem: High
Severity if the problem is hit (worst case): the Euro sign will be converted to the three characters "EUR"
Way to recover after hitting the problem: the user sends HTML mail instead, or sends UTF-8 mail instead.
Risk of the fix: Medium
Potential benefit of fixing this problem: Unknown
ADT3
Whiteboard: adt3
Gentlemen. I was using lookout for a while since Mozilla mail couldn't handle IMAP mail attachments (fixed now). What Outlook does is suggest "Do you want to use utf-8 in your message because it contains characters we can't encode using the current code page?" It does *not* give you an option to "Yeah, and do so from now on" or "No, and do not bother me about it anymore". So if Mozilla mail asks the user *and* gives the user the option to do it either way in future, it's doing things better than the industry-standard app does. Since that query is in there, you can bet you're going to see a *lot* of those utf-8 messages floating around, so everyone will have to cope, sooner or later. Maybe we'll also have a lot of people using Outlook who get tired of being nagged about it and start to look for alternatives. You never know. If they at least learn to open the "options" menu, everyone wins.
My recommendation is to settle on a solution for the next commercial release and Mozilla 1.0. European users deserve a better solution than EUR for the plain text mail they use often. If you want a compromise, I have a proposal ready to go. If you want a solution with no pref option, go with Windows-1252 or UTF-8. Either should work for all major email programs. (Win-1252 works better for Eudora.) Let's make a decision.
> bobj: would you like the dollar symbol to be converted to USD if some mail
> readers didn't support the $ character?

The question is not whether I want to display USD or $. The question is whether the receiver of my email will see $ or garbage. When a plain text mail message is composed in Mozilla, the Euro currency symbol displays correctly, but Mozilla cannot determine whether the receiver of the message will render it correctly or as garbage. So the options are:
(A) Send the Euro currency symbol (without UI). Optimal for some receivers.
(B) Send "EUR" (without UI). Sub-optimal, but works for all receivers.
(C) A UI dialog that lets the sender decide between (A) and (B).

Option (C) brings with it lots of usability issues:
- Many users probably won't understand the choice or how to choose.
- Some users will want the choice to be sticky and others will not.
- Should stickiness be per receiver or per sender? If I send to a particular user, I may or may not know if that user can render the Euro.
- What about mail to multiple addresses or posting to newsgroups?
This is what I propose (reasoning follows):
1. User sends a plain text message containing the euro character €.
2. Mozilla prompts the user as proposed in comment 62 above.
3. If the user chooses to send the euro character as is, Mozilla sends the message in Windows-1252.
3a. If the receiver's MUA supports the euro symbol, he will see it.
3b. If the receiver's MUA does not support the euro symbol, he will see exactly one character of garbage. If the receiver is European, he will very probably deduce from context that the garbage character is a euro symbol.

The reasoning is as follows:
1. Why Windows-1252 instead of ISO-8859-15 or UTF-8:
- Windows-1252 is a superset of ISO-8859-15, whereas ISO-8859-15 is not.
- UTF-8 is poorly supported by mail readers (worse than Windows-1252).
- If the receiver's MUA does not support the charset, with Windows-1252 or ISO-8859-15 the message is readable and there is only one character of garbage, whereas with UTF-8 the whole message becomes unreadable.
2. Why the prompt:
- It calls attention to Mozilla's support of the Euro, making European users feel that Mozilla supports them and the Euro (good user experience).
- It explains the situation clearly to users.
- It explains to them how to change the setting if they need to.
3. Why we should not automatically fall back to E-U-R:
- It makes Mozilla look shabby compared to other mail programs. To a non-savvy user, it might seem that Mozilla does not support the Euro at all!
- For a European, it is very easy to figure out from context that the garbage character is in fact meant to be a Euro symbol, even if his MUA does not support the Euro.
- Sending as is takes advantage of more recent MUAs, including Mozilla, that support the Euro. Why should we fall back to the lowest common denominator if the only negative consequence on old MUAs is a garbage character, which is easily interpreted as € based on context?
> If the receiver's MUA does not support the charset, with Windows-1252
> or ISO-8859-15 the message is readable and there is only one character
> of garbage

Wrong. As you can see in comment 37, there are mailers that fail worse with unknown charsets. In that case, the mailer omitted the whole line (I consider this a severe bug). Other mailers might not display the msg at all.
> 1. Why Windows-1252 instead of ISO-8859-15 or UTF-8:

1.a Why NOT windows-1252?
- windows-1252 is neither an international standard nor a national standard; both ISO-8859-15 and UTF-8 ARE. (Should we send out as x-mac-roman if the sender is on a Mac?)

> - Windows-1252 is a superset of ISO-8859-15, whereas ISO-8859-15 is not.

Yea, but UTF-8 is a superset of windows-1252, while Windows-1252 is NOT.
I suggest we do the following:
1. nsbeta1- this bug.
2. For any build localized for a European country, change the default mail charset to windows-1252, UTF-8 or ISO-8859-15, and let the business owner of the European localization of the shipping product decide which one is used in their countries.
Ben: I may be wrong, of course, but the fact that sylpheed fails to display a whole line when it fails to draw a character sounds like a bug in the mailer to me. Also, as it cannot display the euro *whatever* character set is used, perhaps it does not support the euro at all and might benefit from a fix. Have you considered this possibility?

Frank:
> >- Windows-1252 is a superset of ISO-8859-15, whereas ISO-8859-15 is not.
> Yea, but UTF-8 is a superset of windows-1252, while Windows-1252 is NOT

Sorry, I meant to say "Windows-1252 is a superset of *ISO-8859-1*, whereas ISO-8859-15 is not". The more similar the charsets are, the less likely they are to cause problems. In my opinion, UTF-8 is likely to break on many more mailers than the other two, and if it doesn't work, the results are probably worse than both the alternatives, because it's the most different from the standard ISO-8859-1 and straight ASCII.

> windows-1252 are not international standard nor national standard,
> both ISO-8859-15 and UTF-8 ARE.

I am not an expert on this; I was basing myself on Katsuhiko's views (comment #51 and onwards). I was in favour of ISO-8859-15 before, but the fact that it's not a pure superset of ISO-8859-1 and breaks some characters put me off. Also, according to the results we have here, Windows-1252 is the most compatible with different MUAs (including Netscape 4.x, and apart from sylpheed, which doesn't work with any charset).

I don't really think it's very important exactly which character set is used, as I am going to be using Mozilla mail. But I feel that it's not right to fall back to E-U-R just because some mailers don't support the euro symbol. Why should we Europeans not be able to send mail containing our currency symbol? After all, $ is in standard ASCII... :-)
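The superset question can be checked mechanically. The following minimal Python sketch assumes that Python's codecs match what mailers implement. It lists which printable ISO-8859-1 characters (0xA0-0xFF) each candidate charset cannot represent, and whether the bytes stay the same.

```python
def can_encode(text: str, charset: str) -> bool:
    """True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Printable upper half of ISO-8859-1 (0xA0-0xFF); 0x80-0x9F are C1 controls.
LATIN1_PRINTABLE = "".join(chr(code) for code in range(0xA0, 0x100))


def lost_in(charset: str) -> str:
    """Printable ISO-8859-1 characters that `charset` cannot encode."""
    return "".join(ch for ch in LATIN1_PRINTABLE if not can_encode(ch, charset))


def same_bytes_as_latin1(charset: str) -> bool:
    """True if `charset` encodes every printable ISO-8859-1 character to the same byte."""
    return can_encode(LATIN1_PRINTABLE, charset) and (
        LATIN1_PRINTABLE.encode(charset) == LATIN1_PRINTABLE.encode("iso-8859-1")
    )
```

Under these assumptions, windows-1252 loses nothing and produces identical bytes. ISO-8859-15 loses eight characters (¤ ¦ ¨ ´ ¸ ¼ ½ ¾), one of which is the slot reused for the Euro sign. UTF-8 loses nothing but uses two bytes per character in this range, which is why a Latin-1-only reader garbles it.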
> Why should we Europeans not be able to send mail containing our currency symbol?

Because it's new, and new standards take a loong time to propagate, esp. in email? It took ages (10 years?) for MIME to propagate, and that is clearly very useful for everyone.
>> Why should we Europeans not be able to send mail containing our currency symbol?

1. You could if you changed the mail charset to "ISO-8859-15" by hand.
2. You could if you set the default mail charset in your preferences to "ISO-8859-15" by hand.
3. You could if you sent out HTML mail.
4. You could if the localization language pack (for German, French, or other European languages) used "ISO-8859-15" as the default mail charset. And it is up to the localization to decide that.

>> Because it's new and new standards take a loong time to propagate, esp. in
>> email? It took ages (10 years?) for MIME to propagate, and that is clearly very
>> useful for everyone.

And by not fixing this bug, we give people one more reason to adopt HTML mail and/or ISO-8859-15, right?

nsbeta1-
Keywords: nsbeta1+ → nsbeta1-
Ben: according to the sylpheed web page, it supports the euro symbol (through ISO-8859-15) in version 0.7.3 or later.
>> Why should we europeans not be able to send mail containing
>> our currency symbol?
> 1. you could if you change [...]

This is exactly what could give Mozilla a bad image in Europe. The average user will not know how to change this, so all messages he sends with Mozilla will have € converted to EUR. The average user probably won't understand why, and will probably be surprised that his email messages are being silently changed by Mozilla (I would be). However, the average user *will* understand that all the emails he receives from people using Outlook, Eudora, Mac OS X Mail, and other mailers (except Mozilla) *do* have the euro symbol. So what will he think? Probably "hmm, the euro symbol doesn't work right in Mozilla: Mozilla is behind the times". There is the opportunity to do it right for 1.0. Why pass it up?

> And by not fixing this bug it make people one more reason to adopt
> html mail and/or ISO-8859-15, right ?

This is a step in the opposite direction. HTML is probably unsupported by more mail readers than ISO-8859-15 or Windows-1252 are. OK, we can send as both HTML and plain text, but the euro symbol won't work anyway, because those readers ignore the HTML and just display the text...

This is not a case of ISO-8859-15 vs Windows-1252, but a case of € vs EUR. Why should we convert to EUR if only about 2% of recipients (and no Mozilla users) will see a garbage char?

Re-reading the comments posted here, it seems that I am not alone in thinking along these lines. What is the opinion of the other participants in the discussion? What is the opinion of the bug owner? Wouldn't it be better if we discussed the matter instead of just dropping it like this? Especially as there is a patch ready...
Although this is not a discussion forum, let me answer. I'm from Germany and thus a European, too.

> Re-reading the comments posted here, it seems that I am not alone in thinking
> along these lines.

No, you aren't.

> What is the opinion of the other participants in the discussion?

I believe the best way is the following: when someone sends a mail as ISO 8859-1 or anything else (other than UTF-8, ISO 8859-15 or Windows-1252!), he is warned in a modal dialog that the specified charset doesn't support the Euro symbol. He is given the option of:
a) sending the mail as <popup menu letting the user choose any of the three>,
b) sending the mail with our infamous "EUR" replacement, or
c) returning to the compose window and editing the mail.
My opinion is I want to see and send the €. I defer the implementation to others, but I feel converting it to EUR because a small and ever-diminishing percentage of users have antiquated mail/news clients is pretty lame really.
The only reason this bug is not getting resolved is that we can't agree on a solution. Cutting out all the repeats of discussions that went on earlier, I think we should go with one fallback default without asking the user. Windows-1252, despite some drawbacks, will have the fewest problems with other programs and platforms. Windows-1252 is registered in the IANA list and so has an official Internet status. We can debate this later and make other changes later. For Mozilla 1.0, can we agree on something now?
I can subscribe to that. That's certainly better than the current behaviour, IMO.
Sounds good to me. It's important to get this right by 1.0. If we see that it causes lots of problems, we can always change the behaviour later...
You have my vote.
I am going to renominate this bug for nsbeta1 for the following reasons.

1. For a large percentage of users in Europe and also in other countries, plain text mail is the preferred format for messages. We tried to change this by setting the default to HTML mail, but this has not completely succeeded. For the majority of users in many countries, plain text is the preferred mail format. We need to deal with the Euro currency issue with the best compromise we can find for Mozilla 1.0/NS6.

2. A way to send the Euro currency character in plain text mail should be available to users of all language versions of Mozilla/NS6. If we adopt Frank Tang's approach of leaving this matter to localizers for Latin 1 countries, this leaves out users whose default encodings are Latin2, Baltic, Greek or Cyrillic but who prefer to write in ISO-8859-1 for business and other types of international communication. It also leaves out users in East Asia whose encodings are not usually ISO-8859-1 but who may be using ISO-8859-1 mail for business and other communication needs. An easy way to send the Euro currency character in plain text mail should exist for any user trying to use ISO-8859-1, which is the most widely used mail encoding. The Euro is an international currency and its use is widespread now in business situations; not having a good fallback for it in ISO-8859-1 is a big hurdle for Mozilla/Netscape 6 users.

3. I suggest for now (Mozilla 1.0 and the next Netscape 6 client) we go with what nhotta proposed in:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=68195
This adds a fallback to Windows-1252 when ISO-8859-1 text contains characters that ISO-8859-1 cannot handle but Windows-1252 can. This is not the perfect solution, but it is probably better than sending "EUR". For those users who want the current behavior, it would be nice if we could leave that as a prefs.js option, as discussed somewhere above.

4. We can reassess this approach post Mozilla 1.0. I have a more detailed proposal with various options built in for handling the Euro currency symbol. (I will not attach it to this bug for fear that it would derail the current discussion.)

In summary, the above plan is workable and the patch already exists. We should review the situation post Mozilla 1.0 and come up with a better plan. I suspect that within the next year or so, this will become a lot clearer.

I also do not think it would be a good idea to fragment our way to deal with the Euro currency too much. Localizers into different languages should not be carrying the burden of choosing which encoding is to be the default for Latin1 mail. We really should have a consistent approach for all European localizations. All that the localizer-based solution is doing is passing the buck to them because we cannot decide here. For the good of Mozilla and Netscape 6 **products** in Europe, we should agree on something and go forward even if it is not perfect. Let's not pass the responsibility/burden to each localizer. We want more consistent behavior from our products.

Let's also not forget why this bug was filed in the first place: people want a common solution that applies to all Mozilla-based products if at all possible. That need is now even more keenly felt since the Euro currency became official in Europe this year. We need to be Euro ready. I request that we reconsider this bug for nsbeta1 and Mozilla 1.0.
Keywords: nsbeta1- → nsbeta1
This is a high-impact bug for many average users. Not being able to send the Euro currency symbol in some encoding when users choose the 8859-1 mail encoding also makes our Mail much less competitive with other mailers for European users. I suggest adt1 or adt2 for classification.
> I also do not think it would be a good idea to fragment
> our way to deal with the Euro currency too much.

It is not our choice to decide on "fragmentation" or not, because the decision was made in 1985, when ISO-8859-1 was published, and in 199x, when ISO-8859-15 was published. Internet mail used in Europe will be fragmented simply because ISO-8859-1 cannot encode the Euro sign and ISO-8859-15 is NOT backward compatible with ISO-8859-1. No matter what we decide to do, other mailers will "fragment" Internet email usage outside our decision/control.

> Localizers
> into different languages should not be carrying the burden of
> choosing which encoding is to be the default for Latin1 mail.

Application developers implementing different mailers should not be carrying the burden of choosing which encoding is to be the default for Latin1 mail either. The ISO standards body already chose for us: ISO-8859-15, right?

> We really should have a consistent approach for all European
> localizations. All that the localizer-based solution is doing
> is passing the buck to them because we cannot decide here.

We really should have an approach consistent with all other mailers for all of Europe. If we want to implement such a "try ISO-8859-1, if that fails try the other" approach, then we should consider the other as:
1. ISO-8859-1 then windows-1252, or
2. ISO-8859-1 then UTF-8, or
3. ISO-8859-1 then ISO-8859-15, or
4. ISO-8859-1 then ISO-8859-15, then windows-1252, or
5. ISO-8859-1 then ISO-8859-15, then UTF-8, or
6. ISO-8859-1 then windows-1252, then UTF-8

I personally think we should NOT send out windows-1252, so 1, 4, and 6 are bad choices for me. I prefer we do 5 because it will promote ISO-8859-15 (ISO-8859-15 IS a published ISO standard; it is the other mailers' fault not to implement ISO-8859-15 if they want to support European users), and in the case that we hit characters ISO-8859-15 does not encode, for example those removed from ISO-8859-1, we then fall back to UTF-8.

OK, here is the impact summary for this bug:
Impact Summary
Impact Platform: ALL
Impact language users: all users living in the EU (European languages: 192.3M, 33.9% of total internet) and 31M (5.53% of total internet) UK users (they don't use the Euro currency right now, but they will probably send mail about it), plus people who want to have business/personal communication about European financial information. So 223.3M (39.43%) of internet users will have a chance to hit this, probably daily, and the rest will hit this problem every so often.
Probability of hitting the problem: HIGH for communication about financial information.
Severity if the problem is hit (worst case): the Euro sign won't be sent as the sign itself in plain text mail or the subject. Users will be warned by a conversion error dialog box.
Way to recover after hitting the problem: users can change their encoding and resend. The problem is that users may not know what to change to.
Risk of the fix: TBD
Potential benefit of fixing this problem: TBD
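The fallback chains listed above can be compared with a small Python sketch. `pick_charset` is a hypothetical helper, not Mozilla code. It returns the first charset in a chain that can encode the whole text, and it uses Python's codecs as a stand-in for Mozilla's converters.

```python
from typing import Sequence


def pick_charset(text: str, chain: Sequence[str]) -> str:
    """Return the first charset in `chain` that can encode all of `text`."""
    for charset in chain:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character is unmapped here; try the next charset
        return charset
    raise ValueError(f"no charset in {list(chain)} can encode the text")


# Options 1, 5 and 6 from the list above, using Python's codec names.
OPTION_1 = ["iso-8859-1", "windows-1252"]
OPTION_5 = ["iso-8859-1", "iso-8859-15", "utf-8"]
OPTION_6 = ["iso-8859-1", "windows-1252", "utf-8"]
```

For example, "10 €" resolves to "windows-1252" under option 1 and to "iso-8859-15" under option 5. Text mixing "½" with "€" resolves to "windows-1252" under option 1, but option 5 has to go all the way to "utf-8", because ISO-8859-15 dropped "½".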
Keywords: nsbeta1 → nsbeta1+
Whiteboard: adt3 → [adt2]
> If we want to implement such "try ISO-8859-1, if failed try the other"
> approach, then we should consider the other as
> 1. ISO-8859-1 then windows-1252, or
> 2. ISO-8859-1 then UTF-8, or
> 3. ISO-8859-1 then ISO-8859-15, or
> 4. ISO-8859-1 then ISO-8859-15, then windows-1252, or
> 5. ISO-8859-1 then ISO-8859-15, then UTF-8, or
> 6. ISO-8859-1 then windows-1252, then UTF-8

Although ISO-8859-15 is an ISO standard, I don't believe it is being widely accepted or used, because of its incompatibility with ISO-8859-1. If there is a switch from ISO-8859-1, I think many people feel it should be to UTF-8. So I would oppose any option (3, 4 and 5) that includes ISO-8859-15.

I don't like supporting non-standard encodings, but the reality is that windows-1252 support is more widespread than UTF-8 support. (Being registered in IANA does not make it a standard encoding.) For the short term, I agree that the current proposal to send as windows-1252 (option 1) is a better solution for most users. (People who want to send UTF-8 can do so manually, but it won't happen automatically.) In the future (hopefully soon), we could switch to UTF-8 (option 2).

Option 6 (ISO-8859-1 then windows-1252, then UTF-8) is not needed for this bug about the Euro currency symbol. But we could silently send as UTF-8 if the contents cannot be successfully converted to windows-1252. We could have a pref to do this automatically, or pop up the warning dialog as we do now. How does Outlook behave? But this should be a different bug...
I wish people would read the discussion that has gone on in the report before committing their views. Both ISO-8859-15 and UTF-8 have more compatibility problems with major e-mail programs, as reported above in comment #35. The reality is that 8859-15 is not likely to be the standard mail encoding, no matter what ISO says. In practice, it will be either UTF-8 or Win-1252 for msgs which include the Euro character, and ISO-8859-1 or Windows-1252 when they do not include the Euro or other Windows-only characters. Whether or not 8859-1/windows-1252/utf-8 is a standard is not very helpful in dealing with real issues. By the way, Outlook Express does not face this problem, primarily because its default European encoding is Windows-1252, and if users choose 8859-1, it will send out either UTF-8 or Windows-1252.
A question about the patch: what happens if the user explicitly chooses ISO-8859-1 (in the Composer, as opposed to the default) but uses a Euro char? I hope it won't be sent in the fallback encoding.
Once the fallback is enabled, it's always applied for that charset (i.e. no way to send "EUR" when the composing mail's charset is ISO-8859-1).
Attachment #78801 - Attachment is obsolete: true
Frank, after you review, please ask Jean-Francois Ducarroz to review the changes for mail.
Summary of the changes for the mail code:
* Added fallback charset arguments to some functions; if that argument is non-null, the charset is relabeled to the fallback one.
* In the conversion code, the conversion is retried with the pref-specified charset in case the initial conversion did not succeed because a character was unmapped.
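Here is a rough Python model of the behavior summarized above, under a few stated assumptions. `PREFS` is a hypothetical dictionary standing in for the pref service, and its key and value copy the `intl.fallbackCharsetList.ISO-8859-1` line from the patch. `encode_with_fallback` is a hypothetical name. The last-resort "EUR" replacement mimics the pre-patch behavior; it is not Mozilla's actual transliteration code.

```python
# Hypothetical prefs table; key and value copy the all.js line from the patch.
PREFS = {"intl.fallbackCharsetList.ISO-8859-1": "windows-1252"}
# Hypothetical last-resort transliteration, covering only the Euro sign.
TRANSLITERATIONS = {"\u20ac": "EUR"}


def encode_with_fallback(text: str, charset: str, prefs: dict = PREFS) -> tuple:
    """Return (charset label for the message, encoded bytes)."""
    try:
        return charset, text.encode(charset)
    except UnicodeEncodeError:
        pass
    # Retry with each fallback listed for the composing charset and
    # relabel the message with whichever one succeeds.
    fallbacks = prefs.get("intl.fallbackCharsetList." + charset, "")
    for candidate in fallbacks.replace(",", " ").split():
        try:
            return candidate, text.encode(candidate)
        except UnicodeEncodeError:
            continue
    # No fallback worked: transliterate and keep the original label. If this
    # still fails, UnicodeEncodeError propagates (standing in for the charset alert).
    replaced = "".join(TRANSLITERATIONS.get(ch, ch) for ch in text)
    return charset, replaced.encode(charset)
```

Text without a Euro sign keeps its ISO-8859-1 label and bytes, which is the non-fallback path the later risk assessment worries about. A Euro sign in ISO-8859-1 is relabeled as windows-1252. A charset with no fallback entry, such as ISO-2022-JP, still ends up with "EUR".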
Comment on attachment 78964 [details] [diff] [review] Modified nsISaveAsCharset and related code after ftang's review. r=ftang for nsISaveAsCharset.idl. About nsSaveAsCharset.h: the return value of const char * GetNextCharset(); lives until the destruction of the object or the next time Init is called. I think this is OK since this is a "protected function". About nsSaveAsCharset.cpp: nsSaveAsCharset::GetCharset(char * *aCharset): please read http://www.mozilla.org/scriptable/faq.html point 9. Should we use nsMemory::Clone instead?
Attachment #78964 - Flags: needs-work+
Attachment #79738 - Attachment is obsolete: true
Comment on attachment 79742 [details] [diff] [review] Correcting a typo in the change for all.js. r=ftang
Attachment #79742 - Flags: review+
Now that we have a patch, please assess the risk again.
> Risk of the fix: Medium
Risk of the fix:
Risk of the fix: Medium. The change is for error handling and does not affect usual message sending, but the change to the intl component is generic, so medium risk.
Comment on attachment 79742 [details] [diff] [review] Correcting a typo in the change for all.js. r=ducarroz for the mailnews part.
Whiteboard: [adt2] → [adt2] need 'sr'
Some nits:
- NULL should be nsnull
- nsCRT::strdup(charset) should be just strdup

You might want to look at your uses of NS_ERROR_FAILURE. I've always found that to be a very uninformative error code: if you get one, you have to go looking through all the code that returns NS_ERROR_FAILURE. If there's a more accurate/informative error code, you should use that. The other thing to look at is all the uses of NS_ENSURE_SUCCESS(rv, rv): are you sure that in all those cases you want to completely bail instead of continuing on? I can't answer either of those questions, so if you say there's no more meaningful error code and the ensure_success calls are all correct, then I'll take your word for it and just have the two nits above. Let me know, and I'll stamp an sr.
1)
+ if (attr_EntityBeforeCharsetConv == MASK_ENTITY(mAttribute)) {
+   if (NULL == mEntityConverter) return NS_ERROR_FAILURE;

NULL, don't you want nsnull? Or better yet:
if (!mEntityConverter) return NS_ERROR_FAILURE;

2)
+ PRUnichar *entity = NULL;

nsnull, right?

+ // do the entity conversion first
+ rv = mEntityConverter->ConvertToEntities(inString, mEntityVersion, &entity);
+ if(NS_SUCCEEDED(rv)) {
+   if (NULL == entity) return NS_ERROR_OUT_OF_MEMORY;

Seems weird to me that ConvertToEntities can succeed, but not return an entity. Why is it written that way?

3)
+ while (*p1) {
+   for (; *p1 && (*p1 != ',') && (*p1 != ' '); p1++) ;
+
+   charset.Assign(p2, p1 - p2);
+   mCharsetList.AppendCString(charset);
+
+   for (; *p1 && ((*p1 == ',') || (*p1 == ' ')); p1++) ;
+   p2 = p1;
+ }

See nsCStringArray::ParseString():
// Parses a given string using the delimiter passed in and appends item
// parsed to the array.
void nsCStringArray::ParseString(const char* string, const char* delimiter)

4)
Index: mozilla/modules/libpref/src/init/all.js
+pref("intl.fallbackCharsetList.ISO-8859-1", "windows-1252");

Forgive my ignorance, but does that do the right thing on non-Windows platforms? Or is that a Windows-only font?
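For item 3, a line-by-line Python port of the reviewed loop shows one edge case worth checking before swapping in a library splitter. This is a sketch, and the function names are illustrative. A leading delimiter yields an empty first entry, because the first token scan stops immediately. A split that drops empty entries avoids this. This sketch makes no claim about how `nsCStringArray::ParseString()` itself treats empty tokens.

```python
def parse_charset_list(value: str) -> list:
    """Line-by-line port of the reviewed C++ loop (delimiters: ',' and ' ')."""
    delimiters = ", "
    result = []
    start = pos = 0
    while pos < len(value):                      # while (*p1)
        while pos < len(value) and value[pos] not in delimiters:
            pos += 1                             # scan to the next delimiter
        result.append(value[start:pos])          # charset.Assign(p2, p1 - p2)
        while pos < len(value) and value[pos] in delimiters:
            pos += 1                             # skip a run of delimiters
        start = pos                              # p2 = p1
    return result


def split_charset_list(value: str) -> list:
    """Same delimiters, but empty entries are dropped."""
    return [name for name in value.replace(",", " ").split(" ") if name]
```

Both helpers agree on well-formed values such as "ISO-8859-15, windows-1252 ,UTF-8". They differ only when the pref value starts with a delimiter.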
* NS_ENSURE_TRUE: those are used when the functions are called incorrectly, e.g. "need to call init first" or "charset list is empty". I can add assertions too.
* NS_ENSURE_SUCCESS: the functions are for charset conversion, so whatever the conversion failure, they need to bail out. Others, like failing to get the pref service, also bail out. The one for nsITextTransform can probably be ignored, so I will change it.
* nsCStringArray::ParseString(): let me try that.
* +pref("intl.fallbackCharsetList.ISO-8859-1", "windows-1252");
  This is a charset list. According to the investigation in comment #35 and the discussions, it was decided to use "windows-1252". It is supported on non-Windows platforms, and where it is not, it is most likely treated as "ISO-8859-1" (a subset of "windows-1252").
> Seems weird to me that ConvertToEntities can succeed, but not return an entity.
> Why is it written that way?

I found the check is not necessary; an error will be returned in case the output is empty. I will remove the check.
nsCRT::strdup does not wrap strdup but has its own implementation. http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsCRT.cpp#249 Is strdup available on Mac? It does not compile on my Mac build.
Attachment #79742 - Attachment is obsolete: true
Comment on attachment 80978 [details] [diff] [review] Includes super reviewers' suggestions sr=sspitzer
Attachment #80978 - Flags: superreview+
yes, sorry about the misinformation about strdup - I think we're supposed to use PL_strdup now. how about NS_ERROR_INVALID_ARG instead of NS_ERROR_FAILURE here? + if (!charsetList[0]) + return NS_ERROR_FAILURE; just a suggestion...the patch looks ok to me, I'll let Seth make sure his comments were addressed.
adjusting status whiteboard.
Whiteboard: [adt2] need 'sr' → [adt2]
Comment on attachment 80978 [details] [diff] [review] Includes super reviewers' suggestions R=ducarroz for the mailnews change
>how about NS_ERROR_INVALID_ARG instead of NS_ERROR_FAILURE here? yes, I will change it before check in
Comment on attachment 80978 [details] [diff] [review] Includes super reviewers' suggestions in nsSaveAsCharset::GetCharset, please add NS_ENSURE_TRUE(mCharsetListIndex >=0, NS_ERROR_FAILURE); after NS_ENSURE_ARG(aCharset); r=ftang
Attachment #80978 - Flags: review+
Attachment #80978 - Attachment is obsolete: true
Comment on attachment 81059 [details] [diff] [review] change to address comment #114 and #118 copy r/sr
Attachment #81059 - Flags: superreview+
Attachment #81059 - Flags: review+
Checked in to the trunk.
Please test carefully, using the following cases. Test 4) is needed in order to verify this bug; the other cases are needed to check for regressions.

1) format
- compose as plain and send as plain
- compose as html and send as html
- compose as html and send as plain
- compose as html and send as both plain and html
2) charset
- send as ISO-8859-1
- send as ISO-8859-15
- send as ISO-2022-JP
- send as UTF-8
3) characters (both subject and body)
- send ASCII only
- send European characters (e.g. a-acute)
- send Japanese
- send Chinese
- send symbols, trademark, smartquotes
4) test for Euro
- put Euro in subject, send as ISO-8859-1 -> check that the header charset is windows-1252
- put Euro in body, send as plain text ISO-8859-1 -> check that the body charset is windows-1252
- put Euro in subject, send as ISO-2022-JP -> check that it is transliterated as "EUR"
- put Euro in body, send as plain text ISO-2022-JP -> check that it is transliterated as "EUR"
- put Euro in subject and send as UTF-8 -> check that the header charset is UTF-8
- put Euro in subject and send as ISO-8859-15 -> check that the header charset is ISO-8859-15
5) special cases
- put Japanese text in header, send as ISO-8859-1 -> make sure you get the charset alert
- send Japanese text as plain ISO-8859-1 -> make sure you get the charset alert
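For the Euro checks in 4), the charsets can be read off a saved message source instead of being eyeballed. This is a minimal sketch using Python's standard `email` package. The helper names and the sample message are hypothetical.

```python
from email import message_from_bytes
from email.header import decode_header


def body_charset(raw: bytes):
    """Charset parameter of the top-level Content-Type, lowercased (None if absent)."""
    return message_from_bytes(raw).get_content_charset()


def subject_charsets(raw: bytes) -> list:
    """Charsets named by RFC 2047 encoded-words in the Subject (None for plain text)."""
    subject = message_from_bytes(raw)["Subject"]
    return [charset for _, charset in decode_header(subject)]


# Hypothetical source of a message sent as ISO-8859-1 with a Euro sign after the fix.
SAMPLE = (
    b"Subject: =?windows-1252?Q?Price_10_=80?=\r\n"
    b"Content-Type: text/plain; charset=windows-1252\r\n"
    b"\r\n"
    b"Price: 10 \x80\r\n"
)
```

For the sample, both helpers report "windows-1252". For a message whose subject was transliterated to "EUR", `subject_charsets` reports `[None]`, because no encoded-word is needed.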
The change is included in today's trunk. Marina, please verify so this can go into the branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Tested all the scenarios Naoki suggested: the behavior is correct, as expected. Verified on the trunk. Will test on the branch when the build is available.
I think this is too late to take for nsbeta1, but we should definitely consider it for rtm; it is too risky for nsbeta1 now. Still, I am nominating it for adt1.0.0 so adt will see it. I suggest marking it adt1.0.0- but taking it for rtm.
Keywords: adt1.0.0
Whiteboard: [adt2] → [adt2][adt rtm]
> It is too risky for nsbeta1 now.
Can you quantify the risks? I thought we hoped to get this into the trunk, so we can get more user feedback on this change in behavior?
Bob, this is already checked in to the trunk
Sorry, I meant to write BRANCH, not TRUNK, in my previous comment:
> It is too risky for nsbeta1 now.
Can you quantify the risks? I thought we hoped to get this into the BRANCH, so we can get more user feedback on this change in behavior? Then we would get feedback in case people complain that we are sending email encoded as cp1252...
Risks:
* The change added two things to nsISaveAsCharset:
  1) a flag requesting a fallback to other charsets when a conversion error occurs;
  2) the ability to pass a list of charsets (the list may contain a single charset).
  These changes touched the implementation, which may also affect non-fallback cases (e.g. a simple conversion from Unicode to ISO-8859-1 without a Euro).
* The diff is relatively large (638 lines).
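The fallback mechanism described above can be sketched in Python as follows. This is a hedged illustration, not the actual nsSaveAsCharset code (which is C++): encode_with_fallback and TRANSLITERATIONS are hypothetical names, and the charset lists passed in are assumptions chosen to match the expected results of test case 4 (ISO-8859-1 falls back to windows-1252; ISO-2022-JP has no Euro, so "EUR" is substituted).

```python
# Sketch only: a Python stand-in for the C++ nsSaveAsCharset fallback logic.
# encode_with_fallback and TRANSLITERATIONS are hypothetical names.

# Last-resort replacements, applied only when no charset in the list works.
TRANSLITERATIONS = {"\u20ac": "EUR"}  # Euro sign -> "EUR"


def encode_with_fallback(text: str, charsets: list[str]) -> tuple[str, bytes]:
    """Return (charset, encoded bytes) for the first charset in `charsets`
    that can represent all of `text`.

    If none can, transliterate known symbols and retry with the first
    (user-selected) charset. A character that is still unencodable raises
    UnicodeEncodeError; a mail client would show a charset alert there.
    """
    for charset in charsets:
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            continue  # conversion error: try the next charset in the list

    primary = charsets[0]
    transliterated = "".join(TRANSLITERATIONS.get(ch, ch) for ch in text)
    return primary, transliterated.encode(primary)


if __name__ == "__main__":
    # ISO-8859-1 selected, windows-1252 listed as fallback: the Euro forces windows-1252.
    print(encode_with_fallback("5 \u20ac", ["iso-8859-1", "windows-1252"]))
    # ISO-2022-JP has no Euro and no fallback is listed: the Euro becomes "EUR".
    print(encode_with_fallback("5 \u20ac", ["iso-2022-jp"]))
```

Raising the error, rather than silently substituting "?", leaves room for the caller to show the charset alert that test case 5 expects.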
*** Bug 141419 has been marked as a duplicate of this bug. ***
Good work on getting this one fixed, but we think it is too risky to take on the branch right now. adt1.0.0-/adt2RTM.
Keywords: adt1.0.0 → adt1.0.0-
Whiteboard: [adt2][adt rtm] → [adt2 rtm]
Blocks: 141008
marina@netscape.com: please mark this bug as verified if the trunk is verified without problems. We need that in order to ask adt to consider taking it for rtm.
I am still seeing the Euro symbol replaced by "EUR" when it is part of the message subject. Maybe we should open a new bug for that case.
Re comment #132: Special characters such as the Euro sign in a *subject* (or in any header line) are never a good idea, because many mail servers still have problems with them. So maybe we should leave it as it is.
Marking it as VERIFIED based on the following:
> ------- Additional Comment #123 From marina@netscape.com 2002-04-26 15:58 -------
> tested all scenarios Naoki suggested: the behavior is correct, as expected,
> verified on the trunk. Will test on the branch when the build will come.
Status: RESOLVED → VERIFIED
Removing the minus from adt1.0.0- and renominating for the 1.0 branch.
Blocks: 143047
Keywords: adt1.0.0- → adt1.0.0, approval
Whiteboard: [adt2 rtm] → [adt2 rtm] [Needs a=]
Adding adt1.0.0+. Please get drivers' approval and then check into the 1.0 branch.
Keywords: adt1.0.0 → adt1.0.0+
Changing to adt1.0.1+ for checkin to the 1.0 branch. Please get drivers' approval before checking in.
Keywords: adt1.0.0+ → adt1.0.1+
Keywords: mozilla1.0.1
Comment on attachment 81059: change to address comment #114 and #118
Please check into the 1.0.1 branch ASAP. Once landed, remove the mozilla1.0.1+ keyword and add the fixed1.0.1 keyword.
Attachment #81059 - Flags: approval+
Target Milestone: mozilla1.0 → mozilla1.0.1
checked in to 1.0.1
Keywords: fixed1.0.1
Blocks: 146292
No longer blocks: 141008
Keywords: mozilla1.0.1+
I have read MOST of the comments, but I still can't figure out whether this bug has to do with the HTML Euro symbol being sent as '€' instead of '&euro;'. When using HTML, wouldn't character references (NCRs) be the best and most compatible way?
Lapo wrote:
> Using HTML
The summary says "(plain text)": this bug applies only if the mail is sent as plain text.
Lapo, did you put Euro in the subject too? In that case, the message is sent as windows-1252 because entities cannot be used in message headers.
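The difference between body and header can be sketched as follows, in Python and under stated assumptions: html_body_bytes and encoded_word are hypothetical helpers, not Mozilla APIs. In an HTML body, a character missing from the selected charset can be written as a numeric character reference (Python's "xmlcharrefreplace" error handler emits &#8364; for the Euro). A header has no entity mechanism, so the Euro has to travel as an RFC 2047 encoded-word in a charset that contains it, such as windows-1252. The sketch always uses "B" (base64) encoding and does not split long values into multiple encoded-words, which real mail code must do.

```python
import base64


def html_body_bytes(html: str, charset: str) -> bytes:
    """Hypothetical helper: encode an HTML body in `charset`, turning any
    character the charset lacks into a numeric character reference (&#8364;)."""
    return html.encode(charset, errors="xmlcharrefreplace")


def encoded_word(text: str, charset: str) -> str:
    """Hypothetical helper: build one RFC 2047 "B" encoded-word for a header.
    An entity such as &#8364; would be displayed literally in a header,
    so the raw bytes in `charset` are base64-encoded instead."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


if __name__ == "__main__":
    # Body: ISO-8859-1 lacks the Euro, so it becomes an NCR.
    print(html_body_bytes("<p>5 \u20ac</p>", "iso-8859-1"))
    # Header: the Euro is carried as windows-1252 inside an encoded-word.
    print("Subject: " + encoded_word("5 \u20ac", "windows-1252"))
```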
marina: please verify this as fixed, then replace "fixed1.0.1" with "verified1.0.1". Thanks!
Whiteboard: [adt2 rtm] [Needs a=] → [adt2 rtm]
verified as fixed with the branch build
Product: MailNews → Core
Product: Core → MailNews Core