Closed Bug 249530 Opened 21 years ago Closed 20 years ago

add an option 'send anyway' when there are characters not covered by the selected encoding

Categories

(MailNews Core :: Internationalization, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dsmutil, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(1 file, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a1) Gecko/20040520
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a1) Gecko/20040520

I recently forwarded an e-mail and received the following in a dialog box:

Confirm [X]
The message you composed contains characters not found in the selected Character Encoding. While you can choose a different Character Encoding, it is usually safe to use Unicode for mail. To send or save it as Unicode (UTF-8), click OK. To return to the Composer window where you can choose a different Character Encoding, click Cancel.
[ OK ] [ Cancel ]

If the text has to explain what "OK" and "Cancel" do then the buttons probably aren't correct. It should probably be more along the lines of:

Confirm [X]
The message you composed contains characters not found in the selected Character Encoding. While you can choose a different Character Encoding, it is usually safe to use Unicode (UTF-8) for mail. You may also cancel and return to the Composer window to choose a different Character Encoding. What would you like to do?
[ Send as Unicode ] [ Cancel ]

Reproducible: Always
Steps to Reproduce:
This has already been suggested in bug 194862 comment 18, but I don't think a bug was filed.
Instead of Cancel, maybe one of [Cancel Sending] or [Edit Again] or [Return to editing] would be more adequate ?
That's a good point. Definitely clearer than just "Cancel," especially if the text changes ("You may also cancel and return to the Composer window...").
Taking. In bug 194862 comment #24, Birkir wrote that he needs a third option (send as is), but I think that's kinda dangerous. What do others think of a third option? His specific problem seems to be caused by the fact that 'his' (or Windows's) notion of Shift_JIS is different from Mozilla's notion of Shift_JIS. Probably, Windows' Shift_JIS includes some 'extension' (Windows31J ?). Birkir, can you upload a plain text file that you think is in Shift_JIS but Mozilla believes is not in Shift_JIS?
Assignee: smontagu → jshin
Keywords: intl
(In reply to comment #4) Actually I get the error just by inputting simple text in the body, or if my identity has some japanese characters then I don't need to input any text. (Just enter a TO address and click send, and the message pops up). You can probably reproduce the error by pasting in this text: "ビルキルはメール送りた い!". I also tried different character sets like ISO-2022 and Big5, which should also contain the characters in question although perhaps at different code points.
The text didn't come through, so I will try again: ビルキルはメール送りたい。 If this fails I will upload a text file.
I couldn't reproduce your problem. Mozilla didn't complain and all the characters came through intact. I tried both Shift_JIS and ISO-2022-JP. BTW, next time you want to add a comment with non-ASCII characters to bugzilla, please set your character encoding to UTF-8 before posting.
Status: NEW → ASSIGNED
(In reply to comment #7)
That's very peculiar. For the life of me I cannot send anything with these characters. I can send out a mail in ISO-8859 containing non-English characters. Perhaps it's a platform or locale issue. I will try this again on a non-Japanese version of Windows and/or by changing the system codepage.
(In reply to comment #8)
> For the life of me I can not send anything with these
> characters.

That's very odd. If it had occurred to others, it'd have been reported numerous times here and on Mozilla-JP's bugzilla.

> characters. Perhaps it's a platform or locale issue. I will try this

Mozilla's character encoding converters are self-contained and are independent of platform and locale. Kat, is there any character in comment #6 that you think could be problematic depending on what 'definition' of Shift_JIS we use?
I made a patch using nsIPromptService->ConfirmEx. The 'prompt dialog' can have up to three buttons, of which I'm using two. The first button is 'Send in UTF-8'. I'm wondering what others would like to see on the second button. Jean-Marc has a few candidates. His third option is clearest, but it's a bit too long for a button.
Severity: trivial → enhancement
OS: Windows 2000 → All
Hardware: PC → All
(In reply to comment #9)
Ok, I did some fiddling and found the cause, although I am not sure of the actual culprit. It seems that some non-Japanese character (a control character or something that did not appear in the text box itself) crept into the name or company descriptions for the account identity. I cleared all the boxes and typed the whole thing again, and then I could send out messages normally.

However, I think this illustrates the need for a third option. In certain cases users can be expected to have to send a mail using an encoding which conflicts with their names or other identity information. In such cases it might be relatively harmless to send out the mail with some scrambled characters rather than requiring the user to hunt down the offending character and change the identity information (which might be quite difficult).

For example, my organization name could be written in Japanese with my name in normal Latin characters. But if I want to send a mail using ISO-8859-1, I would have to temporarily remove the organization name in order for Thunderbird to allow me to send the mail in question using the character set of my choice. As I experienced, finding the offending character can be difficult; for example, in a name written on a Japanese keyboard a wide space (present in many Asian languages) might be present in the string without it being readily apparent to the eye.

For comparison, Outlook Express offers the following options under the same circumstances: Send as Unicode. Send as is. Cancel. I would suggest this as a viable alternative.

Are there cases where attempting to encode characters not present in a specific set can cause unpredictable behaviour? If so, how is this generally handled (in programs such as OE)? As it stands I think this is a case where the program might be behaving in a way frustrating to the user.
As for the actual labels, I think Send as Unicode and Send in UTF-8 are both good, with perhaps the latter being more accurate while the former might be more user friendly. Cancel as cancel is good.
(In reply to comment #11)
> For example, my organization name could be written in japanese with my name in
> normal latin characters. But if I want to send a mail using ISO-8859-1, I would
> have to temporarily remove the organization name in order for Thunderbird to
> allow me to send the mail in question using the character set of my choice.

Either you can send in UTF-8 or make two identities, one for sending mails in one of legacy Japanese encodings and the other for mails in one of legacy encodings for Western Europeans. Perhaps a third with both Japanese and your name in Latin letters would be necessary for sending in UTF-8. In Thunderbird, there's a UI frontend for multiple identities. Mozilla doesn't have a UI yet, although there's the backend for multiple IDs.

> Are there cases where attempting to encode characters not present
> in a specific set can cause unpredictable behaviour?

Characters outside the repertoire are turned into question marks in plain text (in HTML, they're turned into NCRs so that there's no information loss). See bug 194862 comment #7 and bug 194862 comment #10. We didn't take into account mobile phone's lack of support for UTF-8, but I'm not sure whether that should be important enough to change our decision.
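To make the behaviour described above concrete (question marks in plain text, NCRs in HTML), here is a minimal sketch. It is not Mozilla's converter code: `toLatin1Lossy` is a hypothetical helper, and ISO-8859-1 is used as the target only because its repertoire is simply the code points U+0000 to U+00FF.

```javascript
// Hypothetical helper, not Mozilla code: lossy conversion of a Unicode
// string to the ISO-8859-1 repertoire (code points U+0000..U+00FF).
function toLatin1Lossy(text, asHtml) {
  let out = "";
  for (const ch of text) {            // iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp <= 0xff) {
      out += ch;                      // representable: keep unchanged
    } else if (asHtml) {
      out += "&#" + cp + ";";         // HTML: numeric character reference, reversible
    } else {
      out += "?";                     // plain text: the character is lost
    }
  }
  return out;
}
```

With a mixed Latin/Japanese string, the plain-text branch loses the Japanese characters for good, while the HTML branch keeps enough information for the recipient's client to render them.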
(In reply to comment #12)
I don't want to debate this matter in any great length, so I will offer you my perspective for now. You can then make up your mind.

> Either you can send in UTF-8 or make two identities, one for sending mails in
> one of legacy Japanese encodings and the other for mails in one of legacy
> encodings for Western Europeans.

I see this is a possible way to circumvent the problem. But it is not very intuitive and it's more maintenance than I would care to undertake. For example, if my identity information contains some Japanese characters but I am sending a message in ISO-8859-1, because I know that is what the receiver is expecting (or supports), then I don't really mind if the Japanese characters get scrambled (translated or replaced by question marks) because I don't expect the receiver to be able to read them to begin with.

I will admit that this is probably not a common problem. This stems from the fact that few problems arise for mono- or bilingual users. (That is one of the reasons non-Unicode codepages are so firmly entrenched in their respective communities: they work for the majority of users.) But in my case, when I regularly use languages that span more than two 'entrenched' codepages, an incredible number of problems creep up. I can hardly begin to describe it. One example: using Windows with a Japanese codepage causes most filenames with Icelandic characters not to open in their assigned programs. Of course I am being unreasonable in wanting Windows to work the way I want it to work instead of doing the sensible thing and using perhaps only English characters in filenames, but that's that.

This is one such problem. When I send a mail, I know pretty well (usually better than the program I am using) what codepage works best for the mail, depending on where I send it. I am a firm supporter of Unicode and advocate its use as much as possible, but it just isn't there yet.

There is of course one other way.
Am I correct in assuming that the conversion from the internal format (Unicode) to the sending code page isn't done until the mail is sent or saved? Perhaps when changing the encoding the characters should be changed as well. That is, if the message contained characters outside of the codepage then they would change into question marks or whatever, before the user's eyes. This would be the ideal solution. Then the user would see exactly what he was about to send. (Many times I have accidentally sent a message in a Japanese codepage believing I was writing in ISO-8859, only to find many of the characters translated or replaced.) But the other method is easier to implement, I think, and probably good for most purposes.

> We didn't take into account mobile phone's lack of support for UTF-8, but I'm
> not sure whether that should be important enough to change our decision.

To be fair, this isn't really the issue, since I can after all set the codepage manually. It's what happens after that: when the message contains characters that aren't in the codepage that I selected, how do I know which characters are incorrect, and how important is it for me to know? Usually I don't care as long as I can send the message. If I could find out, that would be pretty cool too. But it is important to realize that UTF-8 isn't so widely supported. As noted in bug 194862, most mail services (Yahoo, Hotmail etc.) don't use UTF-8 to present their non-English user interfaces. In Japan, 40+ million mobile users are sending mail on phones that don't support UTF-8 (outside the ASCII range anyway, and these phones will probably not really support it in the next few years, because of the size of the font set).

But in the end I believe a third option is valid because I believe users should be allowed to do what they want when they know what they are doing. People that don't know would be inclined to cancel or try Unicode anyway.
(In reply to comment #13)
> I don't want to debate this matter in any great length,

You did, however :-)

> so I will offer you my
> perspective for now. You can then make up your mind.

I know you have a strong opinion, but I don't. I need to see what others on Cc think before making a decision.

> > Either you can send in UTF-8 or make two identities, one for sending mails in
> > one of legacy Japanese encodings and the other for mails in one of legacy
> > encodings for Western Europeans.
>
> I see this is a possible solution for a way to circumvent the problem. But it
> is not very intuitive and it's more of maintenance than I would care to undertake.

Well, I have multiple ids and use different ones depending on whom I write to. It's not that much hassle.

> using windows with a japanese codepage, causes most filenames with icelandic
> characters not to open in their assigned programs. Of course I am being
> unreasonable in wanting windows to work the way I want it to work instead of
> doing the sensible thing and using perhaps only english characters in filenames,

You're not unreasonable at all :-) See bug 239279 and bugs it blocks.
I am opposed to "Send as is" because fundamentally, it makes no sense: there is no way to figure out just which bytes need to be sent for the characters that don't fit in the user-specified encoding. Japanese characters have different byte codes depending on whether they are assumed as Shift_JIS, ISO-2022-JP or Unicode. If you don't specify the encoding, but just pick one of those encodings at random, the recipient is left with the task of trying to figure out which encoding was used. (I'm not sure if you can even select Shift_JIS as an outgoing charset -- it's not available in the list in my installation.)

Regarding the issue raised at bug 194862, about forwarding mail in alternate character sets: this mail needs to be forwarded as an attachment, not inline. There is no other reasonable solution. Forwarding INLINE causes the text to be read in its specified character set, converted to Unicode internally, and displayed -- "as is" *means* Unicode once the message has been loaded.

Regarding Birkir Barkarson's problem with what I'm guessing is his part-Japanese sig:

> For example, my organization name could be written in japanese with my name
> in normal latin characters. But if I want to send a mail using ISO-8859-1,
> I would have to temporarily remove the organization name in order for
> Thunderbird to allow me to send the mail in question using the character set
> of my choice.

First of all: why do you "want to send a mail using ISO-8859-1"? What is gained by using 8859-1 instead of UTF-8 or ISO-2022-JP? Do you really need 8859-1 capability -- that is, ISO Western characters in the range 128-255 -- or would plain old 7bit ASCII suffice? The characters in the 7bit ASCII range are supported by *all* other encodings.

Second of all: it is impossible to send Japanese characters in an ISO-8859-1 message. PERIOD. So you either remove the Japanese characters, or you send in a different character set. There is no third way.

Possible, alternate solutions:

- use a vCard instead of, or in addition to, a sig. In my quick testing, it seems the vCard is always sent as UTF-8, but it's like an attachment: it doesn't affect the encoding of the main message. It's possible that any mail client that can't support UTF-8 also can't support vcards, so the recipient wouldn't even be exposed to the misdisplayed text.

- utilize the fallback preference. Set your default outgoing encoding to 8859-1, then set this preference:
    intl.fallbackCharsetList.ISO-8859-1
  to
    ISO-2022-JP
  Then just go ahead and compose the mail. If the message text is entirely encodable as 8859-1, that's how it will go out; if there are Japanese characters in the message, the message will go out as ISO-2022-JP. If there is something else -- say, a Cyrillic text -- then you'll get the prompt again, and have to decide on sending UTF-8 or changing the encoding by hand to something that handles the message. However, if your sig continues to contain Japanese characters, all the mail will go out as ISO-2022-JP.

- send HTML mail, with a sig that includes an image with the Japanese company name. You can then even include the company logo -- spiffy!

> Perhaps when changing the encoding the characters should be changed as well.
> That is if the message contained characters outside of the codepage then
> they would change into questionmarks or whatever, before the users eyes.

Rather than changing them into question marks -- which I think is unlikely to happen, and unnecessary -- how about highlighting them in color, or with a squiggly line, or whatever? This would be an interesting enhancement, and make it easier to delete any problematic characters, but its lack does not justify "Send As Is."
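The fallback preference mentioned in this comment could also be set persistently from a `user.js` file in the profile directory rather than by hand. A minimal sketch, assuming the preference name and value quoted above are accepted exactly as written; only the `user_pref` call itself is standard `user.js` syntax, and the result has not been checked against any particular build:

```javascript
// user.js (profile directory) -- sketch only, values taken from the comment above.
// If a message cannot be encoded as ISO-8859-1, try ISO-2022-JP before prompting.
user_pref("intl.fallbackCharsetList.ISO-8859-1", "ISO-2022-JP");
```

The default outgoing encoding itself would still be chosen in the mail preferences as described in the comment.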
(In reply to comment #15)
> I am opposed to "Send as is" because fundamentally, it makes no sense: there is
> no way to figure out just which bytes need to be sent for the characters that
> don't fit in the user-specified encoding.

Mike, you seem to have forgotten that Mozilla keeps an outgoing message in Unicode until the last moment so that your point above isn't relevant.

> by using 8859-1 instead of UTF-8 or ISO-2022-JP? Do you really need 8859-1
> capability -- that is, ISO Western characters in the range 128-255 -- or would

ISO-8859-1 doesn't have any graphic characters in [0x80 - 0x9f] (Windows-1252 does, though). Anyway, he already explained why he needs that. Note that he's not an American but he's from Iceland.

> Second of all: it is impossible to send Japanese characters in an ISO-8859-1
> message. PERIOD. So you either remove the Japanese characters, or you send in
> a different character set. There is no third way.

He's not so ignorant as you might think. Rather, he's rather knowledgeable in character encoding issues. He just wants to have a leverage of being able to go ahead __fully knowing__ that he'll lose some characters. So do two Japanese users who added comments to bug 194862.

> I'm not sure if you can even select Shift_JIS as an
> outgoing charset -- it's not available in the list in my installation.

You can add it if you want, for which there's a 'Customize' menu in the mail compose window (View | Character Encoding).

> something else -- say, a Cyrillic text -- then you'll get the prompt again

I'm afraid you picked a bad example. CJK character encodings can cover a significant portion of Cyrillic and Greek letters. Tamil or Devanagari would have worked :-)

re comment
> Perhaps when changing the encoding the characters should be changed as well.
> That is if the message contained characters outside of the codepage then
> they would change into questionmarks or whatever, before the users eyes.
This would be nice, but it's not so trivial to implement this at the moment.
(In reply to comment #16)
> (In reply to comment #15)
> > I am opposed to "Send as is" because fundamentally, it makes no sense:
> > there is no way to figure out just which bytes need to be sent for the
> > characters that don't fit in the user-specified encoding.
>
> Mike, you seem to have forgotten that Mozilla keeps an outgoing message in
> Unicode until the last moment so that your point above isn't relevant.

I had remembered it, and I don't see why that's in conflict with what you quoted. The users who want "as is" want the characters sent in the *original* encoding; that information is no longer available. My point still holds: if you're sending a message in ISO-8859-1, *which* byte codes do you use for the characters you're sending? Or do they want each character that doesn't fit to be sent as a question mark, or simply stripped out?

> > Second of all: it is impossible to send Japanese characters in an ISO-8859-1
> > message. PERIOD. So you either remove the Japanese characters, or you send
> > in a different character set. There is no third way.
>
> He's not so ignorant as you might think. Rather, he's rather knowledgable in
> character encoding issues. He just wants to have a leverage of being able to
> go ahead __fully knowing__ that he'll lose some characters. So do two Japanese
> users who added comments to bug 194862.

I'm not claiming ignorance on his part. But I don't see how that desire can be made workable. If it's OK for the recipient to view trash, in the form of lost or mistranslated or wrong-charset, then why not send as UTF-8? Well, I guess the reason in this case is that useful 128-255 chars will be encoded as two bytes.

I think the suggested workarounds -- which are already supported -- are less intrusive than adding yet another decision point in the UI (not to mention the code) which can cause Mozilla to either deliberately send malformed mail or to mistranslate/strip characters, both of which behaviors will just lead to more bug reports.
Also, implementing the request in bug 73567 would provide a way to pick a sig that doesn't include the Japanese text, although I think the multiple-identity feature is almost as simple to use and more functional. btw: Mozilla *does* have a limited frontend for multiple identities; however, I don't think you can make the preferred charset part of the identity.
(In reply to comment #17)
> (In reply to comment #16)
>
> Or do they want each character that doesn't fit to be sent as a question mark,
> or simply stripped out?

Question marks are what he and others want and what Mozilla sent out in the past, if it's not clear to you. What else (other than transliteration) do you think can be done?

> The users who want "as is" want the characters sent in the *original*
> encoding;

You misunderstood what they want and what Mozilla did when 'send as is' was available in the past. 'send as is' does NOT mean sending the 'byte stream' as is. There's no such thing because once in the composition window, everything (including a message you're quoting) is in **Unicode**.

> I'm not claiming ignorance on his part. But I don't see how that desire can be
> made workable. If it's OK for the recipient to view trash, in the form of lost
> or mistranslated or wrong-charset, then why not send as UTF-8?

They already gave sufficient reasons for UTF-8 not being a feasible option to them (please read their comments). They also made it clear that losing some characters is not of their concern. In short, they know they'd lose some characters irreversibly, but they still want to have that option. That is, they're saying they know better than Mozilla does what to do with their emails, and they want to get back that control.

> Well, I guess
> the reason in this case is that useful 128-255 chars will be encoded as two
> bytes.

I don't see what you meant by the above.

> btw: Mozilla *does* have a limited frontend for multiple identities; however, I
> don't think you can make the preferred charset part of the identity.

There's no need. Just make multiple IDs, remember which one is for which encoding and pick one suitable for the character encoding you're sending your email in.
*** Bug 261471 has been marked as a duplicate of this bug. ***
I strongly support the "Send As Is" alternative. My reasoning from bug 261471:

===
I'm trying to send a message using KOI8-R charset. When I press send button
MailNews reports:

``The message you composed contains characters not found in the selected
Character Encoding. While you can choose a different Character Encoding, it is
usually safe to use Unicode for mail. To send or save it as Unicode (UTF-8),
click OK. To return to the Composer window where you can choose a different
Character Encoding, click Cancel.''

I do NOT want to use UTF-8, neither to change the Character Encoding. I want to
send the message AS IS in the selected Character Encoding (i.e., KOI8-R), and I
do not care if some hypothetical characters will be lost. Visually my message
looks fine and I do not see any special characters that cannot be encoded in KOI8-R.
If I knew which characters are ``not found in the selected Character Encoding'',
I would simply remove them. Unfortunately I do not see any of them.
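The complaint above is that the dialog never says which characters are the problem. As an illustration of what such a diagnostic might report, here is a small sketch. Nothing in it is MailNews code: `findUnencodable` and `isLatin1` are hypothetical names, and ISO-8859-1 stands in for the selected encoding because its repertoire can be tested without a conversion table (KOI8-R would need one).

```javascript
// Hypothetical diagnostic, not part of MailNews: list every character in
// `text` that the `isEncodable` predicate rejects, with position and code point.
function findUnencodable(text, isEncodable) {
  const hits = [];
  let index = 0;                          // UTF-16 offset into the string
  for (const ch of text) {
    if (!isEncodable(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index: index, char: ch, codePoint: "U+" + hex });
    }
    index += ch.length;                   // 2 for characters outside the BMP
  }
  return hits;
}

// Stand-in repertoire (assumption for the example): ISO-8859-1 covers U+0000..U+00FF.
const isLatin1 = (ch) => ch.codePointAt(0) <= 0xff;
```

Run over a name containing an ideographic (wide) space, `findUnencodable` would point at that one invisible character, which is the kind of case described earlier in this bug.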

*** Bug 268549 has been marked as a duplicate of this bug. ***
This issue isn't going away and quite a number of people seem to desire a 'send as is' option. Is it not time to reconsider this?
One of draft patches for bug 194862 has that option. I'll try to 'resurrect' it to work.
Summary: Character Encoding warning uses standard OK/Cancel dialog box → Character Encoding warning should be more user-friendly (than Ok/Cancel) and offer 'Send As Is'
Product: MailNews → Core
Summary: Character Encoding warning should be more user-friendly (than Ok/Cancel) and offer 'Send As Is' → add an option 'send anyway' when there are characters not covered by the selected encoding
Attached patch patch (obsolete) — Splinter Review
This should work, but for a mysterious reason, I get the old dialog with Ok and Cancel. The dialogText was changed, but I don't get three buttons, 'Send in UTF-8', 'Cancel' and 'Send anyway'. I even did 'distclean' and rebuilt. I also unzipped the 'messenger.jar' file to make sure that it has the modified 'Mess...Command.js'. David, Neil and Scott, any idea what's going on?
Neil and David, can you take a look at the patch? As I wrote in the previous comment, mysteriously I get the old two-button dialog even with 'confirmEx' with three choices. Thanks.
Comment on attachment 172850 [details] [diff] [review] patch Sorry I forgot that there are two separate checks for the message body and the message header. The former is handled in cpp while the latter is handled in js.
Attachment #172850 - Attachment is obsolete: true
This one actually works. I had to add an attribute to nsIMsgCompField to avoid prompting users about the character encoding and the coverage twice if both the header (checked in JS) and the body (checked in cpp) of a message have unrepresentable characters in the selected encoding. Once 'send anyway' is selected for the message header, it's not asked any more. If 'send in UTF-8' is selected, there's no need to check (as it works currently). I don't like the variable name 'needToCheckCharset', but 'needToAskAboutCharsetConversion' seems too long. Any suggestion would be welcome.
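The effect of the new attribute can be sketched in a few lines. This is an illustration under assumptions, not the patch: `resolveCharset`, the shape of `fields`, and the `ask` callback are hypothetical, and the real header check lives in JS while the body check lives in C++. Only the name `needToCheckCharset` comes from the comment above.

```javascript
// Sketch only: one flag suppresses the second prompt when both the headers
// and the body contain characters outside the selected encoding.
// fields: { headersEncodable, bodyEncodable, charset, needToCheckCharset }
// ask(part): returns "utf8", "anyway" or "cancel" (hypothetical callback)
function resolveCharset(fields, ask) {
  const parts = [
    ["headers", fields.headersEncodable],
    ["body", fields.bodyEncodable],
  ];
  for (const [part, encodable] of parts) {
    // Nothing to ask if the part fits, UTF-8 was already chosen,
    // or the user already said "send anyway".
    if (encodable || fields.charset === "UTF-8" || !fields.needToCheckCharset) {
      continue;
    }
    const choice = ask(part);
    if (choice === "cancel") return false;      // back to the compose window
    if (choice === "utf8") {
      fields.charset = "UTF-8";                 // everything is representable now
    } else {
      fields.needToCheckCharset = false;        // "send anyway": never ask again
    }
  }
  return true;                                  // go ahead and send
}
```

Whichever answer the user gives for the headers, the body check is then skipped, which is the double-prompt problem the attribute was added to solve.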
Maybe it would be easier if we could make both checks at the same time?
I'm not sure it'll be easier that way. If I do, the check in the JS part should be moved to the C++ part (not the other way around). There are two places in C++ files where it's checked, but only one of them (nsMsgSend.cpp) seems to be used. The other one (nsMsgCompose.cpp) may as well be used for cases I missed. Anyway, I'll see what I can do.
Comment on attachment 172925 [details] [diff] [review] update (with patch for the backend)

Neil, can you review the patch as it is? I considered the possibility of moving the check in JS code to C++ file(s), but it's not that easy. Moreover, I'm afraid it'd increase the code size (10 ~ 20 lines in JS code are translated into perhaps over 50 lines of C++ code). BTW, I found that the check in nsMsgCompose.cpp is used when the default composition format for a server is plain text.
Attachment #172925 - Flags: superreview?(bienvenu)
Attachment #172925 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 172925 [details] [diff] [review] update (with patch for the backend)

> {
>   var fallbackCharset = new Object;
>   if (gPromptService &&
>       !gMsgCompose.checkCharsetConversion(getCurrentIdentity(), fallbackCharset))
>   {
>     var dlgTitle = sComposeMsgsBundle.getString("initErrorDlogTitle");
>     var dlgText = sComposeMsgsBundle.getString("12553"); // NS_ERROR_MSG_MULTILINGUAL_SEND
>-    if (!gPromptService.confirm(window, dlgTitle, dlgText))
>-      return;
>-    fallbackCharset.value = "UTF-8";
>+    var result3 = gPromptService.confirmEx(window, dlgTitle, dlgText,
>+      (gPromptService.BUTTON_TITLE_IS_STRING * gPromptService.BUTTON_POS_0) +
>+      (gPromptService.BUTTON_TITLE_IS_STRING * gPromptService.BUTTON_POS_1) +
>+      (gPromptService.BUTTON_TITLE_CANCEL * gPromptService.BUTTON_POS_2),
>+      sComposeMsgsBundle.getString('sendInUTF8'),
>+      sComposeMsgsBundle.getString('sendAnyway'),
>+      null, null, {value:0});
>+    switch(result3)
>+    {
>+      case 0:
>+        fallbackCharset.value = "UTF-8";
>+        break;
>+      case 1: // send anyway
>+        msgCompFields.needToCheckCharset = false;
>+        break;
>+      case 2: // cancel
>+        return;
>+    }
>   }
>   if (fallbackCharset &&
>       fallbackCharset.value && fallbackCharset.value != "")
>     gMsgCompose.SetDocumentCharset(fallbackCharset.value);
> }

I still think that since you've already implemented the prompt in C++ you could almost do all the work in checkCharsetConversion, rather than reimplementing the prompt here.
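For readers unfamiliar with confirmEx, the flags argument in the quoted hunk packs one title type per button into a single integer. The sketch below reproduces that arithmetic outside of Mozilla. The numeric constants are written from memory of nsIPromptService and should be treated as an assumption to verify against the IDL; `buttonTitleType` is a hypothetical helper for the example.

```javascript
// Assumed values of the nsIPromptService constants used in the patch.
const BUTTON_POS_0 = 1;            // multiplier for the first button
const BUTTON_POS_1 = 256;          // multiplier for the second button
const BUTTON_POS_2 = 65536;        // multiplier for the third button
const BUTTON_TITLE_CANCEL = 2;     // stock "Cancel" label
const BUTTON_TITLE_IS_STRING = 127; // label is supplied by the caller

// Same expression as in the patch: two custom labels plus a stock Cancel.
const flags =
  BUTTON_TITLE_IS_STRING * BUTTON_POS_0 +
  BUTTON_TITLE_IS_STRING * BUTTON_POS_1 +
  BUTTON_TITLE_CANCEL * BUTTON_POS_2;

// Hypothetical helper: recover the title type stored for one button position.
function buttonTitleType(allFlags, position) {
  return Math.floor(allFlags / position) % 256;
}
```

The index returned by confirmEx follows the button positions, which is why the patch's switch treats 0 as 'Send in UTF-8', 1 as 'send anyway' and 2 as cancel.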
Attachment #172925 - Flags: review?(neil.parkwaycc.co.uk) → review+
(In reply to comment #31)
Thanks for r.

> I still think that since you've already implemented the prompt in C++ you could
> almost do all the work in checkCharsetConversion, rather than reimplementing
> the prompt here.

Oh, I didn't realize that you had meant that. What you suggested indeed can simplify things in one place, but it turned out that its ripple effect elsewhere is rather large (I have to change OE and Eudora import code as well). So, I'd rather go with the current patch. David, can you sr?
Comment on attachment 172925 [details] [diff] [review] update (with patch for the backend)

can you move the nsresult decl down where you use it - I don't see it being used anywhere above...

+ rv = dialog->
+ ConfirmEx(title, msg,

can you remove the printf here?

+ mCompFields->GetNeedToCheckCharset(&needToCheckCharset);
+ printf("need to check charset=%d\n", needToCheckCharset);
+ if (needToCheckCharset) {
Attachment #172925 - Flags: superreview?(bienvenu) → superreview+
jshin, minor comment about the user text, I would lose this phrase: "by returning to the composer window" if you leave it in, it should at least say "by returning to the mail composition window" instead of composer
Checked in with 'by returning to ...' removed. Also took care of David's comments. Thanks all for the review.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 294624 has been marked as a duplicate of this bug. ***
Product: Core → MailNews Core