Closed Bug 227268 Opened 21 years ago Closed 13 years ago

Subject line in File | Send Link... mail uses strange characters instead of non-ASCII characters (probably UTF-8)

Categories

(Core Graveyard :: File Handling, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla13

People

(Reporter: jesper.hertel.arbejde, Assigned: smontagu)

References

()

Details

(Keywords: intl, relnote)

Attachments

(1 file, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007

When I use File | Send Link... on a page whose title contains non-ASCII characters, e.g. the Danish characters æøå, the subject line of the resulting e-mail in Outlook 2000 shows two strange characters in each place where one of the Danish characters should have been.

When I use File | Send Link... on the page http://www.pcworld.dk/default.asp?Mode=2&ArticleID=4762 , this mailto URL is fired:

mailto:?body=http%3A//www.pcworld.dk/default.asp%3FMode%3D2%26ArticleID%3D4762&subject=PC%20World%20-%20Digital%20video%20p%C3%A5%2065%20gram

(I collected this mailto URL by modifying the registry key reg:\HKEY_CLASSES_ROOT\mailto\shell\open\command\(Default) to point to a small Python script that collected the arguments sent.)

The title of the page is "PC World - Digital video på 65 gram", but the subject in the resulting e-mail in Outlook 2000 is "PC World - Digital video pÃ¥ 65 gram", which the given mailto URL also reflects. I have found out that in UTF-8, the character "å" is encoded as two bytes, which show up as the two characters "Ã¥". Maybe Mozilla should convert to code page 1252 in the Windows case before constructing the mailto URL?

Reproducible: Always

Steps to Reproduce:
1. Go to the given URL http://www.pcworld.dk/default.asp?Mode=2&ArticleID=4762 .
2. Choose File | Send Link...
3. Look at the subject line in the resulting mail.

Actual Results: The subject is "PC World - Digital video pÃ¥ 65 gram".
Expected Results: The subject should have been "PC World - Digital video på 65 gram".

I use Windows 2000 SP4, Mozilla 1.5, Outlook 2000 SP-3.
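The mismatch described in this report can be reproduced with a minimal Python sketch. It assumes the receiving mail client interprets the percent-escaped bytes as code page 1252; the variable names are illustrative only and nothing here is taken from Mozilla's code.

```python
from urllib.parse import unquote_to_bytes

# Subject value exactly as it appears in the reported mailto URL.
subject_escaped = "PC%20World%20-%20Digital%20video%20p%C3%A5%2065%20gram"

# Undo the percent-escaping to get the raw bytes that the mail client sees.
raw = unquote_to_bytes(subject_escaped)

# Read as UTF-8 (what Mozilla wrote), the title comes out right.
as_utf8 = raw.decode("utf-8")

# Read as code page 1252 (assumed client behaviour), each non-ASCII
# character turns into two characters.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

The first line printed matches the page title and the second matches the subject seen in Outlook, which suggests that the bytes themselves are intact and only the interpretation differs.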
I must mention that I have patched my Mozilla 1.5 with (exactly) the patch mentioned in Bug 217328 comment 14 (http://bugzilla.mozilla.org/show_bug.cgi?id=217328#c14). But this was a mailto:body= issue and does not affect this problem. The current subject problem has also been there all the time, including before I made the patch. Otherwise I have changed nothing in my installation.
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.6b) Gecko/20031129
I'm not using Outlook but Mozilla MailNews as my default mail client. Sending to an account that I can access via webmail only, I got this in the header:
Subject: =?iso-8859-1?Q?Digital_video_p=E5_65_gram?=
and it was displayed as on the website. Can you retest with Mozilla 1.6b when it comes out?
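For readers unfamiliar with the header format quoted in this comment: it is an RFC 2047 encoded-word, which names its own charset. A small sketch using Python's standard library (not anything from MailNews) shows how a receiving client can decode it:

```python
from email.header import decode_header, make_header

# Header value as quoted in the comment above.
header_value = "=?iso-8859-1?Q?Digital_video_p=E5_65_gram?="

# decode_header splits the value into (bytes, charset) parts;
# make_header joins them back into a displayable Unicode string.
parts = decode_header(header_value)
decoded = str(make_header(parts))

print(parts)
print(decoded)
```

Because the charset travels with the text, the receiver does not have to guess the encoding, unlike with the bare percent-escaped bytes in the mailto URL.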
Comment #2 was invalid; I didn't test what the reporter was describing.
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.6b) Gecko/20031202
I used File -> Send Link... to send the page to my account using Mozilla Mail. Mozilla Mail opened and showed the subject and body as below, the same as what I received after sending. So this works internally, but I can't test whether it also works with an external mail client such as Outlook. This should be tested by someone using Outlook or another mail client.
Sent/received:
Subject: PC World - Digital video på 65 gram
<http://www.pcworld.dk/default.asp?Mode=2&ArticleID=4762>
Component: XPApps, as in Bug 217328?
Component: Browser-General → XP Apps
Using either Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7a) Gecko/20040121 Firebird/0.8.0+ (scragz) or Mozilla 1.6 release, I get the following in QM when I try to "send link" (Moz) or "send page" (FB): http%3A%2F%2Fbugzilla.mozilla.org%2Fshow_bug.cgi%3Fid%3D227268&subject=Bug%20227268%20-%20Subject%20line%20in%20File%20%7C%20Send%20Link...%20mail%20uses%20strange%20characters%20instead%20of%20non-ASCII%20characters%20(probably%20UTF-8) With older versions of Firebird the page title was also appended (after the word "subject"), but at least the URL part didn't convert the colon and slashes to the hex equivalents which make the link completely useless. This is a major regression.
Flags: blocking1.7a?
Flags: blocking1.4.2?
I just installed Mozilla 1.6 ("Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 MultiZilla/1.6.0.0c"), and the problem is still the same.
Not a blocker. Copying someone who might have an idea.
Flags: blocking1.7a?
Flags: blocking1.7a-
Flags: blocking1.4.2?
Flags: blocking1.4.2-
This is actually much easier to see with Firefox since the mail doesn't get in the way. I set up my mailto for Lotus Notes and sure enough my subject is wrong.
Assignee: general → file-handling
Status: UNCONFIRMED → NEW
Component: XP Apps → File Handling
Ever confirmed: true
QA Contact: general → ian
It's a URI. It contains non-ascii chars. There is no source document information. So it gets encoded as UTF-8 (which is the standard for non-ASCII chars in URIs in any case); that's what GetAsciiSpec does on nsIURI objects.
<mkaply> biesi: The native charset would be the right call
<biesi> mkaply: yeah, it means unescaping the spec, converting to native charset, escaping again (probably), and calling shellexecute on that, I suppose
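The unescape / convert / re-escape sequence from the IRC exchange above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the escapes are assumed to be UTF-8, and it is applied to a single header value rather than to a whole URL (escaping everything would also escape the URL delimiters).

```python
from urllib.parse import quote, unquote

def reencode_escapes(escaped: str, native_charset: str) -> str:
    """Hypothetical helper: turn UTF-8 percent-escapes into native-charset ones."""
    # Step 1: unescape, assuming the escapes are UTF-8 (fails loudly if not).
    text = unquote(escaped, encoding="utf-8", errors="strict")
    # Step 2: convert to the native charset; this raises UnicodeEncodeError
    # for characters the charset cannot represent.
    native_bytes = text.encode(native_charset)
    # Step 3: escape again, byte by byte.
    return quote(native_bytes, safe="")

print(reencode_escapes("p%C3%A5%2065%20gram", "cp1252"))
```

Step 2 is where the approach becomes lossy: any title containing characters outside the native code page cannot be converted at all.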
Or you could GetSpec instead of GetAsciiSpec...
     * Some characters may be escaped.
     */
    attribute AUTF8String spec;
doesn't gain you anything.
What mozilla currently does is 'correct'. External mail programs should be 'fixed' to understand IRI (Internationalized Resource Identifier : URIs in UTF-8) [1]. That is, it's not Mozilla but MS OE, Lotus Notes, etc that should be fixed. Of course, there's little, if any, we can do about them. How does Thunderbird work when used as an external mail program for firefox? At least, we have to get TB to do the 'right thing'. We might convert to the most widely used _legacy_ character encoding for a given locale, but that doesn't seem to be the 'right' thing. What should we do when there are characters not representable in the encoding? Using IRIs for those cases _only_ seems to be worse than what we do now. What if the default character encoding for outgoing emails in MS OE is different from the legacy encoding of the default system locale? A work-around for MS OE users is to change the default encoding (for outgoing emails) to UTF-8. Nowadays, except for __stupid__ web mail services (such as hotmail and yahoo mail), mail clients can handle messages in UTF-8 'transparently'. There should be a similar option in Lotus Notes. We can add this to the release notes. [1] http://www.w3.org/International/O-URL-and-ident.html
Keywords: intl, relnote
We should probably pay attention to RFC 2368 (mailto URL scheme): http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2368.txt
Let me quote from this RFC a bit: "8-bit characters in mailto URLs are forbidden. MIME encoded words (as defined in [RFC2047]) are permitted in header values, but not for any part of a "body" hname."
If the data provided by the original bug reporter is correct, Mozilla does not use MIME in the "subject" part of the mailto URL that is sent to the mail program when the "File > Send Link" menu is used. If we are using the mailto protocol to populate the subject header with the title of the document, then we can at least indicate what the charset of the MIME-encoded string is. Assuming that this is the right way to go, we have one of two choices:
1. Use the encoding of the document in which the title resides as the MIME charset. In this Danish example, that would be ISO-8859-1. As long as the charset is indicated in the MIME'd URL, any mailer capable of interpreting MIME headers should be able to handle it.
2. Uniformly convert all non-ASCII titles into UTF-8 MIME-encoded mailto URLs.
We had this same discussion earlier in http://bugzilla.mozilla.org/show_bug.cgi?id=12851, where my opinion did not prevail. MIME-decode was checked in but not MIME-encode. You might want to review the rationale discussed there and see if it is still valid.
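To make the two choices above concrete, here is a hedged Python sketch of a mailto URL whose subject is an RFC 2047 encoded-word. The function name and the example link are hypothetical, and this is not how Mozilla builds the URL; it only shows that the result is pure ASCII and carries its charset along.

```python
from email.header import Header, decode_header, make_header
from urllib.parse import quote, unquote

def mailto_with_mime_subject(title: str, link: str, charset: str) -> str:
    """Hypothetical helper: build a mailto URL with a MIME-encoded subject."""
    # Encode the title as an RFC 2047 encoded-word in the chosen charset.
    encoded_word = Header(title, charset).encode()
    # Percent-escape both values so that the URL contains only ASCII.
    return ("mailto:?subject=" + quote(encoded_word, safe="")
            + "&body=" + quote(link, safe=""))

title = "Digital video på 65 gram"
url_latin1 = mailto_with_mime_subject(title, "http://example.com/", "iso-8859-1")  # choice 1
url_utf8 = mailto_with_mime_subject(title, "http://example.com/", "utf-8")         # choice 2

# What a MIME-aware receiver would do: unescape, then decode the encoded-word.
subject = unquote(url_latin1.split("subject=")[1].split("&")[0])
round_trip = str(make_header(decode_header(subject)))
print(url_latin1)
print(round_trip)
```

Whether a given external mail client actually decodes encoded-words found in a mailto URL is exactly the open question raised in this discussion; the sketch does not answer it.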
Thanks for the note on RFC 2368. Naoki had the following rationale for not supporting it.
> In fact, I am not sure MIME encode in mailto URL is practically useful.
> It is not supported by IE and 4.x. Using UTF-8 for URL is simpler than including
> MIME encoded words inside URL.
To support RFC 2368, we need to move some mailnews code out of mailnews into necko (netwerk/mime : see bug 162765 for a similar change). It's doable, but before actually doing it, we have to make sure that what you wrote below is true. Being able to decode RFC 2047-encoded words is one thing, and being able to decode RFC 2047-encoded words in URLs before putting them in the 'Subject' header is another.
> As long as the charset is indicated in the MIME'd
> URL, any mailer capable of interpreting MIME header should be
> able to handle it.
Protocol handlers can actually be used to invoke any native applications. These native applications all expect native character sets. I think it is a bit naive to believe that we should change every mail client versus fixing our application. However, I will point out that this doesn't work on IE either :)
> However, I will point out that this doesn't work on IE either
It works on my IE 6.0.2800.1106 with Outlook 2000 SP3 (9.0.0.6627) on Windows 2000 5.00.2195: The subject of the mail is right when I use File | Send | Link (or whatever the English translation is - my IE speaks Danish). But it doesn't use the mailto: protocol. It doesn't matter what I set the registry key HKEY_CLASSES_ROOT\mailto\shell\open\command\(default) to. IE must speak directly with Outlook in some way.
The concept of a 'native' charset is not so clear any more on modern Unicode-based OSes such as Windows 2k/XP, Mac OS X and BeOS. Either we have to leave this alone or we have to make Mozilla compliant with RFC 2368.
I was testing IE to Notes and it fails there using mailto.
If we do what's suggested in comment #9, characters that have never been a part of any legacy code page can't be used at all. With what we have now, UTF-8-aware mail clients can be configured to work for any characters. Obviously, we cannot fix others' bug (we can report it though), but we shouldn't _break_ (it's not a fix) ours to work around others' bug. Besides, as time goes on, IRI will be supported by more and more programs.
I also have to point out that any characters outside the repertoire of the current default system locale can't be used, either if we use so-called native charset. For instance, Chinese can't be used on Windows with the default locale set to French.
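The limitation described in the last two comments can be checked with a short Python sketch. The helper name is hypothetical, and cp1252 stands in for the default code page of a French or Danish Windows system.

```python
def representable(text: str, charset: str) -> bool:
    """Hypothetical helper: can every character of text be encoded in charset?"""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

# A Danish title fits the Western European code page...
print(representable("Digital video på 65 gram", "cp1252"))
# ...but a Chinese one does not, whereas UTF-8 handles both.
print(representable("中文", "cp1252"))
print(representable("中文", "utf-8"))
```

Any scheme that converts to the system code page would have to decide what to do in the second case, which is the objection raised above.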
beos native charset is utf8, for windows we can use utf16 apis. don't know about macos.
Well, I'm aware that BeOS uses UTF-8 throughout its APIs and file system. Mac OS X uses UTF-8 (in NFD) on its file system and in some of its APIs (especially POSIX-related ones). However, most of the Cocoa APIs are based on UTF-16 (afaik). So, depending on how you look at this, the native charset on Mac OS X can be either UTF-8 or UTF-16. I'd say it's UTF-8 (assuming what the file system uses is what's closest to the 'native charset'). However, that doesn't help if mail clients on Mac OS X don't understand IRI. For them, the 'native' charset could be the most widely used legacy character encoding for a given locale. More or less the same is true of Windows (2k/XP), except that NTFS uses UTF-16, so that our 'operational definition' (for the sake of discussion here [1]) on Win 2k/XP has to use either UTF-8 or what we get from GetACP() (or equivalent), which is I think what mkaply meant by 'native' charset (if I'm not mistaken). GetACP() returns cp1252, cp949, cp932, cp1251, etc. That doesn't work for the cases I mentioned earlier.
[1] because using UTF-16 for emails is out of the question.
what prevents us from passing an utf16 url to ShellExecuteW?
Still happens in Firefox 1.0.7 (Win32) & Outlook 2003.
*** Bug 314477 has been marked as a duplicate of this bug. ***
Still happens in Firefox 1.5 & Outlook 2003.
(In reply to comment #24)
> what prevents us from passing an utf16 url to ShellExecuteW?
That's probably what we have to do, arguing that RFC 2368 is not so relevant for 'inter-process' communication. One 'minor' problem is that it's not available on Win 9x/ME, but we can work around it.
Was there any progress on this bug?
Same as bug 412076. Bug fixed in Firefox 3.0.8. The site works fine on Vista.
Still happens in Firefox 3.0.8, Outlook 2003 SP3 and Windows 2000.
Still happens in Firefox 3.0.11, Outlook 2003 SP3 and Windows XP Professional SP3.
Assignee: file-handling → nobody
QA Contact: ian → file-handling
Not just one URL: every URL with non-ASCII characters generates a bad Subject line in the email. In French, all of éèçàîù (.../...) generate messages with characters like this:
Subject line: éèàç - Recherche Google
From page: http://www.google.com/search?q=%C3%A9%C3%A8%C3%A0%C3%A7&btnG=Rechercher&meta=&aq=f&oq=
Still present in Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
I just tried this latest example in Firefox 3.5.5 and it appears to be working now. Has this been fixed or is my mail client compensating in some way? I have outlook 2007.
Tried using FF 3.5.5 under Linux Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.5) Gecko/20091107 Firefox/3.5.5 (Swiftfox) (.NET CLR 3.5.30729) and Thunderbird version 2.0.0.23 (20090817) and yes it seems to work. That would mean the bug stands in the email client I guess.
This might have been fixed, but it might also be a Windows-specific bug; in fact that is quite likely if the problem is in converting UTF-8 as if it was the native character set. I'll check next week when I have access to a Windows system.
Assignee: nobody → smontagu
The bug still occurs on Windows on trunk with Outlook Express set as default email client, but not with Thunderbird. The current patch for bug 411511 doesn't fix it, but I may be able to tweak it so that it does (I'm still not sure of the exact code path in question -- the reference in comment 10 is out-of-date).
Depends on: 411511
Attached patch: Possible patch (obsolete)
This is only a partial solution, at least on my system (Windows XP Professional with Outlook 2003 (11.83183.8221) SP3 and Outlook Express 6.00.2900.5512). With both Outlook and Outlook Express it only works for Subjects that are expressible in the default Windows non-unicode character set, even though we call ShellExecuteW with UTF-16 arguments. The last 3 hunks of the patch are not strictly relevant, but fix a problem that I found in the WINCE code path while experimenting: { sinfo.lpFile = NS_ConvertUTF8toUTF16(urlSpec).get(); } doesn't work because the converted string goes out of scope before it's used. This may be the cause of bug 518164, but I have no way to verify that. Thoughts? Comments?
Attachment #418854 - Flags: superreview?(cbiesinger)
Attachment #418854 - Flags: review?(benjamin)
Blocks: 518164
Attachment #418854 - Flags: review?(benjamin) → review+
http://www.idg.se/2.1085/1.285190/ny-databas-raddningen-for-webben Same result with Outlook 2003 and it works with IE 7 so it really looks like this is the Firefox browser somehow.
Comment on attachment 418854 (Possible patch)
I don't think this is the right fix. There's no guarantee that after unescaping you'll get UTF-8...
(In reply to comment #40)
> (From update of attachment 418854)
> I don't think this is the right fix. There's no guarantee that after unescaping
> you'll get UTF-8...
No? Did I misunderstand comment 8?
That's true for the specific case of Send Link, but LoadURI has other callers. For example, if a web page has a mailto: URI, in Firefox that will go through this function. And in those cases we do have the web page's charset information, and of course it's also possible that the web page already has specified escaped characters. Maybe using GetSpec instead of GetAsciiSpec (and not unescaping stuff in addition to that) would actually be good enough, contrary to comment 12. It would probably fix this bug at least, though there's still cases it would get wrong.
(In reply to comment #42)
> Maybe using GetSpec instead of GetAsciiSpec (and not unescaping stuff in
> addition to that) would actually be good enough, contrary to comment 12. It
> would probably fix this bug at least, though there's still cases it would get
> wrong.
No, it turns out that it doesn't fix this bug.
Like the previous patch, this works well with Thunderbird, but only works with Outlook and Outlook Express if the page title is expressible in the native character set.
Attachment #418854 - Attachment is obsolete: true
Attachment #423795 - Flags: superreview?(cbiesinger)
Attachment #423795 - Flags: review?(benjamin)
Attachment #418854 - Flags: superreview?(cbiesinger)
Comment on attachment 423795 (More stable patch)
Is there a way to write an automated test for this (without registering a system MIME handler, which doesn't sound wise)? If not, litmus?
Attachment #423795 - Flags: review?(benjamin) → review+
Attachment #423795 - Flags: superreview?(cbiesinger) → superreview+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard