Bug 227268 - Subject line in File | Send Link... mail uses strange characters instead of non-ASCII characters (probably UTF-8)
Status: RESOLVED FIXED
Keywords: intl, relnote
Product: Core Graveyard
Classification: Graveyard
Component: File Handling
Version: Trunk
Platform: x86 Windows 2000
Importance: -- normal with 2 votes
Target Milestone: mozilla13
Assigned To: Simon Montagu :smontagu
Mentors:
http://www.pcworld.dk/default.asp?Mod...
Duplicates: 314477
Depends on: 411511
Blocks: 518164
Reported: 2003-12-02 04:06 PST by Jesper Hertel
Modified: 2016-06-22 12:16 PDT (History)
15 users
mozilla: blocking1.4.2-
mozilla: blocking1.7a-
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Possible patch (1.70 KB, patch)
2009-12-22 07:48 PST, Simon Montagu :smontagu
benjamin: review+
More stable patch (1.74 KB, patch)
2010-01-27 07:04 PST, Simon Montagu :smontagu
benjamin: review+
cbiesinger: superreview+

Description Jesper Hertel 2003-12-02 04:06:55 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007

When I use File | Send Link... on a page whose title contains non-ASCII
characters, e.g. the Danish characters æøå, the subject line in the resulting
e-mail in Outlook 2000 contains two strange characters in each place where one
of the Danish characters should have been.

When I use File | Send Link... on the page
http://www.pcworld.dk/default.asp?Mode=2&ArticleID=4762 , this mailto URL is fired: 
mailto:?body=http%3A//www.pcworld.dk/default.asp%3FMode%3D2%26ArticleID%3D4762&subject=PC%20World%20-%20Digital%20video%20p%C3%A5%2065%20gram

(I collected this mailto by modifying the registry key
reg:\HKEY_CLASSES_ROOT\mailto\shell\open\command\(Default) to point to a small
Python script that collected the arguments sent.)
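
For anyone who wants to repeat this capture, here is a minimal sketch of such a script. It is not the reporter's actual script: the function name and the log file location are hypothetical, and it assumes the registry command passes the URL as a command-line argument (for example via "%1").

```python
import sys
import tempfile
from pathlib import Path


def log_arguments(argv, log_path):
    """Append each received argument to log_path, one per line."""
    with open(log_path, "a", encoding="utf-8") as log:
        for arg in argv:
            log.write(arg + "\n")


if __name__ == "__main__":
    # Hypothetical log location in the temp directory.
    # sys.argv[0] is the script itself, so it is skipped.
    log_arguments(sys.argv[1:], Path(tempfile.gettempdir()) / "mailto_args.log")
```

The log file then shows the mailto URL exactly as the browser handed it to the registered handler.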

The title of the page is "PC World - Digital video på 65 gram", but the subject
in the resulting e-mail in Outlook 2000 is "PC World - Digital video på 65
gram", which the given mailto URL also reflects.

I have found out that the two characters "Ã¥" are exactly the UTF-8 encoding of
the character "å" (bytes C3 A5) when those bytes are read as code page 1252.
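
The observation can be checked with a short sketch. It takes the escaped fragment "p%C3%A5" from the mailto URL above and decodes the same bytes twice, once as UTF-8 and once as code page 1252; the variable names are only illustrative.

```python
from urllib.parse import unquote_to_bytes

# Percent-escaped fragment copied from the subject in the mailto URL.
escaped = "p%C3%A5"

raw = unquote_to_bytes(escaped)      # the raw bytes behind the escapes
as_utf8 = raw.decode("utf-8")        # reading the bytes as UTF-8
as_cp1252 = raw.decode("cp1252")     # reading the same bytes as code page 1252

print(as_utf8)
print(as_cp1252)
```

The first line printed is the intended text and the second is what a mail client sees if it assumes code page 1252.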

Maybe Mozilla should be converting to code page 1252 in the Windows case before
constructing the mailto url?

Reproducible: Always

Steps to Reproduce:
1. Go to the given URL http://www.pcworld.dk/default.asp?Mode=2&ArticleID=4762 .
2. Choose File | Send Link...
3. Look at the subject line in the resulting mail.

Actual Results:  
The subject is "PC World - Digital video på 65 gram".

Expected Results:  
The subject should have been "PC World - Digital video på 65 gram".

I use Windows 2000 SP4, Mozilla 1.5, Outlook 2000 SP-3.
Comment 1 Jesper Hertel 2003-12-02 04:17:02 PST
I must mention that I have patched my Mozilla 1.5 with (exactly) the patch
mentioned in Bug 217328 comment 14
(http://bugzilla.mozilla.org/show_bug.cgi?id=217328#c14). But this was a
mailto:body= issue and does not affect this problem. The current subject problem
has also been there all the time, including before I made the patch.

Otherwise I have changed nothing in my installation.
Comment 2 Hermann Schwab 2003-12-02 10:27:06 PST
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.6b) Gecko/20031129

I'm not using Outlook; Mozilla MailNews is my default mail client. Sending to an
account that I can only access via webmail, I got this in the header:
Subject: =?iso-8859-1?Q?Digital_video_p=E5_65_gram?=
and it was displayed as seen on the website.
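
As a side note, the header above is an RFC 2047 encoded word. A minimal sketch of how it decodes, using Python's standard email.header module (the variable names are illustrative):

```python
from email.header import decode_header

header_value = "=?iso-8859-1?Q?Digital_video_p=E5_65_gram?="

# decode_header returns (bytes, charset) pairs for encoded words.
raw, charset = decode_header(header_value)[0]
subject = raw.decode(charset)

print(subject)
```

In Q encoding the underscores stand for spaces and =E5 is the single ISO-8859-1 byte for "å".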

Can you retest with Mozilla 1.6b, when it comes?
Comment 3 Hermann Schwab 2003-12-02 12:18:59 PST
Comment #2 is invalid; I didn't test what the reporter was describing.

Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.6b) Gecko/20031202

I used File -> Send Link... to send the page using Mozilla Mail to my account.
Mozilla Mail opened and showed the subject and body as below, the same as what
I received after sending.

So this works internally, but I can't test whether it also works with an
external mail client like Outlook.
This should be tested by someone using Outlook or another mail client.

Sent/received:

Subject: PC World - Digital video på 65 gram

<http://www.pcworld.dk/default.asp?Mode=2&ArticleID=4762>


Component: XPApps, as in Bug 217328 ?
Comment 4 Neil Parks 2004-01-22 11:54:40 PST
Using either

Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7a) Gecko/20040121
Firebird/0.8.0+ (scragz)

or Mozilla 1.6 release, I get the following in QM when I try to "send link"
(Moz) or "send page" (FB):

http%3A%2F%2Fbugzilla.mozilla.org%2Fshow_bug.cgi%3Fid%3D227268&subject=Bug%20227268%20-%20Subject%20line%20in%20File%20%7C%20Send%20Link...%20mail%20uses%20strange%20characters%20instead%20of%20non-ASCII%20characters%20(probably%20UTF-8)

With older versions of Firebird the page title was also appended (after the word
"subject"), but at least the URL part didn't convert the colon and slashes to
the hex equivalents which make the link completely useless.  This is a major
regression.

Comment 5 Jesper Hertel 2004-01-23 01:38:00 PST
I just installed Mozilla 1.6 ("Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
rv:1.6) Gecko/20040113 MultiZilla/1.6.0.0c"), and the problem is still the same.
Comment 6 Mike Kaply [:mkaply] (Out June 27-July 5) 2004-02-12 13:29:54 PST
Not a blocker.

Copying someone who might have an idea.
Comment 7 Mike Kaply [:mkaply] (Out June 27-July 5) 2004-02-12 13:46:12 PST
This is actually much easier to see with Firefox since the mail doesn't get in
the way.

I set up my mailto for Lotus Notes and sure enough my subject is wrong.
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-02-12 13:52:04 PST
It's a URI.  It contains non-ascii chars.  There is no source document
information.  So it gets encoded as UTF-8 (which is the standard for non-ASCII
chars in URIs in any case); that's what GetAsciiSpec does on nsIURI objects.
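
To illustrate the encoding described here (this is a sketch of UTF-8 percent-escaping in general, not of GetAsciiSpec itself), escaping the page title from the report reproduces the subject seen in the captured mailto URL:

```python
from urllib.parse import quote

title = "PC World - Digital video p\u00e5 65 gram"  # \u00e5 is the Danish "å"

# quote() encodes non-ASCII characters as UTF-8 by default, then
# percent-escapes each byte; safe="" also escapes "/" for a header value.
escaped = quote(title, safe="")

print(escaped)
```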
Comment 9 Christian :Biesinger (don't email me, ping me on IRC) 2004-02-12 14:19:56 PST
<mkaply> biesi: The native charset would be the right call
<biesi> mkaply: yeah, it means unescaping the spec, converting to native
charset, escaping again (probably), and calling shellexecute on that, I suppose
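
A minimal sketch of the unescape, convert, re-escape sequence suggested above. The function name is hypothetical, and it is applied to a single header value rather than to the whole spec, since unescaping an entire URL would blur the difference between a literal "&" and an escaped "%26".

```python
from urllib.parse import quote, unquote


def reescape_for_charset(escaped_value, charset):
    """Unescape UTF-8 percent-escapes, then escape again in another charset."""
    text = unquote(escaped_value, encoding="utf-8", errors="strict")
    # errors="strict" raises UnicodeEncodeError for characters that the
    # target charset cannot represent.
    return quote(text, safe="", encoding=charset, errors="strict")


print(reescape_for_charset("Digital%20video%20p%C3%A5%2065%20gram", "cp1252"))
```

With cp1252 as the target, the two-byte escape %C3%A5 becomes the single-byte escape %E5.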
Comment 10 Christian :Biesinger (don't email me, ping me on IRC) 2004-02-12 14:21:11 PST
...in
http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/win/nsOSHelperAppService.cpp#258
(win/nsOSHelperAppService::LoadUrl)
Comment 11 Boris Zbarsky [:bz] (Out June 25-July 6) 2004-02-12 14:31:51 PST
Or you could GetSpec instead of GetAsciiSpec...
Comment 12 Christian :Biesinger (don't email me, ping me on IRC) 2004-02-12 15:09:07 PST
     * Some characters may be escaped.
     */
    attribute AUTF8String spec;

doesn't gain you anything.
Comment 13 Jungshik Shin 2004-02-12 17:33:46 PST
What mozilla currently does is 'correct'. External mail programs should be
'fixed' to understand IRI (Internationalized Resource Identifier : URIs in
UTF-8) [1].  That is, it's not Mozilla but MS OE, Lotus Notes, etc that should
be fixed. Of course, there's little, if any, we can do about them. How does
Thunderbird work when used as an external mail program for firefox? At least, we
have to get TB to do the 'right thing'. 

We might convert to the most widely used _legacy_ character encoding for a given
locale, but that doesn't seem to be the 'right' thing. What should we do when
there are characters not representable in the encoding? Using IRIs for those
cases _only_ seems to be worse than what we do now. What if the default
character encoding for outgoing emails in MS OE is different from the legacy
encoding of the default system locale? 

A work-around for MS OE users is to change the default encoding (for outgoing
emails) to UTF-8. Nowadays, except for __stupid__ web mail services (such as
hotmail and yahoo mail),  mail clients can handle messages in UTF-8
'transparently'. There should be a similar option in Lotus Notes. We can add
this to the release notes. 

[1] http://www.w3.org/International/O-URL-and-ident.html
Comment 14 Katsuhiko Momoi 2004-02-13 02:56:09 PST
We should probably pay attention to RFC 2368 (mailto URL scheme):

http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2368.txt

Let me quote from this RFC a bit:

"8-bit characters in mailto URLs are forbidden. MIME encoded 
words (as defined in [RFC2047]) are permitted in header values, 
but not for any part of a "body" hname."

If the data provided by the original bug reporter is correct,
Mozilla does not use MIME in the mailto "subject" URL scheme that
is sent to the mail program when this "File > Send Link"
menu is used. If we are using mailto protocol to populate
the subject header with the title of the document, then 
we can at least indicate what the charset of the MIME encoded
string is. 

Assuming that this is the right way to go, then we have one of
two choices:

1. Use the encoding of the document in which the title resides
and use that as the MIME charset. In this Danish example, that
would be ISO-8859-1. As long as the charset is indicated in the MIME'd
URL, any mailer capable of interpreting MIME headers should be
able to handle it.

2. We can uniformly change all non-ASCII titles into UTF-8 MIME
encoded mailto url. 
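
The two choices can be sketched as follows. This is only an illustration of the idea under discussion, not Mozilla code: the helper name is hypothetical, and Python's email.header module stands in for whatever MIME encoder would actually be used.

```python
from email.header import Header
from urllib.parse import quote


def mime_mailto_subject(title, charset):
    """Build a mailto URL whose subject is an RFC 2047 encoded word."""
    encoded_word = Header(title, charset).encode()
    # The encoded word is pure ASCII; percent-escape it for use in the URL.
    return "mailto:?subject=" + quote(encoded_word, safe="")


title = "PC World - Digital video p\u00e5 65 gram"

option_1 = mime_mailto_subject(title, "iso-8859-1")  # charset of the document
option_2 = mime_mailto_subject(title, "utf-8")       # uniform UTF-8

print(option_1)
print(option_2)
```

Either way the URL contains only ASCII characters, and the charset travels with the subject instead of being guessed by the mail client.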


We earlier had this same discussion in:

http://bugzilla.mozilla.org/show_bug.cgi?id=12851

where I did not get my opinion to prevail. MIME-decode was checked
in but not MIME-encode. You might want to review the rationale
discussed there and see if that is still valid. 
Comment 15 Jungshik Shin 2004-02-13 05:16:00 PST
Thanks for the note on RFC 2368.
Naoki had the following rationale for not supporting it.

> In fact, I am not sure MIME encode in mailto URL is practically useful.
> It is not supported by IE and 4.x. Using UTF-8 for URL is simpler than including
> MIME encoded words inside URL.
 To support RFC 2368, we need to move some  mailnews code out of mailnews into
necko (netwerk/mime : see bug 162765 for a similar change). It's doable, but
before actually doing it,  we have to make sure that what you wrote below is
true. Being able to decode RFC 2047-encoded words is one thing and being able to
decode RFC-2047 encoded words in URLs before putting them in 'Subject' header is
another.

> As long as the charset is indicated in the MIME'd
> URL, any mailer capable of interpreting MIME headers should be
> able to handle it.

 
Comment 16 Mike Kaply [:mkaply] (Out June 27-July 5) 2004-02-13 08:02:57 PST
Protocol handlers can actually be used to invoke any native applications.

These native applications all expect native character sets.

I think it is a bit naive to believe that we should change every mail client
versus fixing our application.

However, I will point out that this doesn't work on IE either :)
Comment 17 Jesper Hertel 2004-02-13 09:07:36 PST
> However, I will point out that this doesn't work on IE either

It works on my IE 6.0.2800.1106 with Outlook 2000 SP3 (9.0.0.6627) on Windows
2000 5.00.2195: The subject of the mail is right when I use File | Send | Link
(or whatever the English translation is - my IE speaks Danish). 

But -- it doesn't use the mailto: protocol. It doesn't matter what I set the
registry key HKEY_CLASSES_ROOT\mailto\shell\open\command\(default) to. IE must
speak directly with Outlook in some way. 
Comment 18 Jungshik Shin 2004-02-13 09:27:32 PST
The concept of a 'native' charset is not so clear any more on modern Unicode-based
OSes like Windows 2k/XP, Mac OS X, BeOS.  Either we have to leave this alone or
have to make Mozilla compliant with RFC 2368. 
Comment 19 Mike Kaply [:mkaply] (Out June 27-July 5) 2004-02-13 09:50:11 PST
I was testing IE to Notes and it fails there using mailto.
Comment 20 Jungshik Shin 2004-02-13 09:51:49 PST
If we do what's suggested in comment #9, characters that have never been a part
of any legacy code page can't be used at all. With what we have now, UTF-8-aware
mail clients can be configured to work for any characters. Obviously, we cannot
fix others' bug (we can report it though), but we shouldn't _break_ (it's not a
fix) ours to work around others' bug. Besides, as time goes on, IRI will be
supported by more and more programs. 
Comment 21 Jungshik Shin 2004-02-13 09:56:57 PST
I also have to point out that any characters outside the repertoire of the
current default system locale can't be used either if we use the so-called native
charset.  For instance, Chinese can't be used on Windows with the default locale
set to French. 
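
A small sketch of this limitation, using cp1252 (the ANSI code page of a French Windows installation) as the 'native' charset; the helper name is hypothetical.

```python
def representable(text, codepage):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(representable("p\u00e5", "cp1252"))         # Danish "på"
print(representable("\u4e2d\u6587", "cp1252"))    # Chinese "中文"
```

The Danish title fits in cp1252, while the Chinese one has no representation in it at all.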

Comment 22 Christian :Biesinger (don't email me, ping me on IRC) 2004-02-13 10:12:55 PST
beos native charset is utf8, for windows we can use utf16 apis. don't know about
macos.
Comment 23 Jungshik Shin 2004-02-13 16:52:28 PST
Well, I'm aware that BeOS uses UTF-8 throughout its APIs and file system. Mac OS
X uses UTF-8 (in NFD) on its file system and in some of its APIs (especially,
POSIX-related ones). However, most of Cocoa APIs are based on UTF-16 (afaik).
So, depending on how you look at this, the native charset on Mac OS X can be
either UTF-8 or UTF-16. I'd say it's UTF-8 (assuming what file system uses is
what's closest to the 'native charset'). However, that doesn't help if mail
clients on Mac OS X don't understand IRI. For them, the 'native' charset could
be the most widely used legacy character encoding for a given locale.

More or less the same is true of Windows (2k/XP) except that NTFS uses UTF-16 so
that our 'operational definition' (for the sake of discussion here [1]) on Win
2k/XP has to use either UTF-8 or what we get from GetACP() (or equivalent),
which is, I think, what mkaply meant by 'native' charset (if I'm not mistaken).
GetACP() returns cp1252, cp949, cp932, cp1251, etc. That doesn't work for the
cases I mentioned earlier. 



[1] because using UTF-16 for emails is out of question.   
Comment 24 Christian :Biesinger (don't email me, ping me on IRC) 2004-02-13 16:57:30 PST
what prevents us from passing an utf16 url to ShellExecuteW?
Comment 25 Sergey Svishchev 2005-11-22 02:55:39 PST
Still happens in Firefox 1.0.7 (Win32) & Outlook 2003.
Comment 26 Andrew Schultz 2005-11-22 22:52:46 PST
*** Bug 314477 has been marked as a duplicate of this bug. ***
Comment 27 Sergey Svishchev 2006-01-19 23:56:16 PST
Still happens in Firefox 1.5 & Outlook 2003.
Comment 28 Jungshik Shin 2006-01-20 00:37:21 PST
(In reply to comment #24)
> what prevents us from passing an utf16 url to ShellExecuteW?

That's probably what we have to do arguing that RFC 2368 is not so relevant for 'inter-process' communication. One 'minor' problem is that it's not available on Win 9x/ME, but we can work around it. 
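
A rough sketch of this approach, written in Python with ctypes rather than in Mozilla's C++. It assumes the escaped URL is UTF-8, the function names are hypothetical, and it has not been checked against any particular mail client. Note that unescaping a whole URL also turns escapes such as %26 into literal characters, so a real implementation would need more care.

```python
import ctypes
import sys
from urllib.parse import unquote


def unescaped_url(escaped_url):
    """Turn UTF-8 percent-escapes back into characters (Python str is Unicode)."""
    return unquote(escaped_url, encoding="utf-8", errors="strict")


def open_with_shell(url):
    """Hand a Unicode URL to the Windows shell through the wide-character API."""
    if sys.platform != "win32":
        raise OSError("ShellExecuteW is only available on Windows")
    SW_SHOWNORMAL = 1
    # ctypes passes Python str arguments to ShellExecuteW as UTF-16 strings.
    result = ctypes.windll.shell32.ShellExecuteW(
        None, "open", url, None, None, SW_SHOWNORMAL
    )
    return result > 32  # values above 32 indicate success


url = unescaped_url("mailto:?subject=Digital%20video%20p%C3%A5%2065%20gram")
print(url)
```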

Comment 29 Sergey Svishchev 2008-02-05 23:33:11 PST
Was there any progress on this bug?
Comment 30 polwnos 2009-03-30 11:08:12 PDT
Same as bug 412076. Bug fixed in Firefox 3.0.8. The site works fine on Vista.
Comment 31 Sergey Svishchev 2009-04-21 23:48:18 PDT
Still happens in Firefox 3.0.8, Outlook 2003 SP3 and Windows 2000.
Comment 32 Ken Lyon 2009-06-19 16:32:21 PDT
Still happens in Firefox 3.0.11, Outlook 2003 SP3 and Windows XP Professional SP3.
Comment 33 Manuel FLURY 2009-12-09 05:26:56 PST
Not just this one URL: every URL whose page title contains non-ASCII (non-7-bit) characters generates a bad Subject line in the email.

In French, all of éèçàîù (.../...) generate messages with characters like this:

Subject line : éèàç - Recherche Google
From page : http://www.google.com/search?q=%C3%A9%C3%A8%C3%A0%C3%A7&btnG=Rechercher&meta=&aq=f&oq=

Still present in Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Comment 34 Ken Lyon 2009-12-09 08:03:42 PST
I just tried this latest example in Firefox 3.5.5 and it appears to be working now. Has this been fixed or is my mail client compensating in some way? I have outlook 2007.
Comment 35 Manuel FLURY 2009-12-09 09:13:39 PST
Tried using FF 3.5.5 under Linux

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.5) Gecko/20091107 Firefox/3.5.5 (Swiftfox) (.NET CLR 3.5.30729)

and Thunderbird version 2.0.0.23 (20090817)

and yes it seems to work.

That would mean the bug lies in the email client, I guess.
Comment 36 Simon Montagu :smontagu 2009-12-09 09:45:14 PST
This might have been fixed, but it might also be a Windows-specific bug; in fact that is quite likely if the problem is in converting UTF-8 as if it was the native character set. I'll check next week when I have access to a Windows system.
Comment 37 Simon Montagu :smontagu 2009-12-16 04:43:26 PST
The bug still occurs on Windows on trunk with Outlook Express set as default email client, but not with Thunderbird. The current patch for bug 411511 doesn't fix it, but I may be able to tweak it so that it does (I'm still not sure of the exact code path in question -- the reference in comment 10 is out-of-date).
Comment 38 Simon Montagu :smontagu 2009-12-22 07:48:32 PST
Created attachment 418854 [details] [diff] [review]
Possible patch

This is only a partial solution, at least on my system (Windows XP Professional with Outlook 2003 (11.83183.8221) SP3 and Outlook Express 6.00.2900.5512).

With both Outlook and Outlook Express it only works for Subjects that are expressible in the default Windows non-unicode character set, even though we call ShellExecuteW with UTF-16 arguments. 

The last 3 hunks of the patch are not strictly relevant, but fix a problem that I found in the WINCE code path while experimenting: 
 { sinfo.lpFile =  NS_ConvertUTF8toUTF16(urlSpec).get(); }
doesn't work because the converted string goes out of scope before it's used. This may be the cause of bug 518164, but I have no way to verify that.

Thoughts? Comments?
Comment 39 epkrrgo 2010-01-15 08:41:13 PST
http://www.idg.se/2.1085/1.285190/ny-databas-raddningen-for-webben

Same result with Outlook 2003 and it works with IE 7 so it really looks like
this is the Firefox browser somehow.
Comment 40 Christian :Biesinger (don't email me, ping me on IRC) 2010-01-21 15:06:59 PST
Comment on attachment 418854 [details] [diff] [review]
Possible patch

I don't think this is the right fix. There's no guarantee that after unescaping you'll get UTF-8...
Comment 41 Simon Montagu :smontagu 2010-01-24 05:03:31 PST
(In reply to comment #40)
> (From update of attachment 418854 [details] [diff] [review])
> I don't think this is the right fix. There's no guarantee that after unescaping
> you'll get UTF-8...

No? Did I misunderstand comment 8?
Comment 42 Christian :Biesinger (don't email me, ping me on IRC) 2010-01-26 08:59:07 PST
That's true for the specific case of Send Link, but LoadURI has other callers. For example, if a web page has a mailto: URI, in Firefox that will go through this function. And in those case we do have the web page's charset information, and of course it's also possible that the web page already has specified escaped characters.
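
The concern can be illustrated with a sketch: escapes written by a web page in its own charset (for example %E5 for "å" in ISO-8859-1) do not decode as UTF-8. The helper name and the fallback policy of leaving the value escaped are assumptions for illustration only.

```python
from urllib.parse import unquote_to_bytes


def unescape_if_utf8(escaped_value):
    """Unescape only when the resulting bytes are valid UTF-8."""
    raw = unquote_to_bytes(escaped_value)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 (e.g. escapes in a legacy charset): leave it untouched.
        return escaped_value


print(unescape_if_utf8("p%C3%A5"))  # UTF-8 escapes, as produced by Send Link
print(unescape_if_utf8("p%E5"))     # ISO-8859-1 escape from a web page
```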

Maybe using GetSpec instead of GetAsciiSpec (and not unescaping stuff in addition to that) would actually be good enough, contrary to comment 12. It would probably fix this bug at least, though there's still cases it would get wrong.
Comment 43 Simon Montagu :smontagu 2010-01-27 07:01:51 PST
(In reply to comment #42)
> Maybe using GetSpec instead of GetAsciiSpec (and not unescaping stuff in
> addition to that) would actually be good enough, contrary to comment 12. It
> would probably fix this bug at least, though there's still cases it would get
> wrong.

No, it turns out that it doesn't fix this bug.
Comment 44 Simon Montagu :smontagu 2010-01-27 07:04:57 PST
Created attachment 423795 [details] [diff] [review]
More stable patch

Like the previous patch, this works well with Thunderbird, but only works with Outlook and Outlook Express if the page title is expressible in the native character set.
Comment 45 Benjamin Smedberg [:bsmedberg] 2010-01-28 07:53:27 PST
Comment on attachment 423795 [details] [diff] [review]
More stable patch

Is there a way to write an automated test for this (without registering a system MIME handler, which doesn't sound wise)? If not, litmus?
Comment 47 Phil Ringnalda (:philor) 2012-02-26 16:08:18 PST
https://hg.mozilla.org/mozilla-central/rev/68a94128a3b1
