<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Assignee

Comment 9

•

9 years ago

Hmm, console shows: NS_ERROR_ILLEGAL_INPUT: Component returned failure code: 0x8050000e (NS_ERROR_ILLEGAL_INPUT) [nsIExternalProtocolService.loadUrl] contentAreaClick.js:187 And that code is: function openLinkExternally(url) { let uri = url; if (!(uri instanceof Components.interfaces.nsIURI)) uri = Services.io.newURI(url); PlacesUtils.asyncHistory.updatePlaces({ uri: uri, visits: [{ visitDate: Date.now() * 1000, transitionType: Components.interfaces.nsINavHistoryService.TRANSITION_LINK }] }); Components.classes["@mozilla.org/uriloader/external-protocol-service;1"] .getService(Components.interfaces.nsIExternalProtocolService) .loadUrl(uri); } where line 187 is the last line with the loadUrl(uri). So it's a M-C service that failing there, but I didn't check which input it gets. Maybe Alta88 has got some spare cycles to investigate this further.

Flags: needinfo?(alta88)

alta88

Comment 10

•

9 years ago

it's almost certainly due to what was done in some places in feeds when opening a web page link - the url (string) must be converted to utf8. there may be a more 'modern' way to do this rather than nsIScriptableUnicodeConverter, which may even be on a hit list.. let converter = Cc["@mozilla.org/intl/scriptableunicodeconverter"] .createInstance(Ci.nsIScriptableUnicodeConverter); converter.charset = "UTF-8"; url = converter.ConvertToUnicode(url);

Flags: needinfo?(alta88)

Assignee

Comment 11

•

9 years ago

Care to help us out with a patch? That would be much appreciated. AFAIK, conversion to UTF-8 can be done like this: https://dxr.mozilla.org/comm-central/search?q=TextEncoder(%22UTF-8%22)&redirect=false

Flags: needinfo?(alta88)

WADA:World Anti-bad-Duping Agency

Assignee

Comment 12

•

9 years ago

OK, comment #0 says to use http://www.nix.xx/de/shop/kunde.php?name=M%FCller which I believe is invalid. Since this is an ASCII string, converting this to UTF-8 does nothing. I even tried. http://stackoverflow.com/questions/29988255/browser-support-for-utf8-encoded-characters-in-urls suggests to try: http://example.com/lörickè or http://example.com/l%c3%b6rick%c3%a8/ I made myself a message with both and both links work just fine in TB 54. Note that here the encoding is UTF-8, so ö is %c3%b6. All the examples from this bug and bug 1100017 http://www.nix.xx/de/shop/kunde.php?name=M%FCller http://www.cx5-forum.de/f14/zubeh%F6rsammlung-f%FCr-cx-5-a-1245-new-post.html http://www.jyllands-posten.dk/protected/premium/indland/ECE7200862/Kommunerne-om-b%F8rnemillioner-R%E6kker-kun-til-015-p%E6dagog/ are not UTF-8 encoded, but windows-1252/ANSI/ISO-8859-1 encoded. But hey, Wada already said that in comment #1 ! Wada also said in comment #3, that if you want Müller, you need to use http://www.googl.com?name=M%C3%BCller Just hover you mouse over the links and you can see that FF can handle the UTF-8 ones, but not the ANSI ones. I don't see a bug here I can fix, I'd mark this invalid. Maybe it worked differently in FF 12.

alta88

Comment 13

•

9 years ago

in Bug 1018589, the issue was that the header (and entire message) was stored as a utf8 string and had to be converted to unicode for the domain (because IDN is now on) part, in order to construct an accepted uri for loadUrl(). if the url isn't encoded correctly per the message charset, i'm not sure what you can do to 'craft' it.

Flags: needinfo?(alta88)

Comment 14

•

9 years ago

https://en.wikipedia.org/wiki/Uniform_Resource_Identifier https://en.wikipedia.org/wiki/Internationalized_domain_name IDN is applicable to domain name which corresponds to "host" part in URI. URI = scheme:[//[user:password@]host[:port]][/]path[?query][#fragment] Why IDN can be applied to path part, query part, fragment part of next URI in HTML mail? http://www.nix.xx/de/shop/kunde.php?name=M%FCller Following is apparently wrong action. 1. Unescape %FC of Link in HTML mail and convert it to binary of 0xFC. 2. Try to escape the binary of 0xFC as UTF-8. HTTP server can freely interpret and use %FC part in URI. Who is reponsible to correctly escape a non-ascii character in URI according to server's request == Who created URI string of link in HTML "path part, query part, fragment part of next URI in HTML mail" should be passed as-is to browser, after appropriate IDN processing on www.nix.xx part if required. <a href="http://www.nix.xx/de/shop/kunde.php?name=M%FCller">Link Text</a> Please note that "character in URI string of link in HTML" won't be(shouldn't be) affected by character set conversion because "URI of link in HTML" is 7bit-ascii string except IDN part. Following are different issues. (a) Linkify of URI like string in text/plain mail by Thunderbird, with IDN support. Because many HTTP servers already interpret escaped non-ascii as UTF-8, and many browsers already assumes that servers interpret escaped non-ascii as UTF-8, non-ascii character in URI like string is better escaped after UTF-8 conversion. (b) Generate IDN compatible URI for RSS. (c) HTML link creation support in HTML mode composer of Thunderbird with IDN support.

Valentin Gosu [:valentin] {{ IETF until 25 July - PTO until 3 August }}

Assignee

Comment 15

•

9 years ago

I'd really like some clear answers here, so let's ask someone who knows. Valentin, are these URLs valid or not: http://www.nix.xx/de/shop/kunde.php?name=M%FCller http://www.cx5-forum.de/f14/zubeh%F6rsammlung-f%FCr-cx-5-a-1245-new-post.html http://www.jyllands-posten.dk/protected/premium/indland/ECE7200862/Kommunerne-om-b%F8rnemillioner-R%E6kker-kun-til-015-p%E6dagog/ Note that the special characters are encoded with one byte, like %FC, which is some windows-1252 encoding, and not UTF-8, which would be %C3%BC. I think they are all invalid, at least FF doesn't recognise then when it does recognise UTF-8 encoding like: http://example.com/l%c3%b6rick%c3%a8/ or http://www.example.com?name=M%C3%BCller And a second question: Is it possible that FF 12 recognised them and FF from version 13 doesn't? I'd really like to close this bug, one way or another.

Flags: needinfo?(valentin.gosu)

msd+bugzilla

Comment 16

•

9 years ago

Hi, 1. My point of view ------------------- In my opinion, those URL are valid. It is the responsibility of the application to know the encoding of the parameters. The application can expect UTF-8 *or* ISO-8859-1 *or* another encoding. The browser just has to call the URL. 2. Firefox behavior ------------------- Firefox loads the URL, but because UTF-8 is the better and recommended encoding today, it *visually* turns the URL http://example.com/?p=%C3%A9 into http://example.com/?p=é. And, if you write the URL http://example.com/?p=é in the URL bar, Firefox *assumes* that it's encoded in UTF-8 and sends an HTTP request with 'C3%A9'. And Firefox doesn't try to translate this URL (because it's obviously not UTF-8 encoded), but it loads it : http://example.com/?p=%E9. 3. RFC 3986 : Uniform Resource Identifier (URI): Generic Syntax --------------------------------------------------------------- 2. Characters The URI syntax provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. ***When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.*** 1.2.1. Transcription In local or regional contexts and with improving technology, users might benefit from being able to use a wider range of characters; such use is not defined by this specification. Percent-encoded octets (Section 2.1) may be used within a URI to represent characters outside the range of the US-ASCII coded character set if this representation is allowed by the scheme or by the protocol element in which the URI is referenced. Such a definition should specify the character encoding used to map those characters to octets prior to being percent-encoded for the URI. 2.1. Percent-Encoding A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value. For example, "%20" is the percent-encoding for the binary octet "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space character (SP). Section 2.4 describes when percent-encoding and decoding is applied. https://tools.ietf.org/html/rfc3986

alta88

Comment 17

•

9 years ago

(In reply to msd+bugzilla from comment #16) > > In my opinion, those URL are valid. > There's no possible way to make that determination, given that no one has attached an actual email containing such a link. So the email's charset and the html document's charset (if it's even html) are unknown. Further, the failure is in loadUrl(), which was passed an nsIURI with a value it rejects, with a pretty clear NS_ERROR_ILLEGAL_INPUT. So commentary regarding anything that Fx may do in the url bar or anything with what servers accept is irrelevant.

Comment 18

•

9 years ago

Indeed, the URLs, even though they are valid by themselves, there is no way for the browser to know that %FC is actually ü, without the charset being specified in some way. Converting the URL to a UTF8 encoding _may_ be possible, but I'm not sure there is a one-size-fits-all solution here.

Flags: needinfo?(valentin.gosu)

msd+bugzilla

Comment 19

•

9 years ago

But why, in Thunderbird, do you need to know the charset of the URL's parameters ? Except for a better *user* display, I don't understand why. If you want to display to the user an 'é' intead of 'C3%A9', then you should do it only when you can presume it's UTF-8. If you can't because it is '%E9', then just display '%E9'. No ? But when the user clicks on the URL, please, throw the URL to the browser with percent-encoding values ! Currently Thunderbird does nothing !

Assignee

Comment 20

•

9 years ago

(In reply to msd+bugzilla from comment #19) > But when the user clicks on the URL, please, throw the URL to the browser > with percent-encoding values ! Currently Thunderbird does nothing ! The URL is passed to the Mozilla Core function loadUrl() and that function returns an error, see comment #9. Most likely it expects UTF-8 encoded URLs, as was previously stated. In a windows1252-encoded webpage, http://example.com/?p=%E9 does in fact translate to http://example.com/?p=é, however, when clicking the link, http://example.com/?p=%E9 appears in the URL bar. Try: <html><head><meta http-equiv="content-type" content="text/html; charset=windows-1252"></head> <body><a href="http://example.com/?p=%E9">Test</a></body></html> So we could drill open the box and transcode http://example.com/?p=%E9 to http://example.com/?p=%C3%A9 before passing it to loadUrl() if that link was found in a windows1252-encoded e-mail body. Alta88 already suggested that in comment #10. My apologies for dragging out the discussion.

msd+bugzilla

Comment 21

•

9 years ago

1. URL are in US-ASCII ---------------------- > The URL is passed to the Mozilla Core function loadUrl() and that function returns an error, see comment #9. Most likely it expects UTF-8 encoded URLs, as was previously stated. The URL http://example.com/?p=%E9 and http://example.com/?p=%C3%A9 are both encoded in _US-ASCII_. So I don't understand why the Mozilla Core function loadUrl() tries to percent-decode the URL. *For me, the problem is may be here.* 2. The sample webpage --------------------- > In a windows1252-encoded webpage, http://example.com/?p=%E9 does in fact translate to http://example.com/?p=é, however, when clicking the link, http://example.com/?p=%E9 appears in the URL bar. It's interesting to see that Firefox behavior differs from Chrome and Internet Explorer behavior with your sample. Firefox shows 'http://example.com/?p=é'. Chrome and IE show 'http://example.com/?p=%E9' With this page (be careful to encode it in Windows-1252), it's even better to understand that be displaying behavior differs : <html><head><meta http-equiv="content-type" content="text/html; charset=windows-1252"></head><body><a href="http://example.com/?p=%E9">link1 - %E9</a> - <a href="http://example.com/?p=%C3%A9">link2 - %CE%A9</a> - <a href="http://example.com/?p=é">link3 - é</a></body></html> - Firefox : - link1 : show 'é' - link2 : show 'é' - link3 : show 'é' - Chrome : - link1 : show '%E9' - link2 : show 'é' (because it *presume* it's UTF-8) - link3 : show '%E9' (because of the _encoding of the page_) - Internet Explorer 11 : - link1 : show '%E9' - link2 : show '%CE%A9' - link3 : show 'é' But it's just information for the users. When you click on the link1, on each browser, you reach the good webpage : http://example.com/?p=%E9 When you click on the link2, on each browser, you reach the good webpage : http://example.com/?p=%C3%A9 When you click on the link3 : - Firefox : http://example.com/?p=%E9 - Chrome : http://example.com/?p=%E9 - IE11 : http://example.com/?p=é (and I don't know what is technically sent, surely '%E9') 3. Not a good solution to use the encoding of the email body ------------------------------------------------------------ For me, the encoding of the e-mail body is not the good encoding to decode the percent-encoded URL. I explain why : I want to send you a link to a blog http://blog.example.com/?p=%E9. This blog expect the param p to be encoded in ISO-8859-1. So I write you an e-mail, and I share you this link : http://blog.example.com/?p=%E9. The body of my email can be encoded in UTF-8, in ISO-8859-1 or anything else, but it's not the good encoding to match the encoding of %E9. In fact, in Thunderbird, _you can't know the expected encoding_ : only the programmer of blog.example.com knows. And it's not a problem because Thunderbird doesn't have to percent-decode this URL it is not its responsibility. The URL http://blog.example.com/?p=%E9 is in _US-ASCII_. The param 'p' of the link http://example.com/?p=%41A is 'AA' with : - UTF-8 - windows-1252 - iso-8859-1 - iso-8859-15 The param 'p' of the link http://example.com/?p=%BE is : - '¾' with iso-8859-1 - 'Ÿ' with iso-8859-15 The param 'p' of the link http://example.com/?p=%A4 is : - '¤' with iso-8859-1 - '€' with iso-8859-15 And Thunderbird can not know what is the expected encoding. Only example.com knows. So it's impossible to convert http://example.com/?p=%BE in UTF-8 because Thunderbird don't know what is the good encoding. 4. To resume my though ---------------------- - Thunderbird doesn't have to percent-decode the URL (except may be for a display comfort for the user like Firefox and Chrome do) - Firefox doesn't have to percent-decode the URL (except may be for a display comfort) - The final website have to percent-decode the URL, and, perfect, this website knows what is the good encoding. So, if I well understand, the problem is that the Mozilla Core function loadUrl() tries to percent-decode the URL, and should not have to do this. But why ?

Assignee

Comment 22

•

9 years ago

Thanks for you comment #21. Interesting reading. And thanks for doing a browser comparison. You are quite right, all the mentioned URLs are US-ASCII and it's surprising that loadUrl() rejects those where the percent encoding doesn't represent UTF-8. A URL consists of various parts: https://dxr.mozilla.org/mozilla-central/rev/af8a2573d0f1e9cc6f2ba0ab67d7a702a197f177/netwerk/base/nsIURI.idl#16 and in the examples in comment #15, only pathname and query contain percent encodings, not the hostname. So your suggestion is to accept anything in the pathname and query and just open a that URL and let the server sort out which page needs to be served. I looked up loadUrl() and it's deprecated: https://dxr.mozilla.org/mozilla-central/rev/af8a2573d0f1e9cc6f2ba0ab67d7a702a197f177/uriloader/exthandler/nsIExternalProtocolService.idl#96 Another function loadURI() should be used instead. Let me try that and report back.

Assignee

Comment 23

•

9 years ago

I changed the code to use loadURI() and the result is just the same: NS_ERROR_ILLEGAL_INPUT Mozilla Core code is long and twisted, but I found the function here: http://searchfox.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#1019 I tried a search for NS_ERROR_ILLEGAL_INPUT: http://searchfox.org/mozilla-central/search?q=NS_ERROR_ILLEGAL_INPUT&case=false&regexp=false&path= It seems to be mostly used in the "intl" module. In the end I traced through it and I got NS_ERROR_UDEC_ILLEGALINPUT returned from rv = unicodeDecoder->Convert(aURI.get(), &srcLen, ustr, &dstLen); where the input was surprisingly mData = 0x000000000033bec0 "http://example.com/?name=Müller/". Stack is this: xul.dll!nsTextToSubURI::convertURItoUnicode(const nsCString & aCharset, const nsCString & aURI, nsAString_internal & _retval) Line 210 C++ xul.dll!nsTextToSubURI::UnEscapeNonAsciiURI(const nsACString_internal & aCharset, const nsACString_internal & aURIFragment, nsAString_internal & _retval) Line 285 C++ xul.dll!nsMIMEInfoWin::LoadUriInternal(nsIURI * aURL) Line 238 C++ xul.dll!nsMIMEInfoBase::LaunchWithURI(nsIURI * aURI, nsIInterfaceRequestor * aWindowContext) Line 326 C++ xul.dll!nsExternalHelperAppService::LoadURI(nsIURI * aURI, nsIInterfaceRequestor * aWindowContext) Line 1082 C++ The key to understanding the problem is this function: NS_IMETHODIMP nsTextToSubURI::UnEscapeNonAsciiURI(const nsACString & aCharset, const nsACString & aURIFragment, nsAString &_retval) { nsAutoCString unescapedSpec; NS_UnescapeURL(PromiseFlatCString(aURIFragment), esc_AlwaysCopy | esc_OnlyNonASCII, unescapedSpec); // leave the URI as it is if it's not UTF-8 and aCharset is not a ASCII // superset since converting "http:" with such an encoding is always a bad // idea. if (!IsUTF8(unescapedSpec) && (aCharset.LowerCaseEqualsLiteral("utf-16") || aCharset.LowerCaseEqualsLiteral("utf-16be") || aCharset.LowerCaseEqualsLiteral("utf-16le") || aCharset.LowerCaseEqualsLiteral("utf-7") || aCharset.LowerCaseEqualsLiteral("x-imap4-modified-utf7"))){ CopyASCIItoUTF16(aURIFragment, _retval); return NS_OK; } return convertURItoUnicode(PromiseFlatCString(aCharset), unescapedSpec, _retval); } In my debugging I get: aURIFragment = {mData=0x0000000008830e48 "http://example.com/?name=M%FCller/" mLength=0x00000022 mFlags=0x00010005 } unescapedSpec = {mStorage=0x000000000033bec0 "http://example.com/?name=Müller/" } but then, as mentioned, rv = unicodeDecoder->Convert(aURI.get(), &srcLen, ustr, &dstLen); fails. I think it's time to ask someone who knows. Masatoshi-san, can you help us here. The problem is this: In an e-mail we have this link: http://example.com/?name=M%FCller/ and when we click on it, it goes through loadUrl() or loadURI() and the stack shown above. In nsTextToSubURI::UnEscapeNonAsciiURI() the text is un-escaped to http://example.com/?name=Müller/ which is passed to convertURItoUnicode() and that fails. Why? There are of course other problems here which we should ask other people, like for example: Why unescape the entire URI when the pathname and query should just be left alone.

Flags: needinfo?(VYV03354)

Comment 24

•

9 years ago

Information is insifficient to answer the question. > unescapedSpec = {mStorage=0x000000000033bec0 "http://example.com/?name=Müller/" } What is the binary byte value of "ü" in this string? ("\xFC" or "\xC3\xBC"?) What is aCharset value?

Flags: needinfo?(VYV03354) → needinfo?(jorgk)

Comment 25

•

9 years ago

FYI, bug 227268 added the UnEscapeNonAsciiURI() call.

Assignee

Comment 26

•

9 years ago

I added this in nsTextToSubURI::convertURItoUnicode() since I don't know how to convince the MS VC++ debugger to show me the bytes: printf("=== charset %s\n", aCharset.get()); int32_t i; for (i=0; i< srcLen; i++) printf("=== %x\n", aURI.get()[i]); rv = unicodeDecoder->Convert(aURI.get(), &srcLen, ustr, &dstLen); // rv = NS_OK; and that gives (I added the stuff in parenthesis): === charset UTF-8 === 68 (h) === 74 (t) === 74 (t) === 70 (p) [stuff deleted] === 3d (=) === 4d (M) === fffffffc === 6c (l) === 6c (l) === 65 === 72 === 2f Sorry about the clumsy output, looks like the leading bit is carried into the whole int. But as you can see, the answer is "\xFC". So that's ANSI and most likely a result of the unescaping of %FC. I can see two problems here: 1) LoadURI() don't take a charset argument (unless that can be extracted from 'aWindowContext'). 2) If UnEscapeNonAsciiURI() produced something that isn't UTF-8, it should most likely use the escaped value. If I just hover with the mouse over the HTML, I can see the same debug output. So the same function is called also without a click. Since there is an encoding error, the display just comes out as: http://example.com/?name=M%FCller/ We should do the same in the click processing. So if there's an encoding error, revert back to the original escaped string.

Flags: needinfo?(jorgk)

Comment 27

•

9 years ago

(In reply to Jorg K (GMT+1) from comment #26) > 2) If UnEscapeNonAsciiURI() produced something that isn't UTF-8, it should > most likely use the escaped value. > If I just hover with the mouse over the HTML, I can see the same debug > output. > So the same function is called also without a click. Since there is an > encoding error, the > display just comes out as: http://example.com/?name=M%FCller/ > We should do the same in the click processing. So if there's an encoding > error, revert back to the > original escaped string. It is caller's responsibility to handle unEscapeNonAsciiURI() failures because the nsITextToSubURI contract says unEscapeNonAsciiURI() can fail: https://dxr.mozilla.org/mozilla-central/source/intl/uconv/nsITextToSubURI.idl#48-49 > * @throws NS_ERROR_UCONV_NOCONV when there is no decoder for aCharset > * or error code of nsIUnicodeDecoder in case of conversion failure Blame nsMIMEInfoWin in this case.

Assignee

Comment 28

•

9 years ago

Thanks. https://dxr.mozilla.org/mozilla-central/rev/f4f374622111022d41dd8d5eb9220624135c534a/uriloader/exthandler/win/nsMIMEInfoWin.cpp#238 rv = textToSubURI->UnEscapeNonAsciiURI(urlCharset, urlSpec, utf16Spec); NS_ENSURE_SUCCESS(rv, rv); Who would be in charge of that?

Comment 29

•

9 years ago

Bz according to <https://wiki.mozilla.org/Modules/Core>? Feel free to redirect if you know a more appropriate person.

Flags: needinfo?(bzbarsky)

Assignee

Comment 30

•

9 years ago

OK, Boris, here's the problem. If you click this link http://example.com/?name=M%FCller/ in a mail display, you trigger an error in nsTextToSubURI::convertURItoUnicode() since one level up in nsTextToSubURI::UnEscapeNonAsciiURI() the %FC was converted to an ANSI character "\xFC" by calling NS_UnescapeURL(). Relevant comments: Comment #23 with call stack and comment #26 with more debug. The caller of UnEscapeNonAsciiURI() is here: https://dxr.mozilla.org/mozilla-central/rev/f4f374622111022d41dd8d5eb9220624135c534a/uriloader/exthandler/win/nsMIMEInfoWin.cpp#238 The interesting thing is that when you hover the link, it displays as http://example.com/?name=M%FCller/ in the status bar. In my debug I can see that it also goes through nsTextToSubURI::convertURItoUnicode() but that is on another code path which handles the error better: xul.dll!nsTextToSubURI::convertURItoUnicode(const nsCString & aCharset, const nsCString & aURI, nsAString_internal & _retval) Line 215 C++ xul.dll!nsTextToSubURI::UnEscapeURIForUI(const nsACString_internal & aCharset, const nsACString_internal & aURIFragment, nsAString_internal & _retval) Line 237 C++ xul.dll!nsDocShell::OnOverLink(nsIContent * aContent, nsIURI * aURI, const char16_t * aTargetSpec) Line 14208 C++ You have to look twice: The click calls UnEscapeNonAsciiURI() and the hover calls UnEscapeURIForUI() and the latter has: if (convertURItoUnicode( PromiseFlatCString(aCharset), unescapedSpec, _retval) != NS_OK) { // assume UTF-8 instead of ASCII because hostname (IDN) may be in UTF-8 CopyUTF8toUTF16(aURIFragment, _retval); } Who would have some spare cycles for this? This bug has been on the books for five years. Surely I can send a small patch to rectify this, by the pain and time-consuming part is always to come up with some test.

Comment 31

•

9 years ago

UnEscapeURIForUI() is for UI (as the name implies) and the conversion is lossy. Don't mimic the behavior here.

Comment 32

•

9 years ago

OK, so the nsMIMEInfo* impls are owned by per-platform experts in theory, at least in parts. In this case, probably Jim Mathies [:jimm] for the interaction with ShellExecute. Anyway, if we do need to unescape before calling ShellExecute, and assuming we do in fact need to pass punycoded hostnames (so have to GetAsciiSpec on the URI) there are several problems here: 1) We do in fact need a fallback if UnEscapeNonAsciiURI fails. Just doing CopyASCIItoUTF16(urlSpec, utf16Spec) is probably the best we can do under the circumstances. It will mean you kinda get garbage in some cases (e.g. those of bug 227268 when we end up taking the fallback path), but slightly-garbage is probably better than nothing at all. 2) The fact that we have a URI claiming a charset of UTF-8 that is clearly not escaped-UTF8 is somewhat wrong. OK, but where does that URI come from? Most immediately, nsExternalHelperAppService::LoadURI does this: nsAutoCString spec; aURI->GetSpec(spec); // Do some fixups on "spec" nsCOMPtr<nsIURI> uri; nsresult rv = ios->NewURI(spec, nullptr, nullptr, getter_AddRefs(uri)); It sure would be a good idea to pass the charset of aURI in as the second argument there. Whether that would help on its own or not depends on where aURI is coming from. The caller is openLinkExternally in contentAreaClick.js which has: if (!(uri instanceof Components.interfaces.nsIURI)) uri = Services.io.newURI(url); Note lack of charset arg. So if a string is passed in, you will get a URI that thinks it's UTF-8-encoded. If an actual URI is passed in we could maybe do better. It looks like the caller in contentAreaClick in fact passes a string. Passing along a charset, and specifically the charset of the target's owner document when the target is a node, might help somewhat in some cases. You should probably get two separate bugs going here; one on issue #1 (which is a core issue) and one on #2, which is a Thunderbird UI issue. As for testing... automated testing is hard, obviously, because the whole point of this codepath is to hand a load over to a different application. Manual testing should probably be done; the obvious cases are http:// links being handed from Thunderbird to Firefox and other browsers and mailto: links handed from Firefox to Thunderbird and other mail clients (e.g. bug 227268 had some proposed fixes that worked with Thunderbird but not Outlook). And obviously whatever the steps are in bug 227268. But the upshot of at least the core change is that things will be no worse than before bug 227268 in the failing case (and better in the success case). I wouldn't necessarily block landing on the manual testing here.

Flags: needinfo?(bzbarsky)

Comment 33

•

9 years ago

I guess actually we need three bugs: 1) Add the fallback in mimeinfowin. This is pure win, afaict, and can just land. 2) Pass aURI's charset in nsExternalHelperAppService::LoadURI when calling NS_NewURI. This part would probably need some careful testing before we land it. 3) Pass a useful charset in to all this stuff in the Thunderbird UI. Again, needs testing but no opinion whether that should block it being checked in.

Assignee

Comment 34

•

9 years ago

Boris, thanks for the detailed comment. You missed one question: ;-) Who would have some spare cycles for this? I mean the M-C part. We're happy to do the TB parts of course.

Comment 35

•

9 years ago

Item #1 of the m-c part should be pretty simple. I'll review if you write the patch. ;) That will at least get you to the point where the URI loads, whether it's "right" or not. Item #2... I don't know. You could ask Jim and see whether he has time or recommendations for someone who has time.

Assignee

Comment 36

•

9 years ago

Attached patch 770022-unescape-fallback.patch (v1). — Details — Splinter Review

That's what you asked for. Works fine, BTW. In case you approve, kindly add "checkin-needed". You know it's late here.

Assignee: nobody → jorgk

Status: NEW → ASSIGNED

Attachment #8835148 - Flags: review?(bzbarsky)

Assignee

Updated

•

9 years ago

Product: Thunderbird → Core

Comment 37

•

9 years ago

Comment on attachment 8835148 [details] [diff] [review] 770022-unescape-fallback.patch (v1). r=me

Attachment #8835148 - Flags: review?(bzbarsky) → review+

https://hg.mozilla.org/mozilla-central/rev/a765f1fd046e

Updated

•

9 years ago

Keywords: checkin-needed

Pulsebot

Comment 38

•

9 years ago

Pushed by ihsiao@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a765f1fd046e Provide fallback when UnEscapeNonAsciiURI() fails in nsMIMEInfoWin::LoadUriInternal(). r=bz

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 39

•

9 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox54: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Comment 40

•

9 years ago

I'm assuming you want this for TB52, so go ahead and request Aurora/Beta approval when ready :)

status-firefox52: --- → affected

status-firefox53: --- → affected

Flags: needinfo?(jorgk)

Assignee

Comment 41

•

9 years ago

Comment on attachment 8835148 [details] [diff] [review] 770022-unescape-fallback.patch (v1). Approval Request Comment [Feature/Bug causing the regression]: I don't know. We never looked for the regression, apparently this worked in TB/FF 12 but not 13. [User impact if declined]: Can't click on some URLs in an e-mail display. [Is this code covered by automated tests?]: No. [Has the fix been verified in Nightly?]: Yes, local TB build. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No, just a small fix to provide some fallback behaviour when some URL with encoded international characters can't be decoded. [Why is the change risky/not risky?]: See above. [String changes made/needed]: None.

Flags: needinfo?(jorgk)

Attachment #8835148 - Flags: approval-mozilla-beta?

Attachment #8835148 - Flags: approval-mozilla-aurora?

Liz Henry (:lizzard)

Comment 42

•

9 years ago

Comment on attachment 8835148 [details] [diff] [review] 770022-unescape-fallback.patch (v1). Minor fix/fallback for escaped characters, looks pretty safe.

Attachment #8835148 - Flags: approval-mozilla-beta?

Attachment #8835148 - Flags: approval-mozilla-beta+

Attachment #8835148 - Flags: approval-mozilla-aurora?

Attachment #8835148 - Flags: approval-mozilla-aurora+

Liz Henry (:lizzard)

Comment 43

•

9 years ago

Jorg, does this still need followup as bz mentions in comment 33? Or is what you have enough for now?

Flags: needinfo?(jorgk)

Assignee

Comment 44

•

9 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #42) > Minor fix/fallback for escaped characters, looks pretty safe. Thanks for the approval, yes, it looks pretty safe ;-) (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #43) > Jorg, does this still need followup as bz mentions in comment 33? Or is what > you have enough for now? Thanks for the question. I was going to ask Boris if he would be very upset if we only went with the first part of the solution he envisaged since that's all we need at the moment. I've just tested it again in TB: Clicking on http://example.com/?name=M%FCller/ shows exactly that in the status bar and opens exactly that URL. Boris envisaged a solution which would take the charset into account and maybe show http://example.com/?name=Müller/ when hovering the link and opening the that link, too. I'm not convinced that this is required, at least not according to comment #21 (last paragraph). So quick answer: What we have is enough for now, and perhaps even for the foreseeable future. Thanks again!

Flags: needinfo?(jorgk)

https://hg.mozilla.org/releases/mozilla-aurora/rev/afe4b9f1e68f

Comment 45

•

9 years ago

bugherder uplift

status-firefox53: affected → fixed

https://hg.mozilla.org/releases/mozilla-beta/rev/b9bc6b310328

Comment 46

•

9 years ago

bugherder uplift

status-firefox52: affected → fixed

Comment 47

•

9 years ago

If hovering the link doesn't show the "right" thing, then I don't think you'd get it even if you passed through the document charset. I do think it's worth filing a bug on Gecko to at least do its part (use the incoming URI charset) right.

Assignee

Comment 48

•

9 years ago

Thanks for the comment, Boris, but I'm a little confused. As I pointed out in comment #20, if you hover the following in FF <html><head><meta http-equiv="content-type" content="text/html; charset=windows-1252"></head> <body><a href="http://example.com/?p=%E9">Test</a></body></html> you see http://example.com/?p=é and if you click you open http://example.com/?p=%E9. Is that not the desired behaviour that's already working? So which bug do you want me to file? Reading your comment #32 and comment #33 again I can see that you want to pass the character set further down the line, which is a good idea in general, but what's the immediate demonstrable bug in FF that this would fix? If I hover this in TB with the e-mail charset of windows-1252, I see http://example.com/?p=%E9, but clicking opens http://example.com/?p= (yes, truncated). However, using http://example.com/?p=%E9e hovers and opens http://example.com/?p=%E9e. And http://example.com/?p=%E9%E9 hovers and opens http://example.com/?p=%E9%E9, so I'm not sure who eats the single character there. Strange. But that's not the bug you're talking about. So I could file two TB bugs here: Hovering http://example.com/?p=%E9 in windows-1252 e-mail doesn't show http://example.com/?p=é like in FF and Clicking http://example.com/?p=%E9 in windows-1252 e-mail opens http://example.com/?p= But for both, something seems to have gone wrong at the TB/Gecko interface. Without investigating, I can't blame that on Gecko ;-)

Flags: needinfo?(bzbarsky)

Assignee

Comment 50

•

9 years ago

I filed bug 1338758 for the second (truncation) issue. I also came across bug 1138608 which got fixed here.

Comment 51

•

9 years ago

> Is that not the desired behaviour that's already working? Yes. > So which bug do you want me to file? Comment 33 item 2. > but what's the immediate demonstrable bug in FF that this would fix? Try doing the same thing but replace your http:// URL with something Firefox does not handle internally (e.g. mailto:). See what gets handed out to the external helper app then. Ideally, you'd register an app for said protocol that just spits out what it got, so you can easily observe incoming values. No, it's not an obvious issue. ;) I can't speak to the TB issues.

Flags: needinfo?(bzbarsky)

Assignee

Updated

•

9 years ago

Blocks: 1338900

Assignee

Comment 52

•

9 years ago

Done, bug 1338900. I hope that's what you wanted, if not, please change the summary and/or post a comment there.

Assignee

Updated

•

9 years ago

No longer blocks: 1338900

Summary: Links with escaped special german letters don't work in version 13, with 0x8050000e (NS_ERROR_ILLEGAL_INPUT) [nsIExternalProtocolService.loadUrl], location: "JS frame :: chrome://communicator/content/contentAreaClick.js :: openLinkExternally :: line 188" → nsIExternalProtocolService.loadURI() throws NS_ERROR_ILLEGAL_INPUT for URLs containing escaped non-ASCII characters

Cornel Ionce [:noni]

Comment 53

•

9 years ago

Setting qe-verify- per comment 41.

Flags: qe-verify-