Closed Bug 733350 Opened 12 years ago Closed 9 years ago

Some Arabic characters in IDN don't trigger conversion to Punicode

Categories

(Core :: Networking, defect)

10 Branch
x86
macOS
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: voulnet, Unassigned)

References

Details

(Keywords: sec-low, Whiteboard: [sg:low])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11

Steps to reproduce:

I entered a URL containing an Arabic character into the URL bar and pressed Enter, but Firefox did not convert the URL to Punycode as expected (the conversion is meant to protect against phishing now that internationalized domain names have been introduced).




Actual results:

Firefox accepted the URL as is and did not convert it to Punycode.

I tested this URL: Ꮮast.fm (notice the Ꮮ, which is NOT the English L), and it was correctly converted to Punycode.

However, when I entered www.اast.fm (which contains the Arabic letter ا, which looks like a lowercase L), Firefox did NOT convert it to Punycode.

Another test I did was mail٠google.com (the first dot is not a dot, but rather an Arabic zero ٠ ), and Firefox still did not convert it to Punycode! I could easily phish users with a website that looks like mail.google.com this way.


Expected results:

Firefox should have converted it to Punycode, something like http://www.xn--ast-wgq.fm for example, so that users do not get phished easily.

So far I have noticed this problem only with Arabic characters.
FWIW, the www.اast.fm example doesn't convert to punycode in Safari 5 or Chrome 17, either (whereas the mail٠google.com one does).
Jonathan: Yes, you are spot on. I have already notified Google and Apple about them.

However, Firefox is the worst offender here, as it doesn't correctly handle mail٠google.com, which might be more sinister in my opinion.
The simplest "fix" for this is to add the Arabic zero (U+0660) to network.IDN.blacklist_chars in about:config (you'd also want to add the Eastern variant U+06F0).

However, there are doubtless lots more potentially "confusable" characters in Unicode, so simply trying to blacklist each individual character that seems at all risky isn't really a workable solution.

See also http://unicode.org/reports/tr39/.
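As a rough illustration of the blacklist approach described above, the following Python sketch checks a hostname against a small set of disallowed code points. The names `BLACKLIST` and `contains_blacklisted` are hypothetical and exist only for this example; it does not reflect how network.IDN.blacklist_chars is actually implemented in Firefox.

```python
import unicodedata

# Hypothetical blacklist holding only the two digits mentioned above:
# U+0660 and its Eastern variant U+06F0.
BLACKLIST = {"\u0660", "\u06f0"}


def contains_blacklisted(hostname: str) -> bool:
    """Return True if any character of the hostname is in BLACKLIST."""
    return any(ch in BLACKLIST for ch in hostname)


# Show what the blacklisted code points are.
for ch in sorted(BLACKLIST):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(contains_blacklisted("mail\u0660google.com"))  # hostname with the Arabic zero
print(contains_blacklisted("mail.google.com"))       # plain ASCII hostname
```

A set like this only catches characters that someone has already thought to list, which is the limitation pointed out above.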
Hmm, the IDN spec should define the confusable characters that shouldn't be used in IDNs...
For what it's worth:

In Firefox, typing http://مثال.السعودية (written in Arabic) loads the correct website, while still showing the Arabic URL in the URL bar.

In Google Chrome, typing the http://مثال.السعودية URL (written in Arabic) gets it quickly converted to Punycode; Chrome then loads the website and shows the Punycode in the URL bar.
Do you have network.IDN.whitelist.com set to true in about:config? For me mail٠google.com is converted to http://www.xn--mailgoogle-2rn.com/
http://www.اast.fm/ is strange though: that should be displayed as Punycode whatever the domain, because there must not be rtl and ltr characters in the same label -- and we have unit tests for this in netwerk/test/unit/test_bug427957.js. This should be the same case as www.מיץpetel.com, which we test for there.
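The "no rtl and ltr characters in the same label" condition can be approximated with Python's `unicodedata.bidirectional`. This is a simplified sketch: the helper `mixes_directions` is a hypothetical name, and it only checks for strong left-to-right and right-to-left characters appearing together. The full Bidi rule for IDNA (RFC 5893) has further conditions that are not modelled here, and this is not the code Firefox uses.

```python
import unicodedata

# Bidi classes of strong right-to-left characters (Hebrew is "R", Arabic is "AL").
RTL_CLASSES = {"R", "AL"}


def mixes_directions(label: str) -> bool:
    """Return True if a label contains both strong RTL and strong LTR characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return bool(classes & RTL_CLASSES) and "L" in classes


# Labels from the examples above, written with escapes to avoid ambiguity.
print(mixes_directions("\u0627ast"))               # Arabic alef + "ast"
print(mixes_directions("\u05de\u05d9\u05e5petel"))  # Hebrew letters + "petel"
print(mixes_directions("last"))                     # pure Latin label
```

Checking per label rather than per hostname matters here, because a hostname may legitimately combine an RTL label with an LTR one such as "www" or "com".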
Interesting... if I try http://mail٠google.com, the address bar converts it to punycode; but if I try https://mail٠google.com (note the https), then it remains displayed as https://mail٠google.com (although the error page says "Firefox can't find the server at xn--mailgoogle-2rn.com.")
www.מיץpetel.com also displays as such in the address bar, so test_bug427957.js is obviously not testing what it should be testing :(
That one doesn't even get punycoded in the "Firefox can't find the server at www.מיץpetel.com" error page.
Confirming. Who can own this bug?
Status: UNCONFIRMED → NEW
Ever confirmed: true
I think the issue here in relation to mixed-direction examples like http://www.اast.fm/ or www.מיץpetel.com is primarily a cosmetic one rather than a real security problem. AFAICT, these addresses are (correctly) rejected as invalid, and the browser does not attempt to access them at all. Watching the Web Console, I see no network requests being generated. OTOH, if I try www.ا.а.s.t.fm (the 'a' is Cyrillic, by the way), which is theoretically valid although non-existent, the expected GET request (in punycode form) is issued.
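Because look-alike characters such as the Cyrillic 'а' are hard to spot by eye, it can help to list the non-ASCII code points in a hostname. The following Python sketch does that; `describe_non_ascii` is a hypothetical helper name used only for this illustration.

```python
import unicodedata


def describe_non_ascii(hostname: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each non-ASCII character, in order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in hostname
        if ord(ch) > 0x7F
    ]


# The example hostname from above: Arabic alef and Cyrillic 'a' as separate labels.
for code_point, name in describe_non_ascii("www.\u0627.\u0430.s.t.fm"):
    print(code_point, name)
```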

When test_bug427957.js calls convertToDisplayIDN for one of these invalid addresses, the result is *not* a punycode string; rather, an error is thrown and the display IDN remains blank. So displayIDN != inputIDN, as expected, but it's not punycode.

What I feel is rather misleading is that for these invalid examples, we show the same error page ("Firefox can't find the server at www.מיץpetel.com") as we'd show for a valid but nonexistent address. Would it be better if we showed something like "Firefox can't access the address www.מיץpetel.com because it uses an invalid domain name"?

A second question is whether we should transform invalid names to punycode (either in the address bar or the resulting error page), or simply leave them in their original form, given that we're refusing to do anything with them anyway. (My inclination, I think, would be to leave them unchanged. But I do think we need to tell the user the address is invalid, as opposed to simply inaccessible.)
(In reply to Jonathan Kew (:jfkthame) from comment #12)
> I think the issue here in relation to mixed-direction examples like
> http://www.اast.fm/ or www.מיץpetel.com is primarily a cosmetic one rather
> than a real security problem.

If you don't think this bug has serious security implications we can probably make progress faster with a public discussion (open it up).
That would seem reasonable to me, but I don't think I should make that call. I've just been noting the details I observe, but I don't really know anything about the networking side of our code, or how phishing protection and other such security issues are handled.
If we're not making network requests for these names then this is not a security bug. Is that true in all these cases?
mail٠google.com does "GET http://xn--mailgoogle-2rn.com/" and "GET http://www.xn--mailgoogle-2rn.com/".

However, as I said above, I see Punycode there anyway, in both the address bar and the error page. Jonathan (comment 8) got different results.
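For reference, the xn-- form seen in those requests can be reproduced with Python's built-in `punycode` codec. This is a minimal sketch, assuming raw Punycode encoding of each non-ASCII label with no IDNA mapping or validation step; `to_ace` is a hypothetical helper name, and this is not how Firefox performs the conversion.

```python
def to_ace(hostname: str) -> str:
    """Encode each non-ASCII label with Punycode and add the "xn--" prefix.

    No IDNA mapping, normalization, or validity checks are applied here.
    """
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


# The Arabic zero (U+0660) is not a label separator, so "mail٠google" is one label.
print(to_ace("mail\u0660google.com"))  # xn--mailgoogle-2rn.com
```

Since the Arabic zero is not treated as a dot, the whole of "mail٠google" is encoded as a single label, which is why the result is a second-level domain under .com rather than a subdomain of google.com.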
(In reply to Simon Montagu from comment #16)
> mail٠google.com does "GET http://xn--mailgoogle-2rn.com/" and "GET
> http://www.xn--mailgoogle-2rn.com/".
> 
> However, as I said above, I see Punycode there anyway, in both the address
> bar and the error page. Jonathan (comment 8) got different results.

Hmm, it looks like this changed at some point between FF10 and current code - with FF10.0.2, the https://mail٠google.com example appears unchanged in the address bar (as per comment 8), but in Aurora it displays punycode there.

In any case, I don't think that one is particularly worrying - while the Arabic zero is a kind of "dot", it's at least somewhat visually distinct from the ASCII period in typical fonts. ISTM that it's roughly on a par with examples like "http://www.goog1e.com" or "http://www.mozi11a.org", which may or may not be easy to spot, depending on the fonts being used.
Group: core-security
Summary: Arabic Characters in URL not converted to Punicode → Some Arabic characters in IDN don't trigger conversion to Punicode
Whiteboard: [sg:low]
Component: Untriaged → Networking
Product: Firefox → Core
Attached file Test case
In testing latest IDN changes, I found that this was true with Arabic, Hebrew and Thaana. See attachment.
This bug as originally reported seems to be fixed (presumably by bug 479520), though the issue with error reporting in comment 12 still needs attention.
Status: NEW → RESOLVED
Closed: 9 years ago
Depends on: IDNA2008
Resolution: --- → WORKSFORME