Closed Bug 983982 Opened 10 years ago Closed 7 years ago

Address bar converts non-ASCII capital letters erroneously (could be used for phishing)

Categories: Core :: Internationalization (defect)
Version: 27 Branch
Platform: x86_64, Windows 7
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED WORKSFORME
People: Reporter: aalpersen; Assignee: Unassigned
Attachments: 1 file

Attached image sahip.png
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release)
Build ID: 20140212131424

Steps to reproduce:

When I tried to browse a popular website (www.sahibinden.com), a different web page loaded. At the time I was not aware that I had typed WWW.SAHİBİNDEN.COM (I had been editing a document and the keyboard was in CAPS LOCK mode). After Firefox loaded the wrong site I checked the address, and it looked correct: the address bar had converted "WWW.SAHİBİNDEN.COM" to "www.sahibinden.com". I use a Turkish Q keyboard. When I searched for the site on Google and clicked the result, the correct page loaded. I suspected some kind of malware, so I tried other browsers and noticed that another browser converted the address to "http://www.xn--sahibinden-q2fc.com/". In the attached screenshot the two addresses look the same in the address bar, but when I copy both addresses and paste them into Notepad they are:
http://www.sahi̇bi̇nden.com/
http://www.sahibinden.com/
I will also attach a Notepad screenshot; the difference is visible there too :)
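To see what the Notepad paste reveals, here is a minimal Python sketch (an illustration, not Firefox code). It lowercases the typed address with Python's default Unicode case mapping, which applies the unconditional SpecialCasing rule for U+0130, and lists every non-ASCII code point in the result. The helper name `non_ascii_code_points` is hypothetical.

```python
import unicodedata


def non_ascii_code_points(text: str) -> list[str]:
    """Describe every non-ASCII code point in text as 'U+XXXX NAME'."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if ord(ch) > 0x7F
    ]


typed = "WWW.SAH\u0130B\u0130NDEN.COM"  # two U+0130 (capital I with dot above)
genuine = "www.sahibinden.com"

# str.lower() applies the unconditional SpecialCasing mapping 0130 -> 0069 0307.
lowered = typed.lower()

print(lowered == genuine)          # False
print(len(genuine), len(lowered))  # 18 20
for line in non_ascii_code_points(lowered):
    print(line)                    # U+0307 COMBINING DOT ABOVE (printed twice)
```

The two strings render almost identically, yet each "i" in the lowercased address is followed by an extra combining dot, which is exactly what makes the hostnames different.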


Actual results:

The Firefox address bar erroneously converted capital non-ASCII Turkish letters into non-ASCII, non-Turkish letters. See the screenshot for the problem.


Expected results:

1) Firefox should convert a capital Turkish letter to the corresponding lowercase Turkish letter (in this case 'İ' to 'i'). Likewise, the capital Turkish letter 'I' should become 'ı', but it is converted to 'i'. So when I enter "WWW.SAHIBINDEN.COM", it is converted to "www.sahibinden.com": it reaches the real address, but through the wrong (non-Turkish) mapping.
2) Alternatively, Firefox should display the address as "http://www.xn--sahibinden-q2fc.com/", as other browsers do. Otherwise users may be fooled by phishing attacks: if the wrongly converted web page had imitated the original one, I would have used it without realizing.
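The two expected behaviours can be sketched in Python. `turkish_lower` is a hypothetical helper that applies the Turkish-specific mappings (İ → i, I → ı) before the default lowercase mapping; it illustrates the idea and is not how Firefox or any particular library implements locale-aware casing. The ASCII-compatible (xn--) form uses Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490, with Nameprep), so it may differ from what an IDNA2008-based browser produces for other inputs.

```python
def turkish_lower(text: str) -> str:
    """Hypothetical Turkish-locale lowercasing: İ -> i and I -> ı, then default lower()."""
    # Replace U+0130 first, so the 'i' it produces is not touched by the next step.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


# Expected result 1: Turkish-aware mapping of the two capital letters.
print(turkish_lower("WWW.SAH\u0130B\u0130NDEN.COM"))  # www.sahibinden.com
print(turkish_lower("WWW.SAHIBINDEN.COM"))            # www.sahıbınden.com (dotless ı)

# Expected result 2: show the label in its ASCII-compatible (xn--) form.
# Nameprep also maps U+0130 to 'i' + U+0307 before Punycode is applied.
print("SAH\u0130B\u0130NDEN".encode("idna"))  # b'xn--sahibinden-q2fc'
```

The Punycode label matches the `xn--sahibinden-q2fc` form that the other browser displayed, which makes the hidden combining dots visible instead of letting the hostname pass for the genuine one.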
Some Turkish banking websites also contain the lowercase letter "i". Try www.garanti.com.tr and WWW.GARANTİ.COM.TR: the second one opens http://www.garanti̇.com.tr/ in the address bar. You will probably not notice the visual difference, so there is a real chance that this could be used for phishing.
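One crude way to spot this kind of look-alike is to lowercase the host, compose it with NFC, and flag any nonspacing mark (Unicode category Mn) that survives. Precomposed Turkish letters such as ş or ç stay single code points under NFC, while 'i' + U+0307 has no precomposed form and remains decomposed. `has_combining_marks` is a hypothetical heuristic for illustration only, not Firefox's actual IDN display policy.

```python
import unicodedata


def has_combining_marks(host: str) -> bool:
    """Hypothetical heuristic: True if a nonspacing mark survives NFC in the lowercased host."""
    composed = unicodedata.normalize("NFC", host.lower())
    return any(unicodedata.category(ch) == "Mn" for ch in composed)


for host in ("www.garanti.com.tr", "WWW.GARANT\u0130.COM.TR"):
    # ascii() escapes non-ASCII characters, so the hidden U+0307 becomes visible.
    print(ascii(host.lower()), has_combining_marks(host))
# 'www.garanti.com.tr' False
# 'www.garanti\u0307.com.tr' True
```

A real browser policy would need more than this (script mixing, confusable tables), but the sketch shows why the two hosts differ even though they look the same.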
Summary: Address bar converts non-ASCII capital letters erroneously → Address bar converts non-ASCII capital letters erroneously (could be used for phishing)
Blocks: IDN
Component: Untriaged → Networking
Product: Firefox → Core
I believe this issue is addressed in IDNA2008
Depends on: IDNA2008
It seems tolower('İ') returns the wrong result. It should be 'i' (U+0069), not 'i' + COMBINING DOT ABOVE (U+0069 U+0307).

The problem can also be observed on this page: http://www.fileformat.info/info/unicode/char/0130/index.htm
In the table titled "Unicode Data", the lowercase mapping is given as U+0069, which is correct. However, in the table titled "Java Data", the row for string.toLowerCase() shows the same wrong result.
COMBINING DOT ABOVE (U+0307) is encoded in UTF-8 as 0xCC 0x87:
http://www.fileformat.info/info/unicode/char/0307/index.htm
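The observation in these comments can be checked quickly in Python (used here only for illustration): `str.lower()` turns U+0130 into two code points, and the second one, COMBINING DOT ABOVE, encodes to the bytes 0xCC 0x87 in UTF-8.

```python
import unicodedata

capital_dotted_i = "\u0130"          # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = capital_dotted_i.lower()  # full (SpecialCasing) lowercase mapping

print([f"U+{ord(ch):04X}" for ch in lowered])  # ['U+0069', 'U+0307']
print(unicodedata.name(lowered[1]))            # COMBINING DOT ABOVE
print(lowered[1].encode("utf-8").hex(" "))     # cc 87
```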
Component: Networking → Internationalization
No, U+0130 => U+0069 U+0307 is the correct transformation per http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt. This was implemented in bug 744357. In effect what we are doing is normalizing U+0130 to U+0049 U+0307, and then lowercasing U+0049 to U+0069.
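The transformation described above can be reproduced step by step in Python, as a sketch of the canonical equivalence rather than of Gecko's implementation: canonical decomposition (NFD) turns U+0130 into U+0049 U+0307, lowercasing the base letter gives U+0069 U+0307, and that matches the full lowercase mapping. Because no precomposed "i with dot above" exists, NFC leaves the sequence decomposed, and uppercasing followed by NFC restores U+0130.

```python
import unicodedata

dotted_capital = "\u0130"                                   # İ

decomposed = unicodedata.normalize("NFD", dotted_capital)  # 'I' + U+0307
step_by_step = decomposed.lower()                           # 'i' + U+0307

assert decomposed == "I\u0307"
assert step_by_step == dotted_capital.lower() == "i\u0307"

# No precomposed form exists, so NFC keeps the dot as a separate code point...
assert unicodedata.normalize("NFC", step_by_step) == "i\u0307"
# ...which is what lets the round trip back to U+0130 work.
assert unicodedata.normalize("NFC", step_by_step.upper()) == dotted_capital
print("canonical equivalence preserved")
```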
(In reply to Simon Montagu :smontagu from comment #4)
> No, U+0130 => U+0069 U+0307 is the correct transformation per
> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt. 
Thanks for the link.
One unconditional mapping explains it all:
    # Preserve canonical equivalence for I with dot.
    0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
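For readers unfamiliar with the file format: each SpecialCasing.txt entry gives the code point followed by its lowercase, titlecase and uppercase mappings, then an optional condition list (where locale-specific rules such as the Turkish `tr` ones are tagged) before the comment. A minimal parser for a single data line might look like this; `parse_special_casing` is a hypothetical helper, not part of any library.

```python
def parse_special_casing(line: str) -> dict:
    """Parse one SpecialCasing.txt data line into its mapping fields (hypothetical helper)."""
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]

    def to_text(field: str) -> str:
        # A mapping is a space-separated list of hex code points (possibly empty).
        return "".join(chr(int(cp, 16)) for cp in field.split())

    return {
        "code": chr(int(fields[0], 16)),
        "lower": to_text(fields[1]),
        "title": to_text(fields[2]),
        "upper": to_text(fields[3]),
        # Unconditional lines have only an empty field after the upper mapping;
        # conditional lines carry a condition list there, e.g. "tr Not_Before_Dot".
        "conditions": fields[4].split() if len(fields) > 5 else [],
        "comment": comment.strip(),
    }


entry = parse_special_casing(
    "0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE"
)
print([f"U+{ord(c):04X}" for c in entry["lower"]])  # ['U+0069', 'U+0307']
print(entry["conditions"])                          # []
```

The empty condition list is what "unconditional" means here: the mapping to U+0069 U+0307 applies in every locale.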

That mapping seems to preserve the dot of 'İ' in a locale-agnostic way and effectively creates a second representation of 'i', which is surprising because this had been proposed many years ago and rejected by Unicode (it even has its own FAQ entry: http://www.unicode.org/faq/casemap_charprop.html#9).
So why does it exist now? Who needed it? Probably banking systems and international payment and transaction systems.
The question is: Does IDN need this? Is it necessary to store dot info next to letter 'i' so that it becomes 'İ' when uppercased?
Unfortunately I don't know the rationale behind the IDN decision (and I'm not sure where to ask about it), but this is not a bug in Firefox at this point, so I'm closing it.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME