Closed Bug 281381 Opened 20 years ago Closed 20 years ago

Unicode characters can be used to spoof URLs


(Core :: Security, defect)

Not set





(Reporter: bugzilla, Assigned: dveditz)




User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0

In Unicode, more than one character can look the same: for example, Latin 'a'
and Cyrillic 'а' look identical. This enables URLs such as
http://www.pа to be created. Supposedly, the registrar is meant to
associate a character set with each domain label, and to filter the characters
in the label against that character set. Verisign clearly has not done this.
Therefore, we must do so ourselves, as otherwise a giant spoofing hole is left
wide open.
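
To make the problem concrete, here is a minimal Python sketch showing that the two look-alike letters are different code points, so two labels that render identically compare as unequal. The label "example" is a hypothetical stand-in and is not the hostname from this report.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, renders like Latin 'a' in most fonts


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(latin_a))     # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))  # U+0430 CYRILLIC SMALL LETTER A

# Hypothetical labels: visually the same, but different strings.
genuine_label = "example"
spoofed_label = "ex" + cyrillic_a + "mple"
print(genuine_label == spoofed_label)  # False

# The spoofed label goes over the wire in its ASCII-compatible (IDNA) form,
# which the user normally never sees in the location bar.
ace_label = spoofed_label.encode("idna")
print(ace_label)
```

Showing the ASCII-compatible form instead of the Unicode form is one way to make the difference visible, at the cost of making legitimate IDNs unreadable.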

Further, the about: page workaround of turning off IDNs has been widely
reported as not working.

See for more details.

Reproducible: Always

Steps to Reproduce:
1. Visit http://www.pа
2. See that it is not the same as

Actual Results:  
Went to a spoofed website, which was not the same as the real website with
all-ASCII name.

Expected Results:
* reported the URL as spoofed, and refused to visit it, or
* visited the URL that it visually purported to be

More references for the technical issues behind this bug:
* IDN Language Table Registry
* Draft Unicode Technical Report #36, Security Considerations for the
Implementation of Unicode and Related Technology

See also this very interesting reference:
* Method for detecting a homographic attack in a webpage by means of language
identification and comparison

And this as well:
* "The Homograph Attack", Communications of the ACM, 45(2):128, February 2002

Turning off IDN is not a good long-term solution, and neither is expecting the
average user to know how to check code pages, look for minute serif differences,
pull out a hex editor, etc.

However, phishing attempts *can* be detected by looking for domains that are a
combination of ordinary characters (based on the user's language setting) and 
at least one character whose traditional rendering might be confused with
ordinary letters or numbers.
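
A rough sketch of this kind of check, in Python, might look like the following. It is a simplification under stated assumptions: it flags any label that mixes scripts rather than consulting the user's language setting, and it approximates a character's script by the first word of its Unicode name, because the standard library does not expose the Script property. The function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Collect the approximate scripts used by the letters of one label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Assumption: the first word of the character name ("LATIN",
            # "CYRILLIC", "GREEK", ...) is a crude stand-in for the script.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_suspicious(hostname):
    """Flag a hostname if any single label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1
               for label in hostname.split("."))


print(is_suspicious("example.com"))        # False
print(is_suspicious("ex\u0430mple.com"))   # True
```

A label written entirely in one non-Latin script is not flagged by this sketch, so a whole-script look-alike would need a separate check.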

Upon detection, the browser could provide a notification bar (similar to those
seen on pop-up and plug-in install attempts) with a message such as:

     This site may be trying to trick you into believing it
     is Click _here_ for more details.

In the above message, the hostname would be "translated" to the possible
equivalent "look-alike" string. Clicking the message would provide a window with
a Joe Sixpack-level explanation of the issue (without getting into details about
Unicode) and would highlight and describe any characters that might be confused.
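
The "translation" step could be sketched as below. The mapping table is a tiny hand-written illustration covering a few Cyrillic letters, not a complete confusables list, and the function name and return shape are assumptions made for this example.

```python
# Illustrative subset only: a few Cyrillic letters and their Latin look-alikes.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def look_alike(hostname):
    """Return the look-alike string plus the characters that were replaced.

    Each flagged entry is (index, original character, look-alike), which is
    what a details window would need in order to highlight the character.
    """
    translated = []
    flagged = []
    for index, ch in enumerate(hostname):
        if ch in CONFUSABLES:
            translated.append(CONFUSABLES[ch])
            flagged.append((index, ch, CONFUSABLES[ch]))
        else:
            translated.append(ch)
    return "".join(translated), flagged


shown, flagged = look_alike("ex\u0430mple.com")
print(shown)    # example.com
print(flagged)  # [(2, 'а', 'a')]
```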

This general approach could be used for any user language, and might also be
useful for semi-homographs, such as letters with diacritic or accent marks that
might go unnoticed.
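
For the semi-homograph case, one possible sketch is to decompose the hostname and drop combining marks, then compare the result with the original. This uses Unicode normalization from the Python standard library; the function names are hypothetical, and letters without a canonical decomposition (for example 'ø') would not be caught this way.

```python
import unicodedata


def strip_marks(hostname):
    """Remove combining marks (accents, diacritics) from a hostname."""
    decomposed = unicodedata.normalize("NFD", hostname)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def is_semi_homograph(hostname):
    """True if the hostname differs from its mark-free form."""
    return strip_marks(hostname) != unicodedata.normalize("NFC", hostname)


print(strip_marks("ex\u00e4mple.com"))        # example.com
print(is_semi_homograph("ex\u00e4mple.com"))  # True
print(is_semi_homograph("example.com"))       # False
```

Many legitimate names contain accented letters, so a result of True here is only a reason to show the mark-free form for comparison, not proof of spoofing.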
Confirming. No dupes found. All/all. Core/Security: general.
Assignee: bugs → dveditz
Component: Location Bar and Autocomplete → Security: General
Ever confirmed: true
OS: Windows 2000 → All
Product: Firefox → Core
QA Contact: davidpjames
Hardware: PC → All
Version: unspecified → Trunk

*** This bug has been marked as a duplicate of 279099 ***
Closed: 20 years ago
Resolution: --- → DUPLICATE