Dealing with confusables in domain names




Address Bar
7 months ago
4 months ago

(Reporter: annevk, Unassigned)
(Blocks: 1 bug)

Firefox Tracking Flags: (firefox57 wontfix)

7 months ago
Here's an idea for dealing with confusables in domain names:

1. We compute a canonical domain name for each entry in history. The exact details are TBD, but consider removing hyphens and diacritics, replacing "1" with "l", "rn" with "m", etc.
2. We compute a canonical domain name _destination_ for the domain the user is navigating to.
3. We compare _destination_ with each canonical domain name in history.
4. If there's a match and the corresponding actual domain names differ, we ask the user whether they meant to navigate to a different URL. We can still navigate, since unless the user takes action their cookies et al. should be safe.

The dialog should allow navigating to the address in history (if the user takes that action, we shouldn't put _destination_ into history), but should also let the user state somehow that it's not a duplicate, in which case we shouldn't show the dialog again. (That probably makes the model stated above a little more involved, but hopefully the general idea is clear.)
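
To make the proposal concrete, here is a minimal Python sketch of steps 1 to 4, including the "not a duplicate" opt-out. Everything in it is an assumption for illustration: the substitution table, the names `canonicalize` and `find_lookalikes`, and the use of Unicode-form (not Punycode) domain strings. The real canonicalization rules are still TBD.

```python
import unicodedata

# Hypothetical substitution table; the actual rules are TBD in this bug.
SUBSTITUTIONS = [("rn", "m"), ("1", "l")]


def canonicalize(domain: str) -> str:
    """Fold a Unicode-form domain name into a canonical comparison key."""
    # Decompose accented characters, then drop the combining marks (diacritics).
    decomposed = unicodedata.normalize("NFKD", domain.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove hyphens, then apply the look-alike substitutions in order.
    folded = folded.replace("-", "")
    for old, new in SUBSTITUTIONS:
        folded = folded.replace(old, new)
    return folded


def find_lookalikes(
    destination: str,
    history: list[str],
    dismissed: set[tuple[str, str]],
) -> list[str]:
    """Return history domains that canonicalize like `destination` but differ from it.

    `dismissed` holds (destination, visited) pairs the user has already
    marked as "not a duplicate", so they never trigger the dialog again.
    """
    canon = canonicalize(destination)
    return [
        visited
        for visited in history
        if visited != destination
        and canonicalize(visited) == canon
        and (destination, visited) not in dismissed
    ]
```

A non-empty result from `find_lookalikes` would be the trigger for the dialog; an exact match with a history entry never triggers it.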


7 months ago
Blocks: 1332714

Comment 1

7 months ago
It seems to me that you want to deal with in-Latin-script confusables, rather than with whole-string cross-script confusables, so I'm not sure why this should affect bug 1332714 (which, by the way, seemed to be not a bug but intended behaviour, so I'm not sure why it was reopened).

More importantly, you cannot deal with confusables with one-way normalizations: the algorithm you propose would be able to detect that "" is a potential spoof of "", but would not be able to detect that "" is a potential spoof of "". You would rather need to calculate every time the entire possible set of combinations that can be created by replacing each single letter with all potentially confusable other letters, and check for all of them - and they quickly become hundreds or thousands.

- this algorithm (in the complex version) already exists, it's the IDN variant mechanism, and it should be applied just once by TLD registries at registration, rather than each and every time by every Internet user whenever they encounter a domain name;
- however, at the policy level, no one ever decided that (e.g.) "1" and "l" should be considered variants, such that legitimate domain names could not differ only by "1" in place of "l"; so your algorithm would not just be ineffective and CPU-heavy, but would also generate false positives.
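
To illustrate the growth mentioned above, here is a rough Python sketch of enumerating every variant of a label by swapping each character for its potential confusables. The `CONFUSABLES` table and the `variants` function are hypothetical and deliberately tiny; they only cover single-character swaps (nothing like "rn" vs. "m") and are not the actual IDN variant tables used by registries.

```python
from itertools import product

# Hypothetical, symmetric confusable sets; real tables would be far larger.
CONFUSABLES = {
    "l": ["l", "1", "i"],
    "1": ["1", "l", "i"],
    "i": ["i", "l", "1"],
    "o": ["o", "0"],
    "0": ["0", "o"],
}


def variants(label: str) -> set[str]:
    """Return every string obtained by swapping characters for their confusables."""
    # Each position contributes its own list of choices; characters without
    # an entry in the table can only stay as they are.
    choices = [CONFUSABLES.get(ch, [ch]) for ch in label]
    # The Cartesian product over all positions yields every combination.
    return {"".join(combo) for combo in product(*choices)}
```

With this toy table, the seven-letter label "mozilla" already yields 2 × 3 × 3 × 3 = 54 variants, and the count multiplies with every additional confusable character or table entry. Because the sets are symmetric, "goog1e" is in `variants("google")` and "google" is in `variants("goog1e")`.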

Comment 2

6 months ago
It wouldn't be CPU-heavy since we only compare against sites the user already visited. So if the user visited and then gets directed to it would be a simple string comparison (against each history entry, which can be further optimized) after normalization and then an alert of sorts to the user that they may be on the wrong site.
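
One way the comparison against history could be "further optimized" is to index history by canonical form once, so each navigation costs a single normalization plus a dictionary lookup rather than a scan. This is only a sketch under assumptions: `HistoryIndex` is a hypothetical name, the canonicalization function is passed in because its rules are TBD, and `toy_canonicalize` is a stand-in, not the proposed normalization.

```python
from collections import defaultdict
from typing import Callable


def toy_canonicalize(domain: str) -> str:
    """Stand-in canonicalizer for illustration only; the real rules are TBD."""
    return domain.lower().replace("-", "").replace("1", "l")


class HistoryIndex:
    """Maps canonical domain names to the actual domains seen in history."""

    def __init__(self, canonicalize: Callable[[str], str]) -> None:
        self._canonicalize = canonicalize
        self._by_canonical: dict[str, set[str]] = defaultdict(set)

    def add(self, domain: str) -> None:
        # Normalization happens once, when the entry lands in history.
        self._by_canonical[self._canonicalize(domain)].add(domain)

    def lookalikes(self, destination: str) -> set[str]:
        # One normalization and one dictionary lookup per navigation.
        key = self._canonicalize(destination)
        return self._by_canonical.get(key, set()) - {destination}
```

For example, after `index = HistoryIndex(toy_canonicalize)` and `index.add("example.com")`, calling `index.lookalikes("examp1e.com")` returns `{"example.com"}`, while `index.lookalikes("example.com")` returns an empty set.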
status-firefox57: --- → wontfix
Priority: -- → P3