Open Bug 525831 Opened 16 years ago Updated 3 years ago

Unicode TLDs with RTL characters can be used to spoof the domain part of the URL

Categories

(Core :: Networking, defect, P5)

Tracking

()

People

(Reporter: tomer, Unassigned)

References

()

Details

(Keywords: rtl, sec-low, Whiteboard: [sg:low? spoof][necko-would-take])

Attachments

(2 files)

I've been thinking about this for some time now, and the recent ICANN announcement of international top-level domains puts us at high risk of domain hijacks when RTL characters are used for both the domain and the page name. The link below demonstrates my experiment: I used the ICANN testing environment for international domains. The domain used is http://דוגמה.טעסט (Hebrew/Yiddish characters), and the problem can easily be reproduced with Arabic characters as well. I placed a page there named דומיין.קום (Hebrew for domain.com), which appears to be the domain name itself because the slash is a weak character in the Unicode bidi algorithm (http://unicode.org/reports/tr9/). URL for the experiment - http://דוגמה.טעסט/%D7%93%D7%95%D7%9E%D7%99%D7%99%D7%9F.%D7%A7%D7%95%D7%9D * IDN domains under the .com TLD and others are less risky, because TLDs with LTR characters block this behavior from happening.
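To make the report concrete, here is a minimal Python sketch that parses the experiment URL and prints which part is really the host and which is the path, along with the bidi class of the slash. It only inspects the logical (in-memory) order and does not implement the bidi algorithm; the names `HOST`, `URL`, `WEAK_TYPES` and `bidi_classes` are illustrative and not taken from any browser code.

```python
import unicodedata
from urllib.parse import urlsplit, unquote

# The experiment URL from the report. The host is written with escapes so
# that a bidi-aware editor does not reorder the source text itself.
HOST = "\u05d3\u05d5\u05d2\u05de\u05d4.\u05d8\u05e2\u05e1\u05d8"
URL = "http://" + HOST + "/%D7%93%D7%95%D7%9E%D7%99%D7%99%D7%9F.%D7%A7%D7%95%D7%9D"

# Bidi classes that UAX #9 groups under "weak" types.
WEAK_TYPES = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}

parts = urlsplit(URL)
real_host = parts.hostname          # what the browser actually connects to
decoded_path = unquote(parts.path)  # what a UI would show after the host


def bidi_classes(text):
    """Return (code point, bidi class) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


print("real host:   ", ascii(real_host))
print("decoded path:", ascii(decoded_path))
print("slash class: ", unicodedata.bidirectional("/"))
print("path classes:", bidi_classes(decoded_path))
```

In logical order the host always comes first. The confusion arises only at display time, when the host, the weak separator and the path form one right-to-left run and the path ends up on the side where a reader expects the domain.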
Luckily, we currently support IDN in non-ASCII TLDs only for the ICANN test TLDs (everything else gets punycode), so we have time to fix this before ICANN approves any active domains. What is the expected way of reading a URI in an RTL script? If we flipped the scheme around, would that make things better? e.g. xxxxxxxxxxxx//:ptth or is that just too weird? If we preserve the true domain on the left between the scheme and the first slash, should the rest of the path be fully RTL or only RTL between slash delimiters? There should be a law against BIDI. LTR or RTL: pick one, none of this ambiguous mixing :-)
For the list of currently supported IDN TLDs open about:config and filter on IDN.whitelist.xn--
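For readers unfamiliar with the "xn--" form mentioned here, this is a small sketch of how the test TLD from the example domain maps to its punycode A-label. It uses Python's built-in `idna` codec, which follows the older IDNA 2003 rules rather than IDNA2008, so treat it as an illustration of the encoding only, not of what Gecko does. `TEST_TLD` is an illustrative name.

```python
# U+05D8 U+05E2 U+05E1 U+05D8: the Hebrew-script TLD of the example domain,
# written with escapes so the source is not visually reordered.
TEST_TLD = "\u05d8\u05e2\u05e1\u05d8"

# Encode to the ASCII-compatible ("xn--") form and back again.
a_label = TEST_TLD.encode("idna")
round_trip = a_label.decode("idna")

print("A-label:", a_label.decode("ascii"))
print("round trip matches:", round_trip == TEST_TLD)
```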
I'm fairly sure the current IDNA2008 work spent a lot of time on working out the BiDi rules. We should look at what they are proposing (perhaps as modified by Unicode Technical Report TR46) and see if that defines what we should do. If we think there's a spoofing problem with the IDNA2008 spec, now is the time to raise it - ASAP. It's in Last Call. Gerv
As far as I can gather, the IDNA folk either don't consider this a problem or don't think there is a possible solution. They seem to be confident that users will get used to reading right-to-left sequences within URIs correctly and not be confused by cases like Tomer's example. I am not so sanguine.
Does this bug need to be confidential? The problem is more with the spec than with our implementation, and the most important thing is that all browsers should handle the case consistently.
We should figure out how we want this to work. If we were in a world where we'd decided to ditch the "http://", then you could have a rule where, if the entire URL was RTL, you would do the equivalent of (obviously, not in Latin script): [ lmth.oof/htap/moc.elpmaxe.niamodbus ] but if only some parts were RTL, you would do: [ subdomain.example.com/htap/lmth.oof ] Dan: It would be great if the rule were "pick one" but if that were the case, until we had IDN TLDs (i.e. now), there would have been no way to do RTL at all. Is it time to start doing more in the URL bar to separate URL components? Would that help? Gerv
Oh, and as long as we are sure this is a spec issue and not an implementation issue, I have no problem opening this bug. Gerv
The same issue was raised about Arabic on the IDNA list recently http://www.alvestrand.no/pipermail/idna-update/2009-November/005771.html I don't think we need to hide the bug.
Group: core-security
Whiteboard: [sg:low? spoof]
(In reply to comment #6) > Dan: It would be great if [stupid suggestion, but] there would have been > no way to do RTL at all. It was a joke, along the lines of the classic "If English was good enough for Jesus..."
Another variant of this issue can be made using anchors: http://דוגמה.טעסט/#/דומיין.קום The domain part is דוגמה.טעסט, while the user expects the anchor part to be the domain (דומיין.קום). I've added some slashes, so even the redirect of the MediaWiki installation there won't spot that it is really a fake address.
Can't we just display URLs like this in our UI? http://[domain-name][LRM][path-including-the-initial-slash]
(In reply to comment #11) > Can't we just display URLs like this in our UI? > > http://[domain-name][LRM][path-including-the-initial-slash] We can also replace every slash with lrm-slash, which will make subdirectories in the URL path appear in the expected order instead of reversed, but it may confuse people who copy the URL to another medium.
(In reply to comment #12) > (In reply to comment #11) > > Can't we just display URLs like this in our UI? > > > > http://[domain-name][LRM][path-including-the-initial-slash] > > We can also replace every slash with lrm-slash, which will make subdirectories > in URL path to appear as expected instead of reverse order, but it may confuse > people who copy the URL to another medium. We can copy the URL without those LRM characters to the clipboard! There aren't many places in the UI from which the URL can be copied. Two of them come to mind: the location bar, and Copy Link Location in the context menu (which we probably won't need to touch).
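The two display proposals and the "strip on copy" idea can be sketched roughly as follows. This is a hedged illustration in Python, not the Firefox implementation: `display_form` and `clipboard_form` are hypothetical helpers, and the sketch assumes the URL has a `scheme://authority` prefix.

```python
from urllib.parse import urlsplit

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def display_form(url: str, mark_every_slash: bool = False) -> str:
    """Insert LRM after the authority (or before every later slash) for display."""
    parts = urlsplit(url)
    if not parts.netloc:
        return url  # nothing to separate without an authority component
    prefix_len = len(f"{parts.scheme}://{parts.netloc}")
    prefix, rest = url[:prefix_len], url[prefix_len:]
    if mark_every_slash:
        # Variant from the follow-up comment: an LRM before each path slash.
        return prefix + rest.replace("/", LRM + "/")
    return prefix + LRM + rest


def clipboard_form(displayed: str) -> str:
    """Drop the display-only marks before handing the URL to the clipboard.

    Limitation: this removes every LRM, including any that were genuinely
    part of the displayed text.
    """
    return displayed.replace(LRM, "")


host = "\u05d3\u05d5\u05d2\u05de\u05d4.\u05d8\u05e2\u05e1\u05d8"
page = "\u05d3\u05d5\u05de\u05d9\u05d9\u05df.\u05e7\u05d5\u05dd"
url = f"http://{host}/{page}"

shown = display_form(url)
print(ascii(shown))
print(clipboard_form(shown) == url)
```

Stripping every LRM on copy is the simplest choice, but it cannot distinguish a mark the UI added from one the user typed or pasted.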
I don't think it will be complicated to implement this, as we do similar magic when Unicode URLs are translated back to %-escaped strings, but imagine what will happen when someone pastes the URL into a forum, for example: the URL will appear completely different in the URL bar and in the page content.
(In reply to comment #14) > I don't think it will be complicated to implement this, as we do similar magics > when Unicode URLs are translated back to %-escaped string, but imagine what > will happen when someone paste the URL into a forum, for example - The URL will > appear completely different between the URL bar and in the page content. I don't think that there is anything that we can do in Firefox/Gecko to solve that problem...
Attached image domain highlighting
This bug is partially addressed by the fix to Bug 451833. The correct part of the URI is highlighted.
Whiteboard: [sg:low? spoof] → [sg:low? spoof][necko-would-take]
Please note that this very same issue has recently been reported by a security researcher named Rafay Baloch as a valid security issue in both Firefox and Chrome. http://www.rafayhackingarticles.net/2016/08/google-chrome-firefox-address-bar.html
(In reply to Tomer Cohen :tomer from comment #17) > Please note that this very same issue has recently been reported by a > security researcher named Rafay Baloch as a valid security issue in both > Firefox and Chrome. https://www.mozilla.org/en-US/security/advisories/mfsa2016-82/ https://bugzilla.mozilla.org/show_bug.cgi?id=1284372
Priority: -- → P5
This is an experimental patch to try the suggested approach in the urlbar. It has a lot of unknowns: some code likes to access the urlbar input field value directly; it's unclear whether any code path may end up storing the wrong URL; user editing can end up removing the force-LTR char, after which the domain slides away to the right; and the char can move around, so we can't easily replace just that one (the patch ends up replacing all forceRTL). I don't plan to spend any more time on this in the near future; I'm attaching it just to avoid losing it for future investigation.
Severity: normal → S3