IDN script mixing policy: Consider switching to highly-restrictive

VERIFIED FIXED in Firefox 57

Status

P1
normal
VERIFIED FIXED
2 years ago
a year ago

People

(Reporter: jshin1987, Assigned: Gijs)

Tracking

unspecified
Firefox 58
Firefox Tracking Flags

(firefox57 verified, firefox58 verified)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Coming from Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=726950

Currently, Chromium uses the 'moderately-restrictive' profile for the mixed-script check in domain name components. As a result, certain scripts with characters confusable with Latin are allowed to mix with Latin. To protect against them, Chromium has added some ad-hoc rules.

When I reviewed the Verisign and IDN ccTLD rules, I found that scripts other than Han, Hangul, Hiragana/Katakana, and Bopomofo are not allowed to mix with Latin. [1]

So, using the 'moderately-restrictive' policy for mixed-script detection does not do much for (major) TLDs.

Of course, tertiary and lower-level domain components can mix Latin and those scripts (Hebrew, Devanagari, Arabic, Armenian, Georgian), but if you look at web pages in languages written in those scripts, script mixing within a single word is very rare, if it occurs at all.

I propose that Firefox and Chromium sync up on this policy and change to the 'highly restrictive' script mixing check.
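
To illustrate the difference between the two profiles, here is a rough Python sketch of the UTS #39 restriction levels as they apply to a single label. It is not Firefox or Chromium code. The function names are hypothetical, and the script of each character is approximated from its Unicode character name rather than taken from the real Script property, so it is only good enough for simple examples. It also skips the "Recommended script" requirement of the moderately restrictive level.

```python
import unicodedata

# Script combinations that the UTS #39 "Highly Restrictive" level permits
# in addition to single-script labels.
HIGHLY_RESTRICTIVE_COMBOS = (
    {"LATIN", "HAN", "HIRAGANA", "KATAKANA"},
    {"LATIN", "HAN", "BOPOMOFO"},
    {"LATIN", "HAN", "HANGUL"},
)


def script_of(ch):
    """Approximate a character's script from its Unicode name (assumption)."""
    if ch.isdigit() or ch == "-":
        return None  # treated as Common, which never counts as mixing
    name = unicodedata.name(ch, "")
    if not name or name.startswith("COMBINING"):
        return None  # unnamed or Inherited: ignored in this sketch
    first = name.split()[0]
    return "HAN" if first == "CJK" else first


def scripts_in(label):
    """Return the set of (approximate) scripts used in a label."""
    return {s for s in map(script_of, label) if s is not None}


def passes_highly_restrictive(label):
    scripts = scripts_in(label)
    if len(scripts) <= 1:
        return True
    return any(scripts <= combo for combo in HIGHLY_RESTRICTIVE_COMBOS)


def passes_moderately_restrictive(label):
    if passes_highly_restrictive(label):
        return True
    scripts = scripts_in(label)
    others = scripts - {"LATIN"}
    # Latin plus exactly one other script, except Cyrillic and Greek.
    return (
        "LATIN" in scripts
        and len(others) == 1
        and not others & {"CYRILLIC", "GREEK"}
    )


if __name__ == "__main__":
    mixed = "a\u0939\u093f\u0928\u094d\u0926\u0940"  # Latin "a" + Devanagari
    print(passes_moderately_restrictive(mixed), passes_highly_restrictive(mixed))
```

A Latin-Devanagari label such as the one above passes the moderately restrictive check but fails the highly restrictive one, which is the case this proposal is about. Latin mixed with CJK scripts passes both.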


[1] https://www.verisign.com/en_US/channel-resources/domain-registry-products/idn/idn-policy/registration-rules/index.xhtml (section 3)

Hebrew domain name policy (Israel): does not allow mixing Hebrew characters and Latin. 

http://www.isoc.org.il/files/docs/ISOC-IL_Registration_Rules_v1.5_ENGLISH_-_26.6.2016.pdf

https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-hebrew-30aug16-en.html


Indian IDN policy (not sure if it's the latest; it's from 2009)

http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

3.B has this:
B. NOT PERMISSIBLE
1. CODE-PAGE MIXING
No mixing of scripts at a given level will NOT be allowed

As an example, a mixed Latin-Devanagari label is given.
Hi Jungshik,

Did you get the emails I sent to you and Mark Davis on this topic recently?

Do you have any measures of the level of impact this change will have? E.g. how many domain names are affected, and whether any are in the Alexa top 1M or any other list of popular sites?

Gerv
(Reporter)

Comment 2

2 years ago
Hi Gerv, I must have missed it. I'll look for it.

As for the number of domains affected by this change, I only checked the .com domain (as of a few months ago). IIRC, it's 0, because apparently Verisign did what their policy page says they do in terms of mixed-script names. I'll go back and check again after disabling some additional checks I put in for Chromium on top of the current moderately restrictive policy.

I'll also check some of the IDN ccTLDs as well as some TLDs not controlled by Verisign.
(Reporter)

Comment 3

2 years ago
Gerv,  I couldn't find any recent email from you (both my personal and company account) regarding IDN display policy. Can you send it again to my personal gmail (associated with my bugzilla account)?  Thank you
Jungshik: I've resent the email using the address you use for Bugzilla. But basically, the questions are: do we have to switch to Highly Restrictive, or is there another way? What impact would that switch have? And even if we do switch, does that solve all the issues which have been raised recently? (Spoiler: no.)

Gerv
(Reporter)

Comment 6

2 years ago
Gerv, it'll not solve *ALL* the issues recently raised, but quite a lot of them will become moot. In the case of Chromium, I can get rid of some ad-hoc rules I added to protect against Latin-{script foo} confusables.

Do you have any TLDs you want me to get stats of # of affected domains on?  If you don't, I'll try .com (the latest list), .museum, and a few IDN ccTLDs (non-CJK).
(In reply to Jungshik Shin from comment #6)
> Do you have any TLDs you want me to get stats of # of affected domains on? 
> If you don't, I'll try .com (the latest list), .museum, and a few IDN ccTLDs
> (non-CJK).

I'd like to see stats on the most popular gTLDs, and most used IDN ccTLDs, and then ccTLDs in countries which have a script which might be mixed with Latin. You probably have better popularity data than I do, but TLDs like .com, .org, .net, .de, .ru, .in, .ir, .eu, .vn etc.

Gerv
(Reporter)

Comment 10

2 years ago
Do .de and .ru (IDN equivalent) allow scripts other than Latin and Cyrillic, respectively? If not, they're not affected by the proposed change.
(Reporter)

Comment 11

2 years ago
No domains are affected in .com (over a million), .net (230k), or .org (25k). So, Verisign has enforced their rules.
(Reporter)

Comment 12

2 years ago
The numbers inside the parentheses in the previous comment are IDN domain counts (not the total domain counts).
At this stage, I can't see a way of solving this issue in a general form other than this switch, so I think we should go ahead and do it - particularly as Chrome agree. I'm still recovering from an operation, so dveditz: can you get agreement from anyone else on our side who needs to agree (e.g. selena was in the loop last time we had a question about how we did this) and give the thumbs up? Thanks :-)

Gerv
Flags: needinfo?(dveditz)
Flags: needinfo?(dveditz)
We should just do this. I'll let Selena know but we know this is solving real spoofing issues.
Looks like this is as simple as changing the network.IDN.restriction_profile pref to "high" at
https://searchfox.org/mozilla-central/source/modules/libpref/init/all.js#1926

Is there something else we'd have to do?
Flags: needinfo?(jfkthame)
No, I believe that's all it takes.
Flags: needinfo?(jfkthame)
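
For anyone who wants to try either profile locally before or after the default changes, the same pref can be overridden from a profile's user.js. The Python sketch below only builds the override line. The helper name is hypothetical, and it accepts just the two values discussed in this bug ('high' and 'moderate').

```python
import json

PREF_NAME = "network.IDN.restriction_profile"
# Only the two values discussed in this bug; other values are out of scope here.
KNOWN_PROFILES = ("high", "moderate")


def user_pref_line(profile):
    """Build a user.js line that overrides the IDN restriction profile."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unexpected profile: {profile!r}")
    # json.dumps produces a double-quoted string literal that is valid in user.js.
    return f"user_pref({json.dumps(PREF_NAME)}, {json.dumps(profile)});"


if __name__ == "__main__":
    # Append the printed line to user.js in a test profile, then restart Firefox.
    print(user_pref_line("high"))
```

Passing "moderate" instead gives the line needed to go back to the old behaviour in a test profile.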
(Assignee)

Comment 19

2 years ago
(In reply to Daniel Veditz [:dveditz] from comment #15)
> We should just do this. I'll let Selena know but we know this is solving
> real spoofing issues.

Patch incoming.

Do we want to pursue doing this for 56? I assume for nightly and 57, we simply land the pref change in all.js and uplift, but for 56 we could do something if we thought it was important enough (potentially using SHIELD or a hotfix addon or whatever the current Best Method is). The chromium issue is a bit hard to follow, but it looks like they shipped this a while back, so presumably we don't need to wait...
Flags: needinfo?(dveditz)
(Assignee)

Comment 20

2 years ago
(In reply to :Gijs from comment #19)
> The chromium issue
> is a bit hard to follow, but it looks like they shipped this a while back,
> so presumably we don't need to wait...

Ah, no, I misread, my apologies. They're still using "moderate". I don't know what their shipping schedule is, and I don't have a handle on how urgent this is, either, so leaving ni for dveditz who can presumably help answer that. :-)
While you're there, can you please update the link to the definitions of the profiles on unicode.org a few lines above? It should now be http://www.unicode.org/reports/tr39/#Restriction_Level_Detection.
Flags: needinfo?(gijskruitbosch+bugs)
Comment on attachment 8913163 [details]
Bug 1399939 - switch to highly restrictive profile for IDN,

https://reviewboard.mozilla.org/r/184586/#review189726

LGTM, though it'd be nice to also fix up the comment as Simon suggested.
Attachment #8913163 - Flags: review?(jfkthame) → review+
(Assignee)

Comment 23

2 years ago
(In reply to Simon Montagu :smontagu from comment #21)
> While you're there, can you please update the link to the definitions of the
> profiles on unicode.org a few lines above? It should now be
> http://www.unicode.org/reports/tr39/#Restriction_Level_Detection.

Good point, fixed. I also used 'https', which seems to work even though there's no automatic redirect.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #20)
> I don't have a handle on how urgent this is, either

We certainly don't need to hotfix the already-shipping Firefox 56. Uplifting to Firefox 57 would be nice.
Flags: needinfo?(dveditz)
(Reporter)

Comment 26

2 years ago
Thank you for making a quick move. (btw, Gerv, hope you will recover before long. )

Chrome still uses 'strict', but a CL is up for review to make a switch. See the cr bug at the top of this bug. (it's public). 

It's not urgent because major gTLDs (e.g. net/org/com) and ccTLDs with IDN (India, Israel, Saudi Arabia, etc.) do not allow mixing of Latin and scripts other than CJK. That is, they're enforcing "strictly restrictive rules" on their ends.
(In reply to Jungshik Shin from comment #26)
> Chrome still uses 'strict',

I assume you actually mean "moderately-restrictive", and the plan is to switch _to_ strict?

Gerv
(Reporter)

Comment 28

2 years ago
Yes. Sorry for 'typos'  in comment 26.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Priority: -- → P1

Comment 29

2 years ago
mozreview-review
Comment on attachment 8913163 [details]
Bug 1399939 - switch to highly restrictive profile for IDN,

https://reviewboard.mozilla.org/r/184586/#review189964
Attachment #8913163 - Flags: review?(dveditz) → review+

Comment 30

2 years ago
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e1c1ebf60808
switch to highly restrictive profile for IDN, r=dveditz,jfkthame

Comment 31

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e1c1ebf60808
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox58: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
(Assignee)

Comment 32

2 years ago
Comment on attachment 8913163 [details]
Bug 1399939 - switch to highly restrictive profile for IDN,

Approval Request Comment
[Feature/Bug causing the regression]: n/a
[User impact if declined]: increased potential for spoofing using mixed scripts
[Is this code covered by automated tests?]: Yes, we have unit tests for the different ways we can process IDN, and they cover both the 'high' and 'moderate' values for this preference.
[Has the fix been verified in Nightly?]: not formally, but I've done a quick check just now, and it works.
[Needs manual test from QE? If yes, steps to reproduce]: not sure about 'need', but it may be useful. Steps:

1. Put a URL that mixes Latin script with, e.g., Devanagari in the URL bar, e.g. http://www.aहिन्दी.in/ , and hit enter
Expected: the location bar display changes to:
www.xn--a-tvdf1e4ai3g.in

Pre-patch: the location bar would continue to display the Devanagari characters.
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: no
[Why is the change risky/not risky?]: it's a pref flip. In the worst case, we can flip it back, but the best way to find out if this causes problems in the wild (despite the automated test coverage) is by exposing it to a wider audience on beta. Given that it's a pref flip, it's trivial to revert if we do find issues, even if it's late in the cycle.
[String changes made/needed]: nope
Attachment #8913163 - Flags: approval-mozilla-beta?
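
For reference when running the manual steps above, the Punycode (ACE) form of the test hostname can be reproduced with Python's built-in punycode codec. This is only a sketch: the helper names are hypothetical, and it skips the IDNA mapping and normalization steps a browser performs, which does not matter for this particular label.

```python
def to_ace(label):
    """Convert one label to its ASCII-compatible form (no IDNA mapping applied)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace(label):
    """Convert an xn-- label back to Unicode; other labels are returned unchanged."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def host_to_ace(host):
    """Apply to_ace to every dot-separated label of a hostname."""
    return ".".join(to_ace(part) for part in host.split("."))


if __name__ == "__main__":
    # Latin "a" followed by the Devanagari word from the steps, written with
    # escapes so the exact code points are unambiguous.
    mixed_host = "www.a\u0939\u093f\u0928\u094d\u0926\u0940.in"
    print(host_to_ace(mixed_host))
```

With the patch applied, the printed ACE form is what the location bar should show for the mixed-script label. A label that stays within one script should continue to display in Unicode.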
Comment on attachment 8913163 [details]
Bug 1399939 - switch to highly restrictive profile for IDN,

OK, let's try our luck, but if we find too many regressions, we should disable this before 57 goes to release.
Should be in 57b5
Attachment #8913163 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
status-firefox57: --- → affected

Comment 34

2 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/49cc08c809f7
status-firefox57: affected → fixed
Reproduced this bug using an affected Nightly build from 2017-09-14.

I can confirm that this issue is verified fixed on 57.0 (20171112125346) and 58.0b3 (20171114032831) across platforms: Win 10 x64, macOS 10.12.6 and Ubuntu 16.04 x64.
Status: RESOLVED → VERIFIED
status-firefox57: fixed → verified
status-firefox58: fixed → verified
Flags: qe-verify+