Open
Bug 1257275
Opened 9 years ago
Updated 3 years ago
Consider using ICU's uspoof API in isLabelSafe implementation?
Categories
(Core :: Internationalization, defect)
Tracking
()
NEW
People
(Reporter: jshin1987, Unassigned)
Details
This is an outreach to align the IDN display policy of Mozilla and Chrome as closely as possible.
Basically, Chrome's proposed IDN display policy [1] is very close to Mozilla's [2]. There are a few differences, but the percentage of domains affected by them is really low.
About 2,000 out of a million (~0.2%) in the .com TLD are filtered by Chrome's policy, and the vast majority of them (well over 90%) are blocked by the character set limit of [:IdentifierType=Allowed:] + [Aspirational Scripts] that Mozilla also uses. Some others are blocked by both Chrome and Firefox due to 1) ICU's ToUnicode conversion (e.g. BiDi check, leading combining mark, etc.) and 2) combining marks in sequence, number system mixing, etc.
Here are differences between the current Mozilla implementation and Chrome's proposed implementation:
1. Chrome uses the Unicode character property Script_Extensions instead of Script when detecting script mixing. There's already a bug (IIRC) on Mozilla to switch to Script_Extensions. So, I guess there's no disagreement.
2. When adding the 5 'aspirational scripts' to the allowed characters, we use the characters listed with Status/Type = Aspirational at http://www.unicode.org/Public/security/latest/xidmodifications.txt instead of adding every character whose 'script' property is one of the 5 scripts. If [:Script=Foo:] is added, some characters deemed unsuitable for identifiers are added as well.
3. Chrome also checks if a label is 'mixed script confusable' as described in UTS 39. This helps block cases like 'google' with Armenian Small Letter Oh, or labels made entirely of Katakana except for one Hiragana that looks like a Katakana. There are about a dozen of the second case out of ~ one million in the .com TLD (e.g. デリへル where へ is Hiragana).
The remaining ~ 25 labels (out of a million) filtered by this check are either "{Kanji, ASCII Latin} + Katakana Prolongation mark (U+30FC)" or "{ASCII Latin} + Katakana Middle Dot (U+30FB)".
I remember a discussion about whether Armenian should be prevented from mixing with Latin, because quite a number of Armenian letters look like Latin ones. Turning on this check makes it unnecessary to worry about Armenian + Latin.
4. The proposed Chrome policy does not allow mixing a non-Latin script with 'non-ASCII Latin'. (Only ASCII Latin is allowed to mix with non-Latin scripts other than Cyrl/Grek.) This is a precaution, but I haven't found any labels blocked by this yet. I tested against about 6 TLDs with almost 2 million domains total. Moreover, most (not all) IDN tables for non-Latin scripts, or for languages written in non-Latin scripts, only list ASCII Latin (or do not list Latin at all). So, this difference would have little, if any, practical impact.
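To make #1 and #4 a bit more concrete, here is a rough Python sketch using only the standard library. It is not Chrome's or Mozilla's code. The script of a character is guessed from the first word of its Unicode name, and the Script_Extensions table is a one-entry, hand-written stand-in for the real Unicode data, so all names here (SCRIPT_EXTENSIONS, script_set, common_scripts, mixes_non_ascii_latin) are hypothetical. An empty result from common_scripts only means that more than one script is involved; which combinations are then allowed is a separate policy step that is not shown.

```python
import unicodedata

# Hand-written stand-in for Script_Extensions data (illustrative only).
# U+30FC has Script=Common but Script_Extensions={Hiragana, Katakana}.
SCRIPT_EXTENSIONS = {
    "\u30fc": {"Hiragana", "Katakana"},
}

# Crude assumption: the first word of a character name identifies its script.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "ARMENIAN": "Armenian",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "CJK": "Han",
}


def script_set(ch):
    """Return the set of scripts for ch, or None for digits and hyphen."""
    if ch in SCRIPT_EXTENSIONS:
        return SCRIPT_EXTENSIONS[ch]
    if ch.isdigit() or ch == "-":
        return None  # treated as Common: compatible with every script
    prefix = unicodedata.name(ch, "").split(" ")[0]
    return {NAME_PREFIX_TO_SCRIPT.get(prefix, "Unknown")}


def common_scripts(label):
    """Intersect the per-character script sets (the idea behind #1).

    Returns None if the label has only Common characters, and an empty
    set if no single script covers every character.
    """
    result = None
    for ch in label:
        scripts = script_set(ch)
        if scripts is None:
            continue
        result = set(scripts) if result is None else result & scripts
    return result


def mixes_non_ascii_latin(label):
    """True if non-ASCII Latin appears together with a non-Latin script (#4)."""
    has_non_ascii_latin = False
    has_non_latin = False
    for ch in label:
        scripts = script_set(ch)
        if scripts is None:
            continue
        if scripts == {"Latin"}:
            if ord(ch) > 0x7F:
                has_non_ascii_latin = True
        else:
            has_non_latin = True
    return has_non_ascii_latin and has_non_latin
```

With the plain Script property, U+30FC is Common and would never count toward script mixing. With the extension set above, it resolves to Katakana next to Katakana letters and to Hiragana next to Hiragana letters, and it shares no script with ASCII Latin.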
All of these can be done by modifying the current implementation. For instance, the current code already refers to xidmodifications.txt so that it'd be easy to do #2 with a very small change.
On the other hand, #1 and #3 require a bit more change to the current code. An alternative to further changes in the current implementation [3] is to switch to ICU's uspoof API.
If ICU's uspoof API [4] can be used (this might be an issue, because Firefox does not use ICU in some platform/build configurations), implementing #1 and #3 would be simpler. Chrome's CL can be referenced if necessary.
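For readers who want to see the idea behind #3 without ICU, here is a rough Python sketch. It does not call uspoof, and it is not the UTS 39 algorithm in full. The LOOKALIKES table contains only the two examples mentioned in this bug (real checks use Unicode's confusables data), the script is again guessed from the first word of the character name, and the function names are hypothetical. A label is flagged when it uses more than one script but every character either belongs to one of those scripts or has a lookalike in it.

```python
import unicodedata

# Hand-picked lookalikes: character -> {script: lookalike in that script}.
# Illustrative only; real data comes from the UTS 39 confusables tables.
LOOKALIKES = {
    "\u0585": {"LATIN": "o"},          # ARMENIAN SMALL LETTER OH
    "\u3078": {"KATAKANA": "\u30d8"},  # HIRAGANA LETTER HE vs KATAKANA LETTER HE
}


def script_of(ch):
    """Crude assumption: first word of the Unicode name names the script."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def mixed_script_confusable(label):
    """True if a multi-script label could pass for a single-script one."""
    # Ignore ASCII digits and hyphen, which belong to no particular script.
    chars = [ch for ch in label if not (ch.isascii() and not ch.isalpha())]
    scripts = {script_of(ch) for ch in chars}
    if len(scripts) < 2:
        return False  # single-script labels are out of scope for this check
    for target in scripts:
        if all(
            script_of(ch) == target or target in LOOKALIKES.get(ch, {})
            for ch in chars
        ):
            return True
    return False
```

With this table, 'google' spelled with two Armenian Small Letter Oh characters and デリへル with Hiragana へ are both flagged, while a label that mixes Latin and Katakana without lookalikes is not.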
What would you say to this idea?
[1] https://codereview.chromium.org/1258813002/
[2] https://wiki.mozilla.org/IDN_Display_Algorithm
[3] nsIDNService::isLabelSafe implements the IDN display algorithm ( http://lxr.mozilla.org/mozilla-central/source/netwerk/dns/nsIDNService.cpp#785 ).
Comment 1•9 years ago
NIing bsmedberg for an update on our plans for use of ICU, and whether we are going to be using it everywhere at any point soon.
I have no objections to the principle of aligning Mozilla's and Chrome's implementations. My key principle for IDNs has been "if it works somewhere, it works everywhere" - this is true of Firefox, and to have it true of both Firefox and Chrome would be great :-) I just want to avoid user-locale-based or extra-network-request-based algorithms.
I have no objection to both implementations switching to uspoof if that is technically feasible and aids interoperability.
I think we should try hard to stick to documented algorithms - so doing a UTS 39 mixed script check seems better to me than arbitrarily switching off Armenian + Latin. But I am open to arguments.
Gerv
Flags: needinfo?(benjamin)
Comment 3•9 years ago
waldo: you fixed bug 1075758. Do you know what the plan is with ICU? If you don't, who does?
Gerv
Flags: needinfo?(jwalden+bmo)
Comment 4•9 years ago
We use ICU everywhere except Firefox for Android. Fennec product owners at last note (see bug 1215247) didn't like ICU/Intl's 3MB APK download size hit. Bug 1215247 has ideas for eliminating 3MB, but none lie in my wheelhouse, nor do I have time to implement them.
So I've taken the lazy approach. Every other browser's latest release has Intl support, so eventually the web (and complaints about inconsistent/inadequate behavior) will force them to ship Intl support. See, for example, bug 1217790. That's easier to me than fighting a battle I'd likely lose.
The last reliable news as to when Android might ship with ICU/Intl (bug 1215247 comment 24) is that Intl/ICU's been added to Fennec's "funnel items". Margaret can clarify what that means.
Alternatively, I heard a quasi-authoritative prediction, unconnected to the bug comment, that they might be forced to ship it sometime in the next few months. This is unreliable information for future planning. :-)
Is that "at any point soon"? ¯\_(ツ)_/¯ Probably not. But feel free to pile on the Android people more reasons to ship ICU/Intl, and add this to bug 1215247's dependencies -- maybe they'll change their minds faster.
Flags: needinfo?(jwalden+bmo) → needinfo?(margaret.leibovic)
Comment 5•9 years ago
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #4)
> We use ICU everywhere except Firefox for Android. Fennec product owners at
> last note (see bug 1215247) didn't like ICU/Intl's 3MB APK download size
> hit. Bug 1215247 has ideas for eliminating 3MB, but none lie in my
> wheelhouse, nor do I have time to implement them.
>
> So I've taken the lazy approach. Every other browser's latest release has
> Intl support, so eventually the web (and complaints about
> inconsistent/inadequate behavior) will force them to ship Intl support.
> See, for example, bug 1217790. That's easier to me than fighting a battle
> I'd likely lose.
>
> The last reliable news as to when Android might ship with ICU/Intl (bug
> 1215247 comment 24) is that Intl/ICU's been added to Fennec's "funnel
> items". Margaret can clarify what that means.
We created a card for this in our Aha roadmap, and we discussed it at our team's weekly funnel meeting, where we discuss and prioritize feature ideas. Here's a link to the card about the issue:
https://mozilla.aha.io/features/FENN-224
Let me know if you need access to view that card, but basically it says we're still blocked on the APK size increase.
> Alternatively, I heard a quasi-authoritative prediction, unconnected to the
> bug comment, that they might be forced to ship it sometime in the next few
> months. This is unreliable information for future planning. :-)
>
> Is that "at any point soon"? ¯\_(ツ)_/¯ Probably not. But feel free to
> pile on the Android people more reasons to ship ICU/Intl, and add this to
> bug 1215247's dependencies -- maybe they'll change their minds faster.
Yes, we need more reasons to ship it before we're willing to take the APK size increase. I'm happy for people to add to the list.
In the meantime, if developers who really want us to ship this feature want to make it happen sooner, it would be great if they could help us find ways to reduce our APK size. Although we've exhausted a lot of our options for quick wins, a bunch of small wins could add up. libxul.so is by far the largest component of our APK.
Flags: needinfo?(margaret.leibovic)
Updated•3 years ago
Severity: normal → S3