286534 - Implement IDN punycode display by .tld

Reporter

Description

•

21 years ago

We need to replace the network.IDN_show_punycode pref with something more fine-grained, based on the TLD of the domain. This allows us to enable/disable correct display of IDNs in individual TLDs based on whether or not they have good homograph-protection policies in place. I suggest a pref branch, so: network.IDN.disable.com network.IDN.disable.net ... Then, we can assess a particular URL with a single pref lookup. (The only way this might one day run into trouble is if we get non-ASCII characters in TLDs themselves. How likely is that?) Gerv

Neil Harris

Comment 1

•

21 years ago

Actually, it's quite likely in the long run. But then, you cound probably do something like network.IDN.disable.xn--asdfghjk ... does this mean that pref names are ASCII-only?

Michiel van Leeuwen (email: mvl+moz@)

Comment 2

•

21 years ago

We have too many hostname based lookups. In theory, nsIPermissionManager could do this as well, but those setting aren't this easy to update from xpi's

Jungshik Shin

Assignee

Comment 3

•

21 years ago

*** Bug 290275 has been marked as a duplicate of this bug. ***

Jungshik Shin

Assignee

Comment 4

•

21 years ago

The following was taken from bug 290275 (assuming this bug is not only about blacklisting), All IDN urls are currently displayed as Punycode by default. Whitelist of safe TLDs like Opera is needed to resolve this IDN problem more smartly. http://www.opera.com/windows/changelogs/800b2/ I suggest new pref like this: pref("network.IDN_TLDs_whitelist", "no, jp, de, se, kr, tw, cn, at, dk, ch, li"); If IDN url matches whitelist, the url should be displayed as Unicode whether "network.IDN_show_punycode" pref is true or false.

Erik van der Poel

Comment 5

•

21 years ago

Opera 8.0 beta 3 allows the user to switch from a whitelist to a blacklist: http://www.opera.com/windows/changelogs/800b3/ How about using the same format in Mozilla? E.g.: user_pref("network.IDN_TLD", ":no:jp:de:se:kr:tw:cn:at:dk:ch:li:museum:hu:"); user_pref("network.IDN_TLD", "~:com:"); This allows users of both Mozilla and Opera to copy and paste their preferences from one product to the other.

Gervase Markham [:gerv]

Reporter

Comment 6

•

21 years ago

> This allows users of both Mozilla and Opera to copy and paste their preferences > from one product to the other. And who apart from you is ever going to do that? :-) This bug is currently being worked on by Neil and Lorenzo, with me providing input. The current design calls for a pref tree (given that this sort of thing is what they were invented for), allowing a zero-parsing, single-pref-lookup solution: network.idn.whitelist.com network.idn.whitelist.org ... This also allows users and extensions to easily set or unset a single TLD without worrying about the others. Gerv

Erik van der Poel

Comment 7

•

21 years ago

> And who apart from you is ever going to do that? :-) :-) > network.idn.whitelist.com Is this a boolean pref? (I assume network.idn.blacklist.* is not planned?)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 8

•

21 years ago

I think that the idea of comment 5 is better than the idea of comment 6. Because the idea is more simple and better for performance. And the idea of comment 5 doesn't need to parse the value of pref too.

Gervase Markham [:gerv]

Reporter

Comment 9

•

21 years ago

Erik: yes, they are boolean prefs. We could have a whitelist or a blacklist; having both doesn't make much sense. I argued for a blacklist, but I think it's more important that we do something which is consistent with the stance of the other browser manufacturers. Masayuki: both ideas require a single pref lookup; Mozilla has support for things called "pref trees" which allow you to get a handle on e.g. network.idn.whitelist and then look up relative prefs. The "all in one string" idea requires extra parsing. I agree the performance difference probably isn't big - the key requirements are ease of updating and maintenance. If it's all one string, how do you suggest the Foundation publish updated whitelists to people who have modified their whitelist? In the pref branch plan, it's easy - the new prefs.js is shipped, and everything automatically works. Gerv

Gervase Markham [:gerv]

Reporter

Comment 10

•

21 years ago

Nominating for 1.8b3. We need this for Firefox 1.1. Gerv

Blocks: 290872

Flags: blocking1.8b3?

Asa Dotzler [:asa]

Updated

•

21 years ago

Flags: blocking1.8b3?

Flags: blocking1.8b3+

Flags: blocking-aviary1.1+

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 11

•

21 years ago

Gerv: Who fix this bug? If nobody can create the patch, I try it.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

21 years ago

Depends on: 283016

Gervase Markham [:gerv]

Reporter

Comment 12

•

21 years ago

Masayuki: thank you for your kind offer. Two guys are supposed to be working on it; I'll give them a ping and make sure they assign this bug to themselves. Gerv

Lorenzo Colitti

Comment 13

•

21 years ago

For a possible implementation, see bug 279099 comment 230. I'm working on this, but I probably won't have time to do much until next week. If someone else thinks they can have a patch ready before then, feel free to step up and do it.

Status: NEW → ASSIGNED

chris hofmann

Comment 14

•

21 years ago

> ------- Additional Comment #13 From Lorenzo Colitti 2005-04-19 16:21 PDT [reply] ------- > >For a possible implementation, see bug 279099 comment 230. I'm working on this, >but I probably won't have time to do much until next week. If someone else >thinks they can have a patch ready before then, feel free to step up and do it. Lorenzo, did you get a chance to work on this... jshin, darin and I thought you would be a good person to dedicate some time to this bug so it doesn't fall thought the cracks. can you help out with coming up with a fix or talking a look at the proposals made by lorenzo or others? thanks

Assignee: darin → jshin1987

Status: ASSIGNED → NEW

Jungshik Shin

Assignee

Comment 15

•

21 years ago

Ok. I'll try to drive this. Lorenzo, have you had a chance to work on this since? I'll begin my work later today.

Jungshik Shin

Assignee

Comment 16

•

21 years ago

Attached patch 1st shot (obsolete) — Details — Splinter Review

I introduced 'network.IDN.safe.TLD' (the name is tentative). For TLDs with 'network.IDN.safe.TLD set to true, we show IDNs in UTF-8 rather than in punycode. If pref. is not defined for a TLD or set to false, we use punycode. These preference entries are only referred to when network.IDN_show_punycode' is 'false'. If it's set to true, we always use punycode for all TLDs regardless of the values of 'network.IDN.safe.TLD'. I'm not sure if I need caching and/or adding observer(s). darin, what would you say?

Jungshik Shin

Assignee

Updated

•

21 years ago

Status: NEW → ASSIGNED

Jungshik Shin

Assignee

Comment 17

•

21 years ago

If we decide to support both whitelist and blacklist (not at the same time), 'network.IDN.show_punycode' (with an appropriate name change) can be used to indicate which to use, blacklist or whitelist. We can do as follows: if ((whitelist && not explicitly safe) || (blacklist && explicitly set unsafe)) use punycode else use UTF-8 Needless to say, we can only support blacklist if it's decided that it's better.

Jungshik Shin

Assignee

Comment 18

•

21 years ago

any comment, good or bad?

Erik van der Poel

Comment 19

•

21 years ago

Jungshik, it's great that you're doing this work. I won't attempt to answer all of your questions, but Gerv said that he would like to adopt the same stance as other browser vendors, and Opera has chosen the whitelist approach (by default), so Mozilla should too. I don't think we can allow the user to switch to a blacklist since Gerv would like it to be possible to update the whitelist from the mozilla.org side. E.g. additional pref entries for whitelisted domains. Gerv, is Mozilla going to adopt Opera's whitelist?

Gervase Markham [:gerv]

Reporter

Comment 20

•

21 years ago

We're planning to adopt Opera's whitelist unless either someone comes to me with a good reason why we should remove one of their entries, or I have time to do some in-depth research myself. So yes, we do not want to allow switching between whitelist and blacklist. This is a security feature, and should have no UI. I would prefer the pref stub to be "network.IDN.whitelist.". Other than that, this patch looks pretty good in terms of function. Gerv

Lorenzo Colitti

Comment 21

•

21 years ago

Jungshik: sorry for not replying earlier, but I wasn't CC'd on this bug. The patch looks like what I wanted to do, except it's better written. :) I have a couple of questions though: 1. You use PromiseFlatCString(host).get() to convert the hostname to ASCII. Is this safe? And if any Unicode TLDs are ever assigned, it will need to be changed. 2. Why not add another pref observer and put the code that loads the pref into PrefsChanged()? If I understand the code correctly (but maybe I don't) that would be simple and would give you the benefit of being able to react to pref changes.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 22

•

21 years ago

(In reply to comment #21) > 1. You use PromiseFlatCString(host).get() to convert the hostname to ASCII. Is > this safe? And if any Unicode TLDs are ever assigned, it will need to be changed. That function does not convert to ASCII... it essentially just allows you to call .get().

Darin Fisher

Comment 23

•

21 years ago

I think we should consider using nsIPermissionManager for hosts the user wants to explicitly allow or explicitly block. The preferences systems is a bad choice for storing this information since it is affected by changes to the default prefs. IMO, we should use a non-pref backend for storing the information we download. Perhaps we could store a file in the profile, and fetch it periodically (using a conditional HTTP query). We could look at extending the permission manager to aggregate multiple data files, or we could layer something on top of the permission manager.

Gervase Markham [:gerv]

Reporter

Comment 24

•

21 years ago

> The preferences systems is a bad choice for storing this information since it is > affected by changes to the default prefs. IMO, that's what makes it a good choice. If the Turks and Caicos Islands come out with appropriate policies on IDN, the next update we ship needs to turn on IDN for that TLD for everyone who has not explicitly disabled it in the past. Changing the default prefs achieves exactly that, doesn't it? Gerv

Jungshik Shin

Assignee

Comment 25

•

21 years ago

(In reply to comment #23) > I think we should consider using nsIPermissionManager for hosts the user wants > to explicitly allow or explicitly block. 'Per host' en/disabling is beyond the scope of this bug even if it's desirable.

Gervase Markham [:gerv]

Reporter

Comment 26

•

21 years ago

Jungshik: any chance of an updated patch with the tweak I requested? I'll then review it and we can chase SR. Gerv

Jungshik Shin

Assignee

Comment 27

•

21 years ago

Attached patch patch update (with the pref. name change) (obsolete) — Details — Splinter Review

I changed the name of the pref. branch to network.IDN.whitelist as suggested by gerv.

Attachment #184849 - Flags: review?(gerv)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 28

•

21 years ago

jshin: Your patch is not valid file. Please check it.

Jungshik Shin

Assignee

Comment 29

•

21 years ago

Attached patch patch update — Details — Splinter Review

ooops. thanks for pointing that out. this is 'the' real thing.

Attachment #182326 - Attachment is obsolete: true

Attachment #184849 - Attachment is obsolete: true

Attachment #184858 - Flags: review?(gerv)

Jungshik Shin

Assignee

Updated

•

21 years ago

Attachment #184849 - Flags: review?(gerv)

Jo Hermans

Comment 30

•

21 years ago

Why is only de, jp and kr included in the patch ? What about no, se, tw, cn, at, dk, ch, and li (rest of the Opera domains) ?

Gervase Markham [:gerv]

Reporter

Comment 31

•

20 years ago

Comment on attachment 184858 [details] [diff] [review] patch update r=gerv. I note that this patch doesn't include all the domains. That's fine - we can do a separate patch adding (or removing) domains later. This is certainly not a reason to hold this up. Gerv

Attachment #184858 - Flags: superreview?(darin)

Attachment #184858 - Flags: review?(gerv)

Attachment #184858 - Flags: review+

Darin Fisher

Comment 32

•

20 years ago

Comment on attachment 184858 [details] [diff] [review] patch update >Index: netwerk/base/src/nsStandardURL.cpp >+ >+ nsCOMPtr<nsIPrefService> prefs( do_GetService(NS_PREFSERVICE_CONTRACTID) ); >+ if (prefs) { >+ nsCOMPtr<nsIPrefBranch> branch; >+ if (NS_SUCCEEDED(prefs->GetBranch( NS_NET_PREF_IDNWHITELIST, >+ getter_AddRefs(branch) ))) >+ NS_ADDREF(gIdnWhitelistPrefBranch = branch); > } hmm... can't you just QI prefBranch to nsIPrefService instead of calling GetService again? > NS_IF_RELEASE(gIDN); > NS_IF_RELEASE(gCharsetMgr); >+ NS_IF_RELEASE(gIdnWhitelistPrefBranch); I think it'd be good to stick with "IDN" instead of using "Idn" here. >+ if (gShowPunycode || !IsInWhitelist(PromiseFlatCString(host).get())) { Can you make it so that IsInWhitelist does not need a null terminated host string? How about using the string API all the way down into IsInWhitelist? > static nsIIDNService *gIDN; >+ static nsIPrefBranch *gIdnWhitelistPrefBranch; > static nsICharsetConverterManager *gCharsetMgr; > static PRBool gInitialized; > static PRBool gEscapeUTF8; > static PRBool gAlwaysEncodeInUTF8; > static PRBool gShowPunycode; nit: please list these variables in the same order in the cpp file as they are listed here. general comment: so, we anticipate using the update service to add or remove TLDs periodically? and just bail on trying to deal with the problems associated with user-set preferences matching a changed default pref value, thus resulting in the user-set value being cleared?

Attachment #184858 - Flags: superreview?(darin) → superreview+

Gervase Markham [:gerv]

Reporter

Comment 33

•

20 years ago

(I appear to remember having this conversation before, but I may need enlightening again.) Regarding Darin's general comment: have I misunderstood how default prefs work? Surely user-set prefs override default prefs? That's why they are called "default"? Gerv

Jungshik Shin

Assignee

Comment 34

•

20 years ago

Comment on attachment 184858 [details] [diff] [review] patch update asking for a. I made sure this works well. We for sure want this feature in 1.1 Darin's comments were addressed in my tree. As for IsInWhitelist, I tried a few alternatives and ended up doing this. At one point, I need a null-terminated string because GetBoolPref needs 'char *'. Because |host| is usually less than 64 bytes, this seems to be a good way to avoid the memory allocation. if (gIDNWhitelistPrefBranch && (pos = nsCAutoString(host).RFind(".")) != kNotFound && NS_SUCCEEDED(gIDNWhitelistPrefBranch-> GetBoolPref(nsCAutoString(Substring(host, pos + 1)).get(), &safe))) return safe;

Attachment #184858 - Flags: approval-aviary1.1a2?

Asa Dotzler [:asa]

Updated

•

20 years ago

Attachment #184858 - Flags: approval-aviary1.1a2? → approval-aviary1.1a2+

Jungshik Shin

Assignee

Comment 35

•

20 years ago

fix checked in. I added no, se, tw, cn, at,dk, ch, and li to the pref. file.

Status: ASSIGNED → RESOLVED

Closed: 20 years ago

Resolution: --- → FIXED

Asaf Romano (gone)

Comment 36

•

20 years ago

Jungshik, any reason we're still blocking hebrew and arabic tld(s)?

Jungshik Shin

Assignee

Comment 37

•

20 years ago

It's not about scripts/languages but about TLDs. As Gerv wrote in comment #31, we have to deal with that in a separate 'bug'. Anyway, basically what we did is the same as what Opera did: ( http://www.opera.com/windows/changelogs/800b2/) # TLDs are considered safe if they have implemented anti-homographic character policies or otherwise limited the available set of characters to prevent spoofing. # Current whitelist contains the following top-level domains: no, jp, de, se, kr, tw, cn, at, dk, ch, and li

bugzilla

Comment 38

•

20 years ago

> what we did is the same as what Opera did: > (http://www.opera.com/windows/changelogs/800b2/) > > Current whitelist contains the following top-level domains: > no, jp, de, se, kr, tw, cn, at, dk, ch, and li The cited document is incorrect. Starting several months ago, the whitelist in the Opera distribution package has been: IDNA White List=:no:jp:de:se:kr:tw:cn:at:dk:ch:li:museum:hu:

bugzilla

Comment 39

•

20 years ago

> > what we did is the same as what Opera did: > > (http://www.opera.com/windows/changelogs/800b2/) From the changelog in http://www.opera.com/windows/changelogs/800b3/ : Improvement to IDNA handling. * Script detection now allows more IDNs to be displayed in Unicode. Only a single script and specific combinations are allowed in each label (separated by "." or "-"). * Added support for switching from IDN whitelist to blacklist by adding a '~' as the first character in the string, that is, "~:com:tw:" * Added .hu and .museum to list of trusted domains.

Gervase Markham [:gerv]

Reporter

Comment 40

•

20 years ago

Please don't make any changes to the list until I've had a chance to review the current one. If you think a TLD has the appropriate policies in place and should be added, please send me an email with URLs proving that this is the case. Gerv

Gervase Markham [:gerv]

Reporter

Comment 41

•

20 years ago

OK. I've reviewed the existing list, and also .museum and .hu. Here's the result: http://www.mozilla.org/projects/security/tld-idn-policy-list.html I've filed bug 299927 on getting those last two added to the list. Gerv

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

20 years ago

Blocks: 315411

1st shot 21 years ago Jungshik Shin 7.40 KB, patch		Details \| Diff \| Splinter Review
patch update (with the pref. name change) 21 years ago Jungshik Shin 214 bytes, patch		Details \| Diff \| Splinter Review
patch update 21 years ago Jungshik Shin 7.47 KB, patch	gerv : review+ darin.moz : superreview+ asa : approval-aviary1.1a2+	Details \| Diff \| Splinter Review