Closed Bug 479520 (IDNA2008) Opened 15 years ago Closed 9 years ago

Implement IDNA2008 and Unicode UTS #46

Categories: Core :: Networking (Type: enhancement; Priority: Not set; Severity: normal)
Tracking: RESOLVED FIXED (mozilla44); Tracking Status: firefox44 fixed
People: Reporter: dveditz; Assigned: smontagu; NeedInfo
References: Blocks 1 open bug
Details: Keywords: arch, sec-want; Whiteboard: [sg:want?][adv-main44-]
Attachments: 5 files, 1 obsolete file

Our current support for Internationalized Domain Names (IDN) follows a set of RFCs collectively known as "IDNA2003", based on an earlier version of Unicode. An updated version known as IDNA2008 is working its way toward the standards track, in particular taking into account the many new scripts and characters added to the Unicode standard in the past five years.

We should investigate implementing the new standards.

http://tools.ietf.org/wg/idnabis/
http://www.ietf.org/html.charters/idnabis-charter.html

See bug 479336 for examples of problem characters not covered by the earlier standard.
Whiteboard: [sg:want?]
Blocks: 316730
The IDNA2008 RFCs have now been released:

Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework -- http://tools.ietf.org/html/rfc5890
Internationalized Domain Names in Applications (IDNA) Protocol -- http://tools.ietf.org/html/rfc5891
The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) -- http://tools.ietf.org/html/rfc5892
Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) -- http://tools.ietf.org/html/rfc5893

There is also an informative document:
Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale -- http://tools.ietf.org/html/rfc5894
Alias: IDNA2008
Assignee: nobody → smontagu
The .de NIC (denic.de) will implement IDNA2008 from 2010-11-16 onwards, especially allowing for ß (\u00df) in domain names. Hence, the automatic translation of ß to ss may result in looking up the wrong domain name, allowing for spoofing attacks.
(DENIC will run a sunrise period (2010-10-26 to 2010-11-15) during which holders of domains with ss will be allowed to register the respective ß domain in advance.)

http://www.denic.de/en/domains/internationalized-domain-names/sharp-s.html
That's a good example of how IDNA2003 / IDNA2008 incompatibilities could let through spoofed domains. 

I'm not sure how this could be resolved. 

Fixing this at the browser, for example by marking TLDs with a flag to determine which version of IDNA they are using, seems fraught with potential problems, particularly during any phase where both types of labels are supported, and won't solve the entire problem. 

Should we instead be requiring TLD registries that are transitioning to IDNA2008 to enforce antispoofing measures that prohibit them indefinitely from issuing any domains that would allow spoofing risks to anyone but the owners of one of the spoofable domains, in either direction, as a condition of turning on IDN display in Firefox?

Perhaps we should be doing both?
This doesn't require any "fixing" at all, other than upgrading Firefox to IDNA2008 as soon as we can.

This issue is a known incompatibility between IDNA2003 and IDNA2008 (one of a small handful). In order to avoid spoofing, registries need to make sure that the "ß" and "ss" versions of a domain are registered to the same person. And that's what DENIC is trying to do, to an extent, by having a sunrise period.

If I were them, I would have implemented perpetual bundling of the two variants. They have chosen not to do that, which does leave registrants who are unaware of this issue at risk, if they register one variant and not the other. However, the fault for this lies squarely with DENIC, not with us.

I think Neil is right that we should enforce bundling/blocking of ß and ss as a condition of turning on IDN support. However, it's a separate question as to whether we should _disable_ IDN support for registries which are already enabled, and then implement an upgrade plan which does not involve perpetual bundling/blocking (as DENIC have done). Germans expect IDN to work in Firefox now. I'm not sure this is a big enough issue to justify turning it off.

Gerv
(In reply to comment #4)
> This doesn't require any "fixing" at all, other than upgrading Firefox to
> IDNA2008 as soon as we can.

Yes.

> If I were them, I would have implemented perpetual bundling of the two
> variants.

That would not be feasible, because ß and ss are not exchangeable in German; ss instead of ß is just a makeshift. Germans expect ß to simply work wherever umlauts work (and those have worked for a while).
Karsten: yes, thank you, another contributor has helpfully clued me in on the details of German orthography. Let's leave DENIC to get on with it, and focus on implementing IDNA2008.

Gerv
smontagu: is this on your radar? Do you have any estimate as to when it might happen?

Thanks :-)

Gerv
Any news on this issue?
Is there a legal issue if this doesn't get implemented in a set timeframe? Visiting meßdienst.de opens messdienst.de, the page of a competitor.
This is on my worklist for this quarter.
Status: NEW → ASSIGNED
Just to let you know, the state of the implementation of IDNA 2008 in browsers is currently being discussed on the Unicode mailing list.
(In reply to Neil Harris from comment #17)
> Just to let you know, the state of the implementation of IDNA 2008 in
> browsers is currently being discussed on the Unicode mailing list.

Where/How can we follow the discussion?
To subscribe to the mailing lists:

http://unicode.org/consortium/distlist.html

For the mailing list archive:

http://www.unicode.org/mail-arch/
For the record, the following documents and resources shall/may be considered for the implementation:

* Unicode Technical Standard #46, Unicode IDNA Compatibility Processing
http://unicode.org/reports/tr46/

* IDNA2010 special interest group, IUCG
http://iucg.org/wiki/IDNA2010
(Adding chofmann@mozilla.com per his request although he possibly already is as chofmann@gmail.com?!) [mozilla.de meeting Cologne 2011]
Not sure if folks are aware, but JPRS has released idnkit2 with IDNA2008 support, which could potentially be used to implement the base protocol support: http://jprs.co.jp/idn/index-e.html
(In reply to Simon Montagu from comment #13)
> This is on my worklist for this quarter.

Simon, any updates on when you will be able to get to this?

Is it pretty well understood exactly what to do or are there more details to work out?
We have a problem with the licence for idnkit2 :-| Legal are working on it.

Gerv
Opera 11.6 implemented IDNA, so why not Firefox?
IDN is implemented in Firefox. This bug is about implementing IDNA2008, which is an updated version of the standard with various changes. 

Gerv
Yes, that's the title and the theme of this discussion, and Opera 11.6 has implemented IDNA 2008!
Blocks: 728180
(In reply to Gervase Markham [:gerv] from comment #24)
> We have a problem with the licence for idnkit2 :-| Legal are working on it.

Any updates in the last 9 months?
I am pleased to say that JPRS have taken our feedback into account and have proposed a new license for IDNKit2 which addresses our concerns. They say:

"We are accepting your questions and now developing idnkit-2.2 which 
includes new LICENCE with version number (1.1) and new features corresponding 
to RFC 6452. We think we can release idnkit-2.2 by the middle of October."

I suggest that this bug can now proceed at full speed. We should develop a patch using the existing version of IDNKit, and then we can adapt it to the new one when it is released, and check in the result. (I have asked if they are planning any significant incompatibilities.)

Gerv
I received the following from Yoshiro YONEYA. This tells us where to get the newly-licensed code. There is now no obstacle to implementing this support.


Dear all,

I'm very pleased to announce that JPRS has released idnkit-2.2 which is
an implementation of IDNA2008, and it provides features IDN encoding 
conversion tool and APIs for application software.

The idnkit-2.2 and its additional packages are available from following URL:
<http://jprs.co.jp/idn/index-e.html>

Major changes since idnkit-2.1 release are:

- Licence update
  + Added license version (version 1.1)
  + Modified article 6 and article 7 of the license to be more advantageous 
    to the end users
  + Updated year of the copyright notice
- Correspond to RFC publication
  + IDNA Table version was updated according to RFC 6452 publication
  + Reference revision of UTS#46 was updated (3->5) according to Unicode 
    6.0.0 correspondence

I hope that the idnkit-2.2 is useful for IDN zone administrators, IDN site 
administrators, IDN-aware application developers, and I18N related protocol 
designers.

Please feel free to give your comments about the idnkit-2.2 to following 
e-mail address:
<mailto:idnkit-info@jprs.co.jp>

Regards,

-- Yoshiro YONEYA <yoshiro.yoneya@jprs.co.jp>
FWIW, I recommend reading: http://www.alvestrand.no/pipermail/idna-update/2012-November/date.html "Updating RFC 5890-5893 (IDNA 2008) to Full Standard" 

IDNA2008 is not backwards compatible. And Opera's implementation is at best a superset. I would prefer if there was some cross-browser agreement before moving forward here (if moving forward is at all a good idea).
(In reply to Anne van Kesteren from comment #31)
> IDNA2008 is not backwards compatible. And Opera's implementation is at best
> a superset. I would prefer if there was some cross-browser agreement before
> moving forward here (if moving forward is at all a good idea).

I'm not sure where "forward" is, but leaving Mozilla in a state where you can't fully access the domain space of the second-largest TLD (due to the ß encoding change in IDNA 2008) won't be a good idea at all.
Anne: the registries, such as the German and Greek registries, who asked for the changes to the handling of Eszett and final-form Sigma in IDNA 2008 clearly did so because they thought that the problems which would arise from people implementing a backwardly-incompatible standard were less bad than the problems which would continue from not being able to use those characters in the way that people wanted. If they didn't think that, they would have kept the old behaviour. Therefore, my default view is that we should take them at their word and simply ship IDNA 2008.

Having said that, I believe that at least one group (the Unicode people? Or some people who have published an RFC?) have attempted to define a "more compatible" way of doing the implementation. I'm sure Simon, or whoever implements this bug, will be looking at the options there.

Gerv
Hey there,

I'm just a user and one of those guys that reported this "bug". I'm deeply impressed by how thoroughly things are debated and weighed here. On the other hand, the process is very slow and somewhat annoying :)

I reported it because in Germany we have the "ß", which can be, and for a long time has been, replaced by other letters (e.g. "ss" or "sz"), but those are not the same as the "ß". So IMHO the browser is doing some kind of translation/guessing that should be left to the user. Why does it not just go to whatever character I entered? I hope you implement IDNA 2008 soon.

And I thank you and all the other devs for the work you put into Mozilla products :) I'm a big fan :)

Markus
Gerv, 1) this affects way more than the mapping of two code points 2) even then there's phishing potential as DENIC has not implemented variant blocking http://www.unicode.org/mail-arch/unicode-ml/y2011-m07/0036.html 3) If one browser implements UTS #46, another some kind of made-up superset of IDNA2008, and the rest remains with IDNA2003 with updated Unicode, I'm not sure we're really helping move the web forward which is why I asked to get some cross-browser plan on this first.
I don't understand the hesitation. DENIC has allowed registering domains with "ß" for two years now, so for Germany the decision to use IDNA2008 was taken long ago. As an example, entering "http://heiko.daßow.de/" opens "http://heiko.dassow.de/", which is registered to a totally different person. The page actually registered at DENIC is "http://heiko.xn--daow-wna.de/". When we are talking about security issues, opening the wrong domain seems to be a severe issue.
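The heiko.daßow.de case can be reproduced with Python's standard library, whose `idna` codec implements IDNA2003 (RFC 3490 with nameprep) and whose `punycode` codec implements RFC 3492. This is an illustration under stated assumptions, not Firefox's code: `idna2003_label` and `idna2008_like_label` are hypothetical helpers, and the latter assumes the label is already lowercase, NFC-normalized and valid under IDNA2008, so it performs only the Punycode step and none of the IDNA2008 validity checks.

```python
# Compare one label under IDNA2003 and under an IDNA2008-style conversion.
# Python's built-in "idna" codec implements IDNA2003 (RFC 3490); its nameprep
# step case-folds "ß" (U+00DF) to "ss".


def idna2003_label(label: str) -> str:
    """IDNA2003 ToASCII of a single label, using Python's built-in codec."""
    return label.encode("idna").decode("ascii")


def idna2008_like_label(label: str) -> str:
    """Hypothetical IDNA2008-style A-label: keep "ß" and Punycode the label.

    Assumes the label is already lowercase, NFC-normalized and valid under
    IDNA2008; none of the IDNA2008 validity checks are performed here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


for label in ["daßow", "meßdienst", "amt-golßener-land"]:
    print(f"{label}: IDNA2003 -> {idna2003_label(label)}, "
          f"IDNA2008-style -> {idna2008_like_label(label)}")
```

Under IDNA2003 the lookup for daßow silently becomes dassow, a different registrant's name, while Punycoding the unmapped label gives the xn--daow-wna form DENIC registered (and likewise xn--medienst-rya and xn--amt-golener-land-mlb for the other ß domains mentioned in this bug).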
Blocks: IDN
Blocks: 853226
Note: .be (Belgian TLD) will also start to use IDNA2008, see <http://www.dns.be/en/idn>. But for compatibility with IDNA2003 browsers, they ask every website that wants to use a ß to register the ss variant too. It's not enforced, though.
Blocks: 868201
No longer blocks: 868201
Blocks: 883796
FWIW, the URL standard mandates IDNA2003 instead of IDNA2008 for compatibility reasons:

> Using the latest version of Unicode as well as IDNA2003 rather than IDNA2008 are willful violations, to be compatible with widely deployed clients.

See <http://url.spec.whatwg.org/#idna>.
Reported 2009!
As the owner of a domain that Firefox can't reach, I am shocked!
(In reply to Mathias Bynens from comment #40)
> FWIW, the URL standard mandates IDNA2003 instead of IDNA2008 for
> compatibility reasons:
> 
> > Using the latest version of Unicode as well as IDNA2003 rather than IDNA2008 are willful violations, to be compatible with widely deployed clients.
> 
> See <http://url.spec.whatwg.org/#idna>.

That's a bit of a circular dependency, since the WHATWG standard only kludges things that way to keep in sync with "widely deployed clients" -- if the "widely deployed clients", of which Firefox is one of the big three, move to IDNA2008, WHATWG will surely eventually follow.
OK. Although I don't know when smontagu or someone else will have time to implement this, it seems wise to spec out exactly what we are going to implement. Here is a proposal.

We should implement IDNA2008. IDNA2008 permits a "local mapping" phase. We should use the local mapping defined in TR46 - http://unicode.org/reports/tr46/ - which maintains maximum compatibility with IDNA2003. TR46 has a number of options. I believe we should choose as follows:

- Do NOT take the option to allow transitional processing of symbols and punctuation.
- Use Nontransitional Processing for the 4 Deviations - i.e. straight IDNA2008 behaviour. 
  (The rationale for this is in comment 4.)
- UseSTD3ASCIIRules=true (I believe this is what we do now)

I believe this means we will be conforming to conformance clause C2.

I suspect we will want to implement an algorithm like in section 6 of TR46 rather than import a whole mapping table, but that's an implementation detail. If we do import the table, we will need to take a new copy of the mapping table for each new version of Unicode that is released. 
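A rough sketch of the UTS #46 processing steps (map, normalize to NFC, break into labels, convert) with the options proposed above: Nontransitional handling of the deviations by default and no STD3 ASCII check. Everything here is an assumption for illustration: `MAPPING_EXCERPT` is a hypothetical six-entry stand-in for the full mapping table (which, as noted, would need refreshing for each Unicode version), uppercase handling is approximated with `str.lower()`, and the label validity checks of the conversion step are omitted.

```python
import unicodedata

# Hypothetical six-entry excerpt of a UTS #46-style mapping table:
# character -> (status, mapping). A real implementation would load the full
# IdnaMappingTable.txt published for its Unicode version.
MAPPING_EXCERPT = {
    "\u00df": ("deviation", "ss"),      # LATIN SMALL LETTER SHARP S
    "\u03c2": ("deviation", "\u03c3"),  # GREEK SMALL LETTER FINAL SIGMA -> SIGMA
    "\u200c": ("deviation", ""),        # ZERO WIDTH NON-JOINER
    "\u200d": ("deviation", ""),        # ZERO WIDTH JOINER
    "\u00ad": ("ignored", ""),          # SOFT HYPHEN
    "\u3002": ("mapped", "."),          # IDEOGRAPHIC FULL STOP -> FULL STOP
}


def map_step(text: str, transitional: bool) -> str:
    """Step 1 (Map): apply the table; deviations are mapped only if transitional."""
    out = []
    for ch in text:
        status, mapping = MAPPING_EXCERPT.get(ch, ("valid", ch))
        if status == "ignored":
            continue
        if status == "mapped" or (status == "deviation" and transitional):
            out.append(mapping)
        elif status == "valid" and ch.isupper():
            # Assumption: simple lowercasing stands in for the table's case mappings.
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def to_ascii_sketch(domain: str, transitional: bool = False) -> str:
    mapped = map_step(domain, transitional)            # 1. Map
    normalized = unicodedata.normalize("NFC", mapped)  # 2. Normalize
    labels = normalized.split(".")                     # 3. Break at U+002E
    converted = []
    for label in labels:                               # 4. Convert (validity checks omitted)
        if label.isascii():
            converted.append(label)
        else:
            converted.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(converted)
```

With these assumptions, `to_ascii_sketch("Fußball。DE")` yields `xn--fuball-cta.de`, while passing `transitional=True` yields `fussball.de`; in Transitional mode "Maße" and "Masse" collapse to the same host name.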

Comments welcome.

Gerv
Since we now have all of ICU in our own source tree, it probably makes more sense to use their implementation of IDNA 2008/UTS 46 rather than import a second one, assuming that they support the options specified in comment 43, which I think they do from a quick look at http://www.icu-project.org/apiref/icu4c/classicu_1_1IDNA.html and http://www.icu-project.org/apiref/icu4c/uidna_8h.html
Depends on: 724538
We don't use UseSTD3ASCIIRules (e.g. we support subdomains with an underscore, they exist). It also seems very bad for us to break many existing URLs. Apart from the fact that our users would go elsewhere, it's very much contrary to the spirit of the web: http://www.w3.org/Provider/Style/URI.html
(In reply to Anne (:annevk) from comment #45)
> We don't use UseSTD3ASCIIRules (e.g. we support subdomains with an
> underscore, they exist). 

OK, I didn't realise; scratch that part, then. We'll need to use the version of the table with UseSTD3ASCIIRules=false, and implement our own extra ASCII checking, as TR46 notes. Is _ the only additional character we want to permit?

Gerv
(In reply to Gervase Markham [:gerv] from comment #46)
> OK, I didn't realise; scratch that part, then. We'll need to use the version
> of the table with UseSTD3ASCIIRules=false, and implement our own extra ASCII
> checking, as TR46 notes. Is _ the only additional character we want to
> permit?

Why would we invent yet another IDN parsing algorithm of our own?
https://xkcd.com/927/
Yes, thank you.

Clearly, the aim would be to be backwardly-compatible with our existing behaviour.

Gerv
(In reply to Gervase Markham [:gerv] from comment #48)
> Yes, thank you.
> 
> Clearly, the aim would be to be backwardly-compatible with our existing
> behaviour.
> 
> Gerv

(In reply to Gervase Markham [:gerv] from comment #46)
> (In reply to Anne (:annevk) from comment #45)
> > We don't use UseSTD3ASCIIRules (e.g. we support subdomains with an
> > underscore, they exist). 
> 
> OK, I didn't realise; scratch that part, then. We'll need to use the version
> of the table with UseSTD3ASCIIRules=false, and implement our own extra ASCII
> checking, as TR46 notes. Is _ the only additional character we want to
> permit?
> 
> Gerv

I think the discussion in bug 317946 is probably relevant here. I seem to recall that special behaviour for the + and $ characters may have been implemented. 

I distantly recall another bug where a character blacklist was replaced by a whitelist, and this is probably where the relevant code might still be found.

Neil
Yes: Bug 355181 - "net_IsValidHostName() comment says one thing, code does another", was the bug in question, and it suggests that the extra non-LDH characters supported at that time were $, _, and +.
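To illustrate the difference between UseSTD3ASCIIRules=true and a relaxed ASCII check: STD3 rules admit only letters, digits and hyphen (LDH) and forbid a leading or trailing hyphen, whereas the relaxed variant sketched here also admits the extra characters listed above ($, _ and +). Both functions are hypothetical and are not Firefox's `net_IsValidHostName()`; the 63-character bound is DNS's label length limit, added for completeness.

```python
import string

# Letters, digits and hyphen: the only ASCII characters UseSTD3ASCIIRules=true admits.
LDH = frozenset(string.ascii_letters + string.digits + "-")


def is_std3_label(label: str) -> bool:
    """STD3 check for an ASCII label: LDH only, no leading/trailing hyphen, 1-63 chars."""
    return (
        0 < len(label) <= 63
        and set(label) <= LDH
        and not label.startswith("-")
        and not label.endswith("-")
    )


def is_relaxed_label(label: str, extra: str = "$_+") -> bool:
    """Hypothetical relaxed check: LDH plus a small whitelist of extra ASCII characters."""
    return 0 < len(label) <= 63 and set(label) <= (LDH | set(extra))
```

For example, `is_std3_label("_dmarc")` is False while `is_relaxed_label("_dmarc")` is True, which is the underscore case that rules out STD3 rules for the web.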

Neil
smontagu: do you need anything else to press ahead with a patch for an ICU-based solution?

Gerv
FYI: 

Shawn Steele (of Microsoft) posted IE's behavior at http://blogs.msdn.com/b/shawnste/archive/2013/09/09/how-does-ie-handle-the-idn2008-rfcs.aspx

Chrome's initial implementation will be similar to IE's except for the Bidi check. The current ICU implementation of UTS 46 / IDNA 2008 does not allow separate treatment of the 4 deviation characters (either all of them are treated the IDNA 2003 way or all the 2008 way). ICU can be revised to treat them separately (it's rather simple to revise it). 

However, initially we'll treat them all in the UTS 46 transitional mechanism. Later, we may allow ZWJ/ZWNJ with ContextJ rules. We're not yet sure what we eventually do about Greek final-sigma.  For German sharp-s, it'll be mapped at least for now. 
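To show what treating the four deviation characters separately could mean, here is a sketch of a per-character policy: each deviation is mapped the UTS #46 Transitional way unless it is listed in a `keep` set. This is not ICU's API; `apply_deviation_policy` and the three policy constants are hypothetical names, and keeping ZWJ/ZWNJ would in practice also require the ContextJ rules, which this sketch does not implement.

```python
# The four UTS #46 deviation characters and their Transitional mappings.
DEVIATIONS = {
    "\u00df": "ss",      # LATIN SMALL LETTER SHARP S
    "\u03c2": "\u03c3",  # GREEK SMALL LETTER FINAL SIGMA -> GREEK SMALL LETTER SIGMA
    "\u200c": "",        # ZERO WIDTH NON-JOINER (removed)
    "\u200d": "",        # ZERO WIDTH JOINER (removed)
}

# Hypothetical policies: which deviation characters to keep (Nontransitional).
ALL_TRANSITIONAL = frozenset()                  # map all four, IDNA2003-compatible
JOINERS_ONLY = frozenset({"\u200c", "\u200d"})  # keep ZWNJ/ZWJ, still map ß and ς
ALL_NONTRANSITIONAL = frozenset(DEVIATIONS)     # keep all four, as in IDNA2008


def apply_deviation_policy(label: str, keep: frozenset) -> str:
    """Map each deviation character transitionally unless it is in `keep`."""
    return "".join(
        DEVIATIONS[ch] if ch in DEVIATIONS and ch not in keep else ch
        for ch in label
    )
```

In these terms, `ALL_TRANSITIONAL` corresponds to the initial behaviour described here, `JOINERS_ONLY` to the possible later step for ZWJ/ZWNJ, and `ALL_NONTRANSITIONAL` to straight IDNA2008 behaviour.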

@Gerv,
As for symbols and punctuations, are you ok with breaking some host names that use symbols like 'heart'?
(In reply to Gervase Markham [:gerv] from comment #51)
> smontagu: do you need anything else to press ahead with a patch for an
> ICU-based solution?

Free time to devote to it :)
(In reply to Jungshik Shin from comment #52)
> Shawn Steele (of Microsoft) posted IE's behavior at
> http://blogs.msdn.com/b/shawnste/archive/2013/09/09/how-does-ie-handle-the-
> idn2008-rfcs.aspx

Thanks, Jungshik, that's helpful. I'd summarise that blog post with "mostly sticking with IDNA2003 behaviours". Is that fair?

> Chrome's initial implementation will be similar to IE's except for Bidi
> check. The current ICU implementation of UTS 46 / IDNA 2008 does not allow a
> separate treatment of 4 deviation characters (either all of them are treated
> in IDNA 2003 way or in 2008 way). ICU can be revised to treat them
> separately (it's rather simple to revise it). 

Do you plan to write such a patch?

> However, initially we'll treat them all in the UTS 46 transitional
> mechanism. Later, we may allow ZWJ/ZWNJ with ContextJ rules. We're not yet
> sure what we eventually do about Greek final-sigma.  For German sharp-s,
> it'll be mapped at least for now. 
> 
> @Gerv,
> As for symbols and punctuations, are you ok with breaking some host names
> that use symbols like 'heart'?

Absolutely OK, yes. Such hostnames have not been registerable at the top level in any TLD that Firefox displays as Unicode for many years, and since we changed the display rules recently, AIUI are no longer displayable as Unicode at any level in any TLD, because of the script-mixing restrictions.

Gerv
Here's a thought: it looks like the .de and .gr registries have implmentated carefully thought-out policies regarding their respective "deviation characters". Perhaps, in an echo of the previous IDN whitelist strategy, there should be a very small whitelist of TLDs for which "deviation characters" policies have been defined, and for which full IDNA2008 semantics are thus safe to use, while retaining the IDNA2008-except-for-the-four-deviation-characters behaviour for all other TLDs?
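The per-TLD idea can be sketched as choosing the deviation processing from the last label of the host. The whitelist contents (`de`, `gr`), the function names and the ß-only mapping are assumptions for illustration; a real implementation would apply full UTS #46 processing in both modes.

```python
# Hypothetical whitelist of TLDs whose registries have deviation-character policies.
NONTRANSITIONAL_TLDS = frozenset({"de", "gr"})


def deviation_mode_for(host: str) -> str:
    """Pick the deviation processing from the host's last label."""
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return "nontransitional" if tld in NONTRANSITIONAL_TLDS else "transitional"


def to_ascii_per_tld(host: str) -> str:
    """Simplified conversion: only ß is handled; other mapping and validation are omitted."""
    transitional = deviation_mode_for(host) == "transitional"
    out = []
    for label in host.rstrip(".").split("."):
        if transitional:
            label = label.replace("\u00df", "ss")
        if not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        out.append(label)
    return ".".join(out)
```

Under this sketch `fußball.de` resolves as `xn--fuball-cta.de`, while `fußball.com` is still looked up as `fussball.com`.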
(In reply to Neil Harris from comment #55)
> Here's a thought: it looks like the .de and .gr registries have implmentated
> carefully thought-out policies regarding their respective "deviation
> characters". Perhaps, in an echo of the previous IDN whitelist strategy,
> there should be a very small whitelist of TLDs for which "deviation
> characters" policies have been defined, and for which full IDNA2008
> semantics are thus safe to use, but retain the
> IDNA2008-except-for-the-four-deviation-characters behaviour for all other
> TLDs?

s/implementated/implemented/
Shawn from IE rejected per-TLD rules in a mailing list message. I think he's right. I think we should just trust that the process consulted all the right stakeholders, use the IDNA2008 behaviour and blame registries if they allow their customers to attack one another.

Gerv
(In reply to Gervase Markham [:gerv] from comment #54)
> (In reply to Jungshik Shin from comment #52)
> > Shawn Steele (of Microsoft) posted IE's behavior at
> > http://blogs.msdn.com/b/shawnste/archive/2013/09/09/how-does-ie-handle-the-
> > idn2008-rfcs.aspx
> 
> Thanks, Jungshik, that's helpful. I'd summarise that blog post with "mostly
> sticking with IDNA2003 behaviours". Is that fair?

Well, it tries to be compatible with the previous behavior. Chrome's approach is similar except for the BiDi check. 

 
> > Chrome's initial implementation will be similar to IE's except for Bidi
> > check. The current ICU implementation of UTS 46 / IDNA 2008 does not allow a
> > separate treatment of 4 deviation characters (either all of them are treated
> > in IDNA 2003 way or in 2008 way). ICU can be revised to treat them
> > separately (it's rather simple to revise it). 
> 
> Do you plan to write such a patch?

If necessary, either I can write it or I can ask the ICU UTS46 API author to write it. Are you interested in this feature? If you are, I can start by filing a bug against ICU. 

When I do that, I'm also considering talking to the safe-browsing team (that is also used by Firefox IIUC) as to what they can do to mitigate potential risk. 
 

> > @Gerv,
> > As for symbols and punctuations, are you ok with breaking some host names
> > that use symbols like 'heart'?
> 
> Absolutely OK, yes. Such hostnames have not been registerable at the top
> level in any TLD that Firefox displays as Unicode for many years, and since
> we changed the display rules recently, AIUI are no longer displayable as
> Unicode at any level in any TLD, because of the script-mixing restrictions.

It's not displayable as Unicode in Chrome, either. However, not displaying Unicode is one thing and treating them as invalid in the look-up is another. Do you mean that look-up will still work in Firefox?
> it looks like the .de and .gr registries have implmentated carefully thought-out policies 
> regarding their respective "deviation characters". 

I'm not sure if deNIC has been that careful. They had an extremely short sunrise period and do not bundle two domains that differ only in 'sharp-s vs ss'. Do they have a reasonable blocking policy, instead, for domains that differ from existing domains only in 'sharp-s vs ss'? 

Not having found it, Chrome wants to continue to map sharp-s to ss as IE does.  Until such a policy is in force, it'd be great if Firefox does the same so that all of us are in sync.
They have no blocking policy nor will they ever have one. Sharp-s is not the same as ss in German. Deal with it and use IDNA2008. You didn't hesitate to implement other new standards, or all browsers would still be in sync with IE6.
(In reply to Jungshik Shin from comment #59)
> I'm not sure if deNIC has been that careful. They had an extremely short
> sunrise period and does not bundle two domains that differ only in 'sharp-s
> vs ss'. Do they have a reasonable blocking policy, instead, for domains that
> are different from existing domains only in 'sharp-s vs ss'? 

So then the question is: is it our responsibility to 'save' them from the consequences of actions that they took? Whatever they did, I think we should at least give them the credit of doing it knowing what the consequences were. 

If we asked DENIC "shall we implement IDNA2008 behaviour for sharp-S?", what would they say? I assume Yes; if they said No, that would indicate something very wrong with the IDNA2008 process, as they are the main user of this character (perhaps together with the Austrian NIC). Shame for the Greeks/Cypriots and final-sigma.

Gerv
Quoting some "technical meeting" minutes from 2010:
> As regards long-term establishment of "eszett" in the browsers, some further
> action is required. Today, even frequently used libraries do not yet
> support the standard. DENIC is considering to financially support the
> necessary implementation. Some browser vendors apparently plan to
> update to the new IDNA standard already in 2011.

DENIC did let owners of existing domains with ss that were meant to mean sharp-s preregister the sharp-s variants prior to the sharp-s landrush. New users are required to decide themselves whether they want to register both variants or not.
@ Gervase Markham
DENIC clearly states that they are using IDNA2008 and have information on ß http://www.denic.de/en/domains/internationalized-domain-names/sharp-s.html 

Their own converter uses IDNA2008. Because the IDNA2008 standard is not backwards compatible, they currently list Opera as the only browser supporting it; Mozilla, Chrome and IE will all just lead you to the wrong pages.
Oops. s/Shame/Same/ in comment 61!

I think the above comments show that DENIC want IDNA2008 behaviour, and so we should give it to them.

Gerv
(In reply to ChristianO from comment #63)
> Their own converter is IDNA2008. Due to missing backwards compatibility of
> IDNA2008 standard currently they list Opera as the only browser supporting
> IDNA. Mozilla/Chrome/IE will all just lead you to wrong pages.

I guess Opera dropped the support since version 15.
Note that Swiss German does not use sharp-s at all; it always uses 'ss' where other German uses sharp-s. My native German source is very clear that sharp-s and ss are identical for all practical purposes. He and others talked to deNIC, and it's mainly for "business" reasons that they insist on NOT bundling the two characters.
 
As for the sunrise period, it's extremely short (something like a week).
(In reply to Jungshik Shin from comment #67)
> Note that Swiss German does not use sharp-s at all. They always use 'ss'
> where sharp-s is used.

So no problem using IDNA2008.

> My native German source is very clear that sharp-s
> and ss are identical for all practical purposes.

Maße - Masse
Buße - Busse
are different words with different meanings.

> He and others talked to
> deNIC and it's mainly their "business" reason that they insist on NOT
> bundling two characters.

There are domains differing only in ß vs. ss, for whatever reason. I would like to see them all.

> As for the sunrise period, it's extremely short (something like a week).

3 weeks from October 26, 2010
http://www.denic.de/denic-im-dialog/pressemitteilungen/pressemitteilungen/2981.html
WRT spoofing attacks:
UTS #46 lists sparkasse-giessen.de/sparkasse-gießen.de as potentially problematic[0]. A short research shows that the following domains have the same problem:

* sparkasse-werra-meissner.de/sparkasse-werra-meißner.de
* sparkasse-meissen.de/sparkasse-meißen.de
* kskgrossgerau.de/kskgroßgerau.de

However each of the above domain owners own both the _ss_ and _ß_ variant. Therefore both old (pre-IDNA 2008) and new browsers (IDNA 2008) will open the same address. I guess that the sunrise period worked/was enough.

I obtained the above list by going through the list of cities in Germany, filtering for those cities whose names contain an "ß", and then performing a Google search for "sparkasse <city name>". I'm not sure if I missed anything, but so far .de looks fine.
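A check like the one described here can be automated: for a label containing ß, generate every spelling in which each ß is written either as ß or as ss, then see which of them are missing from a set of registered names. `eszett_variants` and `unbundled_variants` are hypothetical helpers, and the `registered` set stands in for whatever registration data is available; going the other way (treating every "ss" as a possible ß) works the same way but produces many more candidates.

```python
from itertools import product


def eszett_variants(label: str) -> set:
    """Every spelling of `label` in which each ß is written either as ß or as ss."""
    parts = label.split("\u00df")
    variants = set()
    for choice in product(("\u00df", "ss"), repeat=len(parts) - 1):
        pieces = [parts[0]]
        for replacement, part in zip(choice, parts[1:]):
            pieces.extend((replacement, part))
        variants.add("".join(pieces))
    return variants


def unbundled_variants(label: str, registered: set) -> set:
    """Variants of `label` missing from a caller-supplied set of registered labels."""
    return eszett_variants(label) - registered
```

For instance, `eszett_variants("sparkasse-gießen")` gives both the ß and the ss spelling, and a label with two ß characters yields four variants; `unbundled_variants("meßdienst", {"messdienst"})` flags the kind of split ownership discussed for meßdienst.de.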

Apart from domain ownership, isn't this green certificate information thing commonly found in modern browsers supposed to defeat this kind of spoofing? Otherwise deutsche-baпk.de may look the same as deutsche-bank.de, but isn't.

[0]: http://www.unicode.org/reports/tr46/#Deviations
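The deutsche-baпk.de example is a homograph problem (presumably a Cyrillic п, U+043F, among Latin letters) rather than an IDNA2003 vs IDNA2008 difference. A crude way to flag such labels is to check whether their letters come from more than one script. This sketch approximates the script from the first word of each character's Unicode name, because Python's `unicodedata` does not expose the Script property; the helper names are hypothetical, and real browser display policies are considerably more nuanced.

```python
import unicodedata


def letter_scripts(label: str) -> set:
    """Approximate the scripts of a label's letters from their Unicode names.

    Assumption: the first word of the character name ("LATIN", "CYRILLIC",
    "GREEK", ...) stands in for the Unicode Script property, which Python's
    unicodedata module does not expose.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True if the label's letters appear to come from more than one script."""
    return len(letter_scripts(label)) > 1
```

Here `is_mixed_script("deutsche-ba\u043fk")` is True, while `deutsche-bank` and `daßow` (all Latin, ß included) are not flagged.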
The latest Chromium snapshot includes IDNA 2008 support; here is the code review link: https://codereview.chromium.org/23642003
Chromium didn't go all the way to IDNA 2008 (yet). Instead, it adopted UTS 46 with Bidi check.
(In reply to Sebastian Hoß from comment #69)
> However each of the above domain owners own both the _ss_ and _ß_ variant.
> Therefore both old (pre-IDNA 2008) and new browsers (IDNA 2008) will open
> the same address. I guess that the sunrise period worked/was enough.

Only for your sample, and maybe for a large sample, but certainly not comprehensive.

There existed an "Amt Golßener Land" (until a recent administrative reorg) with the web site amt-golssener-land.de. Long after the sunrise period I registered amt-golßener-land.de (http://xn--amt-golener-land-mlb.de/). I just wanted to show that DENIC's practice is unsafe, others could have used that to harm people.

Another example that is sometimes mentioned is meßdienst.de vs. messdienst.de, as in comment 12 here. messdienst.de redirects to a large energy logistics company (Techem). meßdienst.de (http://xn--medienst-rya.de/) belongs to Mr. Guntermann offering small-scale utility metering. Note that Mr. Guntermann's web site uses the ß in the domain name, but the page title and contents only ever use "ss". I believe the 1996 spelling reform changed from Meßdienst to Messdienst. (His web site has had the "coming soon" notice for years.) Someone less nice than Mr. Guntermann could want to harm Techem and its customers.

Then there is fussball.de = fußball.de. Yes, they are owned by the same association (German soccer league), but xn--fuball-cta.de redirects _to_ fussball.de, and the web site shows nicely how the lowercase form is spelled with ß but the uppercase is always FUSSBALL.
Gerv,

> If we asked DENIC "shall we implement IDNA2008 behaviour for sharp-S?", what
> would they say? I assume Yes

FWIW (since we've already discussed this in the past): confirmed.

Best regards,
Marcos Sanz
Gerv,

> If we asked DENIC "shall we implement IDNA2008 behaviour for sharp-S?", what
> would they say? I assume Yes

By the way, you would get the same answer from DNS.be since we have also implemented IDNA2008 
and also support the registration of domain names containing sharp-S

Best regards,
Maarten Bosteels
This bug is assigned to smontagu, but he told me at the Summit (and says in comment 53) that the thing preventing him implementing it was time. Therefore, if there are other people out there who would like to take this on, please contact him and coordinate :-)

Gerv
Marcos Sanz and Maarten Bosteels: when you let your customers register a domain with 'sharp-s', do you make it clear that it will NOT be reachable by any of the major browsers today?
To me: Just after the landrush period. Domain reseller made it clear that these domains were currently not usable with major browsers.

But at that time it was kind of expected to be supported "any day now".
(In reply to Jungshik Shin from comment #76)
>  Marcos Sanz and  Maarten Bosteels : when you let your customers register a
> domain with 'sharp-s', do you make it clear that it will NOT be reachable by
> any of major browsers today?

We actively inform our registrars about it. They do know and mourn, the same as we do.
Over and over I see customers asking for the browser functionality (last one I saw - in German:
http://www.sosseo.de/2013/09/11/heizoelrueckstossabdaempfung-macht-mich-irre/)
While it was clear to people, when we launched IDN (in 2012) for the AFNIC ccTLDs (.fr/.re/.pm/.tf/.wf/.yt), that choosing 'sharp-s' in their domain names meant they would not be fully supported by all browsers, their expectation was (and still is) that they could use such a domain like any other domain... one day...
The same rules (IDNA2008/'sharp-s') will apply for the new gTLDs we are involved in. So, IDNA2008 behaviour for sharp-s in your browser would make the AFNIC registrants happier ... :-)
Re comments 79, 78 and 77: Thank you for the reply. It would be better to give them a more realistic picture (rather than saying that it will come any day) and advise them to register both the 'ss' and the 'sharp-s' forms together. At least for now, MS IE and Chrome continue to map sharp-s to 'ss'.
Blocks: 983982
The specific algorithms we want here are http://url.spec.whatwg.org/#idna and http://url.spec.whatwg.org/#concept-host-parser which through UTS #46 defer to IDNA2008 but apply restrictions relevant to URLs on the web.
Summary: Implement new IDNA2008 standards → Implement Unicode UTS #46 per URL Standard
(In reply to Anne (:annevk) from comment #82)
> The specific algorithms we want here are http://url.spec.whatwg.org/#idna
> and http://url.spec.whatwg.org/#concept-host-parser which through UTS #46
> defer to IDNA2008 but apply restrictions relevant to URLs on the web.

Should we use ICU IDNA for this (and drop our IDNA code) or change our existing IDNA code (and keep ICU IDNA disabled at build time)?
:smontagu expressed interest in having our own code, but given his other assignments I'm not sure how reasonable that is. Basically this bug is looking for someone with time to develop a strategy, convince dev.platform to follow it, and then implement it.
(In reply to Anne (:annevk) from comment #82)
> The specific algorithms we want here are http://url.spec.whatwg.org/#idna
> and http://url.spec.whatwg.org/#concept-host-parser which through UTS #46
> defer to IDNA2008 but apply restrictions relevant to URLs on the web.

Wait, does this mean that Firefox will use transitional processing, so that "ß" (German sharp s) will be mapped to "ss"?

I really hope this is not the case, because such a decision would ignore a particularity of the German language. No matter what people have said here before, the sharp s (ß) is a distinct letter which cannot be automatically replaced with a double s.

Also, I understood Gerv's previous comments to mean that Firefox will support IDNA2008 without mapping "ß" to "ss".
I am in favour of using Nontransitional processing, because it has been repeatedly made clear that this is what's wanted by the registries most concerned with the 4 characters involved (see comment 73, comment 74, comment 78). My case is that we should give them what they ask for, and it becomes the registries' responsibility to make sure users are not harmed by it. This is in line with our shift from IDN TLD whitelisting to an algorithm <https://wiki.mozilla.org/IDN_Display_Algorithm>, which also leaves some edge cases (e.g. whole-script spoofing) to be dealt with by registries.

I think the WHAT-WG spec should also listen to the requests and views of the affected registries (who speak for their customers), expressed both when the IDNA2008 standard was defined and many times since. But I do not control that document.

Gerv
Distinguishing ß and ss makes business sense for registrars, but it's bad for web users. First, domain names are case-insensitive, and ß and ss both uppercase to SS; that's why IDNA2003 does not distinguish them (nor the sigmas). Second, the spelling of many common German words has changed between ß and ss in the 1996 reform, and many native Germans have always been confused about when to write which. In Switzerland, German is never written with ß. (I am talking about de-CH, not gsw.)

FYI: ICU's UTS #46 code supports nontransitional mappings via a simple flag.
(In reply to Markus Scherer from comment #87)
> Distinguishing ß and ss makes business sense for registrars, 

It seems to me that the marginal increase in business any registrar would have from this change is more than offset by the time and effort required to manage the transition and uncertainty. Although I am disappointed that more registries have not adopted permanent bundling rules.

> but it's bad
> for web users. First, domain names are case-insensitive, and ß and ss both
> uppercase to SS; 

But domains are normally displayed in lowercase. What's the lowercase of "SS"?

AIUI, whether ß and ss are "the same letter" depends on lots of things, including where you are and who you ask.

> that's why IDNA2003 does not distinguish them (nor the
> sigmas). 

Effectively, you are saying that all of the years of consultation around IDNA2008 (based on experience with IDNA2003) of linguists, registrars and so on came to the wrong conclusion when they decided that the improvement in changing the handling of these characters was significant enough that it was worth breaking backwards compatibility. I guess it's possible they didn't ask all the right people, although as someone who observed the process, there seemed to be a great deal of discussion.

Gerv
> > and ß and ss both uppercase to SS; 

There is a capital ß: "ẞ"; take a look at http://en.wikipedia.org/wiki/Capital_%E1%BA%9E

And ß has not been removed from the language; there are also company names that contain a "ß" in their brand. My last name and my company name both contain a ß. I am the registrant of roß.de, and almost all browsers convert this regular, existing hostname into ross.de instead of xn--ro-hia.de! This is pretty bad.
(In reply to Gervase Markham [:gerv] from comment #88)
> > but it's bad
> > First, domain names are case-insensitive, and ß and ss both
> > uppercase to SS; 
> 
> But domains are normally displayed in lowercase. What's the lowercase of
> "SS"?

"ss", obviously. That's the point -- ß does not survive the round-trip through case conversion.
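A toy C++ sketch of that round trip, assuming Unicode's default full case mappings, under which ß uppercases to "SS" (the capital ẞ, U+1E9E, is not the default uppercase). The two hypothetical functions handle only ASCII letters plus ß, which is enough to show that the information is lost; real code would use a full Unicode case-mapping library.

```cpp
#include <string>

// Toy full uppercase: ASCII a-z plus ß. Everything else passes through.
std::u32string toUpperToy(const std::u32string& s) {
  std::u32string out;
  for (char32_t c : s) {
    if (c == U'\u00DF') {
      out += U"SS";  // default full uppercase of ß is the two letters "SS"
    } else if (c >= U'a' && c <= U'z') {
      out += static_cast<char32_t>(c - U'a' + U'A');
    } else {
      out += c;
    }
  }
  return out;
}

// Toy lowercase: ASCII A-Z only. "SS" can only come back as "ss".
std::u32string toLowerToy(const std::u32string& s) {
  std::u32string out;
  for (char32_t c : s) {
    if (c >= U'A' && c <= U'Z') {
      out += static_cast<char32_t>(c - U'A' + U'a');
    } else {
      out += c;
    }
  }
  return out;
}
```

Lowercasing the uppercased form of "roß" yields "ross", not "roß": the ß does not come back.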

> AIUI, whether ß and ss are "the same letter" depends on lots of things,
> including where you are and who you ask.

I don't claim they are the same letter. They are practically case-equivalent, native German speakers (and writers) are confused, and the rules have changed over time, and a few million German speakers in Switzerland never use it to begin with.

> Effectively, you are saying that all of the years of consultation around
> IDNA2008 (based on experience with IDNA2003) of linguists, registrars and so
> on came to the wrong conclusion when they decided that the improvement in
> changing the handling of these characters was significant enough that it was
> worth breaking backwards compatibility.

IDNA2008 was extremely controversial.

> I guess it's possible they didn't
> ask all the right people, although as someone who observed the process,
> there seemed to be a great deal of discussion.

IDNA2008 was passed under protest of plenty of people with implementation experience, and UTS #46 was developed in order to deal with the problems that IDNA2008 caused.

(In reply to Daniel Roß from comment #89)
> > > and ß and ss both uppercase to SS; 
> 
> There is an Capital ß: "ẞ" take a look at
> http://en.wikipedia.org/wiki/Capital_%E1%BA%9E

Yes, apparently there is, although I had never seen it growing up in Germany all through school and college, nor visiting Germany every year since then. (I have seen the few examples in the Unicode proposal and on the Wikipedia page.) The standard uppercase form of ß continues to be SS.

> And ß is not removed from language, there are also company names that
> contain a "ß" in their brand. Also my last name and company name both
> containing a ß.

And I grew up in Roßdorf... (one of several)

> I am registar of roß.de and almost all browsers are
> converting this, regular existing hostname, into ross.de instead of
> xn--ro-hia.de! This is pretty bad.

I am sorry that it is inconvenient for you, but that's because of the history of German orthography. As you know, the recent reform changed the spelling of the word (but of course not of your name) Roß to Ross, so older and younger people spell it differently.

Aside from German orthography, as I pointed out, the reason that IDNA2003 does not distinguish ß from ss (nor the sigmas) is case equivalence. We would not have this discussion if domain names were case-sensitive, or if somehow IDNA2003 had already distinguished them.
(In reply to Markus Scherer from comment #90)
> > I am registar of roß.de and almost all browsers are
> > converting this, regular existing hostname, into ross.de instead of
> > xn--ro-hia.de! This is pretty bad.
> 
> I am sorry that it is inconvenient for you

That's completely off target. This is not about "inconvenience", at least for the .de TLD.
DENIC has decided to distinguish between ss and ß in its domain names, and browsers have to implement that; the .de namespace is far from negligible in size.
Due to the pre-register phase where holders of domains with ss could get the similar domain with ß, *nobody* can expect to reach a ss domain by using ß anymore.
Those who are in favour of "transitional processing": do you believe it is actually transitional, or do you want to see this as the permanent behaviour? If you do think it's transitional, when do you think the transition period will end?

Gerv
Domain names containing ß have been allowed since August 2010. I think nearly 4 years should be enough.
Daniel: that question was not aimed at you, because (AIUI) you are in favour of "nontransitional processing".

Gerv
(In reply to Gervase Markham [:gerv] from comment #92)
> Those who are in favour of "transitional processing": do you believe it is
> actually transitional, or do you want to see this as the permanent
> behaviour?

For me, personally, as a native German as well as looking at it in the larger context of i18n handling, a case-insensitive protocol needs to handle ß and ss as equivalent. Anything else would be a disservice to the user, and with that I mean a web user wishing to reliably and safely navigate the web. Therefore, personally, I would prefer to have the "transitional" mappings be permanent.

I understand that registries have an incentive to make the distinctions and offer both versions.

I also understand that this can be painful for web site owners. It seems like the best way to handle it is "bundling" (alternates always go to the same site). There should be a way to *display* a name with uppercase and ß etc., via data provided in a special place on the site, although we would need to be careful about misleading display strings.
Anne: I'm particularly interested in whether you are anticipating that the URL standard will stay with "Transitional Processing" for ever?

I personally think the idea that one would need to download additional metadata from a site in order to know how to display its name correctly is a non-starter. It is not reasonable to make extra network requests and 404-spam in logs to solve a problem which affects only a small number of sites. If we want to allow ß in a domain name, we should adopt the standard for domain names which allows ß to appear in a domain name.

Gerv
(In reply to Gervase Markham [:gerv] from comment #96)
> I personally think the idea that one would need to download additional
> metadata from a site in order to know how to display its name correctly is a
> non-starter.

You misunderstand; this would be much more than ambiguous characters, it could also allow the display of uppercase letters, compatibility variants, and maybe spaces. (A favicon is also separately loaded data which many browsers happily show.)
Favicons with their well-known URL were always a terrible hack that caused logspam, although work has made them less so. Regardless, a site works fine without a favicon, but would not work nearly as well if some browsers had to download a file before even showing the site name at all (or changing it partway through the load) and some browsers showed it in the original form, whatever that is. And domain names are for far more than just browsers.

This is not a very fruitful discussion because even if you convinced me, you'd need to convince the Firefox owner, persuade someone to implement it, and then get support in all the other user agents and browsers, and then wait for that support to be ubiquitous. All to solve a problem that can instead be solved in the way that is already defined by standards and requested by the relevant registries and many site owners.

Gerv
(In reply to Markus Scherer from comment #97)
> You misunderstand; this would be much more than ambiguous characters, it
> could also allow the display of uppercase letters, compatibility variants,
> and maybe spaces.

That would allow site owners to mount homograph attacks.
Gerv, my understanding is that Microsoft and Google do not want to switch away from Transitional until at least some solution is found to the attacks mentioned in this bug. What Markus suggests is also suggested by Microsoft and could actually be feasible when combined with Extended Validation or some such. It's a much better solution to the problem. (And of course, you'd only allow them to use code points that would map to the actual domain in the end, avoiding the attacks.)
About adoption:
gmx.de, one of the larger German email providers, does support IDNA2008 in its webmail interface.
Summary: Implement Unicode UTS #46 per URL Standard → Implement IDNA2008 and Unicode UTS #46
Even if no one is interested anymore (since Opera stopped supporting ß again and Chrome sees this as "solved"), I'd like to point out again, as in comment 68 here, that there are German words which will always have different meanings:

"
Maße - Masse
Buße - Busse
are different words with different meanings."

And many of the redirecting ss spellings aren't German words at all, like "gross", although the English meaning of "gross" fits this topic, IMHO.

So treating ss and ß as the same is strange, speaking as a German.

I don't think favicons are the way to go, but I guess there could be a browser-embedded message or something similar that lets users choose where they want to go, or at least reminds them "maybe you wanted to go somewhere else"...
I decided to write because I'm really tired of Firefox not being able to handle IDNA (after ALL these years) while Chrome, for instance, does.

For example, Firefox mishandles links on http://www.pouruneécolelibre.com; try clicking on a link (like the title of a post).

For instance, click on a link to http://www.pouruneécolelibre.com/2015/03/ecr-lavertissement-de-la-cour-supreme.html

It gets interpreted as 
http://www.pourune%fdcolelibre.com/2015/03/ecr-lavertissement-de-la-cour-supreme.html

and fails.

Type in the URL pouruneécolelibre.com (without www.)

The URL box shows "www.pouruneécolelibre.com" and fails. 

When copying the URL (from the URL box), it returns: http://www.xn--pourunecolelibre-eua93hsa416a.com/

(i.e. it encodes the strange é), which is different from what Google Chrome produces: http://www.xn--pourunecolelibre-hqb.com/. Obviously, http://www.xn--pourunecolelibre-eua93hsa416a.com/ fails.

It's *really* a pain!
It has nothing to do with IDNA2008.
I was fearing such a constructive and long comment.

What does it have to do with, then? Implementation of what, exactly? You think this has nothing to do with the "é" in the domain name?

Is there already a bug for this?
P. Andries: all of your test cases work for me in the latest version of Firefox. If you think you have found a bug in Firefox's IDNA support, please file it. But your problem has nothing to do with IDNA2008.

Gerv
Perhaps P. Andries' issue might be related to character encoding/escaping issues in the handoff from other applications when clicking links in them? When I click on, for example, 

   http://www.pouruneécolelibre.com/

in an email in Thunderbird such as the one I got as bugmail for this bug, I get an error page saying 

   "Firefox can't find the server at www.pourune%c3%a9colelibre.com."

If I cut and paste the exact same link into the address bar, everything works as expected. Whatever it is, it isn't something to do with IDNA2008, and should be reported as a separate bug.
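One plausible reading of that error message: the hand-off percent-encoded the UTF-8 bytes of é (0xC3 0xA9) as if the host were part of a URL path. A host containing %-escapes is not a name DNS can resolve; it needs the Punycode (ToASCII) form instead, such as the xn--pourunecolelibre-hqb.com that Chrome produced. The hypothetical escaper below illustrates only the first half of that, using lowercase hex to match the quoted message.

```cpp
#include <string>

// Hypothetical path-style escaper: every byte that is not a visible ASCII
// character is written as %xx. Applied to a hostname, this produces a
// string that is not a valid DNS name.
std::string percentEncodeNonAscii(const std::string& utf8) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  for (unsigned char byte : utf8) {
    if (byte >= 0x21 && byte < 0x7F) {
      out.push_back(static_cast<char>(byte));
    } else {
      out.push_back('%');
      out.push_back(kHex[byte >> 4]);
      out.push_back(kHex[byte & 0x0F]);
    }
  }
  return out;
}
```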
German users can't use names like straße.de because major browsers do not support them yet.

What is the status of fully implementing IDNA2008 (including ß) in FF in 2015?
The status is that nothing is going to happen without a patch, which the main person working on this area of the code (smontagu) has told me he doesn't have time to write. So unless someone else writes it, it's not going to happen soon.

There is also no unanimity among Mozillians on exactly what to do about the transitional processing question. I have my view but, as you can see above, others have different views. Still, it's all academic without a patch.

Gerv
Simply incredible. I just got another message telling me that the internal links all fail in Firefox because of an accent in the domain name. I suggested changing browsers; I refuse to revert to a non-accented domain name.

> «All of your hyperlinks pointing to your blog are broken on your blog, because of the "é" in the URLs.
>
> I thought it would be helpful to let you know.»
(In reply to P. Andries from comment #111)
> Simply incredible. Just got another message telling me the internal links
> all failed with Firefox because of an accent in the domain name.

Where did this message come from? Firefox almost certainly fully supports all of the accented characters you might, as a Canadian person, be using.

It is quite possible that the links are incorrectly encoded in your HTML, but you would need to file another bug with a test case for us to investigate.

Gerv
Hello,
I have now reported this bug to DENIC, ICANN and some technical newspapers.

It is absolutely crazy: 60 million people use "ß" and "ss" to distinguish different domain names (like fußballplatz.de and fussballplatz.de), and what's the Mozilla team's answer? "First of all we need a patch, but no one has time for it, so nothing will happen". And this after 4 years of waiting...

It is a disaster for German companies with "ß" in their names, for users with "ß" in their surnames, and also for domain owners.

Bye,
Andy
(In reply to Neil Harris from comment #107)
> Perhaps P. Andries' issue might be related to character encoding/escaping
> issues in the handoff from other applications when clicking links in them?
> When I click on, for example, 
> 
>    http://www.pouruneécolelibre.com/
> 
> in an email in Thunderbird such as the one I got as bugmail for this bug, I
> get an error page saying 
> 
>    "Firefox can't find the server at www.pourune%c3%a9colelibre.com."
> 
> If I cut and paste the exact same link into the address bar, everything
> works as expected. Whatever it is, it isn't something to do with IDNA2008,
> and should be reported as a separate bug.

This particular use case was fixed in Bug 1142083
Hi,

I have read quite a few comments on this bug, and to me it is incomprehensible that the problem with the German double/sharp s has not been resolved yet.

The company I work for is not only a registrar of the German .de registry DENIC, but also a technical provider for various registries of new top-level domains, among them TLDs like .ruhr, .nrw, .koeln and .cologne, which mostly address a German-speaking/reading/writing population. For these registries, the German double s is treated in the very same way as DENIC does it, i.e. as a separate character. There is broad consensus in the German DNS community that this is the right way to deal with the character and that the various browsers should stop mapping ß to ss.

Please let me also note the following as a native German speaker: the fact that ß can be replaced by ss does not mean that the two are identical. The unconditional mapping in the reverse direction is not allowed. While they have common ancestors, they are no longer the same. The replacement of ß by ss mostly originates from the non-existence of the ß on the first typewriters. The argument about potential spoofing attacks and conflicts between registrants holding the ß and ss variants is quite a weak one, as the same applies to the other three special characters used in the German language: the umlauts. Before the introduction of IDNs in the DNS, and in general before the introduction of accented characters in computer systems (and on the old mechanical typewriters!), it was common to write ä as ae, ö as oe and ü as ue. So you have the same problems with "grün" vs. "gruen" as with "straße" vs. "strasse". And I really don't know of any registry in the world that blocks domains with respect to umlauts and the replacement spellings mentioned above, and I don't know of any browser that blocks the use of umlauts for that reason.

So please give yourself a push and fix that. Thanks.

Klaus
As noted in comment #110, this bug isn't going to get anywhere without a patch. If this is a big problem for your company, they should invest in writing patches themselves - both Firefox and Chromium are open source. You can probably coordinate with Mozilla and Google on getting this fixed if you can provide the manpower. That doesn't solve the 'international consensus' issue of course, but again your company could be a driving force in achieving that.
Sorry, but that is the knockout argument I have heard before, used to stop any wishes of the users of an open source project to have bugs fixed or to have it improved in some way. If I (or some of my colleagues) had the ability to provide a patch, I would consider it. But we mostly program in Java, and my last larger C++ program (10000 lines, ludicrous compared to FF) was about 18 years ago. I have also heard that it takes quite a while to get a clue about how FF is built and how it works. So I guess your suggestion is not suitable for me.
(In reply to Klaus Malorny from comment #117)
> Sorry, that is the knockout argument I have heard before to stop any wishes
> of the users of an open source project to have bugs fixed or to have it
> improved in some way. If I (or some of my colleagues) had the ability to
> provide a patch, I would consider it. 

Anyone has the ability to provide a patch, by contracting someone to do it - if the functionality is important enough to them.

What more can we say? Firefox is open source, and any person or company can write a patch or pay for one to be written. If we then ignore the patch, then that would be a legitimate criticism of us. 

smontagu: can you at least say what approach you would prefer? Would you prefer us to take an updated version of the JPNIC IDN library, or would you prefer us to switch to using libicu, which I believe has an implementation of this and which we already ship? http://icu-project.org/apiref/icu4c/uidna_8h.html

Gerv
(In reply to Gervase Markham [:gerv] from comment #118)
> smontagu: can you at least say what approach you would prefer? Would you
> prefer us to take an updated version of the JPNIC IDN library, or would you
> prefer us to switch to using libicu, which I believe has an implementation
> of this and which we already ship?

In this case certainly libicu. There are other bugs where using ICU is blocked on Android (dependencies of Bug 864843), but here I don't think that's such a concern.
OK. So the patch would need to remove the use of the idnkit IDN library, and switch to using libicu with the right parameters to use IDNA2008. The fact that this is not trivial may explain why this bug has not yet got a patch attached.

Gerv
Note that even with a patch it's not clear we want to do this given the compatibility and security implications.
(In reply to Anne (:annevk) from comment #121)
> Note that even with a patch it's not clear we want to do this given the
> compatibility and security implications.

If we set aside the issues surrounding eszett, final sigma, ZWJ and ZWNJ, are you still asserting we should not upgrade to IDNA2008? Comment 52 explains that both IE and Chrome have moved to IDNA2008 in general (with various policies about these more controversial corner cases). There is a lot in IDNA2008 which everyone agrees on and, for us to be compatible with the other browsers in IDN handling in the general case, we need to implement 2008. And smontagu says he would prefer that we do that by moving to libicu.

As for the four deviation characters, Jungshik (is he still involved?) and smontagu own i18n, so it's their decision as to what to do AIUI. I'm happy to write up my case, and you can write up yours, and we'll see what they say.

Gerv
For almost 5 years now Firefox has shown wrong sites for domain names with an ß. If anything this is the real security risk.

It’s clear there won’t be a patch implementing ICU in the near future. Therefore, as a stopgap, keep idnkit and delete the mapping entry for ß from netwerk/dns/nameprepdata.c.
Anne, it's not really clear what issues comment 121 is about. Is the point of contention updating IDN to IDNA2008, or using ICU? In the past (e.g. in bug 724538 comment 1) you've expressed reservations about using ICU because of decreasing diversity, but I'm not sure if you're saying the same thing here or not; in particular, "compatibility" seems like almost the opposite issue.
Flags: needinfo?(annevk)

If diversity is your concern, I believe it's outweighed by the amount of developer resources that would be required to maintain our own versions of the functionality that we could be getting from ICU.
Gerv, comment 52 suggests Chrome implements UTR #46 in transitional mode. It also explicitly mentions they map ß to ss and similar for the other code points that IDNA2008 changed the mapping of. Therefore, I would not call that implementing IDNA2008, but rather UTR #46 within the constraints of the URL Standard.

Simon, my concern is us adopting different behavior from Chrome, e.g. not mapping ß to ss, as that would have security implications and also create interoperability issues. I would also be somewhat sad with yet another browser depending on ICU (the only balance provided by Microsoft), but that seems inevitable.
Flags: needinfo?(annevk)
AFAIK the plan has always been to aim for maximal interoperability, which I think requires us to implement UTR #46, and inter alia to map ß to ss.
Then all is well.
Yes, everything is well, except that you are completely ignoring the wishes of the majority of users.

There is no reason for mapping ß to double-s, neither in the German language nor from a security point of view. There was a sunrise period for owners of double-s domains. And no one, really no one, expects a browser to redirect to a double-s domain when typing a domain with an ß.

Don't map ß to double-s! At least let the users vote! This is a community project; do I have to remind you of that? Don't make Firefox an imitation of Chrome, which simply did the opposite of what its users asked for when they requested support for the ß.
I agree.
From a user's POV, implementing IDNA2008 means *NOT* mapping ß to double-s.
My view is that we should do what the relevant language community/ies want us to do. The registries concerned appear to want no mapping (i.e. strict IDNA2008 behaviour) - which is not surprising, as this is why IDNA2008 was written that way. And the small sampling of opinion in this bug agrees.

However, whatever we do about the four deviant characters, I hope we all can agree that in general moving to IDNA2008 rather than IDNA2003 is an improvement and we should try and get it done. However, there is still significant uncertainty about where the resources to make this change will come from.

Gerv
(In reply to Horst Dünsch from comment #129)
> There is no reason for mapping ß to double-s, neither in German language nor
> from security view.

It's simply not true that "there is no reason". I and others have given several reasons in earlier replies here. I accept that you disagree, but it's unfair to just ignore the discussion.

> And no one, really no one expects a browser to redirect to
> double-s-domain when typing a domain with an ß.

That's also not true, and such absolutes rarely apply to people.

Aside from the fact that this behavior has been established practice for ten or so years.

Users are quite familiar with internet navigation ignoring and flattening certain distinctions, most prominently uppercase vs. lowercase.
(In reply to Anne (:annevk) from comment #126)

> Simon, my concern is us adopting different behavior from Chrome, e.g. not
> mapping ß to ss, as that would have security implications and also create
> interoperability issues.

Anne, first of all, I do understand and do share your wish for interoperability. This goal doesn't need to be in opposition to the needs of the language community, though. Let me explain:

When the relevant IDNA2008 code was checked into Chromium (see https://codereview.chromium.org/23642003), the following decision was taken; quoting from the link: "Use transitional mechanism for 4 deviant characters: German sharp-S, Greek final-sigma, ZWJ and ZWNJ. That is, the former two are mapped to 'ss' and regular sigma and the latter two are dropped. All the major browsers do this at the moment so allowing them does not do any good. *We'll review this later as the consensus builds among browser vendors and registrars.*"

(Emphasis added by me.) I think that time has come. I expect Mozilla to take the first step, hear the needs of the relevant communities who have already spoken out, and implement IDNA2008/UTS #46 in non-transitional mode (that is, with no mappings for the deviant characters). At the same time, I'd expect the Chromium project to keep their promise and review their position, leaving behind the transitional mapping mechanisms, which were only ever intended as an interim measure. Together, both leading browsers would finally bring this issue, after so many years, to a safe harbor.

As for contributing code, DENIC would be willing to sponsor the patch, still looking for a proficient developer on the subject.
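To make the difference concrete: for the four deviation characters quoted above, transitional processing maps or drops them, while nontransitional processing keeps them. The C++ sketch below handles only those four code points; a real UTS #46 implementation also applies the full mapping table, NFC normalization and validity checks. In ICU, nontransitional behaviour is selected by passing UIDNA_NONTRANSITIONAL_TO_ASCII and UIDNA_NONTRANSITIONAL_TO_UNICODE to uidna_openUTS46(). The enum and function names here are hypothetical.

```cpp
#include <string>

enum class IdnaMode { Transitional, Nontransitional };

// Applies only the UTS #46 deviation handling for ß, ς, ZWNJ and ZWJ.
std::u32string mapDeviations(const std::u32string& label, IdnaMode mode) {
  // Nontransitional: all four are kept (ZWJ/ZWNJ remain subject to the
  // CONTEXTJ rules, which this sketch does not check).
  if (mode == IdnaMode::Nontransitional) return label;

  std::u32string out;
  for (char32_t c : label) {
    switch (c) {
      case U'\u00DF': out += U"ss"; break;      // sharp s -> "ss"
      case U'\u03C2': out += U'\u03C3'; break;  // final sigma -> sigma
      case U'\u200C':                           // ZWNJ: dropped
      case U'\u200D': break;                    // ZWJ: dropped
      default: out += c;
    }
  }
  return out;
}
```

Under transitional mode, straße.de and strasse.de therefore resolve to the same name; under nontransitional mode they stay distinct, which is what the registries in this thread are asking for.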
Depends on: 319030
Depends on: 321491
Attachment #8670202 - Flags: review?(jfkthame)
We'll want some new tests as well, but I'm not sure if we have a way to add xpcshell tests with an #ifdef
Attachment #8670203 - Flags: review?(jfkthame)
I see conflicting opinions in the comments here regarding whether we should implement IDNA2008 / UTS46 in transitional or non-transitional mode.

What I don't see is any clear basis for making the decision; both sides make strong points, IMO.

Whichever behavior we ship, maybe we should provide an about:config preference so that users who really want the other option can have it? It looks like that wouldn't be hard to add.
You will see that I put in a compile-time switch for transitional/non-transitional processing; making it user-configurable in about:config would be no problem technically, but do we want to do that? AIUI it has traditionally been our policy not to have user-configurable or locale-dependent differences in IDN processing -- i.e. typing the same characters in the URL bar in two different copies of (the same version of) Firefox should always lead to the same site.

Gerv, any opinion?
Flags: needinfo?(gerv)
Comment on attachment 8670202 [details] [diff] [review]
Patch 2: call ICU instead of IDNKit

Review of attachment 8670202 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/dns/nsIDNService.cpp
@@ +128,5 @@
>  nsIDNService::nsIDNService()
>  {
> +#ifdef IDNA2008
> +  uint32_t IDNAOptions = UIDNA_CHECK_BIDI | UIDNA_CHECK_CONTEXTJ;
> +#ifndef IDNA2008_TRANSTIONAL

If we do end up keeping this as a compile-time switch,

s/TRANSTIONAL/TRANSITIONAL/

otherwise we'll always get the non-transitional behavior, despite appearances!
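A small self-contained C++ illustration of why the misspelling matters, assuming (as the review implies) that the guarded block adds the nontransitional options. The kXxx constants are placeholders standing in for ICU's UIDNA_* flags, not their real values, and the two functions are hypothetical.

```cpp
#include <cstdint>

// Placeholder option bits standing in for ICU's UIDNA_* constants.
constexpr uint32_t kCheckBidi = 1u << 0;
constexpr uint32_t kCheckContextJ = 1u << 1;
constexpr uint32_t kNontransitionalToASCII = 1u << 2;
constexpr uint32_t kNontransitionalToUnicode = 1u << 3;

// Suppose the build asks for transitional processing.
#define IDNA2008_TRANSITIONAL 1

uint32_t optionsWithTypo() {
  uint32_t options = kCheckBidi | kCheckContextJ;
#ifndef IDNA2008_TRANSTIONAL  // misspelled macro is never defined: always taken
  options |= kNontransitionalToASCII | kNontransitionalToUnicode;
#endif
  return options;
}

uint32_t optionsFixed() {
  uint32_t options = kCheckBidi | kCheckContextJ;
#ifndef IDNA2008_TRANSITIONAL  // correct spelling honours the build switch
  options |= kNontransitionalToASCII | kNontransitionalToUnicode;
#endif
  return options;
}
```

With the typo, the transitional build still gets the nontransitional bits; with the corrected spelling, it does not.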
(In reply to Simon Montagu :smontagu from comment #139)
> You will see that I put in a compile-time switch for
> transitional/non-transitional processing; making it user-configurable in
> about:config would be no problem technically, but do we want to do that?
> AIUI it has traditionally been our policy not to have user-configurable or
> locale-dependent differences in IDN processing -- i.e. typing the same
> characters in the URL bar in two different copies of (the same version of)
> Firefox should always lead to the same site.
> 
> Gerv, any opinion?

I am certainly of the opinion that making it user or locale-configurable is a bad idea, for exactly the reasons you state. This would be worse than using transitional processing :-) We need to figure out how to make a decision here, even if we can't reach consensus among everyone.

I would say that the proper decision-maker is the owner of the i18n module. And that would currently be you and Jungshik:
https://wiki.mozilla.org/Modules/All#I18N_Library

Perhaps it might be a good idea, in order to help you make your decision, for the proponents of both sides to produce a wiki page or similar document summarising their arguments?

Gerv
Flags: needinfo?(gerv)
Attachment #8670201 - Flags: review?(ted) → review+
Comment on attachment 8670202
Patch 2: call ICU instead of IDNKit

Review of attachment 8670202:
-----------------------------------------------------------------

This is pretty #ifdef-ugly, but until we have ICU available in all builds I guess it is OK (modulo the typo fix, see above); but it's still unclear to me whether we should be shipping this in transitional mode (as per comments 126-128, for example) or non-transitional (as per comment 131 and others).

The more conservative, arguably "safer" option would be to use transitional mode, which AIUI is the common practice among other browsers at this point, and represents the less-breaking change from our existing behavior. One approach would be to land the patch as such, and then file a separate followup about switching to non-transitional processing.
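To make the difference concrete: UTS #46 singles out four "deviation" characters (ß, ς, ZWJ and ZWNJ). Transitional processing maps them the way IDNA2003 did (ß to "ss", ς to σ, both joiners deleted), while non-transitional processing keeps them as distinct, valid characters. The C++ sketch below isolates just that step. `MapDeviations` is a hypothetical helper, not ICU code; real UTS #46 processing (as done by ICU) also performs case folding, normalization, validity checks and Punycode conversion, none of which is shown.

```cpp
#include <string>

// The four UTS #46 deviation characters.
constexpr char32_t kSharpS = U'\u00DF';      // ß LATIN SMALL LETTER SHARP S
constexpr char32_t kFinalSigma = U'\u03C2';  // ς GREEK SMALL LETTER FINAL SIGMA
constexpr char32_t kZWNJ = U'\u200C';        // ZERO WIDTH NON-JOINER
constexpr char32_t kZWJ = U'\u200D';         // ZERO WIDTH JOINER

// Hypothetical helper: applies ONLY the deviation step of UTS #46 mapping to a
// label that is assumed to be lowercase already. Case folding, normalization,
// validity checks and Punycode conversion are deliberately left out.
std::u32string MapDeviations(const std::u32string& label, bool transitional) {
  if (!transitional) {
    return label;  // Non-transitional: deviations stay as distinct, valid characters.
  }
  std::u32string out;
  for (char32_t c : label) {
    switch (c) {
      case kSharpS:
        out += U"ss";  // IDNA2003-compatible: ß becomes "ss"
        break;
      case kFinalSigma:
        out += U'\u03C3';  // ς becomes σ GREEK SMALL LETTER SIGMA
        break;
      case kZWNJ:
      case kZWJ:
        break;  // joiners are deleted
      default:
        out += c;
    }
  }
  return out;
}
```

With transitional mapping, `maße` and `masse` collapse to the same label. That is why the choice decides which server a given link resolves to.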

But I'm somewhat concerned that if we do this, and resist (for now) the user community's calls to adopt non-transitional, we may be adding to the inertia that will tend to prevent that change ever happening. Once we're firmly in the situation where every UA implements IDNA2008/transitional, the interoperability argument against any vendor switching to non-transitional will seem overwhelming. So our "intermediate step" could turn out to be a swamp in which we're trapped indefinitely. :(

If we accept -- as I think we should -- that the desired outcome in the longer term is for browsers to implement non-transitional IDNA2008/UTS46, then maybe we should make the first move here (with prominent messaging to communicate this to the relevant user communities, registries, etc), and urge the other browser vendors to follow suit in a timely fashion.

Gerv, Simon, etc: where and how should we be making these decisions?
Attachment #8670202 - Flags: review?(jfkthame) → review+
Comment on attachment 8670203
Patch 3: changes to existing tests

Review of attachment 8670203:
-----------------------------------------------------------------

We should also have at least a test that includes ß, but of course the expected result will depend on the behavior we implement.
Attachment #8670203 - Flags: review?(jfkthame) → review+
What are the differences from today's behavior with UTS46-transitional? There shouldn't really be any. Using UTS46-non-transitional would satisfy some vocal part of the user community, but would also put our users at risk. Given that no other browser vendor seems to be planning to make this change I don't really see why we should be first. We've already lost enough users due to being incompatible.
(In reply to Jonathan Kew (:jfkthame) from comment #142) 
> Gerv, Simon, etc: where and how should we be making these decisions?

Comment #141 gives the answer to that. I think it would be much better for both sides to set out their case in full, and allow the module owners to make a decision - or ask for further information or validation of particular claims.

Gerv
(In reply to Anne (:annevk) from comment #144)

> I don't really see why we should be first. We've already lost enough users
> due to being incompatible.

That suggests that non-transitional behavior would lead to a loss of user base, which is very questionable. I would even hypothesize that being the first browser to really support ß would lead to a gain in the user base. So who's right?
German users have been waiting for a browser to support ß ever since it was released; as long as browsers do not support ß, domain owners have to register two domain names to be safe.

Being the first to do it, as expected by e.g. DENIC, would be an advantage.
But you have to tell the world that you did it. If you just update silently,
some users will wrongly see it as a mistake.

I talked to DENIC in Frankfurt about it a while ago. I'm pretty sure they will help with press releases
if you contact them before it gets released, perhaps coordinated with a new major release of Firefox.
This may have the result we all wish for, as Apple, M$ and Google will certainly hear about it.

Having the German NIC on your side should put enough pressure on them to make the same move.

If you need someone to make contact with DENIC on your behalf, just contact me. I'm glad to help out.
We could simply send out the "intent to ship" mail on dev-platform, and also a similar announcement on chromium-dev asking them to do the same. I'm assuming we should file bugs to get Safari and Edge on board.

Another question is if older versions of IE would get the update (probably not) and if we care about that.
I have said this here before, but FWIW: *Given that domain names are case-insensitive*, I consider mapping ß to ss, as in IDNA2003, the correct behavior. Even more so since de-CH never uses ß (only ss), and since the 1996 spelling reform changed the rules on when to use it in many commonly occurring words. I consider the "non-transitional" behavior a bug and a (mild?) security risk. I would very much like to see FF behave like the other major browsers.

FYI: I grew up in Germany and lived there through college and into the start of my career as a software engineer. I have been doing full-time i18n (ICU) since 1999.
(In reply to Markus Scherer from comment #149)
> I have said this here before, but FWIW: *Given that domain names are
> case-insensitive*, I consider mapping ß to ss, as in IDNA2003, the correct
> behavior. Even more so since de-CH never uses ß (only ss), and since the
> 1996 spelling reform changed the rules on when to use it in many commonly
> occurring words. I consider the "non-transitional" behavior a bug and a
> (mild?) security risk. I would very much like to see FF behave like the
> other major browsers.
> 
> FYI: I grew up in Germany and lived there through college and into the
> start of my career as a software engineer. I have been doing full-time
> i18n (ICU) since 1999.

Matching the current behaviour of other major browsers would at least seem to follow the principle of least surprise. Perhaps this is a suitable issue for coordination between browser vendors, perhaps in the W3C or WHATWG?
Matching the current behaviour isn't the goal. Germany wants/needs the ß, not the translation to ss.
The conversion ß -> ss was a mistake in the first place and should not have made it into any kind of standard. The only valid use of the transliterations (ß->ss, ä->ae, ö->oe and ü->ue) is when you find yourself limited to a 7-bit ASCII character set with plain a-zA-Z0-9 characters and nothing else.

I have no idea why every other language got its non-ASCII letters right from the beginning while German forgot about ß (this is not the time and place to argue about that, I believe).
(In reply to Gervase Markham [:gerv] from comment #145)
> (In reply to Jonathan Kew (:jfkthame) from comment #142) 
> > Gerv, Simon, etc: where and how should we be making these decisions?
> 
> Comment #141 gives the answer to that. I think it would be much better for
> both sides to set out their case in full, and allow the module owners to
> make a decision - or ask for further information or validation of particular
> claims.

I don't think that's necessary. I believe I understand the positions of both sides well enough by now.

Since the German community are most affected by the choice between transitional and non-transitional processing, I think we should give weight to what they want here, and I agree with Jonathan that we can and should make the first move here.

However, I am a great believer in "divide et impera", so what I plan in practice is to check this in in the first instance with the switch set to transitional processing, deal with any regressions, and then flip the switch to non-transitional in a separate bug after a week or two, but within the same version cycle.

Full disclosure: I am currently working on this issue under contract for DENIC, but with my module-owner hat I believe this is the correct decision.
Gerv, can I check in the reviewed patches right away, or do we need a sec review?
Flags: needinfo?(gerv)
smontagu: I'm not the person who decides what code needs a secreview. I'm not sure what the current process is. dveditz, can you tell us?

Gerv
Flags: needinfo?(gerv) → needinfo?(dveditz)
(In reply to Simon Montagu :smontagu from comment #152)
> Since the German community are most affected by the choice between
> transitional and non-transitional processing, 

Yes, for the sharp-s.

> I think we should give weight
> to what they want here, and I agree with Jonathan that we can and should
> make the first move here.

However, I would like to remind folks that other communities that have not been very vocal here may also be affected: e.g. the Greek community (due to the final sigma), and script communities using ZWJ/ZWNJ. 

I'm aware that the .gr and .cy registries did ask for the final sigma to be treated as a PVALID character in its own right. I wouldn't be surprised if they are already bundling the IDNA2008 forms for the registrations, which means that using non-transitional processing wouldn't adversely affect them.

Still, I think it's important to hear from them as well as the communities using ZWJ/ZWNJ.
Hello folks,

I have been following this thread anonymously for a long, long time, but after coming back after a few weeks and reading the latest posts, I just had to register to post here.

I am honestly so happy that this long story is finally coming to a good end. Listening to the users' voices, even if this might only be a problem specific to the German-speaking part of the world, was the best decision you could have made, and it is where Firefox has always differed, in a positive way, from other major browsers.

So thanks everyone for your good work and also thanks to DENIC for supporting this development.

Greetings,
Christian
I'm concerned there are no tests for non-transitional vs transitional behavior here, especially since the code typo noted in comment 140 means we'll get non-transitional behavior when the surface intention of the code is transitional.

Is there a follow-up bug about switching (or at least considering the switch) to non-transitional behavior? I'd hate to land this patch and then have the conversation about 'ß' continue on in a FIXED bug (or worse, leave it open). I would love to give the Germans what they want, and would have without hesitation if DENIC had followed the UTS-46 recommendation to "bundle or block" domains with deviation characters (a sunset period is not the same thing). As it is, if we switch there will be web page links that go to one place in Firefox and another in Chrome and IE. If DENIC had followed that recommendation the link would either go to the same owner's site or would safely fail. But we should move /THAT/ conversation to a new bug.
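For illustration, here is a minimal C++ sketch of how a registry might enforce the "bundle or block" recommendation. Each label is reduced to a bundle key, here its transitional form with ß replaced by "ss". Only the holder of that key may register other members of the bundle; anyone else is blocked. `BundleKey` and `Registry` are hypothetical names. The sketch assumes lowercase UTF-8 labels and handles only ß, whereas a real policy would cover all four deviation characters.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical bundle key: the transitional form of a lowercase UTF-8 label.
// Simplification: only ß (UTF-8 bytes C3 9F) is treated as a deviation.
std::string BundleKey(const std::string& label) {
  static const std::string kSharpS = "\xC3\x9F";  // UTF-8 encoding of U+00DF
  std::string key = label;
  for (std::size_t pos = key.find(kSharpS); pos != std::string::npos;
       pos = key.find(kSharpS, pos + 2)) {  // "ss" is 2 bytes long
    key.replace(pos, kSharpS.size(), "ss");
  }
  return key;
}

// Hypothetical registry enforcing bundle-or-block at registration time.
class Registry {
 public:
  // Returns true if `holder` may register `label`. That is the case when nobody
  // holds its bundle yet (the whole bundle is then reserved for `holder`) or
  // when `holder` already owns the bundle. Any other applicant is blocked.
  bool Register(const std::string& label, const std::string& holder) {
    auto result = owners_.emplace(BundleKey(label), holder);
    return result.second || result.first->second == holder;
  }

 private:
  std::map<std::string, std::string> owners_;  // bundle key -> holder
};
```

Under such a policy, `maße.de` and `masse.de` either belong to the same holder or the second registration fails. Transitional and non-transitional clients could then never be sent to different owners.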

I'm even more concerned about ZWJ/ZWNJ but I see the patch does add UIDNA_CHECK_CONTEXTJ. Good! UTS-46 says "applications that perform IDNA2008 lookup are not required to check for these contexts, so overall security is dependent on registries having correct implementations." That's a bad recommendation! Safer to enforce the checks in 4 or 5 browser engines than hundreds of registries--not to mention there's no way for registries to enforce anything for sub-domains.

Do we have any tests that show UIDNA_CHECK_CONTEXTJ works correctly?

(In reply to Simon Montagu :smontagu from comment #153)
> Gerv, can I check in the reviewed patches right away, or do we need a sec
> review?

Check in the reviewed patches, or check in versions that correct the items noted by jfkthame in comment 140 and comment 143? The latter is fine though I'd like to see more tests as noted above. ICU code is already being used elsewhere in Gecko (though maybe not the IDNA part of it) and needs to be inspected/tested anyway.
Flags: needinfo?(dveditz)
I have the TRANSTIONAL typo corrected locally and will of course check in that version. I've confirmed that changing the #define turns transitional processing on and off as expected.

I'll see what I can do about extra tests, but there may be issues for the time being with platforms where ICU is not available.
(In reply to Simon Montagu :smontagu from comment #158)
> I have the TRANSTIONAL typo corrected locally and will of course check in
> that version. I've confirmed that changing the #define turns transitional
> processing on and off as expected.

Actually, rather than using a #define, why not say something like

  const bool kIDNA2008_Transitional = true;

there at the top of the file to select the mode, and then instead of the #ifndef in nsIDNService(), you can say

  if (!kIDNA2008_Transitional) {
    IDNAOptions |= UIDNA_NONTRANSITIONAL_TO_UNICODE;
  }

(where the compiler should optimize away the const condition, so there's no runtime cost). Seems a bit neater than using the preprocessor, IMO -- and less typo-prone because the compiler will insist the name matches!
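Here is a self-contained version of that suggestion, as a sketch. The option bits are hypothetical stand-ins for ICU's `UIDNA_*` constants. The real values come from `unicode/uidna.h` and are not reproduced here. `IDNAOptionsFor` and `CurrentIDNAOptions` are illustrative helpers, not code from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-ins for ICU's UIDNA_* option bits (NOT ICU's real values).
constexpr std::uint32_t kCheckBidi = 1u << 0;
constexpr std::uint32_t kCheckContextJ = 1u << 1;
constexpr std::uint32_t kNonTransitionalToUnicode = 1u << 2;

// Mode switch as a typed constant rather than a preprocessor symbol: if the
// name is misspelled where it is used, compilation fails instead of silently
// taking the other branch (as happened with #ifndef IDNA2008_TRANSTIONAL).
const bool kIDNA2008_Transitional = true;

// Mirrors the option setup in the nsIDNService constructor.
std::uint32_t IDNAOptionsFor(bool transitional) {
  std::uint32_t options = kCheckBidi | kCheckContextJ;
  if (!transitional) {
    options |= kNonTransitionalToUnicode;
  }
  return options;
}

// What the service would hand to ICU. With a constant argument, an optimizing
// compiler can fold the branch above away entirely.
std::uint32_t CurrentIDNAOptions() { return IDNAOptionsFor(kIDNA2008_Transitional); }
```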
Modified per comment 159 -- carrying forward r=jfkthame
Attachment #8670202 - Attachment is obsolete: true
Attachment #8678571 - Flags: review+
Blocks: 1218179
(In reply to Daniel Veditz [:dveditz] from comment #157)
> Is there a follow-up bug about switching (or at least considering the
> switch) to non-transitional behavior?

Bug 1218179
(In reply to William Tan from comment #155)

> However, I would like to remind folks that other communities that have not
> been very vocal here may also be affected: e.g. the Greek community (due to
> the final sigma), and script communities using ZWJ/ZWNJ. 
> 
> I'm aware that the .gr and .cy registries did ask for the final sigma to be
> treated as a PVALID character in its own right. I wouldn't be surprised if
> they are already bundling the IDNA2008 forms for the registrations, which
> means that using non-transitional processing wouldn't adversely affect them.
> 
> Still, I think it's important to hear from them as well as the communities
> using ZWJ/ZWNJ.

Here is the position of our colleagues from the Greek registry: they are launching IDNA2008 very soon and will do so following their so-called domain names decision 750/2, which is publicly available at https://grweb.ics.forth.gr/public/tomcat_docs/412-B-2015_EN.pdf
(Relevant to this discussion are Article 3, paragraphs 4, 8, 9, 10 and 11, and Article 23, paragraphs 5 and 6.)

Summing it up: they will allow the final sigma in domain names as a distinct character and bundle it at registration time with the small sigma. The domain owner might then decide to block variants, alias them with a DNS DNAME, or use them as independent fully qualified domain names (and thus point the final-sigma domain wherever they like). All in all, they *welcome* a Firefox non-transitional implementation because it offers maximum flexibility for their customers.

I've invited the Greek colleagues to follow this issue and make a statement themselves here. Also still trying to get an answer from the Cypriot registry.
(In reply to Marcos Sanz from comment #163)
> Here is the position of our colleagues from the Greek registry: They are
> launching IDNA2008 very soon and will do so following their so-called domain
> names decision 750/2, which is publicly available under
> https://grweb.ics.forth.gr/public/tomcat_docs/412-B-2015_EN.pdf
> (Relevant for this discussion are Article 3, paragraphs 4, 8, 9, 10, 11 and
> article 23, par. 5, 6)
> 
> Summing it up: they will allow final sigma in domain names as a different
> character and bundle at registration time with small sigma. The domain owner
> might then decide to block variants, or alias them with a DNS DNAME or use
> them as an independent fully qualified domain name (and thus point the final
> sigma domain to wherever they like to). All in all, they *welcome* a Firefox
> non-transitional implementation because it offers maximum flexibility for
> their customers.
> 
> I've invited the Greek colleagues to follow this issue and make a statement
> themselves here. Also still trying to get an answer from the Cypriot
> registry.

As it looks like registries are beginning to roll out IDNA 2008 support one by one, would it make sense to have some sort of per-registry config flag that could be used to turn on IDNA 2008 on a TLD-by-TLD basis, similar to the way that IDN display was once restricted on a TLD-by-TLD basis during its transition? If so, the code to support this is essentially already written, and trivially available for reuse.
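A minimal C++ sketch of the per-TLD idea, assuming the host has already been extracted from the URL and lowercased. `UseNonTransitional` and the opt-in set are hypothetical. They do not correspond to an existing Firefox preference.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical check: should this host get non-transitional processing?
// `optedInTLDs` would list TLDs whose registries have asked for it.
bool UseNonTransitional(const std::string& host,
                        const std::set<std::string>& optedInTLDs) {
  std::string trimmed = host;
  if (!trimmed.empty() && trimmed.back() == '.') {
    trimmed.pop_back();  // ignore the root dot of a fully qualified name
  }
  std::size_t dot = trimmed.rfind('.');
  std::string tld = (dot == std::string::npos) ? trimmed : trimmed.substr(dot + 1);
  return optedInTLDs.count(tld) > 0;
}
```

The burden Simon mentions below lies in keeping such a set accurate, not in the lookup itself.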
(In reply to Neil Harris from comment #164)
> As it looks like registries are beginning to roll out IDNA 2008 support
> one-by-one, would it make sense to have some sort of per-registry config
> flag that could be used to turn on IDNA 2008 on a TLD-by-TLD basis, similar
> to the way that IDN display was once restricted on a TLD-by-TLD basis during
> its transition? If so, the code to support this is essentially already
> written, and trivially available for reuse.

That's an interesting idea, but maintaining the (white- or black-?) list of TLDs would be quite a burden. I don't know if anyone would be willing to take that on.

Do you in fact mean IDNA 2008, or the non-transitional option under IDNA 2008? I don't see a reason to turn on IDNA 2008 with transitional processing other than across the board.
The reason we came up with the approach in TR46 was the security problems with simply going to IDNA2008: for the first time, I could go to a completely different page depending only on which browser I'm using. There are a number of examples of how that can be used for unpleasant spoofs.

That is pretty much orthogonal to whether it would be good to distinguish ß from "ss". Note that there is a host of correct spellings in many languages that can't be represented in IDNA. For example, take http://www.fox.com/bobs-burgers. The phrase <Bobs Burgers> (with a plural Bob) is simply wrong; in correct English it has to be <Bob’s Burgers> (with a genitive Bob).

To my mind, the only really safe approach would be to turn off the transitional flag at a level only if
a) labels at that level cannot have ß, or
b) labels at that level that have ß are guaranteed to either be bundled or blocked.

Now, two important conditions:

1. This applies to all 4 of the characters, not just ß.

2. It applies at a level. That is, suppose that DENIC did guarantee bundle-or-block at the second level, eg <masse.de> vs <maße.de>. That means that you could turn off the transitional processing for <*.de>. But that doesn't necessarily mean that you can turn it off for <*.foo.de>: not unless either 
(a) DENIC has the contractual guarantee in place at higher levels (2, 3, ...), or 
(b) foo.de guarantees it.
Practically speaking, I think (a) is the only feasible test for a browser, because otherwise the maintenance gets awful.
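One way to model the per-level rule under condition (a) is shown in the hypothetical C++ sketch below. A table records, for each TLD, the deepest level at which its registry contractually guarantees bundle-or-block. A label is processed non-transitionally only if its level, counting the TLD as level 1, does not exceed that depth. The table contents and the function name are illustrative assumptions.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical policy table: TLD -> deepest level (TLD = level 1) at which the
// registry contractually guarantees that labels containing deviation
// characters are bundled or blocked.
using GuaranteeTable = std::map<std::string, int>;

// Given the labels of a hostname, left to right (e.g. {"foo", "masse", "de"}),
// returns for each label whether it may be processed non-transitionally.
// Labels deeper than the guarantee fall back to transitional processing.
std::vector<bool> NonTransitionalLabels(const std::vector<std::string>& labels,
                                        const GuaranteeTable& table) {
  std::vector<bool> result(labels.size(), false);
  if (labels.empty()) {
    return result;
  }
  auto it = table.find(labels.back());
  int guaranteedDepth = (it == table.end()) ? 0 : it->second;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    int level = static_cast<int>(labels.size() - i);  // rightmost label is level 1
    result[i] = level <= guaranteedDepth;
  }
  return result;
}
```

With `de` guaranteed at level 2, `masse.de` could be processed non-transitionally, while the `foo` label in `foo.masse.de` would still be processed transitionally.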

Now, of course, Firefox could just say "Germans want this, so we'll do it no matter the consequences". But if Waltraud or Jürgen end up falling prey to a spoof, I suspect they would blame Firefox—not DENIC.
The scope of this bug is hereby limited to getting the IDNA2008/UTS46 implementation *with transitional processing* checked in and working, which I will do just as soon as attachment 8678572 gets reviewed.

Please let's have any further conversation about ß ς and the joiners in bug 1218179.
Attachment #8678572 - Flags: review?(jfkthame) → review+
Are you still hoping to land this and bug 1218179 for the same release cycle (as per comment 152)?
Oh, never mind -- I think your comment 167 intended to override that plan, so this part goes ahead anyway.
It was more Time's wingèd chariot that overrode that plan, but yes, I'll check this in now and hope to get bug 1218179 in early in Fx45.
Sorry, had to back this out for bustage like:

    StaticXULComponentsEnd/StaticXULComponentsEnd.o
 04:22:23     INFO -  ld: warning: could not create compact unwind for _ffi_call_unix64: does not use RBP or RSP based frame
 04:22:23     INFO -  Undefined symbols for architecture x86_64:
 04:22:23     INFO -    "_uidna_close_55", referenced from:
 04:22:23     INFO -        nsIDNService::~nsIDNService() in Unified_cpp_netwerk_dns0.o
 04:22:23     INFO -        nsIDNService::~nsIDNService() in Unified_cpp_netwerk_dns0.o
 04:22:23     INFO -    "_uidna_openUTS46_55", referenced from:
 04:22:23     INFO -        nsIDNService::nsIDNService() in Unified_cpp_netwerk_dns0.o
 04:22:23     INFO -    "_uidna_labelToUnicode_55", referenced from:
 04:22:23     INFO -        nsIDNService::IDNA2008ToUnicode(nsACString_internal const&, nsAString_internal&) in Unified_cpp_netwerk_dns0.o
 04:22:23     INFO -        nsIDNService::IDNA2008StringPrep(nsAString_internal const&, nsAString_internal&, nsIDNService::stringPrepFlag) in Unified_cpp_netwerk_dns0.o
 04:22:23     INFO -  ld: symbol(s) not found for architecture x86_64
 04:22:23     INFO -  clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)

and https://treeherder.mozilla.org/logviewer.html#?job_id=16381289&repo=mozilla-inbound


/home/worker/workspace/gecko/netwerk/dns/nsIDNService.cpp:142: error: undefined reference to 'uidna_openUTS46_55'
/home/worker/workspace/gecko/netwerk/dns/nsIDNService.cpp:174: error: undefined reference to 'uidna_labelToUnicode_55'
/home/worker/workspace/gecko/netwerk/dns/nsIDNService.cpp:199: error: undefined reference to 'uidna_labelToUnicode_55'
/home/worker/workspace/gecko/netwerk/dns/nsIDNService.cpp:155: error: undefined reference to 'uidna_close_55'
Flags: needinfo?(smontagu)
(In reply to Markus Scherer from comment #149)
> I have said this here before, but FWIW: *Given that domain names are
> case-insensitive*, I consider mapping ß to ss, as in IDNA2003, the correct
> behavior. Even more so since de-CH never uses ß (only ss), and since the
> 1996 spelling reform changed the rules on when to use it in many commonly
> occurring words. I consider the "non-transitional" behavior a bug and a
> (mild?) security risk. I would very much like to see FF behave like the
> other major browsers.

FWIW, I find this argument persuasive on its own but even more so considering who is making the argument.

(In reply to "support" from comment #151)
> Germany wants

FWIW, foreigners should be very wary of arguments of this form. Countries may have vocal prescriptive orthography contrarians who try to bolster their position by trying to convince foreigners in standards-setting contexts of stuff that results in impracticality locally. (The official Finnish position that led to ISO-8859-15 is an example close to home for me that makes me suspect an analogous thing may be going on when someone else from another country makes claims that systems in common use are all wrong for $LANGUAGE or $COUNTRY.) In this case, the impracticality would be Firefox resolving domains differently than before and differently from other browsers.

> The conversion ß -> ss was a mistake in the first place and should not have
> made it into any kind of standard.

Whether or not this is true, Web tech is full of mistakes that would cause more trouble if fixed after the fact.
There is no upper- and lowercase problem with "ß" as there is no uppercase form of it.

BTW, DENIC already established facts on the ground about using ß instead of ss years ago.
ß is an official part of German domain names, yet browsers have completely ignored it until now.

DENIC told anyone with an "ß" in their domain name to register the ss domain too, so in theory
these domains will just work as intended and end up at the same server as always.

AFTER browsers worldwide have adopted the (IMHO) correct behaviour, things can change, so that ß domains no longer need a companion domain to be useful. They can start "a life of their own", so to speak ;)

> Whether or not this is true, Web tech is full of mistakes that would cause more trouble if fixed after the fact.

Sadly, that's true. In our case, the mistake led to a situation where you can own something but can't make use of it. So we would be making those domain names usable in the first place, which is a bit different from breaking the working SMTP protocol :)
(In reply to support from comment #176)
> There is no upper- and lowercase problem with "ß" as there is no uppercase
> form of it.

U+1E9E says otherwise.

> BTW, DENIC already established facts on the ground about using ß instead of
> ss years ago.

That someone official positioning themselves as speaking for a country wants to make a breaking change doesn't mean it's a good idea to make one and doesn't mean that the change would be a net benefit (considering the downside of technical incompatibility) for the people that someone official represents. When the Finnish delegation to ISO claimed that ISO-8859-1 was insufficient for Finnish orthography (despite the ISO-8859-1 repertoire having successfully been in use in Finland at that point), it was the official position of the official regulatory body for the Finnish language. ISO-8859-15 was still a terrible idea.
Rechecked in with CLOBBER, so far with no bustage.

For the third time of asking: switching to non-transitional processing is bug 1218179, and all discussion of that question should happen there and not here.
Flags: needinfo?(smontagu)
Background info:

ß is never used as the first letter of a word, so it can't have an uppercase form. Of course, this is only true for real-world written language; it does not mean that you can't have a character representing ß+SHIFT, which is normally also ß. ö, ä and ü have the uppercase forms Ö, Ä and Ü because they can be first letters.

>That someone official positioning themselves as speaking for a country wants to make a breaking change
>doesn't mean it's a good idea to make one and doesn't mean that the change would be a net benefit

DENIC is the authority for domain names under .de, not for all German-based languages, so yes, they do not speak for all German-speaking people. But with 16 million domain names behind them, their wishes and authority should carry some weight. And DENIC has already made the decision.

The real question should be why it took so long to add ß to domain names and browsers.

>(considering the downside of technical incompatibility) for the people that someone official
>represents. When the Finnish delegation to ISO claimed that ISO-8859-1 was insufficient for Finnish
>orthography (despite the ISO-8859-1 repertoire having successfully been in use in Finland at that
>point), it was the official position of the official regulatory body for the Finnish language.
>ISO-8859-15 was still a terrible idea.

I understand what you are trying to say. As I explained earlier, implementing IDNA2008 with ß fixes more than it can break, since ß domains have been broken from the start and are unusable at the moment. We are only talking about fixing the behaviour for roughly 23,000 domain names out of ~16,000,000 in total (for .de; the numbers were supplied by DENIC, thanks DBS), a behaviour that is intended by the NIC, expected by the owners and totally unknown to most Internet users :)

If there is any impact, it's insignificant compared to the opportunity for those domain name owners to make use of their domain names for the first time in 8 years.
Still getting link errors:

mozHunspell.obj : warning LNK4217: locally defined symbol ??0Hunspell@@QAE@PBD00@Z (public: __thiscall Hunspell::Hunspell(char const *,char const *,char const *)) imported in function "public: virtual enum nsresult __stdcall mozHunspell::SetDictionary(wchar_t const *)" (?SetDictionary@mozHunspell@@UAG?AW4nsresult@@PB_W@Z)
mozHunspell.obj : warning LNK4217: locally defined symbol ??1Hunspell@@QAE@XZ (public: __thiscall Hunspell::~Hunspell(void)) imported in function "protected: virtual __thiscall mozHunspell::~mozHunspell(void)" (??1mozHunspell@@MAE@XZ)
mozHunspell.obj : warning LNK4217: locally defined symbol ?spell@Hunspell@@QAEHPBDPAHPAPAD@Z (public: int __thiscall Hunspell::spell(char const *,int *,char * *)) imported in function "public: virtual enum nsresult __stdcall mozHunspell::Check(wchar_t const *,bool *)" (?Check@mozHunspell@@UAG?AW4nsresult@@PB_WPA_N@Z)
mozHunspell.obj : warning LNK4217: locally defined symbol ?suggest@Hunspell@@QAEHPAPAPADPBD@Z (public: int __thiscall Hunspell::suggest(char * * *,char const *)) imported in function "public: virtual enum nsresult __stdcall mozHunspell::Suggest(wchar_t const *,wchar_t * * *,unsigned int *)" (?Suggest@mozHunspell@@UAG?AW4nsresult@@PB_WPAPAPA_WPAI@Z)
mozHunspell.obj : warning LNK4217: locally defined symbol ?get_dic_encoding@Hunspell@@QAEPADXZ (public: char * __thiscall Hunspell::get_dic_encoding(void)) imported in function "public: virtual enum nsresult __stdcall mozHunspell::SetDictionary(wchar_t const *)" (?SetDictionary@mozHunspell@@UAG?AW4nsresult@@PB_W@Z)
LINK : warning LNK4199: /DELAYLOAD:dbghelp.dll ignored; no imports found from dbghelp.dll
LINK : warning LNK4199: /DELAYLOAD:PowrProf.dll ignored; no imports found from PowrProf.dll
Unified_cpp_netwerk_dns0.obj : error LNK2019: unresolved external symbol _uidna_openUTS46_55 referenced in function "public: __thiscall nsIDNService::nsIDNService(void)" (??0nsIDNService@@QAE@XZ)
Unified_cpp_netwerk_dns0.obj : error LNK2019: unresolved external symbol _uidna_close_55 referenced in function "protected: virtual __thiscall nsIDNService::~nsIDNService(void)" (??1nsIDNService@@MAE@XZ)
Unified_cpp_netwerk_dns0.obj : error LNK2019: unresolved external symbol _uidna_labelToUnicode_55 referenced in function "private: enum nsresult __thiscall nsIDNService::IDNA2008StringPrep(class nsAString_internal const &,class nsAString_internal &,enum nsIDNService::stringPrepFlag)" (?IDNA2008StringPrep@nsIDNService@@AAE?AW4nsresult@@ABVnsAString_internal@@AAV3@W4stringPrepFlag@1@@Z)
xul.dll : fatal error LNK1120: 3 unresolved externals
Flags: needinfo?(smontagu)
(In reply to Philip Chee from comment #181)
> Still getting link errors:

Did you clobber your objdir?
> Did you clobber your objdir?
Yes, that worked. Sorry for the bugspam.
Flags: needinfo?(smontagu)
Blocks: 853237
Blocks: 733350
Whiteboard: [sg:want?] → [sg:want?][adv-main44-]
Blocks: 309435
Is this expected to work with Firefox 45? I'm having no luck.

I own the domain roßdeutscher.de and tried to access it.

The Network tab in the developer tools shows a request to the correct IP (mine), but besides the HTTP 500, the returned headers describe a completely different server (CentOS instead of my Debian). I can see no errors or accesses in the server logs.
What's the IP address of the server you expect it to reach?
> What's the IP address of the server you expect it to reach?

Sorry for the trouble; it was a misconfiguration in my DNS.
The IDNA2008 features, including support for ZWNJ in Arabic script, have been added to Firefox 45.0. According to the IDNA2008 standards, ZWNJ is a CONTEXTJ character and is valid only if it satisfies the following condition; otherwise the domain name is invalid. However, the current Firefox implementation accepts ZWNJ without any condition.

RFC 5892: 
Appendix A.1.  ZERO WIDTH NON-JOINER

   Code point:
      U+200C

   Overview:
      This may occur in a formally cursive script (such as Arabic) in a
      context where it breaks a cursive connection as required for
      orthographic rules, as in the Persian language, for example.  It
      also may occur in Indic scripts in a consonant-conjunct context
      (immediately following a virama), to control required display of
      such conjuncts.

   Lookup:
      True

   Rule Set:

      False;

      If Canonical_Combining_Class(Before(cp)) .eq.  Virama Then True;

      If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C

         (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
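The rule above can be sketched in C++ as shown below. This only illustrates the logic. `GetJoiningType` and `IsVirama` are toy lookups that know a handful of Arabic and Indic code points and treat everything else as non-joining. A real implementation, such as ICU's when `UIDNA_CHECK_CONTEXTJ` is set, draws on the full Unicode Character Database.

```cpp
#include <cstddef>
#include <string>

enum class JoiningType { U, L, D, R, T };

// Toy Joining_Type table; everything not listed is treated as U (non-joining).
JoiningType GetJoiningType(char32_t c) {
  switch (c) {
    case U'\u0628': return JoiningType::D;  // ARABIC LETTER BEH (dual-joining)
    case U'\u0647': return JoiningType::D;  // ARABIC LETTER HEH
    case U'\u0627': return JoiningType::R;  // ARABIC LETTER ALEF (right-joining)
    case U'\u0631': return JoiningType::R;  // ARABIC LETTER REH
    case U'\u064E': return JoiningType::T;  // ARABIC FATHA (transparent)
    default: return JoiningType::U;
  }
}

// Canonical_Combining_Class == Virama (9); only two examples are listed.
bool IsVirama(char32_t c) {
  return c == U'\u094D' /* DEVANAGARI SIGN VIRAMA */ ||
         c == U'\u09CD' /* BENGALI SIGN VIRAMA */;
}

// RFC 5892 Appendix A.1 for the ZWNJ at index `pos` of `label`.
bool ZwnjContextOk(const std::u32string& label, std::size_t pos) {
  if (pos > 0 && IsVirama(label[pos - 1])) return true;
  // (Joining_Type:{L,D})(Joining_Type:T)*  before the ZWNJ
  std::size_t i = pos;
  while (i > 0 && GetJoiningType(label[i - 1]) == JoiningType::T) --i;
  if (i == 0) return false;
  JoiningType before = GetJoiningType(label[i - 1]);
  if (before != JoiningType::L && before != JoiningType::D) return false;
  // (Joining_Type:T)*(Joining_Type:{R,D})  after the ZWNJ
  std::size_t j = pos + 1;
  while (j < label.size() && GetJoiningType(label[j]) == JoiningType::T) ++j;
  if (j == label.size()) return false;
  JoiningType after = GetJoiningType(label[j]);
  return after == JoiningType::R || after == JoiningType::D;
}

// A label passes if every ZWNJ in it satisfies the contextual rule.
bool LabelPassesZwnjRule(const std::u32string& label) {
  for (std::size_t k = 0; k < label.size(); ++k) {
    if (label[k] == U'\u200C' && !ZwnjContextOk(label, k)) return false;
  }
  return true;
}
```

For example, BEH + ZWNJ + ALEF passes (D before, R after), while ALEF + ZWNJ + BEH fails because a right-joining letter cannot join forward.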
Can you open a new bug for comment 187 with a testcase that shows the problem? We do implement CONTEXTJ checks (or rather, we set the flag for ICU to implement them), and there is a test for it in netwerk/test/unit/test_idna2008.js.
Flags: needinfo?(alireza)
Depends on: 1215247
Blocks: 1324716