Value of nsIEffectiveTLDService.getEffectiveTLDLength meaningless in JavaScript

RESOLVED FIXED in mozilla1.9beta1

Status

()

Core
Networking
--
critical
RESOLVED FIXED
11 years ago
9 years ago

People

(Reporter: Wladimir Palant (for Adblock Plus info Cc bugzilla@adblockplus.org), Assigned: dwitte@gmail.com)

Tracking

(Blocks: 1 bug)

Trunk
mozilla1.9beta1
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9 +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [has patch][has reviews])

Attachments

(1 attachment, 5 obsolete attachments)

nsIEffectiveTLDService.getEffectiveTLDLength() uses UTF-8 and the returned TLD length also means UTF-8 characters. Now when one calls from JavaScript the conversion to UTF-8 is done automatically by XPConnect - so the length returned refers to a string we don't have. Example:

var tld 
    = Components.classes["@mozilla.org/network/effective-tld-service;1"]
                .getService(Components.interfaces.nsIEffectiveTLDService);

alert(tld.getEffectiveTLDLength("domain.ru"));  // returns 2
alert(tld.getEffectiveTLDLength("domain.ру"));  // returns 4

One would of course expect this to return 2 in both cases. Requesting blocking1.9 since this issue limits the usefulness of effective TLD service considerably.
Flags: blocking1.9?
Blocks: 331510
Severity: normal → critical
Hmm. :-(

If we are going to rejig this interface, can I reopen the idea of changing the name? Publically and politically, we can't call it the "effective TLD" - the registrars will throw a fit and not cooperate with us. And we need their help to maintain the list.

My current idea is the "public prefix service" - does that sound OK?

Gerv
Surely that's "suffix" rather than prefix? I'd suggest "public domain" service perhaps.

It's not even a very good interface from native code, since it can take an IDN domain name and return a length based on the ACE encoding which doesn't match the input.

I'd like to get two string-returning scriptable methods, one returning the "public domain" (e.g. ".co.uk") and the other the best-guess effective domain (e.g. "amazon.co.uk" returned from "beta.www.amazon.co.uk"). consumers could always calculate the latter from the former, but I suspect it's going to be a common enough desire we might as well build it into the service and debug the code only once.
The current API is as usable from javascript as from anywhere else, you just have to start with a nsIURI and use the hostASCII version to make sure it's converted the same way the effective service will convert it.
At the very least that should be documented...
(In reply to comment #3)
Yes, taking the detour through asciiHost will probably work. It doesn't make this a good interface however.
Dan: I guess it is a suffix, the way domains are normally written. So "Public Suffix Service". 

So your methods would be something like:
getPublicSuffix()
getBaseDomain()

"Public Domain Service" has an unfortunate ambiguity because of the other meanings of "Public Domain" :-(

Gerv
something like that. The length is useful only if we're really concerned about the perf/bloat impact of the string copies (which we may well be in places), and it's a little confusing that it may be a length relative to a string different from the one you passed into it.

Then we have to figure out if the returned string is an IDN domain or punycoded, and is it always one or the other or does it follow the IDN whitelist?

Comment 8

11 years ago
Blocking at least for correct documentation, if not a better API.
Flags: blocking1.9? → blocking1.9+
There is bug 368700 and bug 368702 depending on how this bug is dealt with - I didn't want to check in a consumer for nsIEffectiveTLDService if the interface is going to be refactored. But since I don't see any movement here - how should I proceed? I could check in anyway or change the patches to use the one-dot rule instead of nsIEffectiveTLDService as it is currently common in the source base.

Comment 10

11 years ago
Wladimir, can you take this bug?
Assignee: nobody → trev.moz
So I guess what should be done here is adding getEffectiveTLD and getEffectiveDomain methods, as well as marking getEffectiveTLDLength as [noscript] because of its ambiguousness when called from JavaScript (I think it should be kept for performance reasons however). Unfortunately, I doubt that I will have time for this.
Created attachment 264676 [details] [diff] [review]
Proposed patch

This does what I described in the previous comment. It also changes the testcase from bug 368702 to use getEffectiveTLD and getEffectiveDomain since getEffectiveTLDLength is no longer available.
Attachment #264676 - Flags: superreview?(dveditz)
Attachment #264676 - Flags: review?(darin.moz)
Status: NEW → ASSIGNED

Comment 13

11 years ago
A similar patch is in bug 367446.
Wladimir: can you and Dave (bug 367446) cooperate to work out what we need to do? 

Gerv

Updated

11 years ago
Whiteboard: [has patch] needs r?darin,dveditz
Target Milestone: --- → mozilla1.9alpha5

Updated

11 years ago
Attachment #264676 - Flags: review?(darin.moz) → review?(cbiesinger)
Dave agreed with me that development should continue in this bug. I will add truncation of output parameters that I forgot in my patch, otherwise the two patches are pretty much identical with exception of comments and variable naming. Bug 367446 comment 10 needs to be addressed as well.
Duplicate of this bug: 367446
Comment on attachment 264676 [details] [diff] [review]
Proposed patch

Clearing review requests awaiting an updated patch.

1) need to address bug 367446 comment 10 (getBaseDomain() needs a numeric "depth" argument. Current code assumes "1").

2) renaming is now or never. I don't feel strongly about the service name--especially given the PITA of moving files--but the new methods need better names. I'm happy enough with Gerv's names from comment 6

3) It would make more sense for these methods to take a nsIURI instead of a string domain name. I suspect the vast majority of the callers will already have a nsIURI and have to call getHost, and then we have to worry that each callsite will correctly drill into the innermost URI in cases like jar: uri's (offline apps, etc) before reading the host. Centralizing would be safer. Plus the URIs should already be normalized and we could skip that step.

If we think enough places might just have a string host then maybe we need two sets of interfaces, one normalizing a string, one taking the innermost host out of a URI, and both feeding into shared guts.

4) should we get rid of the GetEffectiveTLDLength method? I think it's just asking for trouble having people use it.

5) we might not actually need both getPublicSuffix() and getBaseDomain() -- if we add the "depth" argument they come out to the same thing. getBaseDomain(nsIURI, 0) is public, getBaseDomain(nsIURI, 1) is the basic private one. 

On second thought getPublicSuffix(nsIURI) would add readability to code that uses it, doesn't hurt to keep it.
Attachment #264676 - Attachment is obsolete: true
Attachment #264676 - Flags: superreview?(dveditz)
Attachment #264676 - Flags: review?(cbiesinger)
getEffectiveTLDLength should remain as a scriptable method. Otherwise this change will make it even harder to get the effective sub-domain (i.e. host minus base domain.)

(In reply to comment #5)
> (In reply to comment #3)
> Yes, taking the detour through asciiHost will probably work. It doesn't make
> this a good interface however.

It does. See bug 367446 comment 7.
I would argue for a single method, as Dan outlines in comment #5, taking an nsIURI and an integer. It's rather hard to figure out what best to call it, though. I'd be happy with getPublicSuffix(), but we might consider e.g. getPublicSuffixPlus(1, nsIURI), which returns the public suffix + 1.

If people want the effective TLD length, they can call length() on the string returned by getPublicSuffixPlus(0, nsIURI). The function which returns a number for the length is broken.

Gerv

Daniel Veditz:

3) I am not so sure about making parameter an nsIURI. The one existing consumer is document.domain setter - and it doesn't have an nsIURI, neither do I see a meaningful way to construct one. Same goes for cookie handlers once they start using this service. Adblock Plus will probably have the same problem when using this interface.

This means having two methods with the only difference being the type of the input parameter. Is this really a good solution?

4) It should be fine if we have getBaseDomainLength marked as [noscript]. There is little point using this method from JavaScript anyway, but it is safe to use from C++, and it can improve performance there (saves you string copy/compare, see document.domain setter).

5) I don't think that "getPublicSuffixPlus" actually makes it more clear what the following parameter means. I am leaning towards getBaseDomain as the only method, this is the most consistent solution.
Created attachment 266257 [details] [diff] [review]
Patch v2

I decided not to change hostname parameter type here, I rather prefer not to make this patch more complicated than necessary. Please file a follow-up bug if you want this fixed. Other than that - methods have been renamed and an integer parameter for the number of additional domain parts has been added.
Attachment #266257 - Flags: superreview?(dveditz)
Attachment #266257 - Flags: review?(cbiesinger)
Whiteboard: [has patch] needs r?darin,dveditz → [has patch] needs r?biesi,dveditz

Updated

11 years ago
Target Milestone: mozilla1.9alpha5 → mozilla1.9alpha6
Moving to A6 as this probably won't happen in time.

Updated

11 years ago
Blocks: 366797, 367799, 378668
(Assignee)

Comment 23

11 years ago
(In reply to comment #20)
> meaningful way to construct one. Same goes for cookie handlers once they start
> using this service. Adblock Plus will probably have the same problem when 

actually, in cookies we always have an nsIURI to work with, so for us i think dveditz' solution of having two methods is pragmatic. getting things right should always come first, and you should encourage people to use the preferred method. (e.g. you could call the nsIURI version getBaseDomain, and the string version getBaseDomainFromString.) at the very least, if you stick with the string method you should provide a comment in the idl showing consumers exactly how to extract the string from their nsIURI properly.

i'm also in agreement with gerv re not having the length method; i think the cost of the string copy (which will often be into a stack-based string c.f. nsCAutoString) is negligible, and it makes the interface more sane. in this case, i think neatness of the interface wins over micro-optimization.

thanks for working on this; i'm currently working on a patch to use this stuff in cookie code (bug 385299). i hope you get it in for alpha!
(Assignee)

Updated

11 years ago
Blocks: 385299
(Assignee)

Comment 24

11 years ago
also, how will the aAdditionalParts parameter work in the case of ip addresses (bug 364129) - will it work as expected and give back the original ip address?
Yes, an IP address is considered to be one single TLD, the aAdditionalParts is irrelevant in this case since there is nothing to add.
(Assignee)

Comment 26

11 years ago
i have another question about aAdditionalParts - say i give getBaseDomain a string like "co.uk" and ask for 1 additional part, expecting something back like "bbc.co.uk". it'll give me back "co.uk", and say i'm using that to check if a different URI "evil.co.uk" is a subdomain of "bbc.co.uk". my test will now pass when it should've failed. if the consumers are deriving these strings from real URI's, this should never happen, but the getBaseDomain method takes a string so you should be prepared for anything. you could also imagine this case when asking for e.g. 5 levels on a 3-level URI - the point is that the consumer expects 5 levels and they didn't get it, which i think we should tell them.

there's no way for the consumer to check that what they got back was actually what they expected, short of re-parsing the domain string to check that the number of levels they asked for was what they got. would it be sensible to return a specific failure code if they asked for more levels than the URI contains? e.g. NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS. consumers could then check for this specific error and respond accordingly. an alternative would be to make aAdditionalParts an inout, and return the actual number of parts found.

it may also make sense to have a similar error code for ip addresses, e.g. NS_ERROR_IP_ADDRESS. (in cookie code, we need to special-case ip's before we even call getBaseDomain, so this would help us there, and i think it makes sense in general.)

wladimir, dveditz, do you have any comments/ideas?
punting remaining a6 bugs to b1, all of these shipped in a5, so we're at least no worse off by doing so.
Target Milestone: mozilla1.9alpha6 → mozilla1.9beta1
(Assignee)

Comment 28

10 years ago
no response from Wladimir -> taking bug. patch forthcoming.
Assignee: trev.moz → dwitte
Status: ASSIGNED → NEW
Target Milestone: mozilla1.9 M8 → M1
(Assignee)

Comment 29

10 years ago
Created attachment 274575 [details] [diff] [review]
patch v3

this builds on Wladimir's work (patch v2) and addresses comment 17 (adding an nsIURI method) and comment 26. this patch also has a fix for bug 364129 rolled in (IP address handling), since it's all related.

specifically, what this patch does:
1) reworks the nsIEffectiveTLDService interface to have getBaseDomain() and getBaseDomainFromURI() methods. the latter uses NS_GetInnermostURI and GetHost.
2) defines two new error codes, NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS and NS_ERROR_HOST_IS_IP_ADDRESS, to provide consumers with specific failure information. (see comment 26).
3) per bug 364129, we check for IP addresses and throw an error. in that bug, there was talk of simply returning the whole IP address as the eTLD, without failing. i think it's safer to error out, because we really don't know what the consumer wants to do with an IP address (e.g. in cookies we need to special-case it - it's useless having it silently succeed). having a special failure code here avoids the cost of the consumer checking it a second time.
4) changes the normalization semantics. i looked at the code in urlbarBindings.xml, where it works around etld's normalization of the host string by reverse engineering the result and re-applying it, and figured it's probably better to do this in the service itself. so, we normalize the host to do the hash lookup, but then we apply the result to the original unnormalized host string. thus, the result we return will be directly comparable to whatever the consumer parsed in, normalized or not.
5) updates the two consumers of eTLD in the codebase (though one is disabled).
6) updates unit tests.

biesi, dveditz, comments and suggestions appreciated! specifically on the nsIURI-related stuff, because you probably know better than me what to do there.
Attachment #266257 - Attachment is obsolete: true
Attachment #274575 - Flags: superreview?(dveditz)
Attachment #274575 - Flags: review?(cbiesinger)
Attachment #266257 - Flags: superreview?(dveditz)
Attachment #266257 - Flags: review?(cbiesinger)
Comment on attachment 274575 [details] [diff] [review]
patch v3

>+++ browser/base/content/urlbarBindings.xml	31 Jul 2007 04:43:43 -0000
...
>+              // get the etld of the host, and parse off the subdomain portion from the
>+              // beginning of the string
>+              // e.g. host = "foo.co.uk" -> subdomain = "foo." and etld = "co.uk"
>+              var etld = this.tldService.getBaseDomain(host, 0);
>+              var subdomain = host.substring(0, host.length - etld.length);
>+              if (subdomain.length > 0) {
>+                this._subDomain.setAttribute("value", subdomain);
>+                host = etld;

This should detect the effective domain, not the tld, so getBaseDomain should be called with a 1 as the second argument. Also, shouldn't there be a try/catch for IP addresses?
(Assignee)

Updated

10 years ago
Target Milestone: M1 → mozilla1.9 M8
Comment on attachment 274575 [details] [diff] [review]
patch v3

netwerk/dns/public/nsIEffectiveTLDService.idl
+     *         NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS

you should have an @throws for each of those


+     * @param   aHost       The hostname to be analyzed, in UTF-8
...
+    AUTF8String getBaseDomain(in ACString aHost, in PRUint32 aAdditionalParts);

aHost should be AUTF8String, so that JS actually can pass UTF-8

+     * Helper method for dealing with nsIURI's.

personally I'd avoid the apostrophe by writing "nsIURI objects" :)

netwerk/dns/src/nsEffectiveTLDService.cpp	
+  if (beginIter != endIter) {

might be a better idea to check for an empty string at the start of the function and return NS_ERROR_INVALID_ARG then

netwerk/test/unit/test_bug368702.js	
+    do_check_eq(e.result, 0x804b0050); // NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS

if you defined constants for these errors, you wouldn't need the comments :)

+    etld = tld.getBaseDomain("::192.9.5.5", 0);

I'd use ::ffff:192.9.5.5 here, as that's the standard way for mapping IPv4 addresses to IPv6
Attachment #274575 - Flags: review?(cbiesinger) → review+
(Assignee)

Updated

10 years ago
Attachment #274575 - Flags: superreview?(dveditz)
(Assignee)

Comment 32

10 years ago
Created attachment 279395 [details] [diff] [review]
patch v3.1

patch updated per comments. dveditz, any chance you can get to this soon?
Attachment #274575 - Attachment is obsolete: true
Attachment #279395 - Flags: superreview?(dveditz)
Attachment #279395 - Flags: review+
Comment on attachment 279395 [details] [diff] [review]
patch v3.1

>+     * @throws NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS
>+     *         when the sum of the aAdditionalParts parameter and the number of
>+     *         subdomain levels in the effective TLD are greater than the total number
>+     *         of subdomain levels in aHost

I'm dubious about this aspect of the new API, will someone wanting "2" care if there's only 1 to be had?

>+    AUTF8String getBaseDomain(in AUTF8String aHost, in PRUint32 aAdditionalParts);
>+    AUTF8String getBaseDomainFromURI(in nsIURI aURI, in PRUint32 aAdditionalParts);

suggestion: reverse these so getBaseDomain() takes a URI and getBaseDomainFromHost() or getBaseDomainFromStr() takes the string. Just a thought since we're trying to encourage the use of nsIURIs rather than have people try to pass around string specs and the like.

>+  // take the unnormalized original domain string, and count off the number
>+  // of required domains. this avoids issues related to returning a normalized
>+  // version of the host string, which the consumer might try to compare to
>+  // other strings which haven't been similarly normalized.

Clever, but in an unnormalized host dots might not be 0x2e which could lead to miscounts (I guess throwing the insufficient levels error, or including too many levels in the returned domain). This is why I'm minusing the review.
http://lxr.mozilla.org/mozilla/source/netwerk/dns/src/nsIDNService.cpp#501

I'd be more comfortable if you passed back normalized base domains. Anyone starting with a nsIURI will have a normalized host, what situation did you think people would run into trouble with a normalized return value?
Attachment #279395 - Flags: superreview?(dveditz) → superreview-
(Assignee)

Comment 34

10 years ago
Created attachment 280305 [details] [diff] [review]
v3.2

(In reply to comment #33)
thanks for the comments... here's an updated patch. i have some questions below:

> I'm dubious about this aspect of the new API, will someone wanting "2" care if
> there's only 1 to be had?

well, i'm not too fussed either way... my argument for adding this is given in comment 26. an alternative (as suggested in that comment) would be to make aAdditionalParts an inout and return success, so consumers who care can check that what they wanted was what they got.

> suggestion: reverse these so getBaseDomain() takes a URI and
> getBaseDomainFromHost() or getBaseDomainFromStr() takes the string. Just a
> thought since we're trying to encourage the use of nsIURIs rather than have
> people try to pass around string specs and the like.

ok; done.

> I'd be more comfortable if you passed back normalized base domains. Anyone
> starting with a nsIURI will have a normalized host, what situation did you
> think people would run into trouble with a normalized return value?

good point about the dots. the reason i did this was after looking at what consumer code in browser/base/content/urlbarBindings.xml was doing (see patch v3.1); it's basically doing the same thing to apply the effective tld length to its original string, although in their case it's coming from a URI and will be normalized. we can just let the consumer worry about normalization and encoding.

i don't think we can assume nsIURI's will be utf8 either; per http://mxr.mozilla.org/mozilla/source/netwerk/base/src/nsStandardURL.cpp#886 it's possible to end up with an ACE-encoded string. so, i've added a check for ACE encoding into the normalization function. this will benefit getBaseDomainFromHost() as well. do we need to check for url escaped hosts or anything else there?

also, do we really need to keep track of a trailing dot on the host, and add it back at the end? i'm not sure why we need it...

   // add on the trailing dot, if applicable
+  if (trailingDot)
+    aBaseDomain.Append('.');
Attachment #279395 - Attachment is obsolete: true
Attachment #280305 - Flags: superreview?(dveditz)

Updated

10 years ago
Target Milestone: mozilla1.9 M8 → mozilla1.9 M9

Updated

10 years ago
Blocks: 397655
Any progress on this at all?  We need to get this in ASAP if we are going to make it into beta.  
(Assignee)

Comment 36

10 years ago
waiting on sr=dveditz, he said he should be able to get to it within a few days.

Updated

10 years ago
Whiteboard: [has patch] needs r?biesi,dveditz → [has patch] needs sr?dveditz

Updated

10 years ago
Blocks: 223895
(In reply to comment #34)
> although in their case it's coming from a URI and will be normalized.
> we can just let the consumer worry about normalization and encoding.

Additional consumers are as likely as not to be people writing their first add-on who don't even know IDN exists let alone know they're expected to deal with it. Letting the consumer worry about it is a dangerous approach.
Comment on attachment 280305 [details] [diff] [review]
v3.2

Bugzilla ate my homework :-(
I reviewed this over the weekend, but apparently it got lost. I'll try to remember my comments

I had assumed the nsIURI form would be the base form rather than having it just forward to the string form. That means the string form may or may not be normalized the same way as other code and that's the kind of difference that can lead to security errors.

Also today I overheard sdwilsh on IRC ask for a default arg for getBaseDomain(), it made me prefer the suggested split getPublicSuffix()/getBaseDomain() suggested several comments back. That makes the proposed getBaseDomain(uri,0) equivalent to this patch's getBaseDomain(uri,1) which could then be an [optional] arg in the IDL. I really prefer the clarity of having two separate names to having people try to remember what 0 or 1 mean. Most people would want either getPublicSuffix() or plain getBaseDomain(), the very few who need extra bits could pass the extra arg. getBaseDomain() would throw if given a purely public suffix.

>               var asciiHost = this._uri.asciiHost;
...
>+                               this.tldService.getBaseDomainFromHost(asciiHost, 0)
>+                                              .split(".").length - 1;

Doesn't look like you need asciiHost anymore, you can call getBaseDomain(this._uri, 0). Do you need to worry this might throw now? Above this chunk you try to strip out IP addresses, would it make more sense just to call this and catch it throwing the IP address error rather than try to duplicate that code with regexp's that might not capture the rules?

>Index: content/html/document/src/nsHTMLDocument.cpp
>===================================================================
>+  if (current.Length() > aDomain.Length() &&
>+      StringEndsWith(current, aDomain, nsCaseInsensitiveStringComparator()) &&

I guess this isn't your problem per se, but setting document.domain may require different code depending on whether the browser supports IDN for the TLD or not. This can differ by browser vendor and by updates to the IDN TLD whitelist.

Site www.müller.de
------------------
  document.domain = "www.müller.de"        // works
  document.domain = "www.xn--mller-kva.de" // throws

Site www.xn--mller-kva.com
--------------------------
  document.domain = "www.müller.com"        // throws
  document.domain = "www.xn--mller-kva.com" // works

But setting document.location  works with any of those four forms (the resulting URI gets normalized).

Seems like document.domain ought to normalize first, then do these compares. At least the "current" document.domain is guaranteed to start out normalized so it should match our eTLD list.

Ok, so the eTLD list is in UTF-8 so we need to match when we compare. Alright, let's leave it alone then.

>Index: netwerk/dns/public/nsIEffectiveTLDService.idl
>===================================================================
>+    /**
>+     * Returns the effective TLD of a host string. Otherwise identical to getBaseDomain().
>+     * When determining the effective TLD, the hostname will be converted to UTF8
>+     * if necessary and normalized using nsIIDNService::normalize, which follows
>+     * RFC 3454. getBaseDomain() will fail if the hostname contains characters that
                -----^^^^^ getBaseDomainFromHost()

>+     * @see     getBaseDomainFromHost()
             ------^^^^^^^ getBaseDomain()

>Index: netwerk/dns/src/nsEffectiveTLDService.cpp
>===================================================================
>@@ -107,100 +108,157 @@ nsEffectiveTLDService::~nsEffectiveTLDSe
>+nsEffectiveTLDService::GetBaseDomain(nsIURI     *aURI,
>+  return GetBaseDomainFromHost(host, aAdditionalParts, aBaseDomain);

As mentioned I thought the nsIURI form should be the preferred form so we can ensure consistent normalization

>+// nsEffectiveTLDService::GetBaseDomainInternal

"...FromHost"?

sr=dveditz

(but I think getPublicSuffix() would be useful, and getBaseDomain() would make more sense if the default value were "mozilla.com" rather than "com")
Attachment #280305 - Flags: superreview?(dveditz) → superreview+
The normalizing code lowercases before looking things up, but effective_tld_names.dat contains some suffixes with upper-case (particularly the .it ccTLD, but there are a few others). Those aren't going to match, are they?
Thanks for digging into this, everyone.  :)
(Assignee)

Comment 41

10 years ago
Comment on attachment 280305 [details] [diff] [review]
v3.2

requesting approval to get this in for M9. i think it's worth it since developers might want to use this interface, and having it updated in beta will potentially get it more testing. it's also better to ship with the revised interface than something obsolete.

risk here is minimal since we have only two consumers, one of which is switched off.
Attachment #280305 - Flags: approvalM9?
(Assignee)

Comment 42

10 years ago
Comment on attachment 280305 [details] [diff] [review]
v3.2

nevermind, apparently i've got autoapproval!
Attachment #280305 - Flags: approvalM9?
Comment on attachment 280305 [details] [diff] [review]
v3.2

> nsEffectiveTLDService::NormalizeHostname(nsCString &aHostname)
 ...
>+  if (IsASCII(aHostname)) {
>+    PRBool isACE;
>+    if (NS_SUCCEEDED(mIDNService->IsACE(aHostname, &isACE)) && isACE)
>+      return mIDNService->ConvertACEtoUTF8(aHostname, aHostname);
>+
>+    ToLowerCase(aHostname);
>+    return NS_OK;
>+  }
>+
>   return mIDNService->Normalize(aHostname, aHostname);

If the URI is ACE then it doesn't go through ToLowerCase(), nor does it get the lowercasing nsIDNService::Normalize() will do.

ConvertACEtoUTF8 will never return a failure code, but it can hit errors and is advertised as returning its input unchanged. I thought I stepped through once and got just a '.' back out though.
Please land and resolve ASAP.
(Assignee)

Comment 45

10 years ago
I'll be landing this today.
Whiteboard: [has patch] needs sr?dveditz → [has patch][has reviews]
(Assignee)

Comment 46

10 years ago
Created attachment 286124 [details] [diff] [review]
v3.3

patch as checked in
Attachment #280305 - Attachment is obsolete: true
(Assignee)

Comment 47

10 years ago
i changed the interface (to getPublicSuffix/getBaseDomain, with optional parameter) per dveditz' suggestion, fixed the ACE bug in the normalization method, and fixed the various nits. there are two remaining issues highlighted in dveditz' comment, which i'll file followup bugs for. briefly, these are:

1) normalization. the TLD file is in UTF8, but URI's can be UTF8 or ACE depending on whitelisting rules. these normalization details are buried deep in nsStandardURL, and there's no public interface that exposes them, to my knowledge. so we either have to re-normalize URI's to UTF8 for comparison (as this patch does), or expose this whitelist-based normalization so that it can be applied to the TLD file on read-in. i'll investigate the latter in the followup bug.

2) nsHTMLDocument::SetDomain() failing in some cases due to working with non-normalized strings. easy to fix!
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Assignee)

Comment 48

10 years ago
issue 2) above is bug 400552.
(Assignee)

Comment 49

10 years ago
for issue 1) above, filed bug 402008 and bug 402013.
Blocks: 467520
You need to log in before you can comment on or make changes to this bug.