Closed Bug 1022207 Opened 10 years ago Closed 10 years ago

nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag after Bug 960014

Categories

(Core :: Networking, defect)

32 Branch
x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: obrufau, Assigned: valentin)

References

Details

Attachments

(1 file, 3 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:32.0) Gecko/20100101 Firefox/32.0 (Beta/Release)
Build ID: 20140530030202

Steps to reproduce:

Run this code (it works):

> var uri = Components
>     .classes["@mozilla.org/network/io-service;1"]
>     .getService(Ci.nsIIOService)
>     .newURI("http://example.com[flag]", null, null);

And now this throws error:

> uri.host += "";


Actual results:

This error is thrown:

> [Exception... "Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) [nsIURI.host]"  nsresult: "0x804b000a (NS_ERROR_MALFORMED_URI)"  location: "JS frame :: debugger eval code :: <TOP_LEVEL> :: line 5"  data: no]


Expected results:

It should work.

This seems caused by Bug 960014:
 - Good: 004d84a6905a (1401376133 inbound)
 - Bad: a9263fda991e (1401377392 inbound)
 - Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=004d84a6905a&tochange=a9263fda991e

I think the problem is nsStandardURL::ValidIPv6orHostname (in nsStandardURL.cpp line 433), which does:

> if (openBracket || closeBracket) {
> 	// Fail if only one of the brackets is present
> 	return false;
> }

Since closeBracket is true but openBracket is false, it thinks that the URI is malformed.
But in this case it's a [] flag,  not an IPv6 address, so it is not malformed.
Blocks: 960014
Why is this crap added to the URL Standard?
Assignee: nobody → valentin.gosu
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8437376 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

This patch fixes the use case in comment 0
Attachment #8437376 - Flags: review?(honzab.moz)
Not sure is the loop to check is necessary.

 - According to http://url.spec.whatwg.org/#hostname-state, "[" and "]" only set and unset "[] flag", but it doesn't say anything about unclosed brackets.
 - According to http://url.spec.whatwg.org/#concept-host-parser, if the first character is "[", then there must be an IPv6 check. But if it isn't, there should only be a failure if "domain to ASCII" is failure, or if the domain contains some characters (not including "[" nor "]").
 - And according to http://web.archive.org/web/20140422052343/http://www.unicode.org/reports/tr46/proposed.html#ToASCII, possible failures of "Unicode ToASCII" are unrelated to "[" and "]".

So I think that the loop is not needed, and unclosed brackets shouldn't produce errors (except in IPv6 case).

Note I'm no expert and maybe I misread the spec.
Also note that in your patch
> foo[[bar]
doesn't throw error.

But this does:
> foo[bar]]

I find this behavior a bit incoherent :)
Also, removed tests should be moved to test_ipv6() instead of being removed to test that they will not fail.
Thanks for the very useful comments and suggestions.
Comment on attachment 8437376 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

@Anne, do you think the suggestions in comment 4 are correct?

@obrufau: I still think we need a loop to ensure that colons are enclosed in brackets.
Attachment #8437376 - Flags: review?(honzab.moz)
Flags: needinfo?(annevk)
Yeah, his reading of the specification is correct. IPv6 is leading [ and trailing ] for host. Nothing else.
Flags: needinfo?(annevk)
> @obrufau: I still think we need a loop to ensure that colons are enclosed in brackets.

To ensure there are no colons, I think we can use
> PL_strchr(host, ':')

And shouldn't we also check U+0000, U+0009, U+000A, U+000D, U+0020, "#", "%", "/", ":", "?", "@", and "\"?

Or maybe it would be better first fixing the too restrictive behavior after Bug 960014, and then create other bugs (blocking bug 906714) to make it compliant with the spec.
Attachment #8437376 - Attachment is obsolete: true
Comment on attachment 8437829 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

Also filed Bug 1023468 to handle special characters.
Attachment #8437829 - Flags: review?(honzab.moz)
Comment on attachment 8437829 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

Review of attachment 8437829 [details] [diff] [review]:
-----------------------------------------------------------------

I'm kinda lost in this code and don't have a good feeling of these recent patches (no offense to you Valentin - the code is bad from its basics so it's hard to have a good patch for it)

I really can't wait until we have a new code for this from scratch!

::: netwerk/base/src/nsStandardURL.cpp
@@ +430,5 @@
>          return net_IsValidIPv6Addr(host + 1, length - 2);
>      }
>  
> +    if (openBracket && !closeBracket) {
> +        // Missing the last bracket

// Missing the closing bracket at the end of the host name.  It may be somewhere in the middle but we require it at the end only.

@@ +446,5 @@
> +        }
> +        if (host[i] == ':' && !bracketFlag) {
> +            // The : character should be enclosed in brackets
> +            return false;
> +        }

so, 

'[foo]' - wrong, not an IPv6, OK
'[foo' - wrong, we expect ] at the end, OK
'foo]' - passes
'fo]o' - passes
'foo]]' - passes
'f[oo' - passes
'f[[oo' - passes
'f[oo[' - passes
'f[oo]' - passes
'f[o:' - passes < but shouldn't, I think
'f[o:]' - passes, not sure it should...
'f[o:]o' - ditto
'f[]oo' - passes
'f[:[:]:]' - doesn't pass what your algo proclaims to check :)
'f[:[:]o' - passes
'f][oo' - passes

::: netwerk/test/unit/test_standardurl.js
@@ +187,5 @@
> +  do_check_eq(url.port, 1);
> +
> +  url = stringToURL("http://example.com");
> +  url.host = "2001[::1]";
> +  do_check_eq(url.host, "2001[::1]");

if you want to support this insanity, then also check host = 2001[foo].
Attachment #8437829 - Flags: review?(honzab.moz) → review-
(In reply to Honza Bambas (:mayhemer) from comment #13)
> Comment on attachment 8437829 [details] [diff] [review]
> nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag
> 
> Review of attachment 8437829 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm kinda lost in this code and don't have a good feeling of these recent
> patches (no offense to you Valentin - the code is bad from its basics so
> it's hard to have a good patch for it)
> 
> I really can't wait until we have a new code for this from scratch!
> 

Indeed, it does seem to be slowly turning into spaghetti-code.
I'd be happy to have a meeting about the plans to rewrite this. 

> '[foo]' - wrong, not an IPv6, OK
> '[foo' - wrong, we expect ] at the end, OK
> 'foo]' - passes
> 'fo]o' - passes
> 'foo]]' - passes
> 'f[oo' - passes
> 'f[[oo' - passes
> 'f[oo[' - passes
> 'f[oo]' - passes
> 'f[o:' - passes < but shouldn't, I think
> 'f[o:]' - passes, not sure it should...
> 'f[o:]o' - ditto
> 'f[]oo' - passes
> 'f[:[:]:]' - doesn't pass what your algo proclaims to check :)

According to step 4 of http://url.spec.whatwg.org/#hostname-state and comment 4, which I think makes a good case, brackets shouldn't make the hostname parsing fail. 
The algorithm seems pretty clear and simple concerning brackets, but it does allow for some weird corner cases.

> 'f[:[:]o' - passes
> 'f][oo' - passes

Not sure how to proceed. Should we back out the ipv6 patch to fix this regression?
Flags: needinfo?(honzab.moz)
Since the original bug was not that serious, I think it's not a bad idea to back it out.  I would like to start rewriting the URL parsing code properly.

Anyway, let's not give up completely on this one yet.  

* What happens when you do url.host = "foo:" now (w/o this patch)?  If we accept it now, then I think we can still accept it regardless it's correct or not.

* Why are you actually checking the ':' are enclosed in [ ]?  Which of these are in your opinion bad host names:

- f:oo
- f[:]oo
- f[:oo
- foo]:
- foo:
- f[00
- f[2001::1]

Since when the host doesn't start with [ then it doesn't matter how many and where of [ ] : it contains, according the spec as I understand it.

However, how do we serialize a URI with a host name set like 'foo:123' and a (non)standard port?  http://foo:123:8080 ?  http://foo:123 ?  That is probably wrong :D

Is that the reason you force : enclosed?  And why do we actually allow that?  It is not an IPv6, so why bother?  I would throw and don't accept : in a host name that is not an IPv6 (roughly ^\[\d+(::\d+)+\]$) - unless I'm missing some magic here...
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #15)
> Since the original bug was not that serious, I think it's not a bad idea to
> back it out.  I would like to start rewriting the URL parsing code properly.
> 

You're right, the URL code and even this patch does a lot of things that aren't quite correct.
The tests pass, but not for the right reasons. Having a lexical analyzer would bring us much closer to having a correct and safe implementation.

> Anyway, let's not give up completely on this one yet.  
> 
> * What happens when you do url.host = "foo:" now (w/o this patch)?  If we
> accept it now, then I think we can still accept it regardless it's correct
> or not.
> 

It sets the hostname to foo, but also returns NS_ERROR_MALFORMED_URI which is silently ignored by the URL dom impl.

> * Why are you actually checking the ':' are enclosed in [ ]?  Which of these
> are in your opinion bad host names:
> 
> - f:oo
> - foo]:
> - foo:

The hostname should not contain a colon. When parsing, colons that are not in brackets should actually mark the end of a hostname. So the hostname should be set to f, foo] and foo.
It works correctly when setting via url.host, but if I do it via url.hostname, it actually fails (but shouldn't). 

> - f[:]oo
> - f[:oo
> - f[00
> - f[2001::1]
> 
> Since when the host doesn't start with [ then it doesn't matter how many and
> where of [ ] : it contains, according the spec as I understand it.
> 
I think all of these are correct. Encountering a colon when the bracketFlag isn't set should mark the end of the hostname. This check should be moved to SetHostPort actually.

> However, how do we serialize a URI with a host name set like 'foo:123' and a
> (non)standard port?  http://foo:123:8080 ?  http://foo:123 ?  That is
> probably wrong :D

The hostname can't contain a colon, so that operation fails. Before the IPv6 patch, it actually thought it was an IPv6 address, and serialized it as http://[foo:123]:8080

> 
> Is that the reason you force : enclosed?  And why do we actually allow that?
> It is not an IPv6, so why bother?  I would throw and don't accept : in a
> host name that is not an IPv6 (roughly ^\[\d+(::\d+)+\]$) - unless I'm
> missing some magic here...

According to hostname parsing algorithm, colons are allowed in the hostname if they are enclosed in brackets. Otherwise it marks the start of the port number.

I'll try to fix these issues with the patch. However, if that complicates things too much, we should probably back out the change, and focus on rewriting this.
Thanks!  Wise.
If URL parsing must be rewritten, IMO there is no point in fixing this, and bug 960014 should be backed out.
Attachment #8437829 - Attachment is obsolete: true
Anne,

If I'm reading the spec right, the hostname parsing algorithm allows for the next commands

url.host = "foo[abcd:abcd";
url.port = 1234;

The []flag is set when the colon is encountered, so it goes all the way to the end.
This would lead to an ambiguous serialization.

IMO there should be an extra check at http://url.spec.whatwg.org/#hostname-state step 2.1 - if [] flag is set, return failure.
Flags: needinfo?(annevk)
Comment on attachment 8440201 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

I moved the search for a colon to another method, which simplifies SetHostPort a bit. I also documented the code and tests a bit more.

Do you think this patch is a step in the right direction, or should we proceed with backing out the ipv6 patch?

Thanks!
Attachment #8440201 - Flags: review?(honzab.moz)
Valentin, how would the ":" make it not fail step 6 of the host parser? Step 6 should maybe check for [ and ] as well just in case some Unicode can decompose into those code points.

(We do need to rewrite URL parsing at some point, however, given how our current code is structured a proper rewrite will affect many extensions. Making incremental improvements meanwhile therefore seems worth it.)
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #22)
> Valentin, how would the ":" make it not fail step 6 of the host parser? Step
> 6 should maybe check for [ and ] as well just in case some Unicode can
> decompose into those code points.
 
Thanks. I missed that one.
Updating the patch to disallow in-brackets-colons in the hostname.
Attachment #8440201 - Attachment is obsolete: true
Attachment #8440201 - Flags: review?(honzab.moz)
Comment on attachment 8440274 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

A bit closer to the spec now.
We should also fix Bug 1023468 after this one.
Attachment #8440274 - Flags: review?(honzab.moz)
Comment on attachment 8440274 [details] [diff] [review]
nsStandardURL::ValidIPv6orHostname doesn't allow hostnames with [] flag

Review of attachment 8440274 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like on a good way, still I don't follow the spec here - I think some of the host name forms should not be allowed, but I need to go through the spec my self more in detail (not as part of this bug).

I'll do some manual tests before r+

::: netwerk/base/src/nsStandardURL.cpp
@@ +1443,5 @@
> +void
> +nsStandardURL::FindColonLimit(nsACString::const_iterator& aIter,
> +                         nsACString::const_iterator& aEnd)
> +{
> +    bool bracketFlag = false;

How to behave for "[[:]:]"?  Result will be "[[:]" that will not pass ValidIPv6orHostname, so we are OK.

Then "[[:]foo:]".  Result will be "[[:]foo" that will not pass as well.  OK.

"foo[[]:123".  Result will be "foo[[]" and port will be 123.  This will pass.  Probably OK?

"foo[[:]:123]:123]" -> "foo[[:]" and "123]:123]", setHost will ignore the second part.  Will the serialization be parsable?  Probably yes but worth to check!


Doesn't the spec has gaps?  Should "foo[[:]" as a host name be actually accepted?

@@ +1444,5 @@
> +nsStandardURL::FindColonLimit(nsACString::const_iterator& aIter,
> +                         nsACString::const_iterator& aEnd)
> +{
> +    bool bracketFlag = false;
> +    for (; aIter != aEnd; aIter++) {

++aIter

Please take it as a rule when you are ++'ing and don't need the result, always put ++ before since it's in general considered better performing.

@@ +1455,5 @@
> +        if (*aIter == ':') {
> +            if (!bracketFlag) {
> +                return;
> +            }
> +        }

use switch.

@@ +1526,5 @@
> +    // Finding a colon in the hostname would mark its end
> +    // unless it's enclosed in brackets
> +    nsACString::const_iterator iter(start);
> +    FindColonLimit(iter, end);
> +    end = iter;

so, when I do url.host = "foo:123", what happen w/o this patch and what w/ this patch?  I presume w/ the patch it set host to "foo" and silently ignores 123 port.  Is that what we want?  Should we throw?

::: netwerk/test/unit/test_standardurl.js
@@ +170,5 @@
>    do_check_eq(url.host, "2001::1");
>  
>    url = stringToURL("http://example.com");
> +  Assert.throws(() => { url.hostPort = "2001::1"; }, "missing brackets");
> +  do_check_eq(url.host, "2001"); // The hostname ends at the first colon

is the comment correct?

@@ +171,5 @@
>  
>    url = stringToURL("http://example.com");
> +  Assert.throws(() => { url.hostPort = "2001::1"; }, "missing brackets");
> +  do_check_eq(url.host, "2001"); // The hostname ends at the first colon
> +  do_check_eq(url.port, -1); // Setting the port failed because of extra colon

hm?

@@ +176,3 @@
>  
>    url = stringToURL("http://example.com");
> +  url.host = "[2001::1]:20"; // Setting the hostname, so the port is ignored

aha!  so it's by the spec like this?

@@ +195,5 @@
> +      // hostname with port
> +      ["2001:1",          "2001:1",          "2001",           1],
> +      // hostname contains closed bracket, port = 1
> +      ["2001]:1",         "2001]:1",         "2001]",          1],
> +      ["2001[flag]",      "2001[flag]",      "2001[flag]",    -1]

"2001[flag]:1"
"2001[flag"
"2001[flag:1"
"2001[:]"
"2001[:]:1"
(In reply to Honza Bambas (:mayhemer) from comment #26)
> > +    bool bracketFlag = false;
> 
> How to behave for "[[:]:]"?  Result will be "[[:]" that will not pass
> ValidIPv6orHostname, so we are OK.
> 
> Then "[[:]foo:]".  Result will be "[[:]foo" that will not pass as well.  OK.
> 
> "foo[[]:123".  Result will be "foo[[]" and port will be 123.  This will
> pass.  Probably OK?
> 
> "foo[[:]:123]:123]" -> "foo[[:]" and "123]:123]", setHost will ignore the
> second part.  Will the serialization be parsable?  Probably yes but worth to
> check!

The port parsing will fail, since it contains ] and :
The host parsing will also fail since it contains a colon.

> Doesn't the spec has gaps?  Should "foo[[:]" as a host name be actually
> accepted?
> 

Per comment 22, and http://url.spec.whatwg.org/#concept-host-parser step 6, it turns out we shouldn't accept anything that is not an IPv6 address that contains colons. I had missed it on my first read.

> 
> so, when I do url.host = "foo:123", what happen w/o this patch and what w/
> this patch?  I presume w/ the patch it set host to "foo" and silently
> ignores 123 port.  Is that what we want?  Should we throw?
> 

Without the patch it throws, because the hostname contains a colon.
With the patch, it ignores the part after the colon, per http://url.spec.whatwg.org/#hostname-state step 1

> ::: netwerk/test/unit/test_standardurl.js
> @@ +170,5 @@
> >    do_check_eq(url.host, "2001::1");
> >  
> >    url = stringToURL("http://example.com");
> > +  Assert.throws(() => { url.hostPort = "2001::1"; }, "missing brackets");
> > +  do_check_eq(url.host, "2001"); // The hostname ends at the first colon
> 
> is the comment correct?
> 
The brackets are missing, but it should probably say that it throws because of the extra colon.

> @@ +171,5 @@
> >  
> >    url = stringToURL("http://example.com");
> > +  Assert.throws(() => { url.hostPort = "2001::1"; }, "missing brackets");
> > +  do_check_eq(url.host, "2001"); // The hostname ends at the first colon
> > +  do_check_eq(url.port, -1); // Setting the port failed because of extra colon
> 

Step one says to set the hostname up to the colon, and to proceed to parsing the port.
Since the port contains an extra character (the colon) we return error - port state, step 4

> 
> @@ +176,3 @@
> >  
> >    url = stringToURL("http://example.com");
> > +  url.host = "[2001::1]:20"; // Setting the hostname, so the port is ignored
> 
> aha!  so it's by the spec like this?

Same as above. We ignore everything after the first colon that is not enclosed in brackets

> "2001[:]"
> "2001[:]:1"
> "2001[flag:1"
These should fail because they contains a colon in the hostname.

> "2001[flag"
Would the serialization for this one be ambiguous with the port set to 1?
dump(url.hostPort); // would be 2001[flag:1
url.hostPort = url.hostPort; // it would fail in this case.

Should we backtrack (yet again) and enforce a check for a closed bracket to avoid this issue?

(Thanks for reviewing this, btw. This feels like a very complicated patch.)
If it's so complicated maybe it could be fixed in different bugs?

I would like to first fix the too restrictive behavior after Bug 960014, which breaks extensions like X-notifier.

Currently the patch of Bug 960014 is in Aurora, and I'm afraid it will go to beta before fixed by this bug.

Or maybe we can prevent it from going to next stage?
(In reply to obrufau from comment #28)
> Currently the patch of Bug 960014 is in Aurora, and I'm afraid it will go to
> beta before fixed by this bug.
> 
> Or maybe we can prevent it from going to next stage?

Aurora moves to Beta on the week of July 22, 2014
I hope to have this fixed and uplifted to Aurora by that time. Otherwise we can just backout the patch before that date.
Hi Anne, I've got another spec question for you:

As far as I can tell,
url.host = "foo[flag"; // is ok
url.port = 1; // is ok
// However
url.host = "foo[flag:1 // would fail, since you're not allowed to have : in a hostname except in ipv6
url2.host = url.host; // would fail because url.host is foo[flag:1 (this feels counter-intuitive)

In order to avoid this case, we can either
1. require the use of a ] bracket ( enforce the []flag is false at the end of the hostname)
2. if the first character isn't [, then just make the cut at the first colon ( doesn't really follow the steps in the spec )

Any thoughts? Thanks!
Flags: needinfo?(annevk)
Is there any reason to allow [ and ] in domain names? Maybe they should simply be banned too.
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #31)
> Is there any reason to allow [ and ] in domain names? Maybe they should
> simply be banned too.

Apparently some addons might depend on this behaviour, given the very reason for this bug. (comment #0 and comment #4).
IMO, disallowing brackets would be a step forward. Other UA's (ie Chrome) don't allow them.
Maybe we should ban them too (or keep banning them that is), but I would be nice for the spec to specify this behaviour.

So, should we WONTFIX this bug?
Flags: needinfo?(annevk)
If we do that, the initial URL parse should fail, no? As in, http://example.com[flag] would result in failure. I can update the specification. The way I would fix this is by adding "[" and "]" to step 6 of http://url.spec.whatwg.org/#concept-host-parser

emk, if you could elaborate on comment 1 that would be appreciated.
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #33)
> If we do that, the initial URL parse should fail, no? As in,
> http://example.com[flag] would result in failure. I can update the
> specification. The way I would fix this is by adding "[" and "]" to step 6
> of http://url.spec.whatwg.org/#concept-host-parser

That seems fair. Thanks!
Thanks, Anne!

Given the spec change, I'm marking this bug as WONTFIX.
I will probably end up reusing a large part of the patch in Bug 1023468.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Attachment #8440274 - Flags: review?(honzab.moz)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: