Closed Bug 1023468 Opened 6 years ago Closed 4 years ago

Handle special characters when parsing URL hostname

Categories

(Core :: Networking, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED DUPLICATE of bug 412457

People

(Reporter: valentin, Assigned: valentin)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 3 obsolete files)

(In reply to obrufau from Bug 1022207 comment #10)
> > @obrufau: I still think we need a loop to ensure that colons are enclosed in brackets.
> 
> To ensure there are no colons, I think we can use
> > PL_strchr(host, ':')
> 
> And shouldn't we also check U+0000, U+0009, U+000A, U+000D, U+0020, "#",
> "%", "/", ":", "?", "@", and "\"?
> 
> Or maybe it would be better first fixing the too restrictive behavior after
> Bug 960014, and then create other bugs (blocking bug 906714) to make it
> compliant with the spec.

The exceptions are defined at:
http://url.spec.whatwg.org/#hostname-state
Assignee: nobody → valentin.gosu
Status: NEW → ASSIGNED
This patch contains the good bits you reviewed in bug 1022207. It cleans up and simplifies some of the URL code.
Attachment #8514299 - Flags: review?(honzab.moz)
Attachment #8511726 - Attachment is obsolete: true
This patch enables percent decoding in the hostname. Relevant bugs: 961476, 956463. I know it needs a bunch more unit tests to be added, but a preliminary review would be very useful
Attachment #8514301 - Flags: review?(honzab.moz)
Attachment #8511727 - Attachment is obsolete: true
I don't understand the IPv6 patch. How is that supported by the specification?
This patch adds support for link-local IPv6 addresses ( such as http://[fe80::1%wlan0] - see http://tools.ietf.org/html/rfc6874). It used to be supported, and it's also tracked in bug 700999, but it's clearer if I keep all of the patches in the same place
Attachment #8514304 - Flags: review?(honzab.moz)
Attachment #8511728 - Attachment is obsolete: true
Blocks: 700999
No longer blocks: url
Comment on attachment 8514304 [details] [diff] [review]
Support IPv6 Link-Local addresses

Clearing review flag, as we might not want to support this just yet, as it's not part of the URL spec.
Attachment #8514304 - Flags: review?(honzab.moz)
Blocks: url
Comment on attachment 8514299 [details] [diff] [review]
Handle special characters when parsing URL hostname

Review of attachment 8514299 [details] [diff] [review]:
-----------------------------------------------------------------

In general, I really really hate this code and I'd like to avoid any more wasting of time checking all new crazy through-string iterations once and for all.  I really see this code be completely rewritten with an intelligent parser as a total must!

Please answer first the questions before landing.

::: netwerk/base/src/nsStandardURL.cpp
@@ +447,5 @@
> +        host.BeginReading(start);
> +        host.EndReading(end);
> +        if (FindCharInReadable(gForbiddenHostChars[i], start, end)) {
> +            return false;
> +        }

iterating 12 * host name length characters... that is crazy...

@@ +456,3 @@
>      }
>  
> +

remove this blank

@@ +1470,5 @@
> +            case '[':
> +                bracketFlag = true;
> +                break;
> +            case ']':
> +                bracketFlag = false;

I assume the brackets cannot be nested here, right?

@@ +1506,4 @@
>  
>      FindHostLimit(start, end);
>  
> +    nsresult rv = FindColonLimit(iter, end);

a comment that iter and end are passed as a ref and where they are put on success return is needed.

::: netwerk/test/unit/test_standardurl.js
@@ +187,5 @@
> +  do_check_eq(url.port, -1);
> +
> +  url = stringToURL("http://example.com");
> +  Assert.throws(() => { url.hostPort = "2001::1"; }, "port contains colon");
> +  do_check_eq(url.host, "2001"); // the hostname ends at the first colon

so assignment throws but we end up modified?  is that something the spec says?  even if yes, then it's pretty strange...  or are we here just compatible with ourselfs?

@@ -191,2 @@
>    Assert.throws(() => { url.host = "[2001::1"; }, "missing last bracket");
> -  Assert.throws(() => { url.host = "2001::1]"; }, "missing first bracket");

these cases now work?
Attachment #8514299 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #9)
> Comment on attachment 8514299 [details] [diff] [review]
> Handle special characters when parsing URL hostname
> 
> Review of attachment 8514299 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> In general, I really really hate this code and I'd like to avoid any more
> wasting of time checking all new crazy through-string iterations once and
> for all.  I really see this code be completely rewritten with an intelligent
> parser as a total must!

I pretty much agree with you. That code is due for a rewrite.
However, percent decoding and link-local addresses are cool features we might want sooner than later.

It also seems that Anne is going to rewrite the spec in a matter similar to this: http://intertwingly.net/projects/pegurl/url.html
If so, that new lexer is going to be very useful.

> 
> Please answer first the questions before landing.
> 
> ::: netwerk/base/src/nsStandardURL.cpp
> @@ +447,5 @@
> > +        host.BeginReading(start);
> > +        host.EndReading(end);
> > +        if (FindCharInReadable(gForbiddenHostChars[i], start, end)) {
> > +            return false;
> > +        }
> 
> iterating 12 * host name length characters... that is crazy...

True. Do you think using a switch statement would be better?

> 
> @@ +1470,5 @@
> > +            case '[':
> > +                bracketFlag = true;
> > +                break;
> > +            case ']':
> > +                bracketFlag = false;
> 
> I assume the brackets cannot be nested here, right?

They could, but since we forbid any [] chars in the hostname, that would also fail.

> 
> ::: netwerk/test/unit/test_standardurl.js
> @@ +187,5 @@
> > +  do_check_eq(url.port, -1);
> > +
> > +  url = stringToURL("http://example.com");
> > +  Assert.throws(() => { url.hostPort = "2001::1"; }, "port contains colon");
> > +  do_check_eq(url.host, "2001"); // the hostname ends at the first colon
> 
> so assignment throws but we end up modified?  is that something the spec
> says?  even if yes, then it's pretty strange...  or are we here just
> compatible with ourselfs?

It seems so: https://url.spec.whatwg.org/#hostname-state - Step 1.3
Chrome for example, doesn't ever fail parsing the port -> it just sets it to 0 if it fails. We could probably do that too.

> @@ -191,2 @@
> >    Assert.throws(() => { url.host = "[2001::1"; }, "missing last bracket");
> > -  Assert.throws(() => { url.host = "2001::1]"; }, "missing first bracket");
> 
> these cases now work?

When setting the hostname, parsing ends at the first : so it now works.
It follows the spec, but indeed not very intuitive or compatible with other UAs.

--

I guess we could hold on to this patch for a little while, and see if the updated spec would make things any easier/clearer.
Thanks for reviewing this!
Comment on attachment 8514301 [details] [diff] [review]
Do percent decoding in hostname

Review of attachment 8514301 [details] [diff] [review]:
-----------------------------------------------------------------

Let's see what we break with this.

::: netwerk/base/src/nsStandardURL.cpp
@@ +572,5 @@
>      mHostEncoding = eEncoding_ASCII;
>      // Note that we don't disallow URLs without a host - file:, etc
>      if (mHost.mLen > 0) {
> +        nsAutoCString tempHost;
> +        NS_UnescapeURL(spec + mHost.mPos, mHost.mLen, esc_AlwaysCopy|esc_Host, tempHost);

spaces around |

@@ +1556,5 @@
>      FindHostLimit(start, end);
>  
> +    const nsCString substring(Substring(start, end));
> +    nsAutoCString flat;
> +    NS_UnescapeURL(substring.BeginReading(), substring.Length(), esc_AlwaysCopy|esc_Host, flat);

maybe call |substring| |hostName| instead?
Attachment #8514301 - Flags: review?(honzab.moz) → review+
Blocks: 1142083
Due to hostname parsing being more strict, parsing hostname containing backslash now fails. A few tests are disabled until HTML backslash replacement is implemented in bug 652186
Hostname starting with [ and not ending in ] is invalid, so the test checking this use case was removed.
Attachment #8622143 - Flags: review?(bugs)
Comment on attachment 8622143 [details] [diff] [review]
Update test_nsDefaultURIFixup.js

This is worrisome. Could we just not land this before bug 652186?
I think we should not break user typing mozilla\\ to the location bar.
So, I guess we need some changes to nsDefaultURIFixup.
Attachment #8622143 - Flags: review?(bugs) → review-
These patches no longer apply. I am implementing hostname percent decoding in bug 412457.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 412457
You need to log in before you can comment on or make changes to this bug.