Handle special characters when parsing URL hostname

RESOLVED DUPLICATE of bug 412457

Status

()

Core
Networking
RESOLVED DUPLICATE of bug 412457
4 years ago
2 years ago

People

(Reporter: valentin, Assigned: valentin)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 3 obsolete attachments)

(Assignee)

Description

4 years ago
(In reply to obrufau from Bug 1022207 comment #10)
> > @obrufau: I still think we need a loop to ensure that colons are enclosed in brackets.
> 
> To ensure there are no colons, I think we can use
> > PL_strchr(host, ':')
> 
> And shouldn't we also check U+0000, U+0009, U+000A, U+000D, U+0020, "#",
> "%", "/", ":", "?", "@", and "\"?
> 
> Or maybe it would be better first fixing the too restrictive behavior after
> Bug 960014, and then create other bugs (blocking bug 906714) to make it
> compliant with the spec.

The exceptions are defined at:
http://url.spec.whatwg.org/#hostname-state
(Assignee)

Comment 1

3 years ago
Created attachment 8511726 [details] [diff] [review]
Handle special characters when parsing URL hostname
(Assignee)

Updated

3 years ago
Assignee: nobody → valentin.gosu
Status: NEW → ASSIGNED
(Assignee)

Comment 2

3 years ago
Created attachment 8511727 [details] [diff] [review]
Do percent decoding in hostname
(Assignee)

Comment 3

3 years ago
Created attachment 8511728 [details] [diff] [review]
Support IPv6 Link-Local addresses
(Assignee)

Comment 4

3 years ago
Created attachment 8514299 [details] [diff] [review]
Handle special characters when parsing URL hostname


This patch contains the good bits you reviewed in bug 1022207. It cleans up and simplifies some of the URL code.
Attachment #8514299 - Flags: review?(honzab.moz)
(Assignee)

Updated

3 years ago
Attachment #8511726 - Attachment is obsolete: true
(Assignee)

Comment 5

3 years ago
Created attachment 8514301 [details] [diff] [review]
Do percent decoding in hostname


This patch enables percent decoding in the hostname. Relevant bugs: 961476, 956463. I know it needs a bunch more unit tests to be added, but a preliminary review would be very useful
Attachment #8514301 - Flags: review?(honzab.moz)
(Assignee)

Updated

3 years ago
Attachment #8511727 - Attachment is obsolete: true

Comment 6

3 years ago
I don't understand the IPv6 patch. How is that supported by the specification?
(Assignee)

Comment 7

3 years ago
Created attachment 8514304 [details] [diff] [review]
Support IPv6 Link-Local addresses


This patch adds support for link-local IPv6 addresses ( such as http://[fe80::1%wlan0] - see http://tools.ietf.org/html/rfc6874). It used to be supported, and it's also tracked in bug 700999, but it's clearer if I keep all of the patches in the same place
Attachment #8514304 - Flags: review?(honzab.moz)
(Assignee)

Updated

3 years ago
Attachment #8511728 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Blocks: 700999
No longer blocks: 906714
(Assignee)

Comment 8

3 years ago
Comment on attachment 8514304 [details] [diff] [review]
Support IPv6 Link-Local addresses

Clearing review flag, as we might not want to support this just yet, as it's not part of the URL spec.
Attachment #8514304 - Flags: review?(honzab.moz)

Updated

3 years ago
Blocks: 906714
Comment on attachment 8514299 [details] [diff] [review]
Handle special characters when parsing URL hostname

Review of attachment 8514299 [details] [diff] [review]:
-----------------------------------------------------------------

In general, I really really hate this code and I'd like to avoid any more wasting of time checking all new crazy through-string iterations once and for all.  I really see this code be completely rewritten with an intelligent parser as a total must!

Please answer first the questions before landing.

::: netwerk/base/src/nsStandardURL.cpp
@@ +447,5 @@
> +        host.BeginReading(start);
> +        host.EndReading(end);
> +        if (FindCharInReadable(gForbiddenHostChars[i], start, end)) {
> +            return false;
> +        }

iterating 12 * host name length characters... that is crazy...

@@ +456,3 @@
>      }
>  
> +

remove this blank

@@ +1470,5 @@
> +            case '[':
> +                bracketFlag = true;
> +                break;
> +            case ']':
> +                bracketFlag = false;

I assume the brackets cannot be nested here, right?

@@ +1506,4 @@
>  
>      FindHostLimit(start, end);
>  
> +    nsresult rv = FindColonLimit(iter, end);

a comment that iter and end are passed as a ref and where they are put on success return is needed.

::: netwerk/test/unit/test_standardurl.js
@@ +187,5 @@
> +  do_check_eq(url.port, -1);
> +
> +  url = stringToURL("http://example.com");
> +  Assert.throws(() => { url.hostPort = "2001::1"; }, "port contains colon");
> +  do_check_eq(url.host, "2001"); // the hostname ends at the first colon

so assignment throws but we end up modified?  is that something the spec says?  even if yes, then it's pretty strange...  or are we here just compatible with ourselfs?

@@ -191,2 @@
>    Assert.throws(() => { url.host = "[2001::1"; }, "missing last bracket");
> -  Assert.throws(() => { url.host = "2001::1]"; }, "missing first bracket");

these cases now work?
Attachment #8514299 - Flags: review?(honzab.moz) → review+
(Assignee)

Comment 10

3 years ago
(In reply to Honza Bambas (:mayhemer) from comment #9)
> Comment on attachment 8514299 [details] [diff] [review]
> Handle special characters when parsing URL hostname
> 
> Review of attachment 8514299 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> In general, I really really hate this code and I'd like to avoid any more
> wasting of time checking all new crazy through-string iterations once and
> for all.  I really see this code be completely rewritten with an intelligent
> parser as a total must!

I pretty much agree with you. That code is due for a rewrite.
However, percent decoding and link-local addresses are cool features we might want sooner than later.

It also seems that Anne is going to rewrite the spec in a matter similar to this: http://intertwingly.net/projects/pegurl/url.html
If so, that new lexer is going to be very useful.

> 
> Please answer first the questions before landing.
> 
> ::: netwerk/base/src/nsStandardURL.cpp
> @@ +447,5 @@
> > +        host.BeginReading(start);
> > +        host.EndReading(end);
> > +        if (FindCharInReadable(gForbiddenHostChars[i], start, end)) {
> > +            return false;
> > +        }
> 
> iterating 12 * host name length characters... that is crazy...

True. Do you think using a switch statement would be better?

> 
> @@ +1470,5 @@
> > +            case '[':
> > +                bracketFlag = true;
> > +                break;
> > +            case ']':
> > +                bracketFlag = false;
> 
> I assume the brackets cannot be nested here, right?

They could, but since we forbid any [] chars in the hostname, that would also fail.

> 
> ::: netwerk/test/unit/test_standardurl.js
> @@ +187,5 @@
> > +  do_check_eq(url.port, -1);
> > +
> > +  url = stringToURL("http://example.com");
> > +  Assert.throws(() => { url.hostPort = "2001::1"; }, "port contains colon");
> > +  do_check_eq(url.host, "2001"); // the hostname ends at the first colon
> 
> so assignment throws but we end up modified?  is that something the spec
> says?  even if yes, then it's pretty strange...  or are we here just
> compatible with ourselfs?

It seems so: https://url.spec.whatwg.org/#hostname-state - Step 1.3
Chrome for example, doesn't ever fail parsing the port -> it just sets it to 0 if it fails. We could probably do that too.

> @@ -191,2 @@
> >    Assert.throws(() => { url.host = "[2001::1"; }, "missing last bracket");
> > -  Assert.throws(() => { url.host = "2001::1]"; }, "missing first bracket");
> 
> these cases now work?

When setting the hostname, parsing ends at the first : so it now works.
It follows the spec, but indeed not very intuitive or compatible with other UAs.

--

I guess we could hold on to this patch for a little while, and see if the updated spec would make things any easier/clearer.
Thanks for reviewing this!
Comment on attachment 8514301 [details] [diff] [review]
Do percent decoding in hostname

Review of attachment 8514301 [details] [diff] [review]:
-----------------------------------------------------------------

Let's see what we break with this.

::: netwerk/base/src/nsStandardURL.cpp
@@ +572,5 @@
>      mHostEncoding = eEncoding_ASCII;
>      // Note that we don't disallow URLs without a host - file:, etc
>      if (mHost.mLen > 0) {
> +        nsAutoCString tempHost;
> +        NS_UnescapeURL(spec + mHost.mPos, mHost.mLen, esc_AlwaysCopy|esc_Host, tempHost);

spaces around |

@@ +1556,5 @@
>      FindHostLimit(start, end);
>  
> +    const nsCString substring(Substring(start, end));
> +    nsAutoCString flat;
> +    NS_UnescapeURL(substring.BeginReading(), substring.Length(), esc_AlwaysCopy|esc_Host, flat);

maybe call |substring| |hostName| instead?
Attachment #8514301 - Flags: review?(honzab.moz) → review+
(Assignee)

Updated

3 years ago
Blocks: 1142083
(Assignee)

Comment 12

3 years ago
Created attachment 8622143 [details] [diff] [review]
Update test_nsDefaultURIFixup.js

Due to hostname parsing being more strict, parsing hostname containing backslash now fails. A few tests are disabled until HTML backslash replacement is implemented in bug 652186
Hostname starting with [ and not ending in ] is invalid, so the test checking this use case was removed.
Attachment #8622143 - Flags: review?(bugs)
Comment on attachment 8622143 [details] [diff] [review]
Update test_nsDefaultURIFixup.js

This is worrisome. Could we just not land this before bug 652186?
I think we should not break user typing mozilla\\ to the location bar.
So, I guess we need some changes to nsDefaultURIFixup.
Attachment #8622143 - Flags: review?(bugs) → review-
(Assignee)

Comment 14

2 years ago
These patches no longer apply. I am implementing hostname percent decoding in bug 412457.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 412457
You need to log in before you can comment on or make changes to this bug.