Bug 1528939 Comment 5

So what happens is that the value:

    http://foopy:\\blah@somewhere.com//whatever

gets "fixed up" by nsIURIFixup / URL parsing to:

    "http://foopy/blah@somewhere.com//whatever"

This makes the actual host "foopy", not "somewhere.com", so we really do fire an HTTP request to "foopy" ("oculus" in comment #0). At least on my network there is no such host, so that request fails (note that the failure takes longer on most Windows systems). Because a URL pasted into the URL bar counts as a user-initiated request, we then try to fix it up by loading "www.foopy.com" ("www.oculus.com") instead.
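
As a quick sanity check on the host, here is a small sketch using the WHATWG `URL` class available in Node.js and browsers. It is a stand-in, not Gecko's nsIURIFixup code, and its serialization of the path may differ from what fixup produced, but it agrees that the host is `foopy` and that no username is parsed. `alternateFixupHost` is a hypothetical helper that only approximates the "www." + ".com" retry described above.

```javascript
// The input from above; String.raw keeps the two backslashes literal.
const input = String.raw`http://foopy:\\blah@somewhere.com//whatever`;

// Parse with the WHATWG URL class (a stand-in for Gecko's own parsing).
const parsed = new URL(input);
console.log(parsed.hostname); // → foopy
console.log(JSON.stringify(parsed.username)); // → "" (the "@" is never reached in the authority)

// Hypothetical approximation of the alternate-host retry: wrap a dotless
// host as "www.<host>.com". This is NOT the real nsIURIFixup logic.
function alternateFixupHost(hostname) {
  if (hostname.includes(".")) return null; // this sketch only handles dotless hosts
  return `www.${hostname}.com`;
}

console.log(alternateFixupHost(parsed.hostname)); // → www.foopy.com
```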

Unfortunately, the URL highlighting code still uses regular expressions. Part of the reason is that we highlight the original input instead of forcing the location bar to display the parsed URL (which would clobber sloppy escaping on the user's part, especially in the path/query string). We can't just use the parsed URL object for the offsets either, because the URL parser strips out gunk that may still be in the location bar (as here, where `:\\` has magically become `/`), so any highlighting offsets taken from it could be wrong.
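
To see why offsets can't be carried over, compare the two strings quoted above. This sketch uses only those literals (no parsing), and `firstDivergence` is a hypothetical helper:

```javascript
// What the user typed, and what fixup produced (both quoted earlier).
const typed = String.raw`http://foopy:\\blah@somewhere.com//whatever`;
const fixedUp = "http://foopy/blah@somewhere.com//whatever";

// Hypothetical helper: index of the first code unit where two strings differ,
// or -1 if they are identical.
function firstDivergence(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

console.log(firstDivergence(typed, fixedUp)); // → 12, the ":" after "foopy"

// Past that point the same substring sits at different offsets.
console.log(typed.indexOf("somewhere.com"));   // → 20
console.log(fixedUp.indexOf("somewhere.com")); // → 18
```

Since `:\\` (three characters) collapsed to `/` (one), any highlight range computed on one string is off by two when applied to the other past index 12.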

This doesn't work well here, because the original input has backslashes in the supposed-username part. AIUI, for special schemes like http the "real" URL parser treats a backslash like `/` and ends the authority there, so there's no username at all:

https://url.spec.whatwg.org/#authority-state

> 2. Otherwise, if one of the following is true
> 
>   *  c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
>
>   *  url is special and c is U+005C (\)
> 
> then:
> 
>    1. If @ flag is set and buffer is the empty string, validation error, return failure.
> 
>    2. Decrease pointer by the number of code points in buffer plus one, set buffer to the empty string, and set state to host state.
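
The quoted rule can be sketched in a few lines. This is a simplified illustration under stated assumptions (the input starts with `scheme://`; no percent-encoding, host parsing or port validation), and `splitAuthority` is a hypothetical name, not part of any URL library:

```javascript
// Special schemes other than "file", which the spec parses via a different state.
const SPECIAL = new Set(["ftp", "http", "https", "ws", "wss"]);

// Simplified sketch of the authority-state boundary rule quoted above.
function splitAuthority(input) {
  const m = /^([a-zA-Z][a-zA-Z0-9+.-]*):\/\//.exec(input);
  if (!m) return null; // assumption: only "scheme://..." inputs are handled
  const special = SPECIAL.has(m[1].toLowerCase());
  const rest = input.slice(m[0].length);

  // The authority ends at EOF, "/", "?", "#", or (for special URLs) "\".
  let end = 0;
  while (end < rest.length) {
    const c = rest[end];
    if (c === "/" || c === "?" || c === "#" || (special && c === "\\")) break;
    end++;
  }
  const authority = rest.slice(0, end);

  // Userinfo is everything before the last "@" inside the authority.
  const at = authority.lastIndexOf("@");
  if (at === -1) return { userinfo: "", hostAndPort: authority };
  // Step 1 quoted above: "@" seen but the buffer after it is empty -> failure.
  if (at === authority.length - 1) return null;
  return { userinfo: authority.slice(0, at), hostAndPort: authority.slice(at + 1) };
}
```

For the input in this bug it returns `{ userinfo: "", hostAndPort: "foopy:" }`: the backslash ends the authority before the `@` is ever reached.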

But the regular expression we use looks like:

    let matchedURL = url.match(/^(([a-z]+:\/\/)(?:[^\/#?]+@)?)(\S+?)(?::\d+)?\s*(?:[\/#?]|$)/);

Note in particular that the group ending in `@)?` only excludes forward slash, question mark and `#` from the username part, not backslash.

Hence the confusion: the regex treats `foopy:\\blah` as a valid username portion, and it really isn't.
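
Running the regex quoted above against the input shows the mismatch directly. This assumes a JavaScript environment with the WHATWG `URL` global (Node.js or a browser), used here only for comparison:

```javascript
// The highlighting regex quoted above, verbatim.
const HIGHLIGHT_RE = /^(([a-z]+:\/\/)(?:[^\/#?]+@)?)(\S+?)(?::\d+)?\s*(?:[\/#?]|$)/;

const input = String.raw`http://foopy:\\blah@somewhere.com//whatever`;
const m = input.match(HIGHLIGHT_RE);

console.log(m[1]); // → http://foopy:\\blah@   (treated as scheme + userinfo)
console.log(m[3]); // → somewhere.com          (highlighted as the host)
console.log(new URL(input).hostname); // → foopy (the host the parser actually uses)
```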

Now, the obvious/lazy fix is to just add the backslash to both character classes (`[^\/#?]` and `[\/#?]`) and move on.
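
For what it's worth, here is what the lazy fix would produce on this input, as a sketch; `PATCHED_RE` is just the regex above with `\\` added to both classes:

```javascript
// The "lazy fix": a backslash added to both character classes.
const PATCHED_RE = /^(([a-z]+:\/\/)(?:[^\/#?\\]+@)?)(\S+?)(?::\d+)?\s*(?:[\/#?\\]|$)/;

const input = String.raw`http://foopy:\\blah@somewhere.com//whatever`;
const m = input.match(PATCHED_RE);

console.log(m[1]); // → http://   (no userinfo any more)
console.log(m[3]); // → foopy:    (note the trailing colon)
```

The userinfo is no longer swallowed, but because `(?::\d+)?` needs at least one digit, the empty port ends up inside the host capture, so the patched regex still doesn't agree exactly with the parser's `foopy`.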

I don't like living with just that fix.

I'd really like us to move away from this regex. Failing that, we should at least enforce one of the following, and otherwise just clobber the user input with the parsed URL:

* the regex turns up the same hostname as URI fixup; or
* the URI's `prePath` exactly matches the initial N bytes in the URL bar (which already strips leading spaces, I think, so just whitespace won't break user input?).
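
A rough sketch of the `prePath` option, under a few assumptions: `new URL` stands in for URI fixup, the prePath is approximated from WHATWG URL fields as scheme, `//`, optional userinfo and `host[:port]`, and `prePathOf` / `hostHighlightRange` are hypothetical names. If the input (after leading whitespace) starts with that prePath, host offsets from the parsed URL are also valid offsets into the input; if not, the caller would clobber the input with `url.href`.

```javascript
// Approximate Gecko's prePath ("scheme://user:pass@host:port") from a WHATWG URL.
function prePathOf(url) {
  let userinfo = "";
  if (url.username || url.password) {
    userinfo = url.username + (url.password ? `:${url.password}` : "") + "@";
  }
  return `${url.protocol}//${userinfo}${url.host}`;
}

// Returns {start, end} offsets of the hostname within the original input,
// or null when the input doesn't start with the parsed prePath, meaning the
// caller should replace the input with url.href instead of highlighting it.
function hostHighlightRange(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // unparseable: nothing to highlight
  }
  const lead = input.length - input.trimStart().length; // leading whitespace
  const prePath = prePathOf(url);
  if (!input.startsWith(prePath, lead)) return null;
  // url.host is the tail of prePath; the hostname is its first part.
  const start = lead + prePath.length - url.host.length;
  return { start, end: start + url.hostname.length };
}
```

For the input in this bug the check passes and highlights `foopy`, the host the request actually goes to. Something like `HTTP://Example.com/` fails it, because the parser lowercases the scheme and host, so in this sketch it would be clobbered.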

Marco, how does that sound to you? Do any of the people currently working on URL bar code have cycles/expertise to take this on?