Closed Bug 1374505 Opened 7 years ago Closed 2 months ago

URL constructor does not support unknown protocols

Categories

(Core :: Networking, defect, P3)

56 Branch
defect

RESOLVED FIXED

People

(Reporter: me, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [necko-backlog])

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36

Steps to reproduce:

1. Execute: new URL('git://github.com').hostname


Actual results:

It returns ""

(Firefox 54 and Chrome 59 match this buggy behavior)


Expected results:

It should return "github.com"

(This works correctly in Safari 10.1 and Node 7.1+)
Component: Untriaged → Networking
Product: Firefox → Core
For reference, here's the Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=734880
The WHATWG URL parsing rules seem to allow arbitrary protocol schemes.
https://url.spec.whatwg.org/#url-parsing
I'm pretty sure this is a dupe, but I can't find the other bug at the moment.

Unknown protocol schemes may or may not have hostnames.
For example, let's take the tel protocol: tel:+1-816-555-1212
What should the hostname be in this case?
It's difficult to deal with these corner cases, which is why Chrome also exhibits the same behaviour.
Blocks: url
Whiteboard: [necko-backlog]
Status: UNCONFIRMED → NEW
Ever confirmed: true
Per the URL spec, non-special schemes that have double slashes after the scheme colon are to be treated as having a host:

>>>
scheme state
:	git
path or authority state
/	
authority state
g	
<<<

hostname = 'github.com'
pathname = ''

That differs from the tel example:

>>>
scheme state
:	tel
cannot-be-a-base-URL path state
+	
<<<

hostname = ''
pathname = '+1-816-555-1212'
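
As a minimal sketch of the two traces above, the snippet below shows what a spec-compliant `URL` implementation is expected to return for each input. The `summary` object is only an illustrative name; the values reflect the spec behavior described in this comment, which the thread reports for Node and Safari.

```javascript
// Non-special scheme followed by "//": the parser enters the authority state,
// so the URL gets a host.
const git = new URL('git://github.com');

// Non-special scheme with no "//": the rest of the input becomes an opaque path,
// so there is no host.
const tel = new URL('tel:+1-816-555-1212');

// `summary` is a hypothetical name used only to collect the results.
const summary = {
  gitHostname: git.hostname, // expected: 'github.com'
  gitPathname: git.pathname, // expected: ''
  telHostname: tel.hostname, // expected: ''
  telPathname: tel.pathname, // expected: '+1-816-555-1212'
};

console.log(summary);
```

On an affected build, `gitHostname` comes back as "" instead, as described in the original report.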
(In reply to Gordon P. Hemsley [:GPHemsley] from comment #4)
> Per the URL spec, non-special schemes that have double slashes after the
> scheme colon are to be treated as having a host

We understand that this is the case. Big changes to our implementation are planned, starting with Firefox 57, when add-on compatibility is no longer an issue. Stay tuned.
(In reply to Valentin Gosu [:valentin] from comment #5)
> (In reply to Gordon P. Hemsley [:GPHemsley] from comment #4)
> > Per the URL spec, non-special schemes that have double slashes after the
> > scheme colon are to be treated as having a host
> 
> We understand that this is the case.

(In reply to Valentin Gosu [:valentin] from comment #3)
> Unknown protocol schemes may or may not have hostnames.
> For example, let's take the tel protocol: tel:+1-816-555-1212
> What should the hostname be in this case?
> It's difficult to deal with these corner cases, which is why Chrome also
> exhibits the same behaviour.

This made it seem like you were not aware of how the spec handles these cases.
(In reply to Gordon P. Hemsley [:GPHemsley] from comment #6)
> This made it seem like you were not aware of how the spec handles these
> cases.

There are even more corner cases that our URL parser doesn't handle correctly. Some of these exist because we haven't gotten around to fixing them; others, like the above example, are technical debt caused by the way our implementation works. I hope to move forward on these in the near future.
Thanks for being involved in our process. We hope to get this fixed soon after the Firefox 57 release.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Repro website: https://felixfbecker.github.io/whatwg-url-custom-host-repro/
Filed on https://github.com/webcompat/web-bugs/issues/19792
Filed on Chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=902253
Specific test case added in https://github.com/web-platform-tests/wpt/pull/13516 (although also already covered in existing tests)

Firefox and Chrome both misbehave, while Edge, Safari, Node and whatwg-url behave in a spec-compliant way.

This makes the URL API unreliable for generic URL parsing and forces us to always ship a polyfill (which is pretty big).
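
One way to limit that cost, sketched here under assumptions, is to feature-detect the bug and fall back to the polyfill only when the native parser fails. The function name `nativeUrlParsesNonSpecialHosts` is hypothetical, and how the fallback gets loaded is left to the application.

```javascript
// Hypothetical helper: returns true when the native URL parser assigns a host
// to a non-special scheme that is followed by "//", as the URL spec requires.
function nativeUrlParsesNonSpecialHosts() {
  try {
    return new URL('git://github.com').hostname === 'github.com';
  } catch (err) {
    // URL is missing or threw on this input: treat the native parser as unusable.
    return false;
  }
}

// Decide at startup whether a spec-compliant fallback (for example whatwg-url)
// is needed; the loading mechanism itself is not shown.
const needsUrlPolyfill = !nativeUrlParsesNonSpecialHosts();
console.log('needs URL polyfill:', needsUrlPolyfill);
```

This only avoids the download in browsers that already parse correctly; the polyfill still has to be available for the others.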

This bug has the unfortunate side effect of requiring developers to ship a 95KB polyfill (after minification and gzipping) to their users 😭 https://www.npmjs.com/package/whatwg-url

For the sake of web performance, this bug ought to be prioritized :)

I'll also add that this page (https://felixfbecker.github.io/whatwg-url-custom-host-repro/) which demonstrates the divergence between browsers is quite damning. This is a webcompat issue that should be prioritized, or else this API really can't be used safely.

Just ran into this issue too. I was looking at whether it would be possible to create a platform-independent version of the https://www.npmjs.com/package/amazon-s3-uri package by switching from the Node url module (which has to be polyfilled for browsers) to the URL Web API (which is available everywhere).

Firefox and Chrome both parse URLs of the format 's3://bucketname/objectkey' incorrectly relative to the WHATWG URL spec. Node and Safari get it right. Fixing this bug would certainly help simplify frontend bundling a few years down the line.
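
For illustration, here is a rough sketch of what such parsing could look like on top of the URL Web API once it behaves per spec. `parseS3Uri` is a hypothetical helper written for this comment, not the API of the amazon-s3-uri package, and it leaves the key percent-encoded exactly as the parser returns it.

```javascript
// Hypothetical helper: splits an s3:// URI into bucket and object key using
// the standard URL parser. Requires a spec-compliant implementation.
function parseS3Uri(uri) {
  const url = new URL(uri);
  if (url.protocol !== 's3:') {
    throw new TypeError('Not an s3:// URI: ' + uri);
  }
  return {
    bucket: url.hostname,       // host of the non-special URL
    key: url.pathname.slice(1), // path without the leading "/"
  };
}

console.log(parseS3Uri('s3://bucketname/objectkey'));
```

In an affected browser, `hostname` is empty for this input, so the bucket cannot be recovered this way.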

The size of the current polyfill, and the need to include it in source-map, which is relatively widely-used, makes this a pretty big problem.

Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 5 duplicates.
:kershaw, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(kershaw)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(kershaw)
See Also: → 1829610
Duplicate of this bug: 1855873
Duplicate of this bug: 1863622

Fixed in bug 1603699

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED