Closed Bug 1064700 Opened 10 years ago Closed 3 months ago

Determine which characters need to be percent encoded

Categories

(Core :: Networking, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: valentin, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [necko-backlog])

Similar to Bug 473822 - Square brackets not percent-encoded in URI's query-part, we need to determine if other characters need to be percent encoded such as (,),$
(,),$ are all in sub-delims, which is allowed for the query part.
We should not look at RFC 3986 but rather at http://url.spec.whatwg.org/ and what other browsers are doing. But agreed that "(", ")", and "$" need no escaping.
Whiteboard: [necko-backlog]
Depends on: 1026938, 918331
There's a great analysis report in bug 1152455 comment 6.

Firstly, we should determine which characters need to be percent encoded:
For those non-alpha-digit characters allowed in RFC 3986, (i.e., - . _ ~ ! $ & ( ) * + , ; = : @ / ?)

Firefox escapes '.', ';', '='
Chrome regards '~', ';' as illegal, escapes "!$&()*,="
Safari only allows "-._:". No escapes.
Very different implementation via browsers so we don't have an obvious web-compat intention.

bug 1152455 comment 9 summarized the other printable characters with two exceptions:
1. Bug 918331: pipe | in path/query/hash.
We don't have strong motivation to change and spec is with us per bug 1152455 comment 9.

2. Bug 1026938: backslash \ in query/hash (Other case, \ is replaced by /. It's bug 652186)
Also, other browsers are with us. Maybe we should keep it.

We have dealt with C0 controls (bug 1272284 and bug 1271955)

Similar issue is unescaping alpha-digit (bug 1197123),
we may change it if we have some motivation like bug 1197123 comment 8-9.


Secondly, to solve problem in bug 1163959 comment 0, 
lots of comments are arguing we should separate display URL and underlying URL
(bug 1026938 comment 5, bug 1124600 comment 11)
But that's out of this bug.

Any suggestion for next move?
Flags: needinfo?(annevk)
(In reply to Junior [:junior] from comment #3)
> Firefox escapes '.', ';', '='
> Chrome regards '~', ';' as illegal, escapes "!$&()*,="
> Safari only allows "-._:". No escapes.

I don't really understand this part of your comment.


> Similar issue is unescaping alpha-digit (bug 1197123)

If we decide to do this, we should probably also align on URL encoding "|". (I.e., copy Chrome.)


> Any suggestion for next move?

I believe we currently match the URL Standard and I believe we don't have any outstanding compatibility issues, apart from issues with the Firefox address bar. We could still decide to align some processing of URLs with Chrome rather than Safari, but I'm not sure there's much reason to pursue that.
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #4)
> (In reply to Junior [:junior] from comment #3)
> > Firefox escapes '.', ';', '='
> > Chrome regards '~', ';' as illegal, escapes "!$&()*,="
> > Safari only allows "-._:". No escapes.
> 
> I don't really understand this part of your comment.
> 

I should revise the comment
Firefox doesn't escape for parsing host like
  >> new URL("http://-._~!$&*+,;=:?/").href
  "http://-._~!$&*+,;=/?/"

I misinterpret previously since I put @ in the URL at that time.
No longer a host parsing.


Chrome will throw like this
  >> new URL("http://~/");
  Uncaught TypeError: Failed to construct 'URL': Invalid URL(…)

and escape like this
  >> new URL("http://!$&()*,=/").href
  "http://%21%24%26%28%29%2A%2C%3D/"
Hmm, yeah, host parsing is probably an area that is worthy of investigating further. Bug 1152455 comment 6 (and subsequent comments) did not cover that.
I think the plan is:
1. Work on encoding "|" (Bug 918331) and unescaping alpha-digit (bug 1197123)
2. Not encode backslash \ in query/hash (Resolved Bug 1026938 invalid)
3. Investigate those non-alpha-digit characters allowed in RFC 3986
   (i.e., - . _ ~ ! $ & ( ) * + , ; = : @ / ?) for host parsing.
If we do 1 we should do it in one go and it should be coupled with changing the URL Standard and trying to engage with Apple to align too (I said this elsewhere).
(In reply to Anne (:annevk) from comment #8)
> If we do 1 we should do it in one go and it should be coupled with changing
> the URL Standard and trying to engage with Apple to align too (I said this
> elsewhere).

Yes you mentioned in bug 1197123 comment 10 :D

I'll put those patches (for bug 918331 and bug 1197123) here to ensure they are landed together.
Meanwhile, do those web-compat stuff before land.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
No longer depends on: 918331
Severity: normal → S3

I think the URL standard is fairly mature at this point.
Decisions which characters should be percent encoded should be made on the issue tracker https://github.com/whatwg/url/issues

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.