Closed Bug 473822 Opened 11 years ago Closed 5 years ago

Square brackets not percent-encoded in URI's query-part

Categories

(Core :: Networking, defect, major)

x86
Windows XP
defect
Not set
major

RESOLVED FIXED
mozilla35

People

(Reporter: crisp, Assigned: valentin)

References

(Depends on 2 open bugs)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5

According to RFC 3986, reserved characters that are not specifically allowed in a given part of a URI should be percent-encoded.

From the ABNF rules this follows for the query-part:

reserved = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
query = *( pchar / "/" / "?" )

This leaves "[" and "]" as reserved characters that are not allowed in the query (as is "#", but that starts the fragment).
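
As a rough illustration of the grammar above, the following Python sketch percent-encodes every character that the `query` rule does not allow, while leaving existing percent-encoded triplets alone. The helper name `encode_query` is hypothetical, and this is not Firefox's implementation; it only mirrors the quoted ABNF.

```python
import string

# Character sets taken from the RFC 3986 ABNF quoted above.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
# query = *( pchar / "/" / "?" ), with pchar adding ":" and "@"
QUERY_ALLOWED = set(UNRESERVED + SUB_DELIMS + ":@/?")
HEXDIG = set(string.hexdigits)


def encode_query(query):
    """Percent-encode characters that the query rule does not allow.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    n = len(query)
    while i < n:
        ch = query[i]
        is_triplet = (
            ch == "%"
            and i + 2 < n
            and query[i + 1] in HEXDIG
            and query[i + 2] in HEXDIG
        )
        if is_triplet:
            # Already pct-encoded: copy the three characters unchanged.
            out.append(query[i:i + 3])
            i += 3
            continue
        if ch in QUERY_ALLOWED:
            out.append(ch)
        else:
            # Encode each UTF-8 byte of the disallowed character.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        i += 1
    return "".join(out)


print(encode_query("foo[b]=bar"))      # foo%5Bb%5D=bar
print(encode_query("foo%5Bb%5D=bar"))  # unchanged
```

Under these rules "[" and "]" are the only printable ASCII gen-delims (besides "#") that end up encoded in a query.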

Firefox does not percent-encode (i.e. correct) these characters when they occur in malformed URI references. In addition, after a user follows a correctly encoded URI, the URL copied from the location bar no longer has the square brackets encoded.

Reproducible: Always

Steps to Reproduce:
1. Paste the following link into the location bar: http://example.org?foo%5Bb%5D=bar
2. This sends the following request: GET /?foo%5Bb%5D=bar HTTP/1.1
3. See how the URL in the location bar changes to http://example.org/?foo[b]=bar
4. Copying this URL gives you the URL without encoded square brackets
5. Re-submitting this request, e.g. by pressing Enter in the location bar, now sends: GET /?foo[b]=bar HTTP/1.1
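
The request lines in the steps above can be reproduced with a small Python sketch. It models a client that forwards the query string verbatim, which is the behaviour being reported; `request_line` is a hypothetical helper, not part of any browser or library.

```python
from urllib.parse import urlsplit


def request_line(url, method="GET"):
    """Build an HTTP/1.1 request line, passing the query through unchanged.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path is sent as "/"
    if parts.query:
        target += "?" + parts.query
    return "{} {} HTTP/1.1".format(method, target)


print(request_line("http://example.org?foo%5Bb%5D=bar"))
# GET /?foo%5Bb%5D=bar HTTP/1.1
print(request_line("http://example.org/?foo[b]=bar"))
# GET /?foo[b]=bar HTTP/1.1
```
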
Actual Results:  
This behaviour causes malformed URIs to be sent over the network.

Expected Results:  
Malformed URIs should be corrected; e.g. an href="http://example.org?foo[b]=bar" in HTML, or such a URL typed into the location bar, should be corrected by encoding the square brackets before the request is sent over the network.
Well-formed URIs should be sent as is, and should not be shown with decoded brackets in the location bar after being followed, or at least should be correctly encoded when copied or dragged from the location bar.

Failing to encode these characters, or to present them encoded (at least when copying or dragging from the location bar), causes serious problems in applications that expect these characters to be encoded in URIs. In particular, when square brackets are part of some other markup syntax (such as UBB syntax on forums), this creates an ambiguity that makes it impossible to correctly identify URIs/URLs in content.
See related bug 470408.
This bug is very easy to see when copying and pasting an "Advanced Search" URL from Wikipedia's Bugzilla that includes the component "[other]": https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&component=[other]. Chrome and IE properly escape the square brackets. Firefox does not, which makes these URLs harder to use in MediaWiki markup, since MediaWiki uses the syntax [URL text] to create <a href> links.

RFC 3986 section 2.2 (http://tools.ietf.org/html/rfc3986#section-2.2) clearly shows this is a bug.
Depends on: url
Firefox even changes encoded square brackets into literal square brackets.

For example, copy-pasting the following URL into the location bar

https://groups.google.com/forum/#!topic/mozilla.dev.planning/k-kOnJ14lXE%5B1-25-false%5D

changes it to 

https://groups.google.com/forum/#!topic/mozilla.dev.planning/k-kOnJ14lXE[1-25-false]

whereas Chrome and IE do it the other way.

Also, copying this URL keeps the square brackets, while Chrome and IE encode them.

Keeping the brackets causes problems for URL parsers on various sites, e.g. the linkification feature of forum software such as phpBB.
This bug also seems worth fixing because Thunderbird, which I guess is often used together with Firefox, cuts links off at square brackets, which breaks them. See https://bugzilla.mozilla.org/show_bug.cgi?id=404241.
Also, square brackets are not uncommon; they occur, for example, in JSON-encoded arrays.
This is a serious bug that is affecting us. Our stack includes Ruby on Rails and a load balancer.

Ruby on Rails by convention uses square brackets to indicate a parameter with multiple values, e.g. ?a[]=1&a[]=2.

The proxy/load balancing software Stingray Traffic Manager by Riverbed (http://www.riverbed.com/products/application-delivery-performance/) has a configuration option called check_rfc2396 which rejects requests that violate RFC 2396.

Combine the two, and when our Firefox users copy a URL with encoded square brackets and paste it into another tab, they get a URL with unencoded square brackets, which violates RFC 2396 and is rejected by our load balancer.

Please fix this in some way. Either show the user the encoded brackets or, when copying, copy the actual URL that was used to generate the response rather than the user-friendly version that violates web standards.
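
To make the failure mode concrete, here is a minimal Python sketch of the two query forms and a strict validator. The check `passes_strict_check` is a hypothetical stand-in built from the RFC 3986 query grammar quoted earlier in this bug; it is an assumption for illustration and not how Stingray's check_rfc2396 option is actually implemented.

```python
import re
from urllib.parse import parse_qs, urlencode

# Rails-style repeated parameter, as in ?a[]=1&a[]=2
params = [("a[]", "1"), ("a[]", "2")]

encoded = urlencode(params)  # brackets are percent-encoded
raw = "a[]=1&a[]=2"          # what gets copied from the location bar

# Allowed: unreserved, sub-delims, ":", "@", "/", "?" or a pct-encoded triplet.
STRICT_QUERY = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def passes_strict_check(query):
    """Hypothetical strict validation of a query string."""
    return STRICT_QUERY.fullmatch(query) is not None


print(encoded)                       # a%5B%5D=1&a%5B%5D=2
print(passes_strict_check(encoded))  # True
print(passes_strict_check(raw))      # False
# Both forms decode to the same parameters on the server side.
print(parse_qs(encoded) == parse_qs(raw))  # True
```

Both strings carry the same parameters, so the only difference a strict proxy sees is the raw brackets.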
Assignee: nobody → valentin.gosu
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8485262
Square brackets not percent-encoded in URI's query-part

Review of attachment 8485262:
-----------------------------------------------------------------

Even though this is in XPCOM, URL-escaping stuff really falls under the umbrella of the networking folks.  The change looks reasonable enough to me--thank you for the tests!--though I note that other characters in the reserved-character list aren't flagged as needing escaping ("$", "(", and ")", for instance) as you have done here.  And that makes me wonder whether there's some history I'm not aware of here.

I also note that dragging and dropping a URL with [] characters in it from the URL bar appears to produce the correct (escaped) result, whereas cutting and pasting does not (at least on Linux).

Forwarding the review to Jason, who can bounce things around as necessary.
Attachment #8485262 - Flags: review?(nfroyd) → review?(jduell.mcbugs)
It's worth checking what http://url.spec.whatwg.org/ says here, in addition to what RFC 3986 says, since the latter is known to not be web-compatible on some points....

If the URL spec doesn't say the right things, it might need updating.
I checked with Anne over IRC, and it seems that we should escape the brackets.

(01:16:22) valentin1: would it be correct to always encode the brackets in the URL, except for when they're in the hostname?
(10:18:19) annevk: yeah I think so
(10:18:56) annevk: per http://url.spec.whatwg.org/#url-code-points they are not URL code points so they ought to be escaped

So the patch is valid. My thinking is that brackets should behave just like the space character: it is always escaped in URLs and appears unescaped in the URL bar, but copy-paste yields the escaped version.
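
The intended behaviour can be modelled with a toy Python sketch: the escaped URL is what is stored and copied, and only the displayed text is decoded. The class `LocationBar` and its methods are hypothetical names for illustration; this is not how Firefox's URL bar is implemented.

```python
class LocationBar:
    """Toy model: keep the escaped URL, show a friendlier form."""

    # Only these escapes are decoded for display in this sketch
    # (uppercase hex only, to keep the example short).
    DISPLAY_DECODE = (("%20", " "), ("%5B", "["), ("%5D", "]"))

    def __init__(self, escaped_url):
        self.escaped_url = escaped_url  # the form sent over the network

    def display_text(self):
        text = self.escaped_url
        for escaped, literal in self.DISPLAY_DECODE:
            text = text.replace(escaped, literal)
        return text

    def copy_text(self):
        # Copying yields the escaped form, not the displayed one.
        return self.escaped_url


bar = LocationBar("http://example.org/?foo%5Bb%5D=a%20b")
print(bar.display_text())  # http://example.org/?foo[b]=a b
print(bar.copy_text())     # http://example.org/?foo%5Bb%5D=a%20b
```
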

Try seems all green: https://tbpl.mozilla.org/?tree=Try&rev=fa77a792a61e
Attachment #8485262 - Flags: review?(jduell.mcbugs) → review+
Please file a followup bug for the issues nfroyd pointed out, and paste the bug # in here.
Flags: needinfo?(valentin.gosu)
I filed bug 1064700 as a followup.
Flags: needinfo?(valentin.gosu)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cc192030c28f
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Depends on: 1121826
Depends on: 1124600