Closed Bug 1040285 Opened 6 years ago Closed 5 years ago

Single Quotes in HTTP request-uri Are Incorrectly Encoded as %27

Categories

(Core :: Networking, defect)

30 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: james, Assigned: valentin)

References

Details

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 (Beta/Release)
Build ID: 20140608211622

Steps to reproduce:

Open a URL with a single quote in it, such as http://example.com/single'quote

Although Firefox displays the same URL in the address bar, it actually sent a different resource URL over HTTP. This can be seen in the Network tab's "Edit and Resend" form. This other %27 URL can also be seen in the window.location.href value.

This came to my attention when I noted the HTTP response Location header value was not equivalent to the following HTTP request-uri and its location.href value.


Actual results:

URLs that have a single quote in them are (incorrectly) sent with the single quote replaced with %27. URIs that differ in the replacement of a reserved character, like single quote, with its corresponding percent-encoded octet, like %27, are not equivalent. A URL provided to Firefox with a single quote is not equivalent to the URL Firefox is requesting with %27


Expected results:

Single quotes are part of the sub-delims or reserved characters of RFC3986[1] and like other delimiters (like forward slash, plus, and others) percent-encoding these reserved character (or decoding a percent-encoded one) will change how the URI is interpreted by most applications. Firefox should NOT encode ANY reserved character within a URL, including single quotes.

[1] http://tools.ietf.org/html/rfc3986#section-2.2
agree. see also a previous bug report
https://bugzilla.mozilla.org/show_bug.cgi?id=407172
rfc 7230:  Characters other
   other than those in the "reserved" set are equivalent to their
   percent-encoded octets: the normal form is to not encode them (see
   Sections 2.1 and 2.2 of [RFC3986]).

3986 defines "'" as reserved. but its predecessor 2396 defines it as unreserved. by that logic an update to not hex encode is in order.

3896 also says "Percent-
   encoding a reserved character, or decoding a percent-encoded octet
   that corresponds to a reserved character, will change how the URI is
   interpreted by most applications.  Thus, characters in the reserved
   set are protected from normalization and are therefore safe to be
   used by scheme-specific and producer-specific algorithms for
   delimiting data subcomponents within a URI."

So I would say we should not hex encode single quote.

annevk?

valentin?
Component: Networking: HTTP → Networking
Flags: needinfo?(annevk)
Flags: needinfo?(valentin.gosu)
Duplicate of this bug: 407172
We don't implement RFC 3986. However, per https://url.spec.whatwg.org/#relative-path-state and the definition of https://url.spec.whatwg.org/#relative-path-state referenced from there it does seem like "'" should not be encoded ('"' should).
Flags: needinfo?(annevk)
Status: UNCONFIRMED → NEW
Ever confirmed: true
valentin, will you take it?
:) Yup
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Anne, I have a compat question:
The URL spec lists ' as a URL code point, meaning it shouldn't be encoded.
However, it seems that Chrome encodes it everywhere except the path.
Any idea what we should do? Should we only change it for the path, or for all URL segments except the scheme?
Flags: needinfo?(annevk)
The URL Standard and 3986 agree it should be for most URL segments (username, password, host, path, query, fragment). However, since the URL parser is fragile perhaps we should land path first and then fix the follow up bug if that sticks?
Flags: needinfo?(annevk)
Comment on attachment 8592757 [details] [diff] [review]
Single Quotes in HTTP request-uri Are Incorrectly Encoded as %27

Review of attachment 8592757 [details] [diff] [review]:
-----------------------------------------------------------------

this makes sense to me, but I really am not the most qualified person to review this. annevk?
Attachment #8592757 - Flags: review?(mcmanus) → review?(annevk)
Comment on attachment 8592757 [details] [diff] [review]
Single Quotes in HTTP request-uri Are Incorrectly Encoded as %27

Review of attachment 8592757 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not either, but this looks good.
Attachment #8592757 - Flags: review?(annevk) → review+
Comment on attachment 8592757 [details] [diff] [review]
Single Quotes in HTTP request-uri Are Incorrectly Encoded as %27

Review of attachment 8592757 [details] [diff] [review]:
-----------------------------------------------------------------

ship it!

::: netwerk/test/unit/test_standardurl.js
@@ +236,5 @@
>  }
>  
> +function test_apostropheEncoding()
> +{
> +  // It should not be encoded in the path

since this is going to be tricky to work backwards, let's just comment here that the current single quote escaping policy is limited to the path and that this is controlled by the bitmask in nsEscape.cpp::EscapeChars[]

::: xpcom/io/nsEscape.cpp
@@ +359,5 @@
>  //   0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
>  {
>       0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  // 0x
>       0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  // 1x
> +     0,1023,   0, 512,1023,   0,1023, 112,1023,1023,1023,1023,1023,1023, 953, 784,  // 2x   !"#$%&'()*+,-./

ya gotta love a bitmask that is expressed in decimal :)
Could this be done to the query part too? The complaints are all from server side people who observe the surprising %27 in HTTP request URI. Leaving it in the query part may lead to further complaints. It's the only remaining disagreement with rfc3986 (as far as server side is concerned), so why not fix it all together.
Because comment 8. Any change to the URL parser is problematic so we'll take baby steps. Note that any server that does not decode %27 is not compliant either.
My impression was that this is a similar fix as the fix to path; of course, that's only from looking at the patch code itself, so it's probably missing something.

I disagree that the server must decode %27 in query. Not all applications treat the query as "x-www-form-urlencoded". It could very well be a proprietary structure that is opaque to anyone other than the server itself, and "'" could be selected as a delimiter for that structure, if the developer only knows about rfc3986. As of today, that would work on IE/Safari, but fail on Chrome/Firefox/Opera.
see my survey on encoding behaviors of browsers: https://bugzilla.mozilla.org/show_bug.cgi?id=1152455#c6
Attachment #8592757 - Attachment is obsolete: true
Comment on attachment 8598334 [details] [diff] [review]
Single Quotes in HTTP request-uri Are Incorrectly Encoded as %27

Review of attachment 8598334 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me since it's aligning with the ospath_unix.jsm change so this WFM assuming this isn't breaking file URIs with single quotes on Windows.
Attachment #8598334 - Flags: review?(MattN+bmo) → review+
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: x86_64 → All
Flags: needinfo?(valentin.gosu)
You need to log in before you can comment on or make changes to this bug.