Closed Bug 1152455 (opened 10 years ago, closed 10 years ago)

Firefox sends illegal chars in Request-URI

Categories

Product: Core
Component: Networking: HTTP
Version: 37 Branch
Platform: x86_64 Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX

People

(Reporter: zhong.j.yu, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Build ID: 20150402191859

Steps to reproduce:

visit http://www.reddit.com/

use Wireshark or a local proxy to monitor network traffic


Actual results:

The following request is sent on the wire:

GET /ados?t=1428516672288&request={%22Placements%22:[{%22A%22:5146,
Host: engine.adzerk.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Referer: http://static.adzerk.net/reddit/ads.html?sr=-reddit.com,loggedout&bust2

(The Request-URI is truncated) 

Notice the curly and squire brackets, which are illegal.


Expected results:

curly and squire brackets should be percent-encoded.
correction: "square" bracket
Why do you think sending {} is "illegal"?

Unmarking sec-sensitive as this doesn't seem like a security issue.
Group: core-security
Flags: needinfo?(zhong.j.yu)
Component: Untriaged → Networking: HTTP
Product: Firefox → Core
According to RFC 3986, "{" and "}" cannot appear anywhere in any URI; "[" and "]" can only appear in the "host" part of a URI to enclose an IPv6address.

The only legal chars in path and query are:

ALPHA DIGIT 
pct-encoded  
- . _ ~ 
! $ & ' ( ) * + , ; =
: @ 
/ ?
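As a rough sketch of the rule above, this Python function checks a path+query string against exactly the character set listed (ALPHA, DIGIT, pct-encoded, "-._~", sub-delims, ":", "@", "/", "?"). The name illegal_rfc3986_chars is hypothetical, and like the comment it applies one character set to path and query alike rather than implementing the full RFC 3986 grammar.

```python
import string

# Allowed as-is in path and query, per the list above (RFC 3986).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | SUB_DELIMS | set(":@/?")
HEXDIG = set(string.hexdigits)


def illegal_rfc3986_chars(target: str) -> list:
    """Return the distinct characters in a path+query string that are not in
    the allowed set, in order of first appearance. (Hypothetical helper.)
    A "%" is only accepted as the start of a pct-encoded triplet "%XX"."""
    bad = []
    i = 0
    while i < len(target):
        c = target[i]
        if (c == "%" and i + 2 < len(target)
                and target[i + 1] in HEXDIG and target[i + 2] in HEXDIG):
            i += 3  # skip a valid pct-encoded triplet
            continue
        if c not in ALLOWED and c not in bad:
            bad.append(c)
        i += 1
    return bad


# The (truncated) Request-URI from the report:
print(illegal_rfc3986_chars(
    "/ados?t=1428516672288&request={%22Placements%22:[{%22A%22:5146,"))
# → ['{', '[']
```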
Flags: needinfo?(zhong.j.yu)
Seems like Patrick might know more, but the way http://www.w3.org/Protocols/HTTP/1.0/spec.html#URI reads, I would not be surprised if this is expected and/or per-spec.

Even if it isn't, it might break the web if we started percent-encoding things we didn't encode before. We're going through that right now with the JS-exposed URI.hash stuff.

Somewhat related: bug 1040285.
Flags: needinfo?(mcmanus)
That makes sense. Firefox is not alone on this matter. I will submit a report on how major browsers handle these "illegal" octets in URLs.
This is a survey of how browsers handle characters 0x00-0xFF in URLs.

Test method: Serve an HTML5 document with charset=UTF-8, embed <img> with
"src" URI containing various characters; observe actual request URIs
received on the server end. For example, for char "{",
    <img src="/x/{">     // in path
    <img src="/x?{">     // in query
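A minimal sketch of that kind of test setup, using only Python's standard http.server. This is not the original harness: the names build_test_page, LoggingHandler, TEST_CHARS and serve, the port, and the 204 reply to image requests are all illustrative assumptions. BaseHTTPRequestHandler.path holds the request-target from the request line without percent-decoding, so it shows what the browser put on the wire.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample of characters to test (a subset of group 2 below).
TEST_CHARS = "{}[]|`'\"<>\\^"


def build_test_page(chars: str) -> str:
    """HTML5 page with one <img> per char in the path and one in the query."""
    imgs = []
    for c in chars:
        # Escape so the attribute stays well-formed (" becomes &quot;);
        # the HTML parser decodes the entity back to c before URL parsing.
        e = html.escape(c, quote=True)
        imgs.append(f'<img src="/x/{e}">')  # char in path
        imgs.append(f'<img src="/x?{e}">')  # char in query
    return ('<!DOCTYPE html>\n<html><head><meta charset="UTF-8"></head><body>\n'
            + "\n".join(imgs) + "\n</body></html>\n")


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/":
            body = build_test_page(TEST_CHARS).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=UTF-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Raw request-target as received (repr shows control chars).
            print("request-target:", repr(self.path))
            self.send_response(204)  # No Content: only the URI matters here
            self.end_headers()


def serve(port: int = 8000) -> None:
    HTTPServer(("127.0.0.1", port), LoggingHandler).serve_forever()
```

To use it, call serve() and open http://127.0.0.1:8000/ in each browser; every <img> fetch prints its request-target.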

Browsers tested:
    Chrome, IE, Firefox, Safari, Opera (current versions on Windows 7).

Test results:


1) RFC 3986's legal chars in path and query, excluding 0x27(')

        ALPHA DIGIT - . _ ~ ! $ & ( ) * + , ; = : @ / ?

   Browsers send the char as-is, without %-encoding


2) other printable ascii chars, including 0x27('), excluding "%" and "#"

   Browsers behave very differently. In the following table,
   "%" means the char is %-encoded; "." means it's sent as-is.
   For example, Chrome converts URI "/{a}?x={y}" to "/%7Ba%7D?x={y}".


   hex                22  27  3C  3E  5B  5C  5D  5E  60  7B  7C  7D
   char               "   '   <   >   [   \   ]   ^   `   {   |   }

   Chrome   path      %   .   %   %   .   .   .   %   %   %   %   %
            query     %   %   %   %   .   .   .   .   .   .   .   .

   IE       path      %   .   %   %   .   .   .   %   %   %   %   %
            query     .   .   .   .   .   .   .   .   .   .   .   .

   Firefox  path      %   %   %   %   %   %   %   %   %   %   .   %
            query     %   %   %   %   .   .   .   .   %   .   .   .

   Safari   path      %   .   %   %   .   .   .   .   .   .   .   .
            query     %   .   %   %   .   .   .   .   .   .   .   .

   Opera    path      %   .   %   %   .   .   .   %   %   %   %   %
            query     %   %   %   %   .   .   .   .   .   .   .   .
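To make the table above easier to query, here is a Python sketch that stores the rows as data and applies one browser's observed behavior to a path and optional query. It only models the 12 characters in the table; everything else passes through unchanged, as in group 1. The names ROWS and encode_like are hypothetical. It reproduces the Chrome example given above.

```python
from typing import Optional, Set

# The 12 table columns in order: " ' < > [ \ ] ^ ` { | }
TABLE_CHARS = "\"'<>[\\]^`{|}"

# (path row, query row) per browser, copied from the table:
# "%" = %-encoded, "." = sent as-is.
ROWS = {
    "Chrome":  ("%.%%...%%%%%", "%%%%........"),
    "IE":      ("%.%%...%%%%%", "............"),
    "Firefox": ("%%%%%%%%%%.%", "%%%%....%..."),
    "Safari":  ("%.%%........", "%.%%........"),
    "Opera":   ("%.%%...%%%%%", "%%%%........"),
}


def encoded_set(row: str) -> Set[str]:
    """Characters marked "%" in one table row."""
    assert len(row) == len(TABLE_CHARS)
    return {c for c, mark in zip(TABLE_CHARS, row) if mark == "%"}


def encode_like(browser: str, path: str, query: Optional[str] = None) -> str:
    """Apply one browser's observed handling of the table's characters."""
    path_row, query_row = ROWS[browser]

    def enc(s: str, chars: Set[str]) -> str:
        return "".join(f"%{ord(c):02X}" if c in chars else c for c in s)

    out = enc(path, encoded_set(path_row))
    if query is not None:
        out += "?" + enc(query, encoded_set(query_row))
    return out


print(encode_like("Chrome", "/{a}", "x={y}"))  # → /%7Ba%7D?x={y}
print(encode_like("Firefox", "/[a]"))          # → /%5Ba%5D
```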


3) U+0080 - U+00FF

   Browsers %-encode the char in UTF-8, e.g. U+0080 is encoded as "%C2%80";
   unless on IE and the char is in the query -
       it's sent as 2 raw UTF-8 bytes, e.g. 0xC2 0x80


4) 0x7F(DEL)

   Browsers %-encode the char as "%7F";
   unless on IE and in query - it's sent as 1 byte: 0x7F


5) 0x00

   Browsers %-encode it as "%EF%BF%BD" (U+FFFD)
   unless on IE and in query - it's sent as 3 bytes: 0xEF 0xBF 0xBD
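Groups 3 to 5 share one pattern: the character goes out as its UTF-8 bytes (with 0x00 showing up as U+FFFD), each byte %-encoded, except that IE sends the raw bytes when the char is in the query. A small Python sketch of that summary; the function name wire_bytes is hypothetical, and it deliberately rejects characters outside these three groups.

```python
def wire_bytes(c: str, ie_query: bool = False) -> bytes:
    """Bytes sent for one char 0x00, 0x7F or U+0080..U+00FF (groups 3-5).
    ie_query=True models IE with the char in the query. (Hypothetical.)"""
    cp = ord(c)
    if not (cp == 0x00 or cp == 0x7F or 0x80 <= cp <= 0xFF):
        raise ValueError("only models 0x00, 0x7F and U+0080..U+00FF")
    if cp == 0x00:
        c = "\ufffd"  # observed on the wire as U+FFFD (EF BF BD)
    raw = c.encode("utf-8")
    if ie_query:
        return raw  # IE in query: raw UTF-8 bytes, no %-encoding
    return "".join(f"%{b:02X}" for b in raw).encode("ascii")


print(wire_bytes("\x80"))  # → b'%C2%80'
```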


6) 0x01-0x1F, 0x20(SPACE)

   6.0) If the char is at the end of the URI, browsers strip it.

   Otherwise the char is in the middle of the URI:

   6.1) 0x09(TAB), 0x0A(LF), 0x0D(CR)

        the char is ignored and removed from the URI. ?!
        maybe relevant: https://url.spec.whatwg.org/#relative-path-state

   6.2) 0x20(SPACE)

        Browsers %-encode the char as %20

   6.3) other control chars (0x01-0x1F except TAB,LF,CR)

        Browsers %-encode the char, e.g. %01
        unless on IE and in query -
            IE considers the URI invalid;
            it will not trigger a request to the server.
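The group 6 rules can be read as one pass over a request-target. The Python sketch below is one interpretation of those observations: strip trailing controls and spaces (6.0), drop TAB/LF/CR anywhere (6.1), encode space as %20 (6.2) and other controls as %XX (6.3), and return None for IE's no-request case. The function name send_target is hypothetical, and it ignores characters outside 0x01-0x20, which the earlier groups cover.

```python
from typing import Optional

C0_AND_SPACE = "".join(chr(i) for i in range(0x01, 0x21))  # 0x01..0x20


def send_target(target: str, ie: bool = False) -> Optional[str]:
    """Request-target a browser would send, or None if no request is made."""
    target = target.rstrip(C0_AND_SPACE)  # 6.0: stripped at end of URI
    for ch in "\t\n\r":                   # 6.1: removed anywhere
        target = target.replace(ch, "")
    path, sep, query = target.partition("?")

    def enc(s: str, in_query: bool) -> Optional[str]:
        out = []
        for c in s:
            if c == " ":
                out.append("%20")                  # 6.2
            elif "\x01" <= c <= "\x1f":
                if ie and in_query:
                    return None                    # 6.3: IE sends nothing
                out.append(f"%{ord(c):02X}")       # 6.3
            else:
                out.append(c)
        return "".join(out)

    p, q = enc(path, False), enc(query, True)
    if p is None or q is None:
        return None
    return p + sep + q
```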
I think we can close this bug as invalid/won't fix. For "[]{}" in the URI query, no browser %-encodes them, and applications might have depended on that fact, so we had better not change it.

--

However, if I may, I think Firefox could change the handling of the following chars:

0x27 '  -  do not %-encode it. The security rationale (given in bug 376844) does not hold. 
           Other browsers do not %-encode it and there's been no problem. 

0x5B [
0x5C \
0x5D ]  -  do not %-encode them in path. This will be consistent with all other browsers.
           These chars can be handy delimiters for server applications.

0x60 `  -  do not %-encode it in query. This will be consistent with all other browsers.

0x7C |  -  %-encode it in path. This will be consistent with other browsers except Safari.
           It's unlikely that this change will break existing applications, 
           because other browsers have been doing it.
I agree with comment 7 at this time - thanks for the detailed research!
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(mcmanus)
Resolution: --- → WONTFIX
Thanks for the research zhong.j.yu! I filed some bugs based on your suggestions:

* Bug 1163028 to deal with [ and ]
* Bug 1163030 to deal with `

We have already dealt with ' in bug 1040285, and \ is tracked in bug 652186.

I did not file a bug on | as it's Safari and Firefox vs Chrome/Opera and IE, and the URL Standard agrees with Safari and Firefox.
> I did not file a bug on |

I prefer that way too - it's better not to encode. Hopefully Chrome/IE will also converge to this behavior.