Closed Bug 1152455 Opened 11 years ago Closed 11 years ago

Firefox sends illegal chars in Request-URI

Categories

(Core :: Networking: HTTP, defect)

37 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: zhong.j.yu, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Build ID: 20150402191859

Steps to reproduce:

visit http://www.reddit.com/
use Wireshark or a local proxy to monitor network traffic

Actual results:

The following request is sent on the wire

    GET /ados?t=1428516672288&request={%22Placements%22:[{%22A%22:5146,
    Host: engine.adzerk.net
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
    Referer: http://static.adzerk.net/reddit/ads.html?sr=-reddit.com,loggedout&bust2

(The Request-URI is truncated)

Notice the curly and squire brackets, which are illegal.

Expected results:

curly and squire brackets should be percent-encoded.
correction: "square" bracket
Why do you think sending {} is "illegal" ? Unmarking sec-sensitive as this doesn't seem like a security issue.
Group: core-security
Flags: needinfo?(zhong.j.yu)
Component: Untriaged → Networking: HTTP
Product: Firefox → Core
According to RFC 3986, "{" and "}" cannot appear anywhere in any URI; "[" and "]" can only appear in the "host" part of a URI to enclose an IPv6address. The only legal chars in path and query are:

    ALPHA DIGIT pct-encoded - . _ ~ ! $ & ' ( ) * + , ; = : @ / ?
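To make the claim above concrete, here is a minimal Python sketch that flags characters falling outside the set listed in this comment. The function name `illegal_chars` is hypothetical, and the sketch treats path and query together exactly as the comment does; it is not a full RFC 3986 parser.

```python
import re
import string

# The character set listed in the comment above (path and query combined).
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
LEGAL = set(UNRESERVED + SUB_DELIMS + ":@/?")

# A well-formed percent-escape: "%" followed by two hex digits.
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def illegal_chars(path_and_query: str) -> list:
    """Return the distinct characters outside the legal set, in order of first use.

    Hypothetical helper for illustration only.
    """
    # Drop well-formed escapes first; any "%" left over is itself flagged.
    stripped = PCT_ENCODED.sub("", path_and_query)
    seen = []
    for ch in stripped:
        if ch not in LEGAL and ch not in seen:
            seen.append(ch)
    return seen


# The (truncated) Request-URI from the original report.
print(illegal_chars("/ados?t=1428516672288&request={%22Placements%22:[{%22A%22:5146,"))
```

Run against the truncated Request-URI from the report, this flags "{" and "[", the two characters the report complains about.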
Flags: needinfo?(zhong.j.yu)
Seems like Patrick might know more, but the way http://www.w3.org/Protocols/HTTP/1.0/spec.html#URI reads, I would not be surprised if this is expected and/or per-spec. Even if it isn't, it might break the web if we started percent-encoding things we didn't encode before. We're going through that right now with the JS-exposed URI.hash stuff. Somewhat related: bug 1040285.
Flags: needinfo?(mcmanus)
That makes sense. Firefox is not alone on this matter. I will submit a report on how major browsers handle these "illegal" octets in URLs.
This is a survey of how browsers handle characters 0x00-0xFF in URLs.

Test method:

Serve an HTML5 document with charset=UTF-8, embed <img> with "src" URI containing various characters; observe actual request URIs received on the server end. For example, for char "{",

    <img src="/x/{">   // in path
    <img src="/x?{">   // in query

Browsers tested: Chrome, IE, Firefox, Safari, Opera (current versions on Windows 7).

Test results:

1) RFC 3986's legal chars in path and query, excluding 0x27(')

    ALPHA DIGIT - . _ ~ ! $ & ( ) * + , ; = : @ / ?

   Browsers send the char as-is, without %-encoding.

2) other printable ascii chars, including 0x27('), excluding "%" and "#"

   Browsers behave very differently. In the following table, "%" means the char is %-encoded; "." means it's sent as-is. For example, Chrome converts URI "/{a}?x={y}" to "/%7Ba%7D?x={y}"

    hex              22  27  3C  3E  5B  5C  5D  5E  60  7B  7C  7D
    char             "   '   <   >   [   \   ]   ^   `   {   |   }

    Chrome   path    %   .   %   %   .   .   .   %   %   %   %   %
             query   %   %   %   %   .   .   .   .   .   .   .   .
    IE       path    %   .   %   %   .   .   .   %   %   %   %   %
             query   .   .   .   .   .   .   .   .   .   .   .   .
    Firefox  path    %   %   %   %   %   %   %   %   %   %   .   %
             query   %   %   %   %   .   .   .   .   %   .   .   .
    Safari   path    %   .   %   %   .   .   .   .   .   .   .   .
             query   %   .   %   %   .   .   .   .   .   .   .   .
    Opera    path    %   .   %   %   .   .   .   %   %   %   %   %
             query   %   %   %   %   .   .   .   .   .   .   .   .

3) U+0080 - U+00FF

   Browsers %-encode the char in UTF-8, e.g. U+0080 is encoded as "%C2%80"; unless on IE and if the char is in query - it's sent as 2 UTF-8 bytes, e.g. 0xC2 0x80

4) 0x7F(DEL)

   Browsers %-encode the char as "%7F"; unless on IE and in query - it's sent as 1 byte: 0x7F

5) 0x00

   Browsers %-encode it as "%EF%BF%BD" (U+FFFD) unless on IE and in query - it's sent as 3 bytes: 0xEF 0xBF 0xBD

6) 0x01-0x1F, 0x20(SPACE)

   6.0) If the char is at the end of the URI, browsers strip it.

   Otherwise the char is in the middle of the URI:

   6.1) 0x09(TAB), 0x0A(LF), 0x0D(CR)
        the char is ignored and removed from the URI. ?!
        maybe relevant: https://url.spec.whatwg.org/#relative-path-state

   6.2) 0x20(SPACE)
        Browsers %-encode the char as %20

   6.3) other control chars (0x01-0x1F except TAB,LF,CR)
        Browsers %-encode the char, e.g. %01
        unless on IE and in query - IE considers the URI invalid; it will not trigger a request to the server.
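The table in section 2 of the survey can be transcribed as data and used to predict what each browser would put on the wire for a given URI. The Python sketch below is only an illustration: the names `CHARS`, `TABLE` and `as_sent` are hypothetical, the rows are copied from the survey as reported, and only the twelve characters in that table are modeled (everything else passes through unchanged).

```python
# The twelve characters from section 2 of the survey, in table order:
# " ' < > [ \ ] ^ ` { | }
CHARS = "\"'<>[\\]^`{|}"

# Rows copied from the survey: "%" = percent-encoded, "." = sent as-is.
TABLE = {
    "Chrome":  {"path": "%.%%...%%%%%", "query": "%%%%........"},
    "IE":      {"path": "%.%%...%%%%%", "query": "............"},
    "Firefox": {"path": "%%%%%%%%%%.%", "query": "%%%%....%..."},
    "Safari":  {"path": "%.%%........", "query": "%.%%........"},
    "Opera":   {"path": "%.%%...%%%%%", "query": "%%%%........"},
}


def _encode(part: str, row: str) -> str:
    """Percent-encode only the characters that `row` marks with "%"."""
    out = []
    for ch in part:
        i = CHARS.find(ch)
        if i >= 0 and row[i] == "%":
            out.append("%%%02X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def as_sent(browser: str, uri: str) -> str:
    """Predict the request URI for `browser`, per the survey table (hypothetical helper)."""
    path, sep, query = uri.partition("?")
    rows = TABLE[browser]
    return _encode(path, rows["path"]) + sep + _encode(query, rows["query"])


for name in TABLE:
    print(name, as_sent(name, "/{a}?x={y}"))
```

For Chrome this reproduces the example given in the survey: "/{a}?x={y}" becomes "/%7Ba%7D?x={y}".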
I think we can close this bug as invalid/won't fix. For "[]{}" in URI query, no browser %-encodes them, and applications might have depended on that fact, therefore we better not change it.

--

However, if I may, I think Firefox could change the handling of the following chars:

0x27 '
- do not %-encode it. The security rationale (given in bug 376844) does not hold. Other browsers do not %-encode it and there's been no problem.

0x5B [ 0x5C \ 0x5D ]
- do not %-encode them in path. This will be consistent with all other browsers. These chars can be handy delimiters for server applications.

0x60 `
- do not %-encode it in query. This will be consistent with all other browsers.

0x7C |
- %-encode it in path. This will be consistent with other browsers except Safari. It's unlikely that this change will break existing applications, because other browsers have been doing it.
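As a quick check of the suggestions above against the survey table, the following Python sketch applies the proposed changes to the Firefox rows and compares the result with the other browsers' rows. The names `change` and `proposed` are hypothetical, and the rows are transcribed from the survey as reported, so this only shows what the proposal would imply if the survey data is accurate.

```python
# Characters in survey-table order: " ' < > [ \ ] ^ ` { | }
CHARS = "\"'<>[\\]^`{|}"

# Rows transcribed from the survey: "%" = percent-encoded, "." = sent as-is.
FIREFOX = {"path": "%%%%%%%%%%.%", "query": "%%%%....%..."}
CHROME_IE_OPERA_PATH = "%.%%...%%%%%"
SAFARI_QUERY = "%.%%........"


def change(row: str, chars: str, mark: str) -> str:
    """Return `row` with the entries for `chars` set to `mark` (hypothetical helper)."""
    return "".join(mark if c in chars else m for c, m in zip(CHARS, row))


# Apply the four suggestions from the comment above.
path = change(FIREFOX["path"], "'", ".")      # 0x27: do not encode
path = change(path, "[\\]", ".")              # 0x5B-0x5D: do not encode in path
path = change(path, "|", "%")                 # 0x7C: encode in path
query = change(FIREFOX["query"], "'", ".")    # 0x27: do not encode
query = change(query, "`", ".")               # 0x60: do not encode in query
proposed = {"path": path, "query": query}

print(proposed)
print(proposed["path"] == CHROME_IE_OPERA_PATH, proposed["query"] == SAFARI_QUERY)
```

With all four changes applied, the Firefox path row would match the Chrome, IE and Opera path rows, and the Firefox query row would match the Safari query row.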
I agree with comment 7 at this time - thanks for the detailed research!
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Flags: needinfo?(mcmanus)
Resolution: --- → WONTFIX
Thanks for the research zhong.j.yu! I filed some bugs based on your suggestions:

* Bug 1163028 to deal with [ and ]
* Bug 1163030 to deal with `

We already have dealt with ' in bug 1040285 and \ is bug 652186. I did not file a bug on | as it's Safari and Firefox vs Chrome/Opera and IE, and the URL Standard agrees with Safari and Firefox.
> I did not file a bug on |

I prefer that way too - it's better not to encode. Hopefully Chrome/IE will also converge to this behavior.