942074 - JavaScript method XMLHttpRequest.setRequestHeader(...) no longer performs URL encoding

Reporter

Description

•

12 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release)
Build ID: 20131112160018

Steps to reproduce:

Executed the following JavaScript code from Scratchpad:

    var req = new XMLHttpRequest();
    req.open('GET', 'http://инфо.об-образовании.рф', true);
    req.setRequestHeader("Referer", "http://инфо.об-образовании.рф");
    req.send(null);

Actual results:

An exception is thrown:

    /*
    Exception: Cannot convert string to ByteString because the character at index 7 has value 1080 which is greater than 255.
    @Scratchpad/1:12
    WCA_evalWithDebugger@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/actors/webconsole.js:890
    WCA_onEvaluateJS@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/actors/webconsole.js:547
    DSC_onPacket@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/main.js:923
    @resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/transport.js:242
    @resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/DevToolsUtils.js:61
    */

Expected results:

Nothing in this case, but the problem makes any Google Web Toolkit application unusable when it is deployed on a domain that contains non-ASCII characters. I would like to know whether URL encoding was removed by design or whether this is a bug. Based on the answer I can decide whether to report a bug to GWT. Thanks.
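
For readers who want to see why the error names index 7 and value 1080, here is a minimal sketch of a ByteString-style check. The helper name `toByteString` is hypothetical and this is not Gecko's implementation; it only assumes the rule stated in the error message, namely that every UTF-16 code unit must be at most 255.

```javascript
// Hypothetical helper: mimics the check described by the error message.
// Not Gecko's code; it only rejects code units greater than 255.
function toByteString(value) {
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (code > 255) {
      throw new TypeError(
        "Cannot convert string to ByteString because the character at index " +
        i + " has value " + code + " which is greater than 255.");
    }
  }
  return value;
}

// "http://" occupies indices 0 to 6, so the first Cyrillic letter is at index 7.
let message = null;
try {
  toByteString("http://инфо.об-образовании.рф");
} catch (e) {
  message = e.message;
}
console.log(message);
```

The first character after `http://` is `и` (U+0438), whose decimal value is 1080, which matches the reported exception.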

artur.bystrzycki

Reporter

Updated

•

12 years ago

Blocks: 942095

artur.bystrzycki

Reporter

Updated

•

12 years ago

OS: Windows 7 → All

Hardware: x86_64 → All

Boris Zbarsky [:bzbarsky]

Comment 1

•

12 years ago

> I would like to know whether you removed url encoding by design

There was no URL encoding that I know of. The old code used to just convert to UTF-8 and dump those bytes on the wire. The change from that was purposeful, to align with the XMLHttpRequest specification.

But maybe this is a problem with the spec.... Let me check what other UAs do here.
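
As an illustration of "convert to UTF-8 and dump those bytes on the wire", the sketch below shows the bytes a non-ASCII header value turns into under UTF-8. The function name `utf8HeaderBytes` is made up for this example, and `TextEncoder` is used here only as a stand-in for whatever conversion the old Gecko code performed.

```javascript
// Hypothetical helper: returns the UTF-8 bytes of a header value.
// TextEncoder always encodes to UTF-8.
function utf8HeaderBytes(value) {
  return Array.from(new TextEncoder().encode(value));
}

// U+0444 (CYRILLIC SMALL LETTER EF) needs two bytes in UTF-8.
const bytes = utf8HeaderBytes("\u0444");
console.log(bytes.map((b) => b.toString(16)));
```

Each resulting byte is below 256, which is why the old behavior never hit a ByteString-style range error.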

Component: Untriaged → DOM

Product: Firefox → Core

Boris Zbarsky [:bzbarsky]

Comment 2

•

12 years ago

So I tried this testcase:

    var req = new XMLHttpRequest();
    req.open('GET', 'test.html', true);
    req.setRequestHeader("Punk", "\u0444");
    req.send(null);

(because "Referer" is security-restricted in some browsers).

In Chrome dev, that throws an exception:

    Uncaught SyntaxError: Failed to execute 'setRequestHeader' on 'XMLHttpRequest': 'ф' is not a valid HTTP header field value.

In Safari, there is no exception, but it seems to put an empty string on the wire (?).

Not sure what other UAs do in terms of on-the-wire bits, since I can't run IE locally. :(

artur.bystrzycki

Reporter

Comment 3

•

12 years ago

Thanks for the quick reply. You're right, so I'm closing this ticket and will report the problem to GWT. Unfortunately, the stricter validation in XMLHttpRequest, combined with not encoding the address bar entry (which I think is a useful feature, not a bug), breaks GWT applications in Firefox when they are deployed under a domain that contains non-ASCII characters.

artur.bystrzycki

Reporter

Updated

•

12 years ago

Status: UNCONFIRMED → RESOLVED

Closed: 12 years ago

Resolution: --- → INVALID

Boris Zbarsky [:bzbarsky]

Comment 4

•

12 years ago

Hmm. So in Chrome GWT works because Chrome punycodes window.location.href?

Anne, it sounds like the XHR changes may not be web-compatible with a Unicode location.href...

Flags: needinfo?(annevk)

Anne (:annevk)

Comment 5

•

12 years ago

location.href is supposed to use Punycode per the URL Standard. The old code we had in place for XMLHttpRequest did forbid certain bytes, correct?
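
To make the Punycode point concrete, here is a small sketch using the `URL` constructor, which follows the URL Standard's host parsing in environments that implement it (for example current browsers and Node.js). The helper `isAscii` is a hypothetical name introduced for this example.

```javascript
// Parse the URL from the bug description with a URL Standard parser.
const parsed = new URL("http://инфо.об-образовании.рф");

// Hypothetical helper: true when every code unit is in the ASCII range.
const isAscii = (s) => /^[\x00-\x7F]*$/.test(s);

console.log(parsed.href);          // serialized with a Punycode host
console.log(isAscii(parsed.href)); // safe to pass to setRequestHeader
```

A string serialized this way contains only ASCII, so passing it as a header value would not trigger the ByteString exception from the description.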

Flags: needinfo?(annevk)

Boris Zbarsky [:bzbarsky]

Comment 6

•

12 years ago

> location.href is supposed to use Punycode per the URL Standard.

It does? Why, if I might ask?

> The old code we had in place for XMLHttpRequest did forbid certain bytes, correct?

Gecko's old implementation simply converted to UTF-8, so any valid Unicode string would work. I'm not sure what the spec used to say....

Boris Zbarsky [:bzbarsky]

Updated

•

12 years ago

Flags: needinfo?(annevk)

Anne (:annevk)

Comment 7

•

12 years ago

If we did not forbid newlines and such there would be a security issue... URLs use Punycode because the data model for URLs appears to be bytes (so sad). And because toUnicode can fail, but toASCII cannot (and is required anyway). We could of course attempt a toUnicode whenever we return location.href, but given that it might not match what the user sees anyway (that's a UI thing) I'm not sure that's the best way. I could maybe see value in exposing URL.toUIString(URL url) or some such.
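
The newline concern can be sketched as a simple check. The function name `containsHeaderLineBreak` is hypothetical, and this is only an illustration of the idea of rejecting CR and LF in header values, not the check any browser actually ships.

```javascript
// Hypothetical helper: a header value containing CR or LF could terminate
// the current header line and smuggle in additional headers.
function containsHeaderLineBreak(value) {
  return /[\r\n]/.test(value);
}

console.log(containsHeaderLineBreak("plain value"));
console.log(containsHeaderLineBreak("value\r\nX-Injected: 1"));
```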

Flags: needinfo?(annevk)

Boris Zbarsky [:bzbarsky]

Comment 8

•

12 years ago

> If we did not forbid newlines and such there would be a security issue...

That was presumably handled by the underlying HTTP code in Gecko. The XHR code had no checks like that (and afaict still does not).

> because the data model for URLs appears to be bytes

How so? I mean, there's %-escaping, but apart from that you start with strings, not bytes.

> And because toUnicode can fail

Sure. So Gecko's internal representation of hostnames is generally the toUnicode one, unless that fails, in which case it's the toASCII one.

> We could of course attempt a toUnicode whenever we return location.href

Why then as opposed to the "parse the host" phase?

I don't recall seeing any discussion on the mailing lists about making location.href punycode and I'm once again really worried about compat fallout...

Anne (:annevk)

Comment 9

•

12 years ago

Even if a URL were all strings their code points would be less than U+0080. For HTTP it pretty clearly is just bytes.

Discussion: http://lists.w3.org/Archives/Public/public-whatwg-archive/2013Sep/0124.html

The "host parser" is required to use ToASCII. It could then follow that with ToUnicode, but I don't see the point. ToUnicode for domain, path, and such, is all about UI and not at all about the data model.

Anne (:annevk)

Comment 10

•

12 years ago

As for compatibility, I guess the same goes for the other browsers :/ It is a major problem for all the "edge" cases in widely deployed features.

Boris Zbarsky [:bzbarsky]

Comment 11

•

12 years ago

Well, what goes on the wire is clearly the ToASCII version. The question is why anyone else at all would want that version.

> ToUnicode for domain, path, and such, is all about UI and not at all about the data model.

Even if we grant that, why are we completely discounting "UI" (including what web developers see in their debugger!)?

Anne (:annevk)

Comment 12

•

12 years ago

Because UI here is not a simple function of the ASCII variant. E.g. Chrome will only do ToUnicode based on whether they think the user will understand the result.

Boris Zbarsky [:bzbarsky]

Comment 13

•

12 years ago

For their URL bar, sure. But they can implement that as a custom thing, obviously, as would Firefox for whatever it does for their URL bar. But that leaves the issue of web developers having to deal with these objects and the unreadable strings they produce.

Anne (:annevk)

Comment 14

•

12 years ago

As I said, we could introduce toUIString() or some such, that would use the same logic (at fingerprinting cost). Users would end up confused if the URL bar and location.href are different, no? And it's not just the URL bar, also link tooltips and such.

Boris Zbarsky [:bzbarsky]

Comment 15

•

12 years ago

> Users would end up confused if the URL bar and location.href are different, no?

Er... the URL bar is definitely not always ASCII, so they're already different, right? I'm not sure I follow...

Or do you mean that toUIString() would need to use the same heuristics as the URL bar?

Anne (:annevk)

Comment 16

•

12 years ago

Yes, I meant that either way you end up with mismatches. And yes, toUIString() would have to use the same heuristics to be useful. Note that the API already exposes methods for converting just the domain portion of the URL: http://url.spec.whatwg.org/#api We should probably offer APIs for converting the path as well.
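
As a rough illustration of converting just the domain portion, the sketch below uses `domainToASCII` and `domainToUnicode` from Node.js's built-in `url` module. These are a stand-in chosen for this example and are an assumption, not necessarily the API the linked spec section described at the time.

```javascript
// Node.js built-in module; used here only as a stand-in for domain conversion.
const { domainToASCII, domainToUnicode } = require("url");

// Convert a Unicode top-level domain to its Punycode form and back.
const ascii = domainToASCII("рф");
const unicode = domainToUnicode(ascii);

console.log(ascii);
console.log(unicode);
```

Note that this round trip always applies ToUnicode, so it does not reproduce the per-browser heuristics that a toUIString()-style method would need.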

Nobody; OK to take it and work on it

Assignee

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

Categories

(Core :: DOM: Core & HTML, defect)

Tracking

()

People

(Reporter: artur.bystrzycki, Unassigned)

References

(Blocks 1 open bug)

Security

(public)
