User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; de; rv:1.9b3) Gecko/2008020511 Firefox/3.0b3
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; de; rv:1.9b3) Gecko/2008020511 Firefox/3.0b3

Firefox 3 beta 2 and beta 3 RC are (again) sending

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

for a form POST. The "; charset=UTF-8" breaks many webservers, resulting in error or empty pages. This is an old problem, and sending the charset was accepted as not feasible because too many webservers can't handle it. It was removed from earlier Mozilla/Gecko versions; see these bugs for example (there are many bugs around that topic):
https://bugzilla.mozilla.org/show_bug.cgi?id=18643
https://bugzilla.mozilla.org/show_bug.cgi?id=7533#c4

I have already seen this (current) bug, which is about a similar problem with multipart forms:
https://bugzilla.mozilla.org/show_bug.cgi?id=413974

Reproducible: Always

Steps to Reproduce:
1. Create an HTML page with a form with method=POST
2. Open this page with Firefox (3b3)
3. Submit the form and trace the HTTP request

Actual Results:
This HTTP header is part of the request:
Content-Type: application/x-www-form-urlencoded; charset=UTF-8

Expected Results:
This is the lowest common denominator for webservers:
Content-Type: application/x-www-form-urlencoded

Many servers choke on header params in the form of "; key=value" in the Content-Type header.

I used my profile from FF 2.0 for FF 3 beta, so I am a normal "upgrader" - no special configs involved, if that might be the case for such an "experimental feature" ;-)
Uh... do you actually have a form that breaks that you can point me to? There is no code in beta3 to do this for urlencoded form submissions, so I don't see how it could possibly be happening.
Oh. XHR. That matters!
Tested with Firefox 3.0 beta 4 (on Mac) and the fix for bug 413974 (which I assume should be fixed in beta 4) DOES NOT solve this problem. I think the duplication of this bug was wrong from the beginning.
- bug 413974 is about enctype multipart/form-data
- this bug is about enctype application/x-www-form-urlencoded
> DOES NOT solve this problem. In that case, please put up a testcase showing the bug. That is, a web page that I can go to to see the issue, as well as the source of the server code involved. I did test exactly this situation when writing the patch for bug 413974, and in my testing it is fixed. >- bug 413974 is about enctype multipart/form-data The same codepath is used for both enctypes.
Oh, I should have clarified. The new code WILL always send the charset. It will put the charset right after the MIME type before all other params (instead of at the end of the MIME type, where it used to go). I guess your original report claims that even this will break servers. I suppose we could special-case this one particular MIME type in XMLHttpRequest. That seems uncalled-for to me, but if the breakage is really that widespread we will have to. Nominating for blocking, but honestly, I've seen no other reports of this being a problem.
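To illustrate the placement described in the comment above, here is a minimal Python sketch. It is not Gecko's actual code: the helper name `insert_charset` is hypothetical, the naive split on ";" ignores quoted parameter values, and dropping a caller-supplied charset is an assumption made to keep the example short.

```python
def insert_charset(content_type: str, charset: str = "UTF-8") -> str:
    """Place a charset parameter right after the MIME type (illustration only)."""
    # Split "type/subtype; a=b; c=d" into the bare type and its parameters.
    # Assumption: no parameter value contains a quoted ";".
    mime_type, *params = [part.strip() for part in content_type.split(";")]
    # Assumption: an existing charset parameter is replaced rather than duplicated.
    params = [p for p in params if p and not p.lower().startswith("charset=")]
    return "; ".join([mime_type, f"charset={charset}", *params])
```

With this sketch, "multipart/form-data; boundary=abc" becomes "multipart/form-data; charset=UTF-8; boundary=abc", so the charset sits ahead of the boundary instead of trailing it.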
Yes, it's the pure existence of "; charset=" that will break the webserver. I have already placed a sample HTML in comment #2. But it's difficult to provide the webserver, because it's proprietary. The parsing bug is fixed in there for a new version, but there are many large-scale installations out there. The problem is that parameters cannot be parsed at all for an XHR POST request, which can be tricky in Ajax-heavy applications. Updating the servers is not a quick option. Unfortunately I wasn't able to find an example installation that has a POST form using XHR, but if there is, it's an extremely difficult problem to spot. I also assume that there are lots of other webserver implementations with the same problem; I guessed so by reading this older issue around the same topic: bug 7533 (interesting comments start at #34). I know it's over 8 years old (whew!), but I wouldn't be confident that this bug is out of the web, especially since none of the main browsers do it.
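The affected server is proprietary, so its parser is not available. The Python sketch below only guesses at the kind of check that fails this way, and contrasts it with a parameter-tolerant one. Both function names are hypothetical, and the exact-match check is an assumption about how such servers might be written.

```python
from email.message import Message

FORM_TYPE = "application/x-www-form-urlencoded"


def is_form_post_naive(content_type: str) -> bool:
    # Hypothetical buggy check: exact string comparison, so any
    # "; key=value" parameter makes the server ignore the form body.
    return content_type == FORM_TYPE


def is_form_post_tolerant(content_type: str) -> bool:
    # Let the standard library separate the media type from its parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type() == FORM_TYPE
```

The naive check accepts "application/x-www-form-urlencoded" but rejects the same type followed by "; charset=UTF-8", while the tolerant check accepts both.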
Safari 3.0.4 behaves that way: if you don't specify "; charset=..." in the XHR content-type header, it won't add it. If you do, it will be passed through.
The whole point of adding a charset is that we need to identify the charset of the data, because otherwise neither the sender nor the receiver knows what encoding the data is sent in. We used to do what Safari 3.0.4 does. It breaks sites, as it happens.
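A small Python sketch of why the encoding matters: the same urlencoded bytes decode to different text depending on which charset the receiver assumes. The field name and value are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# The sender percent-encodes the UTF-8 bytes of the value.
body = urlencode({"city": "München"}, encoding="utf-8")

# A receiver that knows the charset recovers the original text.
as_utf8 = parse_qs(body, encoding="utf-8")

# A receiver that guesses Latin-1 silently produces mojibake instead.
as_latin1 = parse_qs(body, encoding="latin-1")
```

Nothing in the body itself says which decoding is right; that information has to travel out of band, for example in the Content-Type header or a hidden form field.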
Note that the "breaks many web servers" claim could really use some backing up....
Yes, but Firefox 2 behaved like Safari 3.0.4.
I see, didn't know that. The problem with the unknown charset in browser requests is solved by web frameworks in different flavors anyway (we for example use a hidden parameter FormEncoding...). Would be cool to rely on charset set in content-type as standard, but it must be possible to avoid breaking existing servers that are buggy.
The spec says that when a string is sent it should always be UTF-8 encoded. In that case it seems like the receiver will always know the encoding and so sending it seems pointless. The spec also says to append the charset parameter, but maybe we can get the spec changed on that. Mailing them now.
> it seems like the receiver will always know the encoding and so > sending it seems pointless. Only if the receiver knows that it's being sent an XMLHttpRequest and can special-case the text processing. Since this whole bug is about using XMLHttpRequest to generate generic form submissions, the receiver knows no such thing in the cases in this bug, or in many other cases. With forms, you can at least select the encoding the receiver expects if absolutely needed, but with XMLHttpRequest you always get UTF-8, so the only way to make it work is to tell the receiver so and fix all receivers to try following the specs instead of just writing a "it happens to work" parser by trial and error.
Per discussion with Jonas we've decided that we won't change how Firefox works here. If there's evidence that a large set of real websites break due to this, we'd be willing to reconsider this decision, but with only one bug report and no real feel for the number of broken sites it doesn't seem worth undoing this change.
IMHO there aren't any bug reports yet because FF3 is still beta and there isn't wide adoption yet.
I'll back up this report, and I'll even show you an example of it in action. The server is running helma, and echo's back the post object. http://clusterfudge.org:8082/mix/command?expr=users.dev.postDemo
And yes, that should read "echoes".
Paul, have you considered raising your problem in the W3C group working on XMLHttpRequest? I don't have a problem with adding a way for the page to opt-out of sending the charset header, but I'd be happier doing that in a way allowed by the spec instead of just doing something random.
I'm also having a hard time believing that these firewalls really reject _everything_ with a charset, since so much of the web carries charset params on Content-Types....
Paul, I'd also be interested in knowing your exact situation: What you're setting Content-Type to, what type of object you're passing to send(), what header you're getting as a result, and what your firewall actually accepts and rejects.
Paul, you're not looking at the latest spec version. The latest one, cleverly not linked from anywhere useful, is at http://dev.w3.org/2006/webapi/XMLHttpRequest/#send For what it's worth, I raised the issue with the relevant working group.
In case you want to follow up, this is the '[XHR] Some comments on "charset" in the Content-Type header' thread in the firstname.lastname@example.org mailing list.
And again for what it's worth, I think that getting the spec changed here would benefit from hard data that this is the only way to deal with the problem. I realize that you may not be willing to provide this data in public, but the W3C has provisions for private communication of sensitive information for spec editors, I believe.
Our server (4d_WebStar_D/7.8) does not parse form variables when any charset value is added to Content-Type. So the Firefox 3 behavior of adding a charset broke the pages that use a multipart/form-data POST with XMLHttpRequest. Our code could not access the submitted form data. I was able to work around the problem by using sendAsBinary() instead of send() when the browser is Firefox 3.
Richard, you need to fix your server to actually follow the HTTP specification....
Boris, unfortunately I don't have access to the source code that controls this. We are using an old version and are unable to upgrade or switch systems right now. The workaround will keep us going for a while. I guess the thing that bothers me about the Firefox 3 behavior is that it changes a value specifically set using setRequestHeader(). That seems like an odd thing to do.
It's not that odd in cases where that value is inconsistent with other data we have, for what it's worth... Say if you explicitly set a charset other than UTF-8, and we encode the data as UTF-8. In any case, please take spec issues to the W3C?
(In reply to comment #34)
> It's not that odd in cases where that value is inconsistent with other data we
> have, for what it's worth...

I think this is just too much "magic". If you set request headers directly, you expect them to be used - even if you do it wrong, i.e. when the body actually contains a different charset. What is the advantage of the current implementation? For most systems none, since they rely on different ways to pass the charset to the server (hidden form values etc., and in most cases this is UTF-8). So if those should switch to the proper way of putting the charset in the Content-Type header, they should be able to choose that way on their own - when their servers and firewalls are ready to handle that format.
Alexander, you're saying that Mozilla should send malformed HTTP requests, violating the HTTP spec, just because the page author asked it to? I don't think so.

> What is the advantage of the current implementation?

Much better functioning with cross-site XMLHttpRequest, where the server and XMLHttpRequest caller are completely independent.
(In reply to comment #36)
> Alexander, you're saying that Mozilla should send malformed HTTP requests,
> violating the HTTP spec, just because the page author asked it to? I don't
> think so.

Well, not setting the charset in the Content-Type header is not violating the HTTP spec. Also, using XHR as a page developer is more like using an HTTP client lib than a user-driven browser, so you need ways to determine what is actually sent from your code.
The W3C spec explicitly states that charset is not allowed for application/x-www-form-urlencoded: http://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm
None of the text you link to talks about the request headers involved. Though it does suggest that servers shouldn't use the Content-Type request header for charset info. However in practice some do...
The linked spec says that the charset is not allowed for this MIME type. This would suggest that "application/x-www-form-urlencoded; charset=utf-8" is not a valid MIME type string. Also it seems that Firefox is adding the charset even when the application explicitly sets the content-type to "application/x-www-form-urlencoded" (no charset) via setRequestHeader().

The XMLHttpRequest spec (http://www.w3.org/TR/XMLHttpRequest/#dom-xmlhttprequest-send) specifies two cases where the user-agent should modify the Content-Type when sending the XHR data:
1. If a Content-Type header is in author request headers and its value is a valid MIME type that has a charset parameter whose value is not a case-insensitive match for encoding, and encoding is not null, set all the charset parameters of that Content-Type header to encoding.
2. If no Content-Type header is in author request headers and mime type is not null, append a Content-Type header with value mime type to author request headers.

This does not include the case where the application has set the Content-Type _without_ a charset. Firefox still mangles the content-type in this case.
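The two cases quoted above can be sketched roughly in Python. This is an illustration under assumptions, not the spec algorithm verbatim or any browser's code: the function name `adjust_content_type` is hypothetical, the "valid MIME type" check is skipped, and parameters are matched with a simple regular expression.

```python
import re
from typing import Optional


def adjust_content_type(author_value: Optional[str],
                        body_mime_type: Optional[str],
                        encoding: Optional[str]) -> Optional[str]:
    """Return the Content-Type to send, following the two quoted cases."""
    if author_value is None:
        # Case 2: no author-supplied header, so use the body's MIME type.
        return body_mime_type
    if encoding is None:
        return author_value

    def fix(match: "re.Match[str]") -> str:
        current = match.group(2).strip('"')
        if current.lower() == encoding.lower():
            return match.group(0)  # already matches, leave it alone
        return match.group(1) + encoding

    # Case 1: rewrite only charset parameters that are already present.
    # A header without any charset parameter passes through unchanged.
    return re.sub(r'(;\s*charset=)("[^"]*"|[^;\s]*)', fix,
                  author_value, flags=re.IGNORECASE)
```

Under this reading, an author value of "application/x-www-form-urlencoded" with no charset is returned untouched, which is the behavior the comment argues Firefox should have.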
When using Amazon S3's signed PUT requests, this bug is causing the content-type header to not match what is required (in this case it should be the string "application/json" as in the signature, but is "application/json; charset=UTF-8" instead), and thus the request fails because the signatures do not match. I don't understand why this is marked RESOLVED WONTFIX - this is obviously a bug, prevents sending perfectly reasonable requests that are within the spec, and does so by "magically" changing the headers from what was explicitly requested. At a minimum there must be a way to "opt-out" of this behavior.
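A simplified Python sketch of why an altered header invalidates a signed request. This is not Amazon's actual signing algorithm: the string-to-sign layout, the secret, and the object path are all made up for illustration. The only point is that any signature covering the Content-Type changes when the browser appends a charset.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, content_type: str, path: str) -> str:
    # Hypothetical string-to-sign; a real scheme covers more fields.
    string_to_sign = "\n".join([method, content_type, path])
    return hmac.new(secret, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


secret = b"example-secret"  # made-up key for illustration

# What the application signed ahead of time.
signed = sign(secret, "PUT", "application/json", "/bucket/object.json")

# What the server recomputes from the header the browser actually sent.
sent = sign(secret, "PUT", "application/json; charset=UTF-8",
            "/bucket/object.json")

signatures_match = hmac.compare_digest(signed, sent)
```

Here `signatures_match` is False, so a server verifying the signature would reject the request even though the body is unchanged.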
Doesn't the spec define the behavior here?
It does, in step 4 of https://xhr.spec.whatwg.org/#dom-xmlhttprequest-send, which is in line with what comment 40 outlines, though that comment points to a document we should not look at. Removing the whiteboard comment since we removed sendAsBinary(), so that no longer works as a workaround. Reopening since we should tweak our behavior here per the specification, although I suggest that bz signs off on it since he largely instigated the whole charset business in the first place, iirc.
Hi, I went through the entire thread. It's important to realize that in most cases it's impossible to convince the API provider to make changes like this, because they do not want to touch an existing stable system. Our company caters to the BPO industry, which has all kinds of browsers in the enterprise market, and vows to support all of them. But the API that we are using has the same problem with charset=UTF-8. It's very important that Mozilla resolves this as soon as possible. Safari, IE and Chrome all work fine; only Firefox seems to have this charset issue. I request you not to make us feel handicapped in this regard. For a start, may I ask how many reports on this thread you need before deciding that this should be fixed? At least that would be a starting point on this issue.
This is a known bug and we want to fix it - it's required for conformance with the spec. Note how several sub-tests here fail because Gecko is adding too many ";charset=UTF-8" : http://w3c-test.org/XMLHttpRequest/send-content-type-charset.htm (This is a test in the official W3C XMLHttpRequest test suite) I think this is a dupe of bug 918742, which is about fixing failures on that test. Since it's a real world problem we should nudge the priority of that bug upwards, but unfortunately I don't know when somebody will get around to fixing it.