Closed Bug 241540 Opened 16 years ago Closed 5 years ago
No charset encoding sent for application/x-www-form-urlencoded data
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040413 Debian/1.6-5
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040413 Debian/1.6-5

When submitting form data, the browser fails to add the charset to the MIME type. This is problematic because it leaves the server-side application to guess the encoding (Latin-1, UTF-8, etc.).

When the form tag does not specify an accept-charset attribute, this is somewhat acceptable, as the HTML 4 specification states:

"The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element."

Even then, leaving it unspecified still leaves the application guessing to some extent. The situation is worse when the form does supply an encoding and offers more than one option, e.g.:

<form method="post" action="/newevent" accept-charset='ISO-8859-1,utf-8'>

In this case the server-side application has no way of knowing how to interpret the submitted data. This can be solved by adding the encoding to the Content-Type string, e.g.:

Content-Type: application/x-www-form-urlencoded; charset=utf-8

This is especially an issue for people trying to write web applications that support i18n and l10n.

Relevant spec reference: http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset

accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received. The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.
ftp://ftp.isi.edu/in-notes/rfc2616.txt

7.2.1 Type
When an entity-body is included with a message, the data type of that body is determined via the header fields Content-Type and Content-Encoding. These define a two-layer, ordered encoding model:
entity-body := Content-Encoding( Content-Type( data ) )
Content-Type specifies the media type of the underlying data. Content-Encoding may be used to indicate any additional content codings applied to the data, usually for the purpose of data compression, that are a property of the requested resource. There is no default encoding. Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream".

14.17 Content-Type
The Content-Type entity-header field indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
Content-Type = "Content-Type" ":" media-type
Media types are defined in section 3.7. An example of the field is
Content-Type: text/html; charset=ISO-8859-4
Further discussion of methods for identifying the media type of an entity is provided in section 7.2.1.

Reproducible: Always

Steps to Reproduce:
1. Create a form with accept-charset='ISO-8859-1,utf-8'
2. Submit content

Actual Results:
The content type is sent as "Content-Type: application/x-www-form-urlencoded" with no encoding information.

Expected Results:
The content type should be sent as:
Content-Type: application/x-www-form-urlencoded; charset=utf-8
or
Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1
depending on the encoding actually used.
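To make the difference between the Actual and Expected Results concrete, here is a minimal Python sketch from the server's point of view. The helper names (`parse_accept_charset`, `request_charset`, `decode_value`) are hypothetical, and using the standard library's `email.message.Message` to read the `charset` parameter is just a convenient assumption. Without the parameter, the same percent-encoded bytes decode cleanly under both offered charsets, so the server cannot tell which result was intended; with the parameter, there is nothing to guess.

```python
from email.message import Message
from typing import List, Optional
from urllib.parse import unquote_to_bytes


def parse_accept_charset(value: str) -> List[str]:
    """Split an HTML 4 accept-charset value (space- and/or comma-delimited)."""
    return value.replace(",", " ").split()


def request_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_value(raw: str, charset: str) -> str:
    """Percent-decode one urlencoded value, then interpret the bytes as `charset`."""
    return unquote_to_bytes(raw.replace("+", " ")).decode(charset)


offered = parse_accept_charset("ISO-8859-1,utf-8")
raw_value = "%C3%A8"  # "è" as sent by a browser that picked UTF-8

# Actual Results: no parameter, so the server can only try each offered charset.
# Both decodes succeed, so nothing tells it which result is right.
print(request_charset("application/x-www-form-urlencoded"))
for charset in offered:
    print(charset, repr(decode_value(raw_value, charset)))

# Expected Results: the parameter settles the question.
label = request_charset("application/x-www-form-urlencoded; charset=utf-8")
print(label, repr(decode_value(raw_value, label)))
```

Under ISO-8859-1 the value reads as `Ã¨`, and under UTF-8 it reads as `è`. Both are valid strings, which is exactly the ambiguity described above.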
I have tested this in a recent nightly build as well: Mozilla 1.8a: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a) Gecko/20040423
This is a duplicate. We used to send this information, but this apparently broke a number of server-side applications and we had to disable this feature to make form submission in Mozilla at all usable....
(In reply to comment #1)
> This is a duplicate. We used to send this information, but this apparently
> broke a number of server-side applications and we had to disable this feature to
> make form submission in Mozilla at all usable....

Would it be possible to add the content encoding when the attribute is specified on the form? My guess is that most apps that break when it is present would not have that attribute set.
-Jonathn
Summary: Mozilla does not provide charset encoding information for application/x-www-form-urlencoded data → No charset encoding sent for application/x-www-form-urlencoded data
More relevant reading on this issue, again from ftp://ftp.isi.edu/in-notes/rfc2616.txt. It appears that Mozilla's policy of not putting the charset encoding on the type is at least partially in line with the standard:

(From 3.7, ftp://ftp.isi.edu/in-notes/rfc2616.txt)
"Note that some older HTTP applications do not recognize media type parameters. When sending data to older HTTP applications, implementations SHOULD only use media type parameters when they are required by that type/subtype definition."

This is of course an unfortunate stipulation, as there is no clear way to determine what an older HTTP application is, especially when submitting a request to a server the client may never have talked to before.

Mozilla performs as expected on forms with no accept-charset with respect to the next quote:

(From http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset)
"The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element."

If the document containing the form was in UTF-8, Mozilla responds in UTF-8. Where Mozilla falls down is in conforming to the next quote:

(From 3.7.1, ftp://ftp.isi.edu/in-notes/rfc2616.txt)
"When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value."

This clearly states that if the encoding is not ISO-8859-1 (Latin-1), you must include an encoding parameter in the media type. I think the proper behavior for Mozilla should be to never send the encoding parameter when the text is in Latin-1, but to always send it otherwise (as the spec requires).
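The rule proposed in this comment (omit the parameter for Latin-1 and its subsets, send it otherwise) fits in a few lines. This Python sketch is a hypothetical illustration of the proposal, not Mozilla code, and `form_content_type` is an invented name; note that later comments argue RFC 2616 3.7.1 only covers `text/*` types. `codecs.lookup` is used here to normalize aliases such as `latin-1` and `ISO-8859-1` to one canonical codec name.

```python
import codecs

FORM_TYPE = "application/x-www-form-urlencoded"

# Canonical Python codec names for ISO-8859-1 and its US-ASCII subset.
LATIN1_FAMILY = {"iso8859-1", "ascii"}


def form_content_type(encoding: str) -> str:
    """Hypothetical rule: label the submission only when it is not Latin-1 (or ASCII)."""
    if codecs.lookup(encoding).name in LATIN1_FAMILY:
        return FORM_TYPE
    return f"{FORM_TYPE}; charset={encoding}"


print(form_content_type("ISO-8859-1"))  # no parameter
print(form_content_type("utf-8"))       # labeled with charset=utf-8
```

Normalizing through `codecs.lookup` avoids treating `latin-1`, `ISO-8859-1` and `iso8859_1` as different encodings just because they are spelled differently.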
Why is this UNCONFIRMED? There's no question that this behavior exists, nor that it violates the W3C standard.
-> form submission
Assignee: darin → form-submission
Component: Networking: HTTP → HTML: Form Submission
QA Contact: core.networking.http
bz is right that we used to add it for a brief while, but we were forced to remove it because at that time the majority of server-side programs couldn't cope with it. A quick Bugzilla search didn't lead me to where it's discussed extensively. The revision history (before January 2002) of nsFormSubmission.cpp was lost (the file was either moved or newly created). The code to add 'charset' is currently blocked. See http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsFormSubmission.cpp#495
The ifdef was added in bug 7533 (took some Attic-digging in CVS to find that). If the server software mentioned in that bug and the numerous duplicates has been fixed, we should consider flipping that ifdef...
This is an automated message, with ID "auto-resolve01". This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code. While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it. If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter. The latest beta releases can be obtained from: Firefox: http://www.mozilla.org/projects/firefox/ Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html Seamonkey: http://www.mozilla.org/projects/seamonkey/
I just took a few days to untangle an encoding problem with Tomcat. I resolved this problem by doing request.setCharacterEncoding("UTF-8") in a filter before the request was handled, and all form submissions work fine now (for me). I'd rather not have to do this hack, and I wish someone would look into fixing this bug.
Did you read comment 7? There's no problem with changing the code to do this, except that then web sites break.
And also note bug 289060 comment 8, which points out that doing this would actually be a spec violation.
Right, what I was asking was whether it was time to find out if ColdFusion et al. had been fixed, since bug 7533 was filed in 1999, but then I read bug 289060 comment 8 and realized it was the spec that was broken, not Firefox or ColdFusion.
Should someone maybe mark this as WONTFIX or INVALID? It seems to me that the proper way to 'fix' this would be to submit an RFC to the IETF to get application/x-www-form-urlencoded to accept a charset= parameter just like text content types, but until that happens this bug might as well be closed.
When trying a POST from a UTF-8 encoded page, Firefox sends the following for "èu" (e grave, u) and "àu" (a grave, u):

+++
Content-Type: application/x-www-form-urlencoded
Content-Length: 34

dataname=%C3%A8u&datavalue=%C3%A0u
+++

This looks OK according to http://www.w3.org/TR/2003/REC-xforms-20031014/slice11.html. It does the same from an ISO-8859-1 page, which looks weird but still seems correct.
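As a quick sanity check, the captured body can be reproduced with Python's `urllib.parse`: percent-encoding the two values as UTF-8 yields exactly the 34-byte body above, and `parse_qs` decodes it back. For comparison, the sketch also shows the same fields percent-encoded as ISO-8859-1. That last line only shows what that encoding would produce; it is not a claim about what Firefox sends.

```python
from urllib.parse import parse_qs, urlencode

fields = {"dataname": "èu", "datavalue": "àu"}

# urlencode() percent-encodes with UTF-8 unless another encoding is given.
utf8_body = urlencode(fields)
print(utf8_body, len(utf8_body))

# Decoding with the matching charset recovers the original values.
print(parse_qs(utf8_body))

# The same fields encoded as ISO-8859-1 use one byte per accented letter.
latin1_body = urlencode(fields, encoding="iso-8859-1")
print(latin1_body)
```

The UTF-8 body prints as `dataname=%C3%A8u&datavalue=%C3%A0u 34`, matching the Content-Length in the capture. The ISO-8859-1 variant is `dataname=%E8u&datavalue=%E0u`.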
Assignee: form-submission → nobody
QA Contact: form-submission
Wow, no fix in 8 years... And this is a real bug: if the HTTP header says the file is encoded in ISO-8859-1, the common way to override this in HTML is:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Firefox then reads the body as UTF-8, which is fine, but the charset used in forms is still ISO-8859-1, so you have to add accept-charset="utf-8" to the form just for Firefox (other browsers automatically use UTF-8 or send the charset with the content-type). So: Why the hell is nobody fixing this bug?
> So: Why the hell is nobody fixing this bug? Had you actually read the bug, you would know why. Please do so now instead of applying profanity to the problem.
That said, the issue you describe in your fourth paragraph has nothing to do with this bug as far as I can tell, and I can't reproduce it. I suggest you file a separate bug on that issue. Feel free to cc me and point to a web page that shows the problem.
I read it, but: "but we were forced to remove it because at that time the majority of server-side programs couldn't cope with it" <- this was 8 years ago. Maybe give it another try? Anyway, filing another report...
> this was 8 years ago. Unless there's data that something has changed, the assumption is it hasn't. Breaking things for our users to test that assumption given lack of any indication that something has in fact changed is just a bad idea.
Please close this (very) old bug as WONTFIX, because the current implementation is historically proven and expected.

Advice for website developers:
It is advised that form tags include an "accept-charset" attribute with exactly one charset (the one the server assumes). The "accept-charset" attribute is practically required if different parts of the website are delivered in different charsets.

Advice for server-side developers implementing the form action:
(The Olde Legacy Way: the charset of the received content-type is not evaluated, and the data is assumed to be in a fixed, pre-decided charset. All website forms should include an accept-charset with exactly this fixed, pre-decided charset. Make sure the application properly accepts a received content-type with attributes, even if the attributes are ignored.)

If the received content-type includes a charset, then evaluate the data with the given charset. If the received content-type does not include a charset, there are multiple alternatives:
* Assume a fixed, pre-decided charset. All website forms should include an "accept-charset" attribute with this fixed, pre-decided charset.
* If no charset has been agreed on before, then assume the charset of the website. If there is no website yet, then unilaterally agree on UTF-8 (the default charset of RFC 3986).
* For REST services accepting application/x-www-form-urlencoded, typically without a website form, assume UTF-8 (as per RFC 3986; the ISO-Latin-1 default of RFC 2616 3.7.1 does not apply here because application/x-www-form-urlencoded is not a "text" MIME type).
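The server-side advice above can be sketched in Python by chaining the alternatives in one possible order: an explicit charset parameter first, then a pre-agreed charset, then the page's charset, then UTF-8. The function names and keyword arguments are hypothetical, and the caller is assumed to have already extracted any `charset` parameter from the Content-Type header.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs


def choose_form_charset(
    charset_param: Optional[str],
    agreed_charset: Optional[str] = None,
    page_charset: Optional[str] = None,
) -> str:
    """Pick the charset used to decode a form body."""
    if charset_param:      # the request labeled its own encoding
        return charset_param
    if agreed_charset:     # fixed charset that every form names in accept-charset
        return agreed_charset
    if page_charset:       # charset of the page that served the form
        return page_charset
    return "utf-8"         # no agreement and no page, e.g. a REST client


def decode_form(
    body: bytes,
    charset_param: Optional[str] = None,
    agreed_charset: Optional[str] = None,
    page_charset: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Decode a urlencoded body with the charset picked by choose_form_charset()."""
    charset = choose_form_charset(charset_param, agreed_charset, page_charset)
    # A urlencoded body is plain ASCII; the charset applies to the percent-decoded bytes.
    return parse_qs(body.decode("ascii"), encoding=charset)


print(decode_form(b"name=%E8u", charset_param="iso-8859-1"))
print(decode_form(b"name=%C3%A8u"))
```

Keeping the choice in its own function makes the precedence easy to change. For example, a site could let the pre-agreed charset override even an explicit label if it distrusts clients.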
Not sending the charset is correct; see <http://www.w3.org/TR/html5/iana.html#application/x-www-form-urlencoded> -- this media type does *not* have a charset parameter.
Resolving as INVALID meaning "works as expected" per latest comments. You can use a special _charset_ parameter if you need the character encoding: https://html.spec.whatwg.org/multipage/forms.html#attr-fe-name-charset
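Here is a minimal Python sketch of reading that field on the server. It assumes the form includes `<input type="hidden" name="_charset_">`, which, per the linked spec, the browser fills in with the name of the encoding it used (such as `UTF-8`). The helper name and the UTF-8 fallback are assumptions. Field names and the encoding label are ASCII, so a first pass with Latin-1 (which never fails) is enough to read `_charset_` before the real decode.

```python
import codecs
from typing import Dict, List
from urllib.parse import parse_qs


def decode_with_charset_field(body: bytes, fallback: str = "utf-8") -> Dict[str, List[str]]:
    """Decode a urlencoded body using the encoding named in its `_charset_` field."""
    text = body.decode("ascii")  # percent-encoded bodies are plain ASCII
    # First pass: Latin-1 maps every byte, so it never fails and leaves ASCII intact.
    probe = parse_qs(text, encoding="latin-1")
    charset = probe.get("_charset_", [fallback])[0]
    try:
        codecs.lookup(charset)
    except LookupError:  # unknown or bogus label: fall back
        charset = fallback
    return parse_qs(text, encoding=charset)


print(decode_with_charset_field(b"_charset_=windows-1252&name=%E8u"))
```

Validating the label with `codecs.lookup` before decoding keeps a bogus `_charset_` value from raising an error, and falls back to UTF-8 instead.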
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID
Component: HTML: Form Submission → DOM: Core & HTML