Closed
Bug 289060
Opened 19 years ago
Closed 17 years ago
add a charset to 'Content-Disposition: form-data; name="yourFormFieldName"' when posting multipart/form-data
Categories
(Core Graveyard :: File Handling, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hauser, Unassigned)
References
()
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

If an HTML page containing a form is UTF-8 encoded, Firefox rightly sends the strings entered by the user back in that encoding. Unfortunately, it does not declare that the content is encoded as such (see http://www.ietf.org/rfc/rfc1867.txt and http://www.ietf.org/rfc/rfc1521.txt).

-- RFC1521: Quote ------------
7.1 The Text Content-Type
The text Content-Type is intended for sending material which is principally textual in form. It is the default Content-Type. A "charset" parameter may be used to indicate the character set of the body text for some text subtypes, notably including the primary subtype, "text/plain", which indicates plain (unformatted) text. The default Content-Type for Internet mail is "text/plain; charset=us-ascii".
-- RFC1521: End of quote -----

Reproducible: Always
Actual Results: no charset sent
Expected Results: send charset for anything but us-ascii
See also http://issues.apache.org/bugzilla/show_bug.cgi?id=20813
Reporter
Comment 1•19 years ago
Also see http://issues.apache.org/bugzilla/show_bug.cgi?id=34297 for how to handle the status quo gracefully in the Struts MVC framework.
Comment 2•19 years ago
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code. While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts.

If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it. If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved.

Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox: http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey: http://www.mozilla.org/projects/seamonkey/
Comment 3•19 years ago
This bug has been automatically resolved after a period of inactivity (see above comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → EXPIRED
Reporter
Updated•19 years ago
Status: RESOLVED → UNCONFIRMED
Resolution: EXPIRED → ---
Updated•18 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•18 years ago
Assignee: bross2 → file-handling
Component: General → File Handling
Depends on: 116346
Product: Firefox → Core
QA Contact: general → ian
Version: unspecified → Trunk
Comment 5•18 years ago
Note that this caused major issues with various server-side stuff when it was tried back in the day. See bug 7533.
Comment 6•18 years ago
Web Forms 2 says to use the _charset_ field. It doesn't mention adding charset= either.
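For readers unfamiliar with the _charset_ convention mentioned in comment 6, here is a minimal server-side sketch in Python. It assumes the form parser hands over field names and values as raw bytes; the function name `decode_fields` and the UTF-8 fallback are illustrative assumptions, not part of any spec or framework.

```python
import codecs


def decode_fields(raw_fields, fallback="utf-8"):
    """Decode raw form fields using the charset named in the _charset_ field.

    raw_fields maps field names (bytes) to values (bytes).
    Falls back to `fallback` when _charset_ is missing or names an unknown codec.
    """
    declared = raw_fields.get(b"_charset_", b"").decode("ascii", errors="ignore").strip()
    charset = fallback
    if declared:
        try:
            codecs.lookup(declared)  # raises LookupError for unknown encodings
            charset = declared
        except LookupError:
            pass
    return {
        name.decode(charset, errors="replace"): value.decode(charset, errors="replace")
        for name, value in raw_fields.items()
    }


if __name__ == "__main__":
    submitted = {b"_charset_": b"ISO-8859-1", b"city": "Zürich".encode("latin-1")}
    print(decode_fields(submitted)["city"])
```

The declared value travels as ordinary form data, so the server has to trust it and must still choose a fallback when it is absent.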
Comment 7•17 years ago
For multipart/form-data posts, the charset should be defined on the Content-Type header, not Content-Disposition. Firefox does not appear to send a Content-Type header with individual form fields, assuming the default (text/plain) is sufficient. It would be helpful for applications if this header were added with the charset parameter indicating which character encoding we used.
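To make the proposal in comment 7 concrete, here is a minimal Python sketch of a multipart/form-data body in which every text part carries its own Content-Type header with a charset parameter. The helper names `build_part` and `build_body` are hypothetical, the sketch assumes ASCII-only field names, and it is not a description of what Firefox sends.

```python
def build_part(boundary, name, value, charset="utf-8"):
    """Serialize one text field as a multipart part that declares its charset."""
    headers = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        f"Content-Type: text/plain; charset={charset}\r\n"
        "\r\n"
    ).encode("ascii")  # assumption: field names are ASCII-only
    return headers + value.encode(charset) + b"\r\n"


def build_body(boundary, fields, charset="utf-8"):
    """Join all parts and append the closing boundary delimiter."""
    parts = b"".join(build_part(boundary, n, v, charset) for n, v in fields)
    return parts + f"--{boundary}--\r\n".encode("ascii")


if __name__ == "__main__":
    body = build_body("XyZ", [("city", "Zürich")])
    print(body.decode("utf-8"))
```

A receiver can then decode each part from its own header instead of guessing the encoding.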
Comment 8•17 years ago
I agree with David Nesting: the charset parameter should go in a Content-Type header on the individual parts of the MIME body. This is what the spec says. And I disagree with Boris Zbarsky's statement that this caused major issues. I reviewed the bug reports, and none of them mentions problems with the enctype multipart/form-data; all seem to have used application/x-www-form-urlencoded. Additionally, these issues were 8 years ago. I also disagree with the conclusions drawn in these bug reports.

But first a summary, and I will restrict myself to HTML4. The standard knows about forms submitted with:
1) HTTP GET (always application/x-www-form-urlencoded)
2) POST application/x-www-form-urlencoded
3) POST multipart/form-data

For 1) there is technically no way to attach meta-data, as the form data gets attached as the "query" to the URI. It is indeed defined how every possible octet can be included in a URI, but application/x-www-form-urlencoded restricts itself to US-ASCII when it comes to transforming characters into octets. So the octet/byte representation of a character outside US-ASCII is not specified by application/x-www-form-urlencoded.

Numbers 2) and 3), using POST, have a way to specify meta-data. They "bootstrap" on the HTTP Content-Type header, which is sent with a POST and describes the "form" of the HTTP POST body. Unfortunately, number 2) specifies application/x-www-form-urlencoded, which has no defined way to attach any other meta-data. Mozilla/Firefox did something like:
Content-Type: application/x-www-form-urlencoded; charset=...
which was WRONG from the very beginning. The charset parameter can't be attached to any content type at will; it is basically only defined for text/... types. An illustrating example:
Content-Type: image/jpeg; charset=...
is wrong as well, as images have no charsets. Some people would argue that it should have the same meaning as for e.g. text/html, but that interpretation would yield a different thing. See this example:
Content-Type: text/html; charset=us-ascii
...<html> ... <p> &#8226;
The charset describes the encoding of the HTML, not what the entity reference #8226 in the HTML means (which would be outside of ASCII anyway). So, as x-www-form-urlencoded content is always within ASCII, a charset parameter is useless. And the meaning of the percent-escaped stuff in that form is described by the x-www-form-urlencoded spec only, not by its presentation charset.

So let's go with number 3) and do it right this time. multipart/form-data is a MIME type. These are outlined in RFC2045. MIME multipart types allow the inclusion of multiple parts (you guessed it!) and the inclusion of meta-data for every part. Firefox/Mozilla doesn't include a Content-Type header for these parts, so each part defaults to "text/plain; charset=us-ascii". Sending octets outside the 0-127 range in a multipart/... part without a Content-Type: header violates RFC2045 and forces the reader to guess. The correct behavior would be to include in every part that is not ASCII-only:
Content-Type: text/plain; charset=...
It is shocking to see no support for HTTP/1.1, HTML4 and MIME in Seamonkey/Firefox; the first two standards are now over 7 years old, MIME over 10.

As for _charset_: it is a "solution" that involves modifying the original HTML form to include a hidden field with the name "_charset_". The browser "automatically" assigns this hidden field a value, namely the charset in use. It is like writing 'This is a jpeg' in your favorite font inside a JPEG image, as this name/value pair gets transported together with the data.
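To illustrate the point in comment 8 that application/x-www-form-urlencoded does not say which octets a non-ASCII character turns into, here is a small Python sketch using the standard library's urllib.parse. The helper name `percent_encode` is just for this example.

```python
from urllib.parse import quote, unquote


def percent_encode(text, charset):
    """Percent-encode text after converting it to octets in the given charset."""
    return quote(text, safe="", encoding=charset)


utf8_form = percent_encode("ü", "utf-8")      # "%C3%BC"
latin1_form = percent_encode("ü", "latin-1")  # "%FC"

# Both strings are pure ASCII on the wire, yet they stand for the same character.
# A receiver that assumes the wrong charset silently gets a different string back.
misread = unquote(utf8_form, encoding="latin-1")

if __name__ == "__main__":
    print(utf8_form, latin1_form, misread)
```

The wire format itself stays within ASCII in both cases, which is why the receiver needs out-of-band information about the charset the sender used.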
Comment 9•17 years ago
I opened the bug 379858 so that this issue carries a proper subject.
Comment 10•17 years ago
A related issue here is the field names themselves. Since multipart/form-data parts carry the field names in the part's headers, RFC2047 must be used to encode these. This is completely independent of any charset parameter on the Content-Type of each field's value. Firefox currently appears to provide field names in the submission's character encoding, without indicating it, just as it does for the data itself. Unfortunately, fixing this particular aspect of this bug makes me more nervous. You ought to be able to add a Content-Type header with a charset parameter, because you're just declaring something that you're already doing, and nobody expects this header today, so the damage should be minimal. But encoding field names correctly would seem to change existing behavior for a lot of applications that use non-ASCII characters in their form fields, and haven't paid attention to RFC2388/RFC2047.
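As a rough illustration of the RFC2047 encoding that comment 10 refers to, here is a Python sketch using the standard library's email.header module. The function name `encode_field_name` is hypothetical, and the sketch only shows the encoded-word transformation itself; it makes no claim about what any browser emits or what servers accept.

```python
from email.header import Header, decode_header


def encode_field_name(name, charset="utf-8"):
    """Return the name unchanged if ASCII, else as an RFC 2047 encoded-word."""
    if name.isascii():
        return name
    return Header(name, charset).encode()


if __name__ == "__main__":
    encoded = encode_field_name("naïve")
    raw, used_charset = decode_header(encoded)[0]
    print(encoded, "->", raw.decode(used_charset))
```

ASCII-only names pass through untouched, so only forms with non-ASCII field names would see a change on the wire, which is the compatibility risk the comment describes.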
Comment 11•17 years ago
Also see bug 116346.
Comment 12•17 years ago
Has anyone considered submitting an RFC to address the charset= issue raised in comment #8 ?
Comment 13•17 years ago
This got fixed by the patch for bug 116346.
Status: NEW → RESOLVED
Closed: 19 years ago → 17 years ago
Resolution: --- → FIXED
Comment 14•17 years ago
Joseph, what information not in comment #8 would you expect in such an RFC? Robert
Comment 15•17 years ago
I'd like the RFC to make charset= legal for the application/x-www-form-urlencoded MIME type.
Updated•8 years ago
Product: Core → Core Graveyard