Bug 116346 - Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[form sub]
Status: RESOLVED WONTFIX
Product: Core
Classification: Components
Component: HTML: Form Submission
Version: Trunk
Platform: All All
Importance: -- normal with 5 votes
Assigned To: David Nesting
URL: http://www.runout.org/html-form-test/...
Duplicates: 126407 379858
Depends on: 384270 387991 392982
Blocks: 289060

Reported: 2001-12-20 19:34 PST by Koike Kazuhiko
Modified: 2014-05-27 09:45 PDT
CC List: 31 users
Flags: bzbarsky: in-testsuite?

Attachments
Use a pref to decide to attach the charset to content type (1.81 KB, patch)
2007-04-09 10:58 PDT, Mike Kaply [:mkaply] (Out June 27-July 5)
no flags
Adds Content-Type with charset to each form-data part (1.21 KB, patch)
2007-05-07 12:29 PDT, David Nesting
no flags
Adds Content-Type with charset to each form-data part (1.81 KB, patch)
2007-05-08 20:45 PDT, David Nesting
bzbarsky: review+
jonas: superreview+
Like so (3.41 KB, patch)
2007-08-09 07:12 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
jst: review+
jst: superreview+
Backout of the previous attachment. (2.74 KB, patch)
2007-10-24 15:20 PDT, Johnny Stenback (:jst, jst@mozilla.com)
no flags
Backout of both fixes that went in for this bug. (2.97 KB, patch)
2007-10-24 15:49 PDT, Johnny Stenback (:jst, jst@mozilla.com)
no flags

Description Koike Kazuhiko 2001-12-20 19:34:11 PST
Form data should have Content-Type header when its enctype attribute is
"multipart/form-data".

W3C Documentation:
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2

Testcase:
http://www.wakaba.com/~hiji/form-data-test/

Original Report in Bugzilla-jp:
http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=1697
Comment 1 rods (gone) 2001-12-20 20:02:44 PST
->>
Comment 2 Katsuhiko Momoi 2001-12-20 23:12:05 PST
Let me clarify the situation a little bit.

The test case supplied above sends form input data directly from
the form rather than via file uploads. 

Current Mozilla builds actually append appropriate Content-Type headers
without the charset parameter if the user attaches a file to a
multipart/form-data type of form.

So this report is talking about only the case in which data is sent
directly from the form. It is my understanding that there has been a
long tradition of browsers sending text data from form in the same
encoding/charset as the web page on which the form resides. In other
words, servers expect the data back in the same encoding.

Both HTML 4.01 cited above and RFC 2388 say that Content-Type header 
is optional. 

When Mozilla generates a Content-Type header for file uploads, it
intentionally omits the charset parameter because there is no
clear way to determine the content encoding of the file being 
uploaded. We discussed before creating a dialog which asks for
the charset of the uploaed file but dropped the idea because most
users may know what this question means in the first place.

This gives us a good opportunity to review the current specs 
in this area for a variety of cases/data.
Comment 3 Katsuhiko Momoi 2001-12-20 23:14:19 PST
There were some spelling errors in my comment above. Let me correct
them as below:

"We discussed before creating a dialog which asks for
the charset of the uploaed file but dropped the idea because most
users may know what this question means in the first place."

should read instead:

"We discussed before creating a dialog which asks for
the charset of the uploaded file but dropped the idea because users 
may not know what this question means in the first place."

Comment 4 Frank Tang 2001-12-21 08:16:32 PST
As I remember it, the reason we did not dare to add it is that, by experiment,
it broke too many web sites.
Comment 5 Boris Zbarsky [:bz] (Out June 25-July 6) 2001-12-25 21:58:15 PST
So.... is the request to put:

Content-Type: text/plain; charset=foo

(where "foo" is the encoding of the page the form was in) on all the 
non-file-control data parts?
Comment 6 Alexandru Savulov 2002-01-15 17:57:48 PST
setting milestone and component
Comment 7 John Keiser (jkeiser) 2002-01-21 12:53:29 PST
I know Pollmann tested charset and *that* broke a bunch of websites; but what
about plain old Content-Type?  Is there anything wrong with that?

(Note that even if there /was/ something wrong with it, it may be fixed by now.
 Later versions of Apache are spread all across the web; and IIS has gone
through many revisions since those tests were done.)

See bug 18643 for charset discussion.
Comment 8 R.K.Aa. 2002-02-19 08:11:51 PST
*** Bug 126407 has been marked as a duplicate of this bug. ***
Comment 9 Martin v. Löwis 2002-02-19 09:02:47 PST
RFC 1867 clearly specifies that the parts of a multipart/form-data should have a
Content-Type, see the citation in bug 126407. This is particularly important to
identify the charset of input fields.
Comment 10 Jungshik Shin 2003-07-29 18:47:19 PDT
I agree with Martin. It's important to have C-T with charset when submitting a
form (when uploading a text/* file, Kat's comment #2 makes sense, although I tend
to think allowing user control wouldn't be that bad UI-wise) because a user
can override the default MIME charset (in View|Character Coding).

I thought Mozilla supported this, and wrote to that effect on the www-international
list[1], but it turned out that I was wrong. This can be tested at the URL given
in the URL field. In addition to RFC 1867, HTML 4.01 is clear about the need
to add a C-T header (when C-T is NOT the default 'text/plain; charset=US-ASCII' or
C-T-E is NOT the default 7bit). My interpretation of HTML 4.01 is different from
that of Kat here. The repeated references to RFC 2045 and the following sentence
have to be interpreted as requiring C-T/C-T-E for all the cases _other than_
"text/plain; charset=US-ASCII" and "7bit":

<quote>
As with all multipart MIME types, each part has an optional "Content-Type"
header that defaults to "text/plain". User agents should supply the
"Content-Type" header, accompanied by a "charset" parameter.
</quote>

In the above, I believe 'optional' is a bit misleading. The intent is likely to
have been that it's optional only when its value is the default value
'text/plain; charset=US-ASCII'. Otherwise, I believe it's mandatory.  


Now the question is whether we'd still have a problem (comment #4 and comment
#7) with many CGI programs/web servers/server side scripts (jsp, php, asp) if we
add C-T and C-T-E header fields to each part of multipart/form-data. It's likely
that we do, but ..... 

I wish HTML 4.01 had been a lot more explicit about the need for C-T header
field for non-default cases instead of just referring to RFC 2045.  

[1] a thread of articles beginning with 
http://lists.w3.org/Archives/Public/www-international/2003JulSep/0029.html
Comment 11 Mike Kaply [:mkaply] (Out June 27-July 5) 2007-03-07 05:35:28 PST
How easy would it be to add this functionality as a preference turned off by default so people could at least test what it breaks?
Comment 12 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-03-07 07:35:23 PST
Probably pretty easy.  I'll be happy to review if someone posts a patch.
Comment 13 Mike Kaply [:mkaply] (Out June 27-July 5) 2007-04-09 10:58:44 PDT
Created attachment 261020
Use a pref to decide to attach the charset to content type

I've added a pref and if it is set, charset is appended only in the multipart/form-data case

Is this what people were looking for?
Comment 14 David Nesting 2007-05-07 11:20:25 PDT
The above patch appears to append the charset parameter only to the HTTP request's Content-Type.  A charset parameter here has no meaning and its behavior is not defined by any specification.  I believe the request is to add it to each *part* of the multipart/form-data entity:

Content-Type: multipart/form-data; boundary="foo"

--foo
Content-Disposition: form-data; name="field"
Content-Type: text/plain; charset=utf-8

value
--foo--

Today, this Content-Type header is absent entirely.  It ought to be safe to *add* it (since nobody expects it) without causing too many problems.

See also bug 379858, which may be a duplicate of this one.  Bug 379858 comment 1 contains a simple patch that implements the behavior described above.
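
The wire format requested in this comment can be sketched in a few lines of standalone C++. This is only an illustration, not the Gecko implementation: `BuildFormData` is a hypothetical helper, and it assumes that field names and values are already encoded in the given charset and that the boundary string never occurs inside a value.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not Gecko code): serialize plain text fields as a
// multipart/form-data body, labelling every part with
// "Content-Type: text/plain; charset=...".
// Assumptions: names and values are already encoded in `charset`, and
// `boundary` does not occur in any value.
std::string BuildFormData(
    const std::vector<std::pair<std::string, std::string>>& fields,
    const std::string& boundary, const std::string& charset) {
  const std::string crlf = "\r\n";
  std::string body;
  for (const auto& field : fields) {
    body += "--" + boundary + crlf;
    body += "Content-Disposition: form-data; name=\"" + field.first + "\"" + crlf;
    body += "Content-Type: text/plain; charset=" + charset + crlf;
    body += crlf;  // blank line ends the part headers
    body += field.second + crlf;
  }
  body += "--" + boundary + "--" + crlf;  // closing delimiter
  return body;
}
```

Calling it with one field named "field", the boundary "foo" and the charset "utf-8" should reproduce the body shown in the example above, with CRLF line endings.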
Comment 15 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-07 12:22:28 PDT
> It ought to be safe to *add* it (since nobody expects it)

See comment 7.  It _ought_ to be safe, but in practice, given the number of broken web servers out there, any change like that requires serious testing.
Comment 16 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-07 12:23:17 PDT
*** Bug 379858 has been marked as a duplicate of this bug. ***
Comment 17 David Nesting 2007-05-07 12:29:27 PDT
Created attachment 264033
Adds Content-Type with charset to each form-data part

(As requested, copied from bug 379858 comment 1:)

This is a perhaps naive attempt at adding the requisite header for each
form-data part of a multipart/form-data submission.  

I have also created a tool at http://fastolfe.net/2007/05/06/post-charsets for
testing browser behavior.  The tool will treat anything ambiguous as US-ASCII,
to make ambiguous cases obvious (invalid characters are replaced).  A non-ASCII
submission with a normal build of Firefox will see the submission garbled,
while a submission with a patched Firefox works correctly.

This patch does NOT address:

* non-ASCII form field names
* application/x-www-form-urlencoded submissions
* non-ASCII form values that cannot be encoded in the chosen character encoding

(The latter case causes Firefox to replace the character with an HTML entity,
which IMO is also broken behavior.)
Comment 18 Robert Siemer 2007-05-07 13:04:29 PDT
I liked my bug 379858 more because it had a better subject... (-:

I will attach a copy of bug 289060 comment 8 here:

--------------------
I agree with David Nesting, the charset parameter should go with a Content-Type
header on the individual parts of the MIME body. This is what the spec says.

And I disagree with Boris Zbarsky saying that this caused major issues. I
reviewed the bug reports and none of them is mentioning problems with the
enctype multipart/form-data, all seemed to have used
application/x-www-form-urlencoded. Additionally, these issues were 8 years
ago.

I also disagree with the conclusions drawn in these bug reports. But first a
summary; and I will restrict myself to HTML4:

The standard knows about forms to be submitted with
1) HTTP GET (always application/x-www-form-urlencoded)
2) POST application/x-www-form-urlencoded
3) POST multipart/form-data

For 1) there is technically no way to attach meta-data, as the form data
gets attached as the "query" of the URI. Although it is defined how any
octet can be included in a URI, application/x-www-form-urlencoded restricts
itself to US-ASCII when it comes to transforming characters into octets. So the
octet/byte representation of a character outside US-ASCII is not specified by
application/x-www-form-urlencoded.

Numbers 2) and 3), using POST, have a way to specify meta-data. They "bootstrap"
on the HTTP Content-Type header, which is sent with a POST and describes the
"form" of the HTTP POST body.

Unfortunately, number 2) specifies application/x-www-form-urlencoded, which has
no defined way to attach any other meta-data. Mozilla/Firefox did something
like:

Content-Type: application/x-www-form-urlencoded; charset=...

which was WRONG from the very beginning. The charset attribute can't be attached
to any content type at will; it is basically only defined for text/... types.
Illustrating example:
Content-Type: image/jpeg; charset=...
is wrong too, as images have no charsets. Some people would argue that it
should have the same meaning as for e.g. text/html, but that interpretation
would yield a different thing. See this example:
Content-Type: text/html; charset=us-ascii
...<html> ... <p> &#8226;

The charset is describing the coding of the HTML, not of what the entity
reference #8226 in the HTML means (which would be outside of ASCII anyway).

So, as the x-www-form-urlencoded content type is always within ASCII, a charset
attribute is useless. And the meaning of the percent-escaped data in that format
is described by the x-www-form-urlencoded spec only, not by its presentation
charset.

So let's go with number 3) and do it right this time. multipart/form-data is a
MIME type. These are outlined in RFC2045. MIME multipart types allow the
inclusion of multiple parts (you guessed it!) and the inclusion of meta-data
for every part. Firefox/Mozilla doesn't include a Content-Type header for these
parts, so it defaults to "text/plain; charset=us-ascii".

Sending octets outside the 0-127 range in a multipart/... without Content-Type:
header violates RFC2045 and forces the reader to guess.

The correct behavior would be to include in every non-ascii-only part:
Content-Type: text/plain; charset=...

It is shocking to see no support for HTTP11/HTML4/MIME in Seamonkey/Firefox;
the first two standards now over 7 years old, MIME over 10.

Taking _charset_ into the game: it is a "solution" that involves modifying the
original HTML form, including a hidden field with the name "_charset_". This
hidden field gets "automatically" assigned a value from the browser, the
charset in use. It is like writing with your favorite font in a jpeg-image
'This is a jpeg,' as this name/value pair gets transported together with the
data.
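
The ambiguity described above for application/x-www-form-urlencoded can be made concrete with a small sketch. `UrlEncodeOctets` is a hypothetical helper, not browser code: it percent-encodes whatever octets it is given, so the same character produces different output depending on which charset was used to turn it into octets, and nothing in the result says which one it was.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: percent-encode raw octets in the style of
// application/x-www-form-urlencoded. ASCII letters, digits and "-_.*"
// pass through, a space becomes '+', every other octet becomes %XX.
std::string UrlEncodeOctets(const std::string& octets) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : octets) {
    const bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                       (c >= '0' && c <= '9');
    if (alnum || c == '-' || c == '_' || c == '.' || c == '*') {
      out += static_cast<char>(c);
    } else if (c == ' ') {
      out += '+';
    } else {
      out += '%';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out;
}
```

For example, "café" encoded as UTF-8 (octets C3 A9 for the last letter) comes out as `caf%C3%A9`, while the same word encoded as ISO-8859-1 (octet E9) comes out as `caf%E9`; a server that sees only the encoded string has to guess.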
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-07 14:59:56 PDT
> I reviewed the bug reports and none of them is mentioning problems with the
> enctype multipart/form-data, all seemed to have used
> application/x-www-form-urlencoded.

Ah, excellent.  In that case, yeah, we should do this for the multipart/form-data POSTs.  Thanks for looking into that!
Comment 20 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-07 15:05:38 PDT
Comment on attachment 264033
Adds Content-Type with charset to each form-data part

>+                 + NS_LITERAL_CSTRING("Content-Type: text/plain; charset=")
>+                 + mCharset
>+                 + NS_LITERAL_CSTRING(CRLF)

So the only concern I have here is that if mEncoder is null we'll end up using UTF8, not mCharset, for the encoding.  We could maybe set mCharset to "UTF-8" in the constructor if mEncoder is null, or we could null-check here (because mCharset is used for some weird bidi stuff that I don't quite understand).

Simon, would it be safe to just reset mCharset if it's a charset we don't have an encoder for?
Comment 21 Simon Montagu :smontagu 2007-05-07 20:30:30 PDT
I think that the worst that would happen is that it might break the weird bidi stuff which nobody understands and is probably broken anyway because it makes some very unsafe assumptions about correlation between the document character set and the characters that might be included in the form submission.
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-07 20:51:54 PDT
Yeah, that stuff was the part I was worried about.  OK, then.

David, want to make that change?  Just reset mCharset to UTF-8 in the constructor if mEncoder is null?
Comment 23 Simon Montagu :smontagu 2007-05-07 20:55:25 PDT
(In reply to comment #22)
> Just reset mCharset to UTF-8 in the
> constructor if mEncoder is null?

I suggest doing it in GetSubmissionFromForm() if GetEncoder() fails.

Comment 24 David Nesting 2007-05-08 14:31:38 PDT
How can I test this null encoder case?  When I attempt to use a bogus charset in the form submission, mCharset contains "UTF-8".
Comment 25 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-08 17:34:43 PDT
I don't think there's an easy way to test it.  You'd need some charset for which we have a decoder (so we can load the page as that charset) but do not have an  encoder...  I guess you could hack nsFormSubmission::GetSubmitCharset to return a bogus charset.  That should work.
Comment 26 David Nesting 2007-05-08 20:30:49 PDT
After getting GetSubmitCharset to return a bogus charset, I couldn't get a form to submit at all, even without my other changes.  If we intend this situation to result in a useful POST, I don't think it's working that way today.

Assuming that is a goal, though, and it just isn't working right now for other reasons, is this the type of check that should be done in GetSubmissionFromForm()?

   // Get unicode encoder
   nsCOMPtr<nsISaveAsCharset> encoder;
   nsFormSubmission::GetEncoder(aForm, charset, getter_AddRefs(encoder));

+  if (encoder == nsnull)
+    charset.AssignLiteral("UTF-8");

If that looks reasonable, I'll post an updated patch.  It seems to work OK, but like I said, I can't get meaningful behavior either way in the null encoder (bogus GetSubmitCharset charset) case.
Comment 27 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-08 20:34:50 PDT
I'd do |if (!encoder)|, but other than that that looks like what I wanted, yes.
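
The fallback agreed on here can be sketched outside of Gecko as follows. Everything in this block is a stand-in: `Encoder`, `GetEncoder` and `ResolveSubmitCharset` are hypothetical names, and the list of supported charsets is invented for the example. The point is only that the charset advertised in each part must match the encoding actually used, which is UTF-8 when no encoder exists.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Stand-in for an encoder object; NOT the real nsISaveAsCharset.
struct Encoder {
  std::string charset;
};

// Hypothetical lookup: returns nullptr when no encoder exists for `charset`.
// The supported set below is made up for illustration.
std::unique_ptr<Encoder> GetEncoder(const std::string& charset) {
  static const std::set<std::string> kSupported = {"UTF-8", "ISO-8859-1",
                                                   "Shift_JIS"};
  if (kSupported.count(charset) == 0) return nullptr;
  return std::unique_ptr<Encoder>(new Encoder{charset});
}

// Pick the charset to advertise in each part's Content-Type. Without an
// encoder the data is sent as UTF-8, so the label has to say UTF-8 too.
std::string ResolveSubmitCharset(std::string charset) {
  std::unique_ptr<Encoder> encoder = GetEncoder(charset);
  if (!encoder) {
    charset = "UTF-8";
  }
  return charset;
}
```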
Comment 28 David Nesting 2007-05-08 20:45:54 PDT
Created attachment 264215
Adds Content-Type with charset to each form-data part

This patch expands upon the previous by also forcing the mCharset to UTF-8 when no encoder is available.
Comment 29 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-08 20:59:59 PDT
Comment on attachment 264215
Adds Content-Type with charset to each form-data part

Looks good to me.  sicking, would you sr?
Comment 30 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-13 20:59:56 PDT
Checked in.  David, thanks for the patch!

(For what it's worth, whatever tool you're using is producing broken diff files -- they're missing spaces at the beginning of empty context lines.  Took me a few minutes to figure out why this wasn't applying.)
Comment 31 Johnny Stenback (:jst, jst@mozilla.com) 2007-08-06 17:21:03 PDT
So it turns out that this broke existing sites. Some of the known ones are referenced in bug 384270. So the big question is, is the fix worth the bustage, and how much of the bustage is there out in the wild that we don't yet know about. I'm leaning towards backing this out to fix what broke. Or is there anything else that could be done to leave parts of this in w/o breaking existing sites (or at least not as many of them)?
Comment 32 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-06 18:50:58 PDT
David, are you willing to get in touch with the various back-end folks whose software doesn't deal with this (Eve, etc) and see whether we can do a limited form of this that won't break them?

Given the "It's probably a Minefield bug, let's see if they fix it in the beta" attitude in the Eve forum I'm not that hopeful...  :(  But maybe we'll get something from them.
Comment 33 Jed Wesley-Smith 2007-08-06 21:59:53 PDT
The JIRA dev team accepts that this behaviour in Minefield is standard
compliant and that this is a bug we should and will deal with.

However, there are > 6000 JIRA instances out there as of now, including quite a
few major public ones. The process of updating them all is going to take some
time, so the symptoms are likely to persist for quite some time (> FF3
release). This is likely to be the case for the other back-end software as
well.

We would certainly prefer if there was an option to turn this behaviour on/off
- with off as standard, and then turn it on by default in a later release.
Comment 34 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-06 22:11:07 PDT
Jed, if it's off by default nothing will change and we still won't be able to enable it in a future release.  I'm glad to hear that you guys will fix your end, but as you said there are other back-end packages, most of which will never even hear about the problem if the behavior defaults to off....

Is there by chance any aspect of this behavior that could be preserved without breaking existing JIRA installs?
Comment 35 Jed Wesley-Smith 2007-08-07 01:43:06 PDT
Boris, we do understand the conundrum - we would also like to see the change. Unfortunately, there is very little that can be done about existing installs with the current FF3 behaviour that does not necessitate an upgrade or patch. We currently fail reasonably spectacularly.

BTW. what is the release time-frame for FF3?
Comment 36 Jochen Wiedmann 2007-08-07 03:17:25 PDT
(In reply to comment #33)

> We would certainly prefer if there was an option to turn this behaviour on/off
> - with off as standard, and then turn it on by default in a later release.

I second this as a call for more time.
Comment 37 Mike Kaply [:mkaply] (Out June 27-July 5) 2007-08-07 07:50:04 PDT
Can you give a brief explanation of why this breaks your code? What new codepath does this cause?
Comment 38 Christopher Owen 2007-08-08 19:40:02 PDT
The issue with JIRA also affects Confluence as we use the same underlying multipart parser. We also accept that it is Confluence that is broken with regard to this and not Minefield. 

I'd like to propose that a switch be introduced so that web application may opt-in to have these data submitted as part of a form post. This would aid transition for broken implementations while allowing interested (and working) servers to use the new functionality. Maybe a meta element switch?

e.g.

<meta name="form.include.multipart.content-type" content="true" />

or something similar (instead of a global switch for the page you might want to have a space delimited list of form ids to enable it on).

I think it is great that this capability has been included as it has often caused me frustration when authoring web apps in the past but the pragmatist in me suggests that we need to phase this in (and not just for our sake).

We will of course look to get upcoming releases of Confluence fixed.
Comment 39 Mike Kaply [:mkaply] (Out June 27-July 5) 2007-08-08 22:19:39 PDT
Again, can someone please explain how exactly this is breaking the servers?

I'm curious to understand how it is failing.

Thanks
Comment 40 Jochen Wiedmann 2007-08-08 23:44:58 PDT
(In reply to comment #39)

> Again, can someone please explain how exactly this is breaking the servers?
> 
> I'm curious to understand how it is failing.

Michael, in the case of Jira or Confluence, it simply means that *any* form containing an upload button is unusable.

As you can imagine, Jira contains an upload button on almost every page. In other words, you cannot use Jira or Confluence with the current Gran Paradiso. Indeed, I stopped using Gran Paradiso immediately once I understood that I could avoid these problems by using Firefox. Likewise, this would prevent me from upgrading to Firefox 3 if it contained the same change.
Comment 41 Martin v. Löwis 2007-08-08 23:56:08 PDT
> Michael, in the case of Jira or Confluence, it simply means that *any* form
> containing an upload button is unusable.

Jochen, unfortunately, I think this does not answer Michael's question. He did not ask *what* exactly breaks, but *how* exactly it breaks. I.e. what specific algorithm on the server is invoked that works if Content-type is not included, but fails if it is included. E.g. what specific if condition in what specific source file of what specific library starts to misbehave.
Comment 42 Jed Wesley-Smith 2007-08-09 00:21:26 PDT
well, when we investigate and fix we'll provide you the diff if you like.

The actual library is the pell-multipart-request plugin for webwork, our fork of which is here: https://svn.atlassian.com/svn/public/contrib/tools/pell-multipart-request/trunk

We have not investigated the actual errant code yet as the fix is not scheduled and the most relevant thing right now is the fact that it occurs at all.

We may not even fix pell-multipart-request but write our own multipart handler from scratch.
Comment 43 Jochen Wiedmann 2007-08-09 00:29:14 PDT
(In reply to comment #42)

> We may not even fix pell-multipart-request but write our own multipart handler
> from scratch.

OT: Before doing that, please consider using one of the multipart related Apache libraries, like commons-fileupload, or Mime4J.

I am the author of the streaming API for commons-fileupload and the author of the pull parser API for Mime4J and absolutely willing to support, possibly as part of a contract, or as part of my Apache work. Helping you will ultimately help me.

Comment 44 Martin v. Löwis 2007-08-09 01:02:03 PDT
(In reply to comment #42)

From inspection, it looks like the problem is in /src/main/java/http/utils/multipartrequest/MultipartRequest.java:MultipartRequest.parse, specifically

 // At the top of loop, we assume that the Content-Disposition line is next, otherwise we are at the end.

This assumption now breaks; the first thing in the part will be Content-type, not Content-disposition.

It seems that switching the order of the headers (i.e. putting Content-type after Content-disposition) might restore interoperability: the library later does expect that Content-type may follow before the actual data. In particular, a comment says

// FIX 1.14 IE Problem still: Check for content-type and extra line even though no file specified.

So apparently, MSIE already sends Content-type in other parts (at least in some releases under some circumstances), so if Firefox does the same, interoperability should be good for all sites that also support MSIE.

Notice that the library explicitly supports Content-type being sent for file uploads (which it detects by checking for the presence of the filename= parameter in Content-disposition).

For Firefox, I would recommend that just the order of headers is switched.

For pell-multipart-request, the right fix would be to read all header lines in each part until an empty line is seen, and extract content-disposition and content-type while doing so.
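
The order-independent parsing recommended for the server side could look roughly like the sketch below. It is written in C++ to match the other snippets in this bug, whereas pell-multipart-request itself is Java, and it is not that library's code. `ParsePartHeaders` is a hypothetical function that assumes CRLF line endings and does not handle folded (continuation) header lines.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical sketch: read every header line of one part until the blank
// line, storing each value under its lower-cased name so that the order of
// the headers does not matter. `part` must start at the first header line.
// The remaining bytes after the blank line are returned through `body`.
std::map<std::string, std::string> ParsePartHeaders(const std::string& part,
                                                    std::string* body) {
  std::map<std::string, std::string> headers;
  std::string::size_type pos = 0;
  while (pos < part.size()) {
    std::string::size_type eol = part.find("\r\n", pos);
    if (eol == std::string::npos) eol = part.size();
    const std::string line = part.substr(pos, eol - pos);
    pos = (eol + 2 <= part.size()) ? eol + 2 : part.size();
    if (line.empty()) break;  // blank line: headers are done
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) continue;  // skip malformed line
    std::string name = line.substr(0, colon);
    for (char& ch : name) {
      ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    const std::string::size_type start =
        line.find_first_not_of(" \t", colon + 1);
    headers[name] = (start == std::string::npos) ? "" : line.substr(start);
  }
  if (body) *body = part.substr(pos);
  return headers;
}
```

With this approach a part is parsed the same way whether Content-Type precedes or follows Content-Disposition.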
Comment 45 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-09 06:48:46 PDT
Martin, thanks for looking into this!

This is actually quite interesting.  For file upload fields, we send:

         NS_LITERAL_CSTRING("Content-Disposition: form-data; name=\"")
       + nameStr + NS_LITERAL_CSTRING("\"; filename=\"")
       + filenameStr + NS_LITERAL_CSTRING("\"" CRLF)
       + NS_LITERAL_CSTRING("Content-Type: ") + aContentType
       + NS_LITERAL_CSTRING(CRLF CRLF);

We also send:

         NS_LITERAL_CSTRING("Content-Transfer-Encoding: binary" CRLF);

before that, but only if the browser.forms.submit.backwards_compatible preference is false.  It defaults to true.  See bug 58189 and bug 83065 for that sordid story.  Perhaps we should restore that behavior by default and make sure that header comes after Content-Disposition (so that pell-multipart-request's stupid assumptions are satisfied) but before Content-Type (so that PHP's stupid assumptions are satisfied, if it's still making those stupid assumptions).  This is a separate bug, in any case.

Moving on, for other form fields, this patch made us send:

             + NS_LITERAL_CSTRING("Content-Type: text/plain; charset=")
             + mCharset
             + NS_LITERAL_CSTRING(CRLF)
             + NS_LITERAL_CSTRING("Content-Disposition: form-data; name=\"")
             + nameStr + NS_LITERAL_CSTRING("\"" CRLF CRLF)

So indeed, the ordering is different.  Let's switch that and see how compat looks?
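
Why the ordering matters to a fragile server can be illustrated with a toy model. The parser below is deliberately naive and is NOT pell-multipart-request's actual code; it only mirrors the assumption quoted in comment 44 that the first line of a part is Content-Disposition. Both function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Deliberately fragile model (not any real library's code): accept a part
// only if its very first header line is Content-Disposition.
bool FragileParserAccepts(const std::string& part_headers) {
  const std::string expected = "Content-Disposition:";
  return part_headers.compare(0, expected.size(), expected) == 0;
}

// Emit the headers for a non-file field in either order, followed by the
// blank line that ends the header block.
std::string FieldHeaders(const std::string& name, const std::string& charset,
                         bool disposition_first) {
  const std::string disposition =
      "Content-Disposition: form-data; name=\"" + name + "\"\r\n";
  const std::string type =
      "Content-Type: text/plain; charset=" + charset + "\r\n";
  return (disposition_first ? disposition + type : type + disposition) +
         "\r\n";
}
```

In this model the Content-Type-first order is rejected and the Content-Disposition-first order is accepted, which is the behavior change the swap is hoping for.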
Comment 46 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-09 07:12:57 PDT
Created attachment 275968
Like so
Comment 47 Johnny Stenback (:jst, jst@mozilla.com) 2007-08-10 17:04:13 PDT
Comment on attachment 275968
Like so

Yeah, let's get this in and tested ASAP. r+sr=jst
Comment 48 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-10 17:33:42 PDT
Checked in.
Comment 49 monkeypox37 2007-08-10 18:41:20 PDT
The patch works with the Arstechnica forums (EVE), nice work devs. :)
Comment 50 patrickdrd 2007-08-11 09:56:32 PDT
please take a look at this one too:

http://forums.mozillazine.org/viewtopic.php?p=3007352#3007352
Comment 51 Volkmar Kostka 2007-08-11 10:08:05 PDT
Better see http://forums.mozillazine.org/viewtopic.php?t=574762

It is about: http://www.adslgr.com/forum/ a vBulletin forum with a similar failure.
Comment 52 patrickdrd 2007-08-11 10:15:10 PDT
yes, that's my thread,

anyone have an answer?
Comment 53 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-11 10:42:53 PDT
I'm not sure what sort of answer you're looking for.  The thread has no indication of the actual steps to reproduce the problem (especially steps that could be followed by someone who does not know modern Greek well).

If you're still having a problem on that site with builds from this morning, check whether the issue started when the first patch for this bug got checked in?  That would tell us whether this bug is even relevant to your problem.
Comment 54 patrickdrd 2007-08-13 05:30:22 PDT
I don't know when this bug started,
one thing I know though is that it started when I began using minefield,
worked fine with fx 2.0.0.6 and gran paradiso!

Someone who knows Greek can follow these steps in order to reproduce it:
1. Login (or register if you don't have an account, then login) to http://www.adslgr.com
2. Go to any thread in the forum and try to post a quick reply by clicking the submit button -> you'll get a "please wait" message (must be a div or something) and the page hangs there (no post takes place).
However, if you go through the normal reply process, everything is ok.
Comment 55 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-13 08:26:06 PDT
> I don't know when this bug started,

In that case, please file a new bug so we can figure out what caused the problem, get blocking flags set as needed, etc.

Note that this was hardly the only form submission change since the 1.8 branch.

> Login (or register if you don't have an account,

That's basically a non-starter, for what it's worth.  Would you be willing to narrow down when the problem started using builds from http://archive.mozilla.org/pub/firefox/nightly/ and ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ ?  You'll want the dated -trunk builds.  Again, put the resulting information in the new bug you file.  And please cc me on that bug
Comment 56 Benjamin Gavin 2007-08-20 07:46:04 PDT
Hrm... this bug is showing back up in the nightly build...

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007081905 Minefield/3.0a8pre

The Ars Technica forums no longer work [again]...
Comment 57 monkeypox37 2007-08-20 14:07:12 PDT
I just tested with the 8/20 nightly and latest hourly:

Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a8pre) Gecko/2007082013 Minefield/3.0a7 ID:2007082013

And it still WFM. I looked back through Bonsai before testing and nothing jumped out at me, did it give the exact same error about MESSAGE_BODY being a required field or whatever?
Comment 58 Benjamin Gavin 2007-08-20 14:22:40 PDT
It auto-upgraded to:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007082005 Minefield/3.0a8pre

It's still broken for me, getting "TOPIC_MESSAGE_OID is a mandatory field. You must enter a value for it." when editing a post, and "MESSAGE_BODY is a mandatory field. You must enter a value for it." when posting a new message.  'Quick Reply' still works correctly as expected.  The exact same messages I had been seeing prior to the fix.
Comment 59 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-20 14:34:44 PDT
OK.  So what are the two nightly (or even better hourly) builds between which the problem reappeared?
Comment 60 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-20 20:07:22 PDT
Changing the order apparently causes bug 392982...  Trying to figure out why.
Comment 61 Jonas Sicking (:sicking) PTO Until July 5th 2007-08-21 01:47:39 PDT
The question in comment 31 remains unanswered. Does fixing this bug actually fix any real-world problems? Or are we simply doing it to do what the spec says?

It is at this point obvious that the fix for this bug is causing multiple sites to break, so there needs to be some significant value added for it to be worth it for us and our users.
Comment 62 Martin v. Löwis 2007-08-21 10:13:14 PDT
(In reply to comment #61)
> Does fixing this bug actually fix any real-world problems?

Most definitely. Adding a Content-Type makes it possible to add a charset= parameter. This, in turn, makes it possible to specify the encoding used to transmit the fields of the form. It resolves long-standing issues in entering non-ASCII data into forms, even if the page encoding is unknown or does not support the characters being entered. Past bugs that are addressed by the patch are Bug 324964 and Bug 135762; there probably have been more reports of this issue over the years.
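
As a rough illustration of what such a part could look like on the wire, here is a minimal Python sketch. The helper name build_form_part and the exact header value "text/plain; charset=UTF-8" are assumptions made for this example; they are not taken from the attached patches.

```python
def build_form_part(boundary: str, name: str, value: str,
                    charset: str = "UTF-8",
                    with_content_type: bool = True) -> bytes:
    """Serialize one non-file multipart/form-data part (hypothetical helper)."""
    lines = [
        f"--{boundary}",
        f'Content-Disposition: form-data; name="{name}"',
    ]
    if with_content_type:
        # The per-part header this bug asks for: it tells the server how
        # the bytes of the value are encoded.
        lines.append(f"Content-Type: text/plain; charset={charset}")
    head = "\r\n".join(lines).encode("ascii")
    # A blank line separates the part headers from the value.
    return head + b"\r\n\r\n" + value.encode(charset) + b"\r\n"


with_header = build_form_part("XYZ", "text", "caf\u00e9")
without_header = build_form_part("XYZ", "text", "caf\u00e9",
                                 with_content_type=False)
print(with_header.decode("utf-8"))
```

Without the extra header, the two parts carry the same value bytes, and the receiver is left to guess how to decode them.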
Comment 63 Jonas Sicking (:sicking) PTO Until July 5th 2007-08-22 15:45:01 PDT
How do other browsers deal with this issue? I'm very unhappy about breaking as many sites as this potentially breaks.

Couldn't sites that want to support other encodings use the enctype attribute?
Comment 64 Martin v. Löwis 2007-08-22 22:53:51 PDT
> Couldn't sites that want to support other encodings use enctype attribute?

No. enctype specifies the Content-type for the entire POST message, not for the individual parts. It is "multipart/form-data" in all cases that are relevant for this bug - see the bug title. Please study all relevant specifications carefully.
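
To make the distinction concrete, the following Python sketch parses a top-level header of the kind a browser sends for such a form. The boundary string is invented for the example; the point is only that this header carries a boundary parameter and declares no character encoding.

```python
from email.message import Message

# Top-level request header produced for enctype="multipart/form-data".
# The boundary string here is an invented example.
header = Message()
header["Content-Type"] = "multipart/form-data; boundary=----example-boundary"

media_type = header.get_content_type()    # the enctype value itself
boundary = header.get_param("boundary")   # separator between the parts
charset = header.get_content_charset()    # None: no encoding is declared

print(media_type, boundary, charset)
```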
Comment 65 Jonas Sicking (:sicking) PTO Until July 5th 2007-08-23 00:22:36 PDT
Well, no matter what the specs say we need to come up with a solution that doesn't break loads of sites.

If the entire message is encoded using the encoding in enctype, aren't the individual parts going to be encoded in that encoding too?
Comment 66 Martin v. Löwis 2007-08-23 00:36:01 PDT
(In reply to comment #65)
> Well, no matter what the specs say we need to come up with a solution that
> doesn't break loads of sites.

Is there any proof that the version proposed in comment #46 breaks a lot of sites?

> If the entire message is encoded using the encoding in enctype, aren't the
> individual parts going to encoded in that encoding too?

Please, PLEASE read the specs before making statements like that. The enctype does not include an encoding.
Comment 67 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-23 00:43:49 PDT
> Is there any proof that the version proposed in comment #46

It breaks Yahoo Mail at least (and therefore any site that uses the same server-side setup).  And it's only been in the trunk for less than two weeks, which means it's not gotten any real testing yet.  Note that breaking "lots" of sites is equivalent to breaking a few (or one) high-profile sites for compat purposes.

Now I'm hopeful that Yahoo rolled their own thing and will fix it, but if that's not the case, this patch will need to come out.
Comment 68 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-08-23 13:26:33 PDT
Two other notes.

1) We're at a point in the release cycle where the focus is on blockers, and this bug is not one of them.  So effort to make this stuff work will need to come from people who deeply care about it.  I suggest contacting Yahoo and seeing what they're up to, for a start.

2) If it turns out that we can't just enable it, the next obvious thing to try is a way for pages to opt into it.  That could even get standardized by the HTML WG.
Comment 69 Jonas Sicking (:sicking) PTO Until July 5th 2007-08-23 13:30:56 PDT
Even if Yahoo fixes their thing I'm very worried that there are loads of other form libraries out there that do the same thing. If high profile professional sites like yahoo use sloppy parsing, you can bet that there are tons of home-rolled parsing libraries that do too.

The burden of proof really goes the other way, we should have proof that the patch does not break sites. Especially with formats as old as this one. And extra especially now once we have seen that multiple sites break from various versions of the patch.
Comment 70 Robert Siemer 2007-09-24 15:40:17 PDT
I expect this bug report to get reopened because of bug 392982, so let me
outline the key points of this bug report:
-implement the standard
-avoid breaking a bunch of sites that can't handle the standard

I want to point out that this bug is not about implementing something else,
a new non-standard thing or whatever. That's because:
a) some proposed non-standard solutions (e.g. adding a proprietary
   HTTP header) do not conflict with the standard solution itself,
   so there is no need to mix them
b) there is already one non-standard solution for the problem
   ("_charset_" form field); I'm not going to fight for a second.
c) my bug 379858 got closed referring to this one. I'm definitely going to
   reopen it if this one is turned into something different


So the real problem is that some sites choke when the browser talks
standard to them. There is actually no provable complete solution to this
problem as _any_ visible change could break a site. - If you can't find one,
I can make one! (This is why I disagree with Jonas.)

But that is not important. What matters is making sure that big, well-known, old
applications (web sites) see the old browser behavior if they are known to fail
on the new one.

Why "big, well known and old sites" only?
-new apps will get tested with standard browsers like Firefox and the bug
 will be seen from the very beginning
-"unknown" sites usually assume "the browser is right, the app is wrong"
-small sites are unknown sites... (-: ... or have a flexible development
 team that corrects the problem in time


How to detect these sites? A labor-intensive (manual) solution would be a
(domain-/url-)blacklist. It is especially effective for the "old" criterion.
As time passes the blacklist will grow more slowly and later on need no
maintenance at all, as we can assume that after some months or years any
site in question is either not old or not well known. <-:

But I actually have a better idea, as I prefer solutions that need no
manual work at all: check if the page with the form to submit has parsing
errors. (I would like to say "renders in quirks mode", but that is not the
same.)
Pro: Yahoo Mail and any big corporate sites fail that test for sure (-:
Contra: most other sites, especially new sites, do probably fail, too...

Conclusion: anyone keen on standards gets his/her solution, while anyone else
sees the old behavior. - Problem solved.

I have even more fine tuning in mind, but I will come back to that in
my next comment.

Robert
Comment 71 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-09-24 15:54:38 PDT
Feel free to post patches to implement the behavior you think should be happening.  Then we can discuss it.
Comment 72 David Nesting 2007-09-24 16:17:08 PDT
There are three components to a form submission: (1) the referrer, (2) the browser, and (3) the form processor.  (1) and (3) may not be under the control of the same entity.  If you are a site that gets many POSTs from 3rd-party sites, you can't possibly get all of them to include the _charset_ parameter in their forms unless you block their submissions until they do.

By placing the character encoding either in the MIME headers of the multipart/form-data content, or within a (non-standard) HTTP header, it's not necessary for the form to "opt-in" for the form processor to benefit.

Making this feature work as-is, but only with forms on pages rendered in standards compliance mode, helps only for "intra-site" form submissions.  The real problem this feature is meant to solve is with form submissions made by unpredictable 3rd-party sites.  The fact that the referring page is or is not standards-compliant may have nothing to do with how the form processor itself is written, which is really the barrier we seem to be facing today.
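
For comparison, here is a minimal sketch of how a form processor might use the _charset_ field mentioned above when the referring form does include it. The function name decode_fields, the dict-of-bytes input, and the UTF-8 fallback are all assumptions made for this example; extracting the raw fields from the request is left out.

```python
import codecs
from typing import Dict


def decode_fields(raw_fields: Dict[str, bytes],
                  fallback: str = "utf-8") -> Dict[str, str]:
    """Decode raw form values using the _charset_ field when it is present.

    `raw_fields` maps field names to undecoded bytes (hypothetical input).
    """
    # Encoding labels are plain ASCII; drop anything else defensively.
    label = raw_fields.get("_charset_", b"").decode("ascii", errors="ignore").strip()
    charset = fallback
    if label:
        try:
            codecs.lookup(label)  # raises LookupError for unknown labels
            charset = label
        except (LookupError, ValueError):
            pass  # keep the fallback
    return {
        name: value.decode(charset, errors="replace")
        for name, value in raw_fields.items()
        if name != "_charset_"
    }


print(decode_fields({"_charset_": b"windows-1252", "text": b"caf\xe9"}))
```

When the form omits _charset_, the sketch can only fall back to a guess, which is the situation described above for 3rd-party referrers.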
Comment 73 Jonas Sicking (:sicking) PTO Until July 5th 2007-09-24 17:12:10 PDT
I'm not a big fan of the parsing error solution. First off, as David points out, it doesn't really solve the problem. Second, it seems very unpredictable and illogical for a web developer that if they change a completely separate part of the page, the form submission format changes. What would happen if Yahoo fixed their web pages? Should we punish them by "breaking" their form submissions?

There is no value in implementing standards for the sake of implementing standards. We implement standards to move the web forward. This standard is known to break sites making us, and probably many other browser vendors, very hesitant to implement it.

As I've stated before, I don't want to ship a beta with yahoo broken. So if someone wants another solution, please provide a patch soon. Probably within a week.
Comment 74 Johnny Stenback (:jst, jst@mozilla.com) 2007-10-24 15:20:05 PDT
Created attachment 286068 [details] [diff] [review]
Backout of the previous attachment.

This patch is the reverse of the previous attachment in this bug. This is being backed out due to it causing regression bug 392982. I'm attaching this here partly to test a build with the previous patch backed out; there are no real differences between this patch and the reverse of the previous attachment.
Comment 75 Johnny Stenback (:jst, jst@mozilla.com) 2007-10-24 15:24:49 PDT
Reopening since this got backed out. See bug 392982 for quite a bit of discussion around what this caused and how to possibly re-land this. Clearing blocking1.9+ on this bug as I don't think we'll have the time to look into a fix for this that doesn't cause bug 392982 in time for 1.9.
Comment 76 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-10-24 15:35:50 PDT
jst, I think you need to back out both patches that went in for this bug, not just the second one.... Otherwise you reintroduce bug 384270.

Reinstating the blocking flag, and nominating for beta blocking, since now we're in a known-broken state that we shouldn't be shipping for beta.  Once the first attachment is backed out, we should undo the blocker settings.
Comment 77 Johnny Stenback (:jst, jst@mozilla.com) 2007-10-24 15:38:54 PDT
Ok, backing out the other patch then too...
Comment 78 Johnny Stenback (:jst, jst@mozilla.com) 2007-10-24 15:49:23 PDT
Created attachment 286076 [details] [diff] [review]
Backout of both fixes that went in for this bug.

Boris, please have a look at this patch, this is a combined backout of the two fixes for this bug (already checked in).
Comment 79 Johnny Stenback (:jst, jst@mozilla.com) 2007-10-24 15:51:35 PDT
Clearing blocker flags again as both parts of this bug are now backed out.
Comment 80 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-10-24 15:58:26 PDT
Yeah, that second backout patch looks good.
Comment 81 Evan Jones 2012-05-02 14:22:06 PDT
I believe this bug can be closed. HTML5 now explicitly forbids the Content-Type header:

"The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified."

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data


So this is either not a bug, or the HTML5 specification needs revision.
Comment 82 hussdl 2014-05-27 09:24:09 PDT
So this is what Firefox is currently sending to my server [you will be able to guess which parts have been altered by me]:

POST http://[removed] HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: [removed]
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------294571387113960
Content-Length: [whatever]

-----------------------------294571387113960
Content-Disposition: form-data; name="utf-8"


[some bytes which happen to be a utf-8 sequence]
-----------------------------294571387113960
Content-Disposition: form-data; name="format"


[some bytes which happen to be ascii text]
-----------------------------294571387113960
Content-Disposition: form-data; name="text"

[some bytes which happen to be a utf-8 sequence]
-----------------------------294571387113960--


Please enlighten me: How is the server supposed to know that the encoding of the MIME parts is UTF-8? The MIME spec clearly states that in the absence of a Content-Type header, the correct content type is "text/plain; charset=us-ascii" (as stated in a 13-year-old comment).

What really bugs me is [rest of comment unrecoverable]
Comment 83 Boris Zbarsky [:bz] (Out June 25-July 6) 2014-05-27 09:45:21 PDT
> How is the server supposed to know that the encoding of the MIME parts is UTF-8?

By assuming it's the encoding of the page that the form was on.  Yes, this sucks.  When we tried to fix it, we discovered that too many servers are too broken to allow us to send that information in the POST data.

If you have constructive suggestions for communicating that information, please raise them with the spec...
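
A minimal sketch of that server-side assumption, in Python: it parses a multipart/form-data body with the standard library's MIME parser and decodes each text field with a per-part charset if one is present, otherwise with whatever encoding the server believes the page used. The function name parse_form and the page_charset parameter are hypothetical, and the sample body is invented.

```python
from email.parser import BytesParser


def parse_form(content_type: str, body: bytes, page_charset: str) -> dict:
    """Decode text fields of a multipart/form-data body (illustrative only)."""
    # Reuse the stdlib MIME parser by prepending the request's Content-Type.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser().parsebytes(raw)
    if not message.is_multipart():
        raise ValueError("not a multipart body")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        data = part.get_payload(decode=True)
        # Browsers omit a per-part charset, so fall back to the encoding of
        # the page that held the form; the server has to know or guess it.
        charset = part.get_content_charset() or page_charset
        fields[name] = data.decode(charset, errors="replace")
    return fields


body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="text"\r\n'
    b"\r\n"
    b"caf\xc3\xa9\r\n"
    b"--XYZ--\r\n"
)
fields = parse_form("multipart/form-data; boundary=XYZ", body, "utf-8")
print(fields)
```

Passing "us-ascii" as page_charset, the MIME default, would turn the two UTF-8 bytes of the accented character into replacement characters, so the result depends entirely on the server's guess.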
