116346 - Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[form sub]

Reporter

Description

•

24 years ago

Form data should have Content-Type header when its enctype attribute is "multipart/form-data". W3C Documentation: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2 Testcase: http://www.wakaba.com/~hiji/form-data-test/ Original Report in Bugzilla-jp: http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=1697

rods (gone)

Comment 1

•

24 years ago

->>

Assignee: rods → alexsavulov

Katsuhiko Momoi

Comment 2

•

24 years ago

Let me clarify the situation a little bit. The test case supplied above sends form input data directly from the form rather than via file uploads. Current Mozilla builds actually append appropriate Content-Type headers, without the charset parameter, if the user attaches a file to a multipart/form-data form. So this report concerns only the case in which data is sent directly from the form. It is my understanding that there has been a long tradition of browsers sending text data from forms in the same encoding/charset as the web page on which the form resides. In other words, servers expect the data back in the same encoding. Both HTML 4.01 cited above and RFC 2388 say that the Content-Type header is optional. When Mozilla generates a Content-Type header for file uploads, it intentionally omits the charset parameter because there is no clear way to determine the content encoding of the file being uploaded. We discussed before creating a dialog which asks for the charset of the uploaed file but dropped the idea because most users may know what this question means in the first place. This gives us a good opportunity to review the current specs in this area for a variety of cases/data.

Katsuhiko Momoi

Comment 3

•

24 years ago

There were some spelling errors in my comment above. Let me correct them as below: "We discussed before creating a dialog which asks for the charset of the uploaed file but dropped the idea because most users may know what this question means in the first place." should read instead: "We discussed before creating a dialog which asks for the charset of the uploaded file but dropped the idea because users may not know what this question means in the first place."

Frank Tang

Comment 4

•

24 years ago

As I remember, the reason we did not dare to add it is that, in experiments, it broke too many web sites.

Boris Zbarsky [:bzbarsky]

Comment 5

•

24 years ago

So.... is the request to put: Content-Type: text/plain; charset=foo (where "foo" is the encoding of the page the form was in) on all the non-file-control data parts?

Alexandru Savulov

Comment 6

•

24 years ago

setting milestone and component

Component: HTML Form Controls → Form Submission

Summary: Content-Type should be supplied for form data of 'enctype="multipart/form-data"' → Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[from sub]

Target Milestone: --- → mozilla1.2

John Keiser (jkeiser)

Comment 7

•

24 years ago

I know Pollmann tested charset and *that* broke a bunch of websites; but what about plain old Content-Type? Is there anything wrong with that? (Note that even if there /was/ something wrong with it, it may be fixed by now. Later versions of Apache are spread all across the web; and IIS has gone through many revisions since those tests were done.) See bug 18643 for charset discussion.

R.K.Aa.

Comment 8

•

23 years ago

*** Bug 126407 has been marked as a duplicate of this bug. ***

Martin v. Löwis

Comment 9

•

23 years ago

RFC 1867 clearly specifies that the parts of a multipart/form-data should have a Content-Type, see the citation in bug 126407. This is particularly important to identify the charset of input fields.

Vladimir Ermakov

Updated

•

23 years ago

Priority: -- → P4

Jungshik Shin

Comment 10

•

22 years ago

I agree with Martin. It's important to have C-T with charset when submitting a form (when uploading a text/* file, Kat's comment #2 makes sense, although I tend to think allowing user control wouldn't be that bad UI-wise) because a user can override the default MIME charset (in View|Character Coding). I thought Mozilla supported this, and wrote to that effect on the www-international list [1], but it turned out that I was wrong. This can be tested at the URL given in the URL field. In addition to RFC 1867, HTML 4.01 is clear about the need to add a C-T header (when C-T is NOT the default 'text/plain; charset=US-ASCII' or C-T-E is NOT the default 7bit). My interpretation of HTML 4.01 is different from Kat's here. The repeated references to RFC 2045 and the following sentence have to be interpreted as requiring C-T/C-T-E for all cases _other than_ "text/plain; charset=US-ASCII" and "7bit": <quote> As with all multipart MIME types, each part has an optional "Content-Type" header that defaults to "text/plain". User agents should supply the "Content-Type" header, accompanied by a "charset" parameter. </quote> In the above, I believe 'optional' is a bit misleading. The intent is likely to have been that it's optional only when its value is the default value 'text/plain; charset=US-ASCII'. Otherwise, I believe it's mandatory. Now the question is whether we'd still have a problem (comment #4 and comment #7) with many CGI programs/web servers/server-side scripts (jsp, php, asp) if we add C-T and C-T-E header fields to each part of multipart/form-data. It's likely that we do, but ..... I wish HTML 4.01 had been a lot more explicit about the need for a C-T header field for non-default cases instead of just referring to RFC 2045. [1] a thread of articles beginning with http://lists.w3.org/Archives/Public/www-international/2003JulSep/0029.html

URL: http://www.runout.org/html-form-test/...

Boris Zbarsky [:bzbarsky]

Updated

•

19 years ago

Summary: Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[from sub] → Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[form sub]

Boris Zbarsky [:bzbarsky]

Updated

•

19 years ago

Blocks: 289060

Mike Kaply [:mkaply]

Comment 11

•

18 years ago

How easy would it be to add this functionality as a preference turned off by default so people could at least test what it breaks?

Boris Zbarsky [:bzbarsky]

Comment 12

•

18 years ago

Probably pretty easy. I'll be happy to review if someone posts a patch.

Mike Kaply [:mkaply]

Comment 13

•

18 years ago

Attached patch Use a pref to decide to attach the charset to content type

I've added a pref; if it is set, charset is appended only in the multipart/form-data case. Is this what people were looking for?

David Nesting

Assignee

Comment 14

•

18 years ago

The above patch appears to append the charset parameter only to the HTTP request's Content-Type. A charset parameter here has no meaning and its behavior is not defined by any specification. I believe the request is to add it to each *part* of the multipart/form-data entity:

Content-Type: multipart/form-data; boundary="foo"

--foo
Content-Disposition: form-data; name="field"
Content-Type: text/plain; charset=utf-8

value
--foo--

Today, this Content-Type header is absent entirely. It ought to be safe to *add* it (since nobody expects it) without causing too many problems. See also bug 379858, which may be a duplicate of this one. Bug 379858 comment 1 contains a simple patch that implements the behavior described above.
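For anyone who wants to experiment with the layout above outside the browser, here is a minimal Python sketch. It only illustrates the wire format and is not Mozilla's implementation; the helper name `build_form_data` and the fixed boundary `foo` are assumptions made for the example.

```python
CRLF = "\r\n"


def build_form_data(fields, charset="utf-8", boundary="foo"):
    """Serialize text fields as multipart/form-data, one part per field.

    Each part carries its own Content-Type with a charset parameter,
    placed after Content-Disposition as in the example above.
    """
    body = bytearray()
    for name, value in fields:
        head = (
            f"--{boundary}{CRLF}"
            f'Content-Disposition: form-data; name="{name}"{CRLF}'
            f"Content-Type: text/plain; charset={charset}{CRLF}"
            f"{CRLF}"
        )
        body += head.encode("ascii")   # headers are assumed ASCII-only here
        body += value.encode(charset)  # the value uses the declared charset
        body += CRLF.encode("ascii")
    body += f"--{boundary}--{CRLF}".encode("ascii")
    # The request-level header carries only the boundary, no charset.
    request_type = f'multipart/form-data; boundary="{boundary}"'
    return request_type, bytes(body)


request_type, body = build_form_data([("field", "value")])
```

Note that the charset appears only on the individual part; the request-level Content-Type stays free of it.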

Boris Zbarsky [:bzbarsky]

Comment 15

•

18 years ago

> It ought to be safe to *add* it (since nobody expects it)

See comment 7. It _ought_ to be safe to, but in practice, given the number of broken web servers out there, any change like that requires serious testing.

David Nesting

Assignee

Comment 17

•

18 years ago

Attached patch Adds Content-Type with charset to each form-data part (obsolete)

(As requested, copied from bug 379858 comment 1:)

This is a perhaps naive attempt at adding the requisite header for each form-data part of a multipart/form-data submission. I have also created a tool at http://fastolfe.net/2007/05/06/post-charsets for testing browser behavior. The tool will treat anything ambiguous as US-ASCII, to make ambiguous cases obvious (invalid characters are replaced). A non-ASCII submission with a normal build of Firefox will see the submission garbled, while a submission with a patched Firefox works correctly.

This patch does NOT address:
* non-ASCII form field names
* application/x-www-form-urlencoded submissions
* non-ASCII form values that cannot be encoded in the chosen character encoding

(The latter case causes Firefox to replace the character with an HTML entity, which IMO is also broken behavior.)
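The two behaviours mentioned here can be roughly reproduced in Python. This is a sketch under assumptions, not the code of the test tool or of Firefox: the function names are hypothetical, and the standard `replace` and `xmlcharrefreplace` error handlers merely stand in for "invalid characters are replaced" and "replace the character with an HTML entity".

```python
def decode_as_ascii(raw: bytes) -> str:
    """Treat undeclared bytes as US-ASCII; each invalid byte becomes U+FFFD."""
    return raw.decode("ascii", errors="replace")


def encode_with_entities(value: str, charset: str) -> bytes:
    """Encode a value, turning unencodable characters into &#NNNN; references."""
    return value.encode(charset, errors="xmlcharrefreplace")


# A UTF-8 submission read as US-ASCII comes out visibly garbled.
garbled = decode_as_ascii("caf\u00e9".encode("utf-8"))

# The euro sign does not exist in ISO-8859-1, so it is sent as an entity.
entity = encode_with_entities("\u20ac5", "iso-8859-1")
```

The second case is the lossy one: the server cannot tell whether the user typed the character or the literal entity text.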

Robert Siemer

Comment 18

•

18 years ago

I liked my bug 379858 more because it had a better subject... (-: I will attach a copy of bug 289060 comment 8 here:

--------------------

I agree with David Nesting, the charset parameter should go with a Content-Type header on the individual parts of the MIME body. This is what the spec says. And I disagree with Boris Zbarsky saying that this caused major issues. I reviewed the bug reports and none of them is mentioning problems with the enctype multipart/form-data, all seemed to have used application/x-www-form-urlencoded. Additionally, these issues were 8 years ago. I also disagree with the conclusions drawn on these bug reports.

But first a summary; and I will restrict myself to HTML4. The standard knows about forms to be submitted with:

1) HTTP GET (always application/x-www-form-urlencoded)
2) POST application/x-www-form-urlencoded
3) POST multipart/form-data

For 1) there is technically no way to attach meta-data, as the form data gets attached as the "query" to the URI. Although it is defined how all possible octets can be included in a URI, application/x-www-form-urlencoded restricts itself to US-ASCII when it comes to transforming characters into octets. So the octet/byte representation of a character outside US-ASCII is not specified with application/x-www-form-urlencoded.

Numbers 2) and 3), using POST, have a way to specify meta-data. They "bootstrap" on the HTTP Content-Type header, which is sent with a POST and describes the "form" of the HTTP POST body. Unfortunately, number 2) specifies application/x-www-form-urlencoded, which has no defined way to attach any other meta-data. Mozilla/Firefox did something like:

Content-Type: application/x-www-form-urlencoded; charset=...

which was WRONG from the very beginning. The charset attribute can't be attached to any content-type at will; it is basically only defined for text/... types. An illustrating example:

Content-Type: image/jpeg; charset=...

is wrong as well, as images have no charsets.

Some people would argue that it should have the same meaning as for e.g. text/html, but that interpretation would yield a different thing. See this example:

Content-Type: text/html; charset=us-ascii

...<html> ... <p> &#8226;

The charset describes the coding of the HTML, not what the entity reference #8226 in the HTML means (which would be outside of ASCII anyway). So, as the x-www-form-urlencoded content-type is always within ASCII, a charset attribute is useless. And the meaning of the percent-escaped octets in that form is described by the x-www-form-urlencoded spec only, not by its presentation charset.

So let's go with number 3) and do it right this time. multipart/form-data is a MIME type. These are outlined in RFC 2045. MIME multipart types allow the inclusion of multiple parts (you guessed it!) and the inclusion of meta-data for every part. Firefox/Mozilla doesn't include a Content-Type header for these parts, so it defaults to "text/plain; charset=us-ascii". Sending octets outside the 0-127 range in a multipart/... without a Content-Type: header violates RFC 2045 and forces the reader to guess. The correct behavior would be to include in every non-ASCII-only part:

Content-Type: text/plain; charset=...

It is shocking to see no support for HTTP/1.1, HTML4 and MIME in Seamonkey/Firefox; the first two standards are now over 7 years old, MIME over 10.

Taking _charset_ into the game: it is a "solution" that involves modifying the original HTML form, including a hidden field with the name "_charset_". This hidden field gets "automatically" assigned a value from the browser, the charset in use. It is like writing with your favorite font in a jpeg-image 'This is a jpeg,' as this name/value pair gets transported together with the data.
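The ambiguity for application/x-www-form-urlencoded can be made concrete with Python's standard library, used here purely for illustration (it has nothing to do with the browser code). The same character yields different percent-escapes depending on the encoding the sender happens to pick, and nothing in the payload says which one was used.

```python
from urllib.parse import quote

# One character, two possible octet sequences on the wire.
utf8_form = quote("\u00e9", encoding="utf-8")         # e-acute as UTF-8
latin1_form = quote("\u00e9", encoding="iso-8859-1")  # e-acute as ISO-8859-1

# Both results are pure ASCII, so a charset label on the
# urlencoded body itself would describe nothing useful.
both_ascii = utf8_form.isascii() and latin1_form.isascii()
```

A receiver that sees only the escaped form has to guess which of the two encodings produced it.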

Boris Zbarsky [:bzbarsky]

Comment 19

•

18 years ago

> I reviewed the bug reports and none of them is mentioning problems with the
> enctype multipart/form-data, all seemed to have used
> application/x-www-form-urlencoded.

Ah, excellent. In that case, yeah, we should do this for the multipart/form-data POSTs. Thanks for looking into that!

Boris Zbarsky [:bzbarsky]

Comment 20

•

18 years ago

Comment on attachment 264033
Adds Content-Type with charset to each form-data part

>+ + NS_LITERAL_CSTRING("Content-Type: text/plain; charset=")
>+ + mCharset
>+ + NS_LITERAL_CSTRING(CRLF)

So the only concern I have here is that if mEncoder is null we'll end up using UTF8, not mCharset, for the encoding. We could maybe set mCharset to "UTF-8" in the constructor if mEncoder is null, or we could null-check here (because mCharset is used for some weird bidi stuff that I don't quite understand). Simon, would it be safe to just reset mCharset if it's a charset we don't have an encoder for?

Attachment #264033 - Flags: review?(smontagu)

Simon Montagu :smontagu

Comment 21

•

18 years ago

I think that the worst that would happen is that it might break the weird bidi stuff which nobody understands and is probably broken anyway because it makes some very unsafe assumptions about correlation between the document character set and the characters that might be included in the form submission.

Boris Zbarsky [:bzbarsky]

Comment 22

•

18 years ago

Yeah, that stuff was the part I was worried about. OK, then. David, want to make that change? Just reset mCharset to UTF-8 in the constructor if mEncoder is null?

Simon Montagu :smontagu

Comment 23

•

18 years ago

(In reply to comment #22)
> Just reset mCharset to UTF-8 in the
> constructor if mEncoder is null?

I suggest doing it in GetSubmissionFromForm() if GetEncoder() fails.

David Nesting

Assignee

Comment 24

•

18 years ago

How can I test this null encoder case? When I attempt to use a bogus charset in the form submission, mCharset contains "UTF-8".

Boris Zbarsky [:bzbarsky]

Comment 25

•

18 years ago

I don't think there's an easy way to test it. You'd need some charset for which we have a decoder (so we can load the page as that charset) but do not have an encoder... I guess you could hack nsFormSubmission::GetSubmitCharset to return a bogus charset. That should work.

David Nesting

Assignee

Comment 26

•

18 years ago

After getting GetSubmitCharset to return a bogus charset, I couldn't get a form to submit at all, even without my other changes. If we intend this situation to result in a useful POST, I don't think it's working that way today. Assuming that is a goal, though, and it just isn't working right now for other reasons, is this the type of check that should be done in GetSubmissionFromForm()?

  // Get unicode encoder
  nsCOMPtr<nsISaveAsCharset> encoder;
  nsFormSubmission::GetEncoder(aForm, charset, getter_AddRefs(encoder));
+ if (encoder == nsnull)
+   charset.AssignLiteral("UTF-8");

If that looks reasonable, I'll post an updated patch. It seems to work OK, but like I said, I can't get meaningful behavior either way in the null encoder (bogus GetSubmitCharset charset) case.
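The intent of the check, keeping the advertised charset in sync with the encoder that is really used, can be sketched outside Gecko. The following Python analogue is an illustration only: `resolve_submit_charset` is a hypothetical name, and Python's codec registry stands in for the encoder lookup.

```python
import codecs


def resolve_submit_charset(charset: str) -> str:
    """Return charset if an encoder exists for it, otherwise fall back to UTF-8.

    The returned label is what would be written into the part's
    "Content-Type: text/plain; charset=..." header, so it must name
    the encoding that is actually applied to the value.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        return "UTF-8"
    return charset


known = resolve_submit_charset("ISO-8859-1")
fallback = resolve_submit_charset("no-such-charset")
```

Without the fallback, the header could announce a charset that was never used to encode the bytes.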

Boris Zbarsky [:bzbarsky]

Comment 27

•

18 years ago

I'd do |if (!encoder)|, but other than that that looks like what I wanted, yes.

David Nesting

Assignee

Comment 28

•

18 years ago

Attached patch Adds Content-Type with charset to each form-data part

This patch expands upon the previous by also forcing the mCharset to UTF-8 when no encoder is available.

Attachment #264033 - Attachment is obsolete: true

Attachment #264033 - Flags: review?(smontagu)

Boris Zbarsky [:bzbarsky]

Comment 29

•

18 years ago

Comment on attachment 264215
Adds Content-Type with charset to each form-data part

Looks good to me. sicking, would you sr?

Attachment #264215 - Flags: superreview?(jonas)

Attachment #264215 - Flags: review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

18 years ago

Attachment #264215 - Flags: superreview?(jonas) → superreview+

Boris Zbarsky [:bzbarsky]

Comment 30

•

18 years ago

Checked in. David, thanks for the patch! (For what it's worth, whatever tool you're using is producing broken diff files -- they're missing spaces at the beginning of empty context lines. Took me a few minutes to figure out why this wasn't applying.)

Assignee: alexsavulov → david

Flags: in-testsuite?

Boris Zbarsky [:bzbarsky]

Updated

•

18 years ago

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Brian Polidoro

Updated

•

18 years ago

Depends on: 387991

Johnny Stenback (:jst)

Comment 31

•

18 years ago

So it turns out that this broke existing sites. Some of the known ones are referenced in bug 384270. So the big question is: is the fix worth the bustage, and how much bustage is there out in the wild that we don't yet know about? I'm leaning towards backing this out to fix what broke. Or is there anything else that could be done to leave parts of this in w/o breaking existing sites (or at least not as many of them)?

Status: RESOLVED → REOPENED

Flags: blocking1.9+

Resolution: FIXED → ---

Johnny Stenback (:jst)

Updated

•

18 years ago

Depends on: 384270

Boris Zbarsky [:bzbarsky]

Comment 32

•

18 years ago

David, are you willing to get in touch with the various back-end folks whose software doesn't deal with this (Eve, etc) and see whether we can do a limited form of this that won't break them? Given the "It's probably a Minefield bug, let's see if they fix it in the beta" attitude in the Eve forum I'm not that hopeful... :( But maybe we'll get something from them.

Jed Wesley-Smith

Comment 33

•

18 years ago

The JIRA dev team accepts that this behaviour in Minefield is standard compliant and that this is a bug we should and will deal with. However, there are > 6000 JIRA instances out there as of now, including quite a few major public ones. The process of updating them all is going to take some time, so the symptoms are likely to persist for quite some time (> FF3 release). This is likely to be the case for the other back-end software as well. We would certainly prefer if there was an option to turn this behaviour on/off - with off as standard, and then turn it on by default in a later release.

Boris Zbarsky [:bzbarsky]

Comment 34

•

18 years ago

Jed, if it's off by default nothing will change and we still won't be able to enable it in a future release. I'm glad to hear that you guys will fix your end, but as you said there are other back-end packages, most of which will never even hear about the problem if the behavior defaults to off.... Is there by chance any aspect of this behavior that could be preserved without breaking existing JIRA installs?

Jed Wesley-Smith

Comment 35

•

18 years ago

Boris, we do understand the conundrum - we would also like to see the change. Unfortunately, there is very little that can be done about existing installs with the current FF3 behaviour that does not necessitate an upgrade or patch. We currently fail reasonably spectacularly. BTW. what is the release time-frame for FF3?

Jochen Wiedmann

Comment 36

•

18 years ago

(In reply to comment #33)
> We would certainly prefer if there was an option to turn this behaviour on/off
> - with off as standard, and then turn it on by default in a later release.

I second this as a call for more time.

Mike Kaply [:mkaply]

Comment 37

•

18 years ago

Can you give a brief explanation of why this breaks your code? What new codepath does this cause?

Christopher Owen

Comment 38

•

18 years ago

The issue with JIRA also affects Confluence as we use the same underlying multipart parser. We also accept that it is Confluence that is broken with regard to this and not Minefield. I'd like to propose that a switch be introduced so that web application may opt-in to have these data submitted as part of a form post. This would aid transition for broken implementations while allowing interested (and working) servers to use the new functionality. Maybe a meta element switch? e.g. <meta name="form.include.multipart.content-type" content="true" /> or something similar (instead of a global switch for the page you might want to have a space delimited list of form ids to enable it on). I think it is great that this capability has been included as it has often caused me frustration when authoring web apps in the past but the pragmatist in me suggests that we need to phase this in (and not just for our sake). We will of course look to get upcoming releases of Confluence fixed.

Mike Kaply [:mkaply]

Comment 39

•

18 years ago

Again, can someone please explain how exactly this is breaking the servers? I'm curious to understand how it is failing. Thanks

Jochen Wiedmann

Comment 40

•

18 years ago

(In reply to comment #39)
> Again, can someone please explain how exactly this is breaking the servers?
>
> I'm curious to understand how it is failing.

Michael, in the case of Jira or Confluence, it simply means that *any* form containing an upload button is unusable. As you can imagine, Jira contains an upload button on almost any page. In other words, you cannot use Jira or Confluence with current Gran Paradiso. Indeed, I stopped using Gran Paradiso immediately after I understood that I can avoid these problems by using Firefox. Likewise, this would prevent me from upgrading to Firefox 3 if it should contain the same change.

Martin v. Löwis

Comment 41

•

18 years ago

> Michael, in the case of Jira or Confluence, it simply means that *any* form
> containing an upload button is unusable.

Jochen, unfortunately, I think this does not answer Michael's question. He did not ask *what* exactly breaks, but *how* exactly it breaks. I.e. what specific algorithm on the server is invoked that works if Content-type is not included, but fails if it is included. E.g. what specific if condition in what specific source file of what specific library starts to misbehave.

Jed Wesley-Smith

Comment 42

•

18 years ago

well, when we investigate and fix we'll provide you the diff if you like. The actual library is the pell-multipart-request plugin for webwork, our fork of which is here: https://svn.atlassian.com/svn/public/contrib/tools/pell-multipart-request/trunk We have not investigated the actual errant code yet as the fix is not scheduled and the most relevant thing right now is the fact that it occurs at all. We may not even fix pell-multipart-request but write our own multipart handler from scratch.

Jochen Wiedmann

Comment 43

•

18 years ago

(In reply to comment #42)
> We may not even fix pell-multipart-request but write our own multipart handler
> from scratch.

OT: Before doing that, please consider using one of the multipart related Apache libraries, like commons-fileupload, or Mime4J. I am the author of the streaming API for commons-fileupload and the author of the pull parser API for Mime4J and absolutely willing to support, possibly as part of a contract, or as part of my Apache work. Helping you will ultimately help me.

Martin v. Löwis

Comment 44

•

18 years ago

(In reply to comment #42)

From inspection, it looks like the problem is in /src/main/java/http/utils/multipartrequest/MultipartRequest.java:MultipartRequest.parse, specifically

// At the top of loop, we assume that the Content-Disposition line is next, otherwise we are at the end.

This assumption now breaks; the first thing in the part will be Content-type, not Content-disposition. It seems that switching the order of the headers (i.e. putting Content-type after Content-disposition) might restore interoperability: the library later does expect that Content-type may follow before the actual data. In particular, a comment says

// FIX 1.14 IE Problem still: Check for content-type and extra line even though no file specified.

So apparently, MSIE already sends Content-type in other parts (at least in some releases under some circumstances), so if Firefox does the same, interoperability should be good for all sites that also support MSIE. Notice that the library explicitly supports Content-type being sent for file uploads (which it detects by checking for the presence of the filename= parameter in Content-disposition).

For Firefox, I would recommend that just the order of headers is switched. For pell-multipart-request, the right fix would be to read all header lines in each part until an empty line is seen, and extract content-disposition and content-type while doing so.
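The fix suggested for the server side, reading every header line of a part up to the empty line and not depending on order, could look roughly like the following. The library in question is Java; this Python sketch only illustrates the idea, and `read_part_headers` is a hypothetical name, not anything from pell-multipart-request.

```python
def read_part_headers(lines):
    """Consume header lines up to the first empty line, in any order.

    Returns (headers, remaining_lines); header names are lower-cased
    so that "Content-type" and "Content-Type" are treated alike.
    """
    headers = {}
    for index, line in enumerate(lines):
        if line == "":
            # Empty line: headers are done, the rest is the part's data.
            return headers, lines[index + 1:]
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, []


# Content-Type arrives first here, which the order-dependent parser rejects.
part = [
    "Content-Type: text/plain; charset=utf-8",
    'Content-Disposition: form-data; name="field"',
    "",
    "value",
]
headers, rest = read_part_headers(part)
```

With this approach both header orderings give the same result, so the parser no longer cares which one the browser sends.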

Boris Zbarsky [:bzbarsky]

Comment 45

•

18 years ago

Martin, thanks for looking into this! This is actually quite interesting. For file upload fields, we send:

  NS_LITERAL_CSTRING("Content-Disposition: form-data; name=\"")
  + nameStr + NS_LITERAL_CSTRING("\"; filename=\"")
  + filenameStr + NS_LITERAL_CSTRING("\"" CRLF)
  + NS_LITERAL_CSTRING("Content-Type: ") + aContentType
  + NS_LITERAL_CSTRING(CRLF CRLF);

We also send:

  NS_LITERAL_CSTRING("Content-Transfer-Encoding: binary" CRLF);

before that, but only if the browser.forms.submit.backwards_compatible preference is false. It defaults to true. See bug 58189 and bug 83065 for that sordid story. Perhaps we should restore that behavior by default and make sure that header comes after Content-Disposition (so that pell-multipart-request's stupid assumptions are satisfied) but before Content-Type (so that PHP's stupid assumptions are satisfied, if it's still making those stupid assumptions). This is a separate bug, in any case.

Moving on, for other form fields, this patch made us send:

+ NS_LITERAL_CSTRING("Content-Type: text/plain; charset=") + mCharset
+ NS_LITERAL_CSTRING(CRLF)
+ NS_LITERAL_CSTRING("Content-Disposition: form-data; name=\"")
+ nameStr + NS_LITERAL_CSTRING("\"" CRLF CRLF)

So indeed, the ordering is different. Let's switch that and see how compat looks?
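To see why the ordering alone matters, here is a deliberately simplified Python model of the assumption described in comment 44. It is not the pell-multipart-request code; `naive_is_part_start` is a hypothetical stand-in for a parser that treats anything other than Content-Disposition at the top of a part as the end of the data.

```python
def naive_is_part_start(first_header_line: str) -> bool:
    """Mimic a parser that assumes Content-Disposition is the first header."""
    return first_header_line.lower().startswith("content-disposition:")


# Header order produced by the first patch.
original_patch_order = [
    "Content-Type: text/plain; charset=utf-8",
    'Content-Disposition: form-data; name="field"',
]
# Header order after the proposed switch.
switched_order = list(reversed(original_patch_order))

accepts_original = naive_is_part_start(original_patch_order[0])
accepts_switched = naive_is_part_start(switched_order[0])
```

Under this model only the switched order is recognised as a part, which matches the layout already used for file upload fields.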

Boris Zbarsky [:bzbarsky]

Comment 46

•

18 years ago

Attached patch Like so

Attachment #275968 - Flags: superreview?(jst)

Attachment #275968 - Flags: review?(jst)

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

18 years ago

Priority: P4 → --

QA Contact: madhur → form-submission

Target Milestone: mozilla1.2alpha → ---

Johnny Stenback (:jst)

Comment 47

•

18 years ago

Comment on attachment 275968
Like so

Yeah, let's get this in and tested ASAP. r+sr=jst

Attachment #275968 - Flags: superreview?(jst)

Attachment #275968 - Flags: superreview+

Attachment #275968 - Flags: review?(jst)

Attachment #275968 - Flags: review+

Boris Zbarsky [:bzbarsky]

Comment 48

•

18 years ago

Checked in.

Status: REOPENED → RESOLVED

Closed: 18 years ago → 18 years ago

Resolution: --- → FIXED

monkeypox37

Comment 49

•

18 years ago

The patch works with the Arstechnica forums (EVE), nice work devs. :)

patrickdrd

Comment 50

•

18 years ago

please take a look at this one too: http://forums.mozillazine.org/viewtopic.php?p=3007352#3007352

Volkmar Kostka

Comment 51

•

18 years ago

Better see http://forums.mozillazine.org/viewtopic.php?t=574762 It is about: http://www.adslgr.com/forum/ a vBulletin forum with a similar failure.

patrickdrd

Comment 52

•

18 years ago

yes, that's my thread, anyone have an answer?

Boris Zbarsky [:bzbarsky]

Comment 53

•

18 years ago

I'm not sure what sort of answer you're looking for. The thread has no indication of the actual steps to reproduce the problem (especially steps that could be followed by someone who does not know modern Greek well). If you're still having a problem on that site with builds from this morning, check whether the issue started when the first patch for this bug got checked in? That would tell us whether this bug is even relevant to your problem.

patrickdrd

Comment 54

•

18 years ago

I don't know when this bug started, but one thing I do know is that it started when I began using Minefield; it worked fine with Fx 2.0.0.6 and Gran Paradiso! Someone who knows Greek can follow these steps in order to reproduce it:

1. Login (or register if you don't have an account, then login) to http://www.adslgr.com
2. Go to any thread in the forum and try to post a quick reply by clicking the submit button -> you'll get a "please wait" message (must be a div or something) and the page hangs there (no post takes place).

However, if you go through the normal reply process, everything is OK.

Boris Zbarsky [:bzbarsky]

Comment 55

•

18 years ago

> I don't know when this bug started,

In that case, please file a new bug so we can figure out what caused the problem, get blocking flags set as needed, etc. Note that this was hardly the only form submission change since the 1.8 branch.

> Login (or register if you don't have an account,

That's basically a non-starter, for what it's worth. Would you be willing to narrow down when the problem started using builds from http://archive.mozilla.org/pub/firefox/nightly/ and ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ ? You'll want the dated -trunk builds. Again, put the resulting information in the new bug you file. And please cc me on that bug.

Boris Zbarsky [:bzbarsky]

Updated

•

18 years ago

Depends on: 392046

Boris Zbarsky [:bzbarsky]

Updated

•

18 years ago

No longer depends on: 392046

Benjamin Gavin

Comment 56

•

18 years ago

Hrm... this bug is showing back up in the nightly build... Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007081905 Minefield/3.0a8pre The Ars Technica forums no longer work [again]...

monkeypox37

Comment 57

•

18 years ago

I just tested with the 8/20 nightly and latest hourly: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a8pre) Gecko/2007082013 Minefield/3.0a7 ID:2007082013 And it still WFM. I looked back through Bonsai before testing and nothing jumped out at me, did it give the exact same error about MESSAGE_BODY being a required field or whatever?

Benjamin Gavin

Comment 58

•

18 years ago

It auto-upgraded to: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007082005 Minefield/3.0a8pre It's still broken for me, getting "TOPIC_MESSAGE_OID is a mandatory field. You must enter a value for it." when editing a post, and "MESSAGE_BODY is a mandatory field. You must enter a value for it." when posting a new message. 'Quick Reply' still works correctly as expected. The exact same messages I had been seeing prior to the fix.

Boris Zbarsky [:bzbarsky]

Comment 59

•

18 years ago

OK. So what are the two nightly (or even better hourly) builds between which the problem reappeared?

Boris Zbarsky [:bzbarsky]

Updated

•

18 years ago

Depends on: 392982

Boris Zbarsky [:bzbarsky]

Comment 60

•

18 years ago

Changing the order apparently causes bug 392982... Trying to figure out why.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 61

•

18 years ago

The question in comment 31 remains unanswered. Does fixing this bug actually fix any real-world problems? Or are we simply doing it to do what the spec says? It is at this point obvious that this bug is causing multiple sites to break, so there needs to be some significant value added for it to be worth it for us and our users.

Martin v. Löwis

Comment 62

•

18 years ago

(In reply to comment #61)
> Does fixing this bug actually fix any real-world problems?

Most definitely. Adding a Content-type makes it possible to add a charset= parameter. This, in turn, makes it possible to specify the encoding used to transmit the fields of the form. It resolves long-standing issues in entering non-ASCII data into forms, even if the page encoding is unknown or does not support the characters being entered. Past bugs that are addressed with the patch are Bug 324964 and Bug 135762; there probably have been more reports of this issue over the years.
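From the receiving side, the benefit can be sketched as follows. This is a minimal Python illustration, assuming the server has already split the body into parts; `decode_field` is a hypothetical helper, and the standard `email.message.Message` class is used only to parse the charset parameter.

```python
from email.message import Message


def decode_field(raw: bytes, content_type=None, default="us-ascii") -> str:
    """Decode a form field using the charset declared on its part.

    Without a Content-Type header the MIME default (US-ASCII) applies,
    so non-ASCII bytes cannot be decoded reliably.
    """
    part = Message()
    if content_type is not None:
        part["Content-Type"] = content_type
    charset = part.get_content_charset() or default
    return raw.decode(charset)


declared = decode_field("L\u00f6wis".encode("utf-8"), "text/plain; charset=UTF-8")

try:
    decode_field("L\u00f6wis".encode("utf-8"))  # no header: falls back to US-ASCII
    undeclared_failed = False
except UnicodeDecodeError:
    undeclared_failed = True
```

With the header present the server no longer has to guess, whatever encoding the page itself was served in.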

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 63

•

18 years ago

How do other browsers deal with this issue? I'm very unhappy about breaking as many sites as this potentially breaks. Couldn't sites that want to support other encodings use enctype attribute?

Martin v. Löwis

Comment 64

•

18 years ago

> Couldn't sites that want to support other encodings use enctype attribute?

No. enctype specifies the Content-Type for the entire POST message, not for the individual parts. It is "multipart/form-data" in all cases that are relevant for this bug; see the bug title. Please study all relevant specifications carefully.
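The distinction can be sketched with Python's standard `email.message.Message` class: the request-level Content-Type carries only a boundary parameter, while a charset can only appear on the Content-Type of an individual part. The helper name `parse_content_type` and the header values are made up for illustration.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (media type, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    params = msg.get_params()  # [(media_type, ''), (param, value), ...]
    return msg.get_content_type(), dict(params[1:])


# What enctype="multipart/form-data" produces for the whole POST body.
request_type, request_params = parse_content_type(
    "multipart/form-data; boundary=XyZ")

# What a per-part header would look like if one were sent.
part_type, part_params = parse_content_type("text/plain; charset=utf-8")

print(request_type, request_params)
print(part_type, part_params)
```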

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 65

•

18 years ago

Well, no matter what the specs say we need to come up with a solution that doesn't break loads of sites. If the entire message is encoded using the encoding in enctype, aren't the individual parts going to be encoded in that encoding too?

Martin v. Löwis

Comment 66

•

18 years ago

(In reply to comment #65)
> Well, no matter what the specs say we need to come up with a solution that
> doesn't break loads of sites.

Is there any proof that the version proposed in comment #46 breaks a lot of sites?

> If the entire message is encoded using the encoding in enctype, aren't the
> individual parts going to be encoded in that encoding too?

Please, PLEASE read the specs before making statements like that. The enctype does not include an encoding.

Boris Zbarsky [:bzbarsky]

Comment 67

•

18 years ago

> Is there any proof that the version proposed in comment #46

It breaks Yahoo Mail at least (and therefore any site that uses the same server-side setup). And it's only been in the trunk for less than two weeks, which means it's not gotten any real testing yet. Note that breaking "lots" of sites is equivalent to breaking a few (or one) high-profile sites for compat purposes. Now I'm hopeful that Yahoo rolled their own thing and will fix it, but if that's not the case, this patch will need to come out.

Boris Zbarsky [:bzbarsky]

Comment 68

•

18 years ago

Two other notes.

1) We're at a point in the release cycle where the focus is on blockers, and this bug is not one of them. So effort to make this stuff work will need to come from people who deeply care about it. I suggest contacting Yahoo and seeing what they're up to, for a start.

2) If it turns out that we can't just enable it, the next obvious thing to try is a way for pages to opt into it. That could even get standardized by the HTML WG.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 69

•

18 years ago

Even if Yahoo fixes their thing I'm very worried that there are loads of other form libraries out there that do the same thing. If high-profile professional sites like Yahoo use sloppy parsing, you can bet that there are tons of home-rolled parsing libraries that do too. The burden of proof really goes the other way: we should have proof that the patch does not break sites. Especially with formats as old as this one. And extra especially now that we have seen that multiple sites break from various versions of the patch.

Robert Siemer

Comment 70

•

18 years ago

I see this bug report getting reopened because of bug 392982, so let me outline the key points of this bug report:

- implement the standard
- avoid breaking a bunch of sites that can't handle the standard

I want to point out that this bug is not about implementing something else, a new non-standard thing or whatever. That's because:

a) some proposed non-standard solutions (e.g. adding a proprietary HTTP header) do not conflict with the standard solution itself, so there is no need to mix them
b) there is already one non-standard solution for the problem (the "_charset_" form field); I'm not going to fight for a second one.
c) my bug 379858 got closed referring to this one. I'm definitely going to reopen it if this one is turned into something different

So the real problem is that some sites choke when the browser talks standard to them. There is actually no provably complete solution to this problem, as _any_ visible change could break a site. If you can't find one, I can make one! (This is why I disagree with Jonas.) But that is not important. What is important is to make sure that big, well-known, old applications (web sites) see the old browser behavior if they are known to fail on the new one.

Why "big, well-known and old sites" only?

- new apps will get tested with standard browsers like Firefox and the bug will be seen from the very beginning
- "unknown" sites usually assume "the browser is right, the app is wrong"
- small sites are unknown sites... (-: ... or have a flexible development team that corrects the problem in time

How to detect these sites? A (manual) work-intensive solution would be a (domain/URL) blacklist. It is especially effective for the "old" criterion. As time passes the blacklist will grow more slowly, and later on it will need no maintenance at all, as we can assume that after some months or years any site in question is either not old or not well known. <-:

But I actually have a better idea, as I prefer solutions that need no manual work at all: check if the page with the form to submit has parsing errors. (I would like to say "renders in quirks mode", but that is not the same.)

Pro: Yahoo Mail and any big corporate sites fail that test for sure (-:
Contra: most other sites, especially new sites, probably fail, too...

Conclusion: anyone keen on standards gets his/her solution, while anyone else sees the old behavior. Problem solved.

I have even more fine tuning in mind, but I will come back to that in my next comment.

Robert

Boris Zbarsky [:bzbarsky]

Comment 71

•

18 years ago

Feel free to post patches to implement the behavior you think should be happening. Then we can discuss it.

David Nesting

Assignee

Comment 72

•

18 years ago

There are three components to a form submission: (1) the referrer, (2) the browser, and (3) the form processor. (1) and (3) may not be under the control of the same entity. If you are a site that gets many POSTs from 3rd-party sites, you can't possibly get all of them to include the _charset_ parameter in their forms unless you block their submissions until they do. By placing the character encoding either in the MIME headers of the multipart/form-data content, or within a (non-standard) HTTP header, it's not necessary for the form to "opt-in" for the form processor to benefit. Making this feature work as-is, but only with forms on pages rendered in standards compliance mode, helps only for "intra-site" form submissions. The real problem this feature is meant to solve is with form submissions made by unpredictable 3rd-party sites. The fact that the referring page is or is not standards-compliant may have nothing to do with how the form processor itself is written, which is really the barrier we seem to be facing today.
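For comparison, the existing opt-in mechanism mentioned above relies on a hidden form field named `_charset_`, which the browser fills in with the encoding it used. A form processor could honour it roughly as in the Python sketch below; the function name `decode_fields`, the already-parsed `raw_fields` mapping, and the `windows-1252` fallback are assumptions for illustration, not any particular server's behavior.

```python
import codecs


def decode_fields(raw_fields, fallback="windows-1252"):
    """Decode raw field bytes using the `_charset_` field when present.

    `raw_fields` maps field names to the undecoded bytes of each value.
    If `_charset_` is missing or names an unknown encoding, `fallback`
    is used instead (a guess at the referring page's encoding).
    """
    declared = raw_fields.get("_charset_", b"").decode("ascii", "replace").strip()
    encoding = fallback
    if declared:
        try:
            encoding = codecs.lookup(declared).name
        except LookupError:
            pass  # unknown label: keep the fallback
    return {
        name: value.decode(encoding, "replace")
        for name, value in raw_fields.items()
        if name != "_charset_"
    }


with_hint = decode_fields({"_charset_": b"utf-8", "q": "Löwis".encode("utf-8")})
without_hint = decode_fields({"q": "Löwis".encode("windows-1252")})
print(with_hint, without_hint)
```

As the comment notes, this only helps when the form itself includes the field, which a form processor receiving POSTs from third-party pages cannot enforce.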

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 73

•

18 years ago

I'm not a big fan of the parsing error solution. First off, as David brings up, it doesn't really solve the problem. Second, it seems very unpredictable and illogical for a web developer that if they change a completely separate part of the page, the form submission format changes. What would happen if Yahoo fixed their web pages? Should we punish them by "breaking" their form submissions? There is no value in implementing standards for the sake of implementing standards. We implement standards to move the web forward. This standard is known to break sites, making us, and probably many other browser vendors, very hesitant to implement it. As I've stated before, I don't want to ship a beta with Yahoo broken. So if someone wants another solution, please provide a patch soon. Probably within a week.

Johnny Stenback (:jst)

Comment 74

•

18 years ago

Attached patch: Backout of the previous attachment.

This patch is the reverse of the previous attachment in this bug. This is being backed out due to it causing regression bug 392982. I'm attaching this here partly to test a build with the previous patch backed out; there are no real differences between this patch and the reverse of the previous attachment.

Johnny Stenback (:jst)

Comment 75

•

18 years ago

Reopening since this got backed out. See bug 392982 for quite a bit of discussion around what this caused and how to possibly re-land this. Clearing blocking1.9+ on this bug as I don't think we'll have the time to look into a fix for this that doesn't cause bug 392982 in time for 1.9.

Status: RESOLVED → REOPENED

Flags: blocking1.9+

Resolution: FIXED → ---

Boris Zbarsky [:bzbarsky]

Comment 76

•

18 years ago

jst, I think you need to back out both patches that went in for this bug, not just the second one.... Otherwise you reintroduce bug 384270. Reinstating the blocking flag, and nominating for beta blocking, since now we're in a known-broken state that we shouldn't be shipping for beta. Once the first attachment is backed out, we should undo the blocker settings.

Flags: blocking1.9+

Target Milestone: --- → mozilla1.9 M9

Johnny Stenback (:jst)

Comment 77

•

18 years ago

Ok, backing out the other patch then too...

Johnny Stenback (:jst)

Comment 78

•

18 years ago

Attached patch: Backout of both fixes that went in for this bug.

Boris, please have a look at this patch, this is a combined backout of the two fixes for this bug (already checked in).

Johnny Stenback (:jst)

Comment 79

•

18 years ago

Clearing blocker flags again as both parts of this bug are now backed out.

Flags: blocking1.9+

Boris Zbarsky [:bzbarsky]

Comment 80

•

18 years ago

Yeah, that second backout patch looks good.

Target Milestone: mozilla1.9 M9 → ---

Evan Jones

Comment 81

•

13 years ago

I believe this bug can be closed. HTML5 now explicitly forbids the Content-Type header: "The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified." http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data So this is either not a bug, or the HTML5 specification needs revision.

Boris Zbarsky [:bzbarsky]

Updated

•

13 years ago

Status: REOPENED → RESOLVED

Closed: 18 years ago → 13 years ago

Resolution: --- → WONTFIX

hussdl

Comment 82

•

11 years ago

So this is what Firefox is currently sending to my server [you will be able to guess which parts have been altered by me]:

POST http://[removed] HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: [removed]
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------294571387113960
Content-Length: [whatever]

-----------------------------294571387113960
Content-Disposition: form-data; name="utf-8"

[some bytes which happen to be a utf-8 sequence]
-----------------------------294571387113960
Content-Disposition: form-data; name="format"

[some bytes which happen to be ascii text]
-----------------------------294571387113960
Content-Disposition: form-data; name="text"

[some bytes which happen to be a utf-8 sequence]
-----------------------------294571387113960--

Please enlighten me: How is the server supposed to know that the encoding of the MIME parts is UTF-8? The MIME spec clearly states that in the absence of a Content-Type header, the correct content type is "text/plain; charset=us-ascii" (as stated in a 13 years old comment). What really bugs me is th�� and �� ? ��, ��, ��

Boris Zbarsky [:bzbarsky]

Comment 83

•

11 years ago

> How is the server supposed to know that the encoding of the MIME parts is UTF-8?

By assuming it's the encoding of the page that the form was on. Yes, this sucks. When we tried to fix it, we discovered that too many servers are too broken to allow us to send that information in the POST data. If you have constructive suggestions for communicating that information, please raise them with the spec...
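On the server side, that assumption could look roughly like the Python sketch below, which uses the standard `email` package to split the body and falls back to a caller-supplied page encoding when a part has no charset. The function name `parse_form_data`, the boundary `XyZ`, and the sample value are assumptions for illustration; real form processors differ.

```python
from email import message_from_bytes


def parse_form_data(content_type, body, page_encoding):
    """Decode the text fields of a multipart/form-data request body.

    A part's own charset parameter wins if present; otherwise the bytes
    are decoded with `page_encoding`, i.e. the server's guess at the
    encoding of the page that contained the form.
    """
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("not a multipart body")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset() or page_encoding
        fields[name] = part.get_payload(decode=True).decode(charset, "replace")
    return fields


body = (b'--XyZ\r\n'
        b'Content-Disposition: form-data; name="text"\r\n'
        b'\r\n'
        + "Löwis".encode("utf-8") + b'\r\n'
        b'--XyZ--\r\n')
content_type = "multipart/form-data; boundary=XyZ"

good_guess = parse_form_data(content_type, body, "utf-8")
bad_guess = parse_form_data(content_type, body, "iso-8859-1")
print(good_guess, bad_guess)
```

With the wrong guess the same bytes decode without error but yield mojibake, which is the failure mode the comments above are describing.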

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: HTML: Form Submission → DOM: Core & HTML

Jochen Wiedmann

Comment 85

•

3 years ago

@Adel Mohammed: That is a simple question of server and client agreeing to override the spec. Admittedly, in the case of Bugzilla, it is more likely that the server dictates that override and the clients are forced to agree. In general, that will work, too.

Besides, you are missing the point: Even with charset="US-ASCII", there will be cases when another character set needs to be applied.

For example, when sending a URL-encoded string (example: a form value), all the letters will be in US-ASCII. Nevertheless, the result will be a byte array, and there must be an agreement on how to convert those bytes into a string value. Obviously, for characters such as German umlauts, this can't be US-ASCII.
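A small Python illustration of that point, using `urllib.parse` from the standard library (the sample name "Müller" is made up): the percent-encoded text is pure ASCII in both cases, yet it only decodes correctly if both sides agree on the underlying character set.

```python
from urllib.parse import quote, unquote

# The same value, percent-encoded under two different character sets.
as_utf8 = quote("Müller", encoding="utf-8")
as_latin1 = quote("Müller", encoding="iso-8859-1")

# Both results consist of US-ASCII characters only.
assert as_utf8.isascii() and as_latin1.isascii()

# Decoding works when the receiver assumes the sender's character set...
matched = unquote(as_utf8, encoding="utf-8")

# ...and silently yields the wrong text when it assumes another one.
mismatched = unquote(as_utf8, encoding="iso-8859-1")

print(as_utf8, as_latin1, matched, mismatched)
```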

Use a pref to decide to attach the charset to content type | 18 years ago | Mike Kaply [:mkaply] | 1.81 KB, patch
Adds Content-Type with charset to each form-data part | 18 years ago | David Nesting | 1.21 KB, patch
Adds Content-Type with charset to each form-data part | 18 years ago | David Nesting | 1.81 KB, patch | bzbarsky : review+, sicking : superreview+
Like so | 18 years ago | Boris Zbarsky [:bzbarsky] | 3.41 KB, patch | jst : review+, jst : superreview+
Backout of the previous attachment. | 18 years ago | Johnny Stenback (:jst) | 2.74 KB, patch
Backout of both fixes that went in for this bug. | 18 years ago | Johnny Stenback (:jst) | 2.97 KB, patch