5313 - Accept-Charset for form is not implemented.

Description

•

28 years ago

(This bug imported from BugSplat, Netscape's internal bugsystem.  It
was known there as bug #56223
http://scopus.netscape.com/bugsplat/show_bug.cgi?id=56223
Imported into Bugzilla on 04/20/99 12:24)


Split from Bug 48964:
From: http://www.nagual.ru/~ache/n4w95.html#bug_list
1) Netscape does not convert <FORM> input from CP1251 (the Russian Windows
default character set) to KOI8-R when needed. That is, it totally
ignores the ACCEPT-CHARSET="KOI8-R" <FORM> attribute, and the page's
global character set too, in both the <META> and HTTP header cases. See
Internationalization of the Hypertext Markup Language (RFC 2070)
for details. Look at http://www.nagual.ru/~ache/main.html#form_input
to see this bug in action.

Frank Tang

Assignee

Comment 1

•

28 years ago

*** Bug 48964 has been marked as a duplicate of this bug. ***

Frank Tang

Assignee

Comment 2

•

28 years ago

We don't plan to support Accept-Charset in FORM, as described in the Multilingual
HTML RFC, in Dogbert. Marking this LATER.

Katsuhiko Momoi

Comment 3

•

28 years ago

Per the 6/30 I18n Latered Bug Meeting, this bug is marked as WONTFIX.
We should do a review of RFC spec compliance, and this bug should be
marked as a duplicate of that bug.

Katsuhiko Momoi

Comment 4

•

28 years ago

*** This bug has been marked as a duplicate of 75280 ***

jeffu

Comment 5

•

27 years ago

Duplicate bug, bulk Verified

Katsuhiko Momoi

Comment 6

•

26 years ago

This bug will be moved over to 5.0 for a review.
It is true that we don't do anything with the Accept-Charset
attribute for Form Input and TextArea.

The relevant section of the RFC 2070 is "5.1 DTD additions". This does
not seem to be a requirement but rather a recommendation for a user
agent. The recommended action upon encountering the Accept-Charset
attribute would be:

1) a warning to the user about what charset the form can accept, or
2) restrict the input charsets to those listed as the attribute values.

We need to decide if we should follow this recommendation.

Erik van der Poel

Updated

•

26 years ago

Assignee: erik → bobj

Erik van der Poel

Comment 7

•

26 years ago

Bob, we need to decide who will own HTML form I18N issues.

bobj

Updated

•

26 years ago

Target Milestone: M7

bobj

Comment 8

•

26 years ago

Here's a reference:
http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.3:
  accept-charset = charset list [CI]
     This attribute specifies the list of character encodings for input
     data that must be accepted by the server processing this form. The
     value is a space- and/or comma-delimited list of charset values.
     The server must interpret this list as an exclusive-or list, i.e.,
     the server must be able to accept any single character encoding per
     entity received.

We need a strategy on supporting charset encodings in form submissions
   http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13
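
The "space- and/or comma-delimited list" wording quoted above can be made concrete with a small tokenizer. This is only a sketch: the function name SplitAcceptCharset is hypothetical (it is not part of the Mozilla tree), and treating tabs and line breaks as delimiters is an assumption beyond what the quoted text states.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split an accept-charset attribute value into charset
// names. Spaces and commas are the delimiters named by HTML 4.0; tabs and
// line breaks are accepted here as an extra assumption. Empty tokens are
// dropped, so "a, b" and "a,b" give the same result.
std::vector<std::string> SplitAcceptCharset(const std::string& value) {
  std::vector<std::string> names;
  std::string current;
  for (char c : value) {
    const bool isDelimiter =
        (c == ' ' || c == ',' || c == '\t' || c == '\r' || c == '\n');
    if (!isDelimiter) {
      current.push_back(c);
    } else if (!current.empty()) {
      names.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) names.push_back(current);
  return names;
}
```

A value such as "KOI8-R, UTF-8  windows-1251" would yield three names in their original order, which matters if the list is later treated as prioritized.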

Frank Tang

Assignee

Comment 9

•

26 years ago

Bob, you forgot to include the important paragraph:
The default value for this attribute is the reserved string "UNKNOWN". User
agents MAY interpret this value as the character encoding that was used
to transmit the document containing this FORM element.

Basically, the HTML spec does not say that the user agent MUST return the value in
those charsets (and which one from the list?). It only says the server MUST
be able to process these charsets. The user agent MAY interpret this value as
the character encoding that was used to transmit the document. So, in other
words, this is an invalid bug. Ignoring this value does conform to the HTML spec.

Katsuhiko Momoi

Comment 10

•

26 years ago

What is the relationship of RFC 2070 to this bug? I thought that this bug was
originally about a case like the following:

1. The web designer wants to restrict the input charset to those
   she/he specifies in the Accept-Charset attribute of the Form.
2. Now if someone inputs into form, via a client, in a charset not
   listed as Accept-Charset attributes, then the client can
   either 1) warn the user that the input charset is not allowed by the
   form but send it anyway or 2) refuse to submit in that charset, or
   3) convert it to a charset which is of the same encoding family
      if that is possible.

3. If no Accept-Charset value is present, then it's the same as "UNKNOWN".
   If "UNKNOWN" is present, then it's still the same thing. But if
   explicit values are present, then we need to honor these and
   do one of the things listed in 2 above.

This is my interpretation of RFC 2070 and this seems to be also consistent
with what HTML 4.0 spec says about Accept-Charset in form.

These are all client-responsibilities.

bobj

Updated

•

26 years ago

Status: NEW → ASSIGNED

bobj

Updated

•

26 years ago

Target Milestone: M7 → M8

bobj

Updated

•

25 years ago

Target Milestone: M8 → M9

bobj

Comment 11

•

25 years ago

There are 2 content types into which form data can be encoded (enctype):
  (1) application/x-www-form-urlencoded
  (2) multipart/form-data
See: http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4

In case (1), there is no way to pass the charset encoding back to the
server, so I think we should continue with the current 4.x behavior of
encoding the form data set in the charset encoding of the form.

In case (2) (not supported prior to 5.0), we can specify the charset
of the form data being submitted by using the charset parameter in
the MIME content-type [see RFC2045].  I suggest that we try to listen
to the <FORM> accept-charset parameter by trying to convert the form
data set into the specified charset(s).  If it converts without
error, submit the converted data, otherwise try the next charset in
the accept-charset list.  If none of the listed charsets convert
without error, then default to the charset of the form.  But we always
include the charset parameter.
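
The fallback order proposed above (try each listed charset, otherwise use the form's charset) can be sketched as follows. Everything here is illustrative: ChooseSubmitCharset and CanEncode are hypothetical names, and CanEncode is a toy stand-in that only knows three charsets by their code point range, where a real implementation would attempt an actual conversion.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Toy stand-in for "the data converts without error" (an assumption, not a
// real converter): a charset is modeled only by the highest code point it
// can represent. Unknown charsets are treated as not convertible.
bool CanEncode(const std::string& charset, const std::u32string& text) {
  std::string name = charset;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  char32_t limit;
  if (name == "us-ascii") {
    limit = 0x7F;
  } else if (name == "iso-8859-1") {
    limit = 0xFF;
  } else if (name == "utf-8") {
    limit = 0x10FFFF;
  } else {
    return false;
  }
  return std::all_of(text.begin(), text.end(),
                     [limit](char32_t c) { return c <= limit; });
}

// Return the first accept-charset entry the data converts into; if none
// works (or the list is empty), fall back to the charset of the form.
std::string ChooseSubmitCharset(const std::vector<std::string>& acceptCharsets,
                                const std::string& formCharset,
                                const std::u32string& data) {
  for (const std::string& charset : acceptCharsets) {
    if (CanEncode(charset, data)) return charset;
  }
  return formCharset;
}
```

With an accept-charset list of ISO-8859-1 followed by UTF-8, Cyrillic input would skip ISO-8859-1 and be submitted as UTF-8, while Latin-1 input would stay in ISO-8859-1.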

Comments?

Erik van der Poel

Comment 12

•

25 years ago

In case (1), there are 2 subcases (a) and (b):

  (a) method=get
  (b) method=post

In case (1)(a), it is not possible to send the charset label along with the
form submission. In case (1)(b), it *is* possible:

  Content-Type: application/x-www-form-urlencoded; charset=iso-8859-1

Note that the entire form submission must be in this charset, so we would have
to try converting all of the fields into that charset to see if it's OK.

Note also that we had problems with certain servers/CGIs when we tried this a
while ago (adding charset label in POST case).

In case (2), it is not necessary for the entire form submission to be in a
single charset, since you can label each field separately:

--AaB03x
content-disposition: form-data; name="field1"
content-type: text/plain;charset=windows-1250
content-transfer-encoding: quoted-printable

Joe Blow owes =80100.
--AaB03x
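
For reference, a part like the one above could be produced along these lines. This is a minimal sketch, not the 5.0 code: FormDataPart and QuotedPrintable are hypothetical names, the caller is assumed to have already converted the value into the labeled charset, and the encoder omits the soft line breaks and trailing-space handling that full quoted-printable requires.

```cpp
#include <cstdio>
#include <string>

// Minimal quoted-printable encoder for short values: printable ASCII and
// spaces pass through, '=' and every other byte become =XX. Soft line
// breaks and trailing-whitespace rules are deliberately left out.
std::string QuotedPrintable(const std::string& bytes) {
  std::string out;
  for (unsigned char b : bytes) {
    const bool literal = (b == ' ') || (b >= 33 && b <= 126 && b != '=');
    if (literal) {
      out.push_back(static_cast<char>(b));
    } else {
      char buf[4];
      std::snprintf(buf, sizeof buf, "=%02X", static_cast<unsigned>(b));
      out += buf;
    }
  }
  return out;
}

// Build one multipart/form-data part whose charset is labeled per field.
// `encodedValue` must already be in `charset`; no conversion happens here.
std::string FormDataPart(const std::string& boundary, const std::string& name,
                         const std::string& charset,
                         const std::string& encodedValue) {
  return "--" + boundary + "\r\n" +
         "content-disposition: form-data; name=\"" + name + "\"\r\n" +
         "content-type: text/plain;charset=" + charset + "\r\n" +
         "content-transfer-encoding: quoted-printable\r\n\r\n" +
         QuotedPrintable(encodedValue) + "\r\n";
}
```

In the example above, =80 is the single byte 0x80, which windows-1250 assigns to the euro sign, so the field reads "Joe Blow owes €100." once decoded.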

bobj

Comment 13

•

25 years ago

Good points.  But what are you recommending?

I don't think it is normally useful to submit different fields in
different charsets in the multipart/form-data case.

For (1b), we could modify the proposal to label the post with a charset.
But as you point out, it may cause problems for servers/CGI's which
cannot handle the parameter.  We could control the behavior by prefs
for cases (1b) and (2), with defaults off and on respectively?

I still like the first proposal.  It preserves backward compatibility
and HTML4 does recommend ("should") using multipart/form-data for non-ASCII:
  http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4

   The content type "application/x-www-form-urlencoded" is inefficient for
   sending large quantities of binary data or text containing non-ASCII
   characters. The content type "multipart/form-data" should be used for
   submitting forms that contain files, non-ASCII data, and binary data.

Content developers who want to add accept-charset, could also change
the forms to use multipart/form-data.

Erik van der Poel

Comment 14

•

25 years ago

I didn't intend to recommend that we use more than one charset in the form-data
case. I was just pointing out that our implementation *must* use a single
charset in the other case (1). It is probably better to use a single charset
in the form-data case, just to avoid needless confusion and so on, but I don't
feel too strongly about this.

Using prefs to control whether or not we append the charset in case (1b) is
probably a good idea. Those prefs do not need to be surfaced in UI, I think.

I also like the idea of trying to convert to one of the charsets in the
accept-charset attribute.

Furthermore, it might be a good idea to experiment with adding the charset
in case (1b).

Maybe we should even try adding a Content-Type header with a charset to the
request headers immediately following the GET command. GET doesn't have a body,
so it's abnormal, but it might work, and would allow CGIs to receive the
charset info.

Added Valeski to Cc list for opinions.

Frank Tang

Assignee

Comment 15

•

25 years ago

Currently the label part of (1b) is implemented in 5.0.
See http://lxr.mozilla.org/mozilla/source/layout/html/forms/src/nsFormFrame.cpp
for details; look at #ifdef SPECIFY_CHARSET_IN_CONTENT_TYPE. We can easily
remove this feature/bug by commenting out the #define
SPECIFY_CHARSET_IN_CONTENT_TYPE.

I didn't do this for case (2). It should be easy; just change
1108   sprintf(buffer, "Content-type: %s; boundary=%s" CRLF, MULTIPART,
boundary);

Currently it decides the submission charset based on what it believes the
document's charset is, the same way we did in 1.x - 4.x.

There is a method called GetSubmitCharset() which returns one charset.
Currently it returns the charset of the document. We can change it to return one
charset from the Accept-Charset list.

bobj

Updated

•

25 years ago

Assignee: bobj → ftang

Status: ASSIGNED → NEW

bobj

Comment 16

•

25 years ago

Assigned to ftang.  Here's my updated proposal:
  (1a) application/x-www-form-urlencoded, method=get
         Submit in charset of HTML form document (4.x behavior)  - Done
  (1b) application/x-www-form-urlencoded, method=post
         If pref-xxx enabled
            Submit in GetSubmitCharset() and label with charset parameter
         Else (4.x behavior)
            Submit in charset of form, and no charset parameter
  (2) multipart/form-data
         Submit in GetSubmitCharset() and label with charset parameter

GetSubmitCharset() would return either
   (i)  a valid charset from the prioritized accept-charset list, or
   (ii) form charset
A "valid charset" means that the data for submission can successfully be
converted into that charset.

Should the default for pref-xxx be disabled (4.x behavior) or enabled?

Do we want to consider Erik's suggestion for (1a) (under pref control):
  Maybe we should even try adding a Content-Type header with a charset to
  the request headers immediately following the GET command. GET doesn't
  have a body, so it's abnormal, but it might work, and would allow CGIs
  to receive the charset info.
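
The case table in this proposal can be written down as a small decision function. This is a sketch only: the type and function names are hypothetical, `prefEnabled` stands for the unnamed pref-xxx, `submitCharset` is whatever GetSubmitCharset() would return, and treating multipart/form-data the same way regardless of method is an assumption the proposal does not spell out.

```cpp
#include <string>

enum class EncType { UrlEncoded, MultipartFormData };
enum class Method { Get, Post };

// Which charset to submit in, and whether to add a charset parameter.
struct SubmissionPlan {
  std::string charset;
  bool labelCharset;
};

SubmissionPlan PlanSubmission(EncType enctype, Method method, bool prefEnabled,
                              const std::string& submitCharset,
                              const std::string& formCharset) {
  // Case (2): multipart/form-data always uses GetSubmitCharset() and a label.
  if (enctype == EncType::MultipartFormData) return {submitCharset, true};
  // Case (1b) with the pref enabled: same as (2), but for urlencoded POST.
  if (method == Method::Post && prefEnabled) return {submitCharset, true};
  // Case (1a), and (1b) with the pref disabled: 4.x behavior, no label.
  return {formCharset, false};
}
```

Keeping the decision in one place like this would make the open question about the pref default a one-line change.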

Frank Tang

Assignee

Updated

•

25 years ago

Target Milestone: M9 → M12

Frank Tang

Assignee

Comment 17

•

25 years ago

move to M12

Frank Tang

Assignee

Updated

•

25 years ago

Status: NEW → ASSIGNED

Frank Tang

Assignee

Updated

•

25 years ago

Target Milestone: M12 → M11

Frank Tang

Assignee

Comment 18

•

25 years ago

move it back to M11

Frank Tang

Assignee

Updated

•

25 years ago

Priority: P2 → P3

tague

Updated

•

25 years ago

Assignee: ftang → tague

Status: ASSIGNED → NEW

msanz

Updated

•

25 years ago

QA Contact: ftang

msanz

Updated

•

25 years ago

Blocks: 16127

tague

Updated

•

25 years ago

Status: NEW → ASSIGNED

Frank Tang

Assignee

Updated

•

25 years ago

Assignee: tague → ftang

Status: ASSIGNED → NEW

Frank Tang

Assignee

Comment 19

•

25 years ago

reassign this to myself.

Frank Tang

Assignee

Updated

•

25 years ago

Status: NEW → ASSIGNED

Target Milestone: M11 → M12

msanz

Updated

•

25 years ago

No longer blocks: 16127

Frank Tang

Assignee

Updated

•

25 years ago

Assignee: ftang → bobj

Status: ASSIGNED → NEW

Target Milestone: M12 → M13

bobj

Updated

•

25 years ago

Status: NEW → ASSIGNED

bobj

Updated

•

25 years ago

Target Milestone: M13 → M14

Frank Tang

Assignee

Comment 20

•

25 years ago

Change OS to ALL

OS: Windows NT → All

Teruko Kobayashi

Updated

•

25 years ago

Keywords: beta1

bobj

Updated

•

25 years ago

Keywords: beta1

bobj

Updated

•

25 years ago

Target Milestone: M14 → M15

bobj

Updated

•

25 years ago

Target Milestone: M15 → M16

bobj

Comment 21

•

25 years ago

Reassigned to jbetak for Beta2.

Assignee: bobj → jbetak

Status: ASSIGNED → NEW

Keywords: beta2

jbetak@netscape.com (away - not reading bugmail)

Updated

•

25 years ago

Status: NEW → ASSIGNED

leger

Updated

•

25 years ago

Keywords: nsbeta2

msanz

Updated

•

25 years ago

Keywords: beta2

leger

Comment 22

•

25 years ago

Putting on [nsbeta2+] radar.  Feature, must fix by 5/16.

Whiteboard: [nsbeta2+][5/16][FEATURE]

bobj

Comment 23

•

25 years ago

Removed "[FEATURE]" from Status Whiteboard since this is really an old HTML
compliance bug originally logged in bugsplat against the old code base.

Whiteboard: [nsbeta2+][5/16][FEATURE] → [nsbeta2+][5/16]

John Dobbins

Comment 24

•

25 years ago

Attempted to test this bug.
clicked on link
result- error message "www.nagual.ru could not be found. Please check the name
and try again."

leger

Comment 25

•

25 years ago

Putting on [nsbeta2-] radar. Missed the Netscape 6 feature train.  Please set to 
MFuture.

Whiteboard: [nsbeta2+][5/16] → [nsbeta2-]

Mike

Comment 26

•

24 years ago

M16 has been out for a while now; these bugs' target milestones need to be
updated.

jbetak@netscape.com (away - not reading bugmail)

Comment 27

•

24 years ago

reassigning to ftang for resource reallocation

Assignee: jbetak → ftang

Status: ASSIGNED → NEW

Frank Tang

Assignee

Comment 28

•

24 years ago

Adding nsbeta3. We need this to be compatible with HTML 4.0.
The fix is local to one file and low risk. The only reason we have not done it yet
is because it is "local and low risk".
We should fix this in nsbeta3.

Status: NEW → ASSIGNED

Keywords: nsbeta3

bobj

Comment 29

•

24 years ago

FYI:
 Subject: RE: URL-encode international characters in Java?
 Resent-Date: Fri, 7 Jul 2000 12:24:44 -0400 (EDT)
 Resent-From: www-international@w3.org
        Date: Fri, 7 Jul 2000 09:23:25 -0700
        From: Chris Wendt <christw@MICROSOFT.com>
          To: "'Martin J. Duerst'" <duerst@w3.org>,
             "'Vinod Balakrishnan'" <vinod@filemaker.com>,
             Lenny Turetsky <LTuretsky@salesforce.com>,
             "'www-international@w3c.org'" <www-international@w3c.org>,
             "'servlet-interest@java.sun.com'" <servlet-interest@java.sun.com>

From: Martin J. Duerst [mailto:duerst@w3.org]
Sent: Thursday, July 06, 2000 11:53 PM
>Does IE support the 'accept-charset' parameter on FORM?

Yes. In a _very_ limited fashion:
If (accept-charset includes "UTF-8") AND (input contains characters not
fitting the document charset) THEN submit in UTF-8, regardless of the
document charset.

Chris..
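
The IE rule described in this mail amounts to a single condition. The sketch below is an interpretation, not IE's code: the function names are hypothetical, the caller is assumed to know whether the input fits the document charset, and falling back to the document charset in every other case is an assumption, since the mail only describes the UTF-8 branch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive comparison of two charset names (ASCII only).
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(),
                    [](unsigned char x, unsigned char y) {
                      return std::tolower(x) == std::tolower(y);
                    });
}

// Submit in UTF-8 only when accept-charset lists UTF-8 AND the input does
// not fit the document charset; otherwise keep the document charset.
std::string IeStyleSubmitCharset(const std::vector<std::string>& acceptCharsets,
                                 const std::string& documentCharset,
                                 bool inputFitsDocumentCharset) {
  const bool acceptsUtf8 = std::any_of(
      acceptCharsets.begin(), acceptCharsets.end(),
      [](const std::string& name) { return EqualsIgnoreCase(name, "UTF-8"); });
  if (acceptsUtf8 && !inputFitsDocumentCharset) return "UTF-8";
  return documentCharset;
}
```

Note that under this rule every accept-charset entry other than UTF-8 has no effect at all, which is what makes the support "very limited".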

Frank Tang

Assignee

Comment 30

•

24 years ago

set it to P1 M18

Priority: P3 → P1

Target Milestone: M16 → M18

Frank Tang

Assignee

Comment 31

•

24 years ago

here is the patch http://warp/u/ftang/tmp/fix5313.txt
Index: src/nsFormFrame.cpp
===================================================================
RCS file: /m/pub/mozilla/layout/html/forms/src/nsFormFrame.cpp,v
retrieving revision 3.122
diff -u -r3.122 nsFormFrame.cpp
--- nsFormFrame.cpp     2000/07/12 23:31:07     3.122
+++ nsFormFrame.cpp     2000/07/21 23:29:09
@@ -25,6 +25,7 @@
 
 #define NS_IMPL_IDS
 #include "nsICharsetConverterManager.h"
+#include "nsICharsetAlias.h"
 #include "nsIPlatformCharset.h"
 #undef NS_IMPL_IDS 
 
@@ -970,7 +971,49 @@
   // XXX
   // We may want to get it from the HTML 4 Accept-Charset attribute first
   // see 17.3 The FORM element in HTML 4 for details
-
+  nsresult result = NS_OK;
+  nsAutoString acceptCharsetValue;
+  if (mContent) {
+    nsIHTMLContent* form = nsnull;
+    result = mContent->QueryInterface(kIHTMLContentIID, (void**)&form);
+    if (NS_SUCCEEDED(result) && (nsnull != form)) {
+      nsHTMLValue value;
+      result = form->GetHTMLAttribute(nsHTMLAtoms::acceptcharset, value);
+      if (NS_CONTENT_ATTR_HAS_VALUE == result) {
+        if (eHTMLUnit_String == value.GetUnit()) {
+          value.GetStringValue(acceptCharsetValue);
+        }
+      }
+      NS_RELEASE(form);
+    }
+  }
+#ifdef DEBUG_ftang
+  printf("accept-charset = %s\n", acceptCharsetValue.ToNewUTF8String());
+#endif
+  PRInt32 l = acceptCharsetValue.Length();
+  if(l > 0 ) {
+    PRInt32 offset=0;
+    PRInt32 spPos=0;
+    // get charset from charsets one by one
+    NS_WITH_SERVICE(nsICharsetAlias, calias, kCharsetAliasCID, &rv);
+    if(NS_SUCCEEDED(rv) && (nsnull != calias)) {
+      do {
+        spPos = acceptCharsetValue.FindChar(PRUnichar(' '),PR_TRUE, offset);
+        PRInt32 cnt = ((-1==spPos)?(l-offset):(spPos-offset));
+        if(cnt > 0) {
+          nsAutoString charset;
+          acceptCharsetValue.Mid(charset, offset, cnt);
+#ifdef DEBUG_ftang
+          printf("charset[i] = %s\n",charset.ToNewUTF8String());
+#endif
+          if(NS_SUCCEEDED(calias->GetPreferred(charset,oCharset)))
+            return;
+        }
+        offset = spPos + 1;
+      } while(spPos != -1);
+    }
+  }
+  // if there are no accept-charset or all the charset are not supported
   // Get the charset from document
   nsIDocument* doc = nsnull;
   mContent->GetDocument(doc);
@@ -987,6 +1030,9 @@
   nsAutoString charset;
   nsresult rv = NS_OK;
   GetSubmitCharset(charset);
+#ifdef DEBUG_ftang
+  printf("charset=%s\n", charset.ToNewCString());
+#endif
   
   // Get Charset, get the encoder.
   nsICharsetConverterManager * ccm = nsnull;

Whiteboard: [nsbeta2-] → [nsbeta2-]patch in hand need review.
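
As far as the diff shows, the new loop splits the attribute value on spaces only and returns the first entry that the charset alias service recognizes, falling back to the document charset otherwise. The standalone sketch below mirrors that control flow outside of XPCOM; FirstKnownCharset and the AliasTable map are hypothetical stand-ins for the patched GetSubmitCharset() and nsICharsetAlias::GetPreferred(), and the exact-match lookup is a simplification.

```cpp
#include <map>
#include <string>

// Hypothetical alias table: known charset name -> preferred name.
using AliasTable = std::map<std::string, std::string>;

// Walk the space-separated accept-charset value and return the preferred
// name of the first entry found in `aliases`; if the value is empty or no
// entry is known, return the document charset.
std::string FirstKnownCharset(const std::string& acceptCharset,
                              const AliasTable& aliases,
                              const std::string& documentCharset) {
  std::string::size_type offset = 0;
  while (offset <= acceptCharset.size()) {
    const std::string::size_type space = acceptCharset.find(' ', offset);
    const std::string::size_type end =
        (space == std::string::npos) ? acceptCharset.size() : space;
    if (end > offset) {
      const auto it = aliases.find(acceptCharset.substr(offset, end - offset));
      if (it != aliases.end()) return it->second;
    }
    if (space == std::string::npos) break;
    offset = space + 1;
  }
  return documentCharset;
}
```

Two properties of this approach are worth keeping in mind when reviewing: a comma-delimited value such as "koi8-r, utf-8" yields the token "koi8-r," (comma included), which would presumably not be recognized, and the choice depends on whether the charset name is known rather than on whether the form data actually converts into it.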

Frank Tang

Assignee

Comment 32

•

24 years ago

Also, we need http://warp/u/ftang/tmp/fix5313also.txt

Index: src/nsHTMLAtomList.h
===================================================================
RCS file: /m/pub/mozilla/layout/html/base/src/nsHTMLAtomList.h,v
retrieving revision 3.17
diff -u -r3.17 nsHTMLAtomList.h
--- nsHTMLAtomList.h    2000/06/07 06:58:43     3.17
+++ nsHTMLAtomList.h    2000/07/21 23:31:27
@@ -53,7 +53,7 @@
 HTML_ATOM(abbr, "abbr")
 HTML_ATOM(above, "above")
 HTML_ATOM(accept, "accept")
-HTML_ATOM(acceptcharset, "acceptcharset")
+HTML_ATOM(acceptcharset, "accept-charset")
 HTML_ATOM(accesskey, "accesskey")
 HTML_ATOM(action, "action")
 HTML_ATOM(align, "align")

Frank Tang

Assignee

Updated

•

24 years ago

Whiteboard: [nsbeta2-]patch in hand need review. → [nsbeta2-][nsbeta3+]patch in hand need review.

Frank Tang

Assignee

Comment 33

•

24 years ago

Checked in. Marking it fixed.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Teruko Kobayashi

Comment 34

•

24 years ago

Verified as fixed.

Status: RESOLVED → VERIFIED

Jesse Ruderman

Comment 35

•

24 years ago

*** Bug 5314 has been marked as a duplicate of this bug. ***