Closed
Bug 5313
Opened 28 years ago
Closed 24 years ago
Accept-Charset for form is not implemented.
Categories
(Core :: Internationalization, defect, P1)
Tracking
(VERIFIED FIXED, M18)
People
(Reporter: ftang, Assigned: ftang)
Details
(Whiteboard: [nsbeta2-][nsbeta3+]patch in hand need review.)
(This bug imported from BugSplat, Netscape's internal bugsystem. It was known there as bug #56223 http://scopus.netscape.com/bugsplat/show_bug.cgi?id=56223 Imported into Bugzilla on 04/20/99 12:24)
Split from Bug 48964:
From: http://www.nagual.ru/~ache/n4w95.html#bug_list
1) Netscape does not convert <FORM> input from CP1251 (the Russian Windows default character set) to KOI8-R when needed. That is, it totally ignores the ACCEPT-CHARSET="KOI8-R" <FORM> attribute, and also the global HTML page character set, in both the <META> and HTTP header cases. See Internationalization of the Hypertext Markup Language (RFC 2070) for details. Look at http://www.nagual.ru/~ache/main.html#form_input to see this bug in action.
Assignee
Comment 2•28 years ago
We don't plan to support Accept-Charset in FORM according to the Multilingual HTML RFC in Dogbert. Marking this as LATER.
Comment 3•28 years ago
Per the 6/30 I18n Latered Bug Meeting, this bug is marked as WONTFIX. We should do a review of RFC spec compliance, and this bug should be marked as a duplicate of that bug.
Comment 6•26 years ago
This bug will be moved over to 5.0 for a review. It is true that we don't do anything with the Accept-Charset attribute for form INPUT and TEXTAREA. The relevant section of RFC 2070 is "5.1 DTD additions". This does not seem to be a requirement but rather a recommendation for a user agent. The recommended action upon encountering the Accept-Charset attribute would be: 1) warn the user about what charset the form can accept, or 2) restrict the input charsets to those listed as the attribute values. We need to decide if we should follow this recommendation.
Updated•26 years ago
Assignee: erik → bobj
Comment 7•26 years ago
Bob, we need to decide who will own HTML form I18N issues.
Here's a reference: http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.3:
accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input data that must be accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The server must interpret this list as an exclusive-or list, i.e., the server must be able to accept any single character encoding per entity received.
We need a strategy on supporting charset encodings in form submissions: http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13
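The quoted definition says the value is a space- and/or comma-delimited list. A minimal sketch of splitting such a value follows, written in standalone C++ rather than with Mozilla's string classes. The function name SplitAcceptCharset is hypothetical and is not part of any Mozilla API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split an accept-charset attribute value on spaces and/or commas.
// Empty tokens produced by runs of delimiters are skipped.
std::vector<std::string> SplitAcceptCharset(const std::string& value) {
  std::vector<std::string> charsets;
  std::string current;
  for (char c : value) {
    if (c == ' ' || c == ',') {
      if (!current.empty()) {
        charsets.push_back(current);
        current.clear();
      }
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) {
    charsets.push_back(current);
  }
  return charsets;
}
```

With this sketch, a value such as "KOI8-R, windows-1251 UTF-8" yields three entries in document order, which matters if the list is treated as prioritized.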
Assignee
Comment 9•26 years ago
Bob, you forgot to include the important paragraph:
The default value for this attribute is the reserved string "UNKNOWN". User agents MAY interpret this value as the character encoding that was used to transmit the document containing this FORM element.
Basically, the HTML spec does not say that the user agent MUST return the value in those charsets (and which one from the list?). It only says the server MUST be able to process these charsets. The user agent MAY interpret this value as the character encoding that was used to transmit the document. In other words, this is an invalid bug. Ignoring this value does conform to the HTML spec.
Comment 10•26 years ago
What is the relationship of RFC 2070 to this bug? I thought that this bug was originally about a case like the following:
1. The web designer wants to restrict the input charset to those she/he specifies in the Accept-Charset attribute of the form.
2. If someone enters data into the form, via a client, in a charset not listed in the Accept-Charset attribute, then the client can either a) warn the user that the input charset is not allowed by the form but send it anyway, b) refuse to submit in that charset, or c) convert it to a charset of the same encoding family, if that is possible.
3. If no Accept-Charset value is present, then it's the same as "UNKNOWN". If "UNKNOWN" is present, then it's still the same thing. But if explicit values are present, then we need to honor them and do one of the things listed in 2 above.
This is my interpretation of RFC 2070, and it also seems consistent with what the HTML 4.0 spec says about Accept-Charset in forms. These are all client responsibilities.
Comment 11•25 years ago
There are 2 content types into which form data can be encoded (enctype):
(1) application/x-www-form-urlencoded
(2) multipart/form-data
See: http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4
In case (1), there is no way to pass the charset encoding back to the server, so I think we should continue with the current 4.x behavior of encoding the form data set in the charset encoding of the form.
In case (2) (not supported prior to 5.0), we can specify the charset of the form data being submitted by using the charset parameter in the MIME content-type [see RFC2045]. I suggest that we try to honor the <FORM> accept-charset attribute by trying to convert the form data set into the specified charset(s). If it converts without error, submit the converted data; otherwise try the next charset in the accept-charset list. If none of the listed charsets convert without error, then default to the charset of the form. But we always include the charset parameter.
Comments?
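The fallback order proposed in this comment could be sketched as follows. This is a standalone illustration, not the Mozilla implementation: CanEncode is a toy stand-in for a real charset converter and only models three charsets, and both function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a real converter (assumption): reports whether every
// code point in `text` is representable in `charset`. Only three charsets
// are modeled; any other name is treated as unusable.
bool CanEncode(const std::string& charset, const std::u32string& text) {
  char32_t limit;
  if (charset == "us-ascii") {
    limit = 0x7F;
  } else if (charset == "iso-8859-1") {
    limit = 0xFF;
  } else if (charset == "utf-8") {
    limit = 0x10FFFF;
  } else {
    return false;
  }
  for (char32_t cp : text) {
    if (cp > limit) return false;
  }
  return true;
}

// Try each accept-charset entry in order; fall back to the form's charset
// when none of them can represent the data.
std::string ChooseSubmitCharset(const std::vector<std::string>& acceptCharsets,
                                const std::string& formCharset,
                                const std::u32string& formData) {
  for (const std::string& candidate : acceptCharsets) {
    if (CanEncode(candidate, formData)) return candidate;
  }
  return formCharset;
}
```

In a real client the check would be an actual conversion attempt that reports unmappable characters, rather than a code point range test.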
Comment 12•25 years ago
In case (1), there are 2 subcases (a) and (b):
(a) method=get
(b) method=post
In case (1)(a), it is not possible to send the charset label along with the form submission. In case (1)(b), it *is* possible:
  Content-Type: application/x-www-form-urlencoded; charset=iso-8859-1
Note that the entire form submission must be in this charset, so we would have to try converting all of the fields into that charset to see if it's OK. Note also that we had problems with certain servers/CGIs when we tried this a while ago (adding charset label in POST case).
In case (2), it is not necessary for the entire form submission to be in a single charset, since you can label each field separately:
  --AaB03x
  content-disposition: form-data; name="field1"
  content-type: text/plain;charset=windows-1250
  content-transfer-encoding: quoted-printable

  Joe Blow owes =80100.
  --AaB03x
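The two labeling options in this comment can be sketched as plain string building. This is only an illustration under stated assumptions: both function names are hypothetical, the field value is assumed to be already converted to the named charset, and the content-transfer-encoding header from the example is left out for brevity.

```cpp
#include <cassert>
#include <string>

// Case (1)(b): Content-Type header value for a urlencoded POST that is
// labeled with the charset used for the whole submission.
std::string UrlencodedContentType(const std::string& charset) {
  return "application/x-www-form-urlencoded; charset=" + charset;
}

// Case (2): one text part of a multipart/form-data body, labeled with its
// own charset. `encodedValue` holds bytes already converted to `charset`.
std::string BuildTextPart(const std::string& boundary,
                          const std::string& fieldName,
                          const std::string& charset,
                          const std::string& encodedValue) {
  const std::string crlf = "\r\n";
  std::string part;
  part += "--" + boundary + crlf;
  part += "Content-Disposition: form-data; name=\"" + fieldName + "\"" + crlf;
  part += "Content-Type: text/plain; charset=" + charset + crlf;
  part += crlf;  // blank line separates the part headers from the value
  part += encodedValue + crlf;
  return part;
}
```

Because each part carries its own header block, nothing in the format forces every field into one charset, which is the point made in the comment.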
Comment 13•25 years ago
Good points. But what are you recommending? I don't think it is normally useful to submit different fields in different charsets in the multipart/form-data case.
For (1b), we could modify the proposal to label the post with a charset. But as you point out, it may cause problems for servers/CGIs which cannot handle the parameter. We could control the behavior by prefs for cases (1b) and (2), with defaults off and on respectively?
I still like the first proposal. It preserves backward compatibility, and HTML4 does recommend ("should") using multipart/form-data for non-ASCII: http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
Content developers who want to add accept-charset could also change the forms to use multipart/form-data.
Comment 14•25 years ago
I didn't intend to recommend that we use more than one charset in the form-data case. I was just pointing out that our implementation *must* use a single charset in the other case (1). It is probably better to use a single charset in the form-data case, just to avoid needless confusion and so on, but I don't feel too strongly about this.
Using prefs to control whether or not we append the charset in case (1b) is probably a good idea. Those prefs do not need to be surfaced in UI, I think. I also like the idea of trying to convert to one of the charsets in the accept-charset attribute.
Furthermore, it might be a good idea to experiment with adding the charset in case (1b). Maybe we should even try adding a Content-Type header with a charset to the request headers immediately following the GET command. GET doesn't have a body, so it's abnormal, but it might work, and would allow CGIs to receive the charset info. Added Valeski to Cc list for opinions.
Assignee
Comment 15•25 years ago
Currently the label part of (1b) is implemented in 5.0; see http://lxr.mozilla.org/mozilla/source/layout/html/forms/src/nsFormFrame.cpp for details, and look at #ifdef SPECIFY_CHARSET_IN_CONTENT_TYPE. We can easily remove this feature/bug by commenting out the #define SPECIFY_CHARSET_IN_CONTENT_TYPE.
I didn't do this for case (2). It should be easy; just change line 1108:
sprintf(buffer, "Content-type: %s; boundary=%s" CRLF, MULTIPART, boundary);
Currently it decides the submission charset based on what it believes the document charset is, the same way we did in 1.x - 4.x. There is a method called GetSubmitCharset() which returns 1 charset. Currently it returns the charset of the document. We can change it to return 1 charset from the Accept-Charset list.
Comment 16•25 years ago
Assigned to ftang. Here's my updated proposal:
(1a) application/x-www-form-urlencoded, method=get
  Submit in charset of HTML form document (4.x behavior) - Done
(1b) application/x-www-form-urlencoded, method=post
  If pref-xxx enabled
    Submit in GetSubmitCharset() and label with charset parameter
  Else (4.x behavior)
    Submit in charset of form, and no charset parameter
(2) multipart/form-data
  Submit in GetSubmitCharset() and label with charset parameter
GetSubmitCharset() would return either
  (i) a valid charset from the prioritized accept-charset list, or
  (ii) form charset
A "valid charset" means that the data for submission can successfully be converted into that charset.
Should the default for pref-xxx be disabled (4.x behavior) or enabled?
Do we want to consider Erik's suggestion for (1a) (under pref control):
  Maybe we should even try adding a Content-Type header with a charset to the request headers immediately following the GET command. GET doesn't have a body, so it's abnormal, but it might work, and would allow CGIs to receive the charset info.
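The three cases of the proposal reduce to a small decision function. The sketch below is an illustration only: the enum, struct, and function names are hypothetical, and `prefEnabled` stands for the unnamed pref-xxx from the proposal.

```cpp
#include <cassert>

enum class EncType { UrlEncoded, MultipartFormData };
enum class Method { Get, Post };

// What the client should do for one submission.
struct SubmitPlan {
  bool useSubmitCharset;  // true: use GetSubmitCharset(); false: form charset
  bool labelWithCharset;  // true: add a charset parameter to Content-Type
};

SubmitPlan PlanSubmission(EncType enctype, Method method, bool prefEnabled) {
  // Case (2): multipart/form-data always converts and labels.
  if (enctype == EncType::MultipartFormData) {
    return {true, true};
  }
  // Case (1b): urlencoded POST converts and labels only under the pref.
  if (method == Method::Post && prefEnabled) {
    return {true, true};
  }
  // Case (1a), and (1b) with the pref off: 4.x behavior.
  return {false, false};
}
```

Keeping the decision separate from the conversion makes the pref default easy to flip while the open question about it is unresolved.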
Assignee
Updated•25 years ago
Target Milestone: M9 → M12
Assignee
Comment 17•25 years ago
Move to M12.
Assignee
Updated•25 years ago
Status: NEW → ASSIGNED
Assignee
Updated•25 years ago
Target Milestone: M12 → M11
Assignee
Comment 18•25 years ago
Move it back to M11.
Assignee
Updated•25 years ago
Priority: P2 → P3
Assignee
Updated•25 years ago
Assignee: tague → ftang
Status: ASSIGNED → NEW
Assignee
Comment 19•25 years ago
Reassign this to myself.
Assignee
Updated•25 years ago
Status: NEW → ASSIGNED
Target Milestone: M11 → M12
Assignee
Updated•25 years ago
Assignee: ftang → bobj
Status: ASSIGNED → NEW
Target Milestone: M12 → M13
Comment 21•25 years ago
Reassigned to jbetak for Beta2.
Updated•25 years ago
Status: NEW → ASSIGNED
Comment 22•25 years ago
Putting on [nsbeta2+] radar. Feature, must fix by 5/16.
Whiteboard: [nsbeta2+][5/16][FEATURE]
Comment 23•25 years ago
Removed "[FEATURE]" from Status Whiteboard since this is really an old HTML compliance bug originally logged in bugsplat against the old code base.
Whiteboard: [nsbeta2+][5/16][FEATURE] → [nsbeta2+][5/16]
Comment 24•25 years ago
Attempted to test this bug. Clicked on the link; the result was the error message "www.nagual.ru could not be found. Please check the name and try again."
Comment 25•25 years ago
Putting on [nsbeta2-] radar. Missed the Netscape 6 feature train. Please set to MFuture.
Whiteboard: [nsbeta2+][5/16] → [nsbeta2-]
Comment 26•24 years ago
M16 has been out for a while now; these bugs' target milestones need to be updated.
Comment 27•24 years ago
Reassigning to ftang for resource reallocation.
Assignee: jbetak → ftang
Status: ASSIGNED → NEW
Assignee
Comment 28•24 years ago
Add nsbeta3. We need this to be compatible with HTML 4.0. The fix is local to one file and low risk. The only reason we have not done it yet is because it is "local and low risk". We should fix this in nsbeta3.
Status: NEW → ASSIGNED
Keywords: nsbeta3
Comment 29•24 years ago
FYI:
Subject: RE: URL-encode international characters in Java?
Resent-Date: Fri, 7 Jul 2000 12:24:44 -0400 (EDT)
Resent-From: www-international@w3.org
Date: Fri, 7 Jul 2000 09:23:25 -0700
From: Chris Wendt <christw@MICROSOFT.com>
To: "'Martin J. Duerst'" <duerst@w3.org>, "'Vinod Balakrishnan'" <vinod@filemaker.com>, Lenny Turetsky <LTuretsky@salesforce.com>, "'www-international@w3c.org'" <www-international@w3c.org>, "'servlet-interest@java.sun.com'" <servlet-interest@java.sun.com>
From: Martin J. Duerst [mailto:duerst@w3.org]
Sent: Thursday, July 06, 2000 11:53 PM
>Does IE support the 'accept-charset' parameter on FORM?
Yes. In a _very_ limited fashion:
If (accept-charset includes "UTF-8") AND (input contains characters not fitting the document charset) THEN submit in UTF-8, regardless of the document charset.
Chris..
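The rule described in the quoted mail can be written as a single predicate. The sketch below is a reading of that one sentence, not IE's actual code: the function names are hypothetical, the case-insensitive match on "UTF-8" is an assumption, and so is falling back to the document charset when the rule does not fire.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// ASCII lowercase copy, used for case-insensitive charset name comparison.
std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Submit in UTF-8 only when accept-charset lists it AND the input does not
// fit the document charset; otherwise keep the document charset (assumed).
std::string IeStyleSubmitCharset(const std::vector<std::string>& acceptCharsets,
                                 const std::string& documentCharset,
                                 bool inputFitsDocumentCharset) {
  bool acceptsUtf8 = false;
  for (const std::string& cs : acceptCharsets) {
    if (ToLower(cs) == "utf-8") {
      acceptsUtf8 = true;
      break;
    }
  }
  if (acceptsUtf8 && !inputFitsDocumentCharset) {
    return "UTF-8";
  }
  return documentCharset;
}
```

Compared with the proposal in comment 16, this never picks any listed charset other than UTF-8, which is why the mail calls the support very limited.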
Assignee
Comment 31•24 years ago
Here is the patch: http://warp/u/ftang/tmp/fix5313.txt
Index: src/nsFormFrame.cpp
===================================================================
RCS file: /m/pub/mozilla/layout/html/forms/src/nsFormFrame.cpp,v
retrieving revision 3.122
diff -u -r3.122 nsFormFrame.cpp
--- nsFormFrame.cpp 2000/07/12 23:31:07 3.122
+++ nsFormFrame.cpp 2000/07/21 23:29:09
@@ -25,6 +25,7 @@
 #define NS_IMPL_IDS
 #include "nsICharsetConverterManager.h"
+#include "nsICharsetAlias.h"
 #include "nsIPlatformCharset.h"
 #undef NS_IMPL_IDS
@@ -970,7 +971,49 @@
   // XXX
   // We may want to get it from the HTML 4 Accept-Charset attribute first
   // see 17.3 The FORM element in HTML 4 for details
-
+  nsresult result = NS_OK;
+  nsAutoString acceptCharsetValue;
+  if (mContent) {
+    nsIHTMLContent* form = nsnull;
+    result = mContent->QueryInterface(kIHTMLContentIID, (void**)&form);
+    if (NS_SUCCEEDED(result) && (nsnull != form)) {
+      nsHTMLValue value;
+      result = form->GetHTMLAttribute(nsHTMLAtoms::acceptcharset, value);
+      if (NS_CONTENT_ATTR_HAS_VALUE == result) {
+        if (eHTMLUnit_String == value.GetUnit()) {
+          value.GetStringValue(acceptCharsetValue);
+        }
+      }
+      NS_RELEASE(form);
+    }
+  }
+#ifdef DEBUG_ftang
+  printf("accept-charset = %s\n", acceptCharsetValue.ToNewUTF8String());
+#endif
+  PRInt32 l = acceptCharsetValue.Length();
+  if(l > 0 ) {
+    PRInt32 offset=0;
+    PRInt32 spPos=0;
+    // get charset from charsets one by one
+    NS_WITH_SERVICE(nsICharsetAlias, calias, kCharsetAliasCID, &rv);
+    if(NS_SUCCEEDED(rv) && (nsnull != calias)) {
+      do {
+        spPos = acceptCharsetValue.FindChar(PRUnichar(' '),PR_TRUE, offset);
+        PRInt32 cnt = ((-1==spPos)?(l-offset):(spPos-offset));
+        if(cnt > 0) {
+          nsAutoString charset;
+          acceptCharsetValue.Mid(charset, offset, cnt);
+#ifdef DEBUG_ftang
+          printf("charset[i] = %s\n",charset.ToNewUTF8String());
+#endif
+          if(NS_SUCCEEDED(calias->GetPreferred(charset,oCharset)))
+            return;
+        }
+        offset = spPos + 1;
+      } while(spPos != -1);
+    }
+  }
+  // if there are no accept-charset or all the charset are not supported
   // Get the charset from document
   nsIDocument* doc = nsnull;
   mContent->GetDocument(doc);
@@ -987,6 +1030,9 @@
   nsAutoString charset;
   nsresult rv = NS_OK;
   GetSubmitCharset(charset);
+#ifdef DEBUG_ftang
+  printf("charset=%s\n", charset.ToNewCString());
+#endif
   // Get Charset, get the encoder.
   nsICharsetConverterManager * ccm = nsnull;
Whiteboard: [nsbeta2-] → [nsbeta2-]patch in hand need review.
Assignee
Comment 32•24 years ago
Also, we need http://warp/u/ftang/tmp/fix5313also.txt
Index: src/nsHTMLAtomList.h
===================================================================
RCS file: /m/pub/mozilla/layout/html/base/src/nsHTMLAtomList.h,v
retrieving revision 3.17
diff -u -r3.17 nsHTMLAtomList.h
--- nsHTMLAtomList.h 2000/06/07 06:58:43 3.17
+++ nsHTMLAtomList.h 2000/07/21 23:31:27
@@ -53,7 +53,7 @@
 HTML_ATOM(abbr, "abbr")
 HTML_ATOM(above, "above")
 HTML_ATOM(accept, "accept")
-HTML_ATOM(acceptcharset, "acceptcharset")
+HTML_ATOM(acceptcharset, "accept-charset")
 HTML_ATOM(accesskey, "accesskey")
 HTML_ATOM(action, "action")
 HTML_ATOM(align, "align")
Assignee
Updated•24 years ago
Whiteboard: [nsbeta2-]patch in hand need review. → [nsbeta2-][nsbeta3+]patch in hand need review.
Assignee
Comment 33•24 years ago
Checked in. Marking it fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 35•24 years ago
*** Bug 5314 has been marked as a duplicate of this bug. ***