Closed
Bug 18431
Opened 26 years ago
Closed 26 years ago
Change SPECIFY_CHARSET_IN_CONTENT_TYPE to read from a pref.
Categories: Core :: Internationalization, defect, P3
Status: VERIFIED, Resolution: INVALID
People: Reporter: gagan, Assigned: bobj
...and the default is off. Sadly, having it on by default is breaking a LOT of
websites/CGI scripts that do a dumb string compare on the Content-Type.
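For illustration (this sketch is mine, not from the report), here is the kind of
naive server-side check that breaks when a charset parameter is appended, next to
a version that compares only the media type:

# Hypothetical Python sketch; the function names and header values are illustrative.
def is_form_post(content_type: str) -> bool:
    # Naive exact match: fails on "application/x-www-form-urlencoded; charset=UTF-8"
    return content_type == "application/x-www-form-urlencoded"

def is_form_post_robust(content_type: str) -> bool:
    # Compare only the media type, ignoring parameters such as charset.
    return content_type.split(";", 1)[0].strip().lower() == "application/x-www-form-urlencoded"

print(is_form_post("application/x-www-form-urlencoded; charset=UTF-8"))         # False
print(is_form_post_robust("application/x-www-form-urlencoded; charset=UTF-8"))  # True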
Updated•26 years ago
Assignee: ftang → bobj
Comment 1•26 years ago
Could you list the known broken sites?
Comment 2•26 years ago
Any site using ColdFusion
Hotmail (whatever they're using)
Sun Java Web Server
Java Servlet and JSP developers kit - I've reported this to Sun
Servlets on Apache + Apache JServ
See http://bugzilla.mozilla.org/show_bug.cgi?id=7533 for more details
(Note: this list came from that bug report; I have only personally had problems
with ColdFusion.)
Comment 3•26 years ago
My vote is to never add the charset to the Content-Type header. So don't add
the pref for this. There's too much broken server-side software out there. I
doubt that we will ever get enough of them to fix their side, given that so
many client installations currently omit the charset parameter in form POSTs.
Comment 4•26 years ago
But then again, Mozilla doesn't work with non-standard versions of the DOM either.
Why use standards if they break sites?
I know I'd prefer to be able to browse the web fully standards compliant, and if
a site is broken, I can notify them (and disable standards compliance to view
the site if I _Really_ need to).
I also know that Average Joe User doesn't give a damn about standards. If a site
doesn't work with Mozilla but works with IE/NS4.x then they will see Moz as
broken. These people would never want the feature enabled.
I guess I just like standards compliance :)
Comment 5•26 years ago
Script authors won't bother to start supporting the charset parameter in the
Content-Type until a browser actually sends it, so I agree that we should make
this an option:
When submitting forms:
( ) Send MIME type "charset" parameter (standards-compliant mode)
(*) Send "_charset_" field (compatibility mode, default)
We can then use the time between the 5.0 release and the 5.1 (or 6.0) release
for evangelisation, with the aim of turning the feature on by default at some
later date.
This should probably be an advanced option hidden deep in the preferences
dialogs, so as not to scare people away.
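To make the two modes concrete, here is a sketch (my assumption of what each
option would put on the wire for a UTF-8 urlencoded POST; the values are
illustrative):

# Illustrative Python sketch of the two proposed submission modes.
standards_mode_header = "Content-Type: application/x-www-form-urlencoded; charset=UTF-8"
standards_mode_body = "name=Andr%C3%A9"

compat_mode_header = "Content-Type: application/x-www-form-urlencoded"  # no charset parameter
compat_mode_body = "name=Andr%C3%A9&_charset_=UTF-8"  # encoding carried in a hidden _charset_ field

print(standards_mode_header, "|", standards_mode_body)
print(compat_mode_header, "|", compat_mode_body)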
Comment 6•26 years ago
We're not just talking about scripts here. Some server-side software is written
in other languages like C and C++.
Anyway, I still think we can do without the pref, but if we really want to add
the pref, let's not have any UI for it, and just have it in the prefs.js text
file.
Comment 7•26 years ago
Also, the pref we're talking about in *this* bug report should be for the
charset parameter in the Content-Type header *only*. The _charset_ field should
be a separate issue. That is, if this pref is switched on, *both* the charset
parameter and the _charset_ field are sent out; the _charset_ field is always
sent, so that server-side software only has to look at one thing (the _charset_
field).
Comment 8•26 years ago
It seems silly for a client-side pref to control behavior that the server-side
depends upon. Here's a proposal:
The <FORM> element can have an attribute, accept-charset, which "specifies
the list of character encodings for input data that must be accepted by the
server processing this form". (This is only a hint, because the user agent is
not REQUIRED to send data back in any of these encodings.) The absence of this
attribute implies an accept-charset value of UNKNOWN.
So, why don't we send the Content-Type charset parameter ONLY for forms that
explicitly declare the <FORM> accept-charset attribute, and omit it otherwise?
This would provide backwards compatibility, and forms that want this data
would declare the <FORM> accept-charset attribute.
Bug 5313 is about supporting the <FORM> accept-charset attribute. For a
reference on this attribute, see
http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.3.
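A sketch of the decision rule being proposed here (mine; the helper name and the
exact behavior are assumptions, not actual Mozilla code):

# Proposal sketch: send the charset parameter only when the form author opts in
# via <FORM accept-charset="...">; otherwise omit it for backwards compatibility.
def content_type_for_urlencoded_post(accept_charset_declared: bool, charset: str = "UTF-8") -> str:
    if accept_charset_declared:
        return f"application/x-www-form-urlencoded; charset={charset}"
    return "application/x-www-form-urlencoded"

# <FORM action="/submit" method="post" accept-charset="UTF-8">  -> charset sent
print(content_type_for_urlencoded_post(True))
# <FORM action="/submit" method="post">                         -> charset omitted
print(content_type_for_urlencoded_post(False))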
Comment 9•26 years ago
That's a good idea for HTTP POST, but what about GET? A GET request carries no
Content-Type header, since the headers are not followed by a body (unlike POST).
I still like the _charset_ field.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → INVALID
Reporter
Comment 10•26 years ago
Based on the discussions so far (see bug 18643) I am marking this bug as
invalid. We don't need a pref.
Comment 11•26 years ago
Actually, I took another look at HTML 4.0 and found that it says that form
submissions of type application/x-www-form-urlencoded are always ASCII, and it
does not say anything about adding the charset parameter to that Content-Type
header. RFC2070 suggests adding the charset parameter, but RFC2070 is older
than HTML 4.0.
So 99% of the installed browsers out there comply with HTML4 as far as the
charset parameter is concerned, but those same browsers do not comply with HTML4
as far as using ASCII only is concerned.
Anyway, as far as *this* bug report is concerned, the people who want the pref
seem to want it because they want to comply with the spec, but, as I have just
pointed out, the spec does not call for the charset parameter, so marking this
bug INVALID was probably the right choice.
If somebody wants to log a bug saying that Mozilla does not comply with HTML4
because it uses non-ASCII data in those types of form submissions, please go
ahead.
Status: RESOLVED → VERIFIED
Comment 12•26 years ago
It's actually HTTP/1.1 (RFC 2616, dated June 1999) that specifies the need for
the charset in the Content-Type header:
from: http://www.ietf.org/rfc/rfc2616.txt
3.4.1 Missing Charset
Some HTTP/1.0 software has interpreted a Content-Type header without
charset parameter incorrectly to mean "recipient should guess."
Senders wishing to defeat this behavior MAY include a charset
parameter even when the charset is ISO-8859-1 and SHOULD do so when
it is known that it will not confuse the recipient.
Unfortunately, some older HTTP/1.0 clients did not deal properly with
an explicit charset parameter. HTTP/1.1 recipients MUST respect the
charset label provided by the sender; and those user agents that have
a provision to "guess" a charset MUST use the charset from the
content-type field if they support that charset, rather than the
recipient's preference, when initially displaying a document. See
section 3.7.1.
This RFC appears to be more recent than HTML 4.0.
Comment 13•26 years ago
The first paragraph of 3.4.1 above could be talking about the Content-Type
header in either an HTTP request or response, but the 2nd paragraph is clearly
talking about the response. My gut feeling is that 3.4.1 was written with the
response in mind, even though the 1st paragraph doesn't clearly indicate that.
Anyway, as far as application/x-www-form-urlencoded is concerned, the charset
parameter is not a good idea. Normally, the charset parameter is appended to
subtypes of the text type (e.g. text/plain, text/html), and there it is possible
for intermediaries (proxies, gateways, etc) to perform transcoding based solely
on the charset parameter. With application/x-www-form-urlencoded you cannot
perform such transcoding blindly because you first have to undo the %XX hex
encoding (assuming that the charset is intended to refer to the encoding
underneath the hex encoding). If the charset is only referring to the encoding
after hex encoding, then it is meaningless, since the application wants to know
the charset underneath the hex encoding.
So, I still don't think that the pref for adding the charset to form POST
Content-Types is a good idea.
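A worked example (mine, with assumed values) of the transcoding problem described
above: an intermediary has to undo the %XX encoding before it can convert the
charset, so it cannot act on the charset parameter alone.

# "é" submitted from an ISO-8859-1 page arrives as %E9 in the urlencoded body.
from urllib.parse import quote, unquote_to_bytes

body_latin1 = "name=Andr%E9"
name, value = body_latin1.split("=", 1)

# Transcoding to UTF-8 means decoding the %XX octets, converting the bytes,
# then re-applying the percent-encoding; it cannot be done "blindly".
raw_bytes = unquote_to_bytes(value)                                 # b'Andr\xe9'
utf8_value = quote(raw_bytes.decode("iso-8859-1").encode("utf-8"))  # 'Andr%C3%A9'
print(f"{name}={utf8_value}")                                       # name=Andr%C3%A9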