Open Bug 259529 Opened 20 years ago Updated 2 years ago

Submitted form doesn't use best charset from accept-charset

Categories

(Core :: DOM: Core & HTML, defect)

x86
Windows XP
defect

Tracking

UNCONFIRMED

People

(Reporter: st, Unassigned)

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.7) Gecko/20040803 Firefox/0.9.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.7) Gecko/20040803 Firefox/0.9.3

In a <form accept-charset="iso-8859-1,utf-8">, data is always sent as iso-8859-1, even if it contains characters that are not in the iso-8859-1 table. Firefox sends utf-8 data only if accept-charset is set to "utf-8" alone.

The following table compares the POST data Firefox and IE 6 send. The text entered in a <textarea> was "& ę ö". (Whatever appears as the second character here, it is Unicode character 281. Nonetheless, everything in the following table is displayed as intended.)

accept-charset   iso-8859-1,utf-8   utf-8,iso-8859-1   iso-8859-1   utf-8
Firefox          & &#281; ö         & &#281; ö         & &#281; ö   & Ä? ö
IE 6             & Ä? ö             & Ä? ö             & ê ö        & Ä? ö

This is a fairly serious problem, since the server cannot distinguish between Unicode character 281 (sent as &#281;) and the literal text &#281; (also sent as &#281;), so no reliable server-side correction is possible. Even though IE 6 fails at iso-8859-1, replacing character 281 with ê, it is basically doing it right by using utf-8 whenever possible. If Firebird chose iso-8859-1 because it is the most widely supported charset: that does not apply here. If the server says it accepts utf-8, we may use it.

Reproducible: Always

Steps to Reproduce:
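The ambiguity described in this report can be reproduced outside the browser with Python's standard codecs. This is a sketch, assuming Firefox's out-of-charset fallback behaves like Python's `xmlcharrefreplace` error handler (decimal numeric character references); that is consistent with the Firefox results above but is not confirmed anywhere in this report. `encode_for_submission` is a hypothetical helper name. The last line also suggests where the "Ä?" entries likely come from: the UTF-8 bytes C4 99 for character 281, read as iso-8859-1, are "Ä" followed by the non-printing control character 0x99.

```python
def encode_for_submission(text: str, charset: str) -> bytes:
    """Hypothetical helper: encode form text, replacing characters the charset
    cannot represent with decimal numeric character references (&#NNN;).
    Assumption: this mirrors the fallback Firefox used in the report."""
    return text.encode(charset, errors="xmlcharrefreplace")

typed_char = "& \u0119 \u00f6"   # "& ę ö": U+0119 is decimal 281
typed_text = "& &#281; \u00f6"   # the user literally typed the six characters "&#281;"

from_char = encode_for_submission(typed_char, "iso-8859-1")
from_text = encode_for_submission(typed_text, "iso-8859-1")
print(from_char)               # b'& &#281; \xf6'
print(from_char == from_text)  # True: the server cannot tell the two inputs apart

# With utf-8 no fallback is needed, so the two inputs stay distinct.
print(encode_for_submission(typed_char, "utf-8"))  # b'& \xc4\x99 \xc3\xb6'

# The UTF-8 bytes of U+0119 viewed as iso-8859-1: "Ä" plus control char 0x99.
print(repr("\u0119".encode("utf-8").decode("iso-8859-1")))  # 'Ä\x99'
```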
afaik the spec doesn't demand this behavior
Assignee: bugs → form-submission
Severity: major → enhancement
Component: Form Manager → HTML: Form Submission
Product: Firefox → Browser
QA Contact: firefox.form-manager
Version: unspecified → Trunk
(In reply to comment #1)
> afaik the spec doesn't demand this behavior

Excuse me, the spec doesn't demand *usable* forms? Are you kidding?
Severity: enhancement → normal
Reporter, does the behavior change if you use a space-separated list of charsets instead of a comma-separated one? We _should_ be taking the first charset listed that we support (since that's the way Accept-* things tend to work), but we may be screwing this up for comma-separated lists....
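The "first charset listed that we support" policy, and the suspected problem with comma-separated lists, can be sketched as follows. This is an illustration under stated assumptions, not Gecko's code: Python's codec registry stands in for the browser's list of supported encoders (the two do not agree on every label), the fallback charset is a parameter because this report does not say which charset the page itself used, and `first_supported_whitespace_only` is a hypothetical faulty parser written only to show what the comma-parsing guess would look like.

```python
import codecs
import re
from typing import List

def parse_accept_charset(value: str) -> List[str]:
    """Split an accept-charset value on commas and/or whitespace
    (HTML 4.01 describes it as a space- and/or comma-delimited list)."""
    return [token for token in re.split(r"[\s,]+", value.strip()) if token]

def is_supported(label: str) -> bool:
    """Assumption: any label Python's codec registry knows counts as supported."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False

def first_supported(value: str, fallback: str) -> str:
    """'Take the first charset listed that we support.'"""
    for label in parse_accept_charset(value):
        if is_supported(label):
            return label
    return fallback

def first_supported_whitespace_only(value: str, fallback: str) -> str:
    """Hypothetical faulty parser: splits on whitespace only, so
    'utf-8,iso-8859-1' stays one unknown token and the fallback wins."""
    for label in value.split():
        if is_supported(label):
            return label
    return fallback

print(first_supported("iso-8859-1,utf-8", "iso-8859-1"))                  # iso-8859-1
print(first_supported("utf-8,iso-8859-1", "iso-8859-1"))                  # utf-8
print(first_supported_whitespace_only("utf-8,iso-8859-1", "iso-8859-1"))  # iso-8859-1
```

With a correct parser, "utf-8,iso-8859-1" should select utf-8; the reporter's table shows Firefox sending iso-8859-1 for that value, which is what the whitespace-only variant produces if the fallback happens to be iso-8859-1.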
(In reply to comment #3)
> We _should_ be taking the first charset listed that we support (since that's the
> way Accept-* things tend to work),

1. Then you're breaking the rules, simple as that. We're not talking about HTTP here; this is HTML, and the HTML recommendation clearly states that "the client must interpret" (must!) <form>'s accept-charset attribute "as an exclusive-or list" (http://www.w3.org/TR/html4/interact/forms.html#h-17.3). There are no quality values for this HTML attribute and there is no explicit ordering.

2. Even though I admit that most authors will most likely order the charsets the way they prefer them, there is absolutely no reason to just select the first one you support and thereby possibly break the entire form processing. I'm not even sure that the way you encode out-of-charset characters is defined somewhere in the HTML form recommendations. Please use the first charset you support _and_ that is able to transmit all characters unmodified.

I have nothing else to add, except that "the specs don't demand" working forms and "some other protocol does it like that" do not make any sense to me, at least not as reasons why I should not be able to use some letter combinations with Firefox. Sorry.
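The reporter's proposal (the first supported charset that can transmit every character unmodified) can be sketched like this. Assumptions, all hypothetical and not taken from any browser: Python's codec registry again stands in for the browser's encoders, every field value of the form is checked (a single textarea in the report), and when no listed charset is lossless the sketch keeps the current "first supported" choice; the thread does not say what should happen in that case.

```python
import codecs
import re
from typing import Iterable, List

def parse_accept_charset(value: str) -> List[str]:
    # Entries are separated by commas and/or whitespace.
    return [token for token in re.split(r"[\s,]+", value.strip()) if token]

def is_supported(label: str) -> bool:
    # Assumption: Python's codec registry stands in for the browser's.
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False

def can_encode(text: str, label: str) -> bool:
    # Strict encoding raises on the first character the charset cannot represent.
    try:
        text.encode(label)
        return True
    except UnicodeEncodeError:
        return False

def choose_lossless(value: str, fields: Iterable[str], fallback: str) -> str:
    """Pick the first listed, supported charset that encodes every field value unmodified."""
    values = list(fields)  # may be iterated once per candidate charset
    supported = [label for label in parse_accept_charset(value) if is_supported(label)]
    for label in supported:
        if all(can_encode(v, label) for v in values):
            return label
    # Assumption: if no listed charset is lossless, keep "first supported".
    return supported[0] if supported else fallback

print(choose_lossless("iso-8859-1,utf-8", ["& \u0119 \u00f6"], "iso-8859-1"))  # utf-8
print(choose_lossless("iso-8859-1,utf-8", ["& \u00f6"], "iso-8859-1"))         # iso-8859-1
```

Because the list is treated as ordered, an author who lists iso-8859-1 first still gets iso-8859-1 whenever the data fits, and utf-8 only when it is needed to avoid numeric character references.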
(In reply to comment #4)
> 1. Then you're breaking the rules, simple as that.
> ...
> the HTML recommendation clearly states that "the client
> must interpret" (must!) <form>'s accept-charset attribute "as an exclusive-or
> list"

We're doing that. Where do you see us not doing that?

> quality values for this HTML attribute and there is no explicit ordering.

Indeed. So it's up to the user agent to somehow select a charset. The algorithm we have implemented is "select the first one in the list", which is perfectly compliant with the spec...

> I'm not even sure that the way you encode out-of-charset characters is defined
> somewhere in the HTML form recommendations.

It's not. It's the de facto standard, though.

> Please use the first charset you support

That's a reasonable request, but a lot of work...

The problem you ran into is actually a bit more extensive than this, though. Note that "utf-8,iso-8859-1" didn't send as utf-8 in your test. This is why I asked you to test a space-separated charset list in comment 3. If my guess is right, we'll need to split this into two bugs: one on the parsing of the accept-charset attribute (which is a bug and a spec violation) and one on an enhancement (desirable, but difficult) to a spec-compliant behavior. So if you could do the test I asked you to do, that would be much appreciated.
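The test requested in comment 3 compares comma-separated and space-separated accept-charset lists side by side. A small script can generate such a page; this is a sketch under assumptions: "/echo" stands for any hypothetical endpoint that shows the raw POST body, and the page charset is a parameter because the report does not state which one the original page used. Non-ASCII text is written as numeric character references so the page source is valid in any ASCII-compatible charset.

```python
import html

# Comma- vs space-separated variants of the two orderings from the report.
VARIANTS = [
    "iso-8859-1,utf-8",
    "utf-8,iso-8859-1",
    "iso-8859-1 utf-8",
    "utf-8 iso-8859-1",
]

def ascii_html(text: str) -> str:
    """Escape markup characters, then write non-ASCII characters as
    numeric character references so the page source is pure ASCII."""
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")

def build_test_page(action: str, sample: str, page_charset: str) -> str:
    forms = []
    for value in VARIANTS:
        forms.append(
            f'<form method="post" action="{ascii_html(action)}" '
            f'accept-charset="{ascii_html(value)}">\n'
            f'  <p>accept-charset="{ascii_html(value)}"</p>\n'
            f'  <textarea name="text">{ascii_html(sample)}</textarea>\n'
            f'  <button type="submit">Submit</button>\n'
            f'</form>'
        )
    head = (
        '<html><head><meta http-equiv="Content-Type" '
        f'content="text/html; charset={ascii_html(page_charset)}">'
        "<title>accept-charset test</title></head><body>\n"
    )
    return head + "\n".join(forms) + "\n</body></html>\n"

# Hypothetical endpoint and page charset; adjust to the real test setup.
page = build_test_page("/echo", "& \u0119 \u00f6", "iso-8859-1")
print(page)
```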
Assignee: form-submission → nobody
QA Contact: form-submission
Component: HTML: Form Submission → DOM: Core & HTML
Severity: normal → S3