259529 - Submitted form doesn't use best charset from accept-charset

Reporter

Description

20 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.7) Gecko/20040803 Firefox/0.9.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.7) Gecko/20040803 Firefox/0.9.3

In a <form accept-charset="iso-8859-1,utf-8">, data is always sent as
iso-8859-1, even if it contains characters that are not in the iso-8859-1 table.
Firefox sends utf-8 data only if accept-charset is set to just "utf-8".

The following table compares the POST data that Firefox and IE 6 send. The text
entered in a <textarea> was "& ę ö" (however the second character appears here,
it is Unicode character 281; in the following table, everything is
displayed as intended).

          iso-8859-1,utf-8  utf-8,iso-8859-1  iso-8859-1  utf-8
Firefox:  & &#281; ö        & &#281; ö        & &#281; ö  & Ä? Ã¶
IE 6:     & Ä? Ã¶           & Ä? Ã¶           & ê ö       & Ä? Ã¶

This is a somewhat serious problem since the server cannot distinguish between
the Unicode character 281 (sent as &#281;) and the literal text &#281; (also sent
as &#281;), so reliable server-side correction is impossible.
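The ambiguity can be reproduced outside the browser. The following is a minimal sketch, not Firefox's actual code: it assumes the fallback for unencodable characters behaves like Python's "xmlcharrefreplace" error handler, and the variable names are made up for illustration.

```python
# Text typed into the <textarea>; U+0119 is Unicode character 281.
typed_char = "& \u0119 \u00f6"
# A different input: the user literally typed the six characters "&#281;".
typed_literal = "& &#281; \u00f6"

# Assumption: characters outside iso-8859-1 are replaced by numeric
# character references, modelled here with "xmlcharrefreplace".
sent_char = typed_char.encode("iso-8859-1", errors="xmlcharrefreplace")
sent_literal = typed_literal.encode("iso-8859-1", errors="xmlcharrefreplace")

# Both inputs produce the same bytes, so the server cannot tell them apart.
print(sent_char == sent_literal)

# Encoding as utf-8 keeps the two inputs distinct and round-trips losslessly.
utf8_bytes = typed_char.encode("utf-8")
print(utf8_bytes.decode("utf-8") == typed_char)
```

Both print statements report True: the iso-8859-1 submissions collide, while the utf-8 submission preserves the original text.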

Even though IE 6 fails at iso-8859-1, replacing character 281 with ê, it is
basically doing the right thing by using utf-8 whenever possible. If Firebird
chose iso-8859-1 because it is the most widely supported charset, that argument
does not apply here: if the server says it accepts utf-8, we may use it.

Reproducible: Always
Steps to Reproduce:

timeless

Comment 1

20 years ago

afaik the spec doesn't demand this behavior

Assignee: bugs → form-submission

Severity: major → enhancement

Component: Form Manager → HTML: Form Submission

Product: Firefox → Browser

QA Contact: firefox.form-manager

Version: unspecified → Trunk

Sönke Tesch

Reporter

Comment 2

20 years ago

(In reply to comment #1)
> afaik the spec doesn't demand this behavior

Excuse me, the spec doesn't demand *usable* forms? Are you kidding?

Severity: enhancement → normal

Boris Zbarsky [:bzbarsky]

Comment 3

20 years ago

Reporter, does the behavior change if you use a space-separated list of charsets
instead of a comma-separated one?

We _should_ be taking the first charset listed that we support (since that's the
way Accept-* things tend to work), but we may be screwing this up for
comma-separated lists....

Sönke Tesch

Reporter

Comment 4

20 years ago

(In reply to comment #3)
> We _should_ be taking the first charset listed that we support (since that's the
> way Accept-* things tend to work), 

1. Then you're breaking the rules, simple as that. We're not talking about HTTP
here, this is HTML, and the HTML recommendation clearly states that "the client
must interpret" (must!) <form>'s accept-charset attribute "as an exclusive-or
list" (http://www.w3.org/TR/html4/interact/forms.html#h-17.3). There are no
quality values for this HTML attribute and there is no explicit ordering.

2. Even though I do admit that most authors will most likely order the charsets
the way they prefer them, there is absolutely no reason to just select the first
one you support and thus possibly break the complete form processing. I'm not
even sure that the way you encode out-of-charset characters is defined somewhere
in the HTML form recommendations.

Please use the first charset you support _and_ that is able to transmit all
characters unmodified.
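The requested rule can be sketched as follows. This is only an illustration of the proposal, not an existing Gecko routine: pick_charset is a hypothetical name, Python's codec registry stands in for "charsets the browser supports", and returning None when nothing fits is an assumption, since the request does not say what should happen in that case.

```python
import codecs

def pick_charset(candidates, text):
    """Return the first supported charset that encodes text unmodified.

    Hypothetical helper; returns None if no candidate qualifies.
    """
    for name in candidates:
        try:
            codecs.lookup(name)   # unsupported charset raises LookupError
            text.encode(name)     # strict mode raises UnicodeEncodeError
        except (LookupError, UnicodeEncodeError):
            continue              # skip this candidate and try the next
        return name
    return None

# With the list from the report, the text containing character 281
# skips iso-8859-1 and is submitted as utf-8.
print(pick_charset(["iso-8859-1", "utf-8"], "& \u0119 \u00f6"))
# Text that fits into iso-8859-1 still uses the first listed charset.
print(pick_charset(["iso-8859-1", "utf-8"], "& \u00f6"))
```

The author's ordering is still honoured whenever it can be; a later charset is chosen only when an earlier one would alter the data.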

Nothing else I could add, except that "the specs don't demand" working forms and
"some other protocol does it like that" do not make any sense to me, at least
not in terms of why I should not be able to use some letter combinations with
Firefox. Sorry.

Boris Zbarsky [:bzbarsky]

Comment 5

20 years ago

(In reply to comment #4)
> 1. Then you're breaking the rules, simple as that.
...
> the HTML recommendation clearly states that "the client
> must interpret" (must!) <form>'s accept-charset attribute "as an exclusive-or
> list"

We're doing that.  Where do you see us not doing that?

> quality values for this HTML attribute and there is no explicit ordering.

Indeed.  So it's up to the user-agent to somehow select a charset.  The
algorithm we have implemented is "select the first one in the list".  Which is
perfectly compliant with the spec...

> I'm not even sure that the way you encode out-of-charset characters is defined
> somewhere in the HTML form recommendations.

It's not.  It's the de-facto standard, though.

> Please use the first charset you support

That's a reasonable request, but a lot of work....  The problem you ran into is
actually a bit more extensive than this, though.  Note that "utf-8,iso-8859-1"
didn't send as utf-8 in your test.  This is why I asked you to test a
space-separated charset list in comment 3.

If my guess is right, we'll need to split this into two bugs -- one on the
parsing of the accept-charset attribute (which is a bug and a spec violation)
and one on an enhancement (desirable, but difficult) to a spec-compliant
behavior.  So if you could do the test I asked you to do, that would be much
appreciated.
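For the parsing half of the problem, HTML 4.01 describes accept-charset as a space- and/or comma-delimited list, so both separators should yield the same candidates. A minimal sketch of such a tokenizer (parse_accept_charset is a hypothetical name, and this is not the code Gecko uses):

```python
import re

def parse_accept_charset(value):
    """Split an accept-charset value on commas and/or whitespace.

    Hypothetical helper; charset names are lowercased because they are
    case-insensitive.
    """
    return [tok.lower() for tok in re.split(r"[,\s]+", value) if tok]

# Comma-separated and space-separated lists give the same result, so
# "select the first one in the list" picks utf-8 in both cases.
print(parse_accept_charset("utf-8,iso-8859-1")[0])
print(parse_accept_charset("utf-8 iso-8859-1")[0])
```

If the comma-separated form currently behaves differently from the space-separated one, a tokenizer along these lines would cover the parsing bug independently of the charset-selection enhancement.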

Phil Ringnalda (:philor)

Updated

15 years ago

Assignee: form-submission → nobody

QA Contact: form-submission

Nobody; OK to take it and work on it

Assignee

Updated

5 years ago

Component: HTML: Form Submission → DOM: Core & HTML

BMO Automation

Updated

2 years ago

Severity: normal → S3

Categories

(Core :: DOM: Core & HTML, defect)

People

(Reporter: st, Unassigned)

Security

(public)
