Open Bug 232191 Opened 17 years ago Updated 1 year ago

Character encoding of submitted form silently influenced by that of original page.

Categories

(Core :: DOM: Core & HTML, defect)

Hardware: x86
OS: Linux
Type: defect
Priority: Not set
Severity: normal

People

(Reporter: andreas.krueger, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(1 file)

User-Agent:       
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040122 Debian/1.6-1

Upon posting, Mozilla will silently change the character encoding.

A page that was originally ISO-8859-1 will receive a post that is windows-1252.
A page that was originally ISO-8859-15 will receive a post that is ISO-8859-15.

However, Mozilla never states, in the HTTP or HTML it sends,
which character encoding it is actually using.

Reproducible: Always
Steps to Reproduce:
1.  An HTTP/HTML page asks the browser to submit a form with a text field.

1a. The HTTP response for that page says "Content-Type: text/html; charset=ISO-8859-1"
1b. The HTTP response for that page says "Content-Type: text/html; charset=ISO-8859-15"

2. I type, into the browser representation of the text field,
   some text containing the European currency symbol.

3. The browser displays my European currency symbol correctly.

4. I submit the form.

Actual Results:  

5. The browser says, in either case, that what it's sending is
   "Content-Type: application/x-www-form-urlencoded".
   No character encoding information is given.

5a. The browser transcodes the EUR symbol as %80.
    This seems to be the encoding of the EUR symbol in windows-1252.
5b. The browser transcodes the EUR symbol as %a4.
    This seems to be the encoding of the EUR symbol in ISO-8859-15.
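
The two byte values reported in 5a and 5b can be reproduced outside the browser. The following is a minimal Python sketch, not Mozilla's code; the helper name percent_encode is hypothetical, and Python's codec names are assumed to match the charsets named above. Python emits uppercase hex digits, so 5b appears as %A4 rather than %a4.

```python
from urllib.parse import quote

def percent_encode(text, page_charset):
    """Percent-encode text after converting it to bytes in page_charset.

    Hypothetical helper: it mimics the reported behavior of encoding
    form data in the charset of the page that contained the form.
    """
    return quote(text, safe="", encoding=page_charset, errors="strict")

euro = "\u20ac"  # EURO SIGN

print(percent_encode(euro, "windows-1252"))  # %80
print(percent_encode(euro, "iso-8859-15"))   # %A4

# Genuine ISO-8859-1 has no euro sign at all, so a strict encode fails.
try:
    percent_encode(euro, "iso-8859-1")
except UnicodeEncodeError as exc:
    print("not encodable in ISO-8859-1:", exc.reason)
```

The failure in the last case is consistent with the observation that a page labeled ISO-8859-1 is in practice posted as windows-1252.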


Expected Results:  
In my opinion, the HTTP POST operation should not rely on context
to tell the server which chartext is being used.

The information
   "Content-Type: application/x-www-form-urlencoded"
is not enough; additional information as to which specific charset is being
used should be given, unless the charset truly is "ISO-8859-1" only.

5a. "Content-Type: application/x-www-form-urlencoded; charset=windows-1252"
5b. "Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-15"
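
To illustrate why the charset parameter matters on the receiving side, here is a rough Python sketch of a server decoding one field value. The function decode_field and its ISO-8859-1 fallback are assumptions made for illustration, not the behavior of any particular server.

```python
from email.message import Message
from urllib.parse import unquote_to_bytes

def decode_field(raw_value, content_type, fallback="iso-8859-1"):
    """Decode one percent-encoded form value.

    Uses the charset parameter of the request's Content-Type if present,
    otherwise the assumed fallback (hypothetical server policy).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    return unquote_to_bytes(raw_value).decode(charset)

base = "application/x-www-form-urlencoded"

# With the proposed headers, both submissions decode to the euro sign.
print(decode_field("%80", base + "; charset=windows-1252") == "\u20ac")  # True
print(decode_field("%A4", base + "; charset=ISO-8859-15") == "\u20ac")   # True

# Without a charset, the server has to guess, and the same bytes turn
# into a C1 control character and a generic currency sign respectively.
print(repr(decode_field("%80", base)))  # '\x80'
print(repr(decode_field("%A4", base)))  # '¤'
```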


This is somewhat related to bug 228779.
> to tell the server which chartext is being used.

Sorry about the "chartext" typo.  Make that "character encoding".
This may have to be made a dupe of bug 228779. I'm leaving it open for now.


For a brief period, Mozilla added 'charset' when submitting a form, but it
broke a lot of server-side programs (back in 2001?), so Mozilla developers
reverted to the old behavior. There's a bug on the issue. I'll refer to it
when I find it.
Assignee: form-submission → jshin
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
OS: Linux → All
(In reply to comment #3)
> There's a bug on the issue. I'll refer to it
> when I find it.

You mean bug 18643, I think.
OS: All → Linux
Firefox does not appear to send a Content-Type header with multipart/form-data parts. According to RFC 2388, this is where this information belongs.

Content-type: multipart/form-data; boundary=12345

--12345
Content-disposition: form-data; name="field1"
Content-type: text/plain; charset=iso-8859-1

value1
--12345

Even if we elect not to change the decision we make regarding character encodings, we should at least make an attempt to document what we're doing so that applications have some hope of figuring out how to interpret the text.
Assuming that bug 228779 remains outstanding, submitted "ISO-8859-1" form data is actually sent in windows-1252, and this should be reflected in the headers for that part.
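
As a rough sketch of what a per-part charset declaration would look like on the wire, the following Python builds a multipart/form-data body by hand. The function names build_form_part and build_form_body are hypothetical and this is not Firefox's implementation; it only shows each part labeled with the charset actually used to encode its value.

```python
def build_form_part(boundary, name, value, charset):
    """Build one form-data part whose Content-Type names the real charset."""
    headers = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        f"Content-Type: text/plain; charset={charset}\r\n"
        "\r\n"
    )
    # Part headers are ASCII; the value uses the declared charset.
    return headers.encode("ascii") + value.encode(charset) + b"\r\n"

def build_form_body(boundary, fields, charset):
    """Join all parts and append the closing boundary."""
    parts = b"".join(
        build_form_part(boundary, name, value, charset)
        for name, value in fields
    )
    return parts + f"--{boundary}--\r\n".encode("ascii")

# A page labeled ISO-8859-1 whose data is really sent as windows-1252.
body = build_form_body("12345", [("field1", "\u20ac")], "windows-1252")
print(body)
```

With this labeling, a receiver can decode the single byte 0x80 as the euro sign without guessing from the charset of the original page.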
The suggestion in comment 5 was attempted in a patch to bug 116346.  This patch had to be reverted because it broke Yahoo! mail (bug 392982).  If it gets reincarnated at a later date, I think it would help resolve this bug.
QA Contact: form-submission
Component: HTML: Form Submission → DOM: Core & HTML