Closed Bug 324964 Opened 19 years ago Closed 14 years ago

FireFox converts form-data before posting.

Categories

(Firefox :: General, defect)

1.5.0.x Branch
x86
Windows XP
defect
Not set
normal

RESOLVED INCOMPLETE

People

(Reporter: y.snaky, Unassigned)

Details

(Whiteboard: [CLOSEME 5-15-2010])

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Sorry for my English.

I choose the "Western (ISO-8859-1)" character encoding and type some
characters that do not exist in this character set. For example, I type
the character "№" (a Cyrillic character). Then I submit the form.
The server script receives this data: "&#8470"
Why? Those are HTML special characters! Why does FF post them to the server?
The current character encoding must NOT affect posted data; it is only
for viewing pages. What if I don't want to output the form data to a browser
and just want to store it in a database? Why would I need HTML in my
database if I am only posting plain data?
As a result, the server receives different data (with or without HTML
special characters) depending on the current character encoding in FF. How
should the server process it?

Form data must be URL-encoded before posting, but not converted to
HTML, mustn't it?

Reproducible: Always
So what do you suggest as the correct behaviour in this case? (That's a fairly rhetorical question, since I think there is no "right answer")

Background reading:
 http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html
 http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13
 http://whatwg.org/specs/web-forms/current-work/#unacceptableCharacters
Bug 35970
Bug 135762
Bug 228779
(In reply to comment #1)
> So what do you suggest as the correct behaviour in this case? (That's a fairly
> rhetorical question, since I think there is no "right answer")

http://www.ietf.org/rfc/rfc2388
It says nothing about HTML-isation of data before posting.

Then I read your reference http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html:

"If I pasted the Windows matched-quotes into a form within an HTML document that had charset=windows-1252, then they went into the raw query string as %91 %92 %93 and %94 , which indeed are the %xx-codings of the matched quotes in codepage 1252. So far, so good.

If I did the same thing with the HTML document in charset=utf-8, then what got submitted were %E2%80%98, %E2%80%99, %E2%80%9C, %E2%80%9D, which are indeed the %xx-codings of the correct octet sequences for a utf-8 representation of the unicode characters U+2018 U+2019 U+201C U+201D. So that's behaving as expected.

However, the fun starts if I try submitting a form that's in charset=iso-8859-1 with this browser. What then turns up in the raw submitted string is this (taking just one example from the four):

              %26%238220%3B

Applying the %xx-decoding to that, we find that it reads

               &#8220;

in other words, a completely unsolicited HTML-isation has been performed on this input character. The result of submitting that single character is then totally indistinguishable from what happens if one types the character string "&#8220;" (without the quotes of course) into the text field. Both of them produce %26%238220%3B in the raw submitted string.
...
Well, "so far, so good". But my argument (if I hadn't already "missed the boat" on this) would be that once such HTML-ification has occurred, it's impossible to know whether the submission is an attempt to submit a single Unicode character, or an attempt to submit the character string &#number;"

However, that passage is about IE. But FF does the same (in other cases).
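
The three cases in the quoted passage can be reproduced outside a browser. What follows is only a minimal Python sketch, not the actual Firefox or IE code. It assumes the browser's fallback for unrepresentable characters is equivalent to Python's `xmlcharrefreplace` error handler, and the helper name `form_encode` is hypothetical.

```python
from urllib.parse import quote


def form_encode(text: str, charset: str) -> str:
    """Percent-encode text roughly as a form in `charset` appears to submit it.

    Assumption: characters the charset cannot represent are replaced by
    decimal numeric character references (&#NNNN;) before percent-encoding,
    which is the behaviour described in this bug.
    """
    raw = text.encode(charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")


left_quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK

print(form_encode(left_quote, "windows-1252"))  # %93
print(form_encode(left_quote, "utf-8"))         # %E2%80%9C
print(form_encode(left_quote, "iso-8859-1"))    # %26%238220%3B
```

Under the same assumption, `form_encode("&#8220;", "iso-8859-1")` also yields `%26%238220%3B`, which is the ambiguity the quoted text complains about.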
(In reply to comment #0)
> For example, I type the character "№" (cyrillic character). Then I submit a form. Server script receive this data: "&#8470"

I may be misunderstood here, because my Cyrillic character was converted to HTML and my post lost its meaning :)
My first post should have looked like this:
---------
For example, I type the character "#" (cyrillic character). Then I submit a form. Server script receive this data: "&#8470"
---------
where # stands for some character.

That is exactly what this discussion is about.
(In reply to comment #3)
> I can be misconstrued here, because my cyrillic char was converted to HTML and
> my post lost the meaning :)

Please set 'View | Character Encoding' to UTF-8 before posting anything here that includes non-ASCII characters.
After seeing that both FF and IE submitted characters with codes >= 256 in the same way, and reading through this and a few related bugs, I think I found the way to fix it: the form itself needs to be encoded as UTF-8 (meta http-equiv="Content-Type" content="text/html; charset=UTF-8").  When my test form was encoded as ISO-8859-1, both the character 256 (Latin Capital Letter A With Macron) and the string "&#256;" were submitted with exactly the same encoding, whether the form method was "post" or "get".  After changing the form encoding to UTF-8, character 256 was sent as a UTF-8 2-byte sequence.
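
The observation in the paragraph above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the hypothetical helper `submitted` models the browser fallback with Python's `xmlcharrefreplace` error handler, which may not match every browser exactly.

```python
from urllib.parse import quote, unquote


def submitted(text: str, charset: str) -> str:
    # Assumption: unrepresentable characters become &#NNNN; before
    # percent-encoding, as described in this bug.
    raw = text.encode(charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")


char = "\u0100"     # LATIN CAPITAL LETTER A WITH MACRON (code point 256)
literal = "&#256;"  # the six-character string typed by hand

# ISO-8859-1 form: both inputs collapse to the same submitted string.
assert submitted(char, "iso-8859-1") == submitted(literal, "iso-8859-1")

# UTF-8 form: the character travels as a 2-byte sequence and the
# hand-typed string stays distinct from it.
print(submitted(char, "utf-8"))     # %C4%80
print(submitted(literal, "utf-8"))  # %26%23256%3B

# A server that knows the form charset can decode without ambiguity.
assert unquote(submitted(char, "utf-8"), encoding="utf-8") == char
```

With a UTF-8 form the server never has to guess whether `&#256;` was typed literally, because the encoder never needs to fall back to a character reference.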

By the way, the W3C HTML 4.01 specification recommends that user agents represent non-ASCII characters in attribute values in UTF-8 (see http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1), but it does not say whether or not the encoding of the form data should depend on the encoding of the document containing the form.
This bug was reported on Firefox 2.x or older, which is no longer supported and will not be receiving any more updates. I strongly suggest that you update to Firefox 3.6.3 or later, update your plugins (flash, adobe, etc.), and retest in a new profile. If you still see the issue with the updated Firefox, please post here. Otherwise, please close as RESOLVED > WORKSFORME
http://www.mozilla.com
http://support.mozilla.com/kb/Managing+profiles
http://support.mozilla.com/kb/Safe+mode
Whiteboard: [CLOSEME 5-15-2010]
Version: unspecified → 1.5.0.x Branch
No reply, INCOMPLETE. Please retest with Firefox 3.6.x or later and a new profile (http://support.mozilla.com/kb/Managing+profiles). If you continue to see this issue with the newest Firefox and a new profile, then please comment on this bug.
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE