Closed
Bug 224820
Opened 21 years ago
Closed 19 years ago
POST/GET data gets corrupted in pages in non-byte oriented character encodings(UTF-16/32)
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: thesmart7, Assigned: jshin1987)
References
()
Details
(Keywords: dataloss, intl)
Attachments
(1 file, 6 obsolete files)
24.60 KB,
patch
|
Biesinger
:
review+
jshin1987
:
superreview+
asa
:
approval1.8b4+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl-NL; rv:1.6a) Gecko/20031030 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl-NL; rv:1.6a) Gecko/20031030 I am using the UTF-16 charset on my page. When data gets POSTed from a form, the POST data gets corrupted. You can see this on my demonstation page: http://members.lycos.nl/slu/test.php. Note that when the entire page is gargabe, you will have to reload the page. Everything is explained there. Reproducible: Always Steps to Reproduce: 1. Create a UTF-16 page 2. Make a form with method="post" 3. Look at the POST data Actual Results: The field names are strangely truncated to 1 character and the values are not really UTF-16 encoded. Expected Results: Mozilla should NOT truncate the field names and should use the (url-encoded) UTF-16 charset.
Comment 1•21 years ago
|
||
All of the nsFormSubmission code uses char* pointers for the data it plans to send to the server... the problem is that it needs to pass around lengths too; otherwise things break with UTF-16 and the like.
Assignee | ||
Comment 2•21 years ago
|
||
In addition, it seems like we need to add nsEscape/UnEscape() that accept the string length as a parameter. Currently, both use strlen(). nsISaveAsCharset also needs some change.
Assignee | ||
Comment 3•21 years ago
|
||
bz do you know where nsIFormProcessor interface is implemented. Its interface is baroque and I want to fix it, but the only implementation I can find is kind of dummy (nsTestFormProcessor).
Comment 4•21 years ago
|
||
jshin, that's used for <keygen>. See http://lxr.mozilla.org/mozilla/source/security/manager/ssl/src/nsKeygenHandler.h#43
Assignee | ||
Comment 5•21 years ago
|
||
I have a tentative patch, but we need to come up with what exactly we want when we send form data in UTF-16/32 over POST. For instance, how should we represent ' ' (U+0020 SPACE) if charset for POST is UTF-16LE? Which of the following should we use? 0x2b 0x2b 0x00 0x2b %00 %20 %00 How about 'a' (U+0061)? Which should we use? %61%00 a%00 The other way around, how about U+6100? %00a %00%61 Kester, btw, can you attach your test.php (source) here? Probably, we have to treat non-ASCII-compatible encodings (UTF-16/32) specially.
Assignee: form-submission → jshin
OS: Windows XP → All
Assignee | ||
Comment 6•21 years ago
|
||
Reporter | ||
Comment 7•21 years ago
|
||
- The Multi-Byte String extension (mb_string) needs to be enabled (enabled by default in some versions). - You can test any charset that is supported by PHP. There is a list in the document. I am using PHP 5, but I works correctly in PHP 4.3.2 too. Not tested in other versions. - Set the encoding of your editor to UTF-8, or you will not be able to see the Chinese and Japanese characters. An alternative is explained in the document. - Info on mb_string: http://www.php.net/mb_string
Assignee | ||
Comment 8•21 years ago
|
||
Thanks. I wanted to see how your code deciphers url-encoded forms to arrays. It's done in a single stroke with 'var_dump($_POST)'. I'm afraid I have to either look at PHP4/5 source code or come up with a way to get my hands on 'raw' data sent by a client (Mozilla and IE). Anyway, with my patch applied and UTF-16LE, Mozilla hands over 'a%00' for 'a' (or '%61%00'). At your test page, I got back 'a\0' and character count of '3'(I expected 1) For '一' (U+4E00), Mozilla hands over '%00%4E' (or '%00N' Note that 'N' is U+004E) and got back '\0N'. What do you think Mozilla has to produce for 'a' (U+0061) and '一' (U+4E00) if character encoding is UTF-16LE? The section on form submission in HTML 4 is clearly byte-oriented ('a' is always represented by a single octet 0x61) and can hardly be a guide for non-byte-oriented character encodings. For UTF-16/UTF-32, if we had to escape all characters, whatever little merit of using UTF-16 over UTF-8 (space saving for CJK) would be lost. Richard, what do you think about this?
Reporter | ||
Comment 9•21 years ago
|
||
JS, you can change line 94 (<?php var_dump($_POST); ?>) to <?php readfile("php://input"); ?> to see the raw POST data. You can also use both if you want :)
Assignee | ||
Comment 10•21 years ago
|
||
Thanks. My php installation is early 4.x and I couldn't make it work with your test file. Could you add that statement (for raw POST data dump) at http://members.lycos.nl/slu/test.php so that I can check what MS IE sends. I suspect MS IE may send data in '%uxxxx' form when UTF-16LE/BE is selected. In case of Mozilla, with my patch, Mozilla escapes byte-by-byte even for UTF-16LE/BE. For instance, '%41%00' for 'A', '%00%4E' for '一'(U+4E00), '%01%AC' for '각'(U+AC01) (UTF-16LE). PHP, however, doesn't interpret it as UTF-16LE. Mozilla has a pref. entry to turn on/off 'charset' parameter in content-type header of application/x-www-url-encoded. That's off by default because a lot of servers choke on it, but I turned it on but that brought no change. BTW, please ignore my comment on the merit (or lack thereof) of using UTF-16.
Reporter | ||
Comment 11•21 years ago
|
||
I was working on that :) I have updated my tool with some additional features: - Show raw post data and character grouping - Quick change of charset To be found at: http://members.lycos.nl/slu/test2.php
Assignee | ||
Comment 12•21 years ago
|
||
Thanks a lot. It turns out that MS IE sends POST data in url-escaped UTF-8. As you may know, by default, the option 'send all URLs by UTF-8' is on with MS IE. I didn't know that the option also controls how POST data is encoded. Somehow, PHP interprets it correctly.... Mozilla with my patch (slightly modified from the one uploaded here) behaves as I intended. It turns all octets (of UTF-16LE/BE) to '%xx' forms. Usually, the server-side program expects its form data (post or get) to come in the encoding of the original form. In this case, it's UTF-16LE/BE so that Mozilla sends POST data in url-escaped UTF-16LE/BE. It even adds 'charset=UTF-16LE' (UTF-16BE) to 'Content-Type: application/x-www-url-encoded'. Still, PHP doesn't interpret it as UTF-16LE/BE. It's clearly a bug on PHP side that it doesn't honor 'charset' parameter in C-T header. I'm not sure exactly what to do here. I'm a bit tempted to go ahead and refine my patch to ask for r/sr (partly because it does some clean-up of the form submission code). However, it doesn't help those who use PHP. It's interesting to see what other server side tools do (e.g. JSP, asp, Perl:CGI module etc) BTW, there's a patch (made a long time ago) that implements an option 'send URL always in UTF-8' a la MS IE. I guess I have to try to move it forward so that people who want to turn it on can do so.
Assignee | ||
Comment 13•21 years ago
|
||
Just in case it's not clear, if the option 'send all URLs by UTF-8' is turned on, POST and GET data are always sent by MS IE in url-escaped UTF-8 __regardless__ of the character encoding used in the form page.
Assignee | ||
Comment 14•21 years ago
|
||
This patch does what I had in mind when I came to this bug for the first time
in addition to cleaning up some code in nsFormSubmission. However, it'd be of
little practical value (in light of what PHP does). Still cleaning up the code
is worth pursuing, isn't it?
BTW, I was wrong about the following:
> It even adds 'charset=UTF-16LE' (UTF-16BE) to 'Content-Type:
application/x-www-form-url-encoded'.
That part of the code is blocked (I thought it's controlled by
form.backwards_compatible?? pref. entry). To experiment, I had to activate that
block of the code. That block was disabled because the majority of the
server-side programs choke on it two years ago. Things have changed and we may
have to consider what to do with them. Adding another pref. entry is an option
or we can overload another form-submission related prefs.
Attachment #135050 -
Attachment is obsolete: true
Assignee | ||
Comment 15•21 years ago
|
||
In addition to MS IE, Opera transmits POST data in url-encoded UTF-8 (fromUTF-16LE/BE pages). I'm not sure which way to take. I'm writing to w3-i18n WG to seek some guide. smontagu, what do you think?
Assignee | ||
Comment 16•21 years ago
|
||
I haven't heard anything back from W3C-I18N WG. Kester , do you have any opinion? If PHP5 expects UTF-8 from UTF-16 encoded pages and MS IE and Opera do the same, we may have to follow the suit. I'll test JSP and Perl:CGI module later. Alternatively, we can add a pref. entry (with NO UI) to specify the way UTF-16 form is handled. The default would be to convert to UTF-8 before submitting.
Status: NEW → ASSIGNED
Reporter | ||
Comment 17•21 years ago
|
||
Well, since PHP is a widely used form processor and major browsers (MSIE, Opera) do it, I think it's a good idea to convert the POST and GET data of ANY Unicode-encoded page into the UTF-8 charset. Indeed this pref entry should not be visible in the UI. If users mess with this setting, some form submissions will fail, and they do not know why. And, since MSIE and Opera convert by default, I agree that the default would be to convert to UTF-8. You mentionned earlier in one of your comments that MSIE always sends POST and GET data in UTF-8, regardless of the char encoding (with the option enabled). That is not entirely true. It only does that with the Unicode charsets. Mozilla should act the same way. <offtopic> It took me about an hour to write this comment, because I was thinking a lot about how all these charset problems should be solved... I could only produce this (well-thought) announcement: People of the World, USE one of the Unicode charsets for ALL purposes! And a message to developers: please make your applications support multibyte characters! JS, this is certainly nothing personal :) </offtopic>
Comment 18•20 years ago
|
||
Note that this is broken for GET too. See bug 248336.
Summary: corrupted POST data when using UTF-16 charset → corrupted POST/GET data when using UTF-16 charset
Comment 19•20 years ago
|
||
*** Bug 248336 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 20•20 years ago
|
||
what's been holding me here is that there's no spec. to follow. I posted to W3-I18N WG list last fall, but haven't heard back anything. A working-draft released by WHATWG doesn't offer much new here, either.
Summary: corrupted POST/GET data when using UTF-16 charset → POST/GET data gets corrupted in pages in non-byte oriented character encodings(UTF-16/32)
Comment 21•20 years ago
|
||
(In reply to comment #17) I was the one who found the bug with GET as well. You suggest that we use only one charset - this is not possible in my case as the application is desigh=ned a set of Unicode charset - the choice is the user's. I think Mozilla should follow IE's and support UTF-8 in the same way - that is only where necessary. Although in my case the entire application is in Unicode. Dennis
Assignee | ||
Comment 22•20 years ago
|
||
(In reply to comment #21) > (In reply to comment #17) > You suggest that we use only > one charset - this is not possible in my case as the application is desigh=ned > a set of Unicode charset - the choice is the user's. I think Mozilla should > follow IE's and support UTF-8 in the same way - that is only where necessary. I'm afraid I'm not following you. Mozilla does support UTF-8 very well. The problem here is HTML4 authors and any other spec. writers have given a thought to how to deal with __non-byte-oriented__ character encodings like UTF-16 and UTF-32 when it comes to the form submission. As already mentioned, MS IE and Opera send form data in UTF-8 when the encoding of a page of which a form is a part is UTF-16. That may be regarded as sensible, but web user agent is only one part of the equation. We also have to see what server side program does. PHP 5.x works well with that approach, but I haven't yet managed to check out other server-side tools. (comment #16)
Comment 23•20 years ago
|
||
(In reply to comment #22) > (In reply to comment #21) > > (In reply to comment #17) > > You suggest that we use only > > one charset - this is not possible in my case as the application is desigh=ned > > a set of Unicode charset - the choice is the user's. I think Mozilla should > > follow IE's and support UTF-8 in the same way - that is only where necessary. > I'm afraid I'm not following you. Mozilla does support UTF-8 very well. The > problem here is HTML4 authors and any other spec. writers have given a thought > to how to deal with __non-byte-oriented__ character encodings like UTF-16 and > UTF-32 when it comes to the form submission. As already mentioned, MS IE and > Opera send form data in UTF-8 when the encoding of a page of which a form is a > part is UTF-16. That may be regarded as sensible, but web user agent is only one > part of the equation. We also have to see what server side program does. PHP 5.x > works well with that approach, but I haven't yet managed to check out other > server-side tools. (comment #16) Sorry - I did mean UTF-16. But note the normal browser handles UTF-16 fine - it is only the FORMS POST/GET processor which is wrong. See the application in the attachment to Bug 248336 which is UTF-16 encoded. Dennis
Assignee | ||
Comment 24•20 years ago
|
||
Re comment #23 > it is only the FORMS POST/GET processor which is wrong. See the > application in the attachment to Bug 248336 which is UTF-16 encoded. Would you please read comment #16, comment #8, comment #12, comment #14, comment #15, etc? Everybody knows/agrees that there's something to be done here (otherwise, this bug would have been closed. note that I even made two experimental patches), but exactly what to do remains uncertain partly because the spec authors didn't expect this situation (using non-byte-oriented character encodings in the form submission). GET is probably less problematic than POST. However,even for GET, we have to "decide" whether to convert UTF-16 to UTF-8 (as is done by MS IE/Opera) or just use UTF-16 and url-encode strings. In the meantime, I'd just convert UTF-16 files in your applications to UTF-8. I don't see what you gain by using UTF-16 instead of UTF-8 in your application.
Comment 25•20 years ago
|
||
(In reply to comment #24) Thanks for your suggestions. Frankly I don't know how my applications became UTF-16 encoded - I did not do this overtly. It was not an easy thing to convert the program to UTF-8 but I found a painless method. Yes, the program now works, and as far as I am concerned this is the end of my problems with Mozilla. Again - thanks Dennis
Comment 26•20 years ago
|
||
jshin: I don't understand the problem you are trying to solve. Could you send an e-mail to whatwg@whatwg.org explaining what you think it is that the specs leave undefined, so that we can define it in Web Forms 2.0? Thanks!
Assignee | ||
Comment 27•20 years ago
|
||
Thanks for the reminder. That's been in my TODO list for a while (I subscribed to the list a few weeks ago). Anyway, the following provision in section 5.3 of WHATWG's Web Form 2.0 clears the way. Sorry for missing that. This is consistent with what Opera and MS IE do now so that I'll implement this (perhaps a couple of weeks from now) > encode all the characters in the form data set, UTF-8. Character encodings that > are not mostly supersets of US-ASCII must not be used (this includes UTF-16 > and EBCDIC) even if specified in accept-charset attribute. BTW, it'd be nice to have a similar provision against usin UTF-16/UTF-32 (and perhaps EBCDIC) in section 5.5 as well.
Comment 28•20 years ago
|
||
Please send any requests to the list. (I don't see why you would want to not allow UTF-16 in section 5.5.)
Comment 29•19 years ago
|
||
*** Bug 294774 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 30•19 years ago
|
||
Comment on attachment 135051 [details] Charset Test Page Source <?php /******* Mozilla Test Page ******* Original File Name: test.php Author: Kester Everts, Nov 2003 - May 2005 Register Globals: Off *********************************/ // default to utf-8 $enc = isset($_GET['enc']) ? $_GET['enc'] : 'utf-8'; // the internal encoding used by the PHP parser is utf-8 mb_internal_encoding('utf-8'); /* the output encoding used by PHP is set through the enc parameter PHP recognizes the following charsets: UCS-4, UCS-4BE, UCS-4LE, UCS-2, UCS-2BE, UCS-2LE, UTF-32, UTF-32BE, UTF-32LE, UCS-2LE, UTF-16, UTF-16BE, UTF-16LE, UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, ISO-2022-JP, JIS, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, byte2be, byte2le, byte4be, byte4le, BASE64, 7bit, 8bit and UTF7-IMAP */ mb_http_output($enc); // if PHP does not recognize the charset, use utf-8 $enc = (mb_http_output() == 'pass') ? 'UTF-8' : mb_http_output(); // buffer the output until the page is entirely parsed, then // let it go through the mb_output_handler() to convert it to // any charset (the value of parameter enc) // btw, make sure that there is not another output_handler // defined in your php.ini that makes this malfunctional ob_start('mb_output_handler'); // lets begin... echo '<?xml version="1.0" encoding="'.htmlentities($enc).'">'."\r\n"; ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Testing form data corruption in the <?php echo htmlentities($enc); ?> charset</title> <style type="text/css"> body { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: small; color: #0000CC; background-color: #FFFFFF; } </style> <script type="text/javascript"> <!-- function changeCharset() { // get the drop down handle var el = document.getElementById("charset"); // get the number of the selected option var n = el.selectedIndex; // get the value of the selected option var val = el.options[n].value; // use that value to set the charset location.href = "charsettest.php?enc=" + val; } var group = true; function groupChars() { var el = document.getElementById("rawdata"); var btn = document.getElementById("groupbtn"); if(group) { var str = el.childNodes.item(0).nodeValue; str = str.replace(/\r\n|\r|\n|<br \/>|<br\/>|<br>/mgi, "") .replace(/(%.{2})/mg, "[$1] ") .replace(/(=)/mg, " $1 ") .replace(/(&|&)/mg, "\r\n$1") .replace(/(.{75})/mg,"$1\r\n"); el.childNodes.item(0).nodeValue = str; btn.value = "Degroup chars"; group = false; } else { el.childNodes.item(0).nodeValue = original; btn.value = "Group chars" group = true; } } //--> </script> </head> <body> <h1>Using <?php echo htmlentities($enc); ?>...</h1> <p>Fill in some data in the form below. You may fill in and test some Japanese and Chinese characters to see what happens.<br /> Chinese: 木村被問候妻女變臉發惡.<br /> Japanese: ウェブ全体から検索. </p> <!-- If the Japanese and Chinese chars do not render correctly, use this instead: Chinese: 木村被問候妻女變臉發 ;惡 Japanese: ウェブ全体から検索 --> <p><fieldset><legend>Unicode test form</legend><form action="charsettest.php?enc=<?php echo htmlentities($enc); ?>" method="post"> <label for="textfield">Name</label> <input type="text" name="name" accesskey="n" tabindex="1" id="name" /> <br /> <label for="email">E-mail</label> <input type="text" name="email" accesskey="e" tabindex="2" id="email" /> <br /> <input type="submit" tabindex="3" accesskey="s" /> You are submitting this form in <span style="font-weight:bold;"><?php echo htmlentities($enc); ?></span>. </form></fieldset></p> <p>Below will show POST data from <acronym title="Hypertext PreProcessor">PHP</acronym>'s <var>$_POST</var> variable (only if you have posted something).<hr /> <pre><?php var_dump($_POST); ?></pre> This is what PHP thinks is submitted. PHP's charset detection can only detect UTF-8, UTF-7, ASCII, EUC-JP,SJIS, eucJP-win, SJIS-win, JIS and ISO-2022-JP. <hr /> Raw post data:<br /> <pre style="letter-spacing:1px; width:90%;" id="rawdata"><?php $raw = file_get_contents("php://input"); echo htmlspecialchars(wordwrap($raw, 76, "\r\n", true), ENT_NOQUOTES, 'utf-8'); ?></pre> This should be equal to the output of an HTTP sniffer.<br /><input id="groupbtn" type="button" value="Group chars" onclick="groupChars()" style="display: inline" /> <script type="text/javascript"> try { original = document.getElementById("rawdata").childNodes.item(0).nodeValue; } catch(e) { document.getElementById("groupbtn").style.display='none'; } </script> <hr /> </p> <p> You can switch the charset by modifying the <var>enc</var> parameter in the <acronym title="Uniform Resource Locator">URL</acronym> or by use of this:<br /> <label for="select">Charset</label> <select id="charset" accesskey="c" tabindex="4" onchange="changeCharset()"> <option value="utf-8">Choose charset...</option> <option value="utf-8">Unicode (UTF-8)</option> <option value="utf-16le">Unicode (UTF-16 Little Endian)</option> <option value="utf-16be">Unicode (UTF-16 Big Endian)</option> <option value="utf-32le">Unicode (UTF-32 Little Endian)</option> <option value="utf-32be">Unicode (UTF-32 Big Endian)</option> <option value="utf-7">Unicode (UTF-7)</option> <option value="iso-8859-1">Latin 1 (ISO-8859-1)</option> <option value="iso-8859-15">Latin 9 (ISO-8859-15)</option> </select> </p> </body> </html>
Reporter | ||
Comment 31•19 years ago
|
||
Comment on attachment 135051 [details] Charset Test Page Source New test page: http://test.silcana.net/charsettest.php
Assignee | ||
Comment 32•19 years ago
|
||
Now I use UTF-8 for UTF-16/32.
Attachment #135111 -
Attachment is obsolete: true
Assignee | ||
Comment 33•19 years ago
|
||
I tested the patch with a slightly modified test case (I added a textfield with 'non-ASCII' name) and it worked fine.
Attachment #184192 -
Attachment is obsolete: true
Assignee | ||
Comment 34•19 years ago
|
||
Comment on attachment 184222 [details] [diff] [review] patch update bz, can you review? If you like it, would you put sr as well? I'm afraid there's no active owner of this code.
Attachment #184222 -
Flags: review?(bzbarsky)
Comment 35•19 years ago
|
||
Comment on attachment 184222 [details] [diff] [review] patch update I can review this, but it might take a few days.
Comment 36•19 years ago
|
||
Comment on attachment 184222 [details] [diff] [review] patch update >Index: content/html/content/src/nsFormSubmission.cpp > @@ -342,46 +343,44 @@ nsFSURLEncoded::AddNameValuePair(nsIDOMH >+ if (!processedValue.IsEmpty()) >+ rv = URLEncode(processedValue, convValue); >+ else > rv = URLEncode(aValue, convValue); Style for the file (with good reason!) is that all if/else clauses have braces, even one-line ones. Also, why do we have this check? To handle cases when ProcessValue failed or something? Shouldn't we check rv, then? How can a form processor actually send an empty value if it wants to, with this code? >@@ -719,65 +715,62 @@ nsFSMultipartFormData::ProcessAndEncode( >+ if (!processedValue.IsEmpty()) { Again, what if the form processor wanst to send an empty string? > nsFSTextPlain::AddNameValuePair(nsIDOMHTMLElement* aSource, >+ if (!processedValue.IsEmpty()) { And here. No need for the explicit (void) casts when you're ignoring an rv, fwiw..... >@@ -1310,14 +1307,21 @@ nsFormSubmission::GetEncoder(nsGenericHT >+ if (!charset.Compare("UTF-16", PR_FALSE, 6) || >+ !charset.Compare("UTF-32", PR_FALSE, 6)) Was there a reason not to use StringBeginsWith() here? >+nsFormSubmission::UnicodeToNewBytes(const nsAString& aStr, >+ else >+ newBuffer = aStr; Curly braces, please. >+nsFormSubmission::EncodeVal(const nsAString& aStr, nsACString& >+ LossyCopyUTF16toASCII(aStr, aOut); Why not at least fall back to using UTF-8 here? Just to not change the old behavior, or some other reason? > nsFormSubmission::ProcessValue(nsIDOMHTMLElement* aSource, >-#ifdef DEBUG >- nsresult rv = >-#endif Please leave this ifdef -- it's there to prevent a build warning in opt builds.
Attachment #184222 -
Flags: review?(bzbarsky) → review-
Assignee | ||
Comment 37•19 years ago
|
||
asking Simon for r and Boris for sr. > >+ if (!processedValue.IsEmpty()) > >+ rv = URLEncode(processedValue, convValue); > >+ else > > rv = URLEncode(aValue, convValue); > > Style for the file (with good reason!) is that all if/else clauses have braces, > even one-line ones. Fixed. > Also, why do we have this check? To handle cases when ProcessValue failed or > something? Shouldn't we check rv, then? How can a form processor actually > send an empty value if it wants to, with this code? It's possible to send an empty value because |aValue| would be empty in that case. Anyway, I changed to check rv. > No need for the explicit (void) casts when you're ignoring an rv, fwiw..... fixed. > >+ if (!charset.Compare("UTF-16", PR_FALSE, 6) || > > Was there a reason not to use StringBeginsWith() here? Fixed. > >+ LossyCopyUTF16toASCII(aStr, aOut); > > Why not at least fall back to using UTF-8 here? Just to not change the old > behavior, or some other reason? Not to change the old behavior, but falling back to UTF-8 seems better. > > nsFormSubmission::ProcessValue(nsIDOMHTMLElement* aSource, > >-#ifdef DEBUG > >- nsresult rv = > >-#endif > > Please leave this ifdef -- it's there to prevent a build warning in opt builds. It's not necessary because I do use 'rv' in opt builds. I tested both get and post.
Assignee | ||
Updated•19 years ago
|
Attachment #135051 -
Attachment is obsolete: true
Attachment #184222 -
Attachment is obsolete: true
Attachment #185009 -
Flags: superreview?(bzbarsky)
Attachment #185009 -
Flags: review?(smontagu)
Comment 38•19 years ago
|
||
Comment on attachment 185009 [details] [diff] [review] patch update (with bz's concerns addressed) Looks great. Thanks for fixing this!
Attachment #185009 -
Flags: superreview?(bzbarsky) → superreview+
Assignee | ||
Comment 39•19 years ago
|
||
ping Simon :-) can you review? I'll update the comment about RFC 2231 (in light of recent RFCs about the form submission) in my tree.
Assignee | ||
Comment 40•19 years ago
|
||
Comment on attachment 185009 [details] [diff] [review] patch update (with bz's concerns addressed) asking cbie for r
Attachment #185009 -
Flags: review?(smontagu) → review?(cbiesinger)
Comment 41•19 years ago
|
||
Comment on attachment 185009 [details] [diff] [review] patch update (with bz's concerns addressed) content/html/content/src/nsFormSubmission.cpp nsFSURLEncoded::URLEncode(const nsAString& aStr, nsCString& aEncoded) [...] + char* escapedBuf = nsEscape(encodedBuf.get(), url_XPAlphas); aEncoded.Adopt(escapedBuf); Shouldn't this return an error if nsEscape returns null? @@ -1307,16 +1308,25 @@ nsFormSubmission::GetEncoder(nsGenericHT + if(charset.EqualsLiteral("ISO-8859-1")) { if you're touching this line, please also add a space before ( Also: Why does a case-sensitive comparison suffice? Does the caller convert the charset name to upper case? Ah.. the comment below says so... but I'd move the first line of it before the ISO-8859-1/windows-1252 part. What about UCS-2 as charset, and UCS-4 as charset name? + temp.SetCharAt(newBuffer.CharAt(len), z++); this won't work. SetCharAt fails if the new index is > the length if the string. - if(aStr && aStr[0]) { + if (!newBuffer.IsEmpty()) { hmm, you're changing what is checked, is that ok? + if (formControl && formControl->GetType() == NS_FORM_INPUT_HIDDEN) + CopyASCIItoUTF16(mCharset, aResult); + return NS_OK; ?? Either indentation is wrong, or you have missing braces. (looks like the latter)
Attachment #185009 -
Flags: review?(cbiesinger) → review-
Assignee | ||
Comment 42•19 years ago
|
||
(In reply to comment #41) > + char* escapedBuf = nsEscape(encodedBuf.get(), url_XPAlphas); > aEncoded.Adopt(escapedBuf); > Shouldn't this return an error if nsEscape returns null? Perhaps, yes. There's another place in the file that needs that check (not in the current patch). I'll change that, too. > Also: Why does a case-sensitive comparison suffice? Does the caller convert the > charset name to upper case? It's not just case-insenstive. By the time the program flows reaches there, all character encoding names are already resolved to *our* internal canonical names as listed in charsetalias.properties file. > What about UCS-2 as charset, and UCS-4 as charset name? See above. > + temp.SetCharAt(newBuffer.CharAt(len), z++); > > this won't work. SetCharAt fails if the new index is > the length if the > string. fixed in an upcoming patch. > > - if(aStr && aStr[0]) { > + if (!newBuffer.IsEmpty()) { > > hmm, you're changing what is checked, is that ok? Please, read the patch more carefully. I'm not changing what's checked. In the current code, aStr is assigned a new value in three or four places. I removed all of them and checking newBuffer, instead. > + if (formControl && formControl->GetType() == NS_FORM_INPUT_HIDDEN) > + CopyASCIItoUTF16(mCharset, aResult); > + return NS_OK; > > ?? Either indentation is wrong, or you have missing braces. (looks like the > latter) Thanks for catching it. Fixed.
Assignee | ||
Comment 43•19 years ago
|
||
I updated it per cbie's comments (as I noted in my previous comment) carrying over bz's sr here.
Attachment #185009 -
Attachment is obsolete: true
Attachment #187947 -
Flags: superreview+
Attachment #187947 -
Flags: review?
Assignee | ||
Comment 44•19 years ago
|
||
(In reply to comment #43) > Created an attachment (id=187947) [edit] > patch update addressing cbie's comments > > I updated it per cbie's comments (as I noted in my previous comment) I fixed other instances of SetCharAt without setting the length in advance in my tree.
Assignee | ||
Comment 45•19 years ago
|
||
Comment on attachment 187947 [details] [diff] [review] patch update addressing cbie's comments Forgot to specify cbie as a reviewer (see comment #44 as well)
Attachment #187947 -
Flags: review? → review?(cbiesinger)
Comment 46•19 years ago
|
||
> See above.
hm? what does UCS-2 get normalized to? It's different from UTF-16...
Comment 47•19 years ago
|
||
> hm? what does UCS-2 get normalized to? It's different from UTF-16... oh, or do we treat those as equivalent? > Please, read the patch more carefully. sorry about that, I missed the assignment to aStr.
Updated•19 years ago
|
Attachment #187947 -
Flags: review?(cbiesinger) → review+
Assignee | ||
Comment 48•19 years ago
|
||
Comment on attachment 187947 [details] [diff] [review] patch update addressing cbie's comments asking for a to trunk This is to support UTF-16/32 in the form submission. While fixing that, I also sanitized the code. I tested this patch quite a bit and it works rather well. Although it affects virtually all users (it affects the form submission code), I'd say that the risk is rather low considering my test result. Even if there's a regression, we'd better take the risk in 1.1a2 and fix it as soon as possible (because we want this *feature* in 1.1). If there's a regression, it'll be discovered and reported quickly.
Attachment #187947 -
Flags: approval-aviary1.1a2?
Updated•19 years ago
|
Attachment #187947 -
Flags: approval-aviary1.1a2? → approval1.8b4+
Assignee | ||
Comment 49•19 years ago
|
||
sorry I haven't marked this as resolved/fixed. The patch was landed a while ago.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Updated•5 years ago
|
Component: HTML: Form Submission → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•