Closed Bug 117422 Opened 18 years ago Closed 18 years ago
Using Turkish character "I" in edit boxes causes data loss after submitting the form
From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7+) Gecko/20011221 BuildID: 20011221 If I use turkish character "small I (i without the dot)" in edit boxes or fields in forms, I cannot see the the rest of thing I have written after that letter after submitting it, all that data is lost. Situation doent change even if I change character coding for composition to Turkish ISO 8859-9 Reproducible: Always Steps to Reproduce: 1. Open a page with data entry (like bugzilla, or any forum pages) 2. write something with Turkish letter "small I" Actual Results: dataloss after "small I" character Expected Results: I could see the things I have written correctly after submission
Test case: This sentence is written without "small I" character This sentence is written with "small ý" character
Well, it seems working now :) but I swear it wasnt working.. (in linux and windows) Turkish character test: üðþýçö ÜÐÞÝÇÖ
Ok it seems working somehow, closing the bug for now.
Status: UNCONFIRMED → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
trying with Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.7+) Gecko/20011228 No turkish characters: "bahar aksamlari hava ilik olur" Turkish Characters: "bahar akþamlarý hava ýlýk olur"
same thing with western coding No turkish characters: "bahar aksamlari hava ilik olur" with Turkish Characters: "bahar akþamlarý hava ýlýk olur"
In this system (at work) I can verify this error. steps: 1.go to www.google.com 2.write "tirtil" to search box (but with turkish small I characters) 3.and press "Google search" Expectes result: Google searches the word "tirtil" Actual Result: Google searches for "t" not "tirtil" , all characters after first turkish I are lost Some prefs: Navigator Language Default character encoding : ISO 8859-1 Language En-US Appearance Fonts: Western can anyone try this??? you have to change your keyboard layout to Turkish-Q and also language to Turkish Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.6+) Gecko/20011212
Status: VERIFIED → UNCONFIRMED
Resolution: WORKSFORME → ---
I think there are 2 parts to this bug. The rendering problem, I'm thinking was due to regression bug 116230 which was causing everyone grief with text widgets,until it was fixed on 01/04/01. Would it be possible for you to try a more recent build? I can' seem to recreate the rendering problem you mentioned with my debug build from today. The 2nd problem is the dataloss on submission. I was speaking to ftang, and he said he thinks he knows what the problem is there. Giving this bug to ftang as he requested.
Assignee: kin → ftang
Confirmed on, Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.7+) Gecko/20020116 installed it, and didnt touch any of the prefs, and went to www.google.com written "týrtýl" and same result, it searched for "t" not "tirtil" and same situation for turkish character "s" (s with a dot below it, pronunciated like "sh" ) I agree with your second idea, this dataloss happen on submission, they are lost. I guess they have to be converted to special characters on submission, but somethin is wrong, in previous versions this didnt happen, and I dont know when this regression happened. I will try this with mozilla versions at home and try to find the last time it was working. This is really a show stopper for Mozilla in Turkey.
I checked my old mozilla nightlies and I can see this bug in September 16 builds but not in September 4 builds. So there is a check-in between 4-16 September that caused this regression. bug 81203 and some others may be related to this. I also observed that the code "mozilla/ layout/ html/ forms/ src/ nsFormFrame.cpp" is probably responsible for this. there are lots of code additions to that fie between 4-16 September, and versions 3.175 and 3.183 If I can check the versions between 4-16 september, I can make the time interval narrower. hope this helped.
I meant 4-16 October for the date interval of source of regression ..
I got different results with my old OS/2 nightlies. I checked and rechecked. I see that the problem was introduced between 8 AM on November 1st and 8 AM on November 2nd. I didn't see any specific checkin that look suspicious, but it is a long list. I specifically saw that with the 8AM Nov 2nd build, the search on www.google.com was done with "t" instead of the whole word.
duh. The new form submission code - http://bugzilla.mozilla.org/show_bug.cgi?id=34297
If I do this in Mozilla, I generate http://www.google.com/search?hl=en&q=t&btnG=Google+Search as url which mean the q is "t" If i do this in IE6, I got ich http://www.google.com/search?hl=en&q=t%26%23305%3Brt%26%23305%3Bl&btnG=Google+Search which mean the q is "t%26%23305%3Brt%26%23305%3Bl" escaped of "tırtıl"
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → mozilla0.9.9
nsbeta1+ per triage meeting. Still need to decide HOW to fix it (what to sent out)
jkeiser- I have a patch which will make our code match IE 6 behavior. Will attached soon. Please r=
Comment on attachment 70223 [details] [diff] [review] patch match up to IE6 nsFormSubmission::GetEncoder(): 1. Could you remove the aCtrlsModAtSubmit param and change encoder param to aEncoder while you're in there? (This wasn't your fault.) It looks like it's not doing anything. 2. Make nsString charset an nsAutoString (no mallocs required there), and move that block below nsresult rv = NS_OK; (with a blank line before it) 3. Use rv = CallCreateInstance(NS_SAVEASCHARSET_CONTRACTID, aEncoder) rather than a direct CreateInstance call. This is safer. Also remove kSaveCharsetCID definition, since you won't need it for CallCreateInstance(). (When assigning to nsCOMPtr<>, use do_CreateInstance().) 4. Do NS_ENSURE_SUCCESS(rv, rv); rather than if (NS_SUCCEEDED(rv)). It saves you the if() (and the indent) and does the same thing. 5. Do the same after Init(), just for clarity's sake. return NS_OK; at the end of the function. nsFormSubmission::UnicodeToNewBytes() 6. Not a requirement for r=, but would be really helpful: could you put a little JavaDoc comment on UnicodeToNewBytes() (the definition of the function in the class) for those of us who aren't int'l-savvy? nsIFormSubmission.h has methods similarly documented, you can find a format there. So what exactly does this buy us? Were we using nsIUnicodeEncoder wrong before? nsISaveAsCharset seems to wrap nsIUnicodeEncoder and do some extra stuff with it.
John: The problem happen when you type in a form which cannot be encoded with the character set of the page you are viewing. For example, if you are viewing an english page which encoded in ISO-8859-1 encoding, (which have 256 characters defined). Since we are an Unicode application, users could type some characters other than those ISO-8859-1 characters, or copy & paste from other pages (say japanese page like http://home.netscape.com/ja) In that case, when we convert by using unicode converter, it will STOP at the point that contains the character which cannot be encoded. For example, if user put in "abc" + Two chinese characters + "eft" , the unicode converter will convert "abc" . in 6.2.1, it will convert to "abc" + two question mark + "eft" since we set the error behavior to repalce fallback to '?' but someone remove that code (in the current GetEncoder, youc can see that part of code got commented out ). this is very bad because user could type "Software " + Japanese names + " hardware car" when they search google. if we convert to "software ??? hardware car" then at least the search engine can match "software" "hardware" and "car", but the current behavior will only send out "software" the "?" approach is not idea but acceptable. However, when I look at IE6, it seems they use NCR to encode these "cannot be converted" character, this is not specified in any standard but nor any standard currently address this issue. I think it is better to fix this to the IE6-non-standard way, instead of to fix it to the old Netscape-non-standard way. And since there are currently no standard which address it, we don't need to worry about standard issue here.
I also handle the case if the string is "" differently and remove some of the "won't build" #ifdef DEBUG_ftang part.
Comment on attachment 70334 [details] [diff] [review] v2 of the patch. 1. CallCreateInstance( - remove space after paren 2. if (aSrc && aSrc) - does this if really buy us anything? At the least, aSrc should not be null (it comes from an nsAString) so don't check that. If there's not a significant performance penalty to call aEncoder with aSrc == "", please just remove the if() entirely. With that, r=jkeiser.
I prefer to check the conversion error then send as UTF-8 (% escape if necessary) because I see recommendation to use UTF-8. For HREF, http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1 International URI draft, http://search.ietf.org/internet-drafts/draft-masinter-url-i18n-08.txt Is there a standard for this (and IE is based on that)? I think IE parity is also important, so I am not pushing UTF-8 if nobody understand, cc to momoi.
That's done in a different step. We first take the Unicode value and translate it to the charset in question, and then we URLEncode it (which does the % thing).
>2. if (aSrc && aSrc) - does this if really buy us anything? if I don't do this , the converter will return conversion error since if I pass null string in. chat w/ nhotta about the spec he mention. We agree we may need to revisit this code once they Internet Draft got accept. From my view, the chances are remote since it will break most of the cgi.
OK, fine by me.
Comment on attachment 70334 [details] [diff] [review] v2 of the patch. sr=jst
Comment on attachment 70334 [details] [diff] [review] v2 of the patch. Hold the presses on this fix ... we're using EncodeVal()'s return value wrong. If it's nsnull, we currently crash, and now this code is returning nsnull on empty string. Emailing ftang.
are you saying that nameStr.Adopt(EncodeVal(aName)); will crash if EncodeVal(aName) return null ?
Yes, in fact I encountered that crash yesterday. It should be a minor change actually, if you just change it to return a new empty string, I'll r= and I'm sure jst will sr=.
Comment on attachment 70940 [details] [diff] [review] v3 of the patch. add nsCRT::strdup("") if it is null. If the converter returns null, the function should probably return null too (We will start handling null as an error at some point.) I would put that strdup in an else personally. r=jkeiser, assuming that was the only change from before. Your call whether to put this in an else, you know the converter better than I do.
>If the converter returns null, the function should probably return null then it will crash as NOW, right ? ask jst to sr
Absolutely. But *if* the encoder is returning null only when it can't new enough memory, then your strdup will fail as well (return null) and we will crash anyway. Even more, on *failure* case we should not return empty string. It's a failure, we should send nothing at all rather than empty (i.e. it should propagate all the way back to AddNameValuePair). Unless nsISaveCharset has a legitimate reason to return null, that code is redundant and should be put in the else case only. The underlying problem is that we don't currently check for null from EncodeVal() like we should. That is filed as bug 126941. There's no need to work around that problem if it only happens in the case of out-of-memory error.
Comment on attachment 70940 [details] [diff] [review] v3 of the patch. add nsCRT::strdup("") if it is null. >- // Get Charset, get the encoder. >- nsCOMPtr<nsICharsetConverterManager> ccm( >- do_GetService(kCharsetConverterManagerCID, &rv)); >+ nsAutoString charset(aCharset); >+ if(charset.Equals(NS_LITERAL_STRING("ISO-8859-1"))) >+ charset.Assign(NS_LITERAL_STRING("windows-1252")); >+ The above looks weird, but I assume it's intentional and XP safe. >+ rv = CallCreateInstance( NS_SAVEASCHARSET_CONTRACTID, aEncoder); Loose the space after '(' here for consistency with surrounding code. sr=jst
Attachment #70940 - Flags: superreview+
>- // Get Charset, get the encoder. >- nsCOMPtr<nsICharsetConverterManager> ccm( >- do_GetService(kCharsetConverterManagerCID, &rv)); >+ nsAutoString charset(aCharset); >+ if(charset.Equals(NS_LITERAL_STRING("ISO-8859-1"))) >+ charset.Assign(NS_LITERAL_STRING("windows-1252")); >+ > >The above looks weird, but I assume it's intentional and XP safe. That logic was there, not a new code. just look at the "-" below it. yes, it is xp and safe. >+ rv = CallCreateInstance( NS_SAVEASCHARSET_CONTRACTID, aEncoder); >Loose the space after '(' here for consistency with surrounding code. will do
a=asa (on behalf of drivers) for checkin to 0.9.9
Well, since this bug has "r,sr" and "a", will it be in 0.9.9 ? I cant use mozilla because of this bug.. :)
fixed and check in
Status: ASSIGNED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
Verifying on windows 98 2002-03-01-03-trunk and linux Redhat build 2002-03-01-09-trunk
Status: RESOLVED → VERIFIED
*** Bug 129236 has been marked as a duplicate of this bug. ***
*** Bug 125034 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.