Closed
Bug 203179
Opened 21 years ago
Closed 21 years ago
String.prototype.localeCompare() not sorting according to locale
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bmo, Unassigned)
References
()
Details
Attachments
(1 file, 3 obsolete files)
8.88 KB,
patch
|
smontagu
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4a) Gecko/20030401 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4a) Gecko/20030401 The sample page demonstrates sorting using localeCompare which doesn't sort correctly in Mozilla. It does however sort correctly in IE 6. I first found this bug in code which is sorting an array of Japanese strings (Japanese Windows 2000), and found that it doesn't do a locale comparison there either, although IE6 does. Reproducible: Always Steps to Reproduce: 1. go to the URL 2. 'a' and 'frisé' should both come before 'Mission'
Comment 1•21 years ago
|
||
Confirming report using Mozilla trunk binary 2003042308 on WinNT. Compare Moz and IE6 on these examples: 'a'.localeCompare('Mission') Moz ---> 20 IE6 ---> -1 'a'.localeCompare('a') Moz ---> 0 IE6 ---> 0 'a'.localeCompare('A') Moz ---> 32 IE6 ---> -1 'A'.localeCompare('a') Moz ---> -32 IE6 ---> 1 Note this means in IE6, localeCompare() puts 'a' before 'A', whereas the canonical string comparison does the reverse. In Mozilla, we get the same result for either comparison: function f(x,y) {return x.localeCompare(y);}; IE6 ['a', 'A'].sort() ---> 'A', 'a' ['a', 'A'].sort(f) ---> 'a', 'A' Moz ['a', 'A'].sort() ---> 'A', 'a' ['a', 'A'].sort(f) ---> 'A', 'a'
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: localeCompare not sorting according to locale → String.prototype.localeCompare() not sorting according to locale
Comment 2•21 years ago
|
||
From the ECMA-262 Edition 3 Final spec at http://www.mozilla.org/js/language/E262-3.pdf: 15.5.4.9 String.prototype.localeCompare (that) When the localeCompare method is called with one argument |that|, it returns a number other than NaN that represents the result of a locale-sensitive string comparison of |this| object (converted to a string) with |that| (converted to a string). The two strings are compared in an implementation-defined fashion. The result is intended to order strings in the sort order specified by the system default locale, and will be negative, zero, or positive, depending on whether |this| comes before |that| in the sort order, the strings are equal, or |this| comes after |that| in the sort order, respectively. The actual return values are left implementation-defined to permit implementers to encode additional information in the result value, but the function is required to define a total ordering on all strings and to return 0 when comparing two strings that are considered canonically equivalent by the Unicode standard. NOTE: This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and to compare according to the rules of the host environment’s current locale. It is strongly recommended that this function treat strings that are canonically equivalent according to the Unicode standard as identical (i.e. compare the strings as if they had both been converted to Normalised Form C or D first). It is also recommended that this function not honour Unicode compatibility equivalences or decompositions. If no language-sensitive comparison at all is available from the host environment, this function may perform a bitwise comparison.
Comment 3•21 years ago
|
||
For convenience: charCodeAt(0) Unicode value (hex) 'A' 65 0041 'a' 97 0061
Comment 4•21 years ago
|
||
Also: charCodeAt(0) Unicode value (hex) 'M' 77 004D 'a' 97 0061 So both Moz and IE6 will put 'M' before 'a' in the canonical string comparison, hence 'Mission' before 'a'; as can be seen via these javascript:URLs javascript: alert('M' < 'a') ---> true javascript: alert(['a', 'Mission'].sort()) ---> 'Mission', 'a' But as Brodie notes, in IE6, String.prototype.localeCompare() puts 'a' before 'M'.
Comment 5•21 years ago
|
||
I looked on the web and found a similar report here: http://www.geocrawler.com/archives/3/113/2001/7/0/6213846/ Message: 6213846 FROM: Alex Iliev news DATE: 07/19/2001 08:49:10 SUBJECT: JavaScript localeCompare() Hello, perhaps someone can help me with a problem using localeCompare() in Mozilla (Gecko/20010713 on LinuxPPC) - it does not give me correct results in the German language, in particular when 'ß' (sz) is involved. I set <html lang="de-DE"> in the page. still, 'ß' seems to be taken as a single letter higher in the collation order than 'z'. Am I specifying the language incorrectly, or is there some problem with the implementation? Thanks very much, Alex
Comment 6•21 years ago
|
||
Spidermonkey contains a function, JS_SetLocaleCallbacks, that's used to set up the locale functions used to handle localeCompare, as well as toLocaleUpperCase and toLocaleLowerCase. In other words, it's the job of the embedding application, not the JS Engine, to get this right. But according to http://lxr.mozilla.org/seamonkey/ident?i=JS_SetLocaleCallbacks, there's nothing in the Mozilla Browser code that does this. The component for this bug needs to be changed, but aside from Browser-General, I'm not sure where it should go. --scole
well DOM is the spidermonkey host, so i'll push it that way, someone can poke it from there.
Assignee: khanson → dom_bugs
Component: JavaScript Engine → DOM Core
QA Contact: pschwartau → desale
Comment 8•21 years ago
|
||
This hooks up locale callbacks in the JS engine, but it doesn't seem like we have any locale aware compare methods in Mozilla, AFAIK, so this still doesn't fix the problem...
Updated•21 years ago
|
Attachment #126028 -
Attachment description: Hook up locale callbacks, though no local aware functionality yet. → Hook up locale callbacks, though no locale aware functionality yet.
Comment 9•21 years ago
|
||
Dunno about upper/lower case but nsICollation has a CompareString method...
Comment 10•21 years ago
|
||
Comment 11•21 years ago
|
||
Comment on attachment 126195 [details] [diff] [review] locale aware string compare using nsICollation >+ PRBool result; >+ rv = gCollation->CompareString(kCollationStrengthDefault, str1, str2, >+ &result); Last time I looked, result was a PRInt32 (same as strcmp).
Comment 12•21 years ago
|
||
CompareString has a performance issue (bug 195008), so please be carefull if this is performance critical.
Comment 13•21 years ago
|
||
Naoki, this is something that's extremely rarely used on webpages, so I think it's fine to have non-optimal performance here until that bug is fixed.
Comment 14•21 years ago
|
||
Attachment #126028 -
Attachment is obsolete: true
Attachment #126195 -
Attachment is obsolete: true
Updated•21 years ago
|
Attachment #126320 -
Flags: superreview?(bzbarsky)
Attachment #126320 -
Flags: review?(nhotta)
Comment 15•21 years ago
|
||
Comment on attachment 126320 [details] [diff] [review] Same as above, but clean up gCollation on shutdown. Bleh, forgot the PRBool -> PRInt32 change...
Attachment #126320 -
Attachment is obsolete: true
Attachment #126320 -
Flags: superreview?(bzbarsky)
Attachment #126320 -
Flags: review?(nhotta)
Comment 16•21 years ago
|
||
Updated•21 years ago
|
Attachment #126323 -
Flags: superreview?(bzbarsky)
Attachment #126323 -
Flags: review?(nhotta)
Comment 17•21 years ago
|
||
Comment on attachment 126323 [details] [diff] [review] Proposed fix. Is there a reason for removing the initializers on the statics? I know they should be 0 anyway, but the code as written makes that clearer... Also, why use the CID instead of the contractid for the nsICollationFactory ? In any case, sr=me with either way on those two questions.
Attachment #126323 -
Flags: superreview?(bzbarsky) → superreview+
Comment 18•21 years ago
|
||
Um, there is no contract id for the collation factory (AFAICT). I put back the initializers for the statics, the reason I took them out was to make things consistent, now I went the other way in stead, and added more of them...
Comment 19•21 years ago
|
||
Ugh. Ok, no contractid means you can't use one.. ;)
Comment 20•21 years ago
|
||
Comment on attachment 126323 [details] [diff] [review] Proposed fix. >+// Thank you Microsoft! >+#ifdef CompareString >+#undef CompareString >+#endif Sigh. I hate to see another of these go into the tree, but what else can you do? >+static JSBool >+ChangeCase(JSContext *cx, JSString *src, jsval *rval, >+ void(* changeCaseFnc)(const nsAString&, nsAString&)) >+{ >+ nsDependentString str(NS_REINTERPRET_CAST(const PRUnichar *, >+ ::JS_GetStringChars(src)), >+ ::JS_GetStringLength(src)); Nit: regularize the indentation here r=smontagu
Attachment #126323 -
Flags: review?(nhotta) → review+
Comment 22•21 years ago
|
||
Any toolchain that doesn't ensure uninitialized global (extern/static) variables are zero-initialized is busted. But last I checked (a while ago), putting = 0; all over the place caused explicitly-zero-initialized variables to be put in .data (initialized data), not in .bss (blank starting store), taking up space in the object file and time loading (more time than asking the kernel for a pre-zeroed page). So I'm generally against explicit zero initialization of globals. Could the #undef CompareString jazz go into a %{C++ section in the IDL file? /be
Comment 23•21 years ago
|
||
Unfortunately putting the #undef in the .idl doesn't really fix the problem, it only fixes it in the cases where the .idl file (or generated .h file) is #included after the file that pulls in MS' #define of CompareString (as was the case in this patch). This sucks rocks, but I don't think we can do much better than what we do now, thanks to MS. I'll pull out those initializers again from all the statics. I don't care either way, as long as we're consistent.
Comment 24•21 years ago
|
||
The #undef can still win in the .idl file if we follow the rule that system headers (those included via <...> notation) are always included before local ("...") ones. We seem to follow that rule generally in code I care about. It's an old bit of folk wisdom in Unix circles (whence C came). If it helps consolidate the #undef next time there's a conflict, let's do it, rather than copy the #undef all over. /be
Comment 25•21 years ago
|
||
Yeah, if there is a perf hit for having the initializers, remove them.
Comment 26•21 years ago
|
||
The problem here is that the header files that pull in the #define CompareString is included through other mozilla #includes, in this case, either some of the nspr headers, nsILiveConnect, or nsIJVMManager. I don't have a problem putting the #undef in the idl, but I still don't think it'll save us much...
Comment 27•21 years ago
|
||
In this case, it's nsIJVMManager.h that pulls in whatever header (through jni.h) that #defines CompareString.
You need to log in
before you can comment on or make changes to this bug.
Description
•