Closed Bug 616841 Opened 14 years ago Closed 14 years ago

String.localeCompare behaves incorrectly for international, uppercase and lowercase characters when called by an xpcom callback

Categories

(Core :: XPConnect, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla2.0b10
Tracking Status
blocking2.0 --- final+

People

(Reporter: JasnaPaka, Assigned: cjones)

References

Details

(Keywords: regression, Whiteboard: [hardblocker])

Attachments

(7 files, 1 obsolete file)

1) Install Test Pilot (https://addons.mozilla.org/en-US/firefox/addon/13661/) and Czech Spell Dictionary (https://addons.mozilla.org/en-US/firefox/addon/3394/)
2) Open list of extensions in Add-ons Manager.
3) In list of installed extensions you will see order:

* Test Pilot
* Český slovník na kontrolu pravopisu

It's wrong order. Right variant is:

* Český slovník na kontrolu pravopisu
* Test Pilot
Hm, we already had this as bug 565757, which has been fixed a while back. Has this problem be regressed recently?
I see this problem in latest FF 4.0 build. Maybe regression?
Yeah, it's a regression since beta 7. There aren't that many fixes on trunk since beta 7 has been released:

https://bugzilla.mozilla.org/buglist.cgi?resolution=FIXED&query_format=advanced&bug_status=RESOLVED&component=Add-ons%20Manager&target_milestone=mozilla2.0b8

I could only think of bug 614416, or probably a side-effect of the new final theme landed with the patch on bug 601022. I will check.
Keywords: regression
Definitely a regression from bug 614416.
Blocks: 614416
blocking2.0: --- → ?
I wonder why the automated test on bug 565757 hasn't failed.
Flags: in-testsuite?
blocking2.0: ? → final+
There is something odd going on. The test for bug 565757 is working, I can see when running it that the list is sorted correctly. Also strangely although I can reproduce this here, resorting the list by toggling the name sort fixes the ordering.
Attached patch broken codeSplinter Review
This seems to be some kind of JS bug. I added this test code which is called when sorting the lists. When I first open the add-ons manager it shows the following output:

"Český slovníky pro kontrolu pravopisu".localeCompare("DOM Inspector") = 200
"DOM Inspector".localeCompare("Český slovníky pro kontrolu pravopisu") = -200

After that any attempt to resort the list displays:

"Český slovníky pro kontrolu pravopisu".localeCompare("DOM Inspector") = -2
"DOM Inspector".localeCompare("Český slovníky pro kontrolu pravopisu") = 2

This is pretty broken but I haven't yet managed to narrow down why it is changing behaviour.
Narrowed this down. String.localeCompare is broken when called by a callback from the add-ons manager but works when called by a DOM event listener. In xpcshell it is broken all the time. This seems like an xpconnect/JS issue so going to move over there.
Component: Add-ons Manager → XPConnect
Product: Toolkit → Core
QA Contact: add-ons.manager → xpconnect
Attached patch testcasesSplinter Review
These are testcases that demonstrate the errors.
Summary: Wrong order in the list of extensions when some extension name begins with non-ascii char → String.localeCompare behaves incorrectly for international characters when called by an xpcom callback
If you're running on a Window, we set locale callbacks on the JSContext.  See the localeCallbacks struct in nsJSContext::nsJSContext.

I suppose we should do this for JS component contexts too in the component loader?  And perhaps for the safe js context?  I'm not sure which context we're actually entering script on in this case...
Assignee: nobody → bzbarsky
Priority: -- → P1
Blocks: 619288
Chris, this is the bug I mentioned.
Sigh, fun bug.  AFAICT the web+extension-visible things affected are
 - JS components
 - xpconnect "safe context" (not sure what is, just listing per comment 10)
 - Web Workers
 - OOP jetpacks
 - out-of-process TabChild global, for message manager
 - in-process "TabChild global"
 - maybe ctypes, not sure
 - (pretty sure the localeCompare on |window| is broken in content processes)

and we would probably want to fix
 - xpcshell
 - IPC testshell

The locale service claims to be threadsafe (do we want to find out? ;) ), so we can theoretically fix all of the above except the out-of-process stuff; jetpack processes don't have XPCOM, and I don't believe content processes have access to localization files.

In light of this, is it true that only fixing
 - JS components and/or "safe context"
 - xpcshell

blocks betaN?
Hmm.  Do these callbacks need access to localization files, or just access to intl stuff?
Quick skim suggests that they stay in intl's yard (not sure though).  With the workaround in bug 576507, this might bring localeCompare in content processes back into play.  OOP jetpack is still out of luck.
Attached patch WIP (obsolete) — Splinter Review
First crack.  This patch has the included test case pass in xpcshell.  To fix the other 100 broken callsites, we would just need to add a similar test for each and sprinkle xpc_LocalizeContext() dust near the JS_NewContext().

I understand the lifetime of objects here and organization of this code not a bit, much appreciate suggestions there.  The biggest potential problems with this patch seem to me to be
 - having XPConnect stuff depend on XPCOM stuff.  Don't know the rules here.
 - the intl stuff is lazily init'd for each context.  Depending on how early localeCompare is needed, the initing could maybe possibly fail or lead to circular dependencies, not sure.
 - this patch assumes xpc_LocalizeContext needs to be thread safe, for simplicity uses PR_CallOnce to achieve it.  With this impl, init will only be attempted once.  That might not be a bad thing, but if thread safety isn't needed, or multiple init attempts need to be made, the guards should change.
Assignee: bzbarsky → jones.chris.g
Attachment #499963 - Flags: feedback?(bzbarsky)
> - having XPConnect stuff depend on XPCOM stuff.  Don't know the rules here.

XPConnect depends on XPCOM, period.  That's its job.  Depending on particular components is theoretically supposed to be conditional on !XPCONNECT_STANDALONE, but in practice that's dead.  So that part is fine.

> the intl stuff is lazily init'd for each context.

That's fine; that's what we had before, right?

> I understand the lifetime of objects here not a bit

The current setup before your patch is basically that while a JSContext that references these callbacks is live we don't release the decoder and collation.  That guarantees that things don't die while the callbacks can be invoked.

I think the right thing to do is probably to change callsites that xpc_LocalizeContext to notify xpconnect when they JS_DestroyContext that context.  That way xpconnect can keep track of the number of outstanding contexts referencing the callbacks, and do a similar game where if we're shutting down and the number is 0 or the number goes to 0 after we have shut down we do the release.

> - this patch assumes xpc_LocalizeContext needs to be thread safe

You mean that the locale callbacks need to be threadsafe, right?

In practice, since a single gDecoder is used for all callers, and since the decoding process uses members of the decoder and is totally not reentrant, the setup in the attached patch is limited to only being called on one thread, right?  I don't know offhand that gCollation has the same issue, but I wouldn't be all that surprised if it keeps some stuff in members too.

I'm not sure whether it's best to store a decoder and collation per thread using XPConnect's per-thread-data stuff (complicates lifetime management!) or whether we can get away with aborting if !NS_IsMainThread and these callbacks are called...  Given the general behavior of intl, I wouldn't be surprised if we just needto require the latter, though.  But that would mean that using locale stuff in a JS component running off the main thread would immediately abort.  Or I guess we could just throw a JS exception if these callbacks are called on a non-main thread.  Brendan, thoughts?
Oh, if you meant having xpconnect depending on nsDOMClassInfo... yes, that's totally fine imo.
Attachment #499963 - Flags: feedback?(bzbarsky) → feedback+
(In reply to comment #18)
> > - having XPConnect stuff depend on XPCOM stuff.  Don't know the rules here.
> 
> XPConnect depends on XPCOM, period.  That's its job.  Depending on particular
> components is theoretically supposed to be conditional on
> !XPCONNECT_STANDALONE, but in practice that's dead.  So that part is fine.
> 

Yes, I meant a dependency on particular components.

> > the intl stuff is lazily init'd for each context.
> 
> That's fine; that's what we had before, right?
> 

For dom/base it's no different, yes.  But, in general, it's different.  I'm worried about a case where we might lazily "localize" a JSContext soon after InitXPCOM(), but then say soon thereafter we instantiate a JS component and during instantiation something uses localeCompare which creates some kind of instantiation cycle.  Or something.  Really, I just don't know the dangers here.  The WIP deals with failed intl setup, but never tries again if there's a failure.

> > - this patch assumes xpc_LocalizeContext needs to be thread safe
> 
> You mean that the locale callbacks need to be threadsafe, right?

Well, xpc_LocalizeContext itself and the lazy init code in the callbacks, which touches global state.  LocalizeContext itself is probably OK unless there's some JS request needed.  My patch is threadsafe wrt to lazy init.

> 
> In practice, since a single gDecoder is used for all callers, and since the
> decoding process uses members of the decoder and is totally not reentrant, the
> setup in the attached patch is limited to only being called on one thread,
> right?  I don't know offhand that gCollation has the same issue, but I wouldn't
> be all that surprised if it keeps some stuff in members too.
> 
> I'm not sure whether it's best to store a decoder and collation per thread
> using XPConnect's per-thread-data stuff (complicates lifetime management!) or
> whether we can get away with aborting if !NS_IsMainThread and these callbacks
> are called...  Given the general behavior of intl, I wouldn't be surprised if
> we just needto require the latter, though.  But that would mean that using
> locale stuff in a JS component running off the main thread would immediately
> abort.  Or I guess we could just throw a JS exception if these callbacks are
> called on a non-main thread.

I skimmed the necessary modules a couple weeks ago and recall them claiming to be threadsafe.  I don't know if that's true.  The locale callbacks do need to be threadsafe regardless of off-main-thread JS components because one class of broken JSContexts is those created for web workers.  But in general I guess we could leave JSContexts selectively broken.
> I'm worried about a case where we might lazily "localize" a JSContext soon
> after InitXPCOM(), but then say soon thereafter we instantiate a JS component
> and during instantiation something uses localeCompare which creates some kind
> of instantiation cycle.

Right; so the worry is basically that some JS component will try to use the locale stuff before things are set up enough that intl components can be instantiated?

> I skimmed the necessary modules a couple weeks ago and recall them claiming to
> be threadsafe.

Really?  I don't see how any single unicode decoder can possibly be threadsafe... nsUTFToUnicode, for example, certainly isn't.

> The locale callbacks do need to be threadsafe

Then they probably need to either use a per-thread decoder or lock around the decoder use, at least....  I dunno about collation, like I said.

But yes, we could also just not worry about workers for now and leave them broken.  I'm not sure whether that's better than having the locale stuff throw in workers, though...
> nsUTFToUnicode, for example

I meant nsUTF8ToUnicode
(In reply to comment #21)
> > I'm worried about a case where we might lazily "localize" a JSContext soon
> > after InitXPCOM(), but then say soon thereafter we instantiate a JS component
> > and during instantiation something uses localeCompare which creates some kind
> > of instantiation cycle.
> 
> Right; so the worry is basically that some JS component will try to use the
> locale stuff before things are set up enough that intl components can be
> instantiated?
> 

Yes.

> > I skimmed the necessary modules a couple weeks ago and recall them claiming to
> > be threadsafe.
> 
> Really?  I don't see how any single unicode decoder can possibly be
> threadsafe... nsUTFToUnicode, for example, certainly isn't.
> 

Oh geez, I've been on another planet ... I looked at the modules needed to get to the compare+convert objects, I didn't even think about those themselves being non-pure.  And of course they might be implemented by extensions &c I guess.  Sigh.

Isn't the JS interface pure though?  Are we missing Reset() calls in here?  Anyway, more of an aside.

> > The locale callbacks do need to be threadsafe
> 
> Then they probably need to either use a per-thread decoder or lock around the
> decoder use, at least....  I dunno about collation, like I said.

I guess we could try per-thread, but I don't know how well that would work in full XPCOM generality.  (Too late to back that out? ;) )  Locking would leave us to open to easy unintentional DoS.

> But yes, we could also just not worry about workers for now and leave them
> broken.  I'm not sure whether that's better than having the locale stuff throw
> in workers, though...

I'm OK with trying a per-thread impl and seeing what blows up.
> Isn't the JS interface pure though? 

Yes.

> I'm OK with trying a per-thread impl and seeing what blows up.

OK, let's do that.  We just have to make sure that the init and destroy for the JSContext happen on the same thread, then, so we can refcount each thread's stuff individually or something....  The other option would be to try to hang a pointer to the right things to refcount off the jscontext itself, but that might take too much surgery.
Actually, I just realized that workers aren't tied to a particular thread.  So that could throw a wrench in the works....

If the collation stuff is really threadsafe, we could also fix localeCompare in general but leave toLocaleString broken on non-main threads.  And file bugs on intl to make it easy (as in fast) to stack-instantiate a Unicode decoder; at that point we could just do it on every call or something.
Er, unicode encoder.  But same deal!
(In reply to comment #24)
> The other option would be to try to hang a
> pointer to the right things to refcount off the jscontext itself, but that
> might take too much surgery.

What's your worry here?

My current (possibly somewhat evil) thought is to
 - subclass |struct JSLocaleCallbacks|, add decoder and collation members
 - heap-allocate this subclass and init members from xpc_LocalizeContext
 - |JS_SetLocaleCallbacks(subclassInstance)|
 - set up a context callback that listens for context destruction
 - delete the subclass instance on destruction

The nice thing about this is there's nothing to refcount (well, nontrivially anyway).  Bad things are
 - unsure about JSContext destruction semantics; callback might be invoked at awkward time
 - depends on context callback not being stomped (i.e. chained); dunno if current users maintain that discipline
 - decoder/collation can be called on arbitrary number of threads, though not concurrently or reentrantly AFAIK

Maybe there's a more "official" of setting per-context data?  I'll poke at this too.

> If the collation stuff is really threadsafe, we could also fix localeCompare in
> general but leave toLocaleString broken on non-main threads.  And file bugs on
> intl to make it easy (as in fast) to stack-instantiate a Unicode decoder; at
> that point we could just do it on every call or something.

Yes, fast stack allocation of these would solve all problems.
> What's your worry here?

There wasn't an obvious worry; just wasn't sure how easy it is to hang things off the jscontext.

Also, that setup involves allocating a new collation and decoder for each window... but I think that should be ok.

> callback might be invoked at awkward time

in what sense?

> depends on context callback not being stomped

Worth checking.

> decoder/collation can be called on arbitrary number of threads

I _think_ that this should be ok....  They have threadsafe refcounting, and as long as you don't reenter things are fine.

> Maybe there's a more "official" of setting per-context data? 

Contexts can have a private pointer, but that's already in use in various cases (e.g. for windows the private pointer points to the nsJSContext).
(In reply to comment #28)
> > What's your worry here?
>
> Also, that setup involves allocating a new collation and decoder for each
> window... but I think that should be ok.
>

And worker etc.  Yeah, they seem small-ish.
 
> > callback might be invoked at awkward time
> 
> in what sense?
> 

I don't know what the contract for JS_DestroyContext() and the context callback is.  If it and the callback can be legally called in the middle of operations, then deleting the decoder/collation might cause bad things to happen.

> > decoder/collation can be called on arbitrary number of threads
> 
> I _think_ that this should be ok....  They have threadsafe refcounting, and as
> long as you don't reenter things are fine.
> 

I see that nsCollationUnix has an nsCollation which has an nsIUnicodeEncoder member, and nsUnicodeToUTF8 at least isn't IMPL_THREADSAFE.  Can't really tell why though; doesn't look materially different than nsUTF8ToUnicode.  Maybe other encoders have issues?  Could feel this out on tryserver+nightlies maybe.

> > Maybe there's a more "official" of setting per-context data? 
> 
> Contexts can have a private pointer, but that's already in use in various cases
> (e.g. for windows the private pointer points to the nsJSContext).

OK.
bent tells me that JSContexts created for Workers are currently tied to a single thread, although soon the contexts might be switched to being destroyed only on the main thread.

If we can assume that contexts are usually parked on particular threads, we should be able to hang the collation+decoder off JSContexts with no issues.  bent also points out JS_SetContextThread(), but the only interesting m-c caller is ctypes and I'm not sure we care about locale callbacks there.

Maybe we can xpc_LocalizeContext() Worker contexts for now, with the above approach, and if their destruction model changes, resume this chat in that bug?
> I don't know what the contract for JS_DestroyContext()

Brendan?  You listening?

> and nsUnicodeToUTF8 at least isn't IMPL_THREADSAFE.

:(

> Maybe we can xpc_LocalizeContext() Worker contexts for now, with the above
> approach, and if their destruction model changes, resume this chat in that
> bug?

That seems like the right way forward, esp. if we add some asserts that would fail if the model changes.
(In reply to comment #31)

> > Maybe we can xpc_LocalizeContext() Worker contexts for now, with the above
> > approach, and if their destruction model changes, resume this chat in that
> > bug?
> 
> That seems like the right way forward, esp. if we add some asserts that would
> fail if the model changes.

Better believe it! ;)  nsUnicodeToUTF8 would catch it too.
Re: comment 29:

> I don't know what the contract for JS_DestroyContext() and the context callback
> is.  If it and the callback can be legally called in the middle of operations,
> then deleting the decoder/collation might cause bad things to happen.

What do you mean by "in the middle of operations"? You cannot destroy a context you're running script on -- hope that's obvious from the mandatory ownership (New and Destroy API verbs) model.

/be
Well, let me be more specific.  My patch does approximately

  ContextCallback(cx, op):
    if op == JSCONTEXT_DESTROY:
      JSLocaleCallbacks* lc = JS_GetLocaleCallbacks(cx);
      JS_SetLocaleCallbacks(cx, nsnull);
      delete static_cast<FooGummy*>(lc);

Is that OK?
Looks like workers are probably out for FF4, unless anyone has other ideas:

###!!! ASSERTION: nsPosixLocale not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /home/cjones/mozilla/mozilla-central/intl/locale/src/unix/nsPosixLocale.cpp, line 48
nsPosixLocale::Release() (/home/cjones/mozilla/mozilla-central/intl/locale/src/unix/nsPosixLocale.cpp:48)
~nsCOMPtr (/home/cjones/mozilla/ff-dbg/intl/locale/src/unix/../../../../dist/include/nsCOMPtr.h:534)
nsCollationUnix::Initialize(nsILocale*) (/home/cjones/mozilla/mozilla-central/intl/locale/src/unix/nsCollationUnix.cpp:144)
nsCollationFactory::CreateCollation(nsILocale*, nsICollation**) (/home/cjones/mozilla/mozilla-central/intl/locale/src/nsCollation.cpp:67)
XPCLocaleCallbacks::Compare(JSContext*, JSString*, JSString*, jsval_layout*) (/home/cjones/mozilla/mozilla-central/js/src/xpconnect/src/xpclocale.cpp:241)
XPCLocaleCallbacks::LocaleCompare(JSContext*, JSString*, JSString*, jsval_layout*) (/home/cjones/mozilla/mozilla-central/js/src/xpconnect/src/xpclocale.cpp:127)
str_localeCompare (/home/cjones/mozilla/mozilla-central/js/src/jsstr.cpp:1008)
js::CallJSNative(JSContext*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, js::Value*) (/home/cjones/mozilla/mozilla-central/js/src/jscntxtinlines.h:685)
CallCompiler::generateNativeStub() (/home/cjones/mozilla/mozilla-central/js/src/methodjit/MonoIC.cpp:691)
js::mjit::ic::NativeCall(js::VMFrame&, js::mjit::ic::CallICInfo*) (/home/cjones/mozilla/mozilla-central/js/src/methodjit/MonoIC.cpp:898)
Not sure if I need to get the "safe context" here too, but this fixes Mossop's tests.
Attachment #501261 - Flags: review?(bzbarsky)
Patch makes the test work, but also trips threadsafety assertions pasted above.  Maybe we should spin this off into another bug?  Not sure that'd be a 4.0 blocker.
Comment on attachment 501263 [details] [diff] [review]
part 4: "Localize" the IPC testshell JSContext

Looks good to me.
Attachment #501263 - Flags: review?(bent.mozilla) → review+
Comment on attachment 501260 [details] [diff] [review]
part 1: Refactor use of JSLocaleCallbacks to make them easier to share among various JSContexts

In the documentation for This() when talking about binding to the calling thread, you need s/MaybeThis/This/.

>+  nsAutoPtr<XPCLocaleCallbacks> lc(new XPCLocaleCallbacks());
>+  JS_SetLocaleCallbacks(cx, lc.forget());

Why do you need the |lc| temporary?  Why not just do:

  JS_SetLocaleCallbacks(cx, new XPCLocaleCallbacks());

?

r=me with those nits.
Attachment #501260 - Flags: review?(bzbarsky) → review+
Comment on attachment 501261 [details] [diff] [review]
part 2: "Localize" the JS component JSContext

r=me

You probably don't need to mess with the safe jscontext.  Or so I hope....
Attachment #501261 - Flags: review?(bzbarsky) → review+
> In the documentation for This() when talking about binding to the calling
> thread, you need s/MaybeThis/This/.

Done.

> >+  nsAutoPtr<XPCLocaleCallbacks> lc(new XPCLocaleCallbacks());
> >+  JS_SetLocaleCallbacks(cx, lc.forget());
> 
> Why do you need the |lc| temporary?  Why not just do:
> 
>   JS_SetLocaleCallbacks(cx, new XPCLocaleCallbacks());

Hm not sure what I had in mind there.  Fixed.
Whiteboard: [hardblocker]
Attachment #501260 - Flags: superreview?(mrbkap) → superreview+
Summary: String.localeCompare behaves incorrectly for international characters when called by an xpcom callback → String.localeCompare behaves incorrectly for international, uppercase and lowercase characters when called by an xpcom callback
(In reply to comment #38)
> Created attachment 501262 [details] [diff] [review]
> part 3: "Localize" the messageManager JSContexts

Review ping.  (Two-liner! ;) )
Attachment #501262 - Flags: review?(Olli.Pettay) → review+
I was not CC'ed so pinging didn't help.
I believe this broke disable-libxul builds

Undefined symbols:
  "xpc_LocalizeContext(JSContext*)", referenced from:
      _main in xpcshell.o
ld: symbol(s) not found
Depends on: 626138
This broke non-libxul builds, I filed bug 626138 for that.
Verified fixed with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b10pre) Gecko/20110116 Firefox/4.0b10pre. Tested the steps in comment 0 and non-capital first letters.

Given the automated test I don't think we need a manual test. Is that correct Chris?
Status: RESOLVED → VERIFIED
Flags: in-testsuite?
Flags: in-testsuite+
Flags: in-litmus-
Target Milestone: --- → mozilla2.0b10
I don't know of any reason to add a manual test.
Blocks: 903780
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: