Bug 454435 (Closed): Opened 11 years ago, Closed 10 years ago

RefillDoubleFreeList runtime locking causes poor performance on parallel ray tracing

Categories: Core :: JavaScript Engine (enhancement)
Status: RESOLVED FIXED
People: Reporter: bent.mozilla; Assigned: igor
References: Blocks 1 open bug
Details: Keywords: perf; Whiteboard: fixed-in-tracemonkey
Attachments: 2 files, 6 obsolete files

I picked up vlad's parallel ray tracing demo yesterday and realized that the runtime locking issue in RefillDoubleFreeList still exists. On a multicore machine we get some performance wins using multiple threads but hardly the win we'd hope for.

Here are some suggestions Igor had:

> Another suggestion is to increase the free list size for double from
> the current maximum of 32 values to n*32 where n is at least 2. Yet
> another possibility is to use compare-and-swap to implement lock-free
> grabbing of allocation bitmap word that is used to build the
> context-private list. This way the lock will be taken only when
> allocating a new GC arena holding about 500 doubles.
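
A minimal sketch of the compare-and-swap idea (GrabBitmapWord is an illustrative name, not tree code; js_CompareAndSwap is the existing primitive from jslock.h, and a set bit means the cell is allocated, matching the current bitmap convention):

static jsbitmap
GrabBitmapWord(volatile jsbitmap *wordp)
{
    for (;;) {
        jsbitmap bits = *wordp;
        if (bits == jsbitmap(-1))
            return 0;           /* every cell in this word is taken */

        /* Claim the whole word: mark all of its cells as allocated. */
        if (js_CompareAndSwap((jsword *) wordp, (jsword) bits, (jsword) -1))
            return ~bits;       /* the free cells this thread just won */
    }
}

The winning thread then threads the cells for its ~bits onto the context-private list, so the GC lock is needed only when a fresh arena must be allocated.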
Bah, my CC list disappeared somehow.
This may be a bit late, if not heretical: why don't we use a separate JSRuntime per thread, and turn off JS_THREADSAFE?

We would need to do something to jsdtoa.cpp to get rid of its global locks, or turn them on without JS_THREADSAFE if they are not showing up on profiles.

We'd get perf wins on all threads from the lack of JS_THREADSAFE, and we would get better isolation and faster GCs on workers (and on the main thread, lesser issue with its bigger heap).

/be
I have a patch in progress which does just that. There are some issues with script-blocking in the DOM that I may need help with. See http://hg.mozilla.org/users/bsmedberg_mozilla.com/xpcthreading , which appears to be misconfigured somehow at the moment.
I'm all for it, though it'd be perhaps too big an API incompatibility to spring on our extension authors for 3.1?  I wonder if any of them are still left using multiple threads after the ThreadManager punking, though, so maybe now is indeed the time.  Worth reaching out to extension and app folks to see whether they need shared objects for their multi-threading needs.  (Copying Blizzard here to that end.)

With a separate runtime (and therefore object pool) per thread, we would also be able to just rid ourselves of the XPConnect monitor, long an ambition of ours. :)
Yeah, I ripped out all the xpconnect locks and monitors.
The runtime-per-thread idea was floated at the start of the workerthreads stuff but I think everyone was concerned about the added costs of copying strings across runtimes... Not a big deal for passing simple messages across workers but it might be for large blobs of JSON'd objects. Is that still a concern?
My guess is that the cost of copying string data will be much lower than the pervasive locking costs we pay today.
If it becomes a concern, we can probably use the external-string facility to special-case string handling.  memcpy of string data should be dwarfed by the work done by the threads themselves, I would sort of hope, and perhaps even offset completely by the locking wins.  Brendan believes that crossing between threads *should* hurt, and he's convinced me of that path's righteousness.
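
A hedged sketch of that external-string special case (JS_AddExternalStringFinalizer and JS_NewExternalString are the real JSAPI entry points; FinalizeSharedChars, chars, and len are hypothetical):

/* Once, at startup: register a finalizer type for shared buffers. */
intN type = JS_AddExternalStringFinalizer(FinalizeSharedChars);

/* On the receiving thread: wrap the sender's immutable jschar buffer
 * instead of memcpy'ing it into the destination runtime's heap. */
JSString *str = JS_NewExternalString(cx, chars, len, type);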
(In reply to comment #8)
> If it becomes a concern, we can probably use the external-string facility to
> special-case string handling.  memcpy of string data should be dwarfed by the
> work done by the threads themselves, I would sort of hope, and perhaps even
> offset completely by the locking wins.

I think that a bigger issue might be memory usage more than the memcpy itself; but as you say, it can be avoided esp. just for strings.  It does make it harder to have some future world where real JS objects can be passed around (chrome) threads, but maybe that world is terrifying to begin with..
(In reply to comment #6)
> The runtime-per-thread idea was floated at the start of the workerthreads stuff
> but I think everyone was concerned about the added costs of copying strings
> across runtimes... Not a big deal for passing simple messages across workers
> but it might be for large blobs of JSON'd objects. Is that still a concern?

Let's have that problem.

Vlad: we were never gonna multi-thread Gecko, so I don't know what future world that was. A dystopian comic-book one where everyone dies, even the plucky comic relief, probably.

/be
Hooray! I'm sure we can figure out how to share string data or even more complex stuff if/when that becomes important.
FWIW I suspect that future versions of the Worker spec will allow passing Blob objects across workers. However this might actually be a good thing since Blobs are read-only and so can easily be made thread safe and shared across threads.

Further, if we allow JSON objects to be passed across workers we can serialize directly to a shared string buffer that can be passed across threads without copying.
(In reply to comment #9)
> I think that a bigger issue might be memory usage more than the memcpy itself;
> but as you say, it can be avoided esp. just for strings.  It does make it
> harder to have some future world where real JS objects can be passed around
> (chrome) threads, but maybe that world is terrifying to begin with..

(In reply to comment #11)
> Hooray! I'm sure we can figure out how to share string data or even more
> complex stuff if/when that becomes important.
Right now actually.  Over in Bug 450290 I'm using the thread manager to do work in JS on a background thread.  I do believe the code could be refactored such that this is OK, but the tests are a bit more difficult.

Additionally, we plan on doing more work in the background for Places so writes don't block the UI whenever possible. We haven't fully spec'd it out yet, but not being able to pass objects around the different threads makes things more difficult to do.


Would it still be safe to create, in JS, an object that implements nsIRunnable on one thread, and then it's run on another thread?
(In reply to comment #13)
> Would it still be safe to create, in JS, an object that implements nsIRunnable
> on one thread, and then it's run on another thread?

No.

However, if you can refactor your code to avoid sharing objects, you will probably find it easier to maintain in the long run.
(In reply to comment #14)
> No.
So, that means anybody who wants to use the thread manager in JS to create a background thread cannot anymore, right?  There are very good reasons to do this, so I don't see why we'd want to take that away now.

> However, if you can refactor your code to avoid sharing objects, you will
> probably find it easier to maintain in the long run.
While this is generally true, in the current use case it isn't.  I'm sharing a variable that stores a wrapped native mozIStorageConnection.  Having to get it for each thread would just be annoying.
(In reply to comment #15)
> I'm sharing a
> variable that stores a wrapped native mozIStorageConnection.  

Why?
(In reply to comment #16)
> Why?
I define a smart getter on the main thread that is used on the background thread.  Regardless though, this change would break my code because I create objects that implement nsIRunnable on the main thread and dispatch them to my background thread.
Ok, so, sorry to throw another wrench in here, but sicking realized that I actually lied earlier when I said that worker threads only pass strings across threads...

Workers are run on a thread pool currently. Each thread that gets created by the pool is given its own context. When it's time to schedule a worker we assign the worker's global object to the context, run the worker, and then clear the global from the context so that another worker may reuse the thread. If the thread is busy then the pool will create a new thread (up to some magic limit) and run on that one. A worker itself is only ever running on one thread at a time, but it may be a different thread each time it runs. This lets us limit the number of OS threads that are created while also allowing unlimited workers.
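
In pseudo-JSAPI terms, the scheduling step is roughly this (RunQueuedWorker, threadContext, and worker->globalObj are illustrative names; JS_SetGlobalObject is the real call):

/* Executed on whichever pool thread picks up the worker; each pool
 * thread owns one JSContext for its lifetime. */
JSContext *cx = threadContext;
JS_SetGlobalObject(cx, worker->globalObj);  /* bind this worker's global */
RunQueuedWorker(cx, worker);                /* run the pending work */
JS_SetGlobalObject(cx, NULL);               /* clear it so another worker can reuse the thread */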

It should be possible to make a new runtime per worker (although this sounds expensive and wasteful), but even that would cause problems because the workers use xpconnect to reflect XHR, (a special flavor of) DOM events, and listeners. If xpconnect has its own runtime then I couldn't use it any longer (or I'd have to do some cross-runtime-proxy calls)...
I think we can have an xpconnect per worker, and just run that script on whatever physical thread is available. I'll need to do some rework and thinking, because all objects associated with an xpconnect will need to be able to get back to it via a pointer chain, rather than using nsXPConnect::GetXPConnect()
I'm pretty sure we can get you from JSObject* to JSRuntime* in O(1), using something similar to what we do for mapping JSGCThing* to the GC flags.
Yeah, I found JS_SetRuntimePrivate and I'm going to set that to the matching nsXPConnect*.
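
For reference, a sketch of that pairing (rt and xpc assumed to be in scope):

/* At runtime creation: remember the owning nsXPConnect. */
JS_SetRuntimePrivate(rt, xpc);

/* Later, anywhere the runtime is reachable, in O(1): */
nsXPConnect *owner = static_cast<nsXPConnect *>(JS_GetRuntimePrivate(rt));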
Oh, right, I left out Vlad and Roc's suggestion that XPConnect be somehow per-runtime rather than per-thread.
So has anyone looked into what extensions might actually be bitten by this work? I know there are extensions out there that use threads (I don't know which ones, though), I know Songbird uses threads, and who knows what other apps. Having some sort of idea of the impact of this change for those would seem to be a requirement before we even consider taking a change like this, IMO.
Benjamin: If you need to go from JSObject* to JSRuntime*, without any cx param, we should generalize and rename js_GetGCStringRuntime.

Johnny's right, this is a big shift, and we do have MT JS API using plugin authors out there (see bug 424376, but I can't find the reference to the outfit whose plugin we broke -- Igor, do you remember?).

We need a way for such platform customers to cope. One step forward would be to leave JS_THREADSAFE on but still use a runtime per worker. The main thread and any plugin-created threads would share the main (currently only) runtime used by main-thread XPConnect. Do-able?

With enough hand-holding and bridging help such plugins might migrate to a world where JS_THREADSAFE is off. Can we do this in 3.1 timeframe? I don't think so. We could jump to Mozilla 2 at any time and declare API shenanigans, but really: we need a plan and hand-holders.

/be
Could we maybe get a post in dev.platform about this since it is not going to get a lot of exposure sitting in a bug?
So having one XPConnect per runtime only helps if we create one runtime per worker, no? This is certainly something we could do, but it would be a fairly costly solution. Especially considering that the number of workers is controlled by the page and we can't really set any reasonably low cap on it. It would also negate the current limits we have on how much memory JS is allowed to eat since that limit is per runtime.
Why is a runtime per-worker costly? I was imagining that it would be less costly, because you have smaller heaps to walk. Is the dynamic footprint of a JSRuntime larger than I imagine?

We'd certainly have to think about how resource limits were handled.
(In reply to comment #24)
> We need a way for such platform customers to cope.

Wild and crazy idea: enable JS_THREADSAFE dynamically for those customers?
Blocks: ST-XPConnect
I've cloned bug 455548 to discuss single-threading xpconnect, so that this bug can just be about RefillDoubleFreeList if necessary.
(In reply to comment #30)
> Does
> http://crash-stats.mozilla.com/report/index/cace737a-846a-11dd-a334-001cc45a2c28
> Have any relevance to this bug.

No.
The test is for an optimized thread-safe js shell and measures the performance of running a function N times sequentially versus running it via scatter in parallel. For the function, the test first uses code with a loop that creates a double on each iteration, then similar code that involves only int calls. By default N is set to 4 and should not exceed the number of cores on the system.

Here are the results of a typical run on a 4-core 2.6 GHz Xeon CPU under 64-bit Ubuntu Linux (no jit):

~/m/31-ff/js/src $ ~/m/31-ff/js/src/Linux_All_OPT.OBJ/js ~/s/b.js 2
2 - number of threads
double case
62.5 - average time for uncontested case
120 - scatter timing
1.92 - overhead
int case
51.5 - average time for uncontested case
81 - scatter timing
1.5728155339805825 - overhead

~/m/31-ff/js/src $ ~/m/31-ff/js/src/Linux_All_OPT.OBJ/js ~/s/b.js 3
3 - number of threads
double case
62.333333333333336 - average time for uncontested case
192 - scatter timing
3.0802139037433154 - overhead
int case
51.666666666666664 - average time for uncontested case
153 - scatter timing
2.9612903225806453 - overhead

~/m/31-ff/js/src $ ~/m/31-ff/js/src/Linux_All_OPT.OBJ/js ~/s/b.js 4
4 - number of threads
double case
63.25 - average time for uncontested case
285 - scatter timing
4.5059288537549405 - overhead
int case
51.25 - average time for uncontested case
214 - scatter timing
4.175609756097561 - overhead

Here the overhead is the scatter timing divided by the average timing of the sequential case. With ideal parallelism it would be 1, but the real world is far from ideal.
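
For instance, the 2-thread double run above gives overhead = 120 / 62.5 = 1.92.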

Already with 3 threads the overhead of the scatter case is about 3, and with 4 threads it exceeds 4 for both the double and int cases. So according to the results, scatter may speed things up only with 2 threads, and even there it makes sense only for ints, as the overhead of double allocation offsets any wins from the parallel execution.
The previous test case is not good, as it measures the maximum scatter timing, not the average one. This penalizes the parallel execution time. The new test fixes that and improves the reporting. Here are the results of a typical run on a 2.66 GHz 4-core Xeon CPU under 64-bit Ubuntu, no jit:

~/m/31-ff/js/src $ ~/m/31-ff/js/src/Linux_All_OPT.OBJ/js ~/s/b.js
Best timing of 10 scatter runs:
double case
threads=1:      62.0
threads=2:      84.0    (overhead 1.35)
threads=3:      122.3   (overhead 1.97)
threads=4:      169.8   (overhead 2.74)
int case
threads=1:      50.0
threads=2:      59.5    (overhead 1.19)
threads=3:      93.7    (overhead 1.87)
threads=4:      131.5   (overhead 2.63)

Here again both int and double cases show substantial slowdown compared with the sequential case even with 2 threads. But the double case slows down more than the int case (1.35 versus 1.19 overhead).
Attachment #339752 - Attachment is obsolete: true
So I don't think we can conclude much from these numbers yet because of bug 419086. Here's the shark profile of the int case with 4 threads on my 2xdual-core machine:

42.2% pthread_mutex_lock  
42.1%  PR_Lock  
24.9%   JS_BeginRequest 
24.9%    JS_ResumeRequest 
24.9%     JS_YieldRequest 
24.9%      my_BranchCallback(JSContext*, JSScript*) 
24.9%       js_ResetOperationCount  
24.9%        js_Interpret 
24.9%         js_Invoke 
24.9%          js_InternalInvoke  
24.9%           JS_CallFunctionValue  
24.9%            DoScatteredWork(JSContext*, ScatterThreadData*)  
18.7%             RunScatterThread(void*) 
18.7%              _pt_root 
18.7%                thread_start
With the patch from bug 419086 I get these numbers on my 2xCore2 machine:

Best timing of 10 scatter runs:

double case
threads=1:  192.0
threads=2:  344.5   (overhead 1.79)
threads=3:  670.7   (overhead 3.49)
threads=4:  1002.5  (overhead 5.22)

int case
threads=1:  137.0
threads=2:  179.0   (overhead 1.31)
threads=3:  209.3   (overhead 1.53)
threads=4:  231.2   (overhead 1.69)
And without the patch:

double case
threads=1:  203.0
threads=2:  440.5   (overhead 2.17)
threads=3:  1098.7  (overhead 5.41)
threads=4:  1752.5  (overhead 8.63)

int case
threads=1:  159.0
threads=2:  298.0   (overhead 1.87)
threads=3:  420.0   (overhead 2.64)
threads=4:  476.0   (overhead 2.99)
Attached patch work in progress (obsolete) — Splinter Review
The patch prototypes the idea of using the whole page as a local free list for doubles. The patch stores the list in JSGCFreeListSet to ensure that a non-empty free list can be used by other contexts and threads when the current context leaves the current request.
Attached patch tested implementation (obsolete) — Splinter Review
This is what I am going to benchmark.

To Ben Turner:

Could you check whether, with this patch, the timing is still dominated by the contested double lock in RefillDoubleFreeList?
Attachment #340608 - Attachment is obsolete: true
I'm on a different machine today (1xCore2Duo) but here are my results. Both of these runs have the patch from bug 419086 applied.

Without patch (attachment 340789 [details] [diff] [review]):

double case
threads=1:	83.0
threads=2:	284.5	(overhead 3.43)
threads=3:	550.0	(overhead 6.63)
threads=4:	613.0	(overhead 7.39)
int case
threads=1:	58.0
threads=2:	103.5	(overhead 1.78)
threads=3:	137.0	(overhead 2.36)
threads=4:	150.8	(overhead 2.60)

---------------------------------------

With patch (attachment 340789 [details] [diff] [review]):

double case
threads=1:	80.0
threads=2:	158.5	(overhead 1.98)
threads=3:	225.0	(overhead 2.81)
threads=4:	255.5	(overhead 3.19)
int case
threads=1:	59.0
threads=2:	103.0	(overhead 1.75)
threads=3:	137.3	(overhead 2.33)
threads=4:	155.2	(overhead 2.63)

In the 2-thread double case my profile says that 76% of the time is spent in js_Interpret, 9.7% in js_NewDoubleInRootedValue, and 1.5% in RefillDoubleFreeList.
Sadly attachment 340789 [details] [diff] [review] doesn't help parallel raytrace as much as I'd hoped, but it is definitely closer! This sped up a five run average of 2-thread parallel raytrace on my core2 machine from 13015 to 11539 ms. Looks like the contention now is between js_NewGCThing and RefillDoubleFreeList.
Attached patch whole page as free list (obsolete) — Splinter Review
With the new patch the free list spans the whole GC page, both for doubles and for ordinary GC things.

What are the performance numbers with this patch?
Attachment #340789 - Attachment is obsolete: true
The patch required an update due to TM merge.
Attachment #341128 - Attachment is obsolete: true
Severity: normal → enhancement
(In reply to comment #41)
> What are the performance numbers with this patch?

They were great, but the output of the raytrace was incorrect, so I suspect that this patch broke something.

We're closing in on 3.1; is this something that should block?
Flags: blocking1.9.1?
Wasn't the incorrect output due to a TM bug, since fixed?

/be
(In reply to comment #44)
> Wasn't the incorrect output due to a TM bug, since fixed?

That was something different. Workers still don't use JIT so this was always tested with the interpreter only.
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Updated patch and results with jit enabled for workers coming?

/be
I should update the patch after addressing my P1 and P2. With luck it could even be this week.
/me cheers!

I'll be happy to benchmark.
Sayrer, given bug 487948 should we reconsider blocking on this? Seems like any math-heavy worker could trigger this bottleneck...
Flags: blocking1.9.1- → blocking1.9.1?
I don't think we can block on a worker performance bug, as there won't be a lot of web content depending on it. Thoughts?
I would be worried about taking this into final without beta exposure. So if we block we have to fix this immediately for b4.
I doubt this bug only hurts workers, though. Double allocation performance is pretty horrible even for single-threaded use. bz's mandelbrot benchmark shows that well (trace-test.js).
Yes, it probably bites in more than just workers, but it's not worse than Firefox 3.
Flags: blocking1.9.1? → blocking1.9.1-
Flags: blocking1.9.2?
Keywords: perf
Flags: wanted-next+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
sayrer, this is the bug I was talking about the other day. Any way we can bump the priority on this? It sailed right through 1.9.1 and looks like it may miss 1.9.2 as well unless some action is taken.
I have a couple ideas for this. bent, can you catch me on irc?
I will update/fix the patch before Wednesday.
Attached patch whole page as free list v2 (obsolete) — Splinter Review
Here is an updated but completely untested patch. The idea is to minimize the amount of time the GC spends inside GC_LOCK even when preparing the free lists. In particular, the lists are assembled outside the lock.
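
The pattern, in sketch form (TakeDoubleArena is a stand-in name for the patch's list manipulation; TurnUsedArenaIntoDoubleList is the patch's real helper, and data is the JSThreadData):

JSGCArenaInfo *a;

JS_LOCK_GC(rt);
a = TakeDoubleArena(rt);    /* cheap under the lock: unlink one arena */
JS_UNLOCK_GC(rt);

if (a) {
    /* The O(cells) bitmap scan and list threading run lock-free. */
    data->doubleFreeList = TurnUsedArenaIntoDoubleList(a);
}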
Assignee: general → igor
Attachment #341260 - Attachment is obsolete: true
updated patch that passes shell tests
Attachment #403848 - Attachment is obsolete: true
With the patch from the comment above, the test case from comment 33 changes its output from:

Best timing of 10 scatter runs:
double case
threads=1:      76.0
threads=2:      79.5    (overhead 1.05)
threads=3:      99.0    (overhead 1.30)
threads=4:      106.0   (overhead 1.39)

to:
Best timing of 10 scatter runs:
double case
threads=1:      77.0
threads=2:      77.0    (overhead 1.00)
threads=3:      78.0    (overhead 1.01)
threads=4:      79.3    (overhead 1.03)

This is on 4-core CPU.
Comment on attachment 404249 [details] [diff] [review]
whole page as free list v3

Asking for review, as the patch has passed the try server.
Attachment #404249 - Flags: review?(brendan)
Comment on attachment 404249 [details] [diff] [review]
whole page as free list v3

>@@ -104,16 +104,20 @@ FinishThreadData(JSThreadData *data)
> #if defined JS_TRACER
>     js_FinishJIT(&data->traceMonitor);
> #endif
> }
> 
> static void
> PurgeThreadData(JSContext *cx, JSThreadData *data)
> {
>+    /*
>+     * Clear the double free list to release all the pre-allocated doubles.
>+     */
>+    data->doubleFreeList = NULL;
>     js_PurgeGSNCache(&data->gsnCache);

Blank line after the data->doubleFreeList = NULL; line.

>+const uint32 DOUBLES_PER_ARENA =
>+    (ARENA_INFO_OFFSET * JS_BITS_PER_BYTE) / (JS_BITS_PER_DOUBLE + 1);
. . .
>+const uint32 DOUBLES_ARENA_BITMAP_WORDS =
>+    JS_HOWMANY(DOUBLES_PER_ARENA, JS_BITS_PER_WORD);

Do we want size_t instead of uint32 for these consts, for better 64-bit perf? They should all fit in 32 bits but their initializers involve sizeof, so size_t is the "obvious" type that doesn't require more analysis by the code reader.

Note for ref below.

>+const uint32 DOUBLES_ARENA_BITMAP_OFFSET =
>+    ARENA_INFO_OFFSET - DOUBLES_ARENA_BITMAP_WORDS * sizeof(jsuword);
. . .
> #define DOUBLE_ARENA_BITMAP(arenaInfo)                                        \
>     (CHECK_DOUBLE_ARENA_INFO(arenaInfo),                                      \
>      (jsbitmap *) arenaInfo - DOUBLES_ARENA_BITMAP_WORDS)

jsbitmap must be word-sized, and it is -- but without any static assertion or other such sanity check. Do we care? Perhaps use jsbitmap instead of jsuword throughout?
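
(For example, JS_STATIC_ASSERT(sizeof(jsbitmap) == sizeof(jsuword)); next to the typedef would lock that in.)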

>+TurnUsedArenaIntoDoubleList(JSGCArenaInfo *a)
> {
>+    /*
>+     * When m below points the last bitmap's word in the arena, its high high

"high high" typo -- "high half"?

>+     * corresponds to non-existing cells and cellptr is outside the space
>+     * allocated for doubles. ClearDoubleArenaFlags sets such bits to 1. Thus
>+     * even for this last word its bit is unset iff the corresponding cell
>+     * exists and free.
>+     */
>+    JSGCDoubleCell *list = NULL;
>+    jsuword cellptr = ARENA_INFO_TO_START(a) +
>+                      DOUBLES_ARENA_BITMAP_WORDS * JS_BITS_PER_WORD *
>+                      sizeof(jsdouble);

Use DOUBLES_PER_ARENA instead of the second line of the cellptr initializer to simplify?

>+    for (jsbitmap *m = DOUBLE_ARENA_BITMAP_END(a);
>+         m != DOUBLE_ARENA_BITMAP(a);) {
>+        JS_ASSERT(cellptr >= ARENA_INFO_TO_START(a));

Here isn't cellptr strictly > ARENA_INFO_TO_START(a)?

>+        JS_ASSERT((cellptr - ARENA_INFO_TO_START(a)) %
>+                  (JS_BITS_PER_WORD * sizeof(jsdouble)) == 0);
>+        --m;
>+        jsbitmap bits = *m;
>+        if (bits == jsbitmap(-1)) {
>+            cellptr -= JS_BITS_PER_WORD * sizeof(jsdouble);
>+        } else {
>+            const unsigned unroll = 4;
>+            const jsbitmap unrollMask = (jsbitmap(1) << unroll) - 1;
>+            JS_STATIC_ASSERT(JS_BITS_PER_WORD % unroll == 0);

Asserting ((JS_BITS_PER_WORD & unrollMask) == 0) is more righteous ;-).

>+            for (unsigned n = JS_BITS_PER_WORD; n != 0; ) {
>+                n -= unroll;

I'd use a do {...} while ((n -= unroll) != 0); loop, with unsigned n = JS_BITS_PER_WORD; before. The block scope ends with the loop anyway.

>+static JSGCDoubleCell *
>+TurnEmptyArenaIntoDoubleList(JSGCArenaInfo *a)
>+{
>+    JSGCDoubleCell *head = reinterpret_cast<JSGCDoubleCell*>(ARENA_INFO_TO_START(a));
>+    JSGCDoubleCell *lastcell = head + DOUBLES_PER_ARENA - 1;
>+    for (JSGCDoubleCell *cell = head; cell != lastcell; ++cell)

Perhaps a blank line before the for loop to let things breathe a bit.

>         /*
>+         * Loop until we find arena with some free doubles. We turn arenas
>+         * into free lists outside the to minimize lock contention.

Missing word ("lock") after "outside the".

>          */
>+        while ((a = rt->gcDoubleArenaList.cursor)) {

Nit: house style favors redundant != NULL here.

Nice patch, r=me with these addressed.

/be
Attachment #404249 - Flags: review?(brendan) → review+
http://hg.mozilla.org/tracemonkey/rev/1c011b6b4f57

I nominate this for 1.9.2 per comment 54.
Flags: wanted1.9.2?
Whiteboard: fixed-in-tracemonkey
ok wanted+, but I'll merge it.
Flags: wanted1.9.2? → wanted1.9.2+
http://hg.mozilla.org/mozilla-central/rev/1c011b6b4f57
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Woo! Thanks guys!
(In reply to comment #63)
> ok wanted+, but I'll merge it.

Did this happen? 1.9.2 flag doesn't indicate so, but I'm not sure it was updated correctly.  Might be relevant for a worker-based JPEG encoder performance problem, though that's speculation on my part.
1.9.2 and m-c have more or less the same behaviour with that JPEG encoder test.
(In reply to comment #66)
> Did this happen? 1.9.2 flag doesn't indicate so, but I'm not sure it was
> updated correctly.

This did not make it to 1.9.2.