Closed Bug 20357 Opened 20 years ago Closed 10 years ago

JS thin locks fatter than fat locks?


(Core :: JavaScript Engine, defect, P3)






(Reporter: brendan, Unassigned)




(Keywords: helpwanted, js1.5)


(4 files)

More measurement needed, but so far we have evidence from waterson and shaver
that thin locks are no faster, and jband's tests showing that they're slower.
If they're not a win, we should simplify jslock.[ch] to be mostly-macro veneer
on top of NSPR locks and condvars.

This was the reply that jband got from Bjorn:

When we did this work, the timings were different for NSPR locks
vs thin locks (1.5 yrs ago). It used to be that when the locking
code was enabled, via JS_THREADSAFE, JS lost 50% of its speed if
using NSPR locks, and 20% if using thin locks. Hence, the
threadsafe version of JS became an issue to the server-side JS
people (in NES).

You can disable thin locks by compiling with JS_NO_THIN_LOCKS set
in the makefile/environment. With this version of JS you can
easily determine if the thin locks help anymore.

Of course, another reason for using thin locks is that fewer OS
locks are used, since the thin locks use a pool of NSPR locks.

There are two implementation of NSPR that run on NT:
NT-only and WIN95.  The NT-only version does not run
on Windows 95/98, whereas the WIN95 version runs on
Windows 95/98/NT.  Mozilla uses the WIN95 version.

The NT-only NSPR has a two-level, MxN threading
implementation in which PR_Lock and PR_Unlock
requires one EnterCriticalSection and one
LeaveCriticalSection call.  Therefore, locking
and unlocking are at least twice as expensive
as an implementation that uses only native threads
(i.e., a 1x1 threading model).  I believe that
JS thin locks were primarily designed to provide
better locking performance when the NT-only NSPR is
used.  This means that thin locks may not have much
or any advantage when the Unix, Mac, or WIN95 version
of NSPR is used.  Since Mozilla does not use the
NT-only version of NSPR, I believe that the problem
that thin locks were intended to address is irrelevant
to Mozilla.
Assignee: clayton → rogerl
*** Bug 20356 has been marked as a duplicate of this bug. ***
Target Milestone: M14
Sorry, not going to make beta on this one.
Target Milestone: M14 → M16
Would like to get this fixed in m17 as part of js1.5 wrap-up.

Keywords: js1.5
Target Milestone: M16 → M17
Reassigning to Patrick as per meeting -
Assignee: rogerl → beard
QA Contact: rginda → pschwartau
Reassigning to mitesh.
Assignee: beard → mitesh
I will look into it. Looks like there is a macro to switch off thin locks.
Here are some interesting results with two test cases.
I have created two versions of jsshell - one with thin locks enabled and one 
with thin locks disabled (fat locks only). In file jslock.h, I defined macro 
JS_USE_ONLY_NSPR_LOCKS which essentially defines NSPR_LOCK to 1 and only fat 
locks from nspr module are used.

Here are the results of two test cases.

First Test Case (thanks to jband):

// Defeat potential inline optimization by doing the 'set' out of line. 
function doSet(obj, prop, prop2, val) {
    obj[prop] = val;
    obj[prop2] = val;

var o = {};
var iterations = 1000000;

print("running "+iterations+" iterations of test");

var start_time = new Date().getTime()/1000;

for(var i = 0; i < iterations; i++)
    doSet(o, "p", "q",  i);
var end_time = new Date().getTime()/1000;

var interval = parseInt(100*(end_time - start_time),10)/100;
print("did "+iterations+" iterations in "+interval+ " seconds.");


The time taken on Windows NT machine for this particular test case.

With thin locks  - 38.09 seconds
With fat locks   - 39.45 seconds

So it seems that thin locks are faster than fat locks for this particular test 

Following is the output from the run under the profiler.
Under the profiler the number of interations are only 200000,

(a) with thin locks

Profile: Function timing, sorted by time
Date:    Fri Jul 28 16:31:09 2000

Program Statistics
    Command line at 2000 Jul 28 16:25: js 
    Total time: 17442.530 millisecond
    Time outside of functions: 13.152 millisecond
    Call depth: 60
    Total functions: 1193
    Total hits: 50854179
    Function coverage: 29.4%
    Overhead Calculated 647
    Overhead Average 647

Module Statistics for js.exe
    Time in module: 17429.378 millisecond
    Percent of time in module: 100.0%
    Functions in module: 1193
    Hits in module: 50854179
    Module function coverage: 29.4%

        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
    6400.972  36.7    17325.453  99.4        1 _js_Interpret (jsinterp.obj)
    1319.725   7.6     1319.725   7.6 12408120 _js_CompareAndSwap (i)
    1214.729   7.0     1652.177   9.5  4202145 _js_Lock (i)
    1126.555   6.5     1570.941   9.0  4202145 _js_Unlock (i)
     923.099   5.3     2507.842  14.4  3000895 _js_LockObj (jslock.obj)
     750.866   4.3     2062.041  11.8  1200019 _js_FindProperty (jsobj.obj)
     563.173   3.2     2028.708  11.6  3801197 _js_LockScope1 (jslock.obj)
     549.325   3.2     1955.840  11.2  3801156 _js_UnlockScope (jslock.obj)
     535.721   3.1      758.628   4.4  1001349 _js_GetSlotWhileLocked (i)
     438.019   2.5      438.019   2.5  2003882 _js_IsScopeLocked (jslock.obj)
     436.146   2.5      873.887   5.0  2000812 _js_DropScopeProperty 
     333.457   1.9      333.457   1.9  1800315 _JS_PropertyStub (jsapi.obj)
     312.786   1.8      312.786   1.8  1600040 _js_GetAtom (jsatom.obj)
     301.995   1.7     1027.169   5.9   400637 _js_AtomizeString (jsatom.obj)
     293.989   1.7     1581.967   9.1  1200277 _js_DropProperty (jsobj.obj)
     275.317   1.6     1403.990   8.1  2200680 _js_UnlockObj (jslock.obj)
     227.305   1.3      442.249   2.5  2000918 _js_AtomicAdd (i)
     197.742   1.1      641.707   3.7   800302 _js_LockScope (jslock.obj)
     185.155   1.1     1262.079   7.2   400000 _js_ValueToStringAtom 
     152.745   0.9      446.046   2.6   200220 _JS_GetPrivate (jsapi.obj)
     143.423   0.8      143.452   0.8   200016 _js_NewNumberValue (jsnum.obj)
     141.510   0.8      306.206   1.8   401804 _JS_HashTableRawLookup 
     120.322   0.7      267.055   1.5   200008 _ComputeThis (jsinterp.obj)
      96.447   0.6       96.447   0.6   400328 _js_CompareStrings (jsstr.obj)
      96.045   0.6       96.050   0.6   200003 _js_AllocStack (jsinterp.obj)
      68.233   0.4      164.680   0.9   400328 _js_compare_atom_keys 
      67.013   0.4       67.013   0.4   400637 _js_HashString (jsstr.obj)
      53.191   0.3       53.460   0.3   400009 _js_ValueToString (jsstr.obj)
      27.956   0.2       27.956   0.2        1 _js_InitStringGlobals (jsstr.obj)
      13.608   0.1       16.810   0.1        1 _js_InitGC (jsgc.obj)
      10.259   0.1    17429.378 100.0        1 _main (js.obj)

...truncated for brevity

(b) with fat locks

Profile: Function timing, sorted by time
Date:    Fri Jul 28 16:38:11 2000

Program Statistics
    Command line at 2000 Jul 28 16:34: js 
    Total time: 18750.477 millisecond
    Time outside of functions: 4.571 millisecond
    Call depth: 60
    Total functions: 1191
    Total hits: 34047792
    Function coverage: 29.1%
    Overhead Calculated 644
    Overhead Average 644

Module Statistics for js.exe
    Time in module: 18745.906 millisecond
    Percent of time in module: 100.0%
    Functions in module: 1191
    Hits in module: 34047792
    Module function coverage: 29.1%

        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
    6751.417  36.0    18672.214  99.6        1 _js_Interpret (jsinterp.obj)
    1847.841   9.9     1847.841   9.9  4802754 _js_LockScope1 (jslock.obj)
    1804.132   9.6     1804.132   9.6  4802713 _js_UnlockScope (jslock.obj)
    1647.989   8.8     3244.067  17.3  4002452 _js_LockObj (jslock.obj)
     994.903   5.3      994.903   5.3  2000918 _js_AtomicAdd (i)
     901.848   4.8     2654.198  14.2  1200019 _js_FindProperty (jsobj.obj)
     526.390   2.8     1756.152   9.4  3202237 _js_UnlockObj (jslock.obj)
     493.047   2.6      959.826   5.1  2000812 _js_DropScopeProperty 
     474.659   2.5      855.427   4.6   400637 _js_AtomizeString (jsatom.obj)
     466.998   2.5      466.998   2.5  2003882 _js_IsScopeLocked (jslock.obj)
     422.292   2.3     1654.367   8.8  1200277 _js_DropProperty (jsobj.obj)
     340.317   1.8      340.317   1.8  1800315 _JS_PropertyStub (jsapi.obj)
     316.825   1.7     1646.645   8.8  1001349 _js_GetSlotWhileLocked (i)
     304.405   1.6      556.168   3.0   800302 _js_LockScope (jslock.obj)
     264.579   1.4      264.579   1.4  1600040 _js_GetAtom (jsatom.obj)
     177.865   0.9     1114.889   5.9   400000 _js_ValueToStringAtom 
     158.496   0.8      800.577   4.3   200220 _JS_GetPrivate (jsapi.obj)
     143.553   0.8      311.210   1.7   401804 _JS_HashTableRawLookup 
     127.930   0.7      127.955   0.7   200016 _js_NewNumberValue (jsnum.obj)
     116.001   0.6      461.087   2.5   200008 _ComputeThis (jsinterp.obj)
     107.017   0.6      167.647   0.9   400328 _js_compare_atom_keys 
      84.204   0.4       84.376   0.5   400009 _js_ValueToString (jsstr.obj)
      70.176   0.4       70.182   0.4   200003 _js_AllocStack (jsinterp.obj)
      68.796   0.4       68.796   0.4   400637 _js_HashString (jsstr.obj)
      60.630   0.3       60.630   0.3   400328 _js_CompareStrings (jsstr.obj)
      16.778   0.1       16.778   0.1        1 _js_InitStringGlobals (jsstr.obj)
       6.512   0.0        6.512   0.0      400 _mallocFatlock (jslock.obj)
       6.224   0.0        6.246   0.0        2 _Print (js.obj)
       3.092   0.0        3.092   0.0      400 _freeFatlock (jslock.obj)
       2.537   0.0        2.537   0.0     1761 _JS_malloc (jsapi.obj)
       1.646   0.0        1.878   0.0       18 _JS_NewHashTable (jshash.obj)
       1.587   0.0        6.636   0.0      223 _js_GetClassPrototype (jsobj.obj)
       1.560   0.0        1.560   0.0     1763 _JS_free (jsapi.obj)
       1.481   0.0        8.043   0.0        1 _js_SetupLocks (jslock.obj)
       1.452   0.0        4.214   0.0        1 _js_InitDateClass (jsdate.obj)
       1.309   0.0        1.668   0.0        1 _js_InitScanner (jsscan.obj)
       1.308   0.0        2.602   0.0        6 _JS_InitArenaPool (jsarena.obj)
       1.304   0.0        2.808   0.0        1 _js_InitFunctionClass (jsfun.obj)
       1.295   0.0        1.295   0.0        8 _JS_CeilingLog2 (jslog2.obj)
       1.288   0.0        4.741   0.0        1 _js_InitGC (jsgc.obj)
       1.282   0.0    18745.906 100.0        1 _main (js.obj)

As we can see from the profiler output that thin locks are little faster than 
fat locks.

Test Case 2:
I have inserted the following code into jsapi.c which basically calls a large 
number of time fat and thin locks during the runtime (Thanks to jband)
Index: jsapi.c
RCS file: /cvsroot/mozilla/js/src/jsapi.c,v
retrieving revision 3.62
diff -r3.62 jsapi.c
> #define LOCK_SPEED_TEST 1                            
>  /* mitesh test... */
> #include "prmjtime.h"
> #endif /* LOCK_SPEED_TEST */
>  /* mitesh test... */
>         {
> #define COUNT 1000000
>             JSInt64 start;
>             JSInt64 end;
>             JSInt64 llelapsed;
>             JSInt32 elapsed;
>             JSInt32 i;
>             PRLock* fatLock;
>             JSThinLock thinLock;
>             jsword tid;
>             fatLock = PR_NewLock();
>             printf("lock and unlock %d fatlocks \n", (int)COUNT);
>             start = PRMJ_Now();
>             for(i = 0; i < COUNT; i++)
>                 {
>                     PR_Lock(fatLock);
>                     PR_Unlock(fatLock);
>                 }
>             end = PRMJ_Now();
>             JSLL_SUB(llelapsed, end, start);
>             JSLL_L2I(elapsed, llelapsed);
>             printf("%d elasped \n", (int)elapsed);
>             js_NewLock(&thinLock);
>             printf("lock and unlock %d thinlocks \n", (int)COUNT);
>             start = PRMJ_Now();
>             tid = js_CurrentThreadId();
>             for(i = 0; i < COUNT; i++)
>                 {
>                     js_Lock(&thinLock, tid);
>                     js_Unlock(&thinLock, tid);
>                 }
>             end = PRMJ_Now();
>             JSLL_SUB(llelapsed, end, start);
>             JSLL_L2I(elapsed, llelapsed);
>             printf("%d elasped \n", (int)elapsed);
>         }
> #endif /* LOCK_SPEED_TEST */

output of a sample run of jsshell.exe:

lock and unlock 1000000 fatlocks
297000 elasped
lock and unlock 1000000 thinlocks
500000 elasped

This test case is a direct comparison between the implementation of 
PR_Lock/PR_UnLock and js_Lock and js_UnLock functions. Which is not exactly 
correct because there are other functions also affected when only fat locks are 

We need more practical test cases to compare thin locks and fat locks to reach 
to a decision. 
Also, the effect on platforms other than Win NT are of interest. The thin locks 
were bad on Linux because (until recently) we did not have a fast implementation 
of compare-and-swap for Linux in nspr. I believe this is fixed now.
From jslock.h, it seems that on Linux, we always use Fat locks.
#if defined(JS_USE_ONLY_NSPR_LOCKS) || !(defined(_WIN32) || defined(SOLARIS) || 

#undef JS_LOCK0
#undef JS_UNLOCK0
#define JS_LOCK0(P,M) JS_ACQUIRE_LOCK(((JSLock*)(P)->fat)); (P)->owner = (M)
#define JS_UNLOCK0(P,M) (P)->owner = 0; JS_RELEASE_LOCK(((JSLock*)(P)->fat))
#define NSPR_LOCK 1
(ACQUIRE and RELEASE map to PR_Lock and PR_UnLock)

Since on Linux _WIN32 is not defined, it always uses NSPR locks.
Mark it as a future.. No resolution yet.
Target Milestone: M17 → Future
OK, I did some digging and I understand this better.

Thin locks are dependent on having a fast platform specific js_CompareAndSwap 
implementation in jslock.c. We have those for Win32/x86, Solaris/Sparc, and AIX 
only. Any other platform that tries to use thin locks gets a slow default 
implementation of js_CompareAndSwap that uses an NSPR lock.

I did a little testing using xpcshell and a .js file that I'll attach (same as 
used for bug 43902 except I increased the loop counter for better resolution). 
This just loops through a lot of property get/set on different objects. It does 
three timed tests.

These timings are in debug builds.

All timings in seconds...

For WinNT:

 Fat Locks | Thin Locks
   1.22    |   1.15
   1.35    |   1.27
   0.54    |   0.49

So, we gain at most maybe 10% by using the thin locks on Win32/x86.

I ran the same test on Linux (using a different - slower - computer)

 Fat Locks | Thin Locks
   3.68    |   5.25
   4.00    |   5.87
   1.57    |   1.90

Since we don't have a fast Linux/x86 js_CompareAndSwap the thin locks are slow. 
Luckily, we don't *use* the thin locks in our builds for Linux.

Presumably, someone could implement a Linux version of the Win32/x86 
js_CompareAndSwap and we could start using thin locks and expect a 10%ish gain 
similar to the one we get in Win32.

I think my previous abstracted test of thin v. fat is skewed because it does not 
take into account the different usage patterns of the locks (and direct use of 
js_CompareAndSwap) in the engine.

Nevertheless, It looks like we are getting some gain from the thin locks - at 
least on Win32/x86.

I'll try the Win32 tests on RElease build to see if the 10% is real or not.
Thin locks are even better on Win32 in optimized builds!

Same test on a Win32/x86 release build on yet another - faster - machine...

 Fat Locks | Thin Locks
   0.81    |   0.57
   0.84    |   0.63
   0.31    |   0.23

Of course, the test case is really skewed to property access, but this shows 
thin locks apparently being a big win. And this make it look like adding the 
js_CompareAndSwap for Linix would give us a significant boost there.
Two more things...

- FWIW... For Windows I've been adding -DJS_USE_ONLY_NSPR_LOCKS to LCFLAGS and 
rebuildsing js/src to test this.

- No telling what sort of gain might be available on Mac.
Why can't Linux/x86 trivially partake of the Windows/x86 compare-and-swap? 
Maybe a little gcc asm magic is required instead of different MSVC asm magic. 
Who is gonna look into this?

What is the situation on Mac/PPC?  Do we turn on THIN_LOCKS?

Now that jband mentions it, it seems obviously wrong to emulate compare-and-swap
with a global lock, as jslock.c does around line 194. Shouldn't we take that
code out of jslock.c and replace it with #error "You shouldn't try to use thin
locks on your platform, you need an atomic compare-and-swap instruction." or
some such?

Fat locks are forced by:
#if defined(JS_USE_ONLY_NSPR_LOCKS) || \
    !(defined(_WIN32) || defined(SOLARIS) || defined(AIX))

Thin locks are only turned on on those three platforms.

For Linux the Win32 code is a guide, but the choice of registers might have to 
change, and JS_INLINE added to the inlined asm might not be safe? I don't know. 
Can we advertise for a volunteer? I'm not setup well for this and am not 
inclined to commit to doing it.

I like the #error idea. As far as I can tell this should not break anything. But 
if anyone were to call js_Lock directly in the current code they would get a 
thin lock that would be slow on some platforms. You could try that out on your 
Linux build to verify that If does not #error out.
Ok, I'm working on a linux version of this. Should have a patch out pretty soon.

I noted while hacking that the #ifdefs in jslock.h and jslock.c differ in the
win32 alpha case, the function is surrounded by defined(_WIN32) &&
defined(_M_IX86), but the fat-locks is only protected by !defined(_WIN32).
Ok. I hacked up thin locks for x86 Linux.
Seems to work well on an SMP box.

Here is the timing on my machine (dual PIII 450 Mhz) running the testcase in
 Fat Locks | Thin Locks
   1.85    |   1.2
   1.97    |   1.32
   0.68    |   0.49
I attached a new patch that should work on *BSD and all other GCC based x86
based machines. It just removes the #if defined(linux) parts and uses
defined(__GNUC__) && defined(__i386__) instead.

I have only tested it on linux, but i have no reason to believe it should break
on anything else.
Above patch from alex works in FreeBSD 5.0-CURRENT, here is the speed difference
on this machine. Without patch:
Alex, thanks for doing this -- great job.  If you could, would you please fix 
the WIN32/M_IX86 #if logic so the two cases are complementary?  Thanks, and

a=brendan again -- thanks.

One thing one of us (you or I) should do soon: put that #error "don't try to use 
thin locks without compare-and-swap" line in place of the wrong "default" 
implementation of thin locks that uses a global lock to synchronize c-and-s 
(which will always lose to fat locks).

I don't know the best place to place that. Could you do it?
Sure -- have you checked in yet?

Um. I'm not very good at this. Aren't i supposed to have an r= too before i
check in?
I got yer r= right here -- anyone else care to second?  But I think you're good
to go.

Ok. Its the tree now.
Suck suck suck. It broke the build:
Build Error Summary
/tmp/cc2Almln.s:928: Error: operands given don't match any known 386 instruction
gmake[3]: *** [jslock.o] Error 1

I've reverted it.

What version of gas does the tinderbox machines run?
I run a stock RedHat 6.2 with binutils, and it worked for me.

Maybe it died because cmpxchg is a >= 486 instruction only (but the windows
build uses it).
Ok. New version of the patch is in. This one should work on older binutils too.
I had to move the lock instruction to a separate line. That made it work on an
old RH 5.0 machine anyway. I'll keep my eyes on tinderbox.
Why is this bug still open?  Thin locks win.  The only open issues I see are:

(1) What about Mac PPC?  Can someone (beard? scc?) help hack up the right atomic
instruction asm magic?

(2) I have a patch I'll soon attach to bug 54743 that optimizes JS scope locking
(object locking is based on scope locking) for the single-threaded case, which
should make JS_THREADSAFE almost painless.

Can we either get PPC thin locks supported under the aegis of this bug, or else
close this one and open a new bug asking for PPC thin lock support?  Thanks,

Part of my fix to bug 54743 ripped out the bogus emulation of compare-and-swap
using a global lock, which would definitely make "thin" locks slower than "fat"
ones in a non-zero-contention benchmark.  So there's not much more to do here in
the XP code.  We just need PPC support, and any other CPUs that we can manage.

beard, mitesh, waterson: any help (or pointers to others who can help) in the
quest for the Mac and Linux PPC compare-and-swap holy grail that I claim this
bug now represents?

#ifdef XP_MAC
#include < DriverSynchronization.h>

This contains the following declaration:

EXTERN_API_C( Boolean )
CompareAndSwap(UInt32                 oldVvalue,
               UInt32                 newValue,
               UInt32 *               OldValueAdr);
I poked around here for a bit, but didn't find much...
Reassigning from Mitesh to Patrick. Note this from above:

------- Additional Comments From Brendan Eich 2000-11-13 16:33 -------
Why is this bug still open?  Thin locks win.  The only open issues I see are:


Have these issues been discussed or resolved to everyone's satisfaction?
If so, we could close this bug - 
Assignee: mitesh → beard
I think beard is the right guy to hack PPC compare and swap support into jslock.c!

Blocks: 149801
Keywords: helpwanted
OS: All → Mac System 8.5
Hardware: All → Macintosh
Target Milestone: Future → ---
Then maybe someone should poke beard ? :D
Note: Patrick is currently on sabbatical; back in August, I believe -
Do we still need this for Mac OS X, now that GCC is the default compiler for
that platform or can bis bug be WONTFIX'ed since it's a mac classic issue?
This bug is targeted at a Mac classic platform/OS, which is no longer supported
by Please re-target it to another platform/OS if this bug applies
there as well or resolve this bug.

I will resolve this bug as WONTFIX in four weeks if no action has been taken.
To filter this and similar messages out, please filter for "mac_cla_reorg".
I don't know whether we need this bug still, but it depends on CPU architecture,
not on operating system version, so I'm changing the OS field and leaving it
open.  Only some analysis of the performance of native mutexes underlying
PR_Lock vs. a jslock.c compare-and-swap PPC instruction selection #elif can tell
whether we need to fix or invalidate this bug.

OS: Mac System 8.5 → MacOS X
Reassign all of my bugs to default owner.
Assignee: beard → rogerl
Assignee: rogerl → general
QA Contact: pschwartau → general
This may just be a dup (forward-dup, after all these years) of bug 296878.  I'll let someone better versed in bugzilla's fine points decide.

No longer blocks: 149801
Depends on: 296878
closing this after all these years. reopen if you disagree.
Closed: 10 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.