Closed Bug 20357 Opened 25 years ago Closed 15 years ago

JS thin locks fatter than fat locks?

Categories

(Core :: JavaScript Engine, defect, P3)

PowerPC
macOS
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: brendan, Unassigned)

References

()

Details

(Keywords: helpwanted, js1.5)

Attachments

(4 files)

More measurement needed, but so far we have evidence from waterson and shaver that thin locks are no faster, and jband's tests showing that they're slower. If they're not a win, we should simplify jslock.[ch] to be mostly-macro veneer on top of NSPR locks and condvars. /be
This was the reply that jband got from Bjorn: When we did this work, the timings were different for NSPR locks vs thin locks (1.5 yrs ago). It used to be that when the locking code was enabled, via JS_THREADSAFE, JS lost 50% of its speed if using NSPR locks, and 20% if using thin locks. Hence, the threadsafe version of JS became an issue to the server-side JS people (in NES). You can disable thin locks by compiling with JS_NO_THIN_LOCKS set in the makefile/environment. With this version of JS you can easily determine if the thin locks help anymore. Of course, another reason for using thin locks is that fewer OS locks are used, since the thin locks use a pool of NSPR locks. bjorn
There are two implementation of NSPR that run on NT: NT-only and WIN95. The NT-only version does not run on Windows 95/98, whereas the WIN95 version runs on Windows 95/98/NT. Mozilla uses the WIN95 version. The NT-only NSPR has a two-level, MxN threading implementation in which PR_Lock and PR_Unlock requires one EnterCriticalSection and one LeaveCriticalSection call. Therefore, locking and unlocking are at least twice as expensive as an implementation that uses only native threads (i.e., a 1x1 threading model). I believe that JS thin locks were primarily designed to provide better locking performance when the NT-only NSPR is used. This means that thin locks may not have much or any advantage when the Unix, Mac, or WIN95 version of NSPR is used. Since Mozilla does not use the NT-only version of NSPR, I believe that the problem that thin locks were intended to address is irrelevant to Mozilla.
Assignee: clayton → rogerl
*** Bug 20356 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Target Milestone: M14
Sorry, not going to make beta on this one.
Target Milestone: M14 → M16
Would like to get this fixed in m17 as part of js1.5 wrap-up. /be
Keywords: js1.5
Target Milestone: M16 → M17
Reassigning to Patrick as per meeting -
Assignee: rogerl → beard
Status: ASSIGNED → NEW
QA Contact: rginda → pschwartau
Status: NEW → ASSIGNED
Reassigning to mitesh.
Assignee: beard → mitesh
Status: ASSIGNED → NEW
I will look into it. Looks like there is a macro to switch off thin locks.
Status: NEW → ASSIGNED
Here are some interesting results with two test cases. I have created two versions of jsshell - one with thin locks enabled and one with thin locks disabled (fat locks only). In file jslock.h, I defined macro JS_USE_ONLY_NSPR_LOCKS which essentially defines NSPR_LOCK to 1 and only fat locks from nspr module are used. Here are the results of two test cases. First Test Case (thanks to jband): ---------------------------- // Defeat potential inline optimization by doing the 'set' out of line. function doSet(obj, prop, prop2, val) { obj[prop] = val; obj[prop2] = val; } var o = {}; var iterations = 1000000; print("running "+iterations+" iterations of test"); var start_time = new Date().getTime()/1000; for(var i = 0; i < iterations; i++) doSet(o, "p", "q", i); var end_time = new Date().getTime()/1000; var interval = parseInt(100*(end_time - start_time),10)/100; print("did "+iterations+" iterations in "+interval+ " seconds."); ----------------------------- The time taken on Windows NT machine for this particular test case. With thin locks - 38.09 seconds With fat locks - 39.45 seconds So it seems that thin locks are faster than fat locks for this particular test case. Following is the output from the run under the profiler. Under the profiler the number of interations are only 200000, (a) with thin locks ==================== Profile: Function timing, sorted by time Date: Fri Jul 28 16:31:09 2000 Program Statistics ------------------ Command line at 2000 Jul 28 16:25: js D:\mozilla\mozilla\js\src\Debug\tests\locktest.js Total time: 17442.530 millisecond Time outside of functions: 13.152 millisecond Call depth: 60 Total functions: 1193 Total hits: 50854179 Function coverage: 29.4% Overhead Calculated 647 Overhead Average 647 Module Statistics for js.exe ---------------------------- Time in module: 17429.378 millisecond Percent of time in module: 100.0% Functions in module: 1193 Hits in module: 50854179 Module function coverage: 29.4% Func Func+Child Hit Time % Time % Count Function --------------------------------------------------------- 6400.972 36.7 17325.453 99.4 1 _js_Interpret (jsinterp.obj) 1319.725 7.6 1319.725 7.6 12408120 _js_CompareAndSwap (i) 1214.729 7.0 1652.177 9.5 4202145 _js_Lock (i) 1126.555 6.5 1570.941 9.0 4202145 _js_Unlock (i) 923.099 5.3 2507.842 14.4 3000895 _js_LockObj (jslock.obj) 750.866 4.3 2062.041 11.8 1200019 _js_FindProperty (jsobj.obj) 563.173 3.2 2028.708 11.6 3801197 _js_LockScope1 (jslock.obj) 549.325 3.2 1955.840 11.2 3801156 _js_UnlockScope (jslock.obj) 535.721 3.1 758.628 4.4 1001349 _js_GetSlotWhileLocked (i) 438.019 2.5 438.019 2.5 2003882 _js_IsScopeLocked (jslock.obj) 436.146 2.5 873.887 5.0 2000812 _js_DropScopeProperty (jsscope.obj) 333.457 1.9 333.457 1.9 1800315 _JS_PropertyStub (jsapi.obj) 312.786 1.8 312.786 1.8 1600040 _js_GetAtom (jsatom.obj) 301.995 1.7 1027.169 5.9 400637 _js_AtomizeString (jsatom.obj) 293.989 1.7 1581.967 9.1 1200277 _js_DropProperty (jsobj.obj) 275.317 1.6 1403.990 8.1 2200680 _js_UnlockObj (jslock.obj) 227.305 1.3 442.249 2.5 2000918 _js_AtomicAdd (i) 197.742 1.1 641.707 3.7 800302 _js_LockScope (jslock.obj) 185.155 1.1 1262.079 7.2 400000 _js_ValueToStringAtom (jsatom.obj) 152.745 0.9 446.046 2.6 200220 _JS_GetPrivate (jsapi.obj) 143.423 0.8 143.452 0.8 200016 _js_NewNumberValue (jsnum.obj) 141.510 0.8 306.206 1.8 401804 _JS_HashTableRawLookup (jshash.obj) 120.322 0.7 267.055 1.5 200008 _ComputeThis (jsinterp.obj) 96.447 0.6 96.447 0.6 400328 _js_CompareStrings (jsstr.obj) 96.045 0.6 96.050 0.6 200003 _js_AllocStack (jsinterp.obj) 68.233 0.4 164.680 0.9 400328 _js_compare_atom_keys (jsatom.obj) 67.013 0.4 67.013 0.4 400637 _js_HashString (jsstr.obj) 53.191 0.3 53.460 0.3 400009 _js_ValueToString (jsstr.obj) 27.956 0.2 27.956 0.2 1 _js_InitStringGlobals (jsstr.obj) 13.608 0.1 16.810 0.1 1 _js_InitGC (jsgc.obj) 10.259 0.1 17429.378 100.0 1 _main (js.obj) ...truncated for brevity (b) with fat locks ================== Profile: Function timing, sorted by time Date: Fri Jul 28 16:38:11 2000 Program Statistics ------------------ Command line at 2000 Jul 28 16:34: js D:\mozilla\mozilla\js\src\Debug\tests\locktest.js Total time: 18750.477 millisecond Time outside of functions: 4.571 millisecond Call depth: 60 Total functions: 1191 Total hits: 34047792 Function coverage: 29.1% Overhead Calculated 644 Overhead Average 644 Module Statistics for js.exe ---------------------------- Time in module: 18745.906 millisecond Percent of time in module: 100.0% Functions in module: 1191 Hits in module: 34047792 Module function coverage: 29.1% Func Func+Child Hit Time % Time % Count Function --------------------------------------------------------- 6751.417 36.0 18672.214 99.6 1 _js_Interpret (jsinterp.obj) 1847.841 9.9 1847.841 9.9 4802754 _js_LockScope1 (jslock.obj) 1804.132 9.6 1804.132 9.6 4802713 _js_UnlockScope (jslock.obj) 1647.989 8.8 3244.067 17.3 4002452 _js_LockObj (jslock.obj) 994.903 5.3 994.903 5.3 2000918 _js_AtomicAdd (i) 901.848 4.8 2654.198 14.2 1200019 _js_FindProperty (jsobj.obj) 526.390 2.8 1756.152 9.4 3202237 _js_UnlockObj (jslock.obj) 493.047 2.6 959.826 5.1 2000812 _js_DropScopeProperty (jsscope.obj) 474.659 2.5 855.427 4.6 400637 _js_AtomizeString (jsatom.obj) 466.998 2.5 466.998 2.5 2003882 _js_IsScopeLocked (jslock.obj) 422.292 2.3 1654.367 8.8 1200277 _js_DropProperty (jsobj.obj) 340.317 1.8 340.317 1.8 1800315 _JS_PropertyStub (jsapi.obj) 316.825 1.7 1646.645 8.8 1001349 _js_GetSlotWhileLocked (i) 304.405 1.6 556.168 3.0 800302 _js_LockScope (jslock.obj) 264.579 1.4 264.579 1.4 1600040 _js_GetAtom (jsatom.obj) 177.865 0.9 1114.889 5.9 400000 _js_ValueToStringAtom (jsatom.obj) 158.496 0.8 800.577 4.3 200220 _JS_GetPrivate (jsapi.obj) 143.553 0.8 311.210 1.7 401804 _JS_HashTableRawLookup (jshash.obj) 127.930 0.7 127.955 0.7 200016 _js_NewNumberValue (jsnum.obj) 116.001 0.6 461.087 2.5 200008 _ComputeThis (jsinterp.obj) 107.017 0.6 167.647 0.9 400328 _js_compare_atom_keys (jsatom.obj) 84.204 0.4 84.376 0.5 400009 _js_ValueToString (jsstr.obj) 70.176 0.4 70.182 0.4 200003 _js_AllocStack (jsinterp.obj) 68.796 0.4 68.796 0.4 400637 _js_HashString (jsstr.obj) 60.630 0.3 60.630 0.3 400328 _js_CompareStrings (jsstr.obj) 16.778 0.1 16.778 0.1 1 _js_InitStringGlobals (jsstr.obj) 6.512 0.0 6.512 0.0 400 _mallocFatlock (jslock.obj) 6.224 0.0 6.246 0.0 2 _Print (js.obj) 3.092 0.0 3.092 0.0 400 _freeFatlock (jslock.obj) 2.537 0.0 2.537 0.0 1761 _JS_malloc (jsapi.obj) 1.646 0.0 1.878 0.0 18 _JS_NewHashTable (jshash.obj) 1.587 0.0 6.636 0.0 223 _js_GetClassPrototype (jsobj.obj) 1.560 0.0 1.560 0.0 1763 _JS_free (jsapi.obj) 1.481 0.0 8.043 0.0 1 _js_SetupLocks (jslock.obj) 1.452 0.0 4.214 0.0 1 _js_InitDateClass (jsdate.obj) 1.309 0.0 1.668 0.0 1 _js_InitScanner (jsscan.obj) 1.308 0.0 2.602 0.0 6 _JS_InitArenaPool (jsarena.obj) 1.304 0.0 2.808 0.0 1 _js_InitFunctionClass (jsfun.obj) 1.295 0.0 1.295 0.0 8 _JS_CeilingLog2 (jslog2.obj) 1.288 0.0 4.741 0.0 1 _js_InitGC (jsgc.obj) 1.282 0.0 18745.906 100.0 1 _main (js.obj) As we can see from the profiler output that thin locks are little faster than fat locks. ******************************************************************************** Test Case 2: --------------------------------------- I have inserted the following code into jsapi.c which basically calls a large number of time fat and thin locks during the runtime (Thanks to jband) Index: jsapi.c =================================================================== RCS file: /cvsroot/mozilla/js/src/jsapi.c,v retrieving revision 3.62 diff -r3.62 jsapi.c 71a72,78 > #define LOCK_SPEED_TEST 1 > > #ifdef LOCK_SPEED_TEST > /* mitesh test... */ > #include "prmjtime.h" > #endif /* LOCK_SPEED_TEST */ > 645a653,696 > #ifdef LOCK_SPEED_TEST > /* mitesh test... */ > { > #define COUNT 1000000 > JSInt64 start; > JSInt64 end; > JSInt64 llelapsed; > JSInt32 elapsed; > JSInt32 i; > PRLock* fatLock; > JSThinLock thinLock; > jsword tid; > > fatLock = PR_NewLock(); > > printf("lock and unlock %d fatlocks \n", (int)COUNT); > start = PRMJ_Now(); > > for(i = 0; i < COUNT; i++) > { > PR_Lock(fatLock); > PR_Unlock(fatLock); > } > end = PRMJ_Now(); > JSLL_SUB(llelapsed, end, start); > JSLL_L2I(elapsed, llelapsed); > printf("%d elasped \n", (int)elapsed); > > > js_NewLock(&thinLock); > printf("lock and unlock %d thinlocks \n", (int)COUNT); > start = PRMJ_Now(); > tid = js_CurrentThreadId(); > for(i = 0; i < COUNT; i++) > { > js_Lock(&thinLock, tid); > js_Unlock(&thinLock, tid); > } > end = PRMJ_Now(); > JSLL_SUB(llelapsed, end, start); > JSLL_L2I(elapsed, llelapsed); > printf("%d elasped \n", (int)elapsed); > } > #endif /* LOCK_SPEED_TEST */ output of a sample run of jsshell.exe: lock and unlock 1000000 fatlocks 297000 elasped lock and unlock 1000000 thinlocks 500000 elasped This test case is a direct comparison between the implementation of PR_Lock/PR_UnLock and js_Lock and js_UnLock functions. Which is not exactly correct because there are other functions also affected when only fat locks are used. We need more practical test cases to compare thin locks and fat locks to reach to a decision.
Also, the effect on platforms other than Win NT are of interest. The thin locks were bad on Linux because (until recently) we did not have a fast implementation of compare-and-swap for Linux in nspr. I believe this is fixed now.
From jslock.h, it seems that on Linux, we always use Fat locks. ------ #if defined(JS_USE_ONLY_NSPR_LOCKS) || !(defined(_WIN32) || defined(SOLARIS) || defined(AIX)) #undef JS_LOCK0 #undef JS_UNLOCK0 #define JS_LOCK0(P,M) JS_ACQUIRE_LOCK(((JSLock*)(P)->fat)); (P)->owner = (M) #define JS_UNLOCK0(P,M) (P)->owner = 0; JS_RELEASE_LOCK(((JSLock*)(P)->fat)) #define NSPR_LOCK 1 ----------- (ACQUIRE and RELEASE map to PR_Lock and PR_UnLock) Since on Linux _WIN32 is not defined, it always uses NSPR locks.
Mark it as a future.. No resolution yet.
Target Milestone: M17 → Future
OK, I did some digging and I understand this better. Thin locks are dependent on having a fast platform specific js_CompareAndSwap implementation in jslock.c. We have those for Win32/x86, Solaris/Sparc, and AIX only. Any other platform that tries to use thin locks gets a slow default implementation of js_CompareAndSwap that uses an NSPR lock. I did a little testing using xpcshell and a .js file that I'll attach (same as used for bug 43902 except I increased the loop counter for better resolution). This just loops through a lot of property get/set on different objects. It does three timed tests. These timings are in debug builds. All timings in seconds... For WinNT: Fat Locks | Thin Locks 1.22 | 1.15 1.35 | 1.27 0.54 | 0.49 So, we gain at most maybe 10% by using the thin locks on Win32/x86. I ran the same test on Linux (using a different - slower - computer) Fat Locks | Thin Locks 3.68 | 5.25 4.00 | 5.87 1.57 | 1.90 Since we don't have a fast Linux/x86 js_CompareAndSwap the thin locks are slow. Luckily, we don't *use* the thin locks in our builds for Linux. Presumably, someone could implement a Linux version of the Win32/x86 js_CompareAndSwap and we could start using thin locks and expect a 10%ish gain similar to the one we get in Win32. I think my previous abstracted test of thin v. fat is skewed because it does not take into account the different usage patterns of the locks (and direct use of js_CompareAndSwap) in the engine. Nevertheless, It looks like we are getting some gain from the thin locks - at least on Win32/x86. I'll try the Win32 tests on RElease build to see if the 10% is real or not.
Thin locks are even better on Win32 in optimized builds! Same test on a Win32/x86 release build on yet another - faster - machine... Fat Locks | Thin Locks 0.81 | 0.57 0.84 | 0.63 0.31 | 0.23 Of course, the test case is really skewed to property access, but this shows thin locks apparently being a big win. And this make it look like adding the js_CompareAndSwap for Linix would give us a significant boost there.
Two more things... - FWIW... For Windows I've been adding -DJS_USE_ONLY_NSPR_LOCKS to LCFLAGS and rebuildsing js/src to test this. - No telling what sort of gain might be available on Mac.
Why can't Linux/x86 trivially partake of the Windows/x86 compare-and-swap? Maybe a little gcc asm magic is required instead of different MSVC asm magic. Who is gonna look into this? What is the situation on Mac/PPC? Do we turn on THIN_LOCKS? /be
Now that jband mentions it, it seems obviously wrong to emulate compare-and-swap with a global lock, as jslock.c does around line 194. Shouldn't we take that code out of jslock.c and replace it with #error "You shouldn't try to use thin locks on your platform, you need an atomic compare-and-swap instruction." or some such? /be
Fat locks are forced by: #if defined(JS_USE_ONLY_NSPR_LOCKS) || \ !(defined(_WIN32) || defined(SOLARIS) || defined(AIX)) Thin locks are only turned on on those three platforms. For Linux the Win32 code is a guide, but the choice of registers might have to change, and JS_INLINE added to the inlined asm might not be safe? I don't know. Can we advertise for a volunteer? I'm not setup well for this and am not inclined to commit to doing it. I like the #error idea. As far as I can tell this should not break anything. But if anyone were to call js_Lock directly in the current code they would get a thin lock that would be slow on some platforms. You could try that out on your Linux build to verify that If does not #error out.
Ok, I'm working on a linux version of this. Should have a patch out pretty soon. I noted while hacking that the #ifdefs in jslock.h and jslock.c differ in the win32 alpha case, the function is surrounded by defined(_WIN32) && defined(_M_IX86), but the fat-locks is only protected by !defined(_WIN32).
Ok. I hacked up thin locks for x86 Linux. Seems to work well on an SMP box. Here is the timing on my machine (dual PIII 450 Mhz) running the testcase in xpcshell: Fat Locks | Thin Locks 1.85 | 1.2 1.97 | 1.32 0.68 | 0.49
I attached a new patch that should work on *BSD and all other GCC based x86 based machines. It just removes the #if defined(linux) parts and uses defined(__GNUC__) && defined(__i386__) instead. I have only tested it on linux, but i have no reason to believe it should break on anything else.
Above patch from alex works in FreeBSD 5.0-CURRENT, here is the speed difference on this machine. Without patch:
Alex, thanks for doing this -- great job. If you could, would you please fix the WIN32/M_IX86 #if logic so the two cases are complementary? Thanks, and a=brendan@mozilla.org. /be
a=brendan again -- thanks. One thing one of us (you or I) should do soon: put that #error "don't try to use thin locks without compare-and-swap" line in place of the wrong "default" implementation of thin locks that uses a global lock to synchronize c-and-s (which will always lose to fat locks). /be
I don't know the best place to place that. Could you do it?
Sure -- have you checked in yet? /be
Um. I'm not very good at this. Aren't i supposed to have an r= too before i check in?
I got yer r= right here -- anyone else care to second? But I think you're good to go. /be
Ok. Its the tree now.
Suck suck suck. It broke the build: Build Error Summary /tmp/cc2Almln.s:928: Error: operands given don't match any known 386 instruction gmake[3]: *** [jslock.o] Error 1 I've reverted it. What version of gas does the tinderbox machines run? I run a stock RedHat 6.2 with binutils 2.9.5.0.22, and it worked for me. Maybe it died because cmpxchg is a >= 486 instruction only (but the windows build uses it).
Ok. New version of the patch is in. This one should work on older binutils too. I had to move the lock instruction to a separate line. That made it work on an old RH 5.0 machine anyway. I'll keep my eyes on tinderbox.
Why is this bug still open? Thin locks win. The only open issues I see are: (1) What about Mac PPC? Can someone (beard? scc?) help hack up the right atomic instruction asm magic? (2) I have a patch I'll soon attach to bug 54743 that optimizes JS scope locking (object locking is based on scope locking) for the single-threaded case, which should make JS_THREADSAFE almost painless. Can we either get PPC thin locks supported under the aegis of this bug, or else close this one and open a new bug asking for PPC thin lock support? Thanks, /be
Part of my fix to bug 54743 ripped out the bogus emulation of compare-and-swap using a global lock, which would definitely make "thin" locks slower than "fat" ones in a non-zero-contention benchmark. So there's not much more to do here in the XP code. We just need PPC support, and any other CPUs that we can manage. beard, mitesh, waterson: any help (or pointers to others who can help) in the quest for the Mac and Linux PPC compare-and-swap holy grail that I claim this bug now represents? /be
#ifdef XP_MAC #include < DriverSynchronization.h> #endif This contains the following declaration: EXTERN_API_C( Boolean ) CompareAndSwap(UInt32 oldVvalue, UInt32 newValue, UInt32 * OldValueAdr);
I poked around here for a bit, but didn't find much... http://lxr.linux.no/source/include/asm-ppc/atomic.h?a=ppc
Reassigning from Mitesh to Patrick. Note this from above: ------- Additional Comments From Brendan Eich 2000-11-13 16:33 ------- Why is this bug still open? Thin locks win. The only open issues I see are: etc. Have these issues been discussed or resolved to everyone's satisfaction? If so, we could close this bug -
Assignee: mitesh → beard
Status: ASSIGNED → NEW
I think beard is the right guy to hack PPC compare and swap support into jslock.c! /be
Blocks: 149801
Keywords: helpwanted
OS: All → Mac System 8.5
Hardware: All → Macintosh
Target Milestone: Future → ---
Then maybe someone should poke beard ? :D
Note: Patrick is currently on sabbatical; back in August, I believe -
Do we still need this for Mac OS X, now that GCC is the default compiler for that platform or can bis bug be WONTFIX'ed since it's a mac classic issue?
This bug is targeted at a Mac classic platform/OS, which is no longer supported by mozilla.org. Please re-target it to another platform/OS if this bug applies there as well or resolve this bug. I will resolve this bug as WONTFIX in four weeks if no action has been taken. To filter this and similar messages out, please filter for "mac_cla_reorg".
I don't know whether we need this bug still, but it depends on CPU architecture, not on operating system version, so I'm changing the OS field and leaving it open. Only some analysis of the performance of native mutexes underlying PR_Lock vs. a jslock.c compare-and-swap PPC instruction selection #elif can tell whether we need to fix or invalidate this bug. /be
OS: Mac System 8.5 → MacOS X
Reassign all of my bugs to default owner.
Assignee: beard → rogerl
Assignee: rogerl → general
QA Contact: pschwartau → general
This may just be a dup (forward-dup, after all these years) of bug 296878. I'll let someone better versed in bugzilla's fine points decide. /be
No longer blocks: 149801
Depends on: 296878
closing this after all these years. reopen if you disagree.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: