Closed Bug 54743 Opened 24 years ago Closed 24 years ago

Our JavaScript is 3x slower than IE's

Categories

(Core :: JavaScript Engine, defect, P1)


Tracking


RESOLVED FIXED
mozilla0.8

People

(Reporter: kandrot, Assigned: brendan)

References

Details

Attachments

(50 files)

Our implementation of JavaScript appears to take 3 times as long to do many tasks as IE's. This was also noticed in bug #40988, but it is not the focus of that bug. Here are the numbers I found.

In a pure looping JavaScript (1 million loops), we spend (function only, not children):

  ftol              25%  (2,000,000 calls)
  js_Interpret      25%  (1 call)
  js_UnlockScope    10%  (7,000,000 calls)
  js_LockScope1      9%  (7,000,000 calls)
  js_FindProperty    5%  (2,000,000 calls)

It looks like some things are called multiple times for each pass through a loop, which we might be able to change. I did not list them all here, but can add them to the bug if they are needed.

In the Luhn credit card JS (30,000 iterations), we spend (function only):

  malloc                  55%  (332,630 calls, mostly allocating 2 bytes for character access)
  js_Interpret             5%  (1 call)
  ftol                      4%  (917,000 calls, mostly from js_Interpret and js_NewNumberValue)
  js_GetSlotWhileLocked     4%  (4,000,000 calls, spread out through a few functions)

I hope these numbers help someone who knows our JavaScript engine better than I do to narrow down the problem. If IE can do these same operations in 1/3 the time, so can we. (I have more information, which I can add later, once I find out the format needed to be helpful.)
cc'ing Brendan, jband, Patrick -
Edward, see bug 43902 for discussion of the 3x cost of doing threadsafe locking in the JS engine. And bug 50859 for discussion of why this locking is necessary, and numbers showing that this locking overhead is a small and acceptable cost in real-world situations in the browser (i.e. no one does 30k iterations of credit card validations).
Also, it would be good if you could attach the test case(s).
Bug 40988 is not about JavaScript speed; it's about the speed of manipulating the DOM. Note the non-linear aspect of that bug. I'd bet a lot this is limited by the speed of manipulating the content model, and probably cascading notifications and reflows or some such thing outside of the core JS engine.
Lock and Unlock only account for 19%, and only in the looping case. Also, they are called 7 times for each pass through a loop. Is this as designed? Most of the time is spent in malloc, or doing things more times than I think the author intended. It is good to know that locking and unlocking were designed into the code, but were they designed to be called as many times as they are in that case? Did we design to malloc 2-byte strings every time JavaScript accesses a char? I have attached the Luhn script, the looping script, and an abs script, all of which run at 1/3 the speed under our JavaScript as compared to MS. There are also benchmarks on different websites, which I will try to find again, that show the same thing across the board, as does the referenced bug #40988. We are slower; why is what I want to know. This bug has some information, and if anyone has more relevant info on the "why?", I think it should be added.
Attached file A Simple math script
Under Linux, 4.75 vs the latest Moz6 (in seconds), with the attached scripts:

  Luhn    - 6.113 (M6) vs 3.528 (4.75)
  looping - 4.72 (M6) vs 6.777 (4.75)
  abs     - 15.938 (M6) vs 15.339 (4.75)

So Moz6 seems to be much faster at looping, about the same for simple math, and slower on strings. Obviously small test cases, so if there are any more to add, please add them. Thanks.
objects are locked for property access, not based on passes through loops - other threads in other loops could be accessing the objects. The credit card test does a lot of calls to String.charAt. This has to make a new string (one char+null terminator) each time. A better allocator or a recycler for small strings would improve this. The looping is heavily dominated by the locked access to the counter value. This can be quicker when done in a function with the counter being local rather than a property of the global object. I'm not seeing why there would be calls to ftol. The numbers are small enough to use the integer comparison optimization in RELATIONAL_OP. This is curious. The math test is just a combination of this looping and the speed of calling Math.abs. whatever.
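The "recycler for small strings" idea mentioned here, as a generic sketch (invented names, no locking, not SpiderMonkey's actual allocator): keep freed one-character buffers on a freelist so a charAt-heavy loop stops paying malloc/free per character.

    #include <stdlib.h>

    typedef struct SmallStrNode {
        struct SmallStrNode *next;
    } SmallStrNode;

    static SmallStrNode *freelist;   /* single-threaded sketch: no locking here */

    /* Get a buffer big enough for one char + NUL (and for the link while free). */
    static char *AllocCharString(void)
    {
        if (freelist) {
            SmallStrNode *n = freelist;
            freelist = n->next;
            return (char *)n;
        }
        return (char *)malloc(sizeof(SmallStrNode));
    }

    /* Instead of free(), push the buffer back for the next charAt result. */
    static void RecycleCharString(char *s)
    {
        SmallStrNode *n = (SmallStrNode *)s;
        n->next = freelist;
        freelist = n;
    }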
Mine, cuz I've been doing all the major hacking on SpiderMonkey lately, and I started jsjit.c during my sabbatical. /be
Assignee: rogerl → brendan
incrementing a global variable is not optimized to use jsints -- sorry. It uses doubles to do the deed, incurring some overhead. I'm not sure what ftol is; it sounds like a helper function in the C runtime that's called by code generated by the compiler for things like JSDOUBLE_IS_INT. I'll confirm this bug and use it to help build up a suite of benchmarks for the JIT. Keep 'em coming. Kandrot, please do test a counterpart script that uses a local variable. None of this means that time spent in js_Interpret dominates any critical path, of course, but that's another issue. If js_Interpret doesn't dominate or even show up high in Self-time on mail startup and other important user-level task benchmarks, then I think we should spend more time analyzing other parts of the system than we do benchmarking JS. But I welcome input on this bug, because I am going to finish the JS JIT. /be
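For reference, a hedged approximation (not quoted from the engine's headers) of the kind of int-in-a-double test described above; the (long) cast in such a macro is what MSVC lowers to a call to its _ftol runtime helper, which is why ftol shows up in the profile. The macro and function names here are invented for illustration.

    /* Hypothetical macro, for illustration only. */
    #define DOUBLE_FITS_IN_INT(d, i)  ((d) == (double)((i) = (long)(d)))

    /* Returns 1 and stores the integer in *out if d holds an exact int value. */
    static int DoubleToIntIfExact(double d, long *out)
    {
        long i;
        if (DOUBLE_FITS_IN_INT(d, i)) {   /* the (long) cast is where _ftol gets called */
            *out = i;
            return 1;
        }
        return 0;                          /* caller falls back to double arithmetic */
    }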
Status: UNCONFIRMED → NEW
Ever confirmed: true
There are optimizations beyond native machine code compilation that can be done in the JS engine. I'll list some here soon. /be
Status: NEW → ASSIGNED
Keywords: mozilla1.0
Target Milestone: --- → mozilla0.9
The patch just attached needs your code review and testing help. I'm preparing to run jband's test from bug 20357 on optimized versions of xpcshell built with and without the patch. /be
Ugh -- cyclic dependency problem with jslock.h and jsscope.h (latent bug, #ifdef DEBUG, escalated by the big patch and fixed by that last jslock.h-only patch). I will fix this soon, but first, here are my results (RH6.1 Dell PIII 500MHz 256MB) from jband's timing.js testcase run through optimized xpcshell, averaged over three trials after warming OS caches with a "0th" trial:

  < global average .71
  < object average .75
  < local average .32
  ---
  > global average .36
  > object average .39
  > local average .23

Not quite as good as without JS_THREADSAFE defined, which I measured with an optimized js shell (not with xpcshell) at .21/.24/.16. There are still global locks to take and release that JS_THREADSAFE enables, and there are still those scope->ownercx == cx tests, and any deoptimizations caused by the conditional expressions that fall into js_LockObj, js_LockScope, or js_[GS]etSlotWhileLocked calls if scope->ownercx != cx.

In Mozilla, DOM JS on the main thread does not run within requests. So the code in the patched JS_EndRequest that shares a single-threaded scope once the single thread (request on a context) is done using it, and a second thread wants to lock it, won't ever run for objects in DOM JS (scripts and event handlers, both XUL and HTML). XPConnected JS and component JS does run within requests.

Is this a problem? Not for DOM JS objects, which have single-threaded native code underlying their classes. But vanilla JS objects could be shared among the main and other threads, and if the non-main thread accesses a single-threaded scope, that thread will wait for a notify from JS_EndRequest that won't happen until (unless) a non-DOM script or function call happens on the main thread, on the same JSContext. Should we bite the bullet and make all JS run within requests, in view of the rule that says JS_THREADSAFE embeddings must do so, a rule which this patch relies upon? /be
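For readers unfamiliar with the request model under discussion: in a JS_THREADSAFE embedding, every stretch of JSAPI activity on a context is supposed to be bracketed by JS_BeginRequest/JS_EndRequest. A minimal sketch; the function name and script text are invented for illustration, the API calls are real JSAPI of this era.

    #include <string.h>
    #include "jsapi.h"

    static void RunScriptInRequest(JSContext *cx, JSObject *global)
    {
        const char *src = "var i = 0; while (i < 1000) i++;";
        jsval rval;

        JS_BeginRequest(cx);      /* tell the engine this thread is actively using cx */
        if (!JS_EvaluateScript(cx, global, src, (uintN)strlen(src),
                               "inline-test", 1, &rval)) {
            /* compile or runtime error; the error reporter has already run */
        }
        JS_EndRequest(cx);        /* GC -- and, with this patch, scope sharing -- may proceed */
    }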
Brendan, I'm just starting to look at these. Doing the easy stuff first... jsshell compiled and ran the test cases with only the octal junk failing. For the JS_THREADSAFE stuff on Win32 you need to add JS_FRIEND_API to the js_InitContextForLocking declaration in jslock.h to match the definition in jslock.c...

  extern JS_FRIEND_API(void) js_InitContextForLocking(JSContext *);

...otherwise it will not compile on Win32. And... running mozilla I get a deadlock in jslock.c line 694 during object finalization after the main window has come up but not finished loading content. I only see one thread actually doing anything having to do with JS. Maybe some faulty signaling logic? I'll attach the stack. Anyway, I'll start actually looking at the changes.
Attached file deadlock stacktrace
For the record... On my Dell Inspiron 7500 PIII 400 256meg using xpcshell I get:

  < global 0.12 seconds.
  < object 0.12 seconds.
  < local 0.01 seconds.
  ---
  > global 0.05 seconds.
  > object 0.05 seconds.
  > local 0.01 seconds.

Note that this is with the 'loc' function fixed as in the body of that bug to make it truly avoid repeated lookups on the global object. I'll attach the timing.js that I used.
jband: my mozilla builds never hit that "deadlock" (really, just a logic bug in js_LockScope1; but your stack also smoked out a bogus scope lock/unlock pair in js_DestroyScope, there just to quell an assertion in the clear-scope implementation). Are you selecting from multiple profiles or using some other dialog on startup? Thanks, please try the latest diff -u patch and let me know how it works. Anyone else have time to help test? I feel my Linux mozilla usage does not cover enough cases after this round! /be
brendan: yes a dialog had come up - just dumb luck... I'd turned on NS6 association with launching .html files and on launch of mozilla it asked me if I wanted to reassociate with mozilla. I'll look at the new stuff. I get the general idea; I'll look closer at specifics. I like what I see so far. Have you rounded up another reviewer?
Cc'ing other potential reviewers (Shaver's gone AWOL ;-). I'm running my dogfood optimized Linux build with this patch, works great. Alecf is kindly helping to test it too. Maybe waterson will give an r=? /be
Ok, I'm still working up to reading the code closer. I applied the recent patch to a fresh js/src in my trunk build and set it loose on chofmann's browser buster. It eventually locked up in js_LockScope1. It was calling in to JS via xpconnect. The method was nsIXULKBrowserWindow::onStateChange. I'm iffy on the url. The particular docshell only showed "_blank". I think it was on (or going away from) www.apple.com. This is based on the window title in the app icon showing the one in the top100 sites with url.56 in the name. I failed to check all other thread stacks before killing it. Just starting up and going to that url did not trigger the lockup for me. I'll attach the stack.
Attached file lockup stack
I'm thinking I'd like to see some tests where we stress actually transitioning into "accessed from more than one thread" mode. I'm thinking that it ought to be possible to implement nsIThread in JS (with or without some native helper code) and write some JS tests that are multithreaded using xpcshell. I'll look into that.
Cc'ing alecf. Alec, can you tell from jband's "lockup stack" what object might be in the midst of a lock attempt? It looks like a method implementing the onStateChange, maybe http://lxr.mozilla.org/mozilla/source/xpfe/browser/resources/content/navigator.js#282 in which case the object in question could be the channel, which is a threadsafe object accessed on two threads (right?). jband, if you can make this happen again, please leave it in the debugger and I'll come by. /be
attached a stack showing lockup in the multi-threaded case in xpcshell. This one happens to have two threads stuck. But, I've seen this stick here with the 'other' thread having finished and only the main thread stuck. (There is also a memory flusher thread doing its thing, but out of the picture I think.) Without brendan's changes this works fine.
brendan: re our conversation... When JS_DestroyContext is called in the thread dtor cx->requestDepth == 0 as it should be. The calls to the JS object via the wrapper have balanced Begin/EndRequest.
Divide and conquer.... Thanks to jband for pointing out the ugly, easy case of a context being destroyed without anyone trying to share scopes owned by that context. Rather than attempt to find such scopes (via the GC heap, presumably) and clear their ownercx pointers, I chose to make js_LockScope1 pay the price: it searches linearly in rt->contextList for the alleged scope->ownercx, using a new js_LiveContext predicate. This minimizes code size at the cost of O(N**2) growth rate for N contexts in the runtime, growing with scope share events. I think scope sharing will be rare enough, and N small enough, that we can live with the cost.

Note that memory recycling could result in a new cx whose address is the same as an old, destroyed one recorded in various scopes' ownercx pointers -- but all is well, because either the new live context is not in a request, so the call to js_LockScope1 on such a "dangling ownercx" scope will assume ownership for the calling cx; or js_LockScope1 will wait on rt->scopeSharingDone after inserting the scope on rt->scopeSharingTodo, and JS_EndRequest will interlock properly via rt->gcLock and notify the waiter.

Thanks again for all the testing help -- more needed, but I think we're almost there. We might need to put all "DOM JS" inside requests, as I mentioned in an earlier comment, but I haven't come up with a thought experiment that requires that yet. /be
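A sketch of what such a liveness predicate might look like, assuming the runtime keeps its contexts on the circular rt->contextList and the caller already holds the lock protecting that list. This is an illustration built from the names used in this bug, not the checked-in code.

    static JSBool
    js_LiveContext(JSRuntime *rt, JSContext *cx)
    {
        JSCList *cl;

        /* Walk every context in the runtime; the list head is the sentinel. */
        for (cl = rt->contextList.next; cl != &rt->contextList; cl = cl->next) {
            if ((JSContext *)cl == cx)   /* links is the first JSContext member */
                return JS_TRUE;
        }
        return JS_FALSE;                 /* cx has been destroyed (or never existed) */
    }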
So, I'm still having problems in my xpcshell testcase. I added the patch I'll attach to make xpcshell use Begin/EndRequest for the main thread. With that I hang at that same 'ol place. The ownercx is that of the main thread and since ownercx->requestDepth == 1 the quick out is not taken. The main thread is waiting in my call to t1.join(). The two other threads get stuck in PR_WaitCondVar(rt->scopeSharingDone, PR_INTERVAL_NO_TIMEOUT); I don't see that the main thread should still be holding this. Clearly the first other thread hit this while I was trying to join it and got stuck and the second other thread got some time and hit the same blockage. I'm not seeing anything odd in the main thread stack that indicates it would have an object's scope locked. Ideas?
jband: you've created a simple AB-BA deadlock by blocking the main thread (in join) without suspending the request it's in. The rule for JS_BeginRequest and JS_EndRequest is that they should bracket maximal non-blocking sequences of JS API calls and native calling code. It's an awkward model for request-unaware blocking-native-method implementations, obviously. But it's the best we have right now for keeping the GC at bay. Note that you could easily deadlock even without the scope sharing patch, if memory was low and the thread you were trying to join with ran a last-ditch GC. /be
Ah right. Thanks Brendan. See bug 60303 for a proposed fix to xpconnect for just such a situation.
Note dependency. Can I get r= and sr= on the last patch? Or is more testing needed (in which case can I persuade others to help test the patch)? /be
Depends on: 60303
Now that I'm done tripping over the changes and my own confusion I'll actually review the code. I do think it would be good to push forward with some multithreaded testing. But, that need not necessarily preceed the checkin. Another code reviewer or two would be real nice.
I promise not to monkey with this, and I'll check in the previous patch if the latest is scary, but it simply unions scope->count and scope->link, based on the observation that the two uses of those words (or the unioned word) are disjoint in time. A single-threaded scope does not use count (and can assert that count is zero) in the non-union case, and a multi-threaded scope does not use link (and can assert that link == 0). /be
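The storage sharing being described, sketched with an invented struct name (only the ownercx/count/link members come from this bug; the real JSScope has more fields). The point is that link is used only while the scope is still exclusively owned and queued on rt->scopeSharingTodo, and count only once the scope is shared, so the two never overlap in time.

    struct JSContext;                      /* opaque for this sketch */

    struct JSScopeSketch {
        struct JSContext *ownercx;         /* exclusively owning context, or NULL once shared */
        union {
            long                  count;   /* reentrancy count: used only after the scope is shared */
            struct JSScopeSketch *link;    /* rt->scopeSharingTodo link: used only while still owned */
        } u;
        /* ...lock and property-map members elided... */
    };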
Small change, last one for a while in this direction (I hope!): the previous patch, which unified storage for scope->count and scope->link, assumed that NULL's representation as a jsword was 0. The latest patch does not abuse the union (which exists merely to share storage, while providing strong types for the two arms of the union) to "pun" from 0 integral to null pointer type value. No diff -wu this time, enough patches for you! /be
[mid-air collision - posting anyway...] Brendan, I'm just starting to look at the code (again) and also running heavier multi-threaded tests. After fixing some locking and JS Request bugs in xpconnect I have it so I can run a test that loops through creating/running/joining tens and hundreds of threads that each run some JS and touch a couple of objects. Without your patch I'm to the point where this runs well - except for an occasional crash that looks like it can probably be attributed to memory corruption. I'm trying to track that in Purify. With your patch it runs a lot, but has just hit the assert:

    if (!js_LiveContext(rt, ownercx) ||
        !ownercx->requestDepth ||
        ownercx->thread == cx->thread) {
        JS_ASSERT(scope->u.count == 0);

ownercx->requestDepth is 0
scope->u.count is 1

It is unclear to me if that is really a count of 1 or the magic number for the link. Are you certain that the union is safe (at least with respect to this assertion)? Thoughts?
jband: try the union-less patch, *or* make NO_SCOPE_SHARING_TODO be 0xcafebabe cast to JSScope*, or some such. Also, if you have this in the debugger still, or can, what is rt->scopeSharingTodo? How many scopes are on it? I worried about this case last night but convinced myself it was a can't happen, because any multithreaded access that caused the scope to be put on rt->scopeSharingTodo must have been succeeded by a JS_EndRequest before this three-part || condition could be true. Now I'm rethinking my proof. /be
Forgot to add: the union is safe, because scope->u.count is never used until scope->ownercx is null (i.e., till the scope is shared and the JS_LOCK_SCOPE and JS_UNLOCK_SCOPE macros take the slow path into js_LockScope and js_UnlockScope, which worry about reentrancy counting). The hazard here, which I still think is a "can't-happen", is not that some slow path will use scope->u.link as if it were a count; it's that (as you supposed in allowing that the 1 value in scope->u.count was NO_SCOPE_SHARING_TODO) the scope will be on rt->scopeSharingTodo at the point the assertion botches. /be
rt->scopeSharingTodo points to the scope in question - so only one in the list. I'll try a more intersting value for NO_SCOPE_SHARING_TODO and see if I hit the assert again.
Argh, I see a race: JS_EndRequest decrements cx->requestDepth (which is thread local because cx may be used by one thread at a time, only) before testing for zero and then calling JS_LOCK_GC(rt). So if an ownercx is finishing a request, but gets preempted after decrementing ownercx->requestDepth to zero and before taking the scope off rt->scopeSharingTodo, and another thread tries to lock that scope, boom. But I wonder whether you hit this tiny window, or whether there's yet another bug lurking. Ah, mid-air collision joy. Ok, so the scope is on rt->scopeSharingTodo yet its ownercx is not in a request. Since a scope only ever gets put on that list if it is accessed on another thread than ownercx's thread, *and* ownercx at the time of access has a non-zero requestDepth, I don't see how scope got on the list apart from the race I identified above. Comments? I'll come up with a patch that closes that race right away. /be
Hit it again. Same deal. u.count is 0xcafebabe. If I could map from ownercx->thread to msdev's thread ids I could maybe see what the other thread was up to (I have 40 threads in this case).
cx->thread is the PRThread*, per jslock.h's CurrentThreadId() macro. You should be able to get to the windows thread id from there. I don't see another race, so I hope my latest patch does the trick. /be
There was a thread in JS_EndRequest waiting on the GC lock and it has the right cx - so this looks like the source of the problem. I'll try the new patch.
OK. Now I see two problems...

1) hit: JS_ASSERT(scope->u.count == 0); in jslock.c line 712. This is the else clause for Thin_RemoveWait.

2) Full deadlock. Every thread waiting in PR_WaitCondVar (either in js_LockScope1 or JS_BeginRequest etc).
My stupid brain knew this in an early draft, but I deleted it as I juggled the number of condition variables from one per runtime to per context and back: A loop was required in js_LockScope1 given the multiplicity of ownercx and the single rt->scopeSharingDone condition. All waiters on the latter get notified by any ownercx's JS_EndRequest, but only a particular request-ending ownercx's contested scope(s) should be lock-able. The rest must recheck their invariants and wait again if necessary. I also tightened up the 0=>1 cx->requestDepth transition to be protected by rt->gcLock, matching the previous patch's 1=>0 transition, so that js_LockScope1 can be sure that !ownercx->requestDepth means there is no request on that cx. /be
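The fix described here is the classic condition-variable pattern: every waiter rechecks its own predicate in a loop, because a single shared condition variable wakes everyone while satisfying only some. A generic NSPR sketch; the function and parameter names are illustrative, not lifted from jslock.c.

    #include "prlock.h"
    #include "prcvar.h"

    /* Wait until *scopeIsShared becomes true, tolerating spurious wakeups and
     * wakeups meant for other waiters on the same condition variable. */
    static void WaitUntilShared(PRLock *gcLock, PRCondVar *scopeSharingDone,
                                const int *scopeIsShared)
    {
        PR_Lock(gcLock);
        while (!*scopeIsShared) {
            /* PR_WaitCondVar releases gcLock while blocked and reacquires it
             * before returning, so the predicate test is always under the lock. */
            PR_WaitCondVar(scopeSharingDone, PR_INTERVAL_NO_TIMEOUT);
        }
        PR_Unlock(gcLock);
    }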
Seriously looking for r= and sr= now. I'll provide a bunch of notes compiled from the diff -wu attachment soon. /be
Cc'ing jdunn. Jim, I'm hoping for help testing the next-to-last patch (diff -u format, "looks like we have a winner") on the various platforms you care about. Just cd js/src and patch < winner. Thanks, let me know how I can help. /be
- Radical object (scope) locking optimization: don't lock if a scope is accessed on the context that exclusively owns it (initially, the context on which the scope was created). Once a scope becomes shared among more than one owner-context, give it the usual thin or fat lock, per existing jslock.c code. I did this at the memory cost of another word per JSScope, ownercx, which raised scope size from 12 to 13 words if !DEBUG. I also added a linked list head pointer, rt->scopeSharingTodo, and a scopeSharingDone condition variable to JSRuntime, and a scopeToShare pointer to JSContext that's necessary for deadlock avoidance. The rt->scopeSharingTodo list links JSScopes through the scope->u.link union arm, which overlays the pre-existing scope->count (now u.count) member. This list holds scopes still exclusively owned by a context, but wanted by js_LockScope calls active on other threads. Those calls wait on the rt->scopeSharingDone condition, which is notified every time an owner-context ends the request running on it, in which it may use the scope freely (because it still has exclusive ownership). To avoid AB-BA deadlocks, if a js_LockScope attempt on one context finds that the owner-context of the scope is already waiting on a scope owned by the current context (or indirectly depending on such a scope lock), the attempt converts the scope from lock-free exclusive ownership to shared ownership (thin or fat lock).

- Fix js_SetupLocks and the js_LockGlobal/js_UnlockGlobal code to avoid divmod instruction costs, strength-reducing to bit-mask instructions (a generic sketch of this transformation follows these notes).

- The radical lock-free scope change required care in handling the 0=>1 and 1=>0 transitions of cx->requestDepth, which was till now thread-local because part of the JSContext not manipulated by other threads. It's still updated only by cx's thread, but it is read by other threads in the course of attempting to claim exclusive ownership of a scope for more lock-free JS object operations.

- Fixed various nits in jslock.[ch], including using Init/Finish rather than New/Destroy for the methods that take a JSThinLock and initialize and finish/free its members. Another example: JS_ATOMIC_ADDREF is now JS_ATOMIC_INCREMENT and JS_ATOMIC_DECREMENT, so the two cases can be mapped to PR_AtomicIncrement and PR_AtomicDecrement if appropriate.

- Cleaned up gratuitous casts in jscntxt.c by using &cx->links, etc.

- The lock used for mutual exclusion around both request begin and end vs. GC synchronization is rt->gcLock, and this lock now also protects all scope->ownercx pointer changes from non-null (exclusive) to null, the rt->scopeSharingTodo/scope->u.link list operations, and of course the rt->scopeSharingDone condition. But this means that js_GC cannot hold rt->gcLock across the bulk of its body, in particular the mark phase, during which JS_GetPrivate calls, e.g., may need to "promote" scope locks from lock-free to thin or fat, because doing so would double-trip. There never was any good reason to hold rt->gcLock so long, of course -- locks are for mutual exclusion, not for waiting or notifying a thread -- those operations require a condition, rt->gcDone, which we already use along with rt->gcLevel to keep racing GC attempts at bay. So now that rt->gcLock does not protect the mark phase, the enumeration of rt->gcRootsHash can race badly with JS_RemoveRootRT, an API that may legitimately be called outside of a request, without even a context.
It turns out that people may be cheating on the request model even with JS_AddRoot, JS_AddNamedRoot, and JS_RemoveRoot calls, so we must make all of those interlock with the GC using gcLevel and gcDone, unless they are called on the gcThread.

- I added comments to jslock.h making it plain that callers of JS_LOCK_OBJ and JS_UNLOCK_OBJ must either be implementations of js_ObjectOps hooks, or code reachable only from those hooks; or else must be predicated on OBJ_IS_NATIVE tests. It turns out jsinterp.c's CACHED_GET and CACHED_SET macros neglected to do such tests, limiting the ability of JS embeddings to implement JSObjectOps with their own non-JSScope JSObjectMap subclass. Fixed, small performance hit that the lock-free optimization should more than make up for.

- jslock.c now gives a #error if you try to compile it on a platform that lacks a compare-and-swap instruction. The #error says to use NSPR locks. Before this change, some platforms would emulate compare-and-swap using a global PRLock, which is always worse in runtime than using per-scope PRLocks.
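The js_LockGlobal note above mentions strength-reducing a divmod to a bit mask. The generic transformation, for a table size that is a power of two (a sketch, not the patched code; names are invented):

    /* Map a key to one of nbuckets global locks; nbuckets must be a power of two. */
    static unsigned BucketIndex(unsigned long key, unsigned nbuckets)
    {
        /* key % nbuckets needs a divide instruction; when nbuckets is a power
         * of two, key & (nbuckets - 1) yields the same result with a single AND. */
        return (unsigned)(key & (nbuckets - 1));
    }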
Ok, now that everyone's fat and happy on turkey, how about an r= and sr=? /be
Tried these changes on both hpux & aix, debug and optimized, didn't see any problems. I don't think I am qualified to give you a r= brendan... My only question is the change in jscntxt.c:

  @@ -249,7 +261,7 @@
       JS_LOCK_RUNTIME(rt);
       if (!cx)
           cx = (JSContext *)rt->contextList.next;
  -    if ((void *)cx == &rt->contextList)
  +    if (&cx->links == &rt->contextList)
           cx = NULL;

Is there ANY chance cx->links could be NULL? thus having &cx->links go batty? (I know some os's don't like you taking a &(NULL).)
Jim, thanks for the testing, and hpux reminds me: do you have an H-P Precision Architecture manual, or otherwise know of a compare-and-swap atomic instruction we could use there? The AIX code in jslock.c is nice because AIX provides a <sys/atomic_ops.h> header and a _check_lock function/inline/compiler-primitive or whatever it is. The links member of JSContext is the first member, and must be for the JSCList usage in jscntxt.c to work, so &cx->links is the same address as (void *)cx, but of the right type to compare to &rt->contextList (namely, JSCList *). Ye olde "poor man's subclassing from a linked-list element struct". /be
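A sketch of the layout trick being described, with invented struct names (the real JSCList/JSContext definitions have more members): because the list links sit at offset zero, a pointer to the links and a pointer to the containing context are the same address, so comparing against the runtime's embedded list head is a well-typed end-of-list test.

    typedef struct ListLinksSketch {
        struct ListLinksSketch *next;
        struct ListLinksSketch *prev;
    } ListLinksSketch;

    typedef struct ContextSketch {
        ListLinksSketch links;   /* must be the first member: the node overlays the struct */
        /* ...the rest of the context... */
    } ContextSketch;

    /* Iterate contexts on a circular list whose sentinel head lives in the runtime. */
    static ContextSketch *NextContext(ListLinksSketch *head, ContextSketch *cx)
    {
        ListLinksSketch *next = cx ? cx->links.next : head->next;
        return (next == head) ? NULL : (ContextSketch *)next;
    }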
r=jband (but this *really* strains my brain) One thing... I compiled with DEBUG_SCOPE_COUNT defined to try out the logging. setlinebuf is not to be found in NT. I suggest you replace it with: setvbuf(logfp, NULL, _IONBF, 0); /* same as... setlinebuf(logfp) */ The man claims that these are equivalent. http://www.datafocus.com/docs/man3/setlinebuf.3.asp
jband: I changed setlinebuf to setvbuf, thanks. Thanks too for your great testing help, which really strained early, buggy versions of this patch's tiny brain. If I get an r=, will your r= be an sr=? I'm still looking for an r= -- bueller? /be
jband's cautionary "strains my brain" caused me to navel-gaze over this patch some more, and I realized something obvious in hindsight: the PR_WaitCondVar in jslock.c on rt->scopeSharingDone must end any requests nested on the calling cx, per the usual "requests can't call blocking native methods" code. I therefore inlined (because rt->gcLock is held already) and specialized code from JS_EndRequest before the wait and JS_BeginRequest after. This caused me to re-review the GC, and I realized that ever since bug 49816 was fixed, there was no need for a finalize phase distinct from the sweep phase. That used to be necessary when finalizers could allocate new GC-things; the finalize phase would run outside of rt->gcLock. But 49816 prevents allocation from finalizers, so there is no need for gc_finalize_phase() or rt->gcFinalVec. I'm sorry to spark re-review, but it's a smallish increment (if you care to diff the diffs). Attached next is an updated version of my checkin comments. /be
I'm not going for the "most patches in a bug" record, really I'm not! Unless the changes in the last few patches, which I think are necessary (suspending and then resuming the request around the PR_WaitCondVar in ClaimScope) and/or desirable (reuniting the sweep and finalize phases of js_GC), have regressed something subtly, I hope to get r= and sr= so this big change can land. There may be problems later, to be reported by separate bugs. If this most recent patch looks good and tests well, I'd like to get it in now. I'm sorry for morphing kandrot's "3x slower" report -- I'll rerun all the tests reported here and file new bugs, with an appropriate meta-bug depending on them. I hope to collect more and better benchmarks still. /be
Upping to P1 to match my priorities. /be
Priority: P3 → P1
Target Milestone: mozilla0.9 → mozilla0.8
The last substantive change, which I thumbnailed as "suspending and then resuming the request around the PR_WaitCondVar in ClaimScope", left ClaimScope's earlier code, which attempts to claim exclusive ownership of the scope, failing to test scope->u.link before claiming. The magic linked list terminator we use for rt->scopeSharingTodo, threaded through scope->u.link, ensures that we can test whether that pointer is non-null to decide whether the scope is already in the list, and bound to be promoted to shared ownership. We need to test scope->u.link and ensure it's null before we can try to claim exclusive ownership, or we'll grab a scope that's headed for shared ownership and race badly when the request whose context still owns it wakes up from the PR_WaitCondVar in ClaimScope and resumes its request, thinking it still has exclusive access. /be
FWIW... the test I just attached is the one I've been running with a lot. I just added the timing part... 2.83 seconds (on average) without brendan's changes 2.45 seconds (on average) with brendan's changes
Hey Brendan, You'll need another patch... It won't compile ifndef JS_THREADSAFE. You use rt->gcThread 'unprotected' in some places in jsgc.c. Might be some other issues too.
On my older timing test (timing.js) I now see results in xpcshell with JS_THREADSAFE #defined and the new code in place giving results identical to what I see in jsshell non-threadsafe...

  > global 0.03 seconds.
  > object 0.04 seconds.
  > local 0.01 seconds.

This is all more-or-less +/- 0.01. This is like a 3x speedup on this property access intensive code when only one thread is doing anything yet threadsafety is in place. Wow!
Wow, jband's stress test (and his machines, my Linux box won't do it) finds all sorts of rare bugs. js_ContextIterator didn't deal at all well with the next context in the list being destroyed. This is easily fixed by storing the current context in *iterp (it was never ok to destroy the current context in the list). /be
Some r= thoughts from my very cursory look. I understand the basics, but none of the threadsafety risks...

What's js_LockScope1 for? It's in a common call chain; is the module-static call optimized away?

Can WillDeadLock ever fall into a loop not containing the checked-for context? WillDeadLock would have detected that loop when it was created, yes?

As you mention, walking the context list implies O(n**2) runtime behavior, but n is small and walking happens rarely. If it's worth fixing, seems like a context hash would fix this. (From your testing, doesn't look like it's important.)

Could the context still be dead after jslock.c:350? It seems like the code assumes it's live.

jslock.c:ClaimScope -
    /*
     * Inline JS_SuspendRequest before we wait on rt->scopeSharingDone,
     * saving and clearing cx->requestDepth so we don't deadlock if the
     * GC needs to run on ownercx.
     */
Why? We're in JS_LOCK_GC... Or is this accomplishing something different? I don't think I understand the GC interleaving. JS_AWAIT_GC is just below, I must need tutoring.

Is the JS_RemoveRootRT-motivated locking well tested?

Have the jband-tests been run on a non-thinlock platform or when avoiding thinlocks by changing #defines? (can it matter?) Looks like there's a few longish codepaths #ifdef'd on this.
>What's js_LockScope1 for? It's in a common call chain; is the module-static
>call optimized away?

It's from bjorn's code, and it is a common subroutine of js_PromoteScopeLock, js_LockScope, and js_LockObj. It avoids reloading cx->thread each time through the loop in js_LockObj, but that's a hard case -- maybe we could just unify js_LockScope1 and js_LockScope.

>Can WillDeadLock ever fall into a loop not containing the checked-for context?

No, because it must reach a null scopeToShare or there is already a cycle, which means there is already a deadlock, which contradicts the hypothesis.

>WillDeadLock would have detected that loop when it was created, yes?

Right.

>Could the context still be dead after jslock.c:350? It seems like the code
>assumes it's live.

It's safe to assume ownercx is alive there because either scope->u.link is non-null, which implies ownercx has not ended its request and no other context has shared scope to avoid deadlock; or we called js_LiveContext and it returned true (and ownercx was in a request and not on the same thread as cx -- if either of those conditions were false, we'd claim scope and return early at 349).

>jslock.c:ClaimScope -
>    /*
>     * Inline JS_SuspendRequest before we wait on rt->scopeSharingDone,
>     * saving and clearing cx->requestDepth so we don't deadlock if the
>     * GC needs to run on ownercx.
>     */
>Why? We're in JS_LOCK_GC...

Recall that the condition variable releases its lock while it waits.

>Or is this accomplishing something different? I don't think I understand the GC
>interleaving. JS_AWAIT_GC is just below, I must need tutoring.

The GC would deadlock waiting for our request to end while we waited for it to run, if it had to run in order for the scope to become shared (say, as a last-ditch effort when low on memory). Therefore we must temporarily reduce rt->requestCount by one (that counter counts contexts in requests, not total requests; more than one may nest on a cx). We also saveDepth and clear cx->requestDepth just in case another request wants to claim a scope that *we* own, above at 341-349.

>Is the JS_RemoveRootRT-motivated locking well tested?

Yes, jband's testsuite smoked it out and it's provably correct.

>Have the jband-tests been run on a non-thinlock platform or when avoiding
>thinlocks by changing #defines? (can it matter?) Looks like there's a few
>longish codepaths #ifdef'd on this.

Neither of us have worried about that, because the ownercx lock-free path is independent of thin vs. fat. Or so it seems to me!

Thanks -- ask me more questions if you have any.
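The WillDeadLock answer above can be pictured as a walk along "waits-for" edges: context -> scope it is blocked on -> that scope's owning context, and so on. A self-contained sketch with invented struct names (only the ownercx and scopeToShare fields are taken from this bug's comments; the body is an illustration, not the reviewed patch):

    struct ScopeSketch;

    typedef struct CtxSketch {
        struct ScopeSketch *scopeToShare;  /* scope this context is blocked on, or NULL */
    } CtxSketch;

    typedef struct ScopeSketch {
        CtxSketch *ownercx;                /* context that exclusively owns this scope */
    } ScopeSketch;

    /* Would making cx wait for scope close a cycle (and so deadlock)? */
    static int WouldDeadlock(ScopeSketch *scope, CtxSketch *cx)
    {
        CtxSketch *ownercx = scope->ownercx;

        while (ownercx && ownercx != cx) {
            scope = ownercx->scopeToShare;
            if (!scope)
                return 0;                  /* chain ends without reaching cx: safe to wait */
            ownercx = scope->ownercx;
        }
        return ownercx == cx;              /* a cycle through cx means waiting would deadlock */
    }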
Just one more... Have you run in a small-heap runtime?
Running the stress test with the latest patch on the Mac apparently deadlocks before returning from the first call to runThreads(). The value of foo always gets printed as 0. The Mac's threads are software only (setjmp/longjmp variant), so there's no real preemption going on. I'll need to rebuild NSPR with some instrumentation turned on to see where the deadlocks are.
Patrick, Do you have a fully updated build? Changes to xpconnect and xpcom/threads went in a couple of days ago (though it would probably not work at all without this). Also, brendan has a change to jsgc.c that has not shown up here. And, see my patch in bug 61788. But really, I would not expect any of these to cause deadlock.
The testsuite runs into its runtime's gc-limit and survives. I can run mozilla with a tuned down gcSize of 256K, slowly -- until I try to start mail too, and then the GC heap is just too small -- endless last-ditch calls to js_GC from js_AllocGCThing, without finding enough garbage to get out of the ditch. /be
Evidently none of the calls to PR_WaitCondVar() in ClaimScope() (line 405 of jslock.c) are ever returning, leading to eventual deadlock when all the joins are attempted. I'm going to see if this has any connection to the GC leak detector by trying in an optimized build. jband: my build is from a pull made sometime after midnight Thursday morning.
Patrick, if it's a deadlock, can you cite the stacks of the two or more threads who are depending on one another's resources? /be
Patrick: sorry, mid-air collision and I didn't read your last comment fully. If Mac threads get stuck waiting in PR_WaitCondVar in ClaimScope, then can you look at the scope they're waiting for? Has it a null ownercx, or is its ownercx still pointing at a context? If the latter, is that ownercx in a request still (requestDepth != 0)? What is the context-thread's stack? /be
scope->ownercx isn't NULL, and if I set a break point on ShareScope, it never gets hit.
Also put a breakpoint in the inlined, optimized version of ShareScope in JS_EndRequest, and see if that doesn't get hit. If it doesn't, or in any case, how about tracking scope->ownercx->thread and backtracing that thread? Thanks /be (on IRC #mozilla too)
Nope, the inlined version never gets hit either (I set it on line 807 of jsapi.c). I can't get you any more information about which threads own what, just yet, because NSPR on the Mac isn't very cooperative about giving that information. I custom hacked a version a while back that lets another process do a thread dump, but I haven't got that build here. I'll have to integrate it and get back to you.
beard: Sorry, I don't know much about how nspr does this and these may be stupid questions... . Are you certain the other pseudo-threads run at all? I note there is a bug on the fact that xpcshell does not set up an event loop (bug 56389). Is this not required for nspr to run the other pseudo-threads? Or is the call to PR_Join or the other PR_CreateThread calls supposed to yield on their own? Does this work without brendan's changes? Also (sorry if this is way obvious), you can certainly set the number of threads low in threads.js to make it easier to see what's going on once you can see thread stacks.
Threads are running, I see the MemoryFlusher thread and the idle thread running just fine. I have reduced the test case to only run with 10 threads, and that's where I'm seeing the deadlock.
Patrick: with 10 or fewer threads, can't you just switch to each of them in the debugger and backtrace? Or is the problem getting CW to switch to another user-level thread? I can make gdb do that on MIPS, by setting $pc and $sp, for what it's worth. /be
So every thread of interest is waiting under ClaimScope except for thread 00000[16afa938] pri= 1 flags=0x68 condvar=15e2f078 sleep=6217456ms, which is calling PR_JoinThread via nsIThread, via XPConnect? If so, can you verify that the scope(s) being waited on by all the ClaimScope threads have as their ownercx the cx implicated in the PR_JoinThread stack trace? If that's true, then either you don't have jband's patch to suspend and resume a request around the XPTC_InvokeByIndex, or that patch isn't enough. Can you check that you have rev 1.51 of xpcwrappednativeclass.cpp in your tree, and that the object file is up to date? The bug jband fixed was bug 60303, if you need to apply a patch. /be
I definitely have rev 1.51 of mozilla/js/src/xpconnect/src/xpcwrappednativeclass.cpp, and am running with it. When nsXPCWrappedNativeClass::CallWrappedMethod calls JS_SuspendRequest(), cx->requestDepth == 2, so JS_NOTIFY_ALL_CONDVAR(rt->scopeSharingDone) never gets run. I'm loading the script from the already running XPCShell via load('stresstest.js'), if that makes a difference.
And, during the call to PR_JoinThread(), the owning context does turn out to be the main thread's context, and since JS_SuspendRequest didn't perform the appropriate interlock, it's not surprising that the deadlock happens.
Props to beard for testing interactively. I fixed JS_SuspendRequest to suspend any depth request so that the GC can run and scopes can be claimed or shared. This is an old hole in the under-tested suspend/resume API. I also unified js_LockScope1 and js_LockScope based on mccabe's feedback; the only loss is a reload of cx->thread when js_LockObj retries after losing a race with a mutator of an obj that was sharing its prototype's scope. That's rare enough to be a don't-care. Bart: Are we there yet? Homer: Just a little further... Bart: Are we there yet? Homer: Just a little further... Bart: Are we there yet? Homer: Just a little further... /be
[mid-air w/ brendan (three times!)] Patrick. Yes it does make a difference that you are using 'load'. I can reproduce this on NT that way too. This is just my stupid mistake in wrapping the code inside Load in a request. Since it can only be called from the engine, this is just not necessary and should be removed:

Index: xpcshell.cpp
===================================================================
RCS file: /cvsroot/mozilla/js/src/xpconnect/shell/xpcshell.cpp,v
retrieving revision 1.46
diff -u -r1.46 xpcshell.cpp
--- xpcshell.cpp	2000/11/30 06:58:33	1.46
+++ xpcshell.cpp	2000/12/04 00:27:56
@@ -236,7 +236,6 @@
             return JS_FALSE;
         argv[i] = STRING_TO_JSVAL(str);
         filename = JS_GetStringBytes(str);
-        DoBeginRequest(cx);
         script = JS_CompileFile(cx, obj, filename);
         if (!script)
             ok = JS_FALSE;
@@ -244,7 +243,6 @@
             ok = JS_ExecuteScript(cx, obj, script, &result);
             JS_DestroyScript(cx, script);
         }
-        DoEndRequest(cx);
         if (!ok)
             return JS_FALSE;
     }

This does, however, make me concerned about my uses of requests in xpconnect/src. Should I make special provisions to ensure that they never nest? Should the JS Engine supply a public function to get the request depth?
OK, so with brendan's change I don't have to worry about nesting too deep. But the xpcshell change ought to be made anyway I think.
jband: read brendan's commentary above re: JS_SuspendRequest nesting vs. scope sharing. So, you shouldn't have to fix xpcshell.cpp. r=beard
Brendan, what if suspend/resume nests? Why not make the caller to this pair hold the depth on their stack?
I agree with jband that xpcshell need not wrap a request around compile/execute from Load, given the fact that Load can be ultimately (directly or indirectly via scripted functions) called only from a top-level script that's already inside a request that the shell wraps around top-level script execution. xpcshell can be fixed with self-review as far as I am concerned, as it's not part of the Mozilla-the-giant-lizard build and runtime. Jband, how about an sr=? /be
In case it was not clear. I mean:

Begin
Begin
Suspend
Begin
Suspend
Resume
End
Resume
End
End

You should either hold a depth stack internally or make it an out/in param pair to Suspend/Resume.
jband: I was hoping to maintain API source and binary compatibility, even though suspend and resume are obscure. That implies a stack; argh. Patch coming up, hope this one is my final answer -- but thanks for the lifeline! /be
The JS_{Suspend,Resume}Request API incompatibility is regrettable, but unlikely to cause much stress from all I know about the multi-threaded JS embeddings out there. We'll find out for sure, because I see no other way (an internal stack of saved requestDepths would need to allocate stack space somehow, which implies fallibility, which means an API break). Ok, I'm about to declare victory and report any future problems in future bugs. Who is with me? I think I have two r='s on recent versions of the patch that have been successively refined. I don't mind re-review, but don't require it as a rule for any Mozilla review-seeker. I do, however, still crave a jband sr=. /be
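The calling pattern implied by this change: the embedder keeps the saved request depth on its own stack across a blocking call and hands it back on resume. In this sketch the jsrefcount return/parameter types and the join-a-worker example are assumptions; only the save/restore shape is the point.

    #include "jsapi.h"
    #include "prthread.h"

    static void JoinWorkerOutsideRequest(JSContext *cx, PRThread *worker)
    {
        jsrefcount saveDepth = JS_SuspendRequest(cx);  /* suspend however deeply requests nest */

        PR_JoinThread(worker);                         /* blocking call: GC and scope sharing can proceed */

        JS_ResumeRequest(cx, saveDepth);               /* restore the saved nesting */
    }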
The comment in xpconnect and the changed use of suspend/resume claim it is ok to do so even when not in a request. Yet, in suspend you assert that we are in a request.
sr=jband
Checked in (toasted speedracer, the ultrasparc/gcc tinderbox machine -- turns out we never used compare-and-swap on that target architecture/compiler combo, falling back on the global-lock-based c&s emulation!). Many thanks to all who helped, especially to jband for his great stress-test setup and particularly helpful (in terms of reproducing bugs in draft patches) MP and laptop machines. I've filed bug 61897 on the suboptimal allocation behavior (;-) for one-char strings indexed from longer strings, and bug 61898 for the lack of int-domain optimized arithmetic where possible. This bug is dead, sorry again for morphing it but let's let it RIP. Any new probs, open a new bug. /be
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED