Closed Bug 32525 Opened 26 years ago Closed 26 years ago

JS multithreaded engine last-destroy-context bug

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: brendan, Assigned: brendan)

Details

(Keywords: js1.5)

Attachments

(2 files)

Thread 1 and 2 are racing through their final requests, calling JS_DestroyContext. Say thread 1 is first in the gate (so it is not last according to the code in js_DestroyContext, jscntxt.c around line 143). Thread 1 runs the GC, but it waits (jsgc.c line 765, JS_AWAIT_REQUEST_DONE(rt) in a while loop) for all other requests to end, suspend, or yield. Now thread 2 is scheduled and it runs through js_DestroyContext, where it finds that it is in fact last context through the gate. But thread 2 doesn't know that thread 1 is "stuck in the gate" waiting in the GC for thread 2 to end its current request. Thread 2 therefore blithely unpins atoms, tries to GC (which notices another thread is in the GC and just bumps rt->gcLevel, line 745 of jsgc.c, and returns early), and then frees atom state and ends its own request just before it frees its context. This leaves thread 1 doomed: it awakens because thread 2 ended its request, and falls over null or cleared atom state variables. Workarounds including using JS_DestroyContextNoGC, which was apparently added to hack around the underlying bug (and another deadlock bug, fixed by jband just recently). Patch coming up. /be
I'm curious to see what your patch will look like. Sorry I didn't jump in on the email thread about this. Is it really important to do the stuff in "if(last){...}"? For posterity sake I wanted to point out that this is not just an engine shutdown bug. It can happan any time the engine is transitioning through this state while destroying JSContexts. It may well be going through this state frequently in some embeddings before being 'awaken' as new 'requests' arrive.
Jband: good point, in fact chouck@geocast.com ran into this all the time before he added a "root context" to keep the runtime's context population above one until he was really ready to shut down. Patch coming up, it's hairy on account of the damn gcmode (JS_DestroyContext*GC APIs) hacking. /be
Summary: JS multithreaded engine shutdown bug → JS multithreaded engine last-destroy-context bug
Attached patch proposed fixSplinter Review
More commentary: Jband's fix in http://bugzilla.mozilla.org/show_bug.cgi?id=28570 is necessary still, due to the way the GC (maybe) gets (forced to) run within the request that called js_DestroyContext. It would be simpler to end that request and then do a GC, but I'm not sure that's safe and I wanted to get this patch attached. Could someone try moving the while (cx->requestDepth) JS_EndRequest(cx) loop up above all js_*GC calls in js_DestroyContext? The last-destroy-context case may indeed arise in embeddings without a root or "keepalive" context, as jband pointed out. I wanted to add that rt->stateChange and rt->state help to keep the first-new-context-after-last-destroy-context from racing with js_FreeAtomState done under last-destroy-context. We have enough synchronization, properly located in jscntxt.c, to handle that case. What we have lacked till this patch is anything to keep the next-to-last context from blocking in the GC (getting "stuck in the gate") while the last context frees atom state ("slams the gate on the next-to-last context's foot!"). I still want blood from whoever hacked JS_DestroyContextNoGC into this picture, rather than fixing the underlying deadlocks and synchronization bugs. /be
Keywords: js1.5
Anyone? Bueller? /be
Patrick, Would you add this to your threading work? Thanks, = C =
Assignee: clayton → beard
All the mess of the JS_DestroyContextNoGC hackaround (which we must maintain for API compatibility, grr) should be cleaned up now. The last js_DestroyContext always calls js_ForceGC, which (whether the underlying js_GC call gets the GC lock and finds rt->gcLevel 0, meaning it runs the GC; or it finds another thread already GC'ing, in which case that other GC must be running finalizers, and this one will wait for it to finish) does not return until all other threads are done with rt->atomState. Jband added cx->destroying, and I buddied, but we were seeing a deadlock that wasn't there: any request calling the GC while another thread is in the GC waiting for the first request to end will notify the waiter, after temporarily decrementing rt->requestCount. And (what is easy to forget) wait on a condvar unlocks the condition's lock. Chouck reported a deadlock last year between js_GC and JS_ResumeRequest, but it is unrelated and not yet fixed (right chouck?). It needs a separate bug, which I'll file in a moment. Comments welcome. /be
r=jband I read this closely and understand it better than ever (which isn't saying a lot). I can't see anything wrong with your fix. I looked for holes and did not find them. I really wish we had some small multi-threaded test apps to stress this.
I'm back! /be
Assignee: beard → brendan
Status: NEW → ASSIGNED
Target Milestone: --- → M16
Fix checked in. /be
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: