Closed
Bug 32525
Opened 26 years ago
Closed 26 years ago
JS multithreaded engine last-destroy-context bug
Categories
(Core :: JavaScript Engine, defect, P3)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
M16
People
(Reporter: brendan, Assigned: brendan)
Details
(Keywords: js1.5)
Attachments
(2 files)
|
4.53 KB,
patch
|
Details | Diff | Splinter Review | |
|
4.79 KB,
patch
|
Details | Diff | Splinter Review |
Thread 1 and 2 are racing through their final requests, calling
JS_DestroyContext. Say thread 1 is first in the gate (so it is not last
according to the code in js_DestroyContext, jscntxt.c around line 143). Thread
1 runs the GC, but it waits (jsgc.c line 765, JS_AWAIT_REQUEST_DONE(rt) in a
while loop) for all other requests to end, suspend, or yield. Now thread 2 is
scheduled and it runs through js_DestroyContext, where it finds that it is in
fact last context through the gate.
But thread 2 doesn't know that thread 1 is "stuck in the gate" waiting in the GC
for thread 2 to end its current request. Thread 2 therefore blithely unpins
atoms, tries to GC (which notices another thread is in the GC and just bumps
rt->gcLevel, line 745 of jsgc.c, and returns early), and then frees atom state
and ends its own request just before it frees its context. This leaves thread 1
doomed: it awakens because thread 2 ended its request, and falls over null or
cleared atom state variables.
Workarounds including using JS_DestroyContextNoGC, which was apparently added to
hack around the underlying bug (and another deadlock bug, fixed by jband just
recently).
Patch coming up.
/be
Comment 1•26 years ago
|
||
I'm curious to see what your patch will look like. Sorry I didn't jump in on the
email thread about this. Is it really important to do the stuff in
"if(last){...}"?
For posterity sake I wanted to point out that this is not just an engine
shutdown bug. It can happan any time the engine is transitioning through this
state while destroying JSContexts. It may well be going through this state
frequently in some embeddings before being 'awaken' as new 'requests' arrive.
| Assignee | ||
Comment 2•26 years ago
|
||
Jband: good point, in fact chouck@geocast.com ran into this all the time before
he added a "root context" to keep the runtime's context population above one
until he was really ready to shut down. Patch coming up, it's hairy on account
of the damn gcmode (JS_DestroyContext*GC APIs) hacking.
/be
Summary: JS multithreaded engine shutdown bug → JS multithreaded engine last-destroy-context bug
| Assignee | ||
Comment 3•26 years ago
|
||
| Assignee | ||
Comment 4•26 years ago
|
||
More commentary:
Jband's fix in http://bugzilla.mozilla.org/show_bug.cgi?id=28570 is necessary
still, due to the way the GC (maybe) gets (forced to) run within the request
that called js_DestroyContext. It would be simpler to end that request and then
do a GC, but I'm not sure that's safe and I wanted to get this patch attached.
Could someone try moving the while (cx->requestDepth) JS_EndRequest(cx) loop up
above all js_*GC calls in js_DestroyContext?
The last-destroy-context case may indeed arise in embeddings without a root or
"keepalive" context, as jband pointed out. I wanted to add that rt->stateChange
and rt->state help to keep the first-new-context-after-last-destroy-context from
racing with js_FreeAtomState done under last-destroy-context. We have enough
synchronization, properly located in jscntxt.c, to handle that case. What we
have lacked till this patch is anything to keep the next-to-last context from
blocking in the GC (getting "stuck in the gate") while the last context frees
atom state ("slams the gate on the next-to-last context's foot!").
I still want blood from whoever hacked JS_DestroyContextNoGC into this picture,
rather than fixing the underlying deadlocks and synchronization bugs.
/be
| Assignee | ||
Comment 5•26 years ago
|
||
Anyone? Bueller?
/be
Patrick,
Would you add this to your threading work?
Thanks,
= C =
Assignee: clayton → beard
| Assignee | ||
Comment 7•26 years ago
|
||
| Assignee | ||
Comment 8•26 years ago
|
||
All the mess of the JS_DestroyContextNoGC hackaround (which we must maintain for
API compatibility, grr) should be cleaned up now. The last js_DestroyContext
always calls js_ForceGC, which (whether the underlying js_GC call gets the GC
lock and finds rt->gcLevel 0, meaning it runs the GC; or it finds another thread
already GC'ing, in which case that other GC must be running finalizers, and this
one will wait for it to finish) does not return until all other threads are done
with rt->atomState.
Jband added cx->destroying, and I buddied, but we were seeing a deadlock that
wasn't there: any request calling the GC while another thread is in the GC
waiting for the first request to end will notify the waiter, after temporarily
decrementing rt->requestCount. And (what is easy to forget) wait on a condvar
unlocks the condition's lock.
Chouck reported a deadlock last year between js_GC and JS_ResumeRequest, but it
is unrelated and not yet fixed (right chouck?). It needs a separate bug, which
I'll file in a moment.
Comments welcome.
/be
Comment 9•26 years ago
|
||
r=jband
I read this closely and understand it better than ever (which isn't saying a
lot). I can't see anything wrong with your fix. I looked for holes and did not
find them. I really wish we had some small multi-threaded test apps to stress
this.
| Assignee | ||
Updated•26 years ago
|
Status: NEW → ASSIGNED
Target Milestone: --- → M16
| Assignee | ||
Comment 11•26 years ago
|
||
Fix checked in.
/be
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•