Closed
Bug 133773
Opened 23 years ago
Closed 23 years ago
malign race to use and destroy runtime state in js_DestroyContext
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: brendan, Assigned: brendan)
Details
(Keywords: crash, js1.5)
Attachments
(4 files, 6 obsolete files)
|
25.56 KB,
patch
|
shaver
:
review+
jband_mozilla
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
|
1.07 KB,
patch
|
shaver
:
review+
jband_mozilla
:
superreview+
jesup
:
approval+
|
Details | Diff | Splinter Review |
|
47.81 KB,
application/pdf
|
hc2428
:
review+
|
Details |
|
22.18 KB,
application/octet-stream
|
hc2428
:
review+
|
Details |
See the thread running through this message:
news://news.mozilla.org:119/a7o6cb$k4t2@ripley.netscape.com
The js_NewContext/js_DestroyContext synchronization mechanism (the JSRTS_DOWN,
LAUNCHING, UP, and LANDING state variable, in conjunction with the stateChange
condition) works fine except for one bad bug: js_DestroyContext removes cx from
rt->contextList first, before transitioning from UP to LANDING if cx is "last".
This means a next-to-last thread may still be racing through js_DestroyContext,
having already removed its cx from rt->contextList (necessary for the first
thread to be "last"), but still using rt->atomState, etc.
The race after that is wide open, and Michel Drapeau indeed found one thread
(the last one) freeing atom state while the other was trying to mark it from its
default-forced GC called near the bottom of js_DestroyContext in the not-last case.
The workaround Michel found, calling JS_DestroyContextNoGC (with a JS_GC call
before it, but that's immaterial to the workaround, even if a good idea) just
dodges the symptom-bullet by not forcing a GC from the next-to-last call to
js_DestroyContext.
/be
| Assignee | ||
Comment 1•23 years ago
|
||
This needs to be fixed ASAP. Patch coming up.
/be
| Assignee | ||
Comment 2•23 years ago
|
||
Please test, r=, and sr= ASAP. It tests well for me, notably shutdown leak
testing -- but more testing is welcome.
/be
Comment 3•23 years ago
|
||
This patch passes the JS testsuite in the debug/optimized JS shell on WinNT.
Updated•23 years ago
|
Keywords: mozilla1.0 → mozilla1.0+
| Assignee | ||
Comment 4•23 years ago
|
||
Because we must make the next-to-last context empty rt->contextList only after
it has (possibly, maybe) GC'd, and because we must have one code path through
the top of js_DestroyContext before we can divince last from next-to-last from
other cases, *all* cases must defer removing cx from the list. This may sound
obvious now, it was missed for quite a while. An alternative is discussed with
reasons for its rejection in the jscntxt.h comment.
/be
Attachment #76429 -
Attachment is obsolete: true
| Assignee | ||
Comment 5•23 years ago
|
||
The tweak is important to API users who call JS_BeginRequest and keep that
request open till JS_DestroyContext implicitly ends the request. Without this
change from the last patch (moving cx->destroyingSelf = JS_TRUE; above the
while (cx->requestDepth != 0) JS_EndRequest(cx); loop), such API users could
not ensure that a racing GC would never mark cx's roots; this might result in
too much garbage, or worse: shutdown leaks.
/be
Attachment #76450 -
Attachment is obsolete: true
| Assignee | ||
Comment 6•23 years ago
|
||
+ * in that case, if the GC should see cx before out store to this boolean
s/out store/our store/
I didn't want to attach another patch just for this comment typo fix.
Ok, *now* I'm ready for r= and sr=!
/be
| Assignee | ||
Comment 7•23 years ago
|
||
jband wisely waited to review, and guilted me into iterating in public yet
again (I don't mind, I do it as a public service :-). Apart from the comment
changes, which I hope are worth their weight (I forgot to update comments shown
in the last cvs diff -u20 -- too many comments and some will rust and lie, but
I think I got all relevant ones here), the main change is to make
JS_ContextIterator safe. It calls js_ContextIterator, as does the GC. All
context iterations should skip contexts being destroyed.
This patch therefore doesn't touch jsgc.c, it modifies only jscntxt.[ch]. The
cx->destroyingSelf = JS_TRUE; assignment now happens under the protection of
the rt lock, which also protects context iterator steps.
/be
Attachment #76648 -
Attachment is obsolete: true
Comment 8•23 years ago
|
||
brendan: My head is starting to throb. It seems to me that you now risk a
failure to detect 'last'. If the last two context destructions race, then the
last one into the lock may not see itself as the only member of the context list
at the point where 'last' is detected because the first one is still in the
list. I wonder how bad that really is.
I also wonder if this is all worth it. I sugested this once before... Would it
really be so bad to not do the stuff we do for the 'last' context destruction
when we do it. What if we waited until runtime destuction? It would be ugly to
create a new context at that point just for the purpose of being able to cleanup
the last stuff in the runtime. But, it would be safe and sane. Thoughts?
| Assignee | ||
Comment 9•23 years ago
|
||
>It seems to me that you now risk a failure to detect 'last'. If the last two
>context destructions race, then the last one into the lock may not see itself as
>the only member of the context list at the point where 'last' is detected
>because the first one is still in the list. I wonder how bad that really is.
Right, and there's another problem: js_LiveContext. More work needed, I'm
invalidating this patch.
However, I don't believe we can defer other things till runtime destruction.
That's an API incompatibility that some embedders I know of will notice, and
call a memory bloat bug.
/be
| Assignee | ||
Updated•23 years ago
|
Attachment #76690 -
Attachment is obsolete: true
| Assignee | ||
Comment 10•23 years ago
|
||
This patch touches more files, but changes fewer lines (66 -/+ lines in the
diff instead of 89). Thanks to jband for buddying me here, and to timeless for
reminding me to get more sleep.
The last patch series attempted to leave a context being destroyed on
rt->contextList until GC had been run on it, but that created new states that
weren't handled properly by the logic of the code. It overcomplicated things,
too, causing jband's head to hurt :-(.
This patch keeps the old logic whereby js_DestroyContext removes cx from
rt->contextList first, possibly changing rt->state to JSRTS_LANDING if cx is
the last context in rt. But it makes a crucial fix, changing the lock that
protects contextList and stateChange from rt->rtLock to rt->gcLock. This is
needed for two reasons:
1. js_LiveContext, called while rt->gcLock is held by ClaimScope in jslock.c,
without this patch could race badly with js_DestroyContext. The race was rare,
and its bad outcome even rarer, I'm sure. But in the worst case, if ClaimScope
found a context pointed at by a scope->ownercx that was about to be destroyed,
but which js_DestroyContext had not yet removed from rt->contextList, its call
to js_LiveContext would return true -- but if immediately after that return,
the thread on which ClaimScope was running were preempted until after that cx
had been destroyed, then ClaimScope would resume and fondle free memory.
2. To fix the present bug, we need to interlock contextList and state updates
using gcLock, and teach the GC to bail if it is racing with the forced GC
called on the last context in the runtime (we can't use rtLock because the GC's
logic is protected by gcLock). That last-racing-with-next-to-last-context is
the case that Michel Drapeau showed, and now, with this patch, it can't happen.
js_DestroyContext calls js_ForceGC in the last-context case with the
GC_LAST_CONTEXT parameter flag set, which distinguishes the call as a necessary
one, compared to the potentially racing js_ForceGC(cx, 0) made in the not-last
case of js_DestroyContext. Any call to js_GC without GC_LAST_CONTEXT in the
gcflags arg now returns early if rt->state != JSRTS_UP.
This tests well for me, although I was never able to reproduce Michel's crash.
I'll mail him to make sure he tries this patch. Jband, I promise this won't
hurt (as much :-).
/be
Comment 11•23 years ago
|
||
Brendan: I think I buy this as addressing the stated problem. I see one issue
though...
You changed the JS_FRIEND_API js_ForceGC. lxr shows that there are callers
outside of js/src. I doubt we want to expose the flags. Perhaps JS_ForceGC
should be added (without the flags param) and js_ForceGC be made a non-friend?
Either way, outside callers need to be fixed.
| Assignee | ||
Comment 12•23 years ago
|
||
Otherwise, no code change from last patch's files, just more files. The js.c
code was copied more often than I had thought.
/be
Attachment #76907 -
Attachment is obsolete: true
Comment 13•23 years ago
|
||
Comment on attachment 77163 [details] [diff] [review]
complete fix, removes js_ForceGC as a FRIEND API
sr=jband. And a change to shamu too!
Attachment #77163 -
Flags: superreview+
Comment 14•23 years ago
|
||
Comment on attachment 77163 [details] [diff] [review]
complete fix, removes js_ForceGC as a FRIEND API
r/sr=shaver (back from space!)
Attachment #77163 -
Flags: review+
Comment 15•23 years ago
|
||
Comment on attachment 77163 [details] [diff] [review]
complete fix, removes js_ForceGC as a FRIEND API
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #77163 -
Flags: approval+
| Assignee | ||
Comment 16•23 years ago
|
||
Fix is in, thanks.
/be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 17•23 years ago
|
||
I'm an idiot -- the (last) call to js_UnpinPinnedAtoms can race badly with a
next-to-last GC in progress. D'oh! Thanks to Michel for testing the patch,
sorry I jumped the gun on checking in. This last window of vulnerability will
be easy to close, I think.
/be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 18•23 years ago
|
||
Yes, this means keeping the last request on a cx you want to destroy open and
letting JS_DestroyContext (js_DestroyContext, actually) end that request for
you is the way to go (workaround for the crash Michel found that this patch
fixes).
As noted in the comment in this patch, this simple change complements rather
than replaces the previous patch, which is still needed in case the "not-last"
js_DestroyContext races into the GC after the last one has ended its request.
I don't believe there's any way for a GC to race with the dangerous parts of
the last-context-destruction case now. Review appreciated, I'm waiting on
Michel's testing thumbs-up.
/be
Comment 19•23 years ago
|
||
Comment on attachment 77311 [details] [diff] [review]
simple fix to avoid racing js_UnpinPinnedAtoms vs. js_{Mark,Sweep}AtomState
sr=jband. I buy this. Let's hope the tests don't turn out to be smarter than we
are (again).
Attachment #77311 -
Flags: superreview+
Comment 20•23 years ago
|
||
Comment on attachment 77311 [details] [diff] [review]
simple fix to avoid racing js_UnpinPinnedAtoms vs. js_{Mark,Sweep}AtomState
r=shaver.
Attachment #77311 -
Flags: review+
| Assignee | ||
Comment 21•23 years ago
|
||
Michel wrote:
>Congratulation !
>It seems to work fine.
>
>Thanks a lot !
So I'm going for a= from drivers.
/be
Status: REOPENED → ASSIGNED
Comment 22•23 years ago
|
||
Comment on attachment 77311 [details] [diff] [review]
simple fix to avoid racing js_UnpinPinnedAtoms vs. js_{Mark,Sweep}AtomState
a=rjesup@wgate.com for drivers
Attachment #77311 -
Flags: approval+
| Assignee | ||
Comment 23•23 years ago
|
||
Fixed for good.
/be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 25•16 years ago
|
||
Attachment #399019 -
Flags: review+
Comment 26•16 years ago
|
||
Conclusion:
From the gdb backtrace we can see that it is because the state->table is NULL, the _HashTableEnumerateEntries triggers a segmentation fault. The reason of segmentation fault is we call js_FreeAtomState first (free the table of JSAtomState), and then we call js_MarkAtomState (access the table of JSAtomState). This order is possible under the fix patches in this page. This backtrace symptom is the same as that of the original bug.
Please note that in my patch, I have commented the “if (!state->table)” check in js_MarkAtomState(). This check is added by the bug #131246, a different bug, and this check will hide the segmentation fault in firefox 133773 bug.
In this bug, developers added lots of code in order to avoid the race of js_FreeAtomState vs. js_MarkAtomState (we want js_FreeAtomState is called after all js_MarkAtomState are called), but failed. I recommend that we reconsider mechanisms to achieve our goals (although the segmentation fault can be hidden by the check added in bug #131246).
Attachment #399019 -
Attachment is obsolete: true
Attachment #399027 -
Flags: review+
Comment 27•16 years ago
|
||
Attachment #399030 -
Flags: review+
Comment 28•16 years ago
|
||
Sorry, there is a typo in my pseudo code in attachment 399027 [details].
If(last) {
Wait until gcLevel == 1; // here is a typo
UnpinAtom();
...
A correct form should be:
If(last) {
Wait until gcLevel == 0;
UnpinAtom();
...
| Assignee | ||
Comment 29•16 years ago
|
||
Heming, thanks for the comments and analysis. It looks like you've independently found bug 477021.
In general it's best to file a new bug rather than append to an old and resolved bug. Please see bug 477021 and let us know if anything still is broken. Thanks again,
/be
| Assignee | ||
Comment 30•16 years ago
|
||
See also bug 478336.
/be
You need to log in
before you can comment on or make changes to this bug.
Description
•