Closed Bug 480301 Opened 16 years ago Closed 16 years ago

Leaving outermost request should js_LeaveTrace

Categories

(Core :: JavaScript Engine, defect, P1)

Other Branch
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: jorendorff, Assigned: jorendorff)

References

Details

(Keywords: fixed1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 3 obsolete files)

If a _FAIL builtin happens to suspend and resume a request (like, in a getter), all our JIT state is exposed to GC.
(In reply to comment #0) > If a _FAIL builtin happens to suspend and resume a request (like, in a getter), > all our JIT state is exposed to GC. Interesting, but why the suspend is special? I.e. why an explicit js_GC from a getter is any better?
There's code in js_GC to cope with that, if cx is the context that's on trace. If cx is some other context... I don't know how nested requests involving multiple contexts on the same thread are supposed to work. That might be a hole too.
(In reply to comment #2) > There's code in js_GC to cope with that, if cx is the context that's on trace. Right, js_GC would not run if the cx runs a trace. > If cx is some other context... I don't know how nested requests involving > multiple contexts on the same thread are supposed to work. That might be a > hole too. That is a hole indeed. A simple way to fix that is to associate the trace with the current thread and not the context. But that pretty much require to fix the bug 437325.
Attached patch v1 (obsolete) — Splinter Review
OK. Then let's get the trivial part in and do the rest in a followup bug.
Assignee: general → jorendorff
Status: NEW → ASSIGNED
Attachment #364555 - Flags: review?(igor)
(In reply to comment #4) > Created an attachment (id=364555) [details] > v1 This is not enoygh. There are 2 other places that effectively suspends the request and allows to run js_GC on another thread. 1. The first place is ClaimTitle when the code waits for other thread to make the object's scope shared. 2. The second place is js_GC itself. If another thread already waits in js_GC for all requests to finish, then the current one would happily delegate the GC job to that thread. Then the current thread would wait until the job will be done. This happens before any trace checks. Note that fixing these issues would most likely conflict with changes for the bug 477627 as the patch there factors out both these temporal suspends into a helper function.
Attachment #364555 - Flags: review?(igor)
Well, ok, let's wait.
Depends on: 477627
I think this blocks. It's a gaping GC hole whenever we call a _FAIL native that goes off into arbitrary code that might, say, call a quick stub.
Flags: blocking1.9.1?
Flags: blocking1.9.1? → blocking1.9.1+
Correction: Gecko doesn't allow GC to happen except on the main thread. So quick stubs are not an issue. Perhaps we're ok here, but it's easy enough to fix that we really shouldn't risk it. At least wanted1.9.1.
(In reply to comment #8) > Correction: Gecko doesn't allow GC to happen except on the main thread. This is not true: the last-ditch GC can run from any thread.
(In reply to comment #9) > (In reply to comment #8) > > Correction: Gecko doesn't allow GC to happen except on the main thread. > > This is not true: the last-ditch GC can run from any thread. Sorry for spreading FUD: the GC callback takes care about it and prevents the GC from non-main thread indeed.
Priority: -- → P1
This can probably proceed without bug 477627, but that one looks pretty much ready to push in. I'll finish implementing this anyway and suffer through the rebase if any.
Attached patch v2 (obsolete) — Splinter Review
As this is a P1 blocker, we should probably go ahead and push it in (reversing my sentiment in comment 6).
Attachment #364555 - Attachment is obsolete: true
Attachment #372693 - Flags: review?(brendan)
Attachment #372693 - Flags: review?(brendan) → review+
Comment on attachment 372693 [details] [diff] [review] v2 This is pure goodness but it will conflict big-time with Igor's patch for bug 477627, which has review. You guys can probably agree on the best order to land. /be
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
The landed patch calls js_LeaveTrace with the GC lock held. Is it really safe?
There is a bug in the patch when it does js_LeaveTrace for all contexts on the current thread. This is bogus: JS_TRACE_ON(cx) checks if the current thread, not the JSContext instance, runs the trace. So if one have 2 contexts on the current thread (happens in the browser all the time) and the second runs the trace, then the patch will see JS_TRACE_ON(cx) for the first cx and call js_DeepBail on it. But this is very wrong since it will call js_DeepBail on JSContext that does *not* run the contexts. Given this issue and conflicts with the patch from the bug 477627 I suggest to backout this patch, land the patch from the bug 477627, and then fix this bug.
Sounds good. Lets do it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I backed out the landed patch
Whiteboard: fixed-in-tracemonkey
I suggest to change js_DeapBail so it searches JSContext that runs the trace. In most cases that would be the passed cx, but if not, a simple loop over JSThread's context list should find one.
backout was merged to mozilla central as well
(In reply to comment #20) > I suggest to change js_DeapBail so it searches JSContext that runs the trace. > In most cases that would be the passed cx, but if not, a simple loop over > JSThread's context list should find one. My instinct was for js_DeepBail to assume that its argument was precisely correct. But on reflection, Igor's suggestion seems like the only way to go. Most call sites need to search for the right cx anyway. Patch coming this morning.
Attached patch v3 (obsolete) — Splinter Review
Did I say morning? I meant noonish. This rebases and incorporates an ugly-looking fix for the multiple-contexts problem of comment 17. It also releases the GC lock when calling js_LeaveTrace from js_DiscountRequestsForGC (comment 16). I don't think this is strictly necessary, but it's good manners and makes for one less implicit constraint on js_LeaveTrace.
Attachment #372693 - Attachment is obsolete: true
Attachment #373183 - Flags: review?(brendan)
(In reply to comment #23) > Created an attachment (id=373183) [details] > It also releases the GC lock when calling js_LeaveTrace from > js_DiscountRequestsForGC (comment 16). I don't think this is strictly > necessary, but it's good manners and makes for one less implicit constraint on > js_LeaveTrace. Ugh, I should have caught that -- yes, we should not risk lock nesting or in any event over-extend critical sections for no reason. Thanks for this save. /be
Comment on attachment 373183 [details] [diff] [review] v3 >diff --git a/js/src/jscntxt.cpp b/js/src/jscntxt.cpp >--- a/js/src/jscntxt.cpp >+++ b/js/src/jscntxt.cpp >@@ -846,16 +846,24 @@ js_WaitForGC(JSRuntime *rt) > uint32 > js_DiscountRequestsForGC(JSContext *cx) > { > uint32 requestDebit; > > JS_ASSERT(cx->thread); > JS_ASSERT(cx->runtime->gcThread != cx->thread); > >+#ifdef JS_TRACER >+ if (JS_ON_TRACE(cx)) { >+ JS_UNLOCK_GC(cx->runtime); >+ js_LeaveTrace(cx); >+ JS_LOCK_GC(cx->runtime); >+ } >+#endif >+ > requestDebit = js_CountThreadRequests(cx); Will this give up invariants from before the #ifdef that we assume after the #endif? The assertions still hold, but callers' invariants may not. Callers are: - js_GC when it detects the GC already running and on another thread. This case seems safe, we are on the way out. We could sleep for a while and any number of subsequent allocations and/or GC runs occur, and we'd be no worse off. - ClaimTitle. This one is not safe, because another thread could process the rt->titleSharingTodo list, leaving us waiting a long time (possibly forever) on rt->titleSharingDone. So the above unlock/lock, while well-motivated on general grounds, is not correct without more changes to the caller. Igor, your thoughts are welcome as usual. /be
Attachment #373183 - Flags: review?(brendan) → review-
(In reply to comment #25) > Will this give up invariants from before the #ifdef that we assume after the > #endif? The assertions still hold, but callers' invariants may not. Callers > are: This is a good catch. The patch would be correct if ClaimTitle calls js_DiscountRequestsForGC right before doing PR_WaitCondVar. The latter effectively releases the GC lock so doing an extra unlock/lock before that is just fine. Now, if Claim title would be fixed to look like: requestDebit = js_DiscountRequestsForGC(cx); PR_WaitCondVar(rt->titleSharingDone, PR_INTERVAL_NO_TIMEOUT); js_RecountRequestsAfterGC(rt, requestDebit); then it will be very similar to the pattern in js_GC, which is: requestDebit = js_DiscountRequestsForGC(cx); js_RecountRequestsAfterGC(rt, requestDebit); This suggests to replace these two function by a single one that take an extra parameter to distinguish the first case from the second. That function would contain js_DeepBail call.
(In reply to comment #25) > - ClaimTitle. This one is not safe, because another thread could process the > rt->titleSharingTodo list, [...] Ah, I looked at both callers but misanalyzed. I'll revise as Igor suggests in comment 26.
On second thought I don't think comment 26 is right that it would be safe to unlock for a moment just before waiting. It is either necessary to wait or unsafe to wait; after unlocking and relocking, we must check to see which is the case.
(In reply to comment #28) > It is either necessary to wait or > unsafe to wait; The PR_WaitCondVar(rt->titleSharingDone) function unlocks the GC lock and allows other threads to enter the lock in arbitrary order. Thus from code synchronization point of view PR_WaitCondVar(rt->titleSharingDone) has the same effect as JS_UNLOCK_GC(rt) JS_LOCK_GC(rt) PR_WaitCondVar(rt->titleSharingDone)
(In reply to comment #27) > Ah, I looked at both callers but misanalyzed. I'll revise as Igor suggests in > comment 26. Hm, I think that perhaps we should replace js_DiscountRequestsForGC(cx) and js_RecountRequestsAfterGC(rt, requestDebit) with a class that does that in constructor/destructor. This is C++ at the end.
(In reply to comment #29) > Thus from code > synchronization point of view > > PR_WaitCondVar(rt->titleSharingDone) > > has the same effect as > > JS_UNLOCK_GC(rt) > JS_LOCK_GC(rt) > PR_WaitCondVar(rt->titleSharingDone) I am wrong about this - the signal to wake up on rt->titleSharingDone can be lost between unlock/lock. So the code must check after the unlock if the title is still needs sharing.
Attached patch v4Splinter Review
Attachment #373183 - Attachment is obsolete: true
Attachment #373331 - Flags: review?(brendan)
As an alternative to explicit LeaveTrace due to possible GC spread throughout the code what about calling in js_GC LeaveTrace for each thread?
(In reply to comment #34) > As an alternative to explicit LeaveTrace due to possible GC spread throughout > the code what about calling in js_GC LeaveTrace for each thread? Specifically, what about calling LeaveTrace from PurgeThreadData? Note a care has to be taken there since cx argument in the function is the cx that runs the GC thread and unrelated to JSThreadData argument.
Comment on attachment 373331 [details] [diff] [review] v4 > JS_FORCES_STACK JS_FRIEND_API(void) > js_DeepBail(JSContext *cx) > { > JS_ASSERT(JS_ON_TRACE(cx)); > >+ /* >+ * Exactly one context on the current thread is on trace. Find out which >+ * one. (Most callers cannot guarantee that it's cx.) >+ */ Given the comment, what about replacing JSTraceMonitor.onTrace with cx that runs the trace?
(In reply to comment #34) > As an alternative to explicit LeaveTrace due to possible GC spread throughout > the code what about calling in js_GC LeaveTrace for each thread? That might be better. (In reply to comment #36) > Given the comment, what about replacing JSTraceMonitor.onTrace with cx that > runs the trace? That sounds definitely better.
(In reply to comment #37) > (In reply to comment #34) > > As an alternative to explicit LeaveTrace due to possible GC spread throughout > > the code what about calling in js_GC LeaveTrace for each thread? > > That might be better. I think we would still have to flush the globals to the global object in order to avoid bugs in embeddings that can share a global object among threads. So this would not succeed in reducing the amount of complexity. Arguably the underlying problem here is that we have three pieces of code for leaving a request, instead of a function with some flags. > > Given the comment, what about replacing JSTraceMonitor.onTrace with cx that > > runs the trace? Filed followup bug 488874.
Sharing a global among threads is not a supported use-case. Deprecate, obsolete, remove support for it -- starting here. Does any embedder actually do this? The language has no synchronization primitives. /be
(In reply to comment #39) > Sharing a global among threads is not a supported use-case. Deprecate, > obsolete, remove support for it -- starting here. OK! > Does any embedder actually do this? The language has no synchronization > primitives. In embeddings that share any objects across threads, I would be surprised if they weren't doing it accidentally, by sharing objects whose methods refer to globals. (`scatter` too, but I won't miss it.)
scatter can adapt -- a torture test using a shared object with many threads mutating is a good test. It's a lousy embedding model, tho! /be
(In reply to comment #38) > I think we would still have to flush the globals to the global object in order > to avoid bugs in embeddings that can share a global object among threads. So > this would not succeed in reducing the amount of complexity. This is not for this bug. This one is about missing DeepBail due to code that leaves the request temporarily allowing the GC to proceed on other threads. Since the problem is in the GC itself, calling DeepBail for all the threads from the GC itself will solve this bug once and for all without doing unnecessary bail calls.
I totally disagree with the line of reasoning in comment 42. The bug summary is easily changed. We don't have to let it dictate what we do here. Here's my concern. Brendan has said "a trace always runs to completion without yielding its request" as though this were a design invariant (in bug 419537, but I've also heard him mention this on IRC). If we need to have that rule, we should revise the bug summary and use the approach in v4. If not, comment 34 is the way to go.
When I say it out loud, invariants vary -- makes me superstitious ;-). We can use one bug or two. It's up to you guys. I agree the bug summary talks about the invariant more clearly (perhaps it could be clearer) than about just the GC forcing threads off trace. We should certainly consolidate common code. Don't think anyone disagrees there. /be
Regarding shared-among-threads globals: if this is no longer supported, we should assert on that and replace shell's scatter with something that is similar to message-passing interface of DOM thread workers. That would be extremely helpful in debugging worker's issues.
Comment on attachment 373331 [details] [diff] [review] v4 I would have cut to the chase and changed onTrace to a JSContext *, to avoid trouble groveling through a long JSThread.contextList (which happens in Firefox). /be
Attachment #373331 - Flags: review?(brendan) → review+
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Can anyone test if applying this patch to the 1.9.1 branch prevents the testcase in bug 484590 from crashing the Shiretoko (3.5b4pre) nightly load?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: