Closed
Bug 480301
Opened 16 years ago
Closed 16 years ago
Leaving outermost request should js_LeaveTrace
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jorendorff, Assigned: jorendorff)
References
Details
(Keywords: fixed1.9.1, Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files, 3 obsolete files)
2.02 KB,
patch
|
Details | Diff | Splinter Review | |
6.82 KB,
patch
|
brendan
:
review+
|
Details | Diff | Splinter Review |
If a _FAIL builtin happens to suspend and resume a request (like, in a getter), all our JIT state is exposed to GC.
Comment 1•16 years ago
|
||
(In reply to comment #0)
> If a _FAIL builtin happens to suspend and resume a request (like, in a getter),
> all our JIT state is exposed to GC.
Interesting, but why the suspend is special? I.e. why an explicit js_GC from a getter is any better?
Assignee | ||
Comment 2•16 years ago
|
||
There's code in js_GC to cope with that, if cx is the context that's on trace.
If cx is some other context... I don't know how nested requests involving multiple contexts on the same thread are supposed to work. That might be a hole too.
Comment 3•16 years ago
|
||
(In reply to comment #2)
> There's code in js_GC to cope with that, if cx is the context that's on trace.
Right, js_GC would not run if the cx runs a trace.
> If cx is some other context... I don't know how nested requests involving
> multiple contexts on the same thread are supposed to work. That might be a
> hole too.
That is a hole indeed. A simple way to fix that is to associate the trace with the current thread and not the context. But that pretty much require to fix the bug 437325.
Assignee | ||
Comment 4•16 years ago
|
||
OK. Then let's get the trivial part in and do the rest in a followup bug.
Comment 5•16 years ago
|
||
(In reply to comment #4)
> Created an attachment (id=364555) [details]
> v1
This is not enoygh. There are 2 other places that effectively suspends the request and allows to run js_GC on another thread.
1. The first place is ClaimTitle when the code waits for other thread to make the object's scope shared.
2. The second place is js_GC itself. If another thread already waits in js_GC for all requests to finish, then the current one would happily delegate the GC job to that thread. Then the current thread would wait until the job will be done. This happens before any trace checks.
Note that fixing these issues would most likely conflict with changes for the bug 477627 as the patch there factors out both these temporal suspends into a helper function.
Assignee | ||
Updated•16 years ago
|
Attachment #364555 -
Flags: review?(igor)
Assignee | ||
Comment 6•16 years ago
|
||
Well, ok, let's wait.
Assignee | ||
Comment 7•16 years ago
|
||
I think this blocks. It's a gaping GC hole whenever we call a _FAIL native that goes off into arbitrary code that might, say, call a quick stub.
Flags: blocking1.9.1?
Updated•16 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Assignee | ||
Comment 8•16 years ago
|
||
Correction: Gecko doesn't allow GC to happen except on the main thread. So quick stubs are not an issue. Perhaps we're ok here, but it's easy enough to fix that we really shouldn't risk it. At least wanted1.9.1.
Comment 9•16 years ago
|
||
(In reply to comment #8)
> Correction: Gecko doesn't allow GC to happen except on the main thread.
This is not true: the last-ditch GC can run from any thread.
Comment 10•16 years ago
|
||
(In reply to comment #9)
> (In reply to comment #8)
> > Correction: Gecko doesn't allow GC to happen except on the main thread.
>
> This is not true: the last-ditch GC can run from any thread.
Sorry for spreading FUD: the GC callback takes care about it and prevents the GC from non-main thread indeed.
Updated•16 years ago
|
Priority: -- → P1
Assignee | ||
Comment 11•16 years ago
|
||
This can probably proceed without bug 477627, but that one looks pretty much ready to push in. I'll finish implementing this anyway and suffer through the rebase if any.
Assignee | ||
Comment 12•16 years ago
|
||
As this is a P1 blocker, we should probably go ahead and push it in (reversing my sentiment in comment 6).
Attachment #364555 -
Attachment is obsolete: true
Attachment #372693 -
Flags: review?(brendan)
Updated•16 years ago
|
Attachment #372693 -
Flags: review?(brendan) → review+
Comment 13•16 years ago
|
||
Comment on attachment 372693 [details] [diff] [review]
v2
This is pure goodness but it will conflict big-time with Igor's patch for bug 477627, which has review. You guys can probably agree on the best order to land.
/be
Comment 14•16 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 15•16 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 16•16 years ago
|
||
The landed patch calls js_LeaveTrace with the GC lock held. Is it really safe?
Comment 17•16 years ago
|
||
There is a bug in the patch when it does js_LeaveTrace for all contexts on the current thread. This is bogus: JS_TRACE_ON(cx) checks if the current thread, not the JSContext instance, runs the trace. So if one have 2 contexts on the current thread (happens in the browser all the time) and the second runs the trace, then the patch will see JS_TRACE_ON(cx) for the first cx and call js_DeepBail on it. But this is very wrong since it will call js_DeepBail on JSContext that does *not* run the contexts.
Given this issue and conflicts with the patch from the bug 477627 I suggest to backout this patch, land the patch from the bug 477627, and then fix this bug.
Comment 18•16 years ago
|
||
Sounds good. Lets do it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 20•16 years ago
|
||
I suggest to change js_DeapBail so it searches JSContext that runs the trace. In most cases that would be the passed cx, but if not, a simple loop over JSThread's context list should find one.
Comment 21•16 years ago
|
||
backout was merged to mozilla central as well
Assignee | ||
Comment 22•16 years ago
|
||
(In reply to comment #20)
> I suggest to change js_DeapBail so it searches JSContext that runs the trace.
> In most cases that would be the passed cx, but if not, a simple loop over
> JSThread's context list should find one.
My instinct was for js_DeepBail to assume that its argument was precisely correct. But on reflection, Igor's suggestion seems like the only way to go. Most call sites need to search for the right cx anyway. Patch coming this morning.
Assignee | ||
Comment 23•16 years ago
|
||
Did I say morning? I meant noonish.
This rebases and incorporates an ugly-looking fix for the multiple-contexts problem of comment 17.
It also releases the GC lock when calling js_LeaveTrace from js_DiscountRequestsForGC (comment 16). I don't think this is strictly necessary, but it's good manners and makes for one less implicit constraint on js_LeaveTrace.
Attachment #372693 -
Attachment is obsolete: true
Attachment #373183 -
Flags: review?(brendan)
Comment 24•16 years ago
|
||
(In reply to comment #23)
> Created an attachment (id=373183) [details]
> It also releases the GC lock when calling js_LeaveTrace from
> js_DiscountRequestsForGC (comment 16). I don't think this is strictly
> necessary, but it's good manners and makes for one less implicit constraint on
> js_LeaveTrace.
Ugh, I should have caught that -- yes, we should not risk lock nesting or in any event over-extend critical sections for no reason. Thanks for this save.
/be
Comment 25•16 years ago
|
||
Comment on attachment 373183 [details] [diff] [review]
v3
>diff --git a/js/src/jscntxt.cpp b/js/src/jscntxt.cpp
>--- a/js/src/jscntxt.cpp
>+++ b/js/src/jscntxt.cpp
>@@ -846,16 +846,24 @@ js_WaitForGC(JSRuntime *rt)
> uint32
> js_DiscountRequestsForGC(JSContext *cx)
> {
> uint32 requestDebit;
>
> JS_ASSERT(cx->thread);
> JS_ASSERT(cx->runtime->gcThread != cx->thread);
>
>+#ifdef JS_TRACER
>+ if (JS_ON_TRACE(cx)) {
>+ JS_UNLOCK_GC(cx->runtime);
>+ js_LeaveTrace(cx);
>+ JS_LOCK_GC(cx->runtime);
>+ }
>+#endif
>+
> requestDebit = js_CountThreadRequests(cx);
Will this give up invariants from before the #ifdef that we assume after the #endif? The assertions still hold, but callers' invariants may not. Callers are:
- js_GC when it detects the GC already running and on another thread. This case seems safe, we are on the way out. We could sleep for a while and any number of subsequent allocations and/or GC runs occur, and we'd be no worse off.
- ClaimTitle. This one is not safe, because another thread could process the rt->titleSharingTodo list, leaving us waiting a long time (possibly forever) on rt->titleSharingDone. So the above unlock/lock, while well-motivated on general grounds, is not correct without more changes to the caller.
Igor, your thoughts are welcome as usual.
/be
Attachment #373183 -
Flags: review?(brendan) → review-
Comment 26•16 years ago
|
||
(In reply to comment #25)
> Will this give up invariants from before the #ifdef that we assume after the
> #endif? The assertions still hold, but callers' invariants may not. Callers
> are:
This is a good catch. The patch would be correct if ClaimTitle calls js_DiscountRequestsForGC right before doing PR_WaitCondVar. The latter effectively releases the GC lock so doing an extra unlock/lock before that is just fine. Now, if Claim title would be fixed to look like:
requestDebit = js_DiscountRequestsForGC(cx);
PR_WaitCondVar(rt->titleSharingDone, PR_INTERVAL_NO_TIMEOUT);
js_RecountRequestsAfterGC(rt, requestDebit);
then it will be very similar to the pattern in js_GC, which is:
requestDebit = js_DiscountRequestsForGC(cx);
js_RecountRequestsAfterGC(rt, requestDebit);
This suggests to replace these two function by a single one that take an extra parameter to distinguish the first case from the second. That function would contain js_DeepBail call.
Assignee | ||
Comment 27•16 years ago
|
||
(In reply to comment #25)
> - ClaimTitle. This one is not safe, because another thread could process the
> rt->titleSharingTodo list, [...]
Ah, I looked at both callers but misanalyzed. I'll revise as Igor suggests in comment 26.
Assignee | ||
Comment 28•16 years ago
|
||
On second thought I don't think comment 26 is right that it would be safe to unlock for a moment just before waiting. It is either necessary to wait or unsafe to wait; after unlocking and relocking, we must check to see which is the case.
Comment 29•16 years ago
|
||
(In reply to comment #28)
> It is either necessary to wait or
> unsafe to wait;
The PR_WaitCondVar(rt->titleSharingDone) function unlocks the GC lock and allows other threads to enter the lock in arbitrary order. Thus from code synchronization point of view
PR_WaitCondVar(rt->titleSharingDone)
has the same effect as
JS_UNLOCK_GC(rt)
JS_LOCK_GC(rt)
PR_WaitCondVar(rt->titleSharingDone)
Comment 30•16 years ago
|
||
(In reply to comment #27)
> Ah, I looked at both callers but misanalyzed. I'll revise as Igor suggests in
> comment 26.
Hm, I think that perhaps we should replace js_DiscountRequestsForGC(cx) and js_RecountRequestsAfterGC(rt, requestDebit) with a class that does that in constructor/destructor. This is C++ at the end.
Comment 31•16 years ago
|
||
(In reply to comment #29)
> Thus from code
> synchronization point of view
>
> PR_WaitCondVar(rt->titleSharingDone)
>
> has the same effect as
>
> JS_UNLOCK_GC(rt)
> JS_LOCK_GC(rt)
> PR_WaitCondVar(rt->titleSharingDone)
I am wrong about this - the signal to wake up on rt->titleSharingDone can be lost between unlock/lock. So the code must check after the unlock if the title is still needs sharing.
Assignee | ||
Comment 32•16 years ago
|
||
Assignee | ||
Comment 33•16 years ago
|
||
Attachment #373183 -
Attachment is obsolete: true
Attachment #373331 -
Flags: review?(brendan)
Comment 34•16 years ago
|
||
As an alternative to explicit LeaveTrace due to possible GC spread throughout the code what about calling in js_GC LeaveTrace for each thread?
Comment 35•16 years ago
|
||
(In reply to comment #34)
> As an alternative to explicit LeaveTrace due to possible GC spread throughout
> the code what about calling in js_GC LeaveTrace for each thread?
Specifically, what about calling LeaveTrace from PurgeThreadData? Note a care has to be taken there since cx argument in the function is the cx that runs the GC thread and unrelated to JSThreadData argument.
Comment 36•16 years ago
|
||
Comment on attachment 373331 [details] [diff] [review]
v4
> JS_FORCES_STACK JS_FRIEND_API(void)
> js_DeepBail(JSContext *cx)
> {
> JS_ASSERT(JS_ON_TRACE(cx));
>
>+ /*
>+ * Exactly one context on the current thread is on trace. Find out which
>+ * one. (Most callers cannot guarantee that it's cx.)
>+ */
Given the comment, what about replacing JSTraceMonitor.onTrace with cx that runs the trace?
Assignee | ||
Comment 37•16 years ago
|
||
(In reply to comment #34)
> As an alternative to explicit LeaveTrace due to possible GC spread throughout
> the code what about calling in js_GC LeaveTrace for each thread?
That might be better.
(In reply to comment #36)
> Given the comment, what about replacing JSTraceMonitor.onTrace with cx that
> runs the trace?
That sounds definitely better.
Assignee | ||
Comment 38•16 years ago
|
||
(In reply to comment #37)
> (In reply to comment #34)
> > As an alternative to explicit LeaveTrace due to possible GC spread throughout
> > the code what about calling in js_GC LeaveTrace for each thread?
>
> That might be better.
I think we would still have to flush the globals to the global object in order to avoid bugs in embeddings that can share a global object among threads. So this would not succeed in reducing the amount of complexity.
Arguably the underlying problem here is that we have three pieces of code for leaving a request, instead of a function with some flags.
> > Given the comment, what about replacing JSTraceMonitor.onTrace with cx that
> > runs the trace?
Filed followup bug 488874.
Comment 39•16 years ago
|
||
Sharing a global among threads is not a supported use-case. Deprecate, obsolete, remove support for it -- starting here.
Does any embedder actually do this? The language has no synchronization primitives.
/be
Assignee | ||
Comment 40•16 years ago
|
||
(In reply to comment #39)
> Sharing a global among threads is not a supported use-case. Deprecate,
> obsolete, remove support for it -- starting here.
OK!
> Does any embedder actually do this? The language has no synchronization
> primitives.
In embeddings that share any objects across threads, I would be surprised if they weren't doing it accidentally, by sharing objects whose methods refer to globals. (`scatter` too, but I won't miss it.)
Comment 41•16 years ago
|
||
scatter can adapt -- a torture test using a shared object with many threads mutating is a good test. It's a lousy embedding model, tho!
/be
Comment 42•16 years ago
|
||
(In reply to comment #38)
> I think we would still have to flush the globals to the global object in order
> to avoid bugs in embeddings that can share a global object among threads. So
> this would not succeed in reducing the amount of complexity.
This is not for this bug. This one is about missing DeepBail due to code that leaves the request temporarily allowing the GC to proceed on other threads. Since the problem is in the GC itself, calling DeepBail for all the threads from the GC itself will solve this bug once and for all without doing unnecessary bail calls.
Assignee | ||
Comment 43•16 years ago
|
||
I totally disagree with the line of reasoning in comment 42. The bug summary is easily changed. We don't have to let it dictate what we do here.
Here's my concern. Brendan has said "a trace always runs to completion without yielding its request" as though this were a design invariant (in bug 419537, but I've also heard him mention this on IRC). If we need to have that rule, we should revise the bug summary and use the approach in v4. If not, comment 34 is the way to go.
Comment 44•16 years ago
|
||
When I say it out loud, invariants vary -- makes me superstitious ;-).
We can use one bug or two. It's up to you guys. I agree the bug summary talks about the invariant more clearly (perhaps it could be clearer) than about just the GC forcing threads off trace.
We should certainly consolidate common code. Don't think anyone disagrees there.
/be
Comment 45•16 years ago
|
||
Regarding shared-among-threads globals: if this is no longer supported, we should assert on that and replace shell's scatter with something that is similar to message-passing interface of DOM thread workers. That would be extremely helpful in debugging worker's issues.
Comment 46•16 years ago
|
||
Comment on attachment 373331 [details] [diff] [review]
v4
I would have cut to the chase and changed onTrace to a JSContext *, to avoid trouble groveling through a long JSThread.contextList (which happens in Firefox).
/be
Attachment #373331 -
Flags: review?(brendan) → review+
Comment 47•16 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 48•16 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Comment 49•16 years ago
|
||
Can anyone test if applying this patch to the 1.9.1 branch prevents the testcase in bug 484590 from crashing the Shiretoko (3.5b4pre) nightly load?
Comment 50•16 years ago
|
||
Keywords: fixed1.9.1
You need to log in
before you can comment on or make changes to this bug.
Description
•