480301 - Leaving outermost request should js_LeaveTrace

Jason Orendorff [:jorendorff]

Assignee

Description

•

16 years ago

If a _FAIL builtin happens to suspend and resume a request (like, in a getter), all our JIT state is exposed to GC.

Igor Bukanov

Comment 1

•

16 years ago

(In reply to comment #0) > If a _FAIL builtin happens to suspend and resume a request (like, in a getter), > all our JIT state is exposed to GC. Interesting, but why the suspend is special? I.e. why an explicit js_GC from a getter is any better?

Jason Orendorff [:jorendorff]

Assignee

Comment 2

•

16 years ago

There's code in js_GC to cope with that, if cx is the context that's on trace. If cx is some other context... I don't know how nested requests involving multiple contexts on the same thread are supposed to work. That might be a hole too.

Igor Bukanov

Comment 3

•

16 years ago

(In reply to comment #2) > There's code in js_GC to cope with that, if cx is the context that's on trace. Right, js_GC would not run if the cx runs a trace. > If cx is some other context... I don't know how nested requests involving > multiple contexts on the same thread are supposed to work. That might be a > hole too. That is a hole indeed. A simple way to fix that is to associate the trace with the current thread and not the context. But that pretty much require to fix the bug 437325.

Jason Orendorff [:jorendorff]

Assignee

Comment 4

•

16 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

OK. Then let's get the trivial part in and do the rest in a followup bug.

Assignee: general → jorendorff

Status: NEW → ASSIGNED

Attachment #364555 - Flags: review?(igor)

Igor Bukanov

Comment 5

•

16 years ago

(In reply to comment #4) > Created an attachment (id=364555) [details] > v1 This is not enoygh. There are 2 other places that effectively suspends the request and allows to run js_GC on another thread. 1. The first place is ClaimTitle when the code waits for other thread to make the object's scope shared. 2. The second place is js_GC itself. If another thread already waits in js_GC for all requests to finish, then the current one would happily delegate the GC job to that thread. Then the current thread would wait until the job will be done. This happens before any trace checks. Note that fixing these issues would most likely conflict with changes for the bug 477627 as the patch there factors out both these temporal suspends into a helper function.

Jason Orendorff [:jorendorff]

Assignee

Updated

•

16 years ago

Attachment #364555 - Flags: review?(igor)

Jason Orendorff [:jorendorff]

Assignee

Comment 6

•

16 years ago

Well, ok, let's wait.

Jason Orendorff [:jorendorff]

Assignee

Updated

•

16 years ago

Depends on: 477627

Jason Orendorff [:jorendorff]

Assignee

Comment 7

•

16 years ago

I think this blocks. It's a gaping GC hole whenever we call a _FAIL native that goes off into arbitrary code that might, say, call a quick stub.

Flags: blocking1.9.1?

Robert Sayre

Updated

•

16 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Jason Orendorff [:jorendorff]

Assignee

Comment 8

•

16 years ago

Correction: Gecko doesn't allow GC to happen except on the main thread. So quick stubs are not an issue. Perhaps we're ok here, but it's easy enough to fix that we really shouldn't risk it. At least wanted1.9.1.

Igor Bukanov

Comment 9

•

16 years ago

(In reply to comment #8) > Correction: Gecko doesn't allow GC to happen except on the main thread. This is not true: the last-ditch GC can run from any thread.

Igor Bukanov

Comment 10

•

16 years ago

(In reply to comment #9) > (In reply to comment #8) > > Correction: Gecko doesn't allow GC to happen except on the main thread. > > This is not true: the last-ditch GC can run from any thread. Sorry for spreading FUD: the GC callback takes care about it and prevents the GC from non-main thread indeed.

Robert Sayre

Updated

•

16 years ago

Priority: -- → P1

Jason Orendorff [:jorendorff]

Assignee

Comment 11

•

16 years ago

This can probably proceed without bug 477627, but that one looks pretty much ready to push in. I'll finish implementing this anyway and suffer through the rebase if any.

Jason Orendorff [:jorendorff]

Assignee

Comment 12

•

16 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

As this is a P1 blocker, we should probably go ahead and push it in (reversing my sentiment in comment 6).

Attachment #364555 - Attachment is obsolete: true

Attachment #372693 - Flags: review?(brendan)

Brendan Eich [:brendan]

Updated

•

16 years ago

Attachment #372693 - Flags: review?(brendan) → review+

Brendan Eich [:brendan]

Comment 13

•

16 years ago

Comment on attachment 372693 [details] [diff] [review] v2 This is pure goodness but it will conflict big-time with Igor's patch for bug 477627, which has review. You guys can probably agree on the best order to land. /be

Andreas Gal :gal

Comment 14

•

16 years ago

http://hg.mozilla.org/tracemonkey/rev/f97f196dcb58

Whiteboard: fixed-in-tracemonkey

Robert Sayre

Comment 15

•

16 years ago

http://hg.mozilla.org/mozilla-central/rev/f97f196dcb58

Status: ASSIGNED → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Igor Bukanov

Comment 16

•

16 years ago

The landed patch calls js_LeaveTrace with the GC lock held. Is it really safe?

Igor Bukanov

Comment 17

•

16 years ago

There is a bug in the patch when it does js_LeaveTrace for all contexts on the current thread. This is bogus: JS_TRACE_ON(cx) checks if the current thread, not the JSContext instance, runs the trace. So if one have 2 contexts on the current thread (happens in the browser all the time) and the second runs the trace, then the patch will see JS_TRACE_ON(cx) for the first cx and call js_DeepBail on it. But this is very wrong since it will call js_DeepBail on JSContext that does *not* run the contexts. Given this issue and conflicts with the patch from the bug 477627 I suggest to backout this patch, land the patch from the bug 477627, and then fix this bug.

Andreas Gal :gal

Comment 18

•

16 years ago

Sounds good. Lets do it.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Igor Bukanov

Comment 19

•

16 years ago

I backed out the landed patch

Whiteboard: fixed-in-tracemonkey

Igor Bukanov

Comment 20

•

16 years ago

I suggest to change js_DeapBail so it searches JSContext that runs the trace. In most cases that would be the passed cx, but if not, a simple loop over JSThread's context list should find one.

Shawn Wilsher :sdwilsh

Comment 21

•

16 years ago

backout was merged to mozilla central as well

Jason Orendorff [:jorendorff]

Assignee

Comment 22

•

16 years ago

(In reply to comment #20) > I suggest to change js_DeapBail so it searches JSContext that runs the trace. > In most cases that would be the passed cx, but if not, a simple loop over > JSThread's context list should find one. My instinct was for js_DeepBail to assume that its argument was precisely correct. But on reflection, Igor's suggestion seems like the only way to go. Most call sites need to search for the right cx anyway. Patch coming this morning.

Jason Orendorff [:jorendorff]

Assignee

Comment 23

•

16 years ago

Attached patch v3 (obsolete) — Details — Splinter Review

Did I say morning? I meant noonish. This rebases and incorporates an ugly-looking fix for the multiple-contexts problem of comment 17. It also releases the GC lock when calling js_LeaveTrace from js_DiscountRequestsForGC (comment 16). I don't think this is strictly necessary, but it's good manners and makes for one less implicit constraint on js_LeaveTrace.

Attachment #372693 - Attachment is obsolete: true

Attachment #373183 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 24

•

16 years ago

(In reply to comment #23) > Created an attachment (id=373183) [details] > It also releases the GC lock when calling js_LeaveTrace from > js_DiscountRequestsForGC (comment 16). I don't think this is strictly > necessary, but it's good manners and makes for one less implicit constraint on > js_LeaveTrace. Ugh, I should have caught that -- yes, we should not risk lock nesting or in any event over-extend critical sections for no reason. Thanks for this save. /be

Brendan Eich [:brendan]

Comment 25

•

16 years ago

Comment on attachment 373183 [details] [diff] [review] v3 >diff --git a/js/src/jscntxt.cpp b/js/src/jscntxt.cpp >--- a/js/src/jscntxt.cpp >+++ b/js/src/jscntxt.cpp >@@ -846,16 +846,24 @@ js_WaitForGC(JSRuntime *rt) > uint32 > js_DiscountRequestsForGC(JSContext *cx) > { > uint32 requestDebit; > > JS_ASSERT(cx->thread); > JS_ASSERT(cx->runtime->gcThread != cx->thread); > >+#ifdef JS_TRACER >+ if (JS_ON_TRACE(cx)) { >+ JS_UNLOCK_GC(cx->runtime); >+ js_LeaveTrace(cx); >+ JS_LOCK_GC(cx->runtime); >+ } >+#endif >+ > requestDebit = js_CountThreadRequests(cx); Will this give up invariants from before the #ifdef that we assume after the #endif? The assertions still hold, but callers' invariants may not. Callers are: - js_GC when it detects the GC already running and on another thread. This case seems safe, we are on the way out. We could sleep for a while and any number of subsequent allocations and/or GC runs occur, and we'd be no worse off. - ClaimTitle. This one is not safe, because another thread could process the rt->titleSharingTodo list, leaving us waiting a long time (possibly forever) on rt->titleSharingDone. So the above unlock/lock, while well-motivated on general grounds, is not correct without more changes to the caller. Igor, your thoughts are welcome as usual. /be

Attachment #373183 - Flags: review?(brendan) → review-

Igor Bukanov

Comment 26

•

16 years ago

(In reply to comment #25) > Will this give up invariants from before the #ifdef that we assume after the > #endif? The assertions still hold, but callers' invariants may not. Callers > are: This is a good catch. The patch would be correct if ClaimTitle calls js_DiscountRequestsForGC right before doing PR_WaitCondVar. The latter effectively releases the GC lock so doing an extra unlock/lock before that is just fine. Now, if Claim title would be fixed to look like: requestDebit = js_DiscountRequestsForGC(cx); PR_WaitCondVar(rt->titleSharingDone, PR_INTERVAL_NO_TIMEOUT); js_RecountRequestsAfterGC(rt, requestDebit); then it will be very similar to the pattern in js_GC, which is: requestDebit = js_DiscountRequestsForGC(cx); js_RecountRequestsAfterGC(rt, requestDebit); This suggests to replace these two function by a single one that take an extra parameter to distinguish the first case from the second. That function would contain js_DeepBail call.

Jason Orendorff [:jorendorff]

Assignee

Comment 27

•

16 years ago

(In reply to comment #25) > - ClaimTitle. This one is not safe, because another thread could process the > rt->titleSharingTodo list, [...] Ah, I looked at both callers but misanalyzed. I'll revise as Igor suggests in comment 26.

Jason Orendorff [:jorendorff]

Assignee

Comment 28

•

16 years ago

On second thought I don't think comment 26 is right that it would be safe to unlock for a moment just before waiting. It is either necessary to wait or unsafe to wait; after unlocking and relocking, we must check to see which is the case.

Igor Bukanov

Comment 29

•

16 years ago

(In reply to comment #28) > It is either necessary to wait or > unsafe to wait; The PR_WaitCondVar(rt->titleSharingDone) function unlocks the GC lock and allows other threads to enter the lock in arbitrary order. Thus from code synchronization point of view PR_WaitCondVar(rt->titleSharingDone) has the same effect as JS_UNLOCK_GC(rt) JS_LOCK_GC(rt) PR_WaitCondVar(rt->titleSharingDone)

Igor Bukanov

Comment 30

•

16 years ago

(In reply to comment #27) > Ah, I looked at both callers but misanalyzed. I'll revise as Igor suggests in > comment 26. Hm, I think that perhaps we should replace js_DiscountRequestsForGC(cx) and js_RecountRequestsAfterGC(rt, requestDebit) with a class that does that in constructor/destructor. This is C++ at the end.

Igor Bukanov

Comment 31

•

16 years ago

(In reply to comment #29) > Thus from code > synchronization point of view > > PR_WaitCondVar(rt->titleSharingDone) > > has the same effect as > > JS_UNLOCK_GC(rt) > JS_LOCK_GC(rt) > PR_WaitCondVar(rt->titleSharingDone) I am wrong about this - the signal to wake up on rt->titleSharingDone can be lost between unlock/lock. So the code must check after the unlock if the title is still needs sharing.

Jason Orendorff [:jorendorff]

Assignee

Comment 32

•

16 years ago

Attached patch interdiff v3 to v4 — Details — Splinter Review

Jason Orendorff [:jorendorff]

Assignee

Comment 33

•

16 years ago

Attached patch v4 — Details — Splinter Review

Attachment #373183 - Attachment is obsolete: true

Attachment #373331 - Flags: review?(brendan)

Igor Bukanov

Comment 34

•

16 years ago

As an alternative to explicit LeaveTrace due to possible GC spread throughout the code what about calling in js_GC LeaveTrace for each thread?

Igor Bukanov

Comment 35

•

16 years ago

(In reply to comment #34) > As an alternative to explicit LeaveTrace due to possible GC spread throughout > the code what about calling in js_GC LeaveTrace for each thread? Specifically, what about calling LeaveTrace from PurgeThreadData? Note a care has to be taken there since cx argument in the function is the cx that runs the GC thread and unrelated to JSThreadData argument.

Igor Bukanov

Comment 36

•

16 years ago

Comment on attachment 373331 [details] [diff] [review] v4 > JS_FORCES_STACK JS_FRIEND_API(void) > js_DeepBail(JSContext *cx) > { > JS_ASSERT(JS_ON_TRACE(cx)); > >+ /* >+ * Exactly one context on the current thread is on trace. Find out which >+ * one. (Most callers cannot guarantee that it's cx.) >+ */ Given the comment, what about replacing JSTraceMonitor.onTrace with cx that runs the trace?

Jason Orendorff [:jorendorff]

Assignee

Comment 37

•

16 years ago

(In reply to comment #34) > As an alternative to explicit LeaveTrace due to possible GC spread throughout > the code what about calling in js_GC LeaveTrace for each thread? That might be better. (In reply to comment #36) > Given the comment, what about replacing JSTraceMonitor.onTrace with cx that > runs the trace? That sounds definitely better.

Jason Orendorff [:jorendorff]

Assignee

Comment 38

•

16 years ago

(In reply to comment #37) > (In reply to comment #34) > > As an alternative to explicit LeaveTrace due to possible GC spread throughout > > the code what about calling in js_GC LeaveTrace for each thread? > > That might be better. I think we would still have to flush the globals to the global object in order to avoid bugs in embeddings that can share a global object among threads. So this would not succeed in reducing the amount of complexity. Arguably the underlying problem here is that we have three pieces of code for leaving a request, instead of a function with some flags. > > Given the comment, what about replacing JSTraceMonitor.onTrace with cx that > > runs the trace? Filed followup bug 488874.

Brendan Eich [:brendan]

Comment 39

•

16 years ago

Sharing a global among threads is not a supported use-case. Deprecate, obsolete, remove support for it -- starting here. Does any embedder actually do this? The language has no synchronization primitives. /be

Jason Orendorff [:jorendorff]

Assignee

Comment 40

•

16 years ago

(In reply to comment #39) > Sharing a global among threads is not a supported use-case. Deprecate, > obsolete, remove support for it -- starting here. OK! > Does any embedder actually do this? The language has no synchronization > primitives. In embeddings that share any objects across threads, I would be surprised if they weren't doing it accidentally, by sharing objects whose methods refer to globals. (`scatter` too, but I won't miss it.)

Brendan Eich [:brendan]

Comment 41

•

16 years ago

scatter can adapt -- a torture test using a shared object with many threads mutating is a good test. It's a lousy embedding model, tho! /be

Igor Bukanov

Comment 42

•

16 years ago

(In reply to comment #38) > I think we would still have to flush the globals to the global object in order > to avoid bugs in embeddings that can share a global object among threads. So > this would not succeed in reducing the amount of complexity. This is not for this bug. This one is about missing DeepBail due to code that leaves the request temporarily allowing the GC to proceed on other threads. Since the problem is in the GC itself, calling DeepBail for all the threads from the GC itself will solve this bug once and for all without doing unnecessary bail calls.

Jason Orendorff [:jorendorff]

Assignee

Comment 43

•

16 years ago

I totally disagree with the line of reasoning in comment 42. The bug summary is easily changed. We don't have to let it dictate what we do here. Here's my concern. Brendan has said "a trace always runs to completion without yielding its request" as though this were a design invariant (in bug 419537, but I've also heard him mention this on IRC). If we need to have that rule, we should revise the bug summary and use the approach in v4. If not, comment 34 is the way to go.

Brendan Eich [:brendan]

Comment 44

•

16 years ago

When I say it out loud, invariants vary -- makes me superstitious ;-). We can use one bug or two. It's up to you guys. I agree the bug summary talks about the invariant more clearly (perhaps it could be clearer) than about just the GC forcing threads off trace. We should certainly consolidate common code. Don't think anyone disagrees there. /be

Igor Bukanov

Comment 45

•

16 years ago

Regarding shared-among-threads globals: if this is no longer supported, we should assert on that and replace shell's scatter with something that is similar to message-passing interface of DOM thread workers. That would be extremely helpful in debugging worker's issues.

Brendan Eich [:brendan]

Comment 46

•

16 years ago

Comment on attachment 373331 [details] [diff] [review] v4 I would have cut to the chase and changed onTrace to a JSContext *, to avoid trouble groveling through a long JSThread.contextList (which happens in Firefox). /be

Attachment #373331 - Flags: review?(brendan) → review+

Andreas Gal :gal

Comment 47

•

16 years ago

http://hg.mozilla.org/tracemonkey/rev/5d0af2376447

Whiteboard: fixed-in-tracemonkey

Robert Sayre

Comment 48

•

16 years ago

http://hg.mozilla.org/mozilla-central/rev/5d0af2376447

Status: REOPENED → RESOLVED

Closed: 16 years ago → 16 years ago

Resolution: --- → FIXED

Michael Kraft [:morac]

Comment 49

•

16 years ago

Can anyone test if applying this patch to the 1.9.1 branch prevents the testcase in bug 484590 from crashing the Shiretoko (3.5b4pre) nightly load?

Robert Sayre

Comment 50

•

16 years ago

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/e985746d2973

Keywords: fixed1.9.1

v1 16 years ago Jason Orendorff [:jorendorff] 729 bytes, patch		Details \| Diff \| Splinter Review
v2 16 years ago Jason Orendorff [:jorendorff] 2.45 KB, patch	brendan : review+	Details \| Diff \| Splinter Review
v3 16 years ago Jason Orendorff [:jorendorff] 4.80 KB, patch	brendan : review-	Details \| Diff \| Splinter Review
interdiff v3 to v4 16 years ago Jason Orendorff [:jorendorff] 2.02 KB, patch		Details \| Diff \| Splinter Review
v4 16 years ago Jason Orendorff [:jorendorff] 6.82 KB, patch	brendan : review+	Details \| Diff \| Splinter Review