Closed Bug 589193 Opened 14 years ago Closed 14 years ago

JM: move interruptFlags into JSCompartment

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(2 files, 4 obsolete files)

Every time a loop is entered or call is made, JM checks the context's thread's interrupt flags.  Currently, this requires a two loads (cx, then cx->thread or cx->runtime depending on JS_THREADSAFE) then a combined load/test of the interruptFlags.  If the interruptFlags were instead a field on VMFrame, only the load/test would be required.  From testing on the SS harness (branching on f.cx == NULL), this looks to be worth about 5ms on AWFY.
Revised approach for this bug.  If the interrupt flags are on the compartment then its address can be baked in directly for both JM and TM.
Summary: JM: move interruptFlags into VMFrame → JM: move interruptFlags into JSCompartment
Is that better than a stack read?
Just tested this, and I don't see any difference on the SS harness between using a compartment address and a stack address.
Stack might be better on x64/arm where its harder to load from an absolute address
Yeah, would need to do a move imm->reg + load/test on those platforms.  Can test x64, are there ARM machines around?  The only issue with keeping it on the stack is that TM needs to poll the flags too and should be able to continue baking in an address -> extra complexity with storing the flags in two places that can go out of sync.
Attached patch interrupt patch (obsolete) — Splinter Review
This patch moves interruptFlags from JSThreadData into JSCompartment.  Passes tests and trace-tests, but that doesn't mean much; how should I test this the interrupt mechanism?  Also, who should review this?  One issue is that two methods which iterate over and trigger callbacks on all threads (js_TriggerAllOperationCallbacks and BeginGCSession) now iterate over all compartments, even ones without active threads.  Near as I can tell this is benign, but I don't know.

For x64, I compared performance between a move+load/test compartment access (as in the patch) and a dummy access to the VMFrame; the version in the patch consistently outperformed the stack access by 2ms in the SS harness (I don't know why).
Assignee: general → bhackett1024
Aside: I've studied the perf of the slow script checks in JM and TM in the past. I've typically found that the cost is small but nonzero. Also, the cost is hard to understand: moving code or making small changes often has surprising effects. For example, WebKit's check is in a different position, and at least for the empty loop, if I move our (different) check to that position, we get a small win. I suspect that the difference is less in large loops than it is in small/empty loops, but I don't think I ever collected any direct evidence for that claim.
Yeah, I've also seen weird behavior and don't know how well microbenchmark perf translates to real perf with the slow script checks.  I was using the SS harness in aggregate for this patch, and there at least the speedups are (mostly) in line with what I'd expect.
Blocks: 587707
Attachment #467880 - Flags: review?(igor)
(In reply to comment #0)
> Every time a loop is entered or call is made, JM checks the context's thread's
> interrupt flags.  Currently, this requires a two loads (cx, then cx->thread or
> cx->runtime depending on JS_THREADSAFE) then a combined load/test of the
> interruptFlags. 

The flags are in JSThreadData and that can be loaded as a constant as we do not support a script running on different threads. So what prevents just embedding &JSThreadData.interruptFlag in the generated code the same way trace jit does?
My understanding (from an IRC conversation with jorendorff, brendan and gal) is that while two threads cannot run on a single compartment simultaneously, it is possible that a thread can run on a compartment, finish, and then a different thread resume execution on that compartment later.

This works fine for the tracer when baking in &JSThreadData.interruptFlags, because its data is stored per-thread (I think --- JSThreadData.traceMonitor?).  It wouldn't work for the method JIT, which stores data on the scripts themselves.  This data could then be used by different threads (and different JSThreadDatas), just not simultaneously.

Putting the flags on the compartment would let both the tracer and method JIT bake in constants, since scripts can't migrate across compartments.
(In reply to comment #10)
> My understanding (from an IRC conversation with jorendorff, brendan and gal) is
> that while two threads cannot run on a single compartment simultaneously, it is
> possible that a thread can run on a compartment, finish, and then a different
> thread resume execution on that compartment later.

For the bug 554866 I have an early patch that allows to migrate JSThread to a different native thread. That would fix performance issues with web workers where a single web worker can indeed run on a different native threads. So at least for FF embedding a pointer into JSThread would work just fine with that bug fixed.
Comment on attachment 467880 [details] [diff] [review]
interrupt patch

>diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp
> JS_PUBLIC_API(void)
> JS_TriggerOperationCallback(JSContext *cx)
>+    JS_ASSERT(cx->compartment);
>+    cx->compartment->triggerOperationCallback();

This is wrong. First cx->compartment may change at any moment so proper locking must be done here. Second the semantics of JS_TriggerOperationCallback is
to trigger the callback for *any* context running on this thread. So the code must find all compartments bound to cx->thread and trigger the callback for them. 

Also what if a thread would create a new compartment and run long-running script on it righter after JS_TriggerOperationCallback is called? In such case the new compartment would not have the interrupt flag set and the script would not be interrupted. 


> void
> js_TriggerAllOperationCallbacks(JSRuntime *rt, JSBool gcLocked)
> {
...
>+    for (JSCompartment **c = rt->compartments.begin(); c != rt->compartments.end(); ++c)
>+        (*c)->triggerOperationCallback();
> }

As with JS_TriggerOperationCallback this is problematic if a new compartment is created and run ill-looping script on some thread right after js_TriggerAllOperationCallbacks.
Attachment #467880 - Flags: review?(igor) → review-
(In reply to comment #11)
> For the bug 554866 I have an early patch that allows to migrate JSThread to a
> different native thread. That would fix performance issues with web workers
> where a single web worker can indeed run on a different native threads. So at
> least for FF embedding a pointer into JSThread would work just fine with that
> bug fixed.

When do you think this will be ready?  If 554866 will be done soon then that's a good solution, but if it's a ways off it'd be nice to fix this perf issue quickly.
(In reply to comment #12)
> Comment on attachment 467880 [details] [diff] [review]
> interrupt patch
> 
> >diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp
> > JS_PUBLIC_API(void)
> > JS_TriggerOperationCallback(JSContext *cx)
> >+    JS_ASSERT(cx->compartment);
> >+    cx->compartment->triggerOperationCallback();
> 
> This is wrong. First cx->compartment may change at any moment so proper locking
> must be done here. Second the semantics of JS_TriggerOperationCallback is
> to trigger the callback for *any* context running on this thread. So the code
> must find all compartments bound to cx->thread and trigger the callback for
> them. 
>
> Also what if a thread would create a new compartment and run long-running
> script on it righter after JS_TriggerOperationCallback is called? In such case
> the new compartment would not have the interrupt flag set and the script would
> not be interrupted. 

What is proper locking here?  The old version of JS_TriggerOperationCallback has comments indicating it can race with JS_ClearContextThread, but does not do any locking, and I don't see any locks in use when cx->compartment is written.  Related question: can JS_TriggerOperationCallback race with JS_DestroyContext?

If atomic accesses are sufficient, would this scheme work to avoid races between TriggerOperationCallback and code changing cx->compartment:

JS_TriggerOperationCallback:

ncompartment = cx->compartment; // atomic
do {
  compartment = ncompartment;
  compartment->interruptFlags = 1; // atomic
  ncompartment = cx->compartment; // atomic
} while (ncompartment != compartment);

AutoEnterCompartment (and related):

compartment = cx->compartment;
cx->compartment = newCompartment; // atomic
if (compartment->interruptFlags) // atomic
  triggerCallback();
(In reply to comment #14)
> What is proper locking here?  The old version of JS_TriggerOperationCallback
> has comments indicating it can race with JS_ClearContextThread, but does not do
> any locking,

The locking is not necessary as the caller must ensure that the passed cx is valid one and the calling thread must be in a request. This is sufficient to ensure safe access to cx->thread.

> and I don't see any locks in use when cx->compartment is written.

If we do not have a lock there, then we need to add one for the proposed schema to work. 

> 
> If atomic accesses are sufficient, would this scheme work to avoid races
> between TriggerOperationCallback and code changing cx->compartment:

How this would address the need to enumerate all compartments for the given thread atomically? If we allow compartment to migrate from one thread to another then we must ensure serialized access to cx->compartment. But I guess this also point to implicit assumption that the container cannot migrate across different JSThread instances. If so, then we should really fix the bug 554866. I will try to have a working patch in the next couple of days.
Fixing bug 554866 would take some time as it would require to make web workers thread aware so they could be grouped according to the JSThread they use for caching. So lets try to fix this bug while making sure that JS_TriggerOperationCallback calls are never lost. The latter property is essential for web workers as the main thread do not issue repeated JS_TriggerOperationCallback calls to stop or suspend them for the GC on the main thread to proceed.

As I wrote above the main problem with having the flag in the compartment is that at the moment of JS_TriggerOperationCallback the target thread may not have any compartment yet running on it or may just about to change the compartment. AFAICS one way to fix this is to duplicate the interrupt flag both in JSThread and in JSCompartment.

Then JSTriggerOperationCallback would set both the interrupt flag for the thread and for the compartment if any. To make sure that we do not loose the interrupt flag for the thread any code that enters the compartment would also need to set the interrupt flag for the compartment. JSThread would also need to a have field pointing to the currently executing compartment so JSTriggerApplicationCallback can detect the current compartment for the thread and interrupt it.
Attached patch revised patch (obsolete) — Splinter Review
Comment 16 sounds like a good plan, here's a patch.  This still has issues that reads and writes of JSThreadData.compartment are not atomic --- on a multiprocessor system it's possible to lose an update to the active compartment's flags if some reads/writes are not immediately visible (commented in the patch).  Is there an API for doing atomic reads/writes of pointer-width values?
Attachment #467880 - Attachment is obsolete: true
Attachment #469060 - Flags: review?(igor)
(In reply to comment #17)
> This still has issues that
> reads and writes of JSThreadData.compartment are not atomic --- on a
> multiprocessor system it's possible to lose an update to the active
> compartment's flags if some reads/writes are not immediately visible (commented
> in the patch).

This should not be a problem. If some loopy code would not see the flag set immediately then the next loop iteration would pick it up as writes should eventually propagate.
(In reply to comment #18)
> This should not be a problem. If some loopy code would not see the flag set
> immediately then the next loop iteration would pick it up as writes should
> eventually propagate.

The problem though is that if the reads/writes of the compartment are not synchronous then the wrong compartment's flags can get updated, which won't be picked up by the loop checks.

The strategy used in the patch is to do lock-free operations that ensure that if a triggerOperationCallback occurs simultaneously with a compartment change, the new compartment's flags will be updated.

Trigger callback:

A1 thread->flags = true
A2 compartment = thread->compartment
A3 compartment->flags = true

Set compartment:

B1 thread->compartment = new
B2 flags = thread->flags
B3 if (flags) new->flags = true

A3 and B3 don't need to be done synchronously.  If A1/A2/B1/B2 are synchronous (using shared state), then no matter how they are interleaved new->flags will be updated.  If they are *not* synchronous, the compartment might get updated without having its flags set. e.g. for this basic interleaving:

A1
A2
A3
B1
B2
B3

The read at B2 must see the write from A1 for the new compartment to have its flags updated.
Comment on attachment 469060 [details] [diff] [review]
revised patch

>diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp
>+JS_ALWAYS_INLINE void
>+JSThreadData::triggerOperationCallback()
>+{
>+    /*
>+     * Use JS_ATOMIC_SET in the hope that it will make sure the write will
>+     * become immediately visible to other processors polling the flag.
>+     * Note that we only care about visibility here, not read/write
>+     * ordering.
>+     */
>+    JS_ATOMIC_SET_MASK(&interruptFlags, INTERRUPT_OPERATION_CALLBACK);
>+
>+    /*
>+     * FIXME: Should be atomic read, so that any value written by another
>+     * processor is seen immediately.
>+     */
>+    JSCompartment *compartment = this->compartment;
>+
>+    /*
>+     * The compartment may be out of phase here with this thread data
>+     * if it has since entered a different compartment.  This is OK,
>+     * as the write to the interrupt flags on the thread itself will
>+     * be picked up when the callback called.
>+     */
>+    if (compartment)
>+        JS_ATOMIC_SET_MASK(&compartment->interruptFlags, INTERRUPT_OPERATION_CALLBACK);


This is problematic indeed (and I was wrong in the previous my comment) as another thread may see the writes here reordered. We still end up loosing the callback invocation if the target thread changes the compartments right at this moment so compartment->interruptFlags would be set on the old compartment after thread->compartment is updated.

We can fix that with read/write barriers here, it would be nice. But I have bo idea how to do that in a portable way. Plus even the read barrier on each compartment enter could be expensive and wipe out all the gains from having interruptFlags in JSCompartment. So lets slow down js_TriggerOperationCallback instead.

The idea is to add JSThread pointer to the compartment. Then the code that enters the compartment would check if compartment->thread matches the current one. If not, it would take the GC lock, set the thread and copy the interruptFlags. If yes and the thread is the same as previously, no lock is taken and interruptFlags is *not* copied from JSThread.

Then js_TriggerOperationCallback, in addition to setting JSThread::interruptMask, would also take the GC lock, loop over all compartments and for those that have matching thread, it would set the interruptMask. Effectively this replaces JSThread::compartment in the patch with JSCompartment::thread.

This may not scale well with thousands of compartments, but lets worry about that when we would get reports about that.
(In reply to comment #20)
> We can fix that with read/write barriers here, it would be nice. But I have bo
> idea how to do that in a portable way. Plus even the read barrier on each
> compartment enter could be expensive and wipe out all the gains from having
> interruptFlags in JSCompartment. So lets slow down js_TriggerOperationCallback
> instead.
> 
> The idea is to add JSThread pointer to the compartment. Then the code that
> enters the compartment would check if compartment->thread matches the current
> one. If not, it would take the GC lock, set the thread and copy the
> interruptFlags. If yes and the thread is the same as previously, no lock is
> taken and interruptFlags is *not* copied from JSThread.
> 
> Then js_TriggerOperationCallback, in addition to setting
> JSThread::interruptMask, would also take the GC lock, loop over all
> compartments and for those that have matching thread, it would set the
> interruptMask. Effectively this replaces JSThread::compartment in the patch
> with JSCompartment::thread.
> 
> This may not scale well with thousands of compartments, but lets worry about
> that when we would get reports about that.

This sounds good, will work on that.  I take it most cross-compartment calls are to a compartment already living on the same thread?

It'd be nice to know how expensive the barriers are; the locking libraries must use them, but I don't know if there are APIs for them.  At least in principle a barrier shouldn't cost more than a main memory access.
(In reply to comment #21)
> This sounds good, will work on that.  I take it most cross-compartment calls
> are to a compartment already living on the same thread?

We assume that a compartment can only be accessed from one thread. So the answer should be yes.

> 
> It'd be nice to know how expensive the barriers are; the locking libraries must
> use them, but I don't know if there are APIs for them.  At least in principle a
> barrier shouldn't cost more than a main memory access.

That could be hundreds of cycles on a fast CPU.
Ack, problem with comment 20.  If compartment->thread can be written by anyone holding the GC lock, there's not really a safe way to test it without holding the GC lock.

Simpler alternative that should keep the perf gain without needing any more locking:

Instead of putting interruptFlags on JSCompartment, keep it only on JSThread and put an interruptCounter on JSRuntime that keeps track of the number of threads that still need to service an interrupt.  Can be mutated with JS_ATOMIC_INCREMENT/DECREMENT (these will never lose an update I hope!), and tests for non-zero can be asynchronous.  This address is baked into the method JIT code, which will look like:

test-interruptCounter
branch-nonzero-to-slow-path
return-site:
run-rest-of-script

slow-path:
load-cx
load-thread
test-interruptFlags
branch-zero-to-return-site:
call-interrupt-handler
jump-to-return-site

While there is still an interrupt on some thread that needs to be serviced, other threads will run slower by a very tiny margin.
(In reply to comment #23)
> Ack, problem with comment 20.  If compartment->thread can be written by anyone
> holding the GC lock, there's not really a safe way to test it without holding
> the GC lock.

I do not see it. The test is like compartment->thread == current_thread when the compartment is entered under assumption that the compartment can only be accessed from one thread (this the caller responsibility) with lock taken if the test fails. I do not see how this could be unsafe.

> Instead of putting interruptFlags on JSCompartment, keep it only on JSThread
> and put an interruptCounter on JSRuntime that keeps track of the number of
> threads that still need to service an interrupt. 

That is a nice idea. We would need an atomic counter on JSThread as well so InvokeOperationCallback would know how much to subtract from the runtime counter in case js_TriggerOperationCallback was called several times before the callback is invoked. I.e. the rule is that we must not miss js_TriggerOperationCallback but several js_TriggerOperationCallback should be coalesced and lead to just one invocation of the callback.

Another caveat is that js_TriggerOperationCallback could be called when the thread becomes idle. In this case the runtime counter would be seen set from other threads for rather lengthy period of time. The workaround is to decrease the runtime counter whenever the thread leaves the request and to increase it again when the thread is resumed.

Again, with hundredths of threads this could be problematic, but lets deal with that when we get reports. 

> JS_ATOMIC_INCREMENT/DECREMENT (these will never lose an update I hope!), and
> tests for non-zero can be asynchronous.  This address is baked into the method
> JIT code, which will look like:

I think we can just delegate the slow path to InvokeOperationCallback at least for the method jit.
Hm, but what about moving the flag back to JSContext where it was before?
That would still need an extra load for the context for both JM and TM, right?  Problem with these slow script checks is that we do them so often, every memory access ends up costing 3ms on SS.
(In reply to comment #26)
> That would still need an extra load for the context for both JM and TM, right?

Yes, so lets forget about that suggestion.
 
> Problem with these slow script checks is that we do them so often, every memory
> access ends up costing 3ms on SS.

At the beginning of TM a run-time code patching was considered to avoid the checks. But the complexity and cost of locking was considered too great to worth the troubles. Perhaps we should reconsider it and look if not into the code patching but at least mutation of native registers to direct the code towards the counter check? That is, of cause, for another bug.
Attached patch patch using counters (obsolete) — Splinter Review
This patch uses counters on JSThreadData and JSRuntime for the number of unhandled interrupt requests in that thread / across all threads.  JM polls the runtime's counter, and if that hits does another OOL check for the counter on the current thread before going to the stub call.  I get these times for SS:

Existing: 364.3ms
New (no interrupts): 358.9ms
New (always interrupts on other threads): 368.9ms
New (always interrupts on other threads, with no OOL fast path): 407.4ms
Attachment #469060 - Attachment is obsolete: true
Attachment #469577 - Flags: review?(igor)
Attachment #469577 - Flags: review?(dvander)
Attachment #469060 - Flags: review?(igor)
Comment on attachment 469577 [details] [diff] [review]
patch using counters

>diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp
>@@ -621,16 +621,19 @@ JSRuntime::init(uint32 maxbytes)
>+
>+    interruptCounter = 0;

This is not necessary - all runtime members are zeroed initially.

> JSBool
> js_InvokeOperationCallback(JSContext *cx)
> {
>+    JSRuntime *rt = cx->runtime;
>+    JSThreadData *td = JS_THREAD_DATA(cx);
>+
>     JS_ASSERT_REQUEST_DEPTH(cx);
>-    JS_ASSERT(JS_THREAD_DATA(cx)->interruptFlags & JSThreadData::INTERRUPT_OPERATION_CALLBACK);
>+    JS_ASSERT(td->interruptCounter != 0);
> 
>     /*
>-     * Reset the callback flag first, then yield. If another thread is racing
>+     * Reset the callback counter first, then yield. If another thread is racing
>      * us here we will accumulate another callback request which will be
>      * serviced at the next opportunity.
>      */
>-    JS_ATOMIC_CLEAR_MASK(&JS_THREAD_DATA(cx)->interruptFlags,
>-                         JSThreadData::INTERRUPT_OPERATION_CALLBACK);
>+    while (td->interruptCounter) {
>+        JS_ATOMIC_DECREMENT(&td->interruptCounter);
>+        JS_ATOMIC_DECREMENT(&rt->interruptCounter);
>+    }

We need a real lock here like gcLock and in triggerOperationCallback around the increments and decrements. Otherwise another thread can call the trigger right in the middle of decrements in the above loop leading to the counters going out of sync. 

The patch also do not address the issue of always set runtime counter when the trigger is called when some thread is about to leave the request and suspend. The above numbers shows that the corresponding slowdown is real if small.
Attachment #469577 - Flags: review?(igor) → review-
(In reply to comment #29)
> We need a real lock here like gcLock and in triggerOperationCallback around the
> increments and decrements. Otherwise another thread can call the trigger right
> in the middle of decrements in the above loop leading to the counters going out
> of sync. 

The counters will be temporarily out of sync regardless in the middle of this loop, but that's OK as long as the correct state is restored later.  Since increments and decrements are one-to-one with each other, this will happen even without locking, provided that the JS_ATOMIC_* operations are indeed atomic.

> The patch also do not address the issue of always set runtime counter when the
> trigger is called when some thread is about to leave the request and suspend.
> The above numbers shows that the corresponding slowdown is real if small.

I don't see how the workaround in comment 24 would work for this, when a thread could receive an arbitrary number of interrupt requests as it is suspending.  Is there a way to tell from a JSThreadData whether it is suspended?  If so this could be fixed by locking the GC whenever the thread is suspended or an interrupt is triggered, adjusting the runtime's counter so it doesn't reflect the counter in suspended threads.
(In reply to comment #30)
> (In reply to comment #29)
> > We need a real lock here like gcLock and in triggerOperationCallback around the
> > increments and decrements. Otherwise another thread can call the trigger right
> > in the middle of decrements in the above loop leading to the counters going out
> > of sync. 
> 
> The counters will be temporarily out of sync regardless in the middle of this
> loop, but that's OK as long as the correct state is restored later.  Since
> increments and decrements are one-to-one with each other, this will happen even
> without locking, provided that the JS_ATOMIC_* operations are indeed atomic.

You right - at worst after the counter decreasing loop we would have a disbalance like sum_all_JSThread::counter != rt->interruptCounter. But that would be restored when all the threads finishes their trigger calls.

> I don't see how the workaround in comment 24 would work for this, when a thread
> could receive an arbitrary number of interrupt requests as it is suspending. 

If trigger/invoke calls would take the GC lock when increasing/decreasing the counter, then the trigger can also see that the thread is not in a request and do not touch the runtime counter if so. This assumes that request exit/enter code would also decrease/increase the runtime counter.

But the lock could also make thinks simpler as the code could use a flag in JSThread and increment/decrement the runtime counter only on flag transitions.

Note also that the GC lock is taken in any case in js_InvokeOperationCallback so with a proper implementation the locking penalty would be only in the trigger call. 

> Is there a way to tell from a JSThreadData whether it is suspended?

JSThread::requestContext is null if thread is not in a request. But that would change after landing the bug 477999.
Attached patch revised patch (obsolete) — Splinter Review
This patch protects callback triggers with the GC lock, so that the thread's request depth can be tested for non-zero and to establish invariant that runtime->interruptCounter is the sum of the interrupt counters for all contexts with active requests.
Attachment #469577 - Attachment is obsolete: true
Attachment #476015 - Flags: review?(igor)
Attachment #476015 - Flags: review?(dvander)
Attachment #469577 - Flags: review?(dvander)
(In reply to comment #32)
> Created attachment 476015 [details] [diff] [review]
> revised patch

Quick question: why the patch moves requestDepth into JSThreadData?
JSThreadData.triggerOperationCallback needs to know the request depth to determine whether to update runtime->interruptCounter, but the JSThread itself isn't always known when triggering.  Mainly js_TriggerAllOperationCallbacks, which iterates over the runtime's ThreadDatas.

Also, just noticed (at least) one caller of JS_TriggerOperationCallback already has the GC lock held when calling (XPCJSRuntime::WatchdogMain).  Is this OK, or should JS_TriggerOperationCallback have a 'JSBool gcLocked' parameter?
Comment on attachment 476015 [details] [diff] [review]
revised patch

>@@ -1612,16 +1609,19 @@ struct JSRuntime {
> 
>+    /* Number of unhandled interrupts across threads with active requests. */
>+    volatile jsword     interruptCounter;
>+

The field should be inside JS_THREADSAFE ifdef.

>+JS_ALWAYS_INLINE void
>+JSThreadData::triggerOperationCallback(JSRuntime *rt)
>+{
>+    /*
>+     * Use JS_ATOMIC_INCREMENT in the hope that it will make sure the write will
>+     * become immediately visible to other processors polling the flag.
>+     * Note that we only care about visibility here, not read/write
>+     * ordering.
>+     */
>+    JS_ATOMIC_INCREMENT(&interruptCounter);
>+
>+    /* rt->interruptCounter does not reflect suspended threads. */
>+#ifdef JS_THREADSAFE
>+    if (requestDepth != 0)
>+#endif
>+        JS_ATOMIC_INCREMENT(&rt->interruptCounter);

This will race against the thread entering or leaving the request that can change requestDepth. I do not see how one can avoid taking the GC lock here. And if one takes a lock, then JSThreadData::interruptCounter can be a boolean flag again.
Attachment #476015 - Flags: review?(igor) → review-
All callers hold the GC lock before calling JSThreadData.triggerOperationCallback.  Comment 34 is asking whether the GC lock is reentrant, i.e. it is OK to take when already held.

Currently JM checks the interrupt flags even in non-JS_THREADSAFE builds.  Is this needed at all?  If it is needed, the runtime needs an interrupt counter for the JIT to test.
(In reply to comment #34)
> Also, just noticed (at least) one caller of JS_TriggerOperationCallback already
> has the GC lock held when calling (XPCJSRuntime::WatchdogMain).  Is this OK, or
> should JS_TriggerOperationCallback have a 'JSBool gcLocked' parameter?

We should not make JS_TriggerOperationCallback taking any gclocked parameter. If necessary, just make a JS_FRIEND_API function that the watchdog can call. That friend can take the gcLocked param.
(In reply to comment #36)
> Currently JM checks the interrupt flags even in non-JS_THREADSAFE builds.  Is
> this needed at all? 

Yes, it is needed.

> If it is needed, the runtime needs an interrupt counter
> for the JIT to test.

What is wrong with JSThreadData::interruptCounter shared with thread and non-thread version?
(In reply to comment #38)
> What is wrong with JSThreadData::interruptCounter shared with thread and
> non-thread version?

JM can't specialize the JIT code on a particular JSThreadData, so it currently has to load the context and thread first.  Checking the runtime allows it to bake in an address and avoid 2 loads x many millions of interrupt checks.
(In reply to comment #39)
> (In reply to comment #38)
> > What is wrong with JSThreadData::interruptCounter shared with thread and
> > non-thread version?
> 
> JM can't specialize the JIT code on a particular JSThreadData, so it currently
> has to load the context and thread first.

In !JS_THREADSAFE builds there is one JSTHreadData so JM could just embed the corresponding pointer.
I think we should just move the interrupt flag into the compartment. It will be more feature proof especially in view of the compartment GC when the code may want to trigger compartment-specific GC.
That sounds good, and also comes full circle to the first patch (and bug title).  For the behavior you describe in comment 12 (trigger affects on all contexts on the calling threads), could this be done with:

- If JS_THREADSAFE, add owning JSThread* to JSCompartment, protected by GC lock.
- When triggering a single operation callback, take GC lock, iterate and set for all compartments in the runtime for ones matching the specified thread.
- When triggering all operation callbacks, take GC lock, iterate and set for all compartments in the runtime.

With many compartments, the second could be made faster by keeping a vector of compartments on JSThread protected by GC lock, then only that list would be iterated.

Does this sound right?
(In reply to comment #42)
> With many compartments, the second could be made faster by keeping a vector of
> compartments on JSThread protected by GC lock, then only that list would be
> iterated.

With  vector there is a scalability issue about removing a
compartment from the vector. As the compartment can only
present on a single thread, a doubly-linked list would be
a better option. That list would also allow o reset the
interrupt flag quickly from the callback invocation.

But then again this increases the complexity and raises the
issues like should operation callback be called per each
interrupted compartment? So what about just storing the
address of JSThread::interruptFlag in the container for now
without moving the interrupt flag? The container will update
the field when it is entered. This will eliminate one load in
JM without changing any operation callback semantics.

Farther optimizations can be done in another bug while paying
attention to the browser needs. The observation is that it may
not be necessary to ensure the exact current semantics in the
new world of compartments. 

For example, the web workers always know the current compartment
that is executed on the current thread so they may need API to
interrupt just that compartment. For the main thread we should
consider what exactly long-running script means in presence of
cross-compartments calls. Perhaps it is always possible to
enumerate the compartments that can contribute to a long-running
scripts?
Every load in the interrupt check path costs about 2.5ms on SS, which is a big number given the margins we're working with.  It is important to be as fast as possible here.

The patch in comment 32 preserves the semantics that the callback is only called once after doing one or more triggers.  That patch is correct as is (except maybe for XPCJSRuntime::WatchdogMain), why not go with that strategy and if the semantics of JS_TriggerOperationCallback change sometime in the future then the code can change with it.
(In reply to comment #44)
> Every load in the interrupt check path costs about 2.5ms on SS, which is a big
> number given the margins we're working with.  It is important to be as fast as
> possible here.

But that patch also increases the code footprint and the corresponding code cache pressure as for optimal performance it includes checks for both rt->interruptFlags and cx->thread->interruptFlags. That is why I suggest to try that idea of storing the address of the current &thread->interruptFlags in the compartment. Yes, it will add an extra load, but that would be a load with 0 offset and the jited code would be shorter. 

If that would be still slower, then lets fix the synchronization issues in the last patch.
That would still be slower.  JM emits code into two different buffers, an inline buffer and out of line buffer.  The check for rt->interruptFlags is in the inline buffer, the check for cx->thread->interruptFlags is in the out of line buffer.  Adding code to the out of line buffer doesn't impact the footprint of the loop itself, and shouldn't impact performance except when that out of line code actually runs.
(In reply to comment #46)
> That would still be slower. 

OK, lets fix then synchronization issues with JSRuntime::interrupt.
Attached patch patchSplinter Review
This changes JSThreadData.interruptCounter back to interruptFlags (protected by the GC lock), and adds JSRuntime.interruptCounter only under JS_THREADSAFE.
Attachment #476015 - Attachment is obsolete: true
Attachment #476383 - Flags: review?(igor)
Attachment #476383 - Flags: review?(dvander)
Attachment #476015 - Flags: review?(dvander)
Comment on attachment 476383 [details] [diff] [review]
patch

>+    RegisterID reg = frame.allocReg();

This reg doesn't get used on x86 & ARM, in non-TS builds, but it doesn't matter, right? We can't cause spills here because interrupts are at safe points?

>+#ifdef JS_CPU_X86
>+    Jump jump = masm.branch32(Assembler::NotEqual, AbsoluteAddress(interrupt), Imm32(0));
>+#else
>+    /* Handle processors that can't load from absolute addresses. */
>+    masm.move(ImmPtr(interrupt), reg);
>+    Jump jump = masm.branchTest32(Assembler::NonZero, Address(reg, 0));
>+#endif

You can include JS_CPU_ARM with x86, from the looks of MacroAssemblerARM.

r=me on method JIT changes
Attachment #476383 - Flags: review?(dvander) → review+
Comment on attachment 476383 [details] [diff] [review]
patch

>+JS_ALWAYS_INLINE void
>+JSThreadData::triggerOperationCallback(JSRuntime *rt)
>+{
>+    /*
>+     * Use JS_ATOMIC_SET in the hope that it will make sure the write will
>+     * become immediately visible to other processors polling the flag.
>+     * Note that we only care about visibility here, not read/write
>+     * ordering: this field can only be written with the GC lock held.
>+     */
>+    if (interruptFlags)
>+        return;
>+    JS_ATOMIC_SET(&interruptFlags, 1);
>+
>+#ifdef JS_THREADSAFE
>+    /* rt->interruptCounter does not reflect suspended threads. */
>+    if (requestDepth != 0)
>+        JS_ATOMIC_INCREMENT(&rt->interruptCounter);
>+#endif

Nit: update the comments like:

>+    /*
>+     * Use JS_ATOMIC_SET and JS_ATOMIC_INCREMENT in the hope that it will make sure the write will
>+     * become immediately visible to other processors polling the fields.
>+     * Note that we only care about visibility here, not read/write
>+     * ordering: the fields can only be written with the GC lock held.
>+     */

Thanks for a nice work here!
Attachment #476383 - Flags: review?(igor) → review+
Depends on: 598279
I see strict aliasing warnings now:

/Users/sayrer/dev/tm/js/src/jscntxt.h: In member function ‘void JSThreadData::triggerOperationCallback(JSRuntime*)’:
/Users/sayrer/dev/tm/js/src/jscntxt.h:3247: warning: dereferencing type-punned pointer will break strict-aliasing rules
/Users/sayrer/dev/tm/js/src/jscntxt.h:3252: warning: dereferencing type-punned pointer will break strict-aliasing rules
Attached patch jsint patchSplinter Review
This should fix the warnings.  For some reason JS_ATOMIC_SET_MASK takes a word, but JS_ATOMIC_SET and JS_ATOMIC_INCREMENT take int32s (shouldn't be a current problem for 64-bit platforms, the top 32 bits are always zeroed and both JITs are already doing 32-bit loads).
Attachment #477189 - Flags: review?(igor)
Attachment #477189 - Flags: review?(igor) → review+
Would prefer int32, no big deal.

/be
http://hg.mozilla.org/tracemonkey/rev/c99a94c310f1

Also, d20abbebe373 turned out to be 5ms on SS (as predicted).
http://hg.mozilla.org/mozilla-central/rev/c99a94c310f1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
It looks like the followup fix for this has somehow gotten backed out.  At least rev 6d88ce7787ad from today has the jswords there, not the 32-bit ints...
Oh, nevermind.  This just got relanded, didn't it?  Without the jsint bit...
Sorry, just relanded the int32 fix.  This was backed out earlier this week as a test for bug 601725, then relanded today.
Blocks: 621140
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: