Closed Bug 465030 Opened 11 years ago Closed 11 years ago

TM: Support terminating long-running scripts without using extra threads or signals.

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Keywords: fixed1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 2 obsolete files)

The bug #450000 adds support for calling operation callback for long-running traces but only when JS_OPERATION_COUNT is false forcing the embedding to implement a watchdog-like functionality like one in bug 453157.

For compatibility and for testing TM should support calls to operation callback even when JS_OPERATION_COUNT is true. For that the trace code should not only check that the operation count is zero but also decrease the count on each backward jump.
No longer depends on: 450000
Depends on: 450000
(In reply to comment #0)
> The bug #450000 adds support for calling operation callback for long-running
> traces but only when JS_OPERATION_COUNT is false forcing the embedding to
> implement a watchdog-like functionality like one in bug 453157.

You say that like it's a bad thing :-P.

> For compatibility

with what? Embeddings that do not use threads, but need trace-JITed iloop watchdogging? The js shell is the only example I can think of, and we could either special-case it, or make it JS_THREADSAFE.

> and for testing

Maybe, but it would be better if we could avoid #if'd modes in our code. We have JS_THREADSAFE already. Experience shows testing suffers on one side of such #ifs, and more than one such produces even more bitrot combination.

/be
(In reply to comment #1)
> > For compatibility
> 
> with what? Embeddings that do not use threads, but need trace-JITed iloop
> watchdogging? 

For compatibility with embeddings that do not wish to write its own watchdogs (which is not so trivial and currently cannot even be done with using only public SM API) and simply want the existing code to continue to work even with JIT enabled. Plus, until the bug 453432 is fixed, the operation callback is a must for GC-scheduling, not only for terminating the long-running scripts. 

> Maybe, but it would be better if we could avoid #if'd modes in our code.

The ifdef is already there, http://hg.mozilla.org/tracemonkey/rev/94ac046a4332 .
Attached patch v1 (obsolete) — Splinter Review
The patch adds operationCount update code before checking for its value. This slows down SunSpider benchmarks by 1% although the slowdown for bits-in-byte is 40%, see the following attachment.

Until the watchdog patch is ready I suggest to take this one. The patch will also allow to measure the real win with the watchdog thread.
Assignee: general → igor
Attachment #355762 - Flags: review?(gal)
Attached file sunspider results (obsolete) —
Summary: TM: Support terminating long-running scripts when JS_OPERATION_COUNT is true. → TM: Support terminating long-running scripts without using extra threads or signals.
Comment on attachment 355762 [details] [diff] [review]
v1

The last time we tried to push the patch it froze TUnit across 3 platforms. That should be fairly easy to reproduce, so I don't think we should give up on the original patch just yet. I am against slowing us down except if absolutely necessary for correctness.

If for product reasons we want the partial fix to make sure we don't miss b3, shaver or damon or sayrer should make that call. Bouncing the review to sayrer.

Technical criticism: why do you re-read operationCount in the guard instead of comparing against the value in "update"? The current code creates a needless store/load stall IMO.
Attachment #355762 - Flags: review?(gal) → review?(sayrer)
If this patch helps land the blocking watchdog patch, then it is ok
(In reply to comment #5)
> (From update of attachment 355762 [details] [diff] [review])
> The last time we tried to push the patch it froze TUnit across 3 platforms.
> That should be fairly easy to reproduce,

It have not froze TUnit on Linux, on Mac TUnit passed once with the patch applied while freezing at the another run, and on Windows TUnit frozen at 2 different places. This will also makes sure that the GC triggered from the branch callback after a trace exit is not responsible for the watchdog problems.

> so I don't think we should give up on
> the original patch just yet. I am against slowing us down except if absolutely
> necessary for correctness.

Well, currently the tracer lacks the functionality that was long provided by SM API. With this this patch everything is restored. As for performance hit it will just wait until the watchdog is ready - the counter update code will be conditional there.
Attachment #355762 - Flags: review?(sayrer) → review+
Comparing against update instead of a new load might reduce the 1%/40% a bit. I will measure once your patch is in.
Attached patch v2Splinter Review
The new patch removes the extra load. With it the overhead of SunSpider goes down to 0.8% with the maximum hit of 22% for bits-in-byte.
Attachment #355762 - Attachment is obsolete: true
Attachment #355763 - Attachment is obsolete: true
Attachment #355778 - Flags: review?(gal)
Attached file sunspider results
Comment on attachment 355778 [details] [diff] [review]
v2

Looks ok, thanks for the fixed nit. + as per sayrer to help landing the dependent watchdog thread patch.
Attachment #355778 - Flags: review?(gal) → review+
It is interesting that if in the patch I changed that  

  lir->ins2i(LIR_sub, counter, JSOW_SCRIPT_JUMP)

into

  lir->ins2i(LIR_sub, counter, 1)

That is, replace 4096 by 1, then the sunspider no longer shows any difference with the patch even if bits-in-byte still shows a considerable slowdown of about 20%. This change still stops long-running scripts but its precision is way off.
Fixing bug dependency.
Blocks: 453157
No longer depends on: 450000
landed in TM - http://hg.mozilla.org/tracemonkey/rev/763b96e81579
Whiteboard: fixed-in-tracemonkey
The proper change set for the landed patch in TM is http://hg.mozilla.org/tracemonkey/rev/2b946c80e545
Just checking: we have no intention of shipping b3 with this kind of regression in microbenchmarks. Right?

/be
(In reply to comment #12)
> It is interesting that if in the patch I changed that  
> 
>   lir->ins2i(LIR_sub, counter, JSOW_SCRIPT_JUMP)
> 
> into
> 
>   lir->ins2i(LIR_sub, counter, 1)
> 
> That is, replace 4096 by 1, then the sunspider no longer shows any difference
> with the patch even if bits-in-byte still shows a considerable slowdown of
> about 20%. This change still stops long-running scripts but its precision is
> way off.

So we take longer to notice long-running scripts -- our "operation weights" (OW!) are wrong. But we incur less overhead. Do I have that right? (Or is this about the larger immediate operand to a sub instruction requiring more x86 instructions to synthesize?)

Did we have a calibration system for the operation weights? I forget.

/be
(In reply to comment #16)
> Just checking: we have no intention of shipping b3 with this kind of regression
> in microbenchmarks. Right?

Right
http://hg.mozilla.org/mozilla-central/rev/763b96e81579
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.