Open Bug 1241964 Opened 8 years ago Updated 2 years ago

Debugger runnables should be able to interrupt running scripts.

Categories

(DevTools :: Debugger, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: ejpbruel, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

Unfortunately, we have some known issues. Unlike scripts on the main thread, worker scripts can contain infinite loops. While the script is inside a loop, no runnables are executed, which means we cannot interact with the debugger. To avoid this, debugger runnables should be able to interrupt running scripts.
Blocks: dbg-worker
This was identified as particularly important for Emscripten, because we are looking to develop a compilation mode where applications could run their own main loops in a worker, in which case their main thread (running in a worker) will not yield back at all throughout the execution of the page.

Eddy, we were wondering in the games+devtools meeting today about the size/complexity of this bug item? Do you have a rough estimate that you could throw in here?
Flags: needinfo?(ejpbruel)
(In reply to Jukka Jylänki from comment #1)
> This was identified as particularly important for Emscripten, because we are
> looking to develop a compilation mode where applications could run their own
> main loops in a worker, in which case their main thread (running in a
> worker) will not yield back at all throughout the execution of the page.
> 
> Eddy, we were wondering in the games+devtools meeting today about the
> size/complexity of this bug item? Do you have a rough estimate that you
> could throw in here?

I expect this to be relatively easy to implement. We already have something called control runnables in workers, which are responsible for things like shutdown. These runnables use the interrupt callback to make sure that they are always handled, even when the worker is running an infinite loop.

Since debugger runnables are their own separate class of runnables, theoretically all we need to do is to invoke the interrupt callback for these runnables as well.
Flags: needinfo?(ejpbruel)
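To make that proposal concrete, here is a minimal single-threaded C++ model of the idea (all names here are illustrative, not the real WorkerPrivate API): dispatching a debugger runnable also requests the interrupt callback, and the interrupt callback drains the debugger queue even though the worker script never returns to the event loop.

```cpp
#include <deque>
#include <functional>
#include <utility>

struct WorkerModel {
  std::deque<std::function<void()>> mDebuggerQueue;
  bool mInterruptRequested = false;

  void DispatchDebuggerRunnable(std::function<void()> aRunnable) {
    mDebuggerQueue.push_back(std::move(aRunnable));
    // Analogous to requesting the interrupt callback on dispatch.
    mInterruptRequested = true;
  }

  // Runs when the engine honors the interrupt request mid-script.
  void InterruptCallback() {
    mInterruptRequested = false;
    while (!mDebuggerQueue.empty()) {
      auto runnable = std::move(mDebuggerQueue.front());
      mDebuggerQueue.pop_front();
      runnable();
    }
  }

  // A bounded stand-in for an infinite loop; real JS checks the
  // interrupt flag at loop heads in much the same way.
  void RunLoopingScript(int aIterations) {
    for (int i = 0; i < aIterations; i++) {
      if (mInterruptRequested) {
        InterruptCallback();
      }
    }
  }
};

int RunDemo() {
  WorkerModel worker;
  int processed = 0;
  worker.DispatchDebuggerRunnable([&] { processed++; });
  worker.DispatchDebuggerRunnable([&] { processed++; });
  // The script never yields to the event loop, yet both debugger
  // runnables get processed via the interrupt callback.
  worker.RunLoopingScript(10);
  return processed;
}
```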
I don't know what your timetable is for this, but I can make a first pass at this bug next week if necessary.
Attached patch patch (obsolete) — Splinter Review
This patch extends the interrupt handler for workers to also process pending debugger runnables.
Attachment #8717465 - Flags: review?(khuey)
The test in the above patch currently has one shortcoming: it works even if we don't request the interrupt callback to be called in DispatchDebuggerRunnable. Apparently, something else in the test triggers the interrupt callback even if we don't explicitly request it.

It would be nice if we could set up the test so that nothing else triggers the interrupt callback. That way, the test will only succeed if we explicitly request the interrupt callback to be called in DispatchDebuggerRunnable.

Kyle, do you have any idea what could trigger the interrupt callback in the test, other than DispatchDebuggerRunnable?
Flags: needinfo?(khuey)
I don't think we want the interrupt handler to process pending debugger runnables.  To give a trivial counterexample, what if we're already running a debugger runnable?  The case of "content script is looping infinitely and we need to preempt it to enter the debugger" needs to be handled separately.

The interrupt callback in your test is triggered by the GC timer firing.  In general I don't think you'll be able to reliably test the interrupt behavior here, but I don't think you actually want the interrupt behavior here either, as stated above.
Flags: needinfo?(khuey)
Comment on attachment 8717465 [details] [diff] [review]
patch

Review of attachment 8717465 [details] [diff] [review]:
-----------------------------------------------------------------

r- per comment 6

What I think you actually want here is a WorkerControlRunnable subclass (thus, a preempting runnable) that checks to see if the debugger is already in control, and if not, tells the Worker to not return to script, but instead enter the debugger.  IMO that should be done by setting a flag that InterruptCallback checks before returning, rather than the runnable itself spinning up the event loop, because we need to keep processing control runnables that end up in the queue *after* the "hand control to the debugger" control runnable.
Attachment #8717465 - Flags: review?(khuey) → review-
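A rough model of the design sketched above, with hypothetical names: the control runnable does not itself spin an event loop, it only sets a flag, and InterruptCallback keeps draining control runnables (including ones queued after the "hand control to the debugger" one) before it checks that flag.

```cpp
#include <deque>
#include <functional>
#include <utility>

struct WorkerModel {
  std::deque<std::function<void()>> mControlQueue;
  bool mEnterDebuggerFlag = false;
  bool mEnteredDebugger = false;

  void DispatchControlRunnable(std::function<void()> aRunnable) {
    mControlQueue.push_back(std::move(aRunnable));
  }

  // The WorkerControlRunnable subclass from the comment, reduced to
  // its effect: set a flag, do not run the debugger directly.
  void DispatchEnterDebuggerRunnable() {
    DispatchControlRunnable([this] { mEnterDebuggerFlag = true; });
  }

  void InterruptCallback() {
    // Drain *all* control runnables first, even ones dispatched after
    // the enter-debugger runnable.
    while (!mControlQueue.empty()) {
      auto r = std::move(mControlQueue.front());
      mControlQueue.pop_front();
      r();
    }
    // Only now decide whether to enter the debugger instead of
    // returning to content script.
    if (mEnterDebuggerFlag && !mEnteredDebugger) {
      mEnteredDebugger = true;  // stands in for entering the debugger loop
    }
  }
};

bool RunDemo() {
  WorkerModel worker;
  worker.DispatchEnterDebuggerRunnable();
  int lateControlRunnables = 0;
  worker.DispatchControlRunnable([&] { lateControlRunnables++; });
  worker.InterruptCallback();
  // Both the late control runnable and the debugger entry happened.
  return worker.mEnteredDebugger && lateControlRunnables == 1;
}
```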
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> I don't think we want the interrupt handler to process pending debugger
> runnables.  To give a trivial counterexample, what if we're already running
> a debugger runnable?  The case of "content script is looping infinitely and
> we need to preempt it to enter the debugger" needs to be handled separately.
> 
> The interrupt callback in your test is triggered by the GC timer firing.  In
> general I don't think you'll be able to reliably test the interrupt behavior
> here, but I don't think you actually want the interrupt behavior here
> either, as stated above.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7)
> Comment on attachment 8717465 [details] [diff] [review]
> patch
> 
> Review of attachment 8717465 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r- per comment 6
> 
> What I think you actually want here is a WorkerControlRunnable subclass
> (thus, a preempting runnable) that checks to see if the debugger is already
> in control, and if not, tells the Worker to not return to script, but
> instead enter the debugger.  IMO that should be done by setting a flag that
> InterruptCallback checks before returning, rather than the runnable itself
> spinning up the event loop, because we need to keep processing control
> runnables that end up in the queue *after* the "hand control to the
> debugger" control runnable.

How exactly would this control runnable be used? Do we have to dispatch it to the worker each time we dispatch a debugger runnable, to ensure the latter gets processed? That seems like a lot of overhead.

Also, what do you mean by entering the debugger? Entering a (nested) debugger event loop? The existing debugger event loops are used to simulate a thread pause, which means the debugger needs to enter/leave it explicitly. If we enter such a debugger event loop from the interrupt callback, the debugger server won't know to leave it later on.

It sounds like what you're suggesting is for this control runnable to set a flag to tell the interrupt callback to process any pending debugger runnables, with an additional flag to make sure that this is not a re-entrant call to the interrupt callback, which would mean we're already processing debugger runnables.

However, I don't think there's ever a case where we have pending debugger runnables in the interrupt callback, but we don't want to process them. Dispatching an extra control runnable to tell the interrupt callback to process debugger runnables thus seems redundant to me.

Wouldn't it be enough to simply add the reentrancy guard to the interrupt callback, so that it will:
1. Process all control runnables
2. Process all debugger runnables, *unless* this is a reentrant call to the interrupt callback
Flags: needinfo?(khuey)
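The two-step proposal above can be modeled with a simple reentrancy guard (illustrative names, not real Gecko code): a reentrant call still services control runnables but skips the debugger queue, while the outer call's drain loop still reaches anything queued in the meantime.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct WorkerModel {
  std::deque<std::function<void()>> mControlQueue;
  std::deque<std::function<void()>> mDebuggerQueue;
  bool mInInterrupt = false;

  void InterruptCallback() {
    // 1. Control runnables are always processed, even reentrantly.
    while (!mControlQueue.empty()) {
      auto r = std::move(mControlQueue.front());
      mControlQueue.pop_front();
      r();
    }
    // 2. Debugger runnables are skipped on a reentrant call, since
    //    that means we are already processing one.
    if (mInInterrupt) {
      return;
    }
    mInInterrupt = true;
    while (!mDebuggerQueue.empty()) {
      auto r = std::move(mDebuggerQueue.front());
      mDebuggerQueue.pop_front();
      r();
    }
    mInInterrupt = false;
  }
};

int RunDemo() {
  WorkerModel w;
  std::vector<std::string> log;
  w.mControlQueue.push_back([&] { log.push_back("control"); });
  w.mDebuggerQueue.push_back([&] {
    log.push_back("dbg1");
    // Simulate the interrupt callback firing again while debugger
    // code is on the stack: the guard skips the debugger queue.
    w.InterruptCallback();
  });
  w.mDebuggerQueue.push_back([&] { log.push_back("dbg2"); });
  w.InterruptCallback();
  // "dbg2" is not run reentrantly inside "dbg1", but the outer call's
  // drain loop still reaches it afterwards.
  return static_cast<int>(log.size());
}
```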
Hmmm. Not processing debugger runnables in the interrupt callback if the debugger is already in control is not completely trivial. Consider the following scenario:

Let's say we have a worker that runs in an infinite loop, but it's currently paused in the debugger. We dispatch two debugger runnables to the worker: the first will cause the worker to resume running, the second will do something arbitrary, like evaluating an expression.

Dispatching the debugger runnables to the worker will cause the interrupt callback to be scheduled (either directly, as in my patch, or via a control runnable, as you suggested). Before the interrupt callback ends up being called, however, the main event loop starts processing the first debugger runnable.

While the first debugger runnable is being processed, the interrupt callback is called. It sees that we're already processing a debugger runnable (i.e. the debugger is currently in control), so it won't attempt to process the second debugger runnable.

After the interrupt callback returns, we finish processing the first debugger runnable, and the worker starts running again. Since the worker is running in an infinite loop, we will never end up back in the main event loop. And since the interrupt callback was already called, it won't be called again for the second debugger runnable. As a result, the second debugger runnable will never be processed.

Perhaps we need a rule that if there are any pending debugger runnables when the interrupt callback is called, but the debugger is already in control, the interrupt callback reschedules itself (assuming that's legal) to ensure these pending debugger runnables are processed?
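The failure mode described above can be shown in a small single-threaded model (illustrative, not real Gecko code): a one-shot interrupt that fires while the debugger is "in control" is consumed without processing anything, stranding the second runnable.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Returns how many of the two dispatched debugger runnables get
// processed under a naive "skip if the debugger is in control" rule.
int RunDemo() {
  std::deque<std::function<void()>> debuggerQueue;
  bool debuggerInControl = false;
  bool interruptPending = true;  // requested when the runnables were dispatched
  int processed = 0;

  debuggerQueue.push_back([&] { processed++; });  // the "resume" runnable
  debuggerQueue.push_back([&] { processed++; });  // e.g. an evaluation

  // The main event loop picks up the first runnable...
  debuggerInControl = true;
  auto first = std::move(debuggerQueue.front());
  debuggerQueue.pop_front();

  // ...and while it runs, the one-shot interrupt fires and is skipped.
  if (interruptPending) {
    interruptPending = false;
    if (!debuggerInControl && !debuggerQueue.empty()) {
      auto r = std::move(debuggerQueue.front());  // never reached here
      debuggerQueue.pop_front();
      r();
    }
  }

  first();
  debuggerInControl = false;

  // The worker now spins in its infinite loop. The interrupt was
  // already consumed, so nothing ever drains the queue again.
  return processed;  // the second runnable is stranded
}
```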
(In reply to Eddy Bruel [:ejpbruel] from comment #9)
> Hmmm. Not processing debugger runnables in the interrupt callback if the
> debugger is already in control is not completely trivial. Consider the
> following scenario:
> 
> Let's say we have a worker that runs in an infinite loop, but it's currently
> paused in the debugger. We dispatch two debugger runnables to the worker:
> the first will cause the worker to resume running, the second will do
> something arbitrary, like evaluating an expression.

Why would we do that?  How does evaluating an expression do anything sensible if the debuggee is not at a fixed execution point?  Setting a breakpoint seems like a more realistic use case, so I'm going to use that here.

> Dispatching the debugger runnables to the worker will cause the interrupt
> callback to be scheduled (either directly, as in my patch, or via a control
> runnable, as you suggested). Before the interrupt callback ends up being
> called, however, the main event loop starts processing the first debugger
> runnable.

I don't think the control runnable is a good idea anymore because of comment 8.

> While the first debugger runnable is being processed, the interrupt callback
> is called. It sees that we're already processing a debugger runnable (i.e.
> the debugger is currently in control), so it won't attempt to process the
> second debugger runnable.

Right.

> After the interrupt callback returns, we finish processing the first
> debugger runnable, and the worker starts running again. Since the worker is
> running in an infinite loop, we will never end up back in the main event
> loop. And since the interrupt callback was already called, it won't be
> called again for the second debugger runnable. As a result, the second
> debugger runnable will never be processed.

Yes, the debugger needs to somehow ensure the debugger queue is empty before returning to content script.  The event queue should always run debugger runnables preferentially to regular runnables, right?  So if this happens at the top level of the event queue (i.e. DoRunLoop) we're ok.  So this only matters if we're in a debugger event loop (i.e. EnterDebuggerEventLoop) where we will return to content script without exhausting the debugger queue.  Let's just not do that?

> Perhaps we need a rule that if there are any pending debugger runnables when
> the interrupt callback is called, but the debugger is already in control,
> the interrupt callback reschedules itself (assuming that's legal) to ensure
> these pending debugger runnables are processed?

I don't really see how we could reschedule the interrupt callback.  Why can't we just make EnterDebuggerEventLoop not exit the nested event queue until there are no debugger runnables pending?

Another question to contemplate.  Say we enter the debugger off a breakpoint, and trigger an evaluation of script.  Do we want to be able to interrupt that script to process debugger runnables?
Flags: needinfo?(khuey)
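The "don't exit the nested event loop until the debugger queue is empty" suggestion can be sketched as follows (hypothetical names; the real loop would block waiting for the next runnable rather than assume everything was dispatched up front).

```cpp
#include <deque>
#include <functional>
#include <utility>

struct WorkerModel {
  std::deque<std::function<void()>> mDebuggerQueue;
  bool mLeaveRequested = false;

  void EnterDebuggerEventLoop() {
    // Exit only once a leave has been requested *and* the debugger
    // queue is exhausted, so no runnable is stranded behind a resume.
    while (!mLeaveRequested || !mDebuggerQueue.empty()) {
      if (mDebuggerQueue.empty()) {
        break;  // model stand-in for blocking on the next runnable
      }
      auto runnable = std::move(mDebuggerQueue.front());
      mDebuggerQueue.pop_front();
      runnable();
    }
  }
};

int RunDemo() {
  WorkerModel worker;
  int processed = 0;
  // A "resume" runnable requests leaving the paused state...
  worker.mDebuggerQueue.push_back([&] {
    processed++;
    worker.mLeaveRequested = true;
  });
  // ...while another runnable is already queued behind it.
  worker.mDebuggerQueue.push_back([&] { processed++; });
  worker.EnterDebuggerEventLoop();
  return processed;  // both runnables ran before the loop exited
}
```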
Marking this as P1, mainly because I'm actively working on it, but also because Emscripten wants to move to an execution model where the main thread runs in a worker, in which case this feature becomes absolutely necessary.
Priority: -- → P1
Assignee: nobody → ejpbruel
Attached patch WIP — Splinter Review
Here is a new patch for you to consider.

After thinking it through, I don't think we need to ensure that the debugger queue is empty before leaving a nested event loop. I've added a comment to the patch that explains my reasoning. In a nutshell: either there is still an interrupt callback on the stack, which will process all debugger runnables before returning to worker code, or we are processing a debugger runnable from the main run loop, in which case we cannot be inside an infinite loop.

I am still unsure what to do with the case where we are using the debugger to evaluate code that contains an infinite loop. We could allow such code to be interrupted (which is what the current patch does), but that would cause us to violate run-to-completion semantics for the debugger, since we could end up processing debugger runnables from an interrupt handler that interrupts worker code while a debugger runnable from a previous such interrupt handler is still on the stack.

On the main thread, code that's running in an infinite loop will eventually trigger the slow script dialog. Could we do something similar for workers, but only for top-level code (i.e. while there is no debugger code on the stack)?
Attachment #8717465 - Attachment is obsolete: true
Attachment #8728502 - Flags: feedback?(khuey)
Comment on attachment 8728502 [details] [diff] [review]
WIP

Ugh. I incorrectly assumed that if there is debugger code on the stack, there must either be an interrupt handler on the stack, or we must be inside the main event loop.

That's trivially not true, if you set a breakpoint inside the infinite loop. So I'm canceling the feedback request for this patch.
Attachment #8728502 - Flags: feedback?(khuey)
I just realised that ensuring that the debugger queue is empty before we exit a nested event loop is not going to be enough.

A nested event loop is always created by debugger code, usually within some event handler (i.e. a breakpoint was hit, a new script was created, etc). As a result, whenever we return from a nested event loop, the next thing on top of the stack will be debugger code. If we just happen to dispatch a debugger runnable then, and the interrupt callback triggers before we return from that event handler, the debugger runnable will never be processed.

In light of the above, I think we need a different approach here. If the interrupt callback is triggered from debugger code, could we schedule another interrupt callback then? That way, the interrupt callback is guaranteed to eventually trigger when it is interrupting worker code.
Flags: needinfo?(khuey)
I just had a talk about this with Shu on IRC. He suggested that we could set up a watchdog when the debugger is active in a worker. That way, even if there are still debugger runnables on the queue when we return to worker code, and the interrupt callback has already been called, it will eventually be called again.

This solution is not ideal, but it's the best solution I can think of so far. Interrupting JavaScript code is expensive, so there will likely be some slowdown due to this. In addition, we can't make the watchdog interval too large, because we want the debugger runnables to be processed as soon as possible.
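As a sketch of the watchdog idea (a single-threaded model with invented names; the real watchdog would run on its own thread on a fixed interval): each tick re-requests the interrupt, so a request that was skipped while debugger code was on the stack gets retried by a later tick.

```cpp
struct WatchdogModel {
  bool mInterruptRequested = false;
  bool mDebuggerCodeOnStack = false;
  int mServiced = 0;

  // One watchdog tick; the real version would fire periodically from
  // a separate thread.
  void WatchdogTick() { mInterruptRequested = true; }

  // The worker honoring the request at the next loop head. If
  // debugger code is on the stack, the request is skipped; the next
  // tick retries it.
  void MaybeRunInterruptCallback() {
    if (!mInterruptRequested || mDebuggerCodeOnStack) {
      return;  // skipped; the watchdog will request again
    }
    mInterruptRequested = false;
    mServiced++;
  }
};

int RunDemo() {
  WatchdogModel w;
  // First tick arrives while debugger code is running: skipped.
  w.WatchdogTick();
  w.mDebuggerCodeOnStack = true;
  w.MaybeRunInterruptCallback();
  // A later tick arrives after the debugger returned: serviced.
  w.mDebuggerCodeOnStack = false;
  w.WatchdogTick();
  w.MaybeRunInterruptCallback();
  return w.mServiced;
}
```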
Is there a hook that we can use to detect when we're leaving a compartment? If so, we could detect when we're leaving a debugger compartment and returning to a worker compartment, and make sure to request the interrupt callback if there are still pending debugger runnables at that point.

If possible, this sounds like it would be a lot less complex than setting up a watchdog thread for workers.
There isn't a hook for leaving a compartment. A watchdog timer would introduce
yet another source of nondeterminism. But I don't think either of these are
necessary.

First of all, let's just assume for all considerations that the debuggee's event
queue is suspended/resumed by the EventLoopStack preNest/postNest hooks in the
worker. I'm pretty sure this is already how it works.

So: the effect we need is for the debuggee to behave as if it returns to the
event loop frequently.

If we have that, then everything must work just as it does on the main thread.
The only difference between a worker thread and ordinary content code is that
the worker thread doesn't have any slow page timer killing it if it runs for too
long. If we can make the worker debuggee behave as if it returns to the event
loop frequently, we've removed that difference.

This is why Shu's suggestion is correct: if there is, literally, a watchdog
timer forcing the debuggee to check the event loop from time to time, then that
meets the condition.

This is why Eddy's suggestion of a compartment switch hook is also correct: if
we could schedule an interrupt just before returning to the debuggee whenever
the event loop is non-empty, and schedule an interrupt when events arrive, then
it would be as if the debuggee is always spinning the event loop, so that also
meets the condition.

SpiderMonkey very much wants compartment switches to be quick; they're not
loaded down with very much at all. So it's probably necessary to look at the
more specific action of resuming debuggee code.

[Coffeeshop is closing, more later]
How about we don't make any debugger runnables preempt the worker script, and instead arm a timer to interrupt the script if we haven't processed the debugger runnables in N seconds?  This is basically recreating the slow script dialog behavior.
Flags: needinfo?(khuey) → needinfo?(ejpbruel)
[continuing... ]

We only have a problem when the interrupt callback is invoked from debugger code
that will run (either call or return to) debuggee code; we can't process it
immediately, to respect the debugger's run-to-completion rule, so we have to put
it off.

We shouldn't focus too much on returning from Debugger handlers to the debuggee.
Although that is one way we can lose control, if there are ever
non-Debugger-mediated calls from the debuggee to debugger code (other sorts of
hooks or handlers, for example), those debuggee continuations need to be caught
as well.

Furthermore, it's not just returns that matter. D.O.p.call and D.F.p.eval can
also transfer control from the debugger to an infinite debuggee loop. (Shu's
DebuggeeWouldRun checks should catch any debugger->debuggee calls that don't go
through Debugger.)

Having taken more time to think things through, Shu's watchdog timer idea seems
appealingly simple. It uses existing interfaces, in the same way that they're
used on the main thread. I looked at the code and found that
JS_RequestInterruptCallback can actually be *ignored* in some cases on Windows:

https://hg.mozilla.org/mozilla-central/file/37e64a9ae974/js/src/asmjs/WasmSignalHandlers.cpp#l1343

This means that it is *only* correct to use JS_RequestInterruptCallback in a way
that is going to be periodically retried, like a watchdog timer.

The other arrangement that could work would be to hook up the "pause" UI element
to something that calls JS_RequestInterruptCallback on the worker directly, and
indeed have the interrupt callback simply do nothing if it interrupts debugger
code, and invoke a debugger callback if it interrupts debuggee code. Almost all
the time, we'll either be in the debugger event loop or running debuggee code,
so it'll work; when it doesn't, the developer can just mash the button again and
get lucky this time. That'd be a pain in the neck to test, though.

A more deterministic approach would probably involve some way for the interrupt
callback, when it finds itself interrupting debugger code, to defer itself until
control enters debuggee code --- either by call or by return. That should
probably be hooked into or closely related to the code that pushes and pops
js::Activations, since those are relatively expensive transitions that are
guaranteed to happen on debugger->debuggee transitions. This would be a
SpiderMonkey facility that we could test aggressively in jsapi-tests, and
perhaps expose to fuzzers and jit-tests with something in TestingFunctions.cpp.
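The retry requirement can be made concrete with a toy model (the drop count and the boolean return value are invented for illustration; the real JSAPI call does not report whether the request was lost): if a single request may be silently dropped, the caller must loop until the request is observed.

```cpp
struct LossyEngine {
  int mDropsRemaining;
  bool mCallbackRan;

  explicit LossyEngine(int aDrops)
      : mDropsRemaining(aDrops), mCallbackRan(false) {}

  // Models "best effort" interrupt requests: the first few are
  // silently ignored, as in the Windows SuspendThread case.
  bool RequestInterrupt() {
    if (mDropsRemaining > 0) {
      mDropsRemaining--;
      return false;  // request lost
    }
    mCallbackRan = true;
    return true;
  }
};

// The caller's side of the contract: keep retrying until the request
// is actually observed. Returns the number of attempts needed, or -1
// if the budget is exhausted.
int RequestWithRetry(LossyEngine& aEngine, int aMaxAttempts) {
  for (int attempt = 1; attempt <= aMaxAttempts; attempt++) {
    if (aEngine.RequestInterrupt()) {
      return attempt;
    }
  }
  return -1;
}
```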
(In reply to Jim Blandy :jimb from comment #19)
> Having taken more time to think things through, Shu's watchdog timer idea
> seems
> appealingly simple. It uses existing interfaces, in the same way that they're
> used on the main thread. I looked at the code and found that
> JS_RequestInterruptCallback can actually be *ignored* in some cases on
> Windows:
> 
> https://hg.mozilla.org/mozilla-central/file/37e64a9ae974/js/src/asmjs/
> WasmSignalHandlers.cpp#l1343
> 
> This means that it is *only* correct to use JS_RequestInterruptCallback in a
> way
> that is going to be periodically retried, like a watchdog timer.

Uhhhhh this is *not* ok.  None of our existing uses of the interrupt callback (including worker termination, which is needed for shutdown) retry.
Actually, I guess at some point we changed the XPConnect watchdog to retry.  But workers definitely don't ...
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> Uhhhhh this is *not* ok.  None of our existing uses of the interrupt
> callback (including worker termination, which is needed for shutdown) retry.

Yeah, I was kind of shocked. Changing the function's contract to "will make an effort" is a big deal.

Luke, can you confirm that I described the current behavior of JS_RequestInterruptCallback correctly in comment 19?
Flags: needinfo?(luke)
(In reply to Jim Blandy :jimb from comment #19)
> A more deterministic approach would probably involve some way for the interrupt
> callback, when it finds itself interrupting debugger code, to defer itself until
> control enters debuggee code --- either by call or by return.

Oh, this is pointless, if JS_RequestInterruptCallback isn't trustworthy.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #18)
> How about we don't make any debugger runnables preempt the worker script,
> instead we arm a timer to interrupt the script if we haven't processed the
> debugger runnables in N seconds?  This is basically recreating the slow
> script dialog behavior.

I like the simplicity of that idea, but I'm somewhat worried about pathological scenarios where interacting with the debugger becomes really slow because almost every runnable has to wait for this periodic interrupt before it can be processed. I'm not sure if such pathological scenarios could actually happen, since we always schedule an interrupt callback when posting the runnable, so this would only be a problem if the scheduled interrupts somehow (almost) always end up interrupting debugger code.

In any case, this seems like another form of non-deterministic behavior, so we should consider it carefully.
Flags: needinfo?(ejpbruel)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> (In reply to Jim Blandy :jimb from comment #19)
> > Having taken more time to think things through, Shu's watchdog timer idea
> > seems
> > appealingly simple. It uses existing interfaces, in the same way that they're
> > used on the main thread. I looked at the code and found that
> > JS_RequestInterruptCallback can actually be *ignored* in some cases on
> > Windows:
> > 
> > https://hg.mozilla.org/mozilla-central/file/37e64a9ae974/js/src/asmjs/
> > WasmSignalHandlers.cpp#l1343
> > 
> > This means that it is *only* correct to use JS_RequestInterruptCallback in a
> > way
> > that is going to be periodically retried, like a watchdog timer.
> 
> Uhhhhh this is *not* ok.  None of our existing uses of the interrupt
> callback (including worker termination, which is needed for shutdown) retry.

Oh dear. This bug seems to be escalating quite rapidly.

If we have to refactor the existing code to schedule the interrupt callback periodically either way, perhaps we could kill two birds with one stone and use that to implement your suggestion from comment 18?
Flags: needinfo?(khuey)
That comment came in with bug 1099339 which showed SuspendThread() failing randomly (previously, we MOZ_CRASHed and that was good enough except for WinXP 64).  I'd seen the watchdog thread retrying and assumed that to be the case (since I'd be surprised if there weren't many other corner cases where an interrupt gets lost).  Retrying interrupt triggering would be the best and most reliable.  An alternative is we could have SM *itself* do the auto-retry (perhaps just in a loop with a small Sleep() wrapped around SuspendThread).
Flags: needinfo?(luke)
(In reply to Jim Blandy :jimb from comment #22)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #20)
> > Uhhhhh this is *not* ok.  None of our existing uses of the interrupt
> > callback (including worker termination, which is needed for shutdown) retry.
> 
> Yeah, I was kind of shocked. Changing the function's contract to "will make
> an effort" is a big deal.

That basically sums up my thoughts.

Let's not get too distracted by debating how broken SpiderMonkey is though.  Does comment 18 (which I assume is basically Shu's watchdog proposal) work for the debugger?
Flags: needinfo?(khuey) → needinfo?(ejpbruel)
Unless we fix JS_RequestInterruptCallback as Luke suggests in comment 26, I think we need a timer-based solution, as in comment 15 and comment 18.
I just realised another problem with this solution. If we cannot reschedule the interrupt callback while a previous interrupt callback is still on the stack, then we also can't reschedule the interrupt callback from a debugger runnable that is processed by that previous interrupt callback.

A debugger runnable can execute arbitrary JS code, including a call to setImmediate. That call will cause the debugger to dispatch a DebuggerSetImmediate runnable to itself. But since we always schedule the interrupt callback when a debugger runnable is dispatched, we end up rescheduling the interrupt callback while a previous interrupt callback is still on the stack.

We could work around this by not scheduling the interrupt callback when we dispatch a debugger runnable, and instead rely solely on the watchdog thread to periodically interrupt the worker. However, if the worker is running an infinite loop, we'd then only end up processing debugger runnables when the periodic interrupt executes. If we want the debugger to be responsive, that interrupt interval needs to be very small. But making it small enough for the debugger to be responsive will likely kill the performance of the worker itself.

Given the above, I'm not sure what the best way forward is. Kyle, any thoughts?
Flags: needinfo?(ejpbruel) → needinfo?(khuey)
(In reply to Eddy Bruel [:ejpbruel] from comment #29)
> I just realised another problem with this solution. If we cannot reschedule
> the interrupt callback while a previous interrupt callback is still on the
> stack, then we also can't reschedule the interrupt callback from a debugger
> runnable that is processed by that previous interrupt callback.

Why can't we reschedule the interrupt?

> We could work around this by not scheduling the interrupt callback when we
> dispatch a debugger runnable, and instead rely solely on the watchdog thread
> to periodically interrupt the worker. However, if the worker is running an
> infinite loop, we'd then only end up debugger runnables when the periodic
> interrupt executes. If we want the debugger to be responsive, that interrupt
> needs to be very small. But making it small enough for the debugger to be
> responsive will likely kill the performance of the worker itself.
> 
> Given the above, I'm not sure what the best way forward is. Kyle, any
> thoughts?

Why does the debugger need to be particularly responsive in the face of this?  If I tie up the main thread with something that executes a nine-second loop and then yields and tries again responsiveness is going to be terrible too.

FWIW, I'm imagining the algorithm here is roughly:

1. "Begin debugging" runnable is dispatched from main thread.
2. Interrupt timer is set for five seconds or something to establish an SLA.
3a. The worker thread yields at some point in those five seconds. We defuse the timer and execute the runnable.
3b. The worker thread doesn't yield, after 5 seconds we get an interrupt and run the debugger runnable from the interrupt callback.
4. Because we've used the debugger now, we change the timer interval to something smaller (let's say 250 ms), but don't yet reschedule it.
5. The debugger does a setImmediate before finishing.  If we entered via the interrupt callback, we reschedule the timer.  If we entered via the normal event queue, we don't, because we'll process the setImmediate before processing any content runnables.
6a. The debugger finishes and we return to content script, which finishes before the timer fires. Then we process the setImmediate as if we hadn't set the interrupt timer (and we defuse it, of course)
6b. The debugger finishes and we return to content script, which continues running.  After a bit the interrupt timer fires and we interrupt the script again, and execute the setImmediate.
6c. (Which admittedly I hadn't thought of) the interrupt timer fires and we're still in the debugger. We notice we're interrupting the debugger script and reset the timer, and don't run any runnables.
Flags: needinfo?(khuey)
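The steps above can be condensed into a small state model (intervals and names are illustrative, not the proposed implementation): a long initial SLA timer, switched to a short interval once the debugger has run, and re-armed only when a debugger runnable was entered via the interrupt callback.

```cpp
struct DebuggerTimer {
  int mIntervalMs = 0;
  bool mArmed = false;

  // Steps 1-2: "begin debugging" dispatched, arm the SLA timer.
  void OnBeginDebugging() {
    mIntervalMs = 5000;
    mArmed = true;
  }

  // Steps 3a/3b then 4: the debugger runnable ran; shrink the
  // interval but leave the timer unarmed for now.
  void OnDebuggerRunnableRan() {
    mIntervalMs = 250;
    mArmed = false;
  }

  // Step 5: a setImmediate was queued. Re-arm only if we got here via
  // the interrupt callback; the normal event queue services the
  // debugger queue before content runnables anyway.
  void OnSetImmediate(bool aEnteredViaInterrupt) {
    if (aEnteredViaInterrupt) {
      mArmed = true;
    }
  }

  // Step 6c: the timer fired while debugger code is still running;
  // just push it back without running any runnables.
  void OnTimerFiredDuringDebugger() { mArmed = true; }
};

bool RunDemo() {
  DebuggerTimer t;
  t.OnBeginDebugging();
  bool slaSet = t.mArmed && t.mIntervalMs == 5000;
  t.OnDebuggerRunnableRan();
  t.OnSetImmediate(/* aEnteredViaInterrupt = */ true);
  return slaSet && t.mArmed && t.mIntervalMs == 250;
}
```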
Assignee: ejpbruel → nobody
Product: Firefox → DevTools
Moving to P3 because there has been no activity for at least 24 weeks.
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P1 → P3
Type: defect → enhancement
Severity: normal → S3