Worker should not be frozen when events are suspended.

RESOLVED FIXED in Firefox 44

Status

defect
RESOLVED FIXED
4 years ago
3 months ago

People

(Reporter: ejpbruel, Assigned: ejpbruel)

Tracking

(Blocks 1 bug)

unspecified
Firefox 44
Dependency tree / graph

Firefox Tracking Flags

(firefox44 fixed)

Details

(Whiteboard: [devtools-platform])

Attachments

(1 attachment, 2 obsolete attachments)

Currently, SuspendTimeouts suspends both timer and worker events. It accomplishes the latter by freezing all workers for the given window. Freezing a worker blocks its corresponding thread from executing. This is indistinguishable from queueing all events generated by this worker, as long as the execution state of the worker is not observable.

Unfortunately, with worker debugging about to land, the execution state of the worker *will* be observable. We suspend worker events to create the illusion that the main thread is blocked. This should not cause the corresponding workers to actually block as well. It's perfectly possible for a worker thread to keep running and post messages to the main thread while the latter is blocked.

The correct solution here is to introduce the concept of suspending/resuming a worker. Suspending a worker will cause all events that it generates to be queued on the main thread. Resuming a worker will cause these events to be processed on the main thread. Compare this with freezing/thawing a worker, which actually blocks the worker thread from executing.
Attachment #8627620 - Flags: review?(khuey)
Blocks: 1178726
I don't think we can do this.  Just blocking worker messages to the main thread doesn't prevent a worker from changing the state of the world.  Consider a worker that uses IndexedDB, for example.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #2)
> I don't think we can do this.  Just blocking worker messages to the main
> thread doesn't prevent a worker from changing the state of the world. 
> Consider a worker that uses IndexedDB, for example.

That doesn't make sense to me. So when the main thread hits a breakpoint, all workers for the page should paused as well? Because that is what is happening right now.

We are suspending timeouts to create the illusion that the main thread is paused, but not the workers. It is perfectly fine for the workers to keep running and creating side effects, as long as no event handlers in the main thread end up running as a result.
Flags: needinfo?(khuey)
Why wouldn't we pause all the threads?  That's what gdb does.

With SharedArrayBuffer they even have shared memory, just like threads.
Flags: needinfo?(khuey)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> Why wouldn't we pause all the threads?  That's what gdb does.
> 
> With SharedArrayBuffer they even have shared memory, just like threads.

Good point about workers having shared memory. Isn't freezing a worker inherently racy though? The worker doesn't actually stop running code until its interrupt handler gets a chance to run, which means its still possible for the worker to affect the state of the main thread even after it is paused.

I am not sure what the best way to move forward is here, so I'm needinfo'ing in jimb to get his opinion:

Jim, how do you think we should deal with workers when the main thread pauses? Currently, when the main thread pauses, we suspend all its timeouts, which in turn causes all workers to freeze.

A problem with this approach is that we also freeze workers when the page that owns them is moved to the bfcache. In that case, we want to remove the workers from the UI list, but not in the case where the main thread is simply paused. We currently cannot distinguish between these two cases.

My idea was to not pause workers when the main thread is paused, but instead only suspend events triggered by these workers in the main thread, in order to maintain the illusion that the latter is paused. As khuey pointed out, this is not what gdb does (it simply pauses all the threads), but that does not necessarily mean their model is better.

However, another problem, as khuey pointed out as well, is that that workers can now actually share state with the main thread, so if we don't pause workers when the main thread is paused, the state of the latter can still be changed by the former. I am not sure if that would be a problem in practice or not.

If we decide that this is a problem, and workers really need to be paused when the main thread does, we currently have no way to do so in a way that is non-racy: freezing a worker doesn't immediately take effect, so SuspendTimeouts shouldn't actually complete until all workers have stopped running.
Jim, see comment 5.
Flags: needinfo?(jimb)
NIing myself as a reminder for after PTO.
Flags: needinfo?(shu)
Comment on attachment 8627620 [details] [diff] [review]
Implement SuspendWorkersForWindow

Review of attachment 8627620 [details] [diff] [review]:
-----------------------------------------------------------------

Per comment 2.
Attachment #8627620 - Flags: review?(khuey) → review-
Hi Jim

Even though I am on leave right now, it would be nice if we could resolve this issue before I get back. Could you spare some time to reply to my ni? this week?
Sorry for the slow response; thanks for your pings.

In GDB, stopping all threads is racy too. We have to iterate over the threads and stop each one individually, and then wait for each thread to acknowledge that it's been stopped. But any thread could hit a breakpoint, or exit, or get a signal in the mean time, so GDB needs to be prepared for its "please stop" request to be answered with "I'd already stopped for this other reason".

When we pause a page and its workers get frozen, do we get some kind of acknowledgement from the worker when it's actually frozen? If we wait for acknowledgements from all our workers, then by the time we've gotten the acknowledgements we know they can't affect the main thread's state any more.

What happens if a worker and the main thread hit a breakpoint at the same time? Will pausing the main thread freeze the worker properly even when it's stopped at a breakpoint? Will we handle the worker's breakpoint pause properly?

As far as the BF cache is concerned: how do we deal with the pages themselves? Don't we treat entering the BF cache as simply disappearing, and being resurrected from the BF cache as magically reappearing from nowhere? If we can't tell when workers have been stashed and resurrected, but we need to display that, then I'd guess we need some new APIs to report that to us.
Flags: needinfo?(jimb)
(In reply to Jim Blandy :jimb from comment #10)
> When we pause a page and its workers get frozen, do we get some kind of
> acknowledgement from the worker when it's actually frozen? If we wait for
> acknowledgements from all our workers, then by the time we've gotten the
> acknowledgements we know they can't affect the main thread's state any more.

No, we don't.  We could add an acknowledgement but it would have to be asynchronous (since the main thread can never block on a worker).
Whiteboard: [devtools-platform]
Status: NEW → ASSIGNED
After some extended discussion, I've drawn the following conclusions:
- Kyle is right. When one thread pauses, all threads need to pause.
- However, freezing workers is still the wrong mechanism for this, because that will also pause the 
  debugger in each worker.
- What we need to do instead is this:
  - Suspend all worker messages on the main thread when the latter pause
  - Make the debugger responsible for pausing all workers, using the nested event loop mechanism.

Given the above, I think we should move forward with landing this patch. Since Kyle is in Taiwan at the moment, I've asked Blake Kaplan to review the patch again.
Flags: needinfo?(shu)
Here's a rebased version of the SuspendWorkersForWindow patch.

The important thing to note here is that this patch does not affect how SuspendTimeouts works for other callers. The only call path that is effected by this is the call to SuspendTimeout we make from the debugger.

Moreover, workers are still frozen when they are moved into the bfcache. Freezing a worker suspends all runnables from that worker, and pauses the worker thread asap. Suspending a worker only does the former.
Attachment #8627620 - Attachment is obsolete: true
Attachment #8664214 - Flags: review?(mrbkap)
Attachment #8664214 - Flags: review?(mrbkap) → review?(khuey)
Blake, can I assume you reassigned the review to Kyle because he is available again for review? A comment to that effect would have been helpful.
How do you envision adding breakpoints working?  The way I expect that to work is that the main thread will send a control runnable to the worker (which preempts running JS) that causes the debugger to add breakpoints.  Note that *not* using a control runnable means the content script has to yield before the debugger can add a breakpoint.

It feels like this should piggyback off of the same mechanism.  Do you disagree?

This also doesn't implement the "Make the debugger responsible for pausing all workers, using the nested event loop mechanism."  I'm not sure what to make of that ...
Put more succinctly, I understand what this patch does to the main thread side of things, I don't understand what this patch does (or doesn't do, mostly) to the worker thread side of things.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #15)
> How do you envision adding breakpoints working?  The way I expect that to
> work is that the main thread will send a control runnable to the worker
> (which preempts running JS) that causes the debugger to add breakpoints. 
> Note that *not* using a control runnable means the content script has to
> yield before the debugger can add a breakpoint.
> 
> It feels like this should piggyback off of the same mechanism.  Do you
> disagree?
> 
> This also doesn't implement the "Make the debugger responsible for pausing
> all workers, using the nested event loop mechanism."  I'm not sure what to
> make of that ...

I haven't completely worked out yet how we should pause all workers. What I have in mind is something along these lines:

- When a worker pauses due to a breakpoint, the debugger server for that worker sends a notification to the
  debugger server on the main thread (using the worker debugger's message channel)
- When the debugger server on the main thread receives this notification, it broadcasts an interrupt 
  request to the debugger servers for all the other workers, telling them to pause their workers.
- The debugger servers for all other workers send a message to the main thread, either in response to the
  interrupt request from the main thread, or because their worker paused due to a breakpoint in the
  meantime.
- The main thread waits until it has received a response from all other workers. When all other workers
  have paused, the main thread notifies the client that the original worker has paused.
- The client can step through the worker that paused originally, but not through any of the other workers.
- If any other workers paused due to a breakpoint while we were in the process of pausing all workers,
  those breakpoint should be reported when the worker that paused originally is resumed.

That said, even though I haven't worked out all the details yet, making the debugger responsible for pausing the other workers is the correct solution. Freezing all workers will make the debuggers for those workers unresponsive when the main thread pauses, which is unacceptable from a UX point of view.

We also don't need the pause-all-workers mechanism to be in place before this patch can land: although we want pause-all-workers to become the default, the devtools team has indicated that they want to at least have the option to pause individual workers. So we can start with that, and implement the pause-all-workers mechanism later. This is easier from an implementation point-of-view as well.
Flags: needinfo?(ejpbruel)
Blocks: 1175550
(In reply to Eddy Bruel [:ejpbruel] from comment #18)
> - When the debugger server on the main thread receives this notification, it
> broadcasts an interrupt 
>   request to the debugger servers for all the other workers, telling them to
> pause their workers.

Yes, my main question is how this "interrupt request" will work.

> That said, even though I haven't worked out all the details yet, making the
> debugger responsible for pausing the other workers is the correct solution.
> Freezing all workers will make the debuggers for those workers unresponsive
> when the main thread pauses, which is unacceptable from a UX point of view.

I think we're on the same page re: not Freeze()ing and instead stopping the execution of content JS.

> We also don't need the pause-all-workers mechanism to be in place before
> this patch can land: although we want pause-all-workers to become the
> default, the devtools team has indicated that they want to at least have the
> option to pause individual workers. So we can start with that, and implement
> the pause-all-workers mechanism later. This is easier from an implementation
> point-of-view as well.

I guess what I'm hung up on is that the test included with this patch explicitly tests the *opposite*, that the worker continues to run when we SuspendTimeouts() the window.
Flags: needinfo?(ejpbruel)
Comment on attachment 8664214 [details] [diff] [review]
Implement SuspendWorkersForWindow.

Review of attachment 8664214 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/test/test_WorkerDebugger_suspended.xul
@@ +48,5 @@
> +          onMessage: function () {
> +            ok(false, "The worker should not send a response.");
> +          }
> +        };
> +        wdm.addListener(listener);

nsIWorkerDebuggerManagerListener doesn't have an onMessage property, so I have no idea what this is trying to test.  Especially since the worker does in fact send a response through the debugger ...

r- for this.
Attachment #8664214 - Flags: review?(khuey) → review-
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #19)
> (In reply to Eddy Bruel [:ejpbruel] from comment #18)
> > - When the debugger server on the main thread receives this notification, it
> > broadcasts an interrupt 
> >   request to the debugger servers for all the other workers, telling them to
> > pause their workers.
> 
> Yes, my main question is how this "interrupt request" will work.

The debugger server already supports such an interrupt request. When the worker is running, both the worker and the debugger use the same event loop (though not the same event queue). When the debugger receives the interrupt request it responds by entering a nested event loop. Since the worker runs in the same thread as the debugger, and since nested event loops only process debugger events, this effectively stops client code from running.

The difficulty is that unlike on the main thread, worker scripts may have infinite loops, in which case the interrupt request will never be processed. Now that I think about it, this is a actually a general problem: you cannot interact with the debugger if the worker is inside an infinite loop. At first glance, the correct solution here would be to give debugger runnables the same status as control runnables (i.e. they trigger the interrupt callback). What do you think?

> 
> > That said, even though I haven't worked out all the details yet, making the
> > debugger responsible for pausing the other workers is the correct solution.
> > Freezing all workers will make the debuggers for those workers unresponsive
> > when the main thread pauses, which is unacceptable from a UX point of view.
> 
> I think we're on the same page re: not Freeze()ing and instead stopping the
> execution of content JS.

Exactly.

> 
> > We also don't need the pause-all-workers mechanism to be in place before
> > this patch can land: although we want pause-all-workers to become the
> > default, the devtools team has indicated that they want to at least have the
> > option to pause individual workers. So we can start with that, and implement
> > the pause-all-workers mechanism later. This is easier from an implementation
> > point-of-view as well.
> 
> I guess what I'm hung up on is that the test included with this patch
> explicitly tests the *opposite*, that the worker continues to run when we
> SuspendTimeouts() the window.

Right, but that's the point of this patch, isn't it? We want to make sure that no worker events arrive on the main thread, but we also want to make sure that the worker keeps running (unlike before). If the worker didn't keep running, the pause-by-nested-event-loop mechanism would be impossible to implement.
Flags: needinfo?(ejpbruel)
That event listener should have been on the worker, of course. Good catch.
Attachment #8664214 - Attachment is obsolete: true
Attachment #8668330 - Flags: review?(khuey)
(In reply to Eddy Bruel [:ejpbruel] from comment #21)
> The difficulty is that unlike on the main thread, worker scripts may have
> infinite loops, in which case the interrupt request will never be processed.
> Now that I think about it, this is a actually a general problem: you cannot
> interact with the debugger if the worker is inside an infinite loop. At
> first glance, the correct solution here would be to give debugger runnables
> the same status as control runnables (i.e. they trigger the interrupt
> callback). What do you think?

Right, this is what I'm interested in.  I'm not sure that all debugger runnables should work as control runnables do.  But it does seem like we should have some sort of "interrupt and transfer control to the debugger" thing that uses a control runnable to interrupt iloops.

> > > We also don't need the pause-all-workers mechanism to be in place before
> > > this patch can land: although we want pause-all-workers to become the
> > > default, the devtools team has indicated that they want to at least have the
> > > option to pause individual workers. So we can start with that, and implement
> > > the pause-all-workers mechanism later. This is easier from an implementation
> > > point-of-view as well.
> > 
> > I guess what I'm hung up on is that the test included with this patch
> > explicitly tests the *opposite*, that the worker continues to run when we
> > SuspendTimeouts() the window.
> 
> Right, but that's the point of this patch, isn't it? We want to make sure
> that no worker events arrive on the main thread, but we also want to make
> sure that the worker keeps running (unlike before). If the worker didn't
> keep running, the pause-by-nested-event-loop mechanism would be impossible
> to implement.

But we don't actually want the content script to keep running, we want it to be preempted and for the debugger to spin up a nested event loop.  The test tests that the content script keeps running.
Flags: needinfo?(ejpbruel)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> (In reply to Eddy Bruel [:ejpbruel] from comment #21)
> > The difficulty is that unlike on the main thread, worker scripts may have
> > infinite loops, in which case the interrupt request will never be processed.
> > Now that I think about it, this is a actually a general problem: you cannot
> > interact with the debugger if the worker is inside an infinite loop. At
> > first glance, the correct solution here would be to give debugger runnables
> > the same status as control runnables (i.e. they trigger the interrupt
> > callback). What do you think?
> 
> Right, this is what I'm interested in.  I'm not sure that all debugger
> runnables should work as control runnables do.  But it does seem like we
> should have some sort of "interrupt and transfer control to the debugger"
> thing that uses a control runnable to interrupt iloops.

Probably the only debugger runnable that needs to be a control runnable is the one that sends a message to the debugger. We want to be able to talk to the debugger, so we can pause the worker, set breakpoints, etc, even when the worker is inside an infinite loop. 

> 
> > > > We also don't need the pause-all-workers mechanism to be in place before
> > > > this patch can land: although we want pause-all-workers to become the
> > > > default, the devtools team has indicated that they want to at least have the
> > > > option to pause individual workers. So we can start with that, and implement
> > > > the pause-all-workers mechanism later. This is easier from an implementation
> > > > point-of-view as well.
> > > 
> > > I guess what I'm hung up on is that the test included with this patch
> > > explicitly tests the *opposite*, that the worker continues to run when we
> > > SuspendTimeouts() the window.
> > 
> > Right, but that's the point of this patch, isn't it? We want to make sure
> > that no worker events arrive on the main thread, but we also want to make
> > sure that the worker keeps running (unlike before). If the worker didn't
> > keep running, the pause-by-nested-event-loop mechanism would be impossible
> > to implement.
> 
> But we don't actually want the content script to keep running, we want it to
> be preempted and for the debugger to spin up a nested event loop.  The test
> tests that the content script keeps running.

I think we are conflating two separate problems here:

1. When the main thread pauses, we want to prevent any client code from running as a result of worker
   events. That is why we suspend all worker runnables in SuspendTimeouts. We don't actually want to
   freeze the worker threads: since the debugger runs in the same thread as the thing its debugging,
   doing so makes the debugger unresponsive.

2. Once the main thread is paused, we then want to pause all workers as well. More accurately, we want only
   debugger code to run in each worker, not client code. To accomplish this, we send a message to each
   debugger server telling it to spin up a nested event loop.

This patch is only about part 1.  I realise that landing this without the pause-all-workers mechanism is not perfect. However, it is still an improvement over the current situation:

From the perspective of the main thread pausing, its irrelevant what mechanism we use, as long as no worker events can cause client code to run on the main thread. From the perspective of debugging workers, the current mechanism is broken, because it makes workers completely undebuggable when the main thread pauses.

This patch improves the situation in the sense that at least now workers are still debuggable when the main thread pauses. They won't automatically pause when the main thread pauses, but that's something we can live with for now.

More importantly, we need to land this patch first, because we want to implement the pause-all-workers mechanism on the debugger server level, and its impossible to talk to a debugger server in a frozen worker. Once workers are no longer frozen when the main thread pauses, we can tell the debugger server in each worker to spin up a nested event loop to simulate a thread pause.

Note that this is why we're testing that the worker keeps running: we need to make sure that it does, otherwise we can't tell the debugger to only allow debugger code to run!
Flags: needinfo?(ejpbruel)
FWIW, if you are worried that landing this patch without the pause-all-workers mechanism will subtly change the behavior of workers when we hit a breakpoint on the main thread, then perhaps we could change the patch so that we keep freezing workers when worker debugging is preffed off?
(In reply to Eddy Bruel [:ejpbruel] from comment #24)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> > (In reply to Eddy Bruel [:ejpbruel] from comment #21)
> > > The difficulty is that unlike on the main thread, worker scripts may have
> > > infinite loops, in which case the interrupt request will never be processed.
> > > Now that I think about it, this is a actually a general problem: you cannot
> > > interact with the debugger if the worker is inside an infinite loop. At
> > > first glance, the correct solution here would be to give debugger runnables
> > > the same status as control runnables (i.e. they trigger the interrupt
> > > callback). What do you think?
> > 
> > Right, this is what I'm interested in.  I'm not sure that all debugger
> > runnables should work as control runnables do.  But it does seem like we
> > should have some sort of "interrupt and transfer control to the debugger"
> > thing that uses a control runnable to interrupt iloops.
> 
> Probably the only debugger runnable that needs to be a control runnable is
> the one that sends a message to the debugger. We want to be able to talk to
> the debugger, so we can pause the worker, set breakpoints, etc, even when
> the worker is inside an infinite loop. 

Something like that.  We can quibble over the details later.

> > > > > We also don't need the pause-all-workers mechanism to be in place before
> > > > > this patch can land: although we want pause-all-workers to become the
> > > > > default, the devtools team has indicated that they want to at least have the
> > > > > option to pause individual workers. So we can start with that, and implement
> > > > > the pause-all-workers mechanism later. This is easier from an implementation
> > > > > point-of-view as well.
> > > > 
> > > > I guess what I'm hung up on is that the test included with this patch
> > > > explicitly tests the *opposite*, that the worker continues to run when we
> > > > SuspendTimeouts() the window.
> > > 
> > > Right, but that's the point of this patch, isn't it? We want to make sure
> > > that no worker events arrive on the main thread, but we also want to make
> > > sure that the worker keeps running (unlike before). If the worker didn't
> > > keep running, the pause-by-nested-event-loop mechanism would be impossible
> > > to implement.
> > 
> > But we don't actually want the content script to keep running, we want it to
> > be preempted and for the debugger to spin up a nested event loop.  The test
> > tests that the content script keeps running.
> 
> I think we are conflating two separate problems here:
> 
> 1. When the main thread pauses, we want to prevent any client code from
> running as a result of worker
>    events. That is why we suspend all worker runnables in SuspendTimeouts.
> We don't actually want to
>    freeze the worker threads: since the debugger runs in the same thread as
> the thing its debugging,
>    doing so makes the debugger unresponsive.

The stuff in SuspendTimeouts only affects runnables from the worker that are running on the main thread.  It doesn't affect the code running in the worker at all.

> 2. Once the main thread is paused, we then want to pause all workers as
> well. More accurately, we want only
>    debugger code to run in each worker, not client code. To accomplish this,
> we send a message to each
>    debugger server telling it to spin up a nested event loop.

Right.

> This patch is only about part 1.  I realise that landing this without the
> pause-all-workers mechanism is not perfect. However, it is still an
> improvement over the current situation:
> 
> From the perspective of the main thread pausing, its irrelevant what
> mechanism we use, as long as no worker events can cause client code to run
> on the main thread. From the perspective of debugging workers, the current
> mechanism is broken, because it makes workers completely undebuggable when
> the main thread pauses.
> 
> This patch improves the situation in the sense that at least now workers are
> still debuggable when the main thread pauses. They won't automatically pause
> when the main thread pauses, but that's something we can live with for now.
> 
> More importantly, we need to land this patch first, because we want to
> implement the pause-all-workers mechanism on the debugger server level, and
> its impossible to talk to a debugger server in a frozen worker. Once workers
> are no longer frozen when the main thread pauses, we can tell the debugger
> server in each worker to spin up a nested event loop to simulate a thread
> pause.

Ok.

> Note that this is why we're testing that the worker keeps running: we need
> to make sure that it does, otherwise we can't tell the debugger to only
> allow debugger code to run!

Right ... but we're explicitly testing that the client code on the worker keeps running.  That's what I don't like.
Comment on attachment 8668330 [details] [diff] [review]
Implement SuspendWorkersForWindow.

Review of attachment 8668330 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the understanding from IRC that this situation is temporary.
Attachment #8668330 - Flags: review?(khuey) → review+
https://hg.mozilla.org/mozilla-central/rev/00f403aa149a
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 44
Blocks: 1214248
Summary: SuspendTimeouts should suspend worker events without freezing workers → Worker should not be frozen when events are suspended.
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.