Closed Bug 1426467 Opened 2 years ago Closed 11 months ago

step in accidentally lands in worker.onmessage

Categories

(DevTools :: Debugger, defect)

defect
Not set

Tracking

(firefox65 fixed)

RESOLVED FIXED
Firefox 65
Tracking Status
firefox65 --- fixed

People

(Reporter: jlast, Assigned: jimb)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 17 obsolete files)

3.40 KB, patch
Details | Diff | Splinter Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
When there is a worker that is sending messages down, "step in" can accidentally land in the worker.onmessage callback
         
STR: (https://dbg-setinterval.glitch.me/)

1. click click me
2. step over
3. step IN

AR: land in funca
ER: land in onmessage
Jim, can you investigate and forward this issue to the right person?
Flags: needinfo?(jimb)
Priority: -- → P2
Jim and I talked about this and I don't think it's a JS engine bug.

Currently in devtools/server/actors/tab.js, we have:

  postNest(nestData) {
    ...
    windowUtils.resumeTimeouts();
    windowUtils.suppressEventHandling(false);
  },

Un-suppressing event handling apparently fires any delayed events synchronously:
  https://searchfox.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#1617

That seems undesirable in any case (delayed events should simply be rescheduled to fire as normal from the event loop) and the likely cause of this bug.
Profiler tells this story, which differs from comment 2 in the specifics but confirms the general idea:

(...main event loop, blah blah blah...)
-> onclick (index.html)
  -> stuff (client.js:16)
    -> _makeOnStep/< (devtools/server/actors/script.js:833)
      -> pauseAndRespond (devtools/server/actors/script.js:894)
        -> _pauseAndRespond (devtools/server/actors/script.js:707)
          -> _pushThreadPause (devtools/server/actors/script.js:533)
            -> bound function (self-hosted code)
              -> enter (devtools/server/actors/script.js:350)
                -> postNest (devtools/server/actors/tab.js:1168)
                  -> nsDOMWindowUtils::ResumeTimeouts()
                    -> nsGlobalWindowInner::Resume()
                      -> mozilla::dom::workers::RuntimeService::ResumeWorkersForWindow()
                        -> (...event dispatching goop...)
                          -> myWorker.onmessage (client.js:3)

(Clearing this bug's Priority field for re-triage.)

https://perfht.ml/2lPCE0W
Component: JavaScript Engine → Developer Tools: Debugger
Priority: P2 → --
Product: Core → Firefox
mrbkap, can you help? I think all the background you need is: the debugger's on-stack synchronous code (the code that runs when you hit a breakpoint etc.) is inadvertently firing events, synchronously. Ironically it happens when the debugger calls WindowUtils.resumeTimeouts(), trying to avoid synchronously firing events.

It's happening in WorkerPrivateParent<_>::ParentWindowResumed():

    https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#3356-3358

See bug 854739.

But I don't get it. Why would events be delivered out of order? At a guess: it must be that new events from the worker would be delivered to a different queue (that is, mQueuedRunnables is not the "real" event queue). But if so, it seems like the right fix would have been to empty mQueuedRunnables into that event queue here, rather than synchronously firing the events.

I think this Thaw method can be reentered, to make matters worse.
Flags: needinfo?(jimb) → needinfo?(mrbkap)
I haven't forgotten about this, I'll try to come up with an answer for this tomorrow.
Blake, how close are you to a fix here? I was going to work on this, I just forgot to assign myself.
(jimb and I spoke on IRC about this and he's going to take it.)
Assignee: nobody → jimb
Flags: needinfo?(mrbkap)
Attached patch mochitest wip (obsolete) — Splinter Review
Here's a mochitest for this problem that fails.
Attachment #8943747 - Flags: feedback?(jlaster)
I'll bet this same bug occurs with MessageChannel as well.
Comment on attachment 8943747 [details] [diff] [review]
mochitest wip

looks good
Attachment #8943747 - Flags: feedback?(jlaster) → feedback+
Blocks: js-devtools
Here's what's going on in this bug:

- In the midst of some JavaScript invocation, the worker's parent window hits a
  breakpoint. The devtools server calls nsIDOMWindowUtils::SuspendTimeouts, and
  starts a nested event loop.

- The worker sends the parent a message, enqueuing a runnable on the main thread.

- The main thread's event loop processes the runnable. Since the parent window
  has timeouts suspended, MessageEventRunnable::WorkerRun stashes the runnable
  in WorkerPrivateParent::mQueuedRunnables: https://goo.gl/eFgA1u

- The developer continues from the breakpoint. The devtools server calls
  nsIDOMWindowUtils::ResumeTimeouts, which runs the runnables saved in
  mQueuedRunnables.

- The worker message runnable calls the parent window's onmessage handler.

This is incorrect. Running the onmessage handler in the midst of the original
JavaScript invocation (which was only paused at a breakpoint) violates the
run-to-completion rule.
A proposed solution is to change ResumeTimeouts to re-queue the runnable on the
main thread, where it will run after the JavaScript execution interrupted by the
debugger has completed.

This is adequate to ensure we follow the run-to-completion rule. If the parent
hits another breakpoint and is paused again, then the message runnable may get
stashed in mQueuedRunnables again, but either way, it will only ever run when
the parent window isn't paused, directly from the event loop, so it cannot
interrupt any other JavaScript execution in the parent window.

However, this solution is still buggy, because it can reorder messages from the
worker, in the following situation:

- Parent window sends worker messages A and B, and then hits a breakpoint.

- Worker responds to message A.

- Main thread event loop processes the response runnable. Since the parent
  window is paused, the runnable stashes A in mQueuedRunnables.

- Worker responds to message B.

- Before reaching B's runnable, then main thread event loop resumes the parent.
  ResumeTimeouts removes A from mQueuedRunnables and re-enqueues it for the main
  thread --- *after* B.
Blocks: 1433317
I think respecting the run-to-completion rule is more important than the race condition above. Submitted the race condition as a follow-up bug 1433317.
I think it's okay to land a fix for this even if it re-orders messages, because I can concoct a scenario where the present code will re-order messages too:

- While parent windows runs JS, worker thread sends messages A and B to its
  parent window.

- Parent window hits a breakpoint. The devtools server calls SuspendTimeouts on
  the parent window, and enters a nested loop.

- The nested event loop runs the A and B runnables. Since their window is
  paused, they are stashed in mQueuedRunnables.

- Developer continues from breakpoint. ResumeTimeouts (as currently implemented)
  makes a copy of mQueuedRunnables and begins running them in order.

- While running A's onmessage handler, the code hits a breakpoint. The devtools
  server calls SuspendTimeouts on the parent window and enters a second nested
  event loop.

- The worker sends a third message C to the parent window.

- The second nested event loop runs C's runnable. Since the parent window is
  paused, it is stashed in mQueuedRunnables.

- The developer resumes from that breakpoint. A second invocation of
  ResumeTimeouts makes a second copy of mQueuedRunnables, runs C's onmessage
  handler, and returns.

- Control returns to A's onmessage handler, after its breakpoint. A's onmessage
  handler completes.

- The first ResumeTimeouts call now turns to the next runnable in its copy of
  mQueuedRunnables, which is B's onmessage handler.

So A's onmessage handler runs first, and is interrupted by C's. Then, B's
onmessage handler runs.
I think the overall lesson here is that nobody has any real idea what is going on. We need an architecture in which it's pretty clear that debugging does not reorder messages or break run-to-completion.
Attachment #8943747 - Attachment is obsolete: true
Attachment #8949473 - Flags: review?(bkelly)
Comment on attachment 8949473 [details] [diff] [review]
Re-enqueue messages from workers when debugger pause ends; don't run them immediately.

Review of attachment 8949473 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for all the excellent documentation!

::: dom/workers/WorkerPrivate.cpp
@@ +2069,5 @@
>      mQueuedRunnables.SwapElements(runnables);
>  
>      for (uint32_t index = 0; index < runnables.Length(); index++) {
> +      if (!NS_SUCCEEDED(ParentAsWorkerPrivate()->DispatchToMainThread(runnables[index].forget())))
> +        MOZ_ASSERT_UNREACHABLE("Re-dispatch of delayed runnable failed");

nit: This can just be:

  MOZ_ALWAYS_SUCCEEDS(
    ParentAsWorkerPrivate()->DispatchToMainThread(runnables[index].forget()));
Attachment #8949473 - Flags: review?(bkelly) → review+
Revised for nits, carrying over r+
Attachment #8949473 - Attachment is obsolete: true
Attachment #8949565 - Flags: review+
Revised for ESlint complaints. Carrying over r+.
Attachment #8949565 - Attachment is obsolete: true
Attachment #8949569 - Flags: review+
Verified fixed with original test case.
Keywords: checkin-needed
There were conflicts while landing this patch. 
@jimb can you please take a look?
Flags: needinfo?(jimb)
Keywords: checkin-needed
Oh, it seems like WorkerPrivateParent went away --- FANTASTIC.
(Bug 1435263)

Should be easy to fix.
Flags: needinfo?(jimb)
Rebased on current inbound; carrying over r=bkelly.
Attachment #8949569 - Attachment is obsolete: true
Attachment #8949905 - Flags: review+
Keywords: checkin-needed
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/42deb2563aa8
Re-enqueue messages from workers when debugger pause ends; don't run them immediately. r=bkelly
Keywords: checkin-needed
Amended to work around ESlint's misidentification of functions in content that are called from the mochitest's chrome window as "unused".
Attachment #8949905 - Attachment is obsolete: true
Flags: needinfo?(jimb)
Keywords: checkin-needed
Keywords: checkin-needed
Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=829452d518dfe914bc95bf631587c6dc847590dc

I'm not sure that this will run the ESlint tests; we'll see. But I've never used ESlint export comments before, so further problems are likely; it'd be rude to mark it checkin-needed and make the sheriffs deal with it if it fails.
Checked with `mach lint` locally. Re-requesting checkin.
Keywords: checkin-needed
Status: NEW → ASSIGNED
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab4aab822769
Re-enqueue messages from workers when debugger pause ends; don't run them immediately. r=bkelly
Keywords: checkin-needed
Backed out changeset ab4aab822769 (bug 1426467) for Mochitest failure on dom/workers/test/test_suspend.html

Log of the failure:
https://treeherder.mozilla.org/logviewer.html#?job_id=162250118&repo=mozilla-inbound&lineNumber=2372

Backout changeset:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd94c4c474d3d3d658d33c5ebc49f22ac0fad0fa
Flags: needinfo?(jimb)
*sigh*
Flags: needinfo?(jimb)
This failure seems to be in dom/workers/test/test_suspend.html:

    is(lastCount, data - 1, "Missed a message, suspend failed!");

https://searchfox.org/mozilla-central/rev/4234703a532006c5ef9ce09b4c487d88124526a0/dom/workers/test/test_suspend.html#112

The actual failure is:

    INFO TEST-UNEXPECTED-FAIL | dom/workers/test/test_suspend.html | Missed a message, suspend failed! - got 28, expected 42

I'm unable to reproduce this locally using: `mach test --verify --headless dom/workers/test/test_suspend.html
It seems like my patch can reorder messages when we navigate back to a page with workers, even if the debugger is not involved at all.

I added a call to the sleep function in WorkerPrivate::Freeze, to ensure the worker could get some messages enqueued after the parent window had started to freeze. I added another sleep to WorkerPrivate::ParentWindowResumed, to let the re-awakened worker queue some messages before the deferred message runnables are re-dispatched. These changes make the mochitest fail locally for me.

Obviously, adding sleep calls should never affect the correctness of the code, so there's a race condition.
Just for reference, here's exactly where I added the 'sleep' calls to bring out the bug.
nsGlobalWindowInner::Thaw calls ThawInternal, which causes frozen workers to resume execution; and then it calls Resume, which is what re-dispatches the messages. If the messages were re-dispatched before the workers resumed execution, then the new messages couldn't jump ahead of the old ones. WorkerPrivate::Thaw is probably where this needs to happen, but it carefully avoids running the deferred runnables.
Attachment #8963839 - Flags: review?(nfroyd)
Only requesting feedback at the moment; I haven't actually used this to fix the bug yet, but I'm pretty sure it's what we want.
Attachment #8963840 - Flags: feedback?(nfroyd)
Comment on attachment 8963840 [details] [diff] [review]
Add Pause, Resume, and IsPaused methods to ThrottledEventQueue.

Review of attachment 8963840 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling feedback not because I think we don't want this, but because this implementation raises a lot of questions for me, see below.

::: xpcom/threads/ThrottledEventQueue.cpp
@@ +185,5 @@
>      {
>        MutexAutoLock lock(mMutex);
>  
> +      // If this queue is paused, free the executor and return without running
> +      // anything. We'll create a new executor when we're resumed.

Do we want to add logic in Dispatch so we're not dispatching do-nothing executors while the queue is paused?  I guess having do-nothing executors is convenient, if wasteful...

@@ +188,5 @@
> +      // If this queue is paused, free the executor and return without running
> +      // anything. We'll create a new executor when we're resumed.
> +      if (IsPaused(lock)) {
> +        // Note, this breaks a ref cycle.
> +        mExecutor = nullptr;

Note the comment above that was carried over into EnsureExecutor, where mExecutor was keeping the inner queue alive until the queue was drained...with this implementation of pausing, I guess we're no longer holding that invariant.

So maybe we not only need do-nothing executors, but we even need to keep dispatching do-nothing executors from the executor itself while the queue is paused?  That seems even more wasteful, since we might be constantly running do-nothing executors while the queue is paused, which potentially wouldn't let the underlying thread sleep.

@@ +340,5 @@
>      MOZ_ASSERT(!onBaseTarget);
>  #endif
>  
>      MutexAutoLock lock(mMutex);
> +    MOZ_ASSERT(!IsPaused(lock));

Do we want this same assertion somewhere in the shutdown process?

@@ +375,5 @@
> +
> +    if (mPauseCount == 1) {
> +      // We're about to no longer be paused. If we have events to run, make sure
> +      // we have an executor in flight to run them. This is fallible, so do this
> +      // before adjusting the pause count.

Is it better to have Resume() fail and require the caller to do error handling or is it better to silently eat the error here and make Resume() infallible?  I don't know what suitable error handling looks like in the former case.  In the latter case, the client will at least see failures come up in Dispatch, which seems like a more reasonable place to handle errors, since presumably the client is already equipped to do that.

Either way opens us up to the unfortunate situation of having runnables in the queue, but not having an executor to process them, which seems like a situation likely to produce intermittent test failures due to leaks at some point.

::: xpcom/threads/ThrottledEventQueue.h
@@ +111,5 @@
> +  // Resume execution of events from this ThrottledEventQueue.
> +  //
> +  // Pairs of calls to Pause and Resume may nest. Only when Resume has been
> +  // called as many times as Pause are events run. You must not make an
> +  // unmatched call to Resume.

Since Resume can fail, one can make matched calls to Pause and Resume and still not have a queue in an unpaused state, correct?  So this should really make the "successful call" requirement of Resume explicit, shouldn't it?
Attachment #8963840 - Flags: feedback?(nfroyd)
Comment on attachment 8963839 [details] [diff] [review]
Add GTests for ThrottledEventQueue.

Review of attachment 8963839 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for writing all these!

::: xpcom/tests/gtest/TestThrottledEventQueue.cpp
@@ +25,5 @@
> +
> +namespace TestThrottledEventQueue {
> +
> +// An nsIRunnable that simply calls a closure (or any C++ callable).
> +struct FunctionRunnable final : nsIRunnable {

Can we just use NS_NewRunnableFunction instead?
Attachment #8963839 - Flags: review?(nfroyd) → review+
Thanks very much for the helpful feedback!

(In reply to Nathan Froyd [:froydnj] from comment #39)
> ::: xpcom/threads/ThrottledEventQueue.cpp
> @@ +185,5 @@
> >      {
> >        MutexAutoLock lock(mMutex);
> >  
> > +      // If this queue is paused, free the executor and return without running
> > +      // anything. We'll create a new executor when we're resumed.
> 
> Do we want to add logic in Dispatch so we're not dispatching do-nothing
> executors while the queue is paused?  I guess having do-nothing executors is
> convenient, if wasteful...

There is already an IsPaused check in Inner::Dispatch around the call to EnsureExecutor. Is that not what you mean?

> @@ +188,5 @@
> > +      // If this queue is paused, free the executor and return without running
> > +      // anything. We'll create a new executor when we're resumed.
> > +      if (IsPaused(lock)) {
> > +        // Note, this breaks a ref cycle.
> > +        mExecutor = nullptr;
> 
> Note the comment above that was carried over into EnsureExecutor, where
> mExecutor was keeping the inner queue alive until the queue was
> drained...with this implementation of pausing, I guess we're no longer
> holding that invariant.
> 
> So maybe we not only need do-nothing executors, but we even need to keep
> dispatching do-nothing executors from the executor itself while the queue is
> paused?  That seems even more wasteful, since we might be constantly running
> do-nothing executors while the queue is paused, which potentially wouldn't
> let the underlying thread sleep.

I think this is unnecessary: If the queue will ever be unpaused, there must be a reference to the ThrottledEventQueue, so it won't get collected. If there is no reference to the ThrottledEventQueue, then it will never dispatch events again, so it needn't be kept alive.


> @@ +340,5 @@
> >      MOZ_ASSERT(!onBaseTarget);
> >  #endif
> >  
> >      MutexAutoLock lock(mMutex);
> > +    MOZ_ASSERT(!IsPaused(lock));
> 
> Do we want this same assertion somewhere in the shutdown process?

The answer to this would follow from the answer to my question above.

> 
> @@ +375,5 @@
> > +
> > +    if (mPauseCount == 1) {
> > +      // We're about to no longer be paused. If we have events to run, make sure
> > +      // we have an executor in flight to run them. This is fallible, so do this
> > +      // before adjusting the pause count.
> 
> Is it better to have Resume() fail and require the caller to do error
> handling or is it better to silently eat the error here and make Resume()
> infallible?  I don't know what suitable error handling looks like in the
> former case.  In the latter case, the client will at least see failures come
> up in Dispatch, which seems like a more reasonable place to handle errors,
> since presumably the client is already equipped to do that.
> 
> Either way opens us up to the unfortunate situation of having runnables in
> the queue, but not having an executor to process them, which seems like a
> situation likely to produce intermittent test failures due to leaks at some
> point.

I think Resume should have exactly the same error-handling requirements as Dispatch. After all, it must do a dispatch. I've never really understood how Dispatch failure should be handled, either.

> ::: xpcom/threads/ThrottledEventQueue.h
> @@ +111,5 @@
> > +  // Resume execution of events from this ThrottledEventQueue.
> > +  //
> > +  // Pairs of calls to Pause and Resume may nest. Only when Resume has been
> > +  // called as many times as Pause are events run. You must not make an
> > +  // unmatched call to Resume.
> 
> Since Resume can fail, one can make matched calls to Pause and Resume and
> still not have a queue in an unpaused state, correct?  So this should really
> make the "successful call" requirement of Resume explicit, shouldn't it?

Yes, thanks.

I noticed that a Paused, non-empty ThrottledEventQueue will also cause the call to SpinEventLoopUntil in Inner::Observe to hang. I'd like to bring the shutdown behavior under test, so that the tests would have caught this mistake.
(In reply to Jim Blandy :jimb from comment #41)
> I think Resume should have exactly the same error-handling requirements as
> Dispatch. After all, it must do a dispatch. I've never really understood how
> Dispatch failure should be handled, either.

On that topic: Is it really okay that the existing code calls NS_DispatchToMainThread but ignores the return value?
This patch removes all the specialized shutdown behavior from ThrottledEventQueue.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2b9cdc4257292e62f0e3b467ad86522b1c49e91

At the moment, it seems green. If the good weather holds, that'd be pretty awesome...
Depends on: 1452410
Asking for re-review, since I added a few more tests, and implemented requested changes.
Attachment #8963839 - Attachment is obsolete: true
Attachment #8967264 - Flags: review?(nfroyd)
Attachment #8963840 - Attachment is obsolete: true
Comment on attachment 8965915 [details] [diff] [review]
Simplify ThrottledEventQueue's shutdown behavior.

Obsoleted by bug 1452410.
Attachment #8965915 - Attachment is obsolete: true
Comment on attachment 8967264 [details] [diff] [review]
Part 6: Add GTests for ThrottledEventQueue.

Review of attachment 8967264 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!

::: xpcom/tests/gtest/TestThrottledEventQueue.cpp
@@ +284,5 @@
> +  ASSERT_TRUE(NS_SUCCEEDED(rv));
> +
> +  // We can't guarantee that the thread has reached the AwaitIdle call, but we
> +  // can get pretty close. Either way, it shouldn't affect the behavior of the
> +  // test.

If it doesn't matter, why are we sleeping here at all?

Couldn't we guarantee that the thread has reached the AwaitIdle call by re-using `cond` for signaling that AwaitIdle() had finished?
Attachment #8967264 - Flags: review?(nfroyd) → review+
The patch in comment 51 isn't right (well, it was a work-in-progress, so that's expected): it pauses all communication on the mMainThreadEventTarget, but I gather there are many other runnables posted on that target that do need to be delivered, so it's incorrect to pause them.

For example, dom/workers/test/test_WorkerDebugger_suspended.xul now hangs.
(In reply to Nathan Froyd [:froydnj] from comment #53)
> ::: xpcom/tests/gtest/TestThrottledEventQueue.cpp
> @@ +284,5 @@
> > +  ASSERT_TRUE(NS_SUCCEEDED(rv));
> > +
> > +  // We can't guarantee that the thread has reached the AwaitIdle call, but we
> > +  // can get pretty close. Either way, it shouldn't affect the behavior of the
> > +  // test.
> 
> If it doesn't matter, why are we sleeping here at all?

It's not important, and could come out. I was interested in exercising the case where AwaitIdle actually has to block on the condition variable, not just take the mutex, see that all is well, and go on. There's no way to really ensure that without adding testing-only interfaces to TEQ, and the sleep seemed like it would make sure that path was exercised almost all the time.

I'll improve the comment.

> Couldn't we guarantee that the thread has reached the AwaitIdle call by
> re-using `cond` for signaling that AwaitIdle() had finished?

No, because the test wants to not run the base queue until it's sure the thread is blocked on the condvar in AwaitIdle. Without access to the condvar hidden inside the TEQ and some API to see whether anyone's blocking on it, there's no way to ensure that.
Product: Firefox → DevTools
Blocks: dbg-r2c
In my attempts to simplify, I mistakenly concluded that using the WorkerRef to hold the worker alive meant that CancelingOnParentRunnable didn't need to be a WorkerDebuggeeRunnable any more.

Some of the unfreeze/unpause rules were wrong.

New try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=35a21ff5806e775584954c870d8c35aac3b84710
This is almost certainly caused by my patch:

TEST-UNEXPECTED-FAIL | dom/cache/test/mochitest/test_cache_worker_gc.html | worker weak reference should be garbage collected

I don't know what that refers to, but I do introduce a new worker reference, so perhaps I'm doing something wrong there.

ESLint has some complaints about my new test.

Other than that it looks pretty clean.
Depends on: 1499535, 1499462, 1499473
Depends on: 1499534
No longer depends on: 1499535
Depends on: 1499468
I've prepared a revised, shorter version of this patch stack. Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=817bc6bd73955a12de034fc3bcda637a6eef140d
This just cleans up the function a bit to make the next change easier to see. No
behavior change intended.
Certain runnables sent from the worker to the content window must be delayed if
the content is paused in the JavaScript debugger. For example, delivering
onmessage events while stopped at a breakpoint would violate the DOM's
run-to-completion rule.

However, other sorts of runnables must be delivered promptly if the worker is
continue to function properly. Thus, the later patches in this bug that
implement the delay for the debugger may, in general, reorder the delivery of
some runnables. So whereas previously runnables sent from the worker to the main
thread could simply assert that the worker was still alive, delayed runnables
will now need to use a WorkerRef to hold the worker alive until they are
processed.

This affects the timing with which weak references to workers decay. Since there
is no solid way to test such GC-sensitive APIs, this patch simply requests a
second GC. This is not guaranteed to pass, but then again, the test as it stands
is not guaranteed to pass either.

Depends on D9217
Separating these runnables out under a separate subclass will let us delay their
delivery while the content window is paused in the debugger.

CancelingOnParentRunnable, used when the worker calls self.close(), to close the
worker from the parent thread, must also be a WorkerDebuggeeRunnable, since it
must be processed only after all prior messages/errors from the worker.

Depends on D9218
Remove WorkerPrivate::mQueuedRunnables and its associated functions. The
approach they implement can never be correct, as the parent window gets
'resumed' whenever the debugger resumes execution after a breakpoint. The
interrupted JavaScript invocation has not yet completed, so it is not yet time
to run mQueuedRunnables. Simply re-enqueing them at that point can cause
messages from the worker to arrive out of order.

Instead, we create a separate ThrottledEventQueue,
WorkerPrivate::mMainThreadDebuggeeEventTarget especially for
WorkerDebuggeeRunnables, runnables sent from the worker to the main thread that
should not be delivered to a paused or frozen content window. This queue is
paused and resumed by WorkerPrivate::Freeze, WorkerPrivate::Thaw,
WorkerPrivate::ParentWindowPaused, and WorkerPrivate::ParentWindowResumed.

Since this affects when WorkerDebuggeeRunnables are delivered relative to other
administrative worker runnables, WorkerDebuggeeRunnable must use a
ThreadSafeWorkerRef to ensure that the WorkerPrivate sticks around long enough
for them to run properly.

Depends on D9219
Attachment #8967259 - Attachment is obsolete: true
Attachment #8967260 - Attachment is obsolete: true
Attachment #8967261 - Attachment is obsolete: true
Attachment #8967262 - Attachment is obsolete: true
Attachment #8967263 - Attachment is obsolete: true
Attachment #8967264 - Attachment is obsolete: true
Attachment #8967266 - Attachment is obsolete: true
Attachment #8967267 - Attachment is obsolete: true
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14b5c82201e5
Part 1: Rewrite test_cache_worker_gc.html to use async functions. r=baku
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/135b9518b925
Part 2: Give the weak reference in test_cache_worker_gc.html more time to decay. r=baku
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9acfa4df8ac9
Part 3: Create a WorkerRunnable subclass, WorkerDebuggeeRunnable, for runnables that could run debuggee JS. r=baku
https://hg.mozilla.org/mozilla-central/rev/14b5c82201e5
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
Oops. Not all patches are landed yet.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0b9e4abc5911
Part 4: Segregate WorkerDebuggeeRunnables into their own queues. r=baku
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7790c75ca391
Part 5: Mochitest: pause/unpause window while worker sends messages, without breaking run-to-completion. r=baku
Status: REOPENED → RESOLVED
Closed: 11 months ago11 months ago
Keywords: leave-open
Resolution: --- → FIXED

I think this broke https://searchfox.org/mozilla-central/rev/01b4b3830ea3cae2e9e431019afa6391b471c6da/dom/workers/WorkerPrivate.cpp#2700-2702,2705-2706
We are waiting on the queue which is probably empty anyhow, since postMessage uses mMainThreadDebuggeeEventTarget

Depends on: 1527978
You need to log in before you can comment on or make changes to this bug.