Closed
Bug 1426467
Opened 7 years ago
Closed 6 years ago
step in accidentally lands in worker.onmessage
Categories
(DevTools :: Debugger, defect)
Tracking
(firefox65 fixed)
RESOLVED
FIXED
Firefox 65
Tracking | Status
---|---
firefox65 | fixed
People
(Reporter: jlast, Assigned: jimb)
References
(Blocks 1 open bug)
Details
Attachments
(6 files, 17 obsolete files)
3.40 KB, patch
46 bytes, text/x-phabricator-request
Bug 1426467: Part 2: Give the weak reference in test_cache_worker_gc.html more time to decay. r?baku
46 bytes, text/x-phabricator-request
46 bytes, text/x-phabricator-request
46 bytes, text/x-phabricator-request
46 bytes, text/x-phabricator-request
When there is a worker that is sending messages down, "step in" can accidentally land in the worker.onmessage callback
STR: (https://dbg-setinterval.glitch.me/)
1. click "click me"
2. step over
3. step IN
AR: land in onmessage
ER: land in funca
Comment 1•7 years ago
Jim, can you investigate and forward this issue to the right person?
Flags: needinfo?(jimb)
Priority: -- → P2
Comment 2•7 years ago
Jim and I talked about this and I don't think it's a JS engine bug.
Currently in devtools/server/actors/tab.js, we have:
postNest(nestData) {
...
windowUtils.resumeTimeouts();
windowUtils.suppressEventHandling(false);
},
Un-suppressing event handling apparently fires any delayed events synchronously:
https://searchfox.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#1617
That seems undesirable in any case (delayed events should simply be rescheduled to fire as normal from the event loop) and the likely cause of this bug.
Comment 3•7 years ago
Profiler tells this story, which differs from comment 2 in the specifics but confirms the general idea:
(...main event loop, blah blah blah...)
-> onclick (index.html)
-> stuff (client.js:16)
-> _makeOnStep/< (devtools/server/actors/script.js:833)
-> pauseAndRespond (devtools/server/actors/script.js:894)
-> _pauseAndRespond (devtools/server/actors/script.js:707)
-> _pushThreadPause (devtools/server/actors/script.js:533)
-> bound function (self-hosted code)
-> enter (devtools/server/actors/script.js:350)
-> postNest (devtools/server/actors/tab.js:1168)
-> nsDOMWindowUtils::ResumeTimeouts()
-> nsGlobalWindowInner::Resume()
-> mozilla::dom::workers::RuntimeService::ResumeWorkersForWindow()
-> (...event dispatching goop...)
-> myWorker.onmessage (client.js:3)
(Clearing this bug's Priority field for re-triage.)
https://perfht.ml/2lPCE0W
status-firefox59:
--- → affected
Component: JavaScript Engine → Developer Tools: Debugger
Priority: P2 → --
Product: Core → Firefox
Comment 4•7 years ago
mrbkap, can you help? I think all the background you need is: the debugger's on-stack synchronous code (the code that runs when you hit a breakpoint etc.) is inadvertently firing events, synchronously. Ironically it happens when the debugger calls WindowUtils.resumeTimeouts(), trying to avoid synchronously firing events.
It's happening in WorkerPrivateParent<_>::ParentWindowResumed():
https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#3356-3358
See bug 854739.
But I don't get it. Why would events be delivered out of order? At a guess: it must be that new events from the worker would be delivered to a different queue (that is, mQueuedRunnables is not the "real" event queue). But if so, it seems like the right fix would have been to empty mQueuedRunnables into that event queue here, rather than synchronously firing the events.
I think this Thaw method can be reentered, to make matters worse.
Flags: needinfo?(jimb) → needinfo?(mrbkap)
Comment 5•7 years ago
I haven't forgotten about this, I'll try to come up with an answer for this tomorrow.
Assignee
Comment 6•7 years ago
Blake, how close are you to a fix here? I was going to work on this, I just forgot to assign myself.
Comment 7•7 years ago
(jimb and I spoke on IRC about this and he's going to take it.)
Assignee: nobody → jimb
Flags: needinfo?(mrbkap)
Assignee
Comment 8•7 years ago
Here's a failing mochitest for this problem.
Attachment #8943747 -
Flags: feedback?(jlaster)
Assignee
Comment 9•7 years ago
I'll bet this same bug occurs with MessageChannel as well.
Reporter
Comment 10•7 years ago
Comment on attachment 8943747 [details] [diff] [review]
mochitest wip
looks good
Attachment #8943747 -
Flags: feedback?(jlaster) → feedback+
Reporter
Updated•7 years ago
Blocks: js-devtools
Assignee
Comment 11•7 years ago
Here's what's going on in this bug:
- In the midst of some JavaScript invocation, the worker's parent window hits a
breakpoint. The devtools server calls nsIDOMWindowUtils::SuspendTimeouts, and
starts a nested event loop.
- The worker sends the parent a message, enqueuing a runnable on the main thread.
- The main thread's event loop processes the runnable. Since the parent window
has timeouts suspended, MessageEventRunnable::WorkerRun stashes the runnable
in WorkerPrivateParent::mQueuedRunnables: https://goo.gl/eFgA1u
- The developer continues from the breakpoint. The devtools server calls
nsIDOMWindowUtils::ResumeTimeouts, which runs the runnables saved in
mQueuedRunnables.
- The worker message runnable calls the parent window's onmessage handler.
This is incorrect. Running the onmessage handler in the midst of the original
JavaScript invocation (which was only paused at a breakpoint) violates the
run-to-completion rule.
Assignee
Comment 12•7 years ago
A proposed solution is to change ResumeTimeouts to re-queue the runnable on the
main thread, where it will run after the JavaScript execution interrupted by the
debugger has completed.
This is adequate to ensure we follow the run-to-completion rule. If the parent
hits another breakpoint and is paused again, then the message runnable may get
stashed in mQueuedRunnables again, but either way, it will only ever run when
the parent window isn't paused, directly from the event loop, so it cannot
interrupt any other JavaScript execution in the parent window.
However, this solution is still buggy, because it can reorder messages from the
worker, in the following situation:
- Parent window sends worker messages A and B, and then hits a breakpoint.
- Worker responds to message A.
- Main thread event loop processes the response runnable. Since the parent
window is paused, the runnable stashes A in mQueuedRunnables.
- Worker responds to message B.
- Before reaching B's runnable, the main thread event loop resumes the parent.
ResumeTimeouts removes A from mQueuedRunnables and re-enqueues it for the main
thread --- *after* B.
Assignee
Comment 13•7 years ago
I think respecting the run-to-completion rule is more important than avoiding the race condition above, so I've filed the race as follow-up bug 1433317.
Assignee
Comment 14•7 years ago
I think it's okay to land a fix for this even if it re-orders messages, because I can concoct a scenario where the present code will re-order messages too:
- While the parent window runs JS, the worker thread sends messages A and B to its
parent window.
- Parent window hits a breakpoint. The devtools server calls SuspendTimeouts on
the parent window, and enters a nested loop.
- The nested event loop runs the A and B runnables. Since their window is
paused, they are stashed in mQueuedRunnables.
- Developer continues from breakpoint. ResumeTimeouts (as currently implemented)
makes a copy of mQueuedRunnables and begins running them in order.
- While running A's onmessage handler, the code hits a breakpoint. The devtools
server calls SuspendTimeouts on the parent window and enters a second nested
event loop.
- The worker sends a third message C to the parent window.
- The second nested event loop runs C's runnable. Since the parent window is
paused, it is stashed in mQueuedRunnables.
- The developer resumes from that breakpoint. A second invocation of
ResumeTimeouts makes a second copy of mQueuedRunnables, runs C's onmessage
handler, and returns.
- Control returns to A's onmessage handler, after its breakpoint. A's onmessage
handler completes.
- The first ResumeTimeouts call now turns to the next runnable in its copy of
mQueuedRunnables, which is B's onmessage handler.
So A's onmessage handler runs first, and is interrupted by C's. Then, B's
onmessage handler runs.
Assignee
Comment 15•7 years ago
I think the overall lesson here is that nobody has any real idea what is going on. We need an architecture in which it's pretty clear that debugging does not reorder messages or break run-to-completion.
Assignee
Comment 16•7 years ago
Assignee
Comment 17•7 years ago
Attachment #8943747 -
Attachment is obsolete: true
Attachment #8949473 -
Flags: review?(bkelly)
Comment 18•7 years ago
Comment on attachment 8949473 [details] [diff] [review]
Re-enqueue messages from workers when debugger pause ends; don't run them immediately.
Review of attachment 8949473 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for all the excellent documentation!
::: dom/workers/WorkerPrivate.cpp
@@ +2069,5 @@
> mQueuedRunnables.SwapElements(runnables);
>
> for (uint32_t index = 0; index < runnables.Length(); index++) {
> + if (!NS_SUCCEEDED(ParentAsWorkerPrivate()->DispatchToMainThread(runnables[index].forget())))
> + MOZ_ASSERT_UNREACHABLE("Re-dispatch of delayed runnable failed");
nit: This can just be:
MOZ_ALWAYS_SUCCEEDS(
ParentAsWorkerPrivate()->DispatchToMainThread(runnables[index].forget()));
Attachment #8949473 -
Flags: review?(bkelly) → review+
Assignee
Comment 19•7 years ago
Revised for nits, carrying over r+
Attachment #8949473 -
Attachment is obsolete: true
Attachment #8949565 -
Flags: review+
Assignee
Comment 20•7 years ago
Revised for ESLint complaints. Carrying over r+.
Attachment #8949565 -
Attachment is obsolete: true
Attachment #8949569 -
Flags: review+
Assignee
Comment 21•7 years ago
Verified fixed with original test case.
Assignee
Updated•7 years ago
Keywords: checkin-needed
Comment 22•7 years ago
There were conflicts while landing this patch.
@jimb can you please take a look?
Flags: needinfo?(jimb)
Keywords: checkin-needed
Assignee
Comment 23•7 years ago
Oh, it seems like WorkerPrivateParent went away --- FANTASTIC.
(Bug 1435263)
Should be easy to fix.
Flags: needinfo?(jimb)
Assignee
Comment 24•7 years ago
Rebased on current inbound; carrying over r=bkelly.
Attachment #8949569 -
Attachment is obsolete: true
Attachment #8949905 -
Flags: review+
Assignee
Updated•7 years ago
Keywords: checkin-needed
Comment 25•7 years ago
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/42deb2563aa8
Re-enqueue messages from workers when debugger pause ends; don't run them immediately. r=bkelly
Keywords: checkin-needed
Comment 26•7 years ago
Backed out changeset 42deb2563aa8 (bug 1426467) for ESLint failure at /devtools/server/tests/mochitest/suspendTimeouts_content.js on a CLOSED TREE
Failure push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=aad65073f7569d72c33eb0930a7a162b2d12d0a0&filter-classifiedState=unclassified&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception
Failure log:
https://treeherder.mozilla.org/logviewer.html#?job_id=161716297&repo=mozilla-inbound&lineNumber=245
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/de8a9bb319a0d470fa586e962ab7f8e82da5e9c2
Flags: needinfo?(jimb)
Assignee
Comment 27•7 years ago
Amended to work around ESLint's misidentification of functions in content that are called from the mochitest's chrome window as "unused".
Attachment #8949905 -
Attachment is obsolete: true
Flags: needinfo?(jimb)
Assignee
Updated•7 years ago
Keywords: checkin-needed
Assignee
Updated•7 years ago
Keywords: checkin-needed
Assignee
Comment 28•7 years ago
Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=829452d518dfe914bc95bf631587c6dc847590dc
I'm not sure that this will run the ESLint tests; we'll see. But I've never used ESLint export comments before, so further problems are likely; it'd be rude to mark it checkin-needed and make the sheriffs deal with it if it fails.
Assignee
Comment 29•7 years ago
Checked with `mach lint` locally. Re-requesting checkin.
Keywords: checkin-needed
Assignee
Updated•7 years ago
Status: NEW → ASSIGNED
Comment 30•7 years ago
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab4aab822769
Re-enqueue messages from workers when debugger pause ends; don't run them immediately. r=bkelly
Keywords: checkin-needed
Comment 31•7 years ago
Backed out changeset ab4aab822769 (bug 1426467) for Mochitest failure on dom/workers/test/test_suspend.html
Log of the failure:
https://treeherder.mozilla.org/logviewer.html#?job_id=162250118&repo=mozilla-inbound&lineNumber=2372
Backout changeset:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd94c4c474d3d3d658d33c5ebc49f22ac0fad0fa
Flags: needinfo?(jimb)
Assignee
Comment 33•7 years ago
This failure seems to be in dom/workers/test/test_suspend.html:
is(lastCount, data - 1, "Missed a message, suspend failed!");
https://searchfox.org/mozilla-central/rev/4234703a532006c5ef9ce09b4c487d88124526a0/dom/workers/test/test_suspend.html#112
The actual failure is:
INFO TEST-UNEXPECTED-FAIL | dom/workers/test/test_suspend.html | Missed a message, suspend failed! - got 28, expected 42
I'm unable to reproduce this locally using `mach test --verify --headless dom/workers/test/test_suspend.html`.
Assignee
Comment 34•7 years ago
It seems like my patch can reorder messages when we navigate back to a page with workers, even if the debugger is not involved at all.
I added a call to the sleep function in WorkerPrivate::Freeze, to ensure the worker could get some messages enqueued after the parent window had started to freeze. I added another sleep to WorkerPrivate::ParentWindowResumed, to let the re-awakened worker queue some messages before the deferred message runnables are re-dispatched. These changes make the mochitest fail locally for me.
Obviously, adding sleep calls should never affect the correctness of the code, so there's a race condition.
Assignee
Comment 35•7 years ago
Just for reference, here's exactly where I added the 'sleep' calls to bring out the bug.
Assignee
Comment 36•7 years ago
nsGlobalWindowInner::Thaw calls ThawInternal, which causes frozen workers to resume execution; and then it calls Resume, which is what re-dispatches the messages. If the messages were re-dispatched before the workers resumed execution, then the new messages couldn't jump ahead of the old ones. WorkerPrivate::Thaw is probably where this needs to happen, but it carefully avoids running the deferred runnables.
Assignee
Comment 37•7 years ago
Attachment #8963839 -
Flags: review?(nfroyd)
Assignee
Comment 38•7 years ago
Only requesting feedback at the moment; I haven't actually used this to fix the bug yet, but I'm pretty sure it's what we want.
Attachment #8963840 -
Flags: feedback?(nfroyd)
Comment 39•7 years ago
Comment on attachment 8963840 [details] [diff] [review]
Add Pause, Resume, and IsPaused methods to ThrottledEventQueue.
Review of attachment 8963840 [details] [diff] [review]:
-----------------------------------------------------------------
Canceling feedback not because I think we don't want this, but because this implementation raises a lot of questions for me, see below.
::: xpcom/threads/ThrottledEventQueue.cpp
@@ +185,5 @@
> {
> MutexAutoLock lock(mMutex);
>
> + // If this queue is paused, free the executor and return without running
> + // anything. We'll create a new executor when we're resumed.
Do we want to add logic in Dispatch so we're not dispatching do-nothing executors while the queue is paused? I guess having do-nothing executors is convenient, if wasteful...
@@ +188,5 @@
> + // If this queue is paused, free the executor and return without running
> + // anything. We'll create a new executor when we're resumed.
> + if (IsPaused(lock)) {
> + // Note, this breaks a ref cycle.
> + mExecutor = nullptr;
Note the comment above that was carried over into EnsureExecutor, where mExecutor was keeping the inner queue alive until the queue was drained...with this implementation of pausing, I guess we're no longer holding that invariant.
So maybe we not only need do-nothing executors, but we even need to keep dispatching do-nothing executors from the executor itself while the queue is paused? That seems even more wasteful, since we might be constantly running do-nothing executors while the queue is paused, which potentially wouldn't let the underlying thread sleep.
@@ +340,5 @@
> MOZ_ASSERT(!onBaseTarget);
> #endif
>
> MutexAutoLock lock(mMutex);
> + MOZ_ASSERT(!IsPaused(lock));
Do we want this same assertion somewhere in the shutdown process?
@@ +375,5 @@
> +
> + if (mPauseCount == 1) {
> + // We're about to no longer be paused. If we have events to run, make sure
> + // we have an executor in flight to run them. This is fallible, so do this
> + // before adjusting the pause count.
Is it better to have Resume() fail and require the caller to do error handling or is it better to silently eat the error here and make Resume() infallible? I don't know what suitable error handling looks like in the former case. In the latter case, the client will at least see failures come up in Dispatch, which seems like a more reasonable place to handle errors, since presumably the client is already equipped to do that.
Either way opens us up to the unfortunate situation of having runnables in the queue, but not having an executor to process them, which seems like a situation likely to produce intermittent test failures due to leaks at some point.
::: xpcom/threads/ThrottledEventQueue.h
@@ +111,5 @@
> + // Resume execution of events from this ThrottledEventQueue.
> + //
> + // Pairs of calls to Pause and Resume may nest. Only when Resume has been
> + // called as many times as Pause are events run. You must not make an
> + // unmatched call to Resume.
Since Resume can fail, one can make matched calls to Pause and Resume and still not have a queue in an unpaused state, correct? So this should really make the "successful call" requirement of Resume explicit, shouldn't it?
Attachment #8963840 -
Flags: feedback?(nfroyd)
Comment 40•7 years ago
Comment on attachment 8963839 [details] [diff] [review]
Add GTests for ThrottledEventQueue.
Review of attachment 8963839 [details] [diff] [review]:
-----------------------------------------------------------------
Thank you for writing all these!
::: xpcom/tests/gtest/TestThrottledEventQueue.cpp
@@ +25,5 @@
> +
> +namespace TestThrottledEventQueue {
> +
> +// An nsIRunnable that simply calls a closure (or any C++ callable).
> +struct FunctionRunnable final : nsIRunnable {
Can we just use NS_NewRunnableFunction instead?
Attachment #8963839 -
Flags: review?(nfroyd) → review+
Assignee
Comment 41•7 years ago
Thanks very much for the helpful feedback!
(In reply to Nathan Froyd [:froydnj] from comment #39)
> ::: xpcom/threads/ThrottledEventQueue.cpp
> @@ +185,5 @@
> > {
> > MutexAutoLock lock(mMutex);
> >
> > + // If this queue is paused, free the executor and return without running
> > + // anything. We'll create a new executor when we're resumed.
>
> Do we want to add logic in Dispatch so we're not dispatching do-nothing
> executors while the queue is paused? I guess having do-nothing executors is
> convenient, if wasteful...
There is already an IsPaused check in Inner::Dispatch around the call to EnsureExecutor. Is that not what you mean?
> @@ +188,5 @@
> > + // If this queue is paused, free the executor and return without running
> > + // anything. We'll create a new executor when we're resumed.
> > + if (IsPaused(lock)) {
> > + // Note, this breaks a ref cycle.
> > + mExecutor = nullptr;
>
> Note the comment above that was carried over into EnsureExecutor, where
> mExecutor was keeping the inner queue alive until the queue was
> drained...with this implementation of pausing, I guess we're no longer
> holding that invariant.
>
> So maybe we not only need do-nothing executors, but we even need to keep
> dispatching do-nothing executors from the executor itself while the queue is
> paused? That seems even more wasteful, since we might be constantly running
> do-nothing executors while the queue is paused, which potentially wouldn't
> let the underlying thread sleep.
I think this is unnecessary: If the queue will ever be unpaused, there must be a reference to the ThrottledEventQueue, so it won't get collected. If there is no reference to the ThrottledEventQueue, then it will never dispatch events again, so it needn't be kept alive.
> @@ +340,5 @@
> > MOZ_ASSERT(!onBaseTarget);
> > #endif
> >
> > MutexAutoLock lock(mMutex);
> > + MOZ_ASSERT(!IsPaused(lock));
>
> Do we want this same assertion somewhere in the shutdown process?
The answer to this would follow from the answer to my question above.
>
> @@ +375,5 @@
> > +
> > + if (mPauseCount == 1) {
> > + // We're about to no longer be paused. If we have events to run, make sure
> > + // we have an executor in flight to run them. This is fallible, so do this
> > + // before adjusting the pause count.
>
> Is it better to have Resume() fail and require the caller to do error
> handling or is it better to silently eat the error here and make Resume()
> infallible? I don't know what suitable error handling looks like in the
> former case. In the latter case, the client will at least see failures come
> up in Dispatch, which seems like a more reasonable place to handle errors,
> since presumably the client is already equipped to do that.
>
> Either way opens us up to the unfortunate situation of having runnables in
> the queue, but not having an executor to process them, which seems like a
> situation likely to produce intermittent test failures due to leaks at some
> point.
I think Resume should have exactly the same error-handling requirements as Dispatch. After all, it must do a dispatch. I've never really understood how Dispatch failure should be handled, either.
> ::: xpcom/threads/ThrottledEventQueue.h
> @@ +111,5 @@
> > + // Resume execution of events from this ThrottledEventQueue.
> > + //
> > + // Pairs of calls to Pause and Resume may nest. Only when Resume has been
> > + // called as many times as Pause are events run. You must not make an
> > + // unmatched call to Resume.
>
> Since Resume can fail, one can make matched calls to Pause and Resume and
> still not have a queue in an unpaused state, correct? So this should really
> make the "successful call" requirement of Resume explicit, shouldn't it?
Yes, thanks.
I noticed that a Paused, non-empty ThrottledEventQueue will also cause the call to SpinEventLoopUntil in Inner::Observe to hang. I'd like to bring the shutdown behavior under test, so that the tests would have caught this mistake.
Assignee
Comment 42•7 years ago
(In reply to Jim Blandy :jimb from comment #41)
> I think Resume should have exactly the same error-handling requirements as
> Dispatch. After all, it must do a dispatch. I've never really understood how
> Dispatch failure should be handled, either.
On that topic: Is it really okay that the existing code calls NS_DispatchToMainThread but ignores the return value?
Assignee
Comment 43•7 years ago
This patch removes all the specialized shutdown behavior from ThrottledEventQueue.
Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2b9cdc4257292e62f0e3b467ad86522b1c49e91
At the moment, it seems green. If the good weather holds, that'd be pretty awesome...
Assignee
Comment 44•7 years ago
Assignee
Comment 45•7 years ago
Assignee
Comment 46•7 years ago
Assignee
Comment 47•7 years ago
Assignee
Comment 48•7 years ago
Assignee
Comment 49•7 years ago
Asking for re-review, since I added a few more tests, and implemented requested changes.
Attachment #8963839 -
Attachment is obsolete: true
Attachment #8967264 -
Flags: review?(nfroyd)
Assignee
Comment 50•7 years ago
Attachment #8963840 -
Attachment is obsolete: true
Assignee
Comment 51•7 years ago
Attachment #8950799 -
Attachment is obsolete: true
Assignee
Comment 52•7 years ago
Comment on attachment 8965915 [details] [diff] [review]
Simplify ThrottledEventQueue's shutdown behavior.
Obsoleted by bug 1452410.
Attachment #8965915 -
Attachment is obsolete: true
Comment 53•7 years ago
Comment on attachment 8967264 [details] [diff] [review]
Part 6: Add GTests for ThrottledEventQueue.
Review of attachment 8967264 [details] [diff] [review]:
-----------------------------------------------------------------
Thank you!
::: xpcom/tests/gtest/TestThrottledEventQueue.cpp
@@ +284,5 @@
> + ASSERT_TRUE(NS_SUCCEEDED(rv));
> +
> + // We can't guarantee that the thread has reached the AwaitIdle call, but we
> + // can get pretty close. Either way, it shouldn't affect the behavior of the
> + // test.
If it doesn't matter, why are we sleeping here at all?
Couldn't we guarantee that the thread has reached the AwaitIdle call by re-using `cond` for signaling that AwaitIdle() had finished?
Attachment #8967264 -
Flags: review?(nfroyd) → review+
Assignee
Comment 54•7 years ago
The patch in comment 51 isn't right (well, it was a work-in-progress, so that's expected): it pauses all communication on the mMainThreadEventTarget, but I gather there are many other runnables posted on that target that do need to be delivered, so it's incorrect to pause them.
For example, dom/workers/test/test_WorkerDebugger_suspended.xul now hangs.
Assignee
Comment 55•7 years ago
(In reply to Nathan Froyd [:froydnj] from comment #53)
> ::: xpcom/tests/gtest/TestThrottledEventQueue.cpp
> @@ +284,5 @@
> > + ASSERT_TRUE(NS_SUCCEEDED(rv));
> > +
> > + // We can't guarantee that the thread has reached the AwaitIdle call, but we
> > + // can get pretty close. Either way, it shouldn't affect the behavior of the
> > + // test.
>
> If it doesn't matter, why are we sleeping here at all?
It's not important, and could come out. I was interested in exercising the case where AwaitIdle actually has to block on the condition variable, not just take the mutex, see that all is well, and go on. There's no way to really ensure that without adding testing-only interfaces to TEQ, and the sleep seemed like it would make sure that path was exercised almost all the time.
I'll improve the comment.
> Couldn't we guarantee that the thread has reached the AwaitIdle call by
> re-using `cond` for signaling that AwaitIdle() had finished?
No, because the test wants to not run the base queue until it's sure the thread is blocked on the condvar in AwaitIdle. Without access to the condvar hidden inside the TEQ and some API to see whether anyone's blocking on it, there's no way to ensure that.
Updated•6 years ago
Product: Firefox → DevTools
Assignee
Comment 56•6 years ago
Rebased and simplified, new try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=81b72b75390a7ec7a323a96cddb1a2722a275baf
status-firefox59:
affected → ---
Assignee
Comment 57•6 years ago
Assignee
Comment 58•6 years ago
In my attempts to simplify, I mistakenly concluded that using the WorkerRef to hold the worker alive meant that CancelingOnParentRunnable didn't need to be a WorkerDebuggeeRunnable any more.
Some of the unfreeze/unpause rules were wrong.
New try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=35a21ff5806e775584954c870d8c35aac3b84710
Assignee
Comment 59•6 years ago
This is almost certainly caused by my patch:
TEST-UNEXPECTED-FAIL | dom/cache/test/mochitest/test_cache_worker_gc.html | worker weak reference should be garbage collected
I don't know what that refers to, but I do introduce a new worker reference, so perhaps I'm doing something wrong there.
ESLint has some complaints about my new test.
Other than that it looks pretty clean.
Assignee
Updated•6 years ago
Assignee
Comment 60•6 years ago
I've prepared a revised, shorter version of this patch stack. Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=817bc6bd73955a12de034fc3bcda637a6eef140d
Assignee
Comment 61•6 years ago
This just cleans up the function a bit to make the next change easier to see. No
behavior change intended.
Assignee
Comment 62•6 years ago
Certain runnables sent from the worker to the content window must be delayed if
the content is paused in the JavaScript debugger. For example, delivering
onmessage events while stopped at a breakpoint would violate the DOM's
run-to-completion rule.
However, other sorts of runnables must be delivered promptly if the worker is
to continue to function properly. Thus, the later patches in this bug that
implement the delay for the debugger may, in general, reorder the delivery of
some runnables. So whereas previously runnables sent from the worker to the main
thread could simply assert that the worker was still alive, delayed runnables
will now need to use a WorkerRef to hold the worker alive until they are
processed.
This affects the timing with which weak references to workers decay. Since there
is no solid way to test such GC-sensitive APIs, this patch simply requests a
second GC. This is not guaranteed to pass, but then again, the test as it stands
is not guaranteed to pass either.
Depends on D9217
Assignee
Comment 63•6 years ago
Separating these runnables out under a separate subclass will let us delay their
delivery while the content window is paused in the debugger.
CancelingOnParentRunnable, used when the worker calls self.close(), to close the
worker from the parent thread, must also be a WorkerDebuggeeRunnable, since it
must be processed only after all prior messages/errors from the worker.
Depends on D9218
Assignee
Comment 64•6 years ago
Remove WorkerPrivate::mQueuedRunnables and its associated functions. The
approach they implement can never be correct, as the parent window gets
'resumed' whenever the debugger resumes execution after a breakpoint. The
interrupted JavaScript invocation has not yet completed, so it is not yet time
to run mQueuedRunnables. Simply re-enqueueing them at that point can cause
messages from the worker to arrive out of order.
Instead, we create a separate ThrottledEventQueue,
WorkerPrivate::mMainThreadDebuggeeEventTarget especially for
WorkerDebuggeeRunnables, runnables sent from the worker to the main thread that
should not be delivered to a paused or frozen content window. This queue is
paused and resumed by WorkerPrivate::Freeze, WorkerPrivate::Thaw,
WorkerPrivate::ParentWindowPaused, and WorkerPrivate::ParentWindowResumed.
Since this affects when WorkerDebuggeeRunnables are delivered relative to other
administrative worker runnables, WorkerDebuggeeRunnable must use a
ThreadSafeWorkerRef to ensure that the WorkerPrivate sticks around long enough
for them to run properly.
Depends on D9219
Assignee
Comment 65•6 years ago
Detailed comments in the test itself.
Assignee
Updated•6 years ago
Attachment #8967259 - Attachment is obsolete: true
Attachment #8967260 - Attachment is obsolete: true
Attachment #8967261 - Attachment is obsolete: true
Attachment #8967262 - Attachment is obsolete: true
Attachment #8967263 - Attachment is obsolete: true
Attachment #8967264 - Attachment is obsolete: true
Attachment #8967266 - Attachment is obsolete: true
Attachment #8967267 - Attachment is obsolete: true
Comment 66•6 years ago
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14b5c82201e5
Part 1: Rewrite test_cache_worker_gc.html to use async functions. r=baku
Comment 67•6 years ago
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/135b9518b925
Part 2: Give the weak reference in test_cache_worker_gc.html more time to decay. r=baku
Comment 68•6 years ago
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9acfa4df8ac9
Part 3: Create a WorkerRunnable subclass, WorkerDebuggeeRunnable, for runnables that could run debuggee JS. r=baku
Comment 69•6 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox65:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
Comment 70•6 years ago
bugherder
Assignee
Comment 71•6 years ago
Oops. Not all patches are landed yet.
Comment 72•6 years ago
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0b9e4abc5911
Part 4: Segregate WorkerDebuggeeRunnables into their own queues. r=baku
Comment 73•6 years ago
Pushed by jblandy@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7790c75ca391
Part 5: Mochitest: pause/unpause window while worker sends messages, without breaking run-to-completion. r=baku
Comment 74•6 years ago
bugherder
Assignee
Updated•6 years ago
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Comment 75•6 years ago
I think this broke https://searchfox.org/mozilla-central/rev/01b4b3830ea3cae2e9e431019afa6391b471c6da/dom/workers/WorkerPrivate.cpp#2700-2702,2705-2706
We are waiting on the queue, which is probably empty anyhow, since postMessage uses mMainThreadDebuggeeEventTarget.