Closed
Bug 1281040
Opened 8 years ago
Closed 8 years ago
service worker is not resumed if debugger window is closed
Categories
(DevTools :: Debugger, defect, P1)
Tracking
(firefox48 wontfix, firefox49 fixed, firefox-esr45 disabled, firefox50 fixed, firefox51 fixed)
RESOLVED
FIXED
Firefox 51
People
(Reporter: bkelly, Assigned: ejpbruel)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(3 files, 2 obsolete files)
20.67 KB,
patch
|
jlong
:
review+
|
Details | Diff | Splinter Review |
19.40 KB,
patch
|
ejpbruel
:
review+
ritu
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
19.43 KB,
patch
|
Details | Diff | Splinter Review |
STR: 1) Register a service worker 2) Open the site to get the service worker running 3) Open the debugger window for the service worker 4) Set a break point in the fetch event handler 5) Reload the site 6) Observer the break point being triggered 7) Close the debugger window while the worker is paused without clicking continue 8) Observe that the worker is stuck until browser exit This is a pretty big footgun for people using the debugger feature.
Reporter | ||
Comment 1•8 years ago
|
||
Eddy, can we get a quick fix for this? Its a really big footgun for people trying to use the worker debugger and seems like it might be relatively easy to fix.
Flags: needinfo?(ejpbruel)
Assignee | ||
Comment 2•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #1) > Eddy, can we get a quick fix for this? Its a really big footgun for people > trying to use the worker debugger and seems like it might be relatively easy > to fix. Ok. Will take a look at this tomorrow.
Flags: needinfo?(ejpbruel)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → ejpbruel
Assignee | ||
Comment 3•8 years ago
|
||
Marking this as P1 because bkelly indicated that this is a pretty big footgun for people trying to debug service workers, and it looks like this is low hanging fruit.
Priority: -- → P1
Assignee | ||
Comment 4•8 years ago
|
||
This patch should fix the issue. I only tested it against normal workers, but since service workers use the same code, it should work there as well.
Attachment #8766757 -
Flags: review?(jlong)
Reporter | ||
Comment 5•8 years ago
|
||
I think I see sandbox compartments being leaked by devtools when I open and close a worker debugger as well: │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:183) [103] │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:209) [103] │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:220) [103] │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:227) [103] │ ├──103 (13.05%) ── [System Principal], Addon-SDK (from: resource://gre/modules/commonjs/toolkit/loader.js:249) [103] │ ├───21 (02.66%) ── [System Principal], outOfProcessTabChildGlobal [21] │ └────8 (01.01%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/server/actors/child-process.js:35) [8] Should this be fixed by this patch as well? Or should I write a separate bug?
Flags: needinfo?(ejpbruel)
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #5) > I think I see sandbox compartments being leaked by devtools when I open and > close a worker debugger as well: > > │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: > resource://gre/modules/commonjs/toolkit/loader.js -> > resource://devtools/shared/builtin-modules.js:183) [103] > │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: > resource://gre/modules/commonjs/toolkit/loader.js -> > resource://devtools/shared/builtin-modules.js:209) [103] > │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: > resource://gre/modules/commonjs/toolkit/loader.js -> > resource://devtools/shared/builtin-modules.js:220) [103] > │ ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: > resource://gre/modules/commonjs/toolkit/loader.js -> > resource://devtools/shared/builtin-modules.js:227) [103] > │ ├──103 (13.05%) ── [System Principal], Addon-SDK (from: > resource://gre/modules/commonjs/toolkit/loader.js:249) [103] > │ ├───21 (02.66%) ── [System Principal], outOfProcessTabChildGlobal [21] > │ └────8 (01.01%) ── [System Principal], [anonymous sandbox] (from: > resource://gre/modules/commonjs/toolkit/loader.js -> > resource://devtools/server/actors/child-process.js:35) [8] > > Should this be fixed by this patch as well? Or should I write a separate > bug? It's very possible that we're still leaking stuff even if we detach from the worker client, so I would file a separate bug for this. Did you happen to try my patch with a service worker? Did it resolve the issue for you?
Flags: needinfo?(ejpbruel)
Reporter | ||
Comment 7•8 years ago
|
||
(In reply to Eddy Bruel [:ejpbruel] from comment #6) > Did you happen to try my patch with a service worker? Did it resolve the > issue for you? No. I don't have a build handy. I'll start one now.
Comment 8•8 years ago
|
||
Comment on attachment 8766757 [details] [diff] [review] Make sure we detach from worker client when target is destroyed. Review of attachment 8766757 [details] [diff] [review]: ----------------------------------------------------------------- I'm not too familiar with this area of the code, but this makes sense. Especially since I've frequently seen `this._extraPools is null` errors.
Attachment #8766757 -
Flags: review?(jlong) → review+
Reporter | ||
Comment 9•8 years ago
|
||
Unfortunately this patch does not fix the compartment leak. I filed bug 1283587.
Reporter | ||
Comment 10•8 years ago
|
||
James, can we get this into 50 and uplifted to 48/49 before the next merge?
Flags: needinfo?(jlong)
Comment 11•8 years ago
|
||
I already r+ it, eddy can you land it?
Flags: needinfo?(jlong) → needinfo?(ejpbruel)
Comment 12•8 years ago
|
||
Looks like it just needs a try push and then we can push
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to James Long (:jlongster) from comment #12) > Looks like it just needs a try push and then we can push I was planning on taking a day off tomorrow, but I can try to land this if it means we can make the uplift.
Flags: needinfo?(ejpbruel)
Assignee | ||
Comment 14•8 years ago
|
||
Try push for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eae1115fd441
Comment 15•8 years ago
|
||
I'm happy to push it then, it's a tiny patch :)
Assignee | ||
Comment 16•8 years ago
|
||
(In reply to James Long (:jlongster) from comment #15) > I'm happy to push it then, it's a tiny patch :) Thanks, that would be great. If we can get a green try push for it tonight, you can push it tomorrow.
Comment 18•8 years ago
|
||
There are several very suspicious timeout failures for the worker tests, so it must be related to this. It's failing pretty consistently. I don't think I should push this yet.
Flags: needinfo?(jlong)
Assignee | ||
Comment 19•8 years ago
|
||
Going to look into these test failures. Sorry for the delay; I was on PTO last week.
Assignee | ||
Comment 20•8 years ago
|
||
I figured out the issue. The root of the problem is that calling postMessage on a WorkerDebugger is inherently racy: when a worker shuts down, there is a small interval during which no more runnables can be dispatched to the worker, but the worker debugger has not yet been closed. As a result, calling postMessage on a WorkerDebugger can fail, even if you check that the WorkerDebugger is not yet closed. When we detach from a worker actor, we close the connection to the corresponding worker server by closing the transport for that connection. The transport responds to this by removing itself as a listener on the WorkerDebugger that represents the connection, and then calling its onClosed hook. One place where this onClosed hook is used is here: https://dxr.mozilla.org/mozilla-central/source/devtools/server/main.js?q=server%2Fmain.js%3A843&redirect_type=direct#863 In this hook we send a message to the server requesting it to clean up its end of the connection. If the WorkerDebugger is already closed, we don’t need to send this message, but as explained above, this check is actually racy, so the call to postMessage fails. When this happens, the resulting exception propagates up all the way to here: https://dxr.mozilla.org/mozilla-central/source/devtools/server/actors/worker.js#192 This causes us to exit from the call to _detach early, so the WorkerActor is never fully detached. Detaching from the worker a second time will cause the transport to remove itself as listener from the WorkerDebugger again. Since it already removed itself as listener during our previous attempt, this will cause an error to be thrown. The failing tests are set up so that they destroy the toolbox first, and then close the worker client (note that closing an attached worker client will automatically detach from it). Previously, destroying the toolbox did not cause us to detach from the worker client, but with this patch applied it does. Consequently, these tests try to detach from the worker client twice. That shouldn’t be a problem, but if the first attempt to detach fails, due to the race condition above, the second attempt to detach will cause an error. The solution here is to make the call to wrap the call to postMessage in the onClosed trap in a try catch block. The only time this call can throw an error is if the worker is already shutting down, in which case we don’t care whether the message arrives anyway.
Assignee | ||
Comment 21•8 years ago
|
||
New patch with test failures addressed. This probably needs to be re-reviewed.
Attachment #8766757 -
Attachment is obsolete: true
Attachment #8776954 -
Flags: review?(jlong)
Assignee | ||
Comment 22•8 years ago
|
||
New try push for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa288033b651
Assignee | ||
Comment 23•8 years ago
|
||
Ran into memory leaks on try. I suspect the cause is that we don't wait for the toolbox to be fully closed before shutting down the test. Fixed that issue. Additionally split up the browser_dbg_worker-console test into three smaller ones because I kept running into timeouts.
Attachment #8776954 -
Attachment is obsolete: true
Attachment #8776954 -
Flags: review?(jlong)
Attachment #8777482 -
Flags: review?(jlong)
Assignee | ||
Comment 24•8 years ago
|
||
New try push for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe90311f8992
Comment 25•8 years ago
|
||
Comment on attachment 8777482 [details] [diff] [review] Make sure we detach from worker client when target is destroyed. Review of attachment 8777482 [details] [diff] [review]: ----------------------------------------------------------------- Looks fine, not a whole lot different from last review. I can't remember if the tests were split up before, but I had a few nits. ::: devtools/client/debugger/test/mochitest/browser_dbg_worker-console-01.js @@ +3,5 @@ > + > +var TAB_URL = EXAMPLE_URL + "doc_WorkerActor.attachThread-tab.html"; > +var WORKER_URL = "code_WorkerActor.attachThread-worker.js"; > + > +function* initWorkerDebugger(TAB_URL, WORKER_URL) { Why isn't this in head.js? I guess this code shouldn't be reviewed too much, since you are just splitting the tests up. But this function at least is duplicated 3 times... ::: devtools/client/debugger/test/mochitest/browser_dbg_worker-console-03.js @@ +1,2 @@ > +// Check to make sure that a worker can be attached to a toolbox > +// and that the console works. This comment is the same in all 3 test files, can you go into a little more detail what each test does? ::: devtools/server/main.js @@ +861,5 @@ > transport.ready(); > transport.hooks = { > onClosed: () => { > if (!dbg.isClosed) { > + // If the worker happens to be shutting down, there is a small Can you explain why the worker would be shutting down sometimes, and not other times?
Attachment #8777482 -
Flags: review?(jlong) → review+
Assignee | ||
Comment 26•8 years ago
|
||
This patch is turning out to be harder than I expected :-( The tests are still leaking, because we fail to destroy the worker toolbox properly. When we try to destroy the toolbox, we hit an error in the webconsole client. There is some asynchronous bug there that causes us to destroy things either twice or in the wrong order. I am not yet sure what the problem is exactly. I suffered a system crash today that wiped out my entire local git repository, so I probably won't be able to look into this again until monday.
Assignee | ||
Comment 27•8 years ago
|
||
I think I finally figured out why those tests were failing. Here is another try run with those issues hopefully addressed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4345ac00752d
Comment 28•8 years ago
|
||
Pushed by ejpbruel@mozilla.com: https://hg.mozilla.org/integration/fx-team/rev/c0e4639e98b0 Make sure we detach from worker client when target is destroyed. r=jlongster
Comment 29•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c0e4639e98b0
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
Reporter | ||
Comment 30•8 years ago
|
||
Eddy, do you feel comfortable uplifting this to aurora or beta?
Flags: needinfo?(ejpbruel)
Assignee | ||
Comment 31•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #30) > Eddy, do you feel comfortable uplifting this to aurora or beta? I think we can do that yeah.
Flags: needinfo?(ejpbruel)
Assignee | ||
Comment 32•8 years ago
|
||
Approval Request Comment [Feature/regressing bug #]: [User impact if declined]: [Describe test coverage new/current, TreeHerder]: [Risks and why]: [String/UUID change made/needed]: Approval Request Comment [Feature/regressing bug #]: No clear candidate. [User impact if declined]: Service workers are not resumed if you close the debugger while they are paused. This is a pretty big footgun for people using the debugger. [Describe test coverage new/current, TreeHerder]: browser_dbg_worker-console-01.js browser_dbg_worker-console-01.js browser_dbg_worker-window.js [Risks and why]: Applying this patch caused some test failures because it changes the order in which certain events happen during shutdown. Those changes are all well defined, however, and the patch should have addressed all of them. [String/UUID change made/needed]: N/A
Attachment #8781563 -
Flags: review+
Attachment #8781563 -
Flags: approval-mozilla-beta?
Attachment #8781563 -
Flags: approval-mozilla-aurora?
Reporter | ||
Updated•8 years ago
|
status-firefox48:
--- → wontfix
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox-esr45:
--- → disabled
Keywords: regression
Comment on attachment 8781563 [details] [diff] [review] Uplift request Lots of new automated tests (yay!), stabilized on Nightly for a week, Aurora50+
Attachment #8781563 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 34•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/c9e56c940955
Comment 35•8 years ago
|
||
Comment on attachment 8781563 [details] [diff] [review] Uplift request We want service workers to keep on working for webdevs. Please uplift for the beta 6 build!
Attachment #8781563 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 37•8 years ago
|
||
This patch should do the trick for beta.
Flags: needinfo?(ejpbruel)
Attachment #8783575 -
Flags: checkin?(ryanvm)
Updated•8 years ago
|
Attachment #8783575 -
Flags: checkin?(ryanvm)
Comment 38•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/b21b7f9f4912
Updated•8 years ago
|
Version: unspecified → 48 Branch
Updated•6 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•