Closed Bug 1281040 Opened 8 years ago Closed 8 years ago

service worker is not resumed if debugger window is closed

Tracking

(firefox48 wontfix, firefox49 fixed, firefox-esr45 disabled, firefox50 fixed, firefox51 fixed)

Status:

RESOLVED FIXED

Milestone:

Firefox 51

Tracking Flags:

Tracking

Status

firefox48

---

wontfix

firefox49

---

fixed

firefox-esr45

---

disabled

firefox50

---

fixed

firefox51

---

fixed

People

(Reporter: bkelly, Assigned: ejpbruel)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(3 files, 2 obsolete files)

Make sure we detach from worker client when target is destroyed. 8 years ago Eddy Bruel [:ejpbruel] 2.59 KB, patch	jlong : review+	Details \| Diff \| Splinter Review
Make sure we detach from worker client when target is destroyed. 8 years ago Eddy Bruel [:ejpbruel] 7.98 KB, patch		Details \| Diff \| Splinter Review
Make sure we detach from worker client when target is destroyed. 8 years ago Eddy Bruel [:ejpbruel] 20.67 KB, patch	jlong : review+	Details \| Diff \| Splinter Review
Uplift request 8 years ago Eddy Bruel [:ejpbruel] 19.40 KB, patch	ejpbruel : review+ ritu : approval-mozilla-aurora+ lizzard : approval-mozilla-beta+	Details \| Diff \| Splinter Review
Rebased patch for beta. 8 years ago Eddy Bruel [:ejpbruel] 19.43 KB, patch		Details \| Diff \| Splinter Review

Ben Kelly [:bkelly, not reviewing]

Reporter

Description

•

8 years ago

STR:

1) Register a service worker
2) Open the site to get the service worker running
3) Open the debugger window for the service worker
4) Set a break point in the fetch event handler
5) Reload the site
6) Observer the break point being triggered
7) Close the debugger window while the worker is paused without clicking continue
8) Observe that the worker is stuck until browser exit

This is a pretty big footgun for people using the debugger feature.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 1

•

8 years ago

Eddy, can we get a quick fix for this?  Its a really big footgun for people trying to use the worker debugger and seems like it might be relatively easy to fix.

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Comment 2

•

8 years ago

(In reply to Ben Kelly [:bkelly] from comment #1)
> Eddy, can we get a quick fix for this?  Its a really big footgun for people
> trying to use the worker debugger and seems like it might be relatively easy
> to fix.

Ok. Will take a look at this tomorrow.

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Updated

•

8 years ago

Assignee: nobody → ejpbruel

Eddy Bruel [:ejpbruel]

Assignee

Comment 3

•

8 years ago

Marking this as P1 because bkelly indicated that this is a pretty big footgun for people trying to debug service workers, and it looks like this is low hanging fruit.

Priority: -- → P1

Eddy Bruel [:ejpbruel]

Assignee

Comment 4

•

8 years ago

Attached patch Make sure we detach from worker client when target is destroyed. (obsolete) — Details — Splinter Review

This patch should fix the issue. I only tested it against normal workers, but since service workers use the same code, it should work there as well.

Attachment #8766757 - Flags: review?(jlong)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 5

•

8 years ago

I think I see sandbox compartments being leaked by devtools when I open and close a worker debugger as well:

│  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:183) [103]
│  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:209) [103]
│  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:220) [103]
│  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/builtin-modules.js:227) [103]
│  ├──103 (13.05%) ── [System Principal], Addon-SDK (from: resource://gre/modules/commonjs/toolkit/loader.js:249) [103]
│  ├───21 (02.66%) ── [System Principal], outOfProcessTabChildGlobal [21]
│  └────8 (01.01%) ── [System Principal], [anonymous sandbox] (from: resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/server/actors/child-process.js:35) [8]

Should this be fixed by this patch as well?  Or should I write a separate bug?

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Comment 6

•

8 years ago

(In reply to Ben Kelly [:bkelly] from comment #5)
> I think I see sandbox compartments being leaked by devtools when I open and
> close a worker debugger as well:
> 
> │  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from:
> resource://gre/modules/commonjs/toolkit/loader.js ->
> resource://devtools/shared/builtin-modules.js:183) [103]
> │  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from:
> resource://gre/modules/commonjs/toolkit/loader.js ->
> resource://devtools/shared/builtin-modules.js:209) [103]
> │  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from:
> resource://gre/modules/commonjs/toolkit/loader.js ->
> resource://devtools/shared/builtin-modules.js:220) [103]
> │  ├──103 (13.05%) ── [System Principal], [anonymous sandbox] (from:
> resource://gre/modules/commonjs/toolkit/loader.js ->
> resource://devtools/shared/builtin-modules.js:227) [103]
> │  ├──103 (13.05%) ── [System Principal], Addon-SDK (from:
> resource://gre/modules/commonjs/toolkit/loader.js:249) [103]
> │  ├───21 (02.66%) ── [System Principal], outOfProcessTabChildGlobal [21]
> │  └────8 (01.01%) ── [System Principal], [anonymous sandbox] (from:
> resource://gre/modules/commonjs/toolkit/loader.js ->
> resource://devtools/server/actors/child-process.js:35) [8]
> 
> Should this be fixed by this patch as well?  Or should I write a separate
> bug?

It's very possible that we're still leaking stuff even if we detach from the worker client, so I would file a separate bug for this.

Did you happen to try my patch with a service worker? Did it resolve the issue for you?

Flags: needinfo?(ejpbruel)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 7

•

8 years ago

(In reply to Eddy Bruel [:ejpbruel] from comment #6)
> Did you happen to try my patch with a service worker? Did it resolve the
> issue for you?

No.  I don't have a build handy.  I'll start one now.

James Long (:jlongster)

Comment 8

•

8 years ago

Comment on attachment 8766757 [details] [diff] [review]
Make sure we detach from worker client when target is destroyed.

Review of attachment 8766757 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not too familiar with this area of the code, but this makes sense. Especially since I've frequently seen `this._extraPools is null` errors.

Attachment #8766757 - Flags: review?(jlong) → review+

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 9

•

8 years ago

Unfortunately this patch does not fix the compartment leak.  I filed bug 1283587.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 10

•

8 years ago

James, can we get this into 50 and uplifted to 48/49 before the next merge?

Flags: needinfo?(jlong)

James Long (:jlongster)

Comment 11

•

8 years ago

I already r+ it, eddy can you land it?

Flags: needinfo?(jlong) → needinfo?(ejpbruel)

James Long (:jlongster)

Comment 12

•

8 years ago

Looks like it just needs a try push and then we can push

Eddy Bruel [:ejpbruel]

Assignee

Comment 13

•

8 years ago

(In reply to James Long (:jlongster) from comment #12)
> Looks like it just needs a try push and then we can push

I was planning on taking a day off tomorrow, but I can try to land this if it means we can make the uplift.

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Comment 14

•

8 years ago

Try push for this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eae1115fd441

James Long (:jlongster)

Comment 15

•

8 years ago

I'm happy to push it then, it's a tiny patch :)

Eddy Bruel [:ejpbruel]

Assignee

Comment 16

•

8 years ago

(In reply to James Long (:jlongster) from comment #15)
> I'm happy to push it then, it's a tiny patch :)

Thanks, that would be great. If we can get a green try push for it tonight, you can push it tomorrow.

James Long (:jlongster)

Comment 17

•

8 years ago

ni? myself to remind me

Flags: needinfo?(jlong)

James Long (:jlongster)

Comment 18

•

8 years ago

There are several very suspicious timeout failures for the worker tests, so it must be related to this. It's failing pretty consistently. I don't think I should push this yet.

Flags: needinfo?(jlong)

Eddy Bruel [:ejpbruel]

Assignee

Comment 19

•

8 years ago

Going to look into these test failures. Sorry for the delay; I was on PTO last week.

Eddy Bruel [:ejpbruel]

Assignee

Comment 20

•

8 years ago

I figured out the issue. The root of the problem is that calling postMessage on
a WorkerDebugger is inherently racy: when a worker shuts down, there is a small
interval during which no more runnables can be dispatched to the worker, but
the worker debugger has not yet been closed. As a result, calling postMessage on
a WorkerDebugger can fail, even if you check that the WorkerDebugger is not yet
closed.

When we detach from a worker actor, we close the connection to the corresponding
worker server by closing the transport for that connection. The transport
responds to this by removing itself as a listener on the WorkerDebugger that
represents the connection, and then calling its onClosed hook.

One place where this onClosed hook is used is here:
https://dxr.mozilla.org/mozilla-central/source/devtools/server/main.js?q=server%2Fmain.js%3A843&redirect_type=direct#863

In this hook we send a message to the server requesting it to clean up its end
of the connection. If the WorkerDebugger is already closed, we don’t need to
send this message, but as explained above, this check is actually racy, so the
call to postMessage fails.

When this happens, the resulting exception propagates up all the way to here:
https://dxr.mozilla.org/mozilla-central/source/devtools/server/actors/worker.js#192

This causes us to exit from the call to _detach early, so the WorkerActor is
never fully detached. Detaching from the worker a second time will cause the
transport to remove itself as listener from the WorkerDebugger again. Since it
already removed itself as listener during our previous attempt, this will cause
an error to be thrown.

The failing tests are set up so that they destroy the toolbox first, and then
close the worker client (note that closing an attached worker client will
automatically detach from it).

Previously, destroying the toolbox did not cause us to detach from the worker
client, but with this patch applied it does. Consequently, these tests try to
detach from the worker client twice. That shouldn’t be a problem, but if the
first attempt to detach fails, due to the race condition above, the second
attempt to detach will cause an error.

The solution here is to make the call to wrap the call to postMessage in the
onClosed trap in a try catch block. The only time this call can throw an error
is if the worker is already shutting down, in which case we don’t care whether
the message arrives anyway.

Eddy Bruel [:ejpbruel]

Assignee

Comment 21

•

8 years ago

Attached patch Make sure we detach from worker client when target is destroyed. (obsolete) — Details — Splinter Review

New patch with test failures addressed. This probably needs to be re-reviewed.

Attachment #8766757 - Attachment is obsolete: true

Attachment #8776954 - Flags: review?(jlong)

Eddy Bruel [:ejpbruel]

Assignee

Comment 22

•

8 years ago

New try push for this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa288033b651

Eddy Bruel [:ejpbruel]

Assignee

Comment 23

•

8 years ago

Attached patch Make sure we detach from worker client when target is destroyed. — Details — Splinter Review

Ran into memory leaks on try. I suspect the cause is that we don't wait for the toolbox to be fully closed before shutting down the test. Fixed that issue. Additionally split up the browser_dbg_worker-console test into three smaller ones because I kept running into timeouts.

Attachment #8776954 - Attachment is obsolete: true

Attachment #8776954 - Flags: review?(jlong)

Attachment #8777482 - Flags: review?(jlong)

Eddy Bruel [:ejpbruel]

Assignee

Comment 24

•

8 years ago

New try push for this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe90311f8992

James Long (:jlongster)

Comment 25

•

8 years ago

Comment on attachment 8777482 [details] [diff] [review]
Make sure we detach from worker client when target is destroyed.

Review of attachment 8777482 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine, not a whole lot different from last review. I can't remember if the tests were split up before, but I had a few nits.

::: devtools/client/debugger/test/mochitest/browser_dbg_worker-console-01.js
@@ +3,5 @@
> +
> +var TAB_URL = EXAMPLE_URL + "doc_WorkerActor.attachThread-tab.html";
> +var WORKER_URL = "code_WorkerActor.attachThread-worker.js";
> +
> +function* initWorkerDebugger(TAB_URL, WORKER_URL) {

Why isn't this in head.js?

I guess this code shouldn't be reviewed too much, since you are just splitting the tests up. But this function at least is duplicated 3 times...

::: devtools/client/debugger/test/mochitest/browser_dbg_worker-console-03.js
@@ +1,2 @@
> +// Check to make sure that a worker can be attached to a toolbox
> +// and that the console works.

This comment is the same in all 3 test files, can you go into a little more detail what each test does?

::: devtools/server/main.js
@@ +861,5 @@
>            transport.ready();
>            transport.hooks = {
>              onClosed: () => {
>                if (!dbg.isClosed) {
> +                // If the worker happens to be shutting down, there is a small

Can you explain why the worker would be shutting down sometimes, and not other times?

Attachment #8777482 - Flags: review?(jlong) → review+

Eddy Bruel [:ejpbruel]

Assignee

Comment 26

•

8 years ago

This patch is turning out to be harder than I expected :-(

The tests are still leaking, because we fail to destroy the worker toolbox properly. When we try to destroy the toolbox, we hit an error in the webconsole client. There is some asynchronous bug there that causes us to destroy things either twice or in the wrong order. I am not yet sure what the problem is exactly.

I suffered a system crash today that wiped out my entire local git repository, so I probably won't be able to look into this again until monday.

Eddy Bruel [:ejpbruel]

Assignee

Comment 27

•

8 years ago

I think I finally figured out why those tests were failing. Here is another try run with those issues hopefully addressed:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4345ac00752d

Pulsebot

Comment 28

•

8 years ago

Pushed by ejpbruel@mozilla.com:
https://hg.mozilla.org/integration/fx-team/rev/c0e4639e98b0
Make sure we detach from worker client when target is destroyed. r=jlongster

Carsten Book [:Tomcat]

Comment 29

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c0e4639e98b0

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox51: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 51

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 30

•

8 years ago

Eddy, do you feel comfortable uplifting this to aurora or beta?

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Comment 31

•

8 years ago

(In reply to Ben Kelly [:bkelly] from comment #30)
> Eddy, do you feel comfortable uplifting this to aurora or beta?

I think we can do that yeah.

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Comment 32

•

8 years ago

Attached patch Uplift request — Details — Splinter Review

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]:
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: 
[String/UUID change made/needed]:

Approval Request Comment
[Feature/regressing bug #]:
No clear candidate.

[User impact if declined]:
Service workers are not resumed if you close the debugger while they are paused. This is a pretty big footgun for people using the debugger.

[Describe test coverage new/current, TreeHerder]:
browser_dbg_worker-console-01.js
browser_dbg_worker-console-01.js
browser_dbg_worker-window.js

[Risks and why]: 
Applying this patch caused some test failures because it changes the order in which certain events happen during shutdown. Those changes are all well defined, however, and the patch should have addressed all of them.

[String/UUID change made/needed]:
N/A

Attachment #8781563 - Flags: review+

Attachment #8781563 - Flags: approval-mozilla-beta?

Attachment #8781563 - Flags: approval-mozilla-aurora?

Ben Kelly [:bkelly, not reviewing]

Reporter

Updated

•

8 years ago

status-firefox48: --- → wontfix

status-firefox49: --- → affected

status-firefox50: --- → affected

status-firefox-esr45: --- → disabled

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Updated

•

8 years ago

Keywords: regression

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 33

•

8 years ago

Comment on attachment 8781563 [details] [diff] [review]
Uplift request

Lots of new automated tests (yay!), stabilized on Nightly for a week, Aurora50+

Attachment #8781563 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 34

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/c9e56c940955

status-firefox50: affected → fixed

Liz Henry (:lizzard) (relman/hg->git project)

Comment 35

•

8 years ago

Comment on attachment 8781563 [details] [diff] [review]
Uplift request

We want service workers to keep on working for webdevs. Please uplift for the beta 6 build!

Attachment #8781563 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 36

•

8 years ago

Needs rebasing for Beta.

Flags: needinfo?(ejpbruel)

Eddy Bruel [:ejpbruel]

Assignee

Comment 37

•

8 years ago

Attached patch Rebased patch for beta. — Details — Splinter Review

This patch should do the trick for beta.

Flags: needinfo?(ejpbruel)

Attachment #8783575 - Flags: checkin?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Attachment #8783575 - Flags: checkin?(ryanvm)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 38

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/b21b7f9f4912

status-firefox49: affected → fixed

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Version: unspecified → 48 Branch

BMO Automation

Updated

•

6 years ago

Product: Firefox → DevTools

You need to log in before you can comment on or make changes to this bug.