Closed Bug 1273920 Opened 8 years ago Closed 8 years ago

service worker install fails if install event is GC'd before waitUntil() promise is fulfilled

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla49

Tracking Flags:

Tracking

Status

firefox46

---

wontfix

firefox47

---

wontfix

firefox48

---

wontfix

firefox49

fixed

firefox-esr45

---

disabled

People

(Reporter: bkelly, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

(Whiteboard: btpp-active)

Attachments

(5 files, 7 obsolete files)

P2 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 4.66 KB, patch
P2 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 4.98 KB, patch
P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 9.20 KB, patch
P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 3.65 KB, patch
P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 10.59 KB, patch
P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 3.65 KB, patch, asuth : review+
P3 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 4.96 KB, patch, asuth : review+
P4 Fix bugs in dom/push/test_serviceworker_lifetime.html test. r=kitcambridge 8 years ago Ben Kelly [:bkelly, not reviewing] 2.24 KB, patch, lina : review+
P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 7.35 KB, patch, asuth : review+
P3b cleanup test to use child-cc-request observer mechanism instead of 10sec timeout 8 years ago Andrew Sutherland [:asuth] (he/him) 4.57 KB, patch, bkelly : review+
P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth 8 years ago Ben Kelly [:bkelly, not reviewing] 3.73 KB, patch, bkelly : review+
P3b Trigger cycle collector instead of using a timeout. r=bkelly 8 years ago Ben Kelly [:bkelly, not reviewing] 4.62 KB, patch, bkelly : review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Description

•

8 years ago

The wpt test register-wait-forever-in-install-worker.https.html blocks an install event waitUntil() forever.  The old spec before the queue rewrite used to explicitly kill workers in this state when register() was called again.  The new spec does not do this because we only let one job run at a time.

Somehow this test is seeing the promise fail as expected, but only after close to 10 seconds.  I need to investigate what's happening.

Till Schneidereit [:till]

Updated

•

8 years ago

Blocks: 911216

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 1

•

8 years ago

This is a bad bug!  We are failing the service worker install if the install event is GC'd before the waitUntil promises are fulfilled.  This is very racy and could lead to unexpected behavior in the wild.

Blocks: ServiceWorkers-compat

status-firefox46: --- → wontfix

status-firefox47: --- → affected

status-firefox48: --- → affected

status-firefox49: --- → affected

status-firefox-esr45: --- → disabled

Summary: register-wait-forever-in-install-worker.https.html doesn't seem to work quite right any more → service worker install fails if install event is GC'd before waitUntil() promise is fulfilled

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 2

•

8 years ago

[Tracking Requested - why for this release]:
This is a bad compat bug.  I would like to get it into FF47 if I can.  I hope to have a test and fix written today.

tracking-firefox47: --- → ?

tracking-firefox48: --- → ?

tracking-firefox49: --- → ?

Liz Henry (:lizzard) (relman/hg->git project)

Comment 3

•

8 years ago

Tracking and marking this as blocking 47 based on Ben's assessment.

tracking-firefox47: ? → blocking

tracking-firefox48: ? → +

tracking-firefox49: ? → +

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 4

•

8 years ago

Attached patch P2 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth (obsolete)

This test demonstrates the problem.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 5

•

8 years ago

Attached patch P2 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth (obsolete)

Attachment #8753984 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 6

•

8 years ago

Ugh.  This affects respondWith() too!

Andrew Overholt [:overholt]

Updated

•

8 years ago

Whiteboard: btpp-active

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

8 years ago

Comment 7

•

8 years ago

This is trickier to test than I thought.  To tear down the test gracefully, I'd like to resolve the waitUntil() promise after I've verified it doesn't get GC'd.  But if I write any code that holds on to the resolve() function to call it later, then the promise is not GC'd.  This makes sense.

It also means this issue is a lot less critical.  Any useful code is going to resolve or reject its waitUntil() promise, and holding on to the resolver keeps the promise alive.

We still probably want to fix this, though, because otherwise it makes GC observable to script.

I'm going to have to investigate further tomorrow.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 8

•

8 years ago

Attached patch P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth (obsolete)

Attachment #8753985 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 9

•

8 years ago

Attached patch P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth (obsolete)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 10

•

8 years ago

Attached patch P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth (obsolete)

Attachment #8755023 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 11

•

8 years ago

Attached patch P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth (obsolete)

Attachment #8755024 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 12

•

8 years ago

Attached patch P3 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 13

•

8 years ago

Attached patch P4 Fix bugs in dom/push/test_serviceworker_lifetime.html test. r=kitcambridge

https://treeherder.mozilla.org/#/jobs?repo=try&revision=773709106b57

I'm not running try on windows since all the TC clipboard errors make it too noisy at the moment.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 14

•

8 years ago

Comment on attachment 8755080 [details] [diff] [review]
P4 Fix bugs in dom/push/test_serviceworker_lifetime.html test. r=kitcambridge

Kit, this fixes a bug in the test_serviceworker_lifetime.html test where it was trying to call waitUntil() asynchronously.  It must be called synchronously in the event handler.
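For readers following along, here is a minimal sketch of the difference between the broken and fixed test patterns. It runs outside a browser: `MockExtendableEvent` and `dispatch` are hypothetical stand-ins written for this illustration, not the real DOM classes, and they model only the rule described above (waitUntil() is rejected once the handler has returned).

```javascript
// Hypothetical stand-in for an ExtendableEvent; not the real DOM class.
class MockExtendableEvent {
  constructor() {
    this.dispatching = false;
    this.pending = [];
  }
  waitUntil(promise) {
    // Assumption for this sketch: late calls throw, as the comment describes.
    if (!this.dispatching) {
      throw new Error("InvalidStateError: waitUntil() called after dispatch");
    }
    this.pending.push(Promise.resolve(promise));
  }
}

// Runs the handler synchronously, the way event dispatch does.
function dispatch(handler) {
  const evt = new MockExtendableEvent();
  evt.dispatching = true;
  handler(evt);
  evt.dispatching = false;
  return evt;
}

// Broken: waitUntil() is only reached from a later microtask,
// after the handler has already returned.
let lateError = null;
const brokenEvt = dispatch(function (evt) {
  Promise.resolve().then(function () {
    try {
      evt.waitUntil(Promise.resolve());
    } catch (e) {
      lateError = e;
    }
  });
});

// Fixed: call waitUntil() synchronously and put the async work in the promise.
const fixedEvt = dispatch(function (evt) {
  evt.waitUntil(Promise.resolve().then(function () { return "done"; }));
});
```

In the fixed handler the event records one pending promise before dispatch ends; in the broken one it records none, and the late call throws.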

Attachment #8755080 - Flags: review?(kcambridge)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 15

•

8 years ago

I don't think we should uplift this any more.  Any useful service worker code is actually going to handle this just fine because the promise is held alive via other means.  It's pretty much only a problem for stuff like:

  evt.waitUntil(new Promise(function() { }));

Which isn't very useful in practice.

Since the impact is much less than I originally thought I don't want to uplift this anymore.
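To make the contrast concrete, here is a small sketch that runs outside a browser. The `evt` object and the `settlesWithin` helper are hypothetical stand-ins for this illustration only. It does not trigger GC; it only shows why the first promise has no other owner (its resolver is discarded) while the second is kept reachable by the pending timer that holds its resolver.

```javascript
// Hypothetical stand-in for the install event; it only records promises.
const evt = {
  promises: [],
  waitUntil(p) { this.promises.push(p); },
};

// Pattern from the comment above: the executor discards resolve/reject,
// so nothing can ever settle this promise and only the event refers to it.
evt.waitUntil(new Promise(function () {}));

// Typical pattern: pending work (a timer here) captures the resolver,
// so the promise stays reachable until that work completes.
evt.waitUntil(new Promise(function (resolve) {
  setTimeout(function () { resolve("cached"); }, 10);
}));

// Helper for this sketch: does the promise fulfill within `ms` milliseconds?
function settlesWithin(promise, ms) {
  const timeout = new Promise(function (resolve) {
    setTimeout(function () { resolve(false); }, ms);
  });
  const settled = promise.then(function () { return true; });
  return Promise.race([settled, timeout]);
}

const results = Promise.all(evt.promises.map(function (p) {
  return settlesWithin(p, 50);
}));
results.then(function (r) { console.log(r); });
```

Only the second promise settles. The first is the case this bug is about: it can never complete on its own, so whether the install survives depends entirely on who keeps the promise alive.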

status-firefox47: affected → wontfix

status-firefox48: affected → wontfix

tracking-firefox47: blocking → ---

tracking-firefox48: + → ---

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 16

•

8 years ago

Comment on attachment 8755078 [details] [diff] [review]
P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth

This fixes the wpt test to expect the new spec behavior.  In the past the spec would attempt to kill an installing worker when a new register() call for the scope is made.

We now run register jobs sequentially.  The second register job should not run until the first job is timed out.

I modified the test to wait some reasonable time before gracefully ending the first install.  It seems the easiest way to demonstrate that the second register does not abort the first.

Relevant spec starts here:

https://slightlyoff.github.io/ServiceWorker/spec/service_worker/#navigator-service-worker-register

Attachment #8755078 - Flags: review?(bugmail)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 17

•

8 years ago

Comment on attachment 8755079 [details] [diff] [review]
P3 Add mochitest that demonstrates we cancel sw install if install event is GC'd. r=asuth

This adds a test that shows we incorrectly abort the SW install if the promise passed to evt.waitUntil() is GC'd.

I would have preferred to perform an exact CC/GC, but we don't have the infrastructure to do this for a worker JS context.  I filed bug 1274100 about this.  Instead I had to use a rather lame 10 second timeout.

This does consistently trigger the problem locally for me without the P1 patch.

Attachment #8755079 - Flags: review?(bugmail)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 18

•

8 years ago

Comment on attachment 8755077 [details] [diff] [review]
P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth

This patch will hold a strong ref to the promise passed to evt.waitUntil() or evt.respondWith() until the promise is fulfilled.

It does this by having the KeepAliveHandler ref the promise and the ServiceWorkerPrivate ref the KeepAliveHandler.  We use the ServiceWorkerPrivate StoreISupports() mechanism for this second ref.  These refs are automatically dropped if the service worker is timed out and killed.

The patch has added complication because the Promise can only be AddRef()/Release()'d on the worker thread.  The ServiceWorkerPrivate StoreISupports(), however, only works from the main thread.

This is mostly straightforward if the promise resolves normally, but gets harder if the worker terminated and the KeepAliveHandler is destroyed on the main thread.  To handle this case we use a WorkerControlRunnable to release the Promise.  We also hold a feature alive to make sure we can dispatch the runnable.

Attachment #8755077 - Flags: review?(bugmail)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 19

•

8 years ago

Comment on attachment 8755077 [details] [diff] [review]
P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth

Review of attachment 8755077 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/ServiceWorkerPrivate.cpp
@@ +388,5 @@
> +  void
> +  ReleaseOnMainThread()
> +  {
> +    AssertIsOnMainThread();
> +    mKeepAliveToken->GetServiceWorkerPrivate()->RemoveISupports(this);

It occurs to me we should probably explicitly drop the mKeepAliveToken here.  This could avoid an unnecessary main thread proxy runnable to release the token if something on the worker thread still has a ref to the KeepAliveHandler.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 20

•

8 years ago

Comment on attachment 8755077 [details] [diff] [review]
P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth

Actually, I think I can simplify this a lot.  It creates a cycle ref with the promise, so we shouldn't need the StoreISupports hassle.

Attachment #8755077 - Flags: review?(bugmail)

Till Schneidereit [:till]

Comment 21

•

8 years ago

(In reply to Ben Kelly [:bkelly] from comment #18)
> Comment on attachment 8755077 [details] [diff] [review]
> P1 Hold strong reference to service worker WaitUntil() promise until its
> fulfilled. r=asuth

How would this work with JS Promise? Those aren't CC'd, and we don't keep the wrappers alive, so it seems like this mechanism wouldn't transfer directly.

Flags: needinfo?(bkelly)

Lina Butler [:lina]

Updated

•

8 years ago

Attachment #8755080 - Flags: review?(kcambridge) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 22

•

8 years ago

(In reply to Till Schneidereit [:till] from comment #21)
> (In reply to Ben Kelly [:bkelly] from comment #18)
> > Comment on attachment 8755077 [details] [diff] [review]
> > P1 Hold strong reference to service worker WaitUntil() promise until its
> > fulfilled. r=asuth
> 
> How would this work with JS Promise? Those aren't CC'd, and we don't keep
> the wrappers alive, so it seems like this mechanism wouldn't transfer
> directly.

You are saying that if C++ code holds its dom::Promise reference alive, the promise can still be GC'd?  And this will drop references to the native callbacks?

If the outer JS promise is GC'd before it's fulfilled, does it reject the inner DOM Promise?

Flags: needinfo?(bkelly) → needinfo?(till)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 23

•

8 years ago

Attached patch P1 Hold strong reference to service worker WaitUntil() promise until its fulfilled. r=asuth

The simplified patch works as well.  It's much easier to understand since we don't need to mess with the main thread at all.

This try build (with separate interdiff patch before folding together) is green:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0731ae47c13c

Till, I also ran my P3 test locally with SpiderMonkey promises.  The test failed without this P1 patch and passed with the P1 patch.  In the end it does not really matter if the outer JS promise is GC'd as long as the inner C++ Promise is kept alive.

Attachment #8755077 - Attachment is obsolete: true

Flags: needinfo?(till)

Attachment #8755161 - Flags: review?(bugmail)

Comment hidden (Intermittent Failures Robot)

15 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* try: 15

Platform breakdown:
* linux64: 6
* osx-10-10: 4
* linux32: 4
* windows8-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1273920&startday=2016-05-16&endday=2016-05-22&tree=all

Andrew Sutherland [:asuth] (he/him)

Comment 25

•

8 years ago

Attached patch P3b cleanup test to use child-cc-request observer mechanism instead of 10sec timeout (obsolete)

It turns out bug 927740 gave us the gift of the "child-cc-request" observer notification that, when emitted:
- in the parent, dispatches an async message to children to re-emit it
- in the child, causes a CycleCollectorRunnable to be dispatched to all workers

Here's a patch that you can use or scavenge as appropriate.  It causes failures without the P1 fix and passes with the P1 fix.  There are comments in there about how it will break when the serviceworkers start getting spawned in a different process and a hint about how to fix it.  I'm assuming the ordering barrier postMessage will not get bounced off the parent process's main thread when that happens, so I think the attempt would be broken at this time.

Attachment #8755691 - Flags: feedback?(bkelly)

Andrew Sutherland [:asuth] (he/him)

Updated

•

8 years ago

Attachment #8755161 - Flags: review?(bugmail) → review+

Andrew Sutherland [:asuth] (he/him)

Comment 26

•

8 years ago

Comment on attachment 8755078 [details] [diff] [review]
P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth

Review of attachment 8755078 [details] [diff] [review]:
-----------------------------------------------------------------

r=asuth assuming we're just going with a comment and the dump() calls removed.

::: testing/web-platform/tests/service-workers/service-worker/register-wait-forever-in-install-worker.https.html
@@ +24,5 @@
> +            return;
> +          }
> +          registration.installing.postMessage('STOP_WAITING');
> +          resolve();
> +        }, 2000);

If I understand correctly, the motivation behind the timeout is that the Job Queue abstraction is not exposed to content, so it's hard to know whether the other register() call has:
- A: actually been processed and the determination made to queue and not abort the bad one, or
- B: is still making its way to that processing logic and will do the incorrect thing when it gets there.

The timeout is a hack/compromise.

A more appealing hack if it's possible might be to leverage that "A user agent must maintain a separate job queue for each service worker registration keyed by its scope url."  In other words, do:
- register(bad, '/maxscope')
- register(good, '/maxscope')
- register(cleverInference, '/maxscope/subscope').

The idea would be that the cleverInference worker is able to start installing itself in parallel to "bad" and that all registrations go through a single funnel point which is the processing stage of interest to us.  If cleverInference has begun installing (which we detect by it messaging us), then "good" must have been processed already and enqueued.

It's possible this idea is already forbidden by spec, is wrong about how service workers work, or makes unacceptable assumptions about unspecified behavior.  But I think we all hate timeouts, so it's worth proposing.  I can do more research if it doesn't seem nuts but needs more investigation.
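A rough model of the inference, for illustration only. This is not browser code: the `register` function, the `queues` map, and the `log` array are hypothetical, and the model assumes nothing beyond the quoted rule of one sequential job queue per scope URL.

```javascript
// Hypothetical model of "one job queue per scope URL"; not browser code.
const queues = new Map();  // scope URL -> tail promise of that scope's queue
const log = [];

// Enqueue a job; it starts only after earlier jobs for the same scope finish.
function register(name, scope, install) {
  log.push("queued " + name);
  const tail = queues.get(scope) || Promise.resolve();
  const job = tail.then(function () {
    log.push("installing " + name);
    return install();
  });
  // Keep the chain usable even if a job rejects.
  queues.set(scope, job.catch(function () {}));
  return job;
}

let stopWaiting;  // lets the caller end the "bad" install gracefully
const bad = register("bad", "/maxscope", function () {
  return new Promise(function (resolve) { stopWaiting = resolve; });
});
const good = register("good", "/maxscope", function () {
  return Promise.resolve();
});
const inference = register("cleverInference", "/maxscope/subscope", function () {
  return Promise.resolve();
});

const done = inference.then(function () {
  // cleverInference has installed, so "good" was already queued, yet it has
  // not started because "bad" still blocks the /maxscope queue.
  const snapshot = log.slice();
  stopWaiting();
  return good.then(function () {
    return { snapshot: snapshot, final: log.slice() };
  });
});
```

In this model the snapshot taken when cleverInference finishes contains "queued good" but not "installing good", which is the signal the test would rely on instead of a timeout.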

If a better hack isn't possible, I think it'd be great to have a comment here about the timeout really being the only option and how even my crazy proposal was worse/impossible.

@@ +41,3 @@
>          })
> +      .then(function() {
> +          dump(registration.installing);

Suspect these dump() calls were accidentally left in here.  If intentional, convert from non-standard dump() to something more idiomatic.

Attachment #8755078 - Flags: review?(bugmail) → review+

Andrew Sutherland [:asuth] (he/him)

Updated

•

8 years ago

Attachment #8755079 - Flags: review?(bugmail) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 27

•

8 years ago

Attached patch P2 Fix register-wait-forever-in-install-worker.https.html to expect new unified service worker job queue behavior. r=asuth

Good idea!  That's totally allowed per the spec since each scope gets its own job queue.

Attachment #8755078 - Attachment is obsolete: true

Attachment #8755984 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 28

•

8 years ago

Comment on attachment 8755691 [details] [diff] [review]
P3b cleanup test to use child-cc-request observer mechanism instead of 10sec timeout

Looks good to me!  I'll just land it as a separate P3b patch.

Attachment #8755691 - Flags: feedback?(bkelly) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 29

•

8 years ago

Attached patch P3b Trigger cycle collector instead of using a timeout. r=bkelly

I had to add a child-gc-request before and after the CC request in order to trigger the failure for SPIDERMONKEY_PROMISE builds.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bb1cae06858

Attachment #8755691 - Attachment is obsolete: true

Attachment #8755997 - Flags: review+

Pulsebot

Comment 30

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5e7f47c7f2b9
https://hg.mozilla.org/integration/mozilla-inbound/rev/be3124589d82
https://hg.mozilla.org/integration/mozilla-inbound/rev/013b6cecd6ea
https://hg.mozilla.org/integration/mozilla-inbound/rev/36ed89b1851c
https://hg.mozilla.org/integration/mozilla-inbound/rev/3313a2e169d1

Carsten Book [:Tomcat]

Comment 31

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5e7f47c7f2b9
https://hg.mozilla.org/mozilla-central/rev/be3124589d82
https://hg.mozilla.org/mozilla-central/rev/013b6cecd6ea
https://hg.mozilla.org/mozilla-central/rev/36ed89b1851c
https://hg.mozilla.org/mozilla-central/rev/3313a2e169d1

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox49: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla49
