Closed Bug 1317266 Opened 8 years ago Closed 8 years ago

registration.installing gets unexpectedly cycle collected when the SW switches states

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: catalinb, Assigned: catalinb)

References

Details

Attachments

(3 files, 4 obsolete files)

Waiting for a worker to become active using the state change event sometimes fails
because the ServiceWorker DOM object can get collected when changing states.

new Promise(function(res) {
  var worker = registration.installingWorker;
  worker.addEventListener(function() {
    if (worker.state === "active") res();
  })
});

These are two tests that show the issue. In both cases we wait for the worker to become active using the statechange event. The worker takes 10s to activate using
setTimeout.

test_worker_reference_gc.html succeeds because there's a global variable keeping
the ServiceWorker object alive

test_worker_Reference_gc_timeout.html fails because the object gets CC-ed and there
are no listeners when the last state change is finally dispatched.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=113a9b20dad82e894a6a2cfe9ad74e9f1328431a&selectedJob=30982704

I'm not sure why test_workerupdatefoundevent.html also fails, the patch only adds a couple of NS_WARNINGs and the two mochitests.
Ben, could you please look at this and confirm the issue?
Flags: needinfo?(bkelly)
Blocks: 1263304
I could be a consequence of how we do lazy tracking of the DOM ServiceWorker objects on the registration object.  While you might expect the ServiceWorkerRegistration simply holds a RefPtr<> to the installing/waiting/active worker attributes, it does not.

We only populate these attributes when a user accesses them.  This by itself is not a problem.

However, when the SW changes state we invalidate the cached RefPtr<> values forcing the DOM ServiceWorker to be looked up again the next time.  This is done in:

https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerRegistration.cpp#308

So in this case the SW has changed state and wiped its references to the SW DOM object.  If script is not holding the SW DOM object alive it could be collected at this point before the next state change event occurs.
Flags: needinfo?(bkelly)
What I don't understand in this test, however, is why the closure doesn't hold the worker alive:

add_task(function* test_worker_ref_gc() {
  let registration = yield navigator.serviceWorker.register(
    "lazy_worker.js", { scope: "./lazy_worker_scope_timeout"} )
    .then(function(registration) {
      SpecialPowers.exactGC();
      var worker = registration.installing;
      return new Promise(function(resolve) {
        worker.addEventListener('statechange', function() {
          info("CATALINB: state is " + worker.state + "\n");
          SpecialPowers.exactGC();
          if (worker.state === 'activated') {
            resolve(registration);
          }
        });
      });
});

The promise chain should be held alive by the async function yield.  The promise chain should then hold the `var worker` alive.

Till, Boris, do you have any idea how we could be collecting the DOM object referenced by `var worker` in this code?
Flags: needinfo?(till)
Flags: needinfo?(bzbarsky)
Actually, wpt tests have a 10 second timeout by default.  You are probably just racing that.  Try adding this to your test html files and regenerating the wpt manifest:

  <meta name=timeout content=long>
Flags: needinfo?(till)
Flags: needinfo?(bzbarsky)
(In reply to Ben Kelly [:bkelly] from comment #5)
> Actually, wpt tests have a 10 second timeout by default.  You are probably
> just racing that.  Try adding this to your test html files and regenerating
> the wpt manifest:
> 
>   <meta name=timeout content=long>
It's a mochitest.
Oh, these are not wpt tests.  Ignore comment 5.  Till, Boris, any thoughts on comment 4?  Sorry for the flag spam.
Flags: needinfo?(till)
Flags: needinfo?(bzbarsky)
Catalin, I would also be happy to see us change comment 3.  If someone adds a listener and we don't keep the worker ref in js explicitly the listener should still fire.  We probably should explicitly keep the DOM SW referenced by the registration once its set even if it changes state.
(In reply to Ben Kelly [:bkelly] from comment #8)
> Catalin, I would also be happy to see us change comment 3.  If someone adds
> a listener and we don't keep the worker ref in js explicitly the listener
> should still fire.  We probably should explicitly keep the DOM SW referenced
> by the registration once its set even if it changes state.

Yes, I've been working on a fix today.
Assignee: nobody → catalin.badea392
The basic setup here is this:

  (function() {
    var worker = getMeAWorker();
    worker.addEventListener("statechange", function() {
      // use "worker" here
    });
  })();

right?  So the only thing with a reference to "worker" is the function passed to addEventListener, and the only reference to that function is from the worker itself.  Our cycle collector is quite happy to collect such a cycle.  Otherwise this would leak:

  (function() {
     var div = document.createElement("div");
     div.addEventListener("a new hope", function() {
       div.setAttribute("death star", "exploded");
     });
   })();

Whatever is going to fire the statechange event should be keeping the worker object alive in this case...
Flags: needinfo?(bzbarsky)
I don't know what I could usefully add to this.
Flags: needinfo?(till)
Comment on attachment 8810831 [details] [diff] [review]
ServiceWorkerInfo keeps strong references to SW instances.

This breaks a bunch of devtools tests.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=70c215962d7a95c1eb044a3490946dbc0d4f2ecd&selectedJob=31149025
Attachment #8810831 - Flags: review?(bkelly)
Comment on attachment 8810830 [details] [diff] [review]
Test registration.installing getting collected before all statechange events are dispatched.

Review of attachment 8810830 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/test/serviceworkers/test_worker_reference_gc_timeout.html
@@ +33,5 @@
> +  let registration = yield navigator.serviceWorker.register(
> +    "lazy_worker.js", { scope: "./lazy_worker_scope_timeout"} )
> +    .then(function(registration) {
> +      SpecialPowers.exactGC();
> +      var worker = registration.installing;

Please add a comment about what is going on here.  I would mention that the `worker` ref is only kept alive by the closure held via the SW listener callback.  Unless the browser holds the DOM SW object alive this is subject to cycle collection.
Attachment #8810830 - Flags: review?(bkelly) → review+
I was breaking the QI to nsIServiceWorkerInfo.
Attachment #8810831 - Attachment is obsolete: true
Attachment #8810982 - Flags: review?(bkelly)
Comment on attachment 8810982 [details] [diff] [review]
ServiceWorkerInfo keeps strong references to SW instances.

Review of attachment 8810982 [details] [diff] [review]:
-----------------------------------------------------------------

As we discussed in IRC this would cause us to accumulate DOM ServiceWorker objects until the ServiceWorkerInfo is finally destroyed.  For a long running browser that navigates through many intercepted pages this could be significant.

Ideally it would be better to make ServiceWorkerRegistration not drop its refs to ServiceWorker objects when the state changes.
Attachment #8810982 - Flags: review?(bkelly)
Comment on attachment 8811729 [details] [diff] [review]
Transition ServiceWorker instances when the SW changes states instead of just invalidating them.

Review of attachment 8811729 [details] [diff] [review]:
-----------------------------------------------------------------

This is the right idea, but we must change the ServiceWorker.state attribute synchronously with the ServiceWorkerRegistration attributes.

Also, this is turning into a dupe of bug 1257977.  Maybe the discussion in there will help.

Thanks for tackling this!

::: dom/workers/ServiceWorkerManager.cpp
@@ +2530,5 @@
> +ServiceWorkerManager::TransitionServiceWorkerRegistrationWorker(ServiceWorkerRegistrationInfo* aRegistration,
> +                                                                WhichServiceWorker aWhichOne)
> +{
> +  AssertIsOnMainThread();
> +  nsTObserverArray<ServiceWorkerRegistrationListener*>::ForwardIterator it(mServiceWorkerRegistrationListeners);

I'm not asking you to change it here, but I find this listener setup really weird.  Maybe in the future we should store the listeners on the ServiceWorkerRegistrationInfo directly.  That way we don't have to compare scopes here.

::: dom/workers/ServiceWorkerRegistration.cpp
@@ +311,5 @@
> +ServiceWorkerRegistrationMainThread::TransitionWorker(WhichServiceWorker aWhichOne)
> +{
> +  AssertIsOnMainThread();
> +
> +  MOZ_ASSERT(!(aWhichOne & WhichServiceWorker::ACTIVE_WORKER));

I think it would be easier to move all your assertions here to the top of the method as:

  MOZ_ASSERT(aWhichOne == INSTALLING_WORKER || aWhichOne == WAITING_WORKER);

@@ +314,5 @@
> +
> +  MOZ_ASSERT(!(aWhichOne & WhichServiceWorker::ACTIVE_WORKER));
> +  if (aWhichOne & WhichServiceWorker::INSTALLING_WORKER) {
> +    MOZ_ASSERT(!(aWhichOne & WhichServiceWorker::ACTIVE_WORKER));
> +    mWaitingWorker = mInstallingWorker.forget();

So I think we have a bigger problem here.  I was going to suggest we do something like:

MOZ_ASSERT(mState == ServiceWorkerState::Installed)

But this won't work because we set the state via ChangeStateUpdater runnable.  This will run after this method is run in your current patch.

I think what we need to do instead is trigger this transition from the ChangeStateUpdater itself.  This will ensure the worker and registration attributes are changed synchronously.

The devtools callbacks can then be fired from another runnable that is dispatched after the ChangeStateUpdater is dispatched.

Incidentally this is basically bug 1257977.  I have some patches that I started there down this road, but didn't have time to finish them.
Attachment #8811729 - Flags: review?(bkelly) → review-
Comment on attachment 8812233 [details] [diff] [review]
Transition ServiceWorker instances when the SW changes states instead of just invalidating them.

Review of attachment 8812233 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments addressed.

::: dom/workers/ServiceWorkerRegistration.cpp
@@ +312,5 @@
> +{
> +  AssertIsOnMainThread();
> +
> +  if (aWhichOne == WhichServiceWorker::INSTALLING_WORKER) {
> +    mWaitingWorker = mInstallingWorker.forget();

Please assert the worker state here.  I guess we must assert that it is the old state.  Add a comment that the state is updated in a queued runnable.

::: dom/workers/ServiceWorkerRegistrationInfo.cpp
@@ +321,3 @@
>  
> +nsresult
> +ServiceWorkerRegistrationInfo::UpdateStatePropertiesRunnable::Run()

If you make this a method on ServiceWorkerRegistrationInfo like AsyncUpdateRegistrationStateProperties(), then you could just do this:

  nsCOMPtr<nsIRunnable> r = NewRunnableMethod(
    this,
    &ServiceWorkerRegistrationInfo::AsyncUpdateRegistrationStateProperties,
    aWorker, aTransition);

And you can remove UpdateStatePropertiesRunnable completely.

@@ +342,5 @@
> +  RefPtr<Runnable> runnable = new UpdateStatePropertiesRunnable(this, aWorker,
> +                                                                aTransition);
> +  MOZ_ALWAYS_SUCCEEDS(NS_DispatchToMainThread(runnable.forget()));
> +
> +  NotifyChromeRegistrationListeners();

This will fire the chrome listeners before the attributes are updated.  We probably want to ensure all attributes are updated before firing these.

So I think you should probably queue another runnable after async setting the registration attributes to fire these chrome listeners.  You have to wait until that async step to queue the runnable in order to guarantee it fires after the state change runnable.
Attachment #8812233 - Flags: review?(bkelly) → review+
Addressed comments except the change about how chrome listeners are notified.
The chrome listeners need to be called in the same task that changed
mSWInfo properties (mInstllingWorker etc). This is checked in test_serviceworkerregistrationinfo.xul.
Attachment #8812233 - Attachment is obsolete: true
(In reply to Cătălin Badea (:catalinb) from comment #22)
> Addressed comments except the change about how chrome listeners are notified.
> The chrome listeners need to be called in the same task that changed
> mSWInfo properties (mInstllingWorker etc). This is checked in
> test_serviceworkerregistrationinfo.xul.

Ok.  Can you add a comment to that effect?
Pushed by catalin.badea392@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c1343636012
Test registration.installing getting collected before all statechange events are dispatched. r=bkelly
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac6b94962bf2
Transition ServiceWorker instances when the SW changes states instead of just invalidating them. r=bkelly
https://hg.mozilla.org/mozilla-central/rev/3c1343636012
https://hg.mozilla.org/mozilla-central/rev/ac6b94962bf2
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
For the setTimeout in the test, it seems like the intent of the setTimeout() is for the SW to delay activation long enough for exactGC() to complete.  Could lazy_worker.js be re-written to using postMessage() instead?

For example, in lazy_worker.js:

  onactivate = function(event) {
    var promise = new Promise(function(res) {
      self.onmessage = function() {
        res();
      }
    });
    event.waitUntil(promise);
  }

and the caller then be:

  ...
  // (we see 'installed', 'activating', 'activated')
  SpecialPowers.exactGC();
  if (worker.state === 'activating') {
    worker.postMessage({});
  } else if (worker.state === 'activated') {
    resolve(registration);
  }
Flags: needinfo?(catalin.badea392)
(In reply to Andrew Sutherland [:asuth] from comment #26)
> For the setTimeout in the test, it seems like the intent of the setTimeout()
> is for the SW to delay activation long enough for exactGC() to complete. 
> Could lazy_worker.js be re-written to using postMessage() instead?
> 
> For example, in lazy_worker.js:
> 
>   onactivate = function(event) {
>     var promise = new Promise(function(res) {
>       self.onmessage = function() {
>         res();
>       }
>     });
>     event.waitUntil(promise);
>   }
> 
> and the caller then be:
> 
>   ...
>   // (we see 'installed', 'activating', 'activated')
>   SpecialPowers.exactGC();
>   if (worker.state === 'activating') {
>     worker.postMessage({});
>   } else if (worker.state === 'activated') {
>     resolve(registration);
>   }

Yes, though, looking through the source code, I think we should use the callback that exactGC takes
to wait until the gc is done.

https://dxr.mozilla.org/mozilla-central/source/testing/specialpowers/content/specialpowersAPI.js#1465
Flags: needinfo?(catalin.badea392)
Will ride the train.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: