Closed Bug 1226443 Opened 9 years ago Closed 9 years ago

delay service worker updates triggered by events

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox43 --- wontfix
firefox44 --- affected
firefox45 --- fixed

People

(Reporter: bdahl, Assigned: bkelly)

References

(Blocks 1 open bug, )

Details

Attachments

(6 files, 12 obsolete files)

8.29 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
1.84 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
12.84 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
1.27 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
1.45 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
1.25 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
STR

1) Load https://adriftwith.me/sandbox/service_worker_reload/go.html
2) Reload the page
3) Reload again

Expected:
On the second reload there should be a message "new ServiceWorker installed"

Actual:
The service worker is never updated

In chrome I see the expected behavior above. I'm not exactly sure what the spec expects use to do, but not updating the service worker on reload makes development more painful, since you either have to trigger a navigation event or do a registration.update() on every page to get the new service worker.
I'm looking at this.

Note that the console.log() for the install event does appear in the webconsole.
Brendan, can you change your code to look for registration.installing || registration.waiting?  Per the spec we go to waiting immediately if there is not already a waiting worker.  This effectively races with the dispatch of the onupdatefound.
Flags: needinfo?(bdahl)
Nevermind, your script would be throwing if that was the case.
Flags: needinfo?(bdahl)
Brendan, it seems your code assumes the .ready promise will always resolve before the onupdatefound event.  Its late at night here, but I don't think the spec guarantees that.

Can you try placing your onupdatefound event on the registration returned from service worker .register()?

Or can you point me at what synchronizes these two events in the spec such that activation will always occur prior to starting the installation of the new service worker script?

I'll look more tomorrow as well.
Flags: needinfo?(bdahl)
To clarify, though, the service worker is updating on refresh in current nightly.  Your just missing the onupdatefound event.
Some other information for what I saw:

* Refreshing your online site I did see the "new ServiceWorker installed" text maybe 20% of the time.
* When I copied the site locally and used a node.js script to generate the sw.js timestamps I saw the "new ServiceWorker installed" text 100% of the time.

I think this behavior also suggests a race between the ready promise and the onupdatefound.
Actually, this is spec'd.

The spec says:

1) Handle the fetch for the navigation
2) If the active worker is still in activating, wait until activated
3) fire fetch event
4) do update

The ready promise fires when the worker becomes activated in (2).  The onupdatefound is triggered in (4).

In gecko, however, we don't currently wait for the active worker to go to activated.

I agree this is a problem for developers.  We should implement bug 1226441.
Depends on: 1226441
Flags: needinfo?(bdahl)
Thanks bdahl and bkelly for digging in!

I actually had the same behaviour yesterday and worked around it by listening to the 'statechange' event and waiting for the old worker to be redundant: https://github.com/mozilla/platatus/pull/162/files#diff-99a85c3932c0670278158f38d818e0efR21 . Glad to hear that this is spec'd!
Brendan, Harald, its possible this fix won't make it into 44.  Do you think its bad enough for developers to block the release?  Again, the updates do occur, but we fire the ready promise late.
Flags: needinfo?(hkirschner)
Flags: needinfo?(bdahl)
After sleeping on it I think we need to do something for this before shipping.
Flags: needinfo?(hkirschner)
Flags: needinfo?(bdahl)
So, I have a patch in bug 1226441 that makes fetch event wait for the activate event to complete.  It even has a new wpt test demonstrating that it works.

It doesn't, however, make this demo site work more consistently.  I'm investigating.
Ok, I think this is a spec bug:

  https://github.com/slightlyoff/ServiceWorker/issues/788

We should probably wait to trigger the update until after the target document has loaded.  Patch forthcoming.
Another good reason to delay the update is to avoid regression document load times with extra main thread runnables.
Ehsan, welcome back!  Have a review..

I tested this locally and confirmed it make the demo act consistent on each refresh.  In addition, we match chrome's behavior here (although they delay even more).

I opened a spec bug to try to get this standardized:

  https://github.com/slightlyoff/ServiceWorker/issues/788

The one question I am unsure of is if DOMContentLoaded will fire if the page errors out and goes to about:neterror.  If it does, then we can probably remove the failure case that triggers an update in ServiceWorkerPrivate fetch event dispatch path.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Attachment #8690349 - Flags: review?(ehsan)
(In reply to Ben Kelly [:bkelly] from comment #14)
> The one question I am unsure of is if DOMContentLoaded will fire if the page
> errors out and goes to about:neterror.  If it does, then we can probably
> remove the failure case that triggers an update in ServiceWorkerPrivate
> fetch event dispatch path.

I just confirmed that DOMContentLoaded does not fire in this case.  So we do need the ServiceWorkerPrivate code.

In other words, the patch does not need to change again.
Investigating try failures.  The test_skip_waiting failure is a bit mysterious at first glance...
Slight modification not to use SWM if the nsIServiceWorkerManager is not available.
Attachment #8690349 - Attachment is obsolete: true
Attachment #8690349 - Flags: review?(ehsan)
Attachment #8690564 - Flags: review?(ehsan)
I believe the test_skip_waiting.html failures were due to a race in the spec.  I describe it here:

  https://github.com/slightlyoff/ServiceWorker/issues/789

Basically, the navigation update was racing with a .register(differentScriptURL) on the page.  With the right timing the navigation update would wipe the new script URL specified in the .register().

I believe we should just skip the update in this case as its clear a registration change is already in process.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cd008dd3cbd
Attachment #8690565 - Flags: review?(ehsan)
So I think we were passing the tests in unregister-then-register-new-script.https.html by accident due to our update timing and the dubious reset of the registration script URL by the update.

We really have a bug in our .register() method as described in bug 1227007.  We are running some code async which should really be sync.  This causes us not to clear the uninstalling flag before the frame.remove() in these tests.

This is very corner casey, though, so I propose we mark these expected fail for now.  I think its more important to run updates consistently when the document can see the updatefound event.
Attachment #8690607 - Flags: review?(ehsan)
Unfortunately still some intermittent orange in the try.  Timeouts in skip-waiting.https.html and a crash in test_sanitize_domain.html.  Unfortunately I've run out of time to work this before my PTO.  If anyone wants to track these down, that would be great.  Otherwise I will look when I get back on Nov 30.
Comment on attachment 8690565 [details] [diff] [review]
P2 Don't reset service worker registration script URL during soft update. r=ehsan

Since this isn't going to land yet, I think we should probably just do the full change described in bug 1227015 instead of this P2 patch.
Attachment #8690565 - Flags: review?(ehsan)
Depends on: 1227015
It would be nice to get this in 44, but I don't think it should block.  It is a company issue with chrome, though.
Comment on attachment 8690564 [details] [diff] [review]
P1 Delay navigation update until after DOM content is loaded. r=ehsa

Review of attachment 8690564 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/ServiceWorkerManager.cpp
@@ +3058,5 @@
> +
> +  // We perform these success path navigation update steps when the
> +  // document tells us its more or less done loading.  This avoids
> +  // slowing down page load and also lets pages consistently get
> +  // updatefound events when they fire.

Can we delay this even further?  Like a second after the document has been loaded, similar to what Chrome does?

(I'm r+ing this as it fixes what the bug was about, but I really think we should delay more.  Do you mind doing that in a follow-up, please?)
Attachment #8690564 - Flags: review?(ehsan) → review+
(In reply to :Ehsan Akhgari from comment #25)
> (I'm r+ing this as it fixes what the bug was about, but I really think we
> should delay more.  Do you mind doing that in a follow-up, please?)

Thanks.  I might do it here if it lets me sidestep some of the races in the spec similar to what chrome does now.  I'll look at it this week.
Attachment #8690607 - Flags: review?(ehsan) → review+
I'm disabling some update wpt tests in bug 1230164.  They navigation and update race which makes the test flakey.  We should re-enable in this bug.
Summary: Run service worker soft update on refresh → delay service worker updates triggered by events
Attached patch work in progress (obsolete) — Splinter Review
A work-in-progress to change how the delay is done.

I think it would be better to mark a registration as needing up and then scheduler a timer.  Any update requests cancel and reschedule the timer further out.  When the timer finally fires it does just a single update.

Scheduling the timer will occur at document load time and at functional event time.  We will track whether the update should check 24 hour time or not.  For example, if a navigation and functional event both schedule updates in the same timer window, then no 24 hour check is done because the navigation takes precedence.

This will supersede the P1 patch.  P2 is no longer needed after bug 1227015.
Attachment #8690564 - Attachment is obsolete: true
Attachment #8690565 - Attachment is obsolete: true
Attachment #8690607 - Attachment is obsolete: true
Attachment #8696092 - Attachment is obsolete: true
Updated to explicitly cancel any update timers when a registration is cleared.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=08ec2084e8a4
Attachment #8697115 - Attachment is obsolete: true
Updated to cleanup the timers on browser shutdown.

The try looked rather green except for the leak, so flagging for review.
Attachment #8697138 - Attachment is obsolete: true
Attachment #8697159 - Flags: review?(ehsan)
One more version that checks mShuttingDown when a timer fires.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=57dd44c3e200
Attachment #8697159 - Attachment is obsolete: true
Attachment #8697159 - Flags: review?(ehsan)
Attachment #8697163 - Flags: review?(ehsan)
Sorry for all the flag spam.  I realized we should also avoid adding new timers if mShuttingDown is true.
Attachment #8697163 - Attachment is obsolete: true
Attachment #8697163 - Flags: review?(ehsan)
Attachment #8697165 - Flags: review?(ehsan)
Attachment #8697165 - Flags: review?(ehsan)
Sorry again.  I realized I wasn't checking for the 24 hour threshold when the timer fired for a purely function event update.

I'm not going to change this any more until after review.  Sorry again for the churn.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dff9255d3eac
Attachment #8697165 - Attachment is obsolete: true
Attachment #8697167 - Flags: review?(ehsan)
Comment on attachment 8697167 [details] [diff] [review]
P1 Add a timer based mechanism for firing service worker updates. r=ehsan

Hmm, there is still a shutdown leak somehow.  Sorry, I have to investigate that.
Attachment #8697167 - Flags: review?(ehsan)
Attachment #8697167 - Attachment description: bug1226443_p1_addtimedupdate.patch → P1 Add a timer based mechanism for firing service worker updates. r=ehsan
It appears that we are trying to run an update during the start of shutdown where the channel AsyncOpen() fails.  This causes the ServiceWorkerScriptCache init to fail.  This should be fine, except ServiceWorkerScriptCache leaks all over the place in this condition.

This patch fixes those leaks.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a92c839721ab
This patch makes us always fire the update timer 1 second after the first event triggers an async update.  This ensures that we always do the update regardless of how many events are being received.

The original P1 patch would reschedule the timer to fire 1 second after the last event is received.  In some ways that is nicer, but it can lead to the update being prevented from firing forever if continuous events are being received.  Given that MessageEvent can be easily triggered programmatically I think we should avoid this possibility.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee79e8923c9e
Attachment #8697345 - Attachment description: bug1226443_p5_firstdelay.patch → P5 Always use first scheduled update timer instead of rescheduling on new events. r=ehsan
Attachment #8697167 - Flags: review?(ehsan)
Attachment #8697336 - Flags: review?(ehsan)
Attachment #8697345 - Flags: review?(ehsan)
Try is looking green.
Attachment #8697161 - Flags: review?(ehsan) → review+
Attachment #8697336 - Flags: review?(ehsan) → review+
Attachment #8697167 - Flags: review?(ehsan) → review+
Attachment #8697160 - Flags: review?(ehsan) → review+
Attachment #8697345 - Flags: review?(ehsan) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c873567c2898 since I didn't have any way to tell whether this was innocent and just didn't save bug 1227015 from its faults, or introduced failures in serviceworker web-platform-tests itself.
Attachment #8697466 - Flags: review?(ehsan)
A new try with some additional patches addressing various wpt test issues.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=831fb5b3acec
Attachment #8697466 - Flags: review?(ehsan) → review+
The try in comment 48 is green across a large number of retriggers.  I'm going to reland.
Depends on: 1232558
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: