service worker not updating when navigation interception fails

RESOLVED FIXED in Firefox 46

Status

()

RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: bkelly, Assigned: bkelly)

Tracking

(Blocks: 1 bug)

48 Branch
mozilla49
Points:
---

Firefox Tracking Flags

(firefox46+ fixed, firefox47+ fixed, firefox48+ fixed, firefox49+ fixed, firefox-esr45 disabled)

Details

(Whiteboard: btpp-active)

Attachments

(12 attachments, 2 obsolete attachments)

1.00 KB, patch
jdm
: review+
Details | Diff | Splinter Review
13.49 KB, patch
jdm
: review+
Details | Diff | Splinter Review
1.79 KB, patch
jdm
: review+
Details | Diff | Splinter Review
12.96 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
960 bytes, patch
Details | Diff | Splinter Review
13.57 KB, patch
Details | Diff | Splinter Review
1.80 KB, patch
Details | Diff | Splinter Review
5.65 KB, patch
Details | Diff | Splinter Review
960 bytes, patch
Details | Diff | Splinter Review
13.54 KB, patch
Details | Diff | Splinter Review
1.80 KB, patch
Details | Diff | Splinter Review
5.65 KB, patch
Details | Diff | Splinter Review
I have a report of a production site in a broken state on firefox due to a service worker.  See the twitter thread starting here:

  https://twitter.com/nekrtemplar/status/725012340660404224

The way this is supposed to work is that the site fixes its service worker and then an update loads the new/fixed service worker.

Unfortunately we have a couple bugs here:

1) Since its a navigation, the update should be made regardless of how long since the last update has occurred.  We don't do this, however, because the document load event doesn't fire since we ended up at about:neterror.
2) So we treat it as a non-navigation which should still trigger an update after 24 hours.  Unfortunately we don't persist the last update time and have a bug where we won't fire a time based update until an update has been run another way.  So these updates will never fire in this case.

Fixing both of these problems should be very small bugs and I'd like to uplift as far as we can go.  This is breaking a site in the wild so I'm asking to track this on the release channel.
(Assignee)

Updated

3 years ago
Blocks: 1226983
This fixes issue (2) from comment 0 by treating last update time as infinitely in the past after browser restart.  Until we actually persist the last update time its better to just trigger one extra update after restart.
Attachment #8745470 - Flags: review?(josh)
Attachment #8745470 - Flags: review?(josh) → review+
Whiteboard: btpp-active
This is a mechanical patch to pass the ServiceWorkerRegistrationInfo down to the CancelChannelRunnable.  This is setup for the next patch.
Attachment #8745541 - Flags: review?(josh)
This patch makes us trigger an update regardless of last check time whenever a network request fails due to the service worker interception.

This will make us update more frequently on problem sites, but that is much better than letting them get stuck longer in a broken state.  Also note the spec says this under the soft update algorithm:

  "The user agent may call this as often as it likes to check for updates."

This should allow sites to escape a broken service worker by shipping a new service worker script.

This works in my local testing.  I'll work on a test next.
Attachment #8745548 - Flags: review?(josh)
Attachment #8745541 - Flags: review?(josh) → review+
Attachment #8745548 - Flags: review?(josh) → review+
Last patch.  This test verifies that we can recover from a bad navigation interception.  It times out without the P1 to P3 patches.  It passes with them.

I apologize for the setTimeout(), but I wanted to verify the iframe actually failed to load.  There is no good cross-browser way to detect this other than to set an artificial timeout.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3ae53ce2091
Attachment #8745771 - Flags: review?(josh)
Hmm, splinter is showing the diff from the files I hg cp'd to make the test.  Sorry if that is confusing.
Updated the test so update-recovery-worker.py does not use the same cookie as update-worker.py.  They were not playing well together.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=25d7767245cd
Attachment #8745771 - Attachment is obsolete: true
Attachment #8745771 - Flags: review?(josh)
Attachment #8745813 - Flags: review?(josh)
We may be planning a dot release for mid next week. 
Tracking to make sure we uplift this at least to beta.
status-firefox49: --- → affected
tracking-firefox46: ? → +
tracking-firefox47: ? → +
tracking-firefox48: ? → +
tracking-firefox49: ? → +
Comment on attachment 8745813 [details] [diff] [review]
P4 Add a wpt test that verifies a service worker update can recover from a broken navigation interception. r=jdm

Review of attachment 8745813 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/web-platform/tests/service-workers/service-worker/resources/update-recovery-worker.py
@@ +7,5 @@
>      # no-cache itself to ensure the user agent finds a new version for each update.
>      headers = [('Cache-Control', 'no-cache, must-revalidate'),
>                 ('Pragma', 'no-cache')]
>  
> +    content_type = 'application/javascript'

This isn't used.

@@ +18,5 @@
> +        extra_body = "addEventListener('fetch', function(e) { e.respondWith(Promise.reject()); });"
> +    elif mode == 'bad':
> +        # When the update tries to pull the script again, update to
> +        # a worker service worker that does not break document
> +        # navigation.  Servce the same script from then on.

Serves

@@ +26,2 @@
>      # Return a different script for each access.  Use .time() and .clock() for
>      # best time resolution across different platforms.

Comment doesn't make sense for this test.
Attachment #8745813 - Flags: review?(josh) → review+
Updated wpt test per review comments.  I also changed the update-recovery-worker.py to clear its cookie when its done.  This will let the test work with run-until-failure, etc.
Attachment #8745813 - Attachment is obsolete: true
Attachment #8746046 - Flags: review+
Comment on attachment 8745470 [details] [diff] [review]
Bug 1267733 P1 Treat last-update-time as infinitely in the past when checking for service worker update after restart. r=jdm

Approval Request Comment
[Feature/regressing bug #]: Service workers.
[User impact if declined]: See comment 0.  If a site accidentally pushes a broken service worker to its users it cannot then update the service worker again.  This can effectively permanently brick their site.  We need to fix the bug preventing us from triggering a service worker update for sites in this state.
[Describe test coverage new/current, TreeHerder]: Existing mochitests and wpt tests.  New wpt test verifying this particular fix.
[Risks and why]: This is extremely low risk.  We are using an existing update mechanism and just triggering it in these particular failure cases.
[String/UUID change made/needed]: None.
Attachment #8745470 - Flags: approval-mozilla-release?
Attachment #8745470 - Flags: approval-mozilla-beta?
Attachment #8745470 - Flags: approval-mozilla-aurora?
Comment on attachment 8745541 [details] [diff] [review]
P2 Pass ServiceWorkerRegistrationInfo down to CancelChannelRunnable. r=jdm

See comment 11.
Attachment #8745541 - Flags: approval-mozilla-release?
Attachment #8745541 - Flags: approval-mozilla-beta?
Attachment #8745541 - Flags: approval-mozilla-aurora?
Comment on attachment 8745548 [details] [diff] [review]
P3 Trigger service worker update after failed interception. r=jdm

See comment 11.
Attachment #8745548 - Flags: approval-mozilla-release?
Attachment #8745548 - Flags: approval-mozilla-beta?
Attachment #8745548 - Flags: approval-mozilla-aurora?
Comment on attachment 8746046 [details] [diff] [review]
P4 Add a wpt test that verifies a service worker update can recover from a broken navigation interception. r=jdm

See comment 11.
Attachment #8746046 - Flags: approval-mozilla-release?
Attachment #8746046 - Flags: approval-mozilla-beta?
Attachment #8746046 - Flags: approval-mozilla-aurora?

Comment 15

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/75764a011b6e
https://hg.mozilla.org/mozilla-central/rev/c595e49da220
https://hg.mozilla.org/mozilla-central/rev/23da91f0f80e
https://hg.mozilla.org/mozilla-central/rev/b396a6c17381
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox49: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Do you have a sense of how widespread this issue is? Would it drive a point release on its own?
I meant to needinfo to you Ben, in comment 16.
Flags: needinfo?(bkelly)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #16)
> Do you have a sense of how widespread this issue is? Would it drive a point
> release on its own?

Right now it's only affecting one site.  It's a risk for any site implementing service worker offline with Firefox, though.  It's tough to say if it would drive it's own point release.

I guess I'd like to do a point release if people are willing to avoid sites doing a UA version check to disable service workers.
Flags: needinfo?(bkelly)
The patches apply to aurora, but need rebase for beta and release.  Working on it.
I built these beta rebased patches locally and verified the WPT tests still pass.
I locally built and tested these release branch patches as well.
Comment on attachment 8745470 [details] [diff] [review]
Bug 1267733 P1 Treat last-update-time as infinitely in the past when checking for service worker update after restart. r=jdm

P1 regression in SW, Aurora48+, Beta47+
Attachment #8745470 - Flags: approval-mozilla-beta?
Attachment #8745470 - Flags: approval-mozilla-beta+
Attachment #8745470 - Flags: approval-mozilla-aurora?
Attachment #8745470 - Flags: approval-mozilla-aurora+
Attachment #8745541 - Flags: approval-mozilla-beta?
Attachment #8745541 - Flags: approval-mozilla-beta+
Attachment #8745541 - Flags: approval-mozilla-aurora?
Attachment #8745541 - Flags: approval-mozilla-aurora+
Attachment #8745548 - Flags: approval-mozilla-beta?
Attachment #8745548 - Flags: approval-mozilla-beta+
Attachment #8745548 - Flags: approval-mozilla-aurora?
Attachment #8745548 - Flags: approval-mozilla-aurora+
Attachment #8746046 - Flags: approval-mozilla-beta?
Attachment #8746046 - Flags: approval-mozilla-beta+
Attachment #8746046 - Flags: approval-mozilla-aurora?
Attachment #8746046 - Flags: approval-mozilla-aurora+
(Assignee)

Updated

3 years ago
status-firefox48: affected → fixed
Comment on attachment 8745470 [details] [diff] [review]
Bug 1267733 P1 Treat last-update-time as infinitely in the past when checking for service worker update after restart. r=jdm

OK, let's take this for 46.0.1 so that service workers will behave correctly
Attachment #8745470 - Flags: approval-mozilla-release? → approval-mozilla-release+
Comment on attachment 8745470 [details] [diff] [review]
Bug 1267733 P1 Treat last-update-time as infinitely in the past when checking for service worker update after restart. r=jdm

I'm having cold feet, I'd like to talk with Ben or someone who can explain the impact more clearly first. Please find me on Monday morning to discuss!
Attachment #8745470 - Flags: approval-mozilla-release+ → approval-mozilla-release-
Comment on attachment 8745470 [details] [diff] [review]
Bug 1267733 P1 Treat last-update-time as infinitely in the past when checking for service worker update after restart. r=jdm

After discussion with ben about impact, approved for m-r uplift
Attachment #8745470 - Flags: approval-mozilla-release- → approval-mozilla-release+
Attachment #8745541 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8745548 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8746046 - Flags: approval-mozilla-release? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.