Closed Bug 1448141 Opened 6 years ago Closed 6 years ago

fix service worker ScriptLoader issues

Categories

(Core :: DOM: Service Workers, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- disabled
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

Attachments

(5 files, 12 obsolete files)

4.62 KB, patch
asuth
: review+
Details | Diff | Splinter Review
18.08 KB, patch
asuth
: review+
Details | Diff | Splinter Review
1.00 KB, patch
baku
: review+
Details | Diff | Splinter Review
19.11 KB, patch
asuth
: review+
Details | Diff | Splinter Review
7.71 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
Splitting this off from bug 1406547 comment 19.  Currently if ScriptLoader cannot find a service worker's scripts in CacheStorage it will just load them over the network and store them again.  This is a bit bugged because:

1. The registration should not exist any more if quota is wiped by QM. (bug 1183245)
2. The service worker script may change without triggering new install/activate events.  This could break the site.
3. The ScriptLoader fallback code is buggy.  For example in bug 1406547 we discovered it allows scripts to be redirected, which is not correct for service workers.

Rather than do this broken fallback we will instead unregister the service worker in this case.  This will emulate what should happen when a QM wipe occurs.
Priority: -- → P2
Hmm... apparently we use this "fallback" path for the initial installation of the scripts as well.  The offlining in ServiceWorkerScriptCache is only for the update path.

So there are a couple problems here.

1. We don't block redirects like we're supposed to in this path AFAICT.
2. We need to support initial installation, but not reloading the script later.
Summary: unregister service worker if ScriptLoader cannot find its scripts in CacheStorage → fix service worker ScriptLoader issues
https://treeherder.mozilla.org/#/jobs?repo=try&revision=41eb0ad43974ed9a3a684b930ce94c31285988dc

I still need to add some logic to unregister the service worker if we hit the network when we're not first evaluating.
I think the best thing to do here:

1. Let the fetch event fail and reset to network.
2. The update should then run.
3. If the update fails, then clear the registration.

So I will add logic in the update code to clear the registration if it fails when the main script is missing from the old cache.
This avoids the crash in bug 1406547, but runs afoul the "removed a registration controlling clients" assertion.
Blocks: 1406547
Perhaps we can avoid the most common situation by moving this code:

https://searchfox.org/mozilla-central/rev/56274d0a016a6833e5da7770ce70580b6f5abb21/dom/serviceworkers/ServiceWorkerManager.cpp#2410-2474

Right now this code starts controlling the reserved/initial client before we realize the service worker is corrupted.  If we move this later, then we can avoid controlling the client at all.

On the other hand, that won't help if there are other windows already open with the given service worker controller.  I think that situation is unlikely, though, given if there are other windows already controlled then the service worker is most likely *not* corrupted.

So I will try moving the "control this loading client" code to after ServiceWorkerPrivate determines that things are ok to proceed.

Also, I need to write a test for the one method I have to reproduce this crash.
I investigated a bit more why we still need the code in ScriptLoader to store some scripts.  I was having trouble understanding why ServiceWorkerScriptCache was inadequate.

The answer is that ServiceWorkerScriptCache always offlines the main script.  It does know about any new importScripts(), however, because the script must be evaulated to detect those.  It only offlines importScripts() it has seen from previous versions of the script.

So the ScriptLoader code to store in cache API is only really used for importScripts.
We don't need to block redirecting service worker scripts in ScriptLoader since we should never be loading them over network there.
Attachment #8961919 - Attachment is obsolete: true
Attachment #8962356 - Attachment is obsolete: true
I tried doing comment 12, but its turning out to be quite thorny.  There are a number of paths we have to handle and ScriptLoader doesn't do exactly what we want on error.

I'm going to investigate the "force clients to be uncontrolled" approach instead.  I don't like it as much since it means doing something sites don't expect.  Its also possible for us to accidentally abuse it in the future.  It might be the most reasonable thing to do for now, though.
Actually, I think I can just loosen the assertion if we are in the "forced uninstall due to corrupt storage" state.  That seems to work ok.
Just need to write some tests now.
Comment on attachment 8962394 [details] [diff] [review]
P1 Only load service worker importScripts() from the network in ScriptLoader on first evaluation. r=asuth

Andrew, we currently have two different ways we load and store service worker scripts for offline usage:

1. The ServiceWorkerScriptCache code loads/stores the main script and any importScripts() it knows about.  That is, any importScripts() seen in the last version of the registration.
2. The ScriptLoader also loads/stores importScripts().  It does this for importScripts() that ServiceWorkerScriptCache hasn't seen yet.  Basically, we do it here because we need the ScriptLoader to be evaluating the script to find the importScripts() call.

The problem is that the ScriptLoader path in (2) could also be triggered for main scripts if the CacheStorage data went missing.  This was bad for a couple reasons.  First, the new script could be different and not trigger install/activate events.  Second, this path did not protect against redirects.

This patch closes off the ScriptLoader (2) path for main scripts and importScripts() after the initial installation phase.
Attachment #8962394 - Flags: review?(bugmail)
Comment on attachment 8962739 [details] [diff] [review]
P2 Uninstall the service worker registration if update fails while its scripts are missing from offline storage. r=asuth

So the P1 patch causes us to fail to load a broken service worker script.  What do we do then?

Well, every navigation will trigger an update whether it fired a FetchEvent or not.  With that in mind we can handle the problem during update.

This patch causes us to note if the old scripts are missing.  If they are we set a flag indicating that we're in a bad state.  If the update can get the new script then we allow it to install.  In addition, we force mark it as skip-waiting to get rid of the broken service worker as soon as possible.

If the update fails and we've set our "OMG its broken" flag, then we are in a bad place.  For lack of anything better to do we force uninstall the registration.
Comment on attachment 8962739 [details] [diff] [review]
P2 Uninstall the service worker registration if update fails while its scripts are missing from offline storage. r=asuth

See comment 24.
Attachment #8962739 - Flags: review?(bugmail)
Comment on attachment 8962875 [details] [diff] [review]
P3 Add a browser chrome test verifying service workers handles its scripts being wiped. r=asuth

This patch adds a test that verifies:

1. We trigger a real update if the underlying scripts went missing and we had to reinstall.
2. We remove the registration if the update fails.
3. A redirect will not succeed as an update.
Attachment #8962875 - Flags: review?(bugmail)
Attachment #8962876 - Attachment is obsolete: true
The test here is hitting bug 1447300 on the verify run.  I guess I'll fix that as well.
Depends on: 1447300
I'd like to get this into 60 beta if I can.  The sooner we can catch these corrupted storage issues the better.
Comment on attachment 8962394 [details] [diff] [review]
P1 Only load service worker importScripts() from the network in ScriptLoader on first evaluation. r=asuth

Review of attachment 8962394 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent comments, thank you.
Attachment #8962394 - Flags: review?(bugmail) → review+
Comment on attachment 8962739 [details] [diff] [review]
P2 Uninstall the service worker registration if update fails while its scripts are missing from offline storage. r=asuth

Review of attachment 8962739 [details] [diff] [review]:
-----------------------------------------------------------------

Really great comments in here too.

::: dom/serviceworkers/ServiceWorkerUpdateJob.h
@@ +13,5 @@
>  namespace mozilla {
>  namespace dom {
>  
> +namespace serviceWorkerScriptCache {
> +enum class OnFailure : uint8_t;

This is a weird thing to forward declare given how tiny ServiceWorkerScriptCache.h is (it only includes nsString.h, etc.), but okay.  If the intent is to hide the actual enum values from this file and its includers, please do say that in a comment though.
Attachment #8962739 - Flags: review?(bugmail) → review+
Comment on attachment 8962875 [details] [diff] [review]
P3 Add a browser chrome test verifying service workers handles its scripts being wiped. r=asuth

Review of attachment 8962875 [details] [diff] [review]:
-----------------------------------------------------------------

Fantastic, very readable and well-commented test.  I suggest a few additional comments for my own laziness in the future.

::: dom/serviceworkers/test/browser_storage_recovery.js
@@ +1,2 @@
> +"use strict";
> +

I suggest adding a file-level block comment up here before the constants that explains things a bit more, like:

This test registers a SW for a scope that will never control a document and therefore never trigger a "fetch" functional event that would automatically attempt to update the registration.  The overlap of the PAGE_URI and SCOPE is incidental.  checkForUpdate is the only thing that will trigger an update of the registration and so there is no need to worry about Schedule Job races to coalesce an update job.

@@ +12,5 @@
> +    return !!reg.installing;
> +  });
> +}
> +
> +async function wipeStorage(u) {

I suggest adding a block comment like as follows.  I do think your line comments before the invocations are great for experts, but generally worry about code like this getting cargo-culted without understanding:

Delete all of our chrome-namespace Caches for this origin, leaving any content-owned caches in place.  This is exclusively for simulating loss of the origin's storage without loss of the registration and without having to worry that future enhancements to QuotaClients/ServiceWorkerRegistrar will break this test.  If you want to wipe storage for an origin, use QuotaManager APIs

::: dom/serviceworkers/test/storage_recovery_worker.sjs
@@ +14,5 @@
> +    setState("redirect", "true");
> +  }
> +
> +  response.setHeader("Content-Type", "application/javascript");
> +  response.write("");

For byte-comparison paranoia purposes, would it better for the body to be "// I am not empty.js!" or something like that (since empty.js is an empty string too)?
Attachment #8962875 - Flags: review?(bugmail) → review+
Comment on attachment 8963158 [details] [diff] [review]
P4 Immediately start closing worker scripts if their initial compilation fails. r=baku

Andrea, this causes us to start closing the worker thread when we notice the initial worker script failed to compile.
Attachment #8963158 - Flags: review?(amarchesini)
Comment on attachment 8963341 [details] [diff] [review]
P5 Remove the service worker script load failure runnable. r=asuth

Andrew, this removes the fail runnable from the service worker startup path.  It turns out its really not needed and is causing duplicate callbacks.  If the script fails to compile the thread is still started and our various WorkerRunnables still get dispatched there.  They can then notice the worker thread is shutting down themselves and report back their own callback.
Attachment #8963341 - Flags: review?(bugmail)
I need to tweak the test to explicitly disable the script redirecting.  I think the --verify test runs are confusing the .sjs state handling a bit somehow.
(In reply to Andrew Sutherland [:asuth] from comment #34)
> For byte-comparison paranoia purposes, would it better for the body to be
> "// I am not empty.js!" or something like that (since empty.js is an empty
> string too)?

This shouldn't be necessary because redirects are an error.
Those last TV failures seem to have been due to the redirect being cached in http cache.
Attachment #8963650 - Attachment is obsolete: true
Attachment #8963711 - Flags: review+
Attachment #8963341 - Flags: review?(bugmail) → review+
Attachment #8963158 - Flags: review?(amarchesini) → review+
(In reply to Andrew Sutherland [:asuth] from comment #32)
> > +namespace serviceWorkerScriptCache {
> > +enum class OnFailure : uint8_t;
> 
> This is a weird thing to forward declare given how tiny
> ServiceWorkerScriptCache.h is (it only includes nsString.h, etc.), but okay.
> If the intent is to hide the actual enum values from this file and its
> includers, please do say that in a comment though.

I'd like to leave this as is.  We forward declare enums in other places in the tree.  Even if the header is small today, it will get more stuff added in the future. Forward declaring whenever possible is about de-coupling header dependencies and I think is a good thing as a general principal.
Give risk from P4 and P5 I'd prefer to just let this ride the trains.
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0b60b1f171f2
P1 Only load service worker importScripts() from the network in ScriptLoader on first evaluation. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a78a425ee42
P2 Uninstall the service worker registration if update fails while its scripts are missing from offline storage. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1a568d82d1c
P3 Add a browser chrome test verifying service workers handles its scripts being wiped. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb8d4e3b8d01
P4 Immediately start closing worker scripts if their initial compilation fails. r=baku
https://hg.mozilla.org/integration/mozilla-inbound/rev/c22501d278b4
P5 Remove the service worker script load failure runnable. r=asuth
Depends on: 1450874
No longer depends on: 1447300
Depends on: 1465670
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: