Closed Bug 1424300 Opened 6 years ago Closed 6 years ago

investigate why the service worker job queue is stuck for twitter in bug 1409115

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- wontfix
firefox58 + fixed
firefox59 + fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

(Keywords: regression)

Attachments

(3 files, 2 obsolete files)

We are seeing leaks on twitter in bug 1409115.  This is because the service worker register() promise is never resolving because the job queue is stuck.  While Andrew is working on a fix to avoid leaking through promises like this, we should also figure out what is breaking the SW job queue.

Also see bug 1424299 for adding a timeout and report mechanism for stuck job queues.

I have a broken profile that shows this behavior, so I will investigate in the next week or two.
I think I figured this out.  The old cache exists, but is empty.  So we never perform any fetches here:

https://searchfox.org/mozilla-central/rev/9d920555ec81f1c9d4b7fa3b08e23eb88efb60e1/dom/workers/ServiceWorkerScriptCache.cpp#471
Blocks: 1290951
Comment on attachment 8936655 [details] [diff] [review]
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang

Eden, can you review this patch in the ServiceWorkerScriptCache?

The bug here is in the service worker update mechanism.  During update we try to look at our currently installed script and importScripts() resources.  We do a fetch for each one and compare.

The problem here is that sometimes it seems we can get a cache that does not contain any entries.  So we get len == 0 here and don't start any fetches.  This leaves the job hung.

My proposed fix here is to always perform the top level main script fetch.  We use the keys to generate fetches for importScripts() only.  This allows us to also handle any corner cases with the main script being missing, but other resources being there.
Attachment #8936655 - Flags: review?(echuang)
I'm going to try to write a test.
Comment on attachment 8936655 [details] [diff] [review]
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang

I think I have some more changes to make.
Attachment #8936655 - Flags: review?(echuang)
Comment on attachment 8936655 [details] [diff] [review]
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang

This patch is unchanged.  Please see comment 3 for description.

In addition, I figured out why we sometimes get a Cache object with no entries.  Its because if you manually delete the cache we just create a new empty Cache for you automatically.  So if something blows away the storage directory from the user's profile you end up with this situation.
Attachment #8936655 - Flags: review?(echuang)
Comment on attachment 8936682 [details] [diff] [review]
P2 Store response URLs in the service worker script cache. r=edenchuang

This patch is not strictly necessary, but its nice for writing tests and other diagnostics.  It makes us store the Response.url in Cache API for our various service worker script resources.

I also fix a very dubious nsresult usage here.  The code was setting an rv2 but checking rv instead.  I don't see any reason for the rv2, so lets remove it.
Attachment #8936682 - Flags: review?(echuang)
Comment on attachment 8936683 [details] [diff] [review]
P3 Add a mochitest verifying we can recover if the script cache is deleted. r=edenchuang

This patch adds a mochitest.  It verifies that if we manually delete the service worker script cache data we can still update and restore the service worker.

On current mozilla-central this test times out because the update gets stuck and never completes.  With the patches applied it passes.
Attachment #8936683 - Flags: review?(echuang)
I checked our profile refresh feature in case it was triggering this.  Fortunately it does not trigger it.
I believe we used to handle this before bug 1290951, so its a regression.
Keywords: regression
While the code was implemented in bug 1290951, it wasn't enabled until bug 1353636.  So this problem only effects 57+.
Blocks: 1353636
I have a small tweak to the test make it past --verify, but leaving the review flag as is for now.  It doesn't change the general gist of the test.  I just need to do one extra thing to cleanup so the test can re-run consecutively in the same process.
Actually, lets just update the test.  This just adds an extra waitForState() after the update to make --verify pass.  Please see previous description of the patch.
Attachment #8936683 - Attachment is obsolete: true
Attachment #8936683 - Flags: review?(echuang)
Attachment #8936694 - Flags: review?(echuang)
Comment on attachment 8936655 [details] [diff] [review]
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang

Review of attachment 8936655 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. The explanation is very clear, I have nothing to comment.
Attachment #8936655 - Flags: review?(echuang) → review+
Attachment #8936682 - Flags: review?(echuang) → review+
Comment on attachment 8936694 [details] [diff] [review]
P3 Add a mochitest verifying we can recover if the script cache is deleted. r=edenchuang

Review of attachment 8936694 [details] [diff] [review]:
-----------------------------------------------------------------

Clean up the cache at the end of the test.

::: dom/workers/test/serviceworkers/test_bad_script_cache.html
@@ +57,5 @@
> +    await waitForState(reg.installing, 'activated');
> +
> +    // Verify that the script cache knows about the worker script again.
> +    response = await chromeCaches.match(scriptURL.href);
> +    is(response.url, scriptURL.href, 'worker script should be stored');

call deleteCaches(chromeCaches) again to make sure the test does not contaminate the test environment.

might be the root cause of https://treeherder.mozilla.org/logviewer.html#?job_id=151554852&repo=try&lineNumber=2177
Attachment #8936694 - Flags: review?(echuang)
(In reply to Eden Chuang[:edenchuang] from comment #19)
> ::: dom/workers/test/serviceworkers/test_bad_script_cache.html
> @@ +57,5 @@
> > +    await waitForState(reg.installing, 'activated');
> > +
> > +    // Verify that the script cache knows about the worker script again.
> > +    response = await chromeCaches.match(scriptURL.href);
> > +    is(response.url, scriptURL.href, 'worker script should be stored');
> 
> call deleteCaches(chromeCaches) again to make sure the test does not
> contaminate the test environment.

This should not be necessary since we do the service worker unregister().  The service worker code should remove its cache when the file is deleted.

> might be the root cause of
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=151554852&repo=try&lineNumber=2177

I'm going to investigate this further, but I think its a Cache API bug.  I'll either do another patch in this bug or file a separate follow-on bug.
Comment on attachment 8936694 [details] [diff] [review]
P3 Add a mochitest verifying we can recover if the script cache is deleted. r=edenchuang

See comment 20.  Re-flagging because I think the verify failure on windows is a different bug in Cache API.
Attachment #8936694 - Flags: review?(echuang)
Attachment #8936694 - Flags: review?(echuang) → review+
tracking as recent regression
I looked at the assert for a few hours this morning.  AFAICT Cache API is doing everything correctly.  It just seems like something during shutdown (QuotaManager?) is scanning files and has a lock on the body file.  So the windows OS refuses to delete it.

If I add a manual GC at the end of the test the assert goes away.  I'm going to go ahead and include that in the test since I don't see any other problems here.  Unfortunately we can't prevent other code from touching these files.
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/474703f75734
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang
https://hg.mozilla.org/integration/mozilla-inbound/rev/e48d2f34e8f9
P2 Store response URLs in the service worker script cache. r=edenchuang
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e183521e2de
P3 Add a mochitest verifying we can recover if the script cache is deleted. r=edenchuang
Comment on attachment 8936655 [details] [diff] [review]
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1290951 and Bug 1353636
[User impact if declined]: If the users profile loses storage for a site using service workers the service worker API may stop working.  It does so in a way that can create a serious memory leak for sites.  Its unclear why storage for the site is being removed/corrupted, but there can be many causes including anti-virus, user poking around in their profile, etc.
[Is this code covered by automated tests?]: P3 includes a mochitest
[Has the fix been verified in Nightly?]: Verified locally, but it has not landed on m-c or a nightly build yet.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: No
[Is the change risky?]: Minimal risk.
[Why is the change risky/not risky?]: The code changes are relatively small and self contained.  We have a good existing set of tests for this code to avoid regressions.
[String changes made/needed]: None
Attachment #8936655 - Flags: approval-mozilla-beta?
Comment on attachment 8936682 [details] [diff] [review]
P2 Store response URLs in the service worker script cache. r=edenchuang

See comment 26.
Attachment #8936682 - Flags: approval-mozilla-beta?
Comment on attachment 8936882 [details] [diff] [review]
P3 Add a mochitest verifying we can recover if the script cache is deleted. r=edenchuang

See comment 26.
Attachment #8936882 - Flags: approval-mozilla-beta?
Comment on attachment 8936655 [details] [diff] [review]
P1 Gracefully handle when the service worker script cache exists, but is empty. r=edenchuang

Fix a potential leak issue. Beta58+.
Attachment #8936655 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8936682 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8936882 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Sorry, not sure how that extra moxhitest.ini line slipped into my patch.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: