Closed
Bug 1270161
Opened 8 years ago
Closed 8 years ago
service workers that fail to install due to 404 during Cache.addAll() are leaked forever
Categories
(Core :: DOM: Service Workers, defect)
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: bkelly, Assigned: bkelly)
References
Details
(Whiteboard: [MemShrink:P2] btpp-active)
Attachments
(1 file, 5 obsolete files)
The service worker currently on my blog has a bug that causes it to reject the install event waitUntil() promise. I noticed today that when this happens we don't immediately kill the worker thread: ├─────19.65 MB (01.96%) -- workers │ ├───9.97 MB (00.99%) ++ workers(telegram.org)/worker(js/lib/crypto_worker.js, 0x1ca385af000) │ ├───6.81 MB (00.68%) ++ workers(chrome) │ └───2.88 MB (00.29%) -- workers(blog.wanderview.com) │ ├──1.44 MB (00.14%) ++ worker(https://blog.wanderview.com/sw.js, 0x1ca3d242800) │ └──1.44 MB (00.14%) ++ worker(https://blog.wanderview.com/sw.js, 0x1cbc3224000) We should be cleaning these up immediately. I think they probably get cleaned up eventually by the long timer kill mechanism, but I need to verify that.
Assignee | ||
Comment 1•8 years ago
|
||
Its been over an hour and the workers still show in about:memory. Seems we leak them forever.
Summary: service workers that fail to install are left alive → service workers that fail to install are leaked forever
Whiteboard: [MemShrink]
Assignee | ||
Comment 2•8 years ago
|
||
This appear to be fixed in nightly 49. It reproduces in aurora 48, though.
Assignee | ||
Comment 3•8 years ago
|
||
I take it back. This does happen in nightly, but only in non-e10s mode. I think it might be an interaction with devtools about:debugging.
Updated•8 years ago
|
Whiteboard: [MemShrink] → [MemShrink] btpp-active
Updated•8 years ago
|
Whiteboard: [MemShrink] btpp-active → [MemShrink:P2] btpp-active
Assignee | ||
Comment 4•8 years ago
|
||
Turns out this is a bug in PromiseWorkerProxy. If a proxy is actively holding a worker alive with a WorkerFeature, it will leak the thread forever when you call WorkerPrivate::Terminate(). When I trigger this error I get this in the console: [3688] WARNING: A runnable was posted to a worker that is already shutting down!: file c:/devel/mozilla-central/dom/workers/WorkerPrivate.cpp, line 2397 [3688] WARNING: Could not dispatch fetch response end: file c:/devel/mozilla-central/dom/fetch/Fetch.cpp, line 355 Here a fetch() is running and holding a WorkerPromiseProxy. This is most likely due to a the Cache.addAll() being run in the service worker script. When one of those addAll() requests fails, however, it rejects the install event waitUntil() immediately. The failed install then causes ServiceWorkerPrivate to call WorkerPrivate::Terminate() even though other parts of the addAll() are still running. The result is the WorkerPrivate gets a status of Terminating. The PromiseWorkerProxy is still alive, however, and can only be cleaned up on the worker thread. No WorkerRunnable can be schedule on the worker thread, however, because of this check: https://dxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#2395 This means there is no way for the code using the PromiseWorkerProxy to clean it up at this point and the WorkerFeature will keep the thread alive forever. The solution is simple. We just make PromiseWorkerProxy clean itself up when the worker status reaches a point where runnables can't be dispatched any more.
Attachment #8748962 -
Flags: review?(khuey)
Assignee | ||
Comment 5•8 years ago
|
||
Actually, it seems like it should cleanup on Closing to avoid getting into this situation. Unless there is some mechanism that guarantees a worker will progress from Closing to Terminating.
Attachment #8748962 -
Attachment is obsolete: true
Attachment #8748962 -
Flags: review?(khuey)
Attachment #8748967 -
Flags: review?(khuey)
Assignee | ||
Comment 6•8 years ago
|
||
An alternative solution is to make ServiceWorkerPrivate use WorkerPrivate::Kill() instead of Terminate(). I'm proposing this patch, though, since this seems like the PromiseWorkerProxy leak would be a problem for non-service-worker code as well.
Assignee | ||
Comment 7•8 years ago
|
||
It seems this might be a problem for these other places in the code as well: dom/workers/ScriptLoader.cpp dom/indexedDB/IDBTransaction.cpp dom/cache/Feature.cpp dom/base/WebSocket.cpp dom/workers/XMLHttpRequest.cpp dom/notification/Notification.cpp A quick glance suggests that many, if not all, require executing a runnable on the worker thread to release the feature. For example, IDB needs to receive a PBackground message that executes FireCompleteOrAbortEvents() on the worker thread. If you think its appropriate, I can make patches to switch these over to Closing as well.
Assignee | ||
Comment 8•8 years ago
|
||
Or maybe IPC uses control runnables so they can still be dispatched or something. If that's the case, then maybe I should make Fetch use a control runnable.
Assignee | ||
Comment 9•8 years ago
|
||
I spoke with Kyle on IRC and he indicated that fetch() should be using a control runnable here. I can't just always use a control runnable, though, because in the non-shutting-down case we don't want to cleanup the promise proxy before our other runnable resolves the promise. So we just fire a control runnable to cleanup if the non-control runnable dispatch fails.
Attachment #8748967 -
Attachment is obsolete: true
Attachment #8748967 -
Flags: review?(khuey)
Attachment #8749212 -
Flags: review?(amarchesini)
Assignee | ||
Comment 10•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d9ff9291e25c
Assignee | ||
Comment 11•8 years ago
|
||
The ContinueConsumeBodyRunnable does not have this problem because FetchBodyFeature::Notify() cleans things up on the lowest worker Closing status.
Comment 12•8 years ago
|
||
Comment on attachment 8749212 [details] [diff] [review] Make fetch() use a control runnable to cleanup if the worker thread is shutting down. r=baku Review of attachment 8749212 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/fetch/Fetch.cpp @@ +301,5 @@ > +class WorkerFetchResponseEndBase > +{ > + RefPtr<PromiseWorkerProxy> mPromiseProxy; > +public: > + WorkerFetchResponseEndBase(PromiseWorkerProxy* aPromiseProxy) explicit @@ +339,5 @@ > + Cancel() override > + { > + // Execute Run anyway to make sure we cleanup our promise proxy to avoid > + // leaking the worker thread > + Run(); same here. Call WorkerRun() instead. @@ +362,5 @@ > + WorkerRunInternal(aWorkerPrivate); > + return true; > + } > + > + // Control runnable cancel already calls Run(). Correct, it calls Run() but that will not call WorkerRun(). You must call WorkerRunInternal(aWorkerPrivate) by yourself.
Attachment #8749212 -
Flags: review?(amarchesini) → review-
Assignee | ||
Comment 13•8 years ago
|
||
Fix the missing explicit keyword. I also removed the MOZ_ALWAYS_TRUE() on the control runnable dispatch. I realized that it can actually fail to dispatch if the worker thread is canceled or killed since that will cause PromiseWorkerProxy to give up its feature early. As discussed on IRC I left the calls to Run(). They should work since they are called before WorkerRunnable::Cancel() which sets the flag necessary for IsCanceled() to return true.
Attachment #8749212 -
Attachment is obsolete: true
Attachment #8749296 -
Flags: review?(amarchesini)
Assignee | ||
Comment 14•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac8cdaeb66f8
Updated•8 years ago
|
Attachment #8749296 -
Flags: review?(amarchesini) → review+
Assignee | ||
Comment 15•8 years ago
|
||
This is a relatively small change and prevents erratic memory leaks. I think we should uplift it. Note, I marked that it affects esr45 since you can trigger this on dedicated workers.
status-firefox46:
--- → wontfix
status-firefox47:
--- → affected
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox-esr45:
--- → affected
Assignee | ||
Comment 16•8 years ago
|
||
Updated to fix NS_WARN_IF_FALSE() syntax. https://treeherder.mozilla.org/#/jobs?repo=try&revision=5fc3a1130be1
Attachment #8749296 -
Attachment is obsolete: true
Attachment #8749320 -
Flags: review+
Assignee | ||
Comment 17•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=234eaec2296b
Attachment #8749320 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Summary: service workers that fail to install are leaked forever → service workers that fail to install due to 404 during Cache.addAll() are leaked forever
Comment 19•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d657d71ec3e9
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Assignee | ||
Comment 20•8 years ago
|
||
Comment on attachment 8749332 [details] [diff] [review] Make fetch() use a control runnable to cleanup if the worker thread is shutting down. r=baku Approval Request Comment [Feature/regressing bug #]: Fetch API and Cache API used in workers. In practice this mostly means service workers, but developers can also use these APIs in normal web workers. [User impact if declined]: Memory leak. [Describe test coverage new/current, TreeHerder]: Existing mochitests and wpt tests for regression testing. It was not possible to write a test for this particular leak, however. [Risks and why]: Minimal. This is a very isolated change to the usage of these APIs on workers. [String/UUID change made/needed]: None.
Attachment #8749332 -
Flags: approval-mozilla-beta?
Attachment #8749332 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 21•8 years ago
|
||
On further reflection I don't think we should uplift this to ESR45. Its not a security issue and the common leak involves service workers which are disabled. In theory a site could use Fetch API or Cache API on a normal web worker and trigger this, but its highly unlikely.
Comment on attachment 8749332 [details] [diff] [review] Make fetch() use a control runnable to cleanup if the worker thread is shutting down. r=baku Fixing a mlk in SWs, let's uplift to Aurora48 and bake for a bit before deciding whether to uplift to Beta or not.
Attachment #8749332 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 23•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/ddb972fb7c37
Comment on attachment 8749332 [details] [diff] [review] Make fetch() use a control runnable to cleanup if the worker thread is shutting down. r=baku Baked on Nightly and Aurora for a few days, looks safe to uplift to Beta47.
Attachment #8749332 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 25•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/beaf4d37d01c
Assignee | ||
Comment 26•8 years ago
|
||
I pushed a fix for the beta build bustage: remote: https://hg.mozilla.org/releases/mozilla-beta/rev/2991f214d4f4
Assignee | ||
Comment 27•8 years ago
|
||
Unfortunately, the patch in this bug fixed the leak in my local opt-DEBUG builds, but there is still a leak in opt/pgo builds. I'm going to keep investigating that in bug 1271493.
You need to log in
before you can comment on or make changes to this bug.
Description
•