Refactor WorkerPrivate to support CheckedUnsafePtr and convert existing raw class member pointers
Categories
(Core :: DOM: Workers, task)
Tracking
()
People
(Reporter: jstutte, Assigned: edenchuang)
References
(Blocks 1 open bug)
Details
Attachments
(11 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
We currently have many long-living WorkerPrivate* mWorkerPrivate;
class variables in our code. It is difficult to always oversee their lifecycle and to guarantee that the referred WorkerPrivate
instance is still alive.
Adding thread-safe ref-counting to WorkerPrivate
will help to get rid of some if not all of them and might eventually help us to remove some flavor of WorkerRef
, too.
This is delicate as adding ref-counts will increase the probability of leaks but it might help to detect unwanted/unhandled circles in our references that are currently invisible.
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 1•2 years ago
|
||
Updated•2 years ago
|
Reporter | ||
Comment 2•2 years ago
|
||
Depends on D132708
Reporter | ||
Comment 3•2 years ago
|
||
Depends on D132709
Reporter | ||
Comment 4•2 years ago
|
||
Depends on D132710
Updated•2 years ago
|
Reporter | ||
Comment 5•2 years ago
•
|
||
Many of the WorkerPrivate::GetParent
uses look like this:
// Walk up to our containing page
WorkerPrivate* wp = aWorkerPrivate;
while (wp->GetParent()) {
wp = wp->GetParent();
}
nsCOMPtr<nsPIDOMWindowInner> window = wp->GetWindow();
while we already have WorkerPrivate::GetAncestorWindow
. I assume we can use this everywhere even though it does an additional:
if (mLoadInfo.mWindow) {
return mLoadInfo.mWindow;
}
before walking the parents?
Comment 6•2 years ago
|
||
Short proposal: Let's use CheckedUnsafePtr with checking always enabled in nightly to replace the raw WorkerPrivate uses.
Longer Proposal and Rationale
Currently we canonically expect there to be 2 strong references to WorkerPrivate:
- The WorkerPrivate held by the owner of the worker so that it can message the worker and so that it can terminate it. This strong reference is dropped immediately upon requesting termination.
- The self-reference held by the worker as long as it is alive.
- This self-reference also then gets registered with the runtime service largely for being able to expose the worker for debugging. Because of the existence of nested workers, this gets ugly and arguably it could be appropriate for the runtime service to be given its own strong reference. Attaching a debugger could then also use that existing strong reference to acquire a specific strong reference for the debugger.
Where I think we've historically run into trouble is that the implementation has held raw pointers to the WorkerPrivate which have conceptually been borrowing the self-reference's strong reference. The fatal flaw in this approach is that any kind of logic bug where something outlives the global potentially turns into a UAF.
We faced a similar situation with IndexedDB, which led to the creation of CheckedUnsafePtr and which seems suitable for use in this case as well. Here we are primarily concerned that it's indicative of a logic bug if we use these borrowed pointers beyond their use point.
This is preferable to converting the raw pointers to RefPtrs, as
- It's very much our intent that all of the accesses that are being done are being done under the auspices of a specific owning strong-reference.
- Marking the WorkerPrivate as threadsafe refcounted and standardizing use of RefPtr's to WorkerPrivate when there should only be the two strong references potentially throws away a lot of the good intent behind the current strategy. (Reiterating that the current strategy has serious issues because of UAFs).
SafeRefPtr could also potentially accomplish this as it requires intent to create new references, but as you remarked in chat, it has the logistical problem that it's a much larger initial set of changes.
Another option would be to introduce some other kind of holding class (or reuse something like nsMainThreadPtrHolder in nsProxyRelease.h that supports other threads) that could allow ref-counted pointers that reuse the underlying strong reference so that we can keep that strong reference alive to avoid UAFs, but to also allow us to add some assertions/checks. Because it's definitely the case that there are logic bugs where these borrowed pointers are used too late and we want to detect that, not just paper over them.
Meta: More things should be using nsCOMPtr<nsIGlobalObject>
nsIGlobalObject exists to allow code that wants to work with a global to not have to have a separate code path for windows and workers (plus there are more global types than just those!). There are a lot of cases where code is accessing WorkerPrivate
when it really wants to be accessing nsIGlobalObject
.
But it should be noted that in the nsIGlobalObject case, all the reasons that code already needs to use WorkerRefs still apply. Which is why in https://phabricator.services.mozilla.com/D125555#4135011 I proposed (and I think brought up elsewhere with :smaug? Maybe Matrix?) that nsIGlobalObject should likely have a generic concept that uses WorkerRefs on workers and something else on windows so that:
- Code can indicate that it currently needs the global / its underpinnings to be kept alive, at least until...
- It's notified that the global is terminating, at which point the code is responsible for cleaning up as quickly as possible and dropping its reference to the keepalive.
(Note that windows and nsIGlobalObject do already somewhat have a concept of this through DOMEventTargetHelper, but its semantics don't line up exactly.)
When performing cleanups where it might seem like we want more code to be holding a strong reference to WorkerPrivate, we really want them to be holding a reference to nsIGlobalObject and/or a keepalive.
Reporter | ||
Comment 7•2 years ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)
Short proposal: Let's use CheckedUnsafePtr with checking always enabled in nightly to replace the raw WorkerPrivate uses.
OK, changing this bug to this scope then.
Meta: More things should be using nsCOMPtr<nsIGlobalObject>
nsIGlobalObject exists to allow code that wants to work with a global to not have to have a separate code path for windows and workers (plus there are more global types than just those!). There are a lot of cases where code is accessing
WorkerPrivate
when it really wants to be accessingnsIGlobalObject
.But it should be noted that in the nsIGlobalObject case, all the reasons that code already needs to use WorkerRefs still apply. Which is why in https://phabricator.services.mozilla.com/D125555#4135011 I proposed (and I think brought up elsewhere with :smaug? Maybe Matrix?) that nsIGlobalObject should likely have a generic concept that uses WorkerRefs on workers and something else on windows so that:
- Code can indicate that it currently needs the global / its underpinnings to be kept alive, at least until...
- It's notified that the global is terminating, at which point the code is responsible for cleaning up as quickly as possible and dropping its reference to the keepalive.
(Note that windows and nsIGlobalObject do already somewhat have a concept of this through DOMEventTargetHelper, but its semantics don't line up exactly.)
When performing cleanups where it might seem like we want more code to be holding a strong reference to WorkerPrivate, we really want them to be holding a reference to nsIGlobalObject and/or a keepalive.
This sounds like an additional action to take and merits probably its own bug. This bug might want to note suspicious uses with // XXX:
comments, though.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Reporter | ||
Comment 8•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #7)
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)
Meta: More things should be using nsCOMPtr<nsIGlobalObject>
This sounds like an additional action to take and merits probably its own bug. This bug might want to note suspicious uses with
// XXX:
comments, though.
I filed bug 1744251 for this.
Reporter | ||
Comment 9•2 years ago
|
||
Depends on D132710
Reporter | ||
Comment 10•2 years ago
|
||
Depends on D132795
Reporter | ||
Comment 11•2 years ago
|
||
Depends on D132797
Reporter | ||
Comment 12•2 years ago
|
||
Depends on D132799
Updated•2 years ago
|
Comment 13•2 years ago
|
||
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2c8941fdcee5 Make WorkerPrivate support CheckedUnsafePtr. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/26d3176a00ef Use CheckedUnsafePtr<WorkerPrivate> in WorkerGlobalScopeBase. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/e9b749f80ec2 Use CheckedUnsafePtr<WorkerPrivate> in WorkerPrivate where possible. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/e6b99af4b7ca Use CheckedUnsafePtr<WorkerPrivate> in WorkerDebugger. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/e1bc4e2066c7 Remove only assertion-relevant mWorkerPrivate from FetchBody. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/913cdee4824f Use CheckedUnsafePtr<WorkerPrivate> in Notification. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/46e9f06636da Use CheckedUnsafePtr<WorkerPrivate> in PerformanceWorker. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/b6452430763d Use CheckedUnsafePtr<WorkerPrivate> in WorkerEventTarget. r=dom-worker-reviewers,edenchuang
Comment 14•2 years ago
•
|
||
Backed out for causing bp hybrid build bustages in dom/clients/api/Clients.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/d675762911a779c1765e144795c2a5294454ee09
ERROR - /builds/worker/checkouts/gecko/dom/clients/api/Clients.cpp:225:36: error: member access into incomplete type 'mozilla::dom::WorkerGlobalScope'
[task 2021-12-07T23:15:47.594Z] 23:15:47 INFO - if (!workerPrivate->GlobalScope()->WindowInteractionAllowed()) {
[task 2021-12-07T23:15:47.595Z] 23:15:47 INFO - ^
[task 2021-12-07T23:15:47.595Z] 23:15:47 INFO - /builds/worker/workspace/obj-build/dist/include/mozilla/dom/WorkerPrivate.h:71:7: note: forward declaration of 'mozilla::dom::WorkerGlobalScope'
[task 2021-12-07T23:15:47.595Z] 23:15:47 INFO - class WorkerGlobalScope;
LE:
Failure log wpt
Reporter | ||
Comment 15•2 years ago
|
||
Those bustages need some reasoning about the include order. I will come to this next week, keeping the ni?.
Assignee | ||
Comment 16•2 years ago
|
||
Assignee | ||
Comment 17•2 years ago
|
||
The build error is caused by the patch https://phabricator.services.mozilla.com/D132709.
#include "mozilla/dom/WorkerScope.h" is removed from WorkerPrivate.h. It make Client.cpp has no complete type information about WorkerScope.
New patch submitted for the build errors by replacing #include "mozilla/dom/WorkerPrivate.h" with #include "mozilla/dom/WorkerScope.h" in cpp files where using WorkerPrivate::GlobalScope()
Comment 18•2 years ago
|
||
Pushed by echuang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/43186fbbf8b4 Make WorkerPrivate support CheckedUnsafePtr. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/5598943a94fd Use CheckedUnsafePtr<WorkerPrivate> in WorkerGlobalScopeBase. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/cfb822df5b3f Use CheckedUnsafePtr<WorkerPrivate> in WorkerPrivate where possible. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/8abd6ba69815 Use CheckedUnsafePtr<WorkerPrivate> in WorkerDebugger. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/a5edfa1c9cd6 Remove only assertion-relevant mWorkerPrivate from FetchBody. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/af171636a87f Use CheckedUnsafePtr<WorkerPrivate> in Notification. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/1580a4ea1a85 Use CheckedUnsafePtr<WorkerPrivate> in PerformanceWorker. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/7aa395dcdbe4 Use CheckedUnsafePtr<WorkerPrivate> in WorkerEventTarget. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/89dca4fc5940 Replace include "mozilla/dom/WorkerPrivate.h" with include "mozilla/dom/WorkerScope.h" where WorkerPrivate->GlobalScope() is called. r=dom-worker-reviewers,smaug,jstutte
Comment 19•2 years ago
|
||
Backed out for causing wpt failures on CheckedUnsafePtr.h
Backout link: https://hg.mozilla.org/integration/autoland/rev/ddf7a6c0a118b33abaab02b44522612d02a61b48
Failure log: https://treeherder.mozilla.org/logviewer?job_id=361346297&repo=autoland&lineNumber=7875
Failure line: SUMMARY: AddressSanitizer: SEGV /builds/worker/workspace/obj-build/dist/include/mozilla/dom/quota/CheckedUnsafePtr.h:247:31 in NotifyCheckFailure
Updated•2 years ago
|
Reporter | ||
Comment 20•2 years ago
|
||
Whohoo, that is at least confirming that CheckedUnsafePtr helps finding issues. I am a bit puzzled why this did not happen on our earlier try runs, though. The Push seems to be quite buggy, too (on several jobs I read Log parsing failed. Unable to generate failure summary.
). Not sure if we should just give it a second try?
Comment 21•2 years ago
•
|
||
The Push seems to be quite buggy, too (on several jobs I read Log parsing failed. Unable to generate failure summary.).
I guess that's something to do with today's AWS outage.
test-linux1804-64-asan-qr/opt-web-platform-tests-e10s wpt1
happened five times with proper logs and I think that was enough for a backout.
Comment 22•2 years ago
•
|
||
It definitely looks like CheckedUnsafePtr caught a case where a CheckedUnsafePtr outlived the worker dropping its self-ref. The fact that this happened in xpcom-shutdown-threads suggests this might not be a content global but instead something like the OS.File global and that ASAN performance characteristics slowed things down to shuffle thread scheduling. (edit: Although this was a content process which inherently has different shutdown characteristics than the parent, so maybe that's also related and maybe it was a content global that just didn't have a lot of time to shutdown before XPCOM shutdown.) We definitely need to investigate and/or add more instrumentation before relanding, and should probably target a bunch of ASAN jobs like this as good candidates to be introducing a chaos mode factor.
Comment 23•2 years ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:jstutte, could you have a look please?
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 24•2 years ago
|
||
(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #23)
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:jstutte, could you have a look please?
For more information, please visit auto_nag documentation.
Eden?
Assignee | ||
Comment 25•2 years ago
|
||
Since these promises are also holding the GlobalObject(ServiceWorkerGlobalScope), they are needed to be added into the CycleCollection checking.
Otherwise, ServiceWorkerGlobalScope can not be unrooted as expected.
Depends on D133483
Assignee | ||
Updated•2 years ago
|
Comment 26•2 years ago
|
||
Pushed by echuang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dfb7e4b76f92 Make WorkerPrivate support CheckedUnsafePtr. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/cbe7670b4ae5 Use CheckedUnsafePtr<WorkerPrivate> in WorkerGlobalScopeBase. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/968efcd33efb Use CheckedUnsafePtr<WorkerPrivate> in WorkerPrivate where possible. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/c56f48bc403a Use CheckedUnsafePtr<WorkerPrivate> in WorkerDebugger. r=dom-worker-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/06f11a04cc8a Remove only assertion-relevant mWorkerPrivate from FetchBody. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/38ebc27679a4 Use CheckedUnsafePtr<WorkerPrivate> in Notification. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/27c8e30bf969 Use CheckedUnsafePtr<WorkerPrivate> in PerformanceWorker. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/b31f930a6e32 Use CheckedUnsafePtr<WorkerPrivate> in WorkerEventTarget. r=dom-worker-reviewers,edenchuang https://hg.mozilla.org/integration/autoland/rev/a362ed940cd7 Replace include "mozilla/dom/WorkerPrivate.h" with include "mozilla/dom/WorkerScope.h" where WorkerPrivate->GlobalScope() is called. r=dom-worker-reviewers,smaug,jstutte https://hg.mozilla.org/integration/autoland/rev/35a19eee470b Adding FetchEvent.mHandled FetchEvent.mPreloadResponse into CycleCollection checking. r=dom-worker-reviewers,smaug
Comment 27•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dfb7e4b76f92
https://hg.mozilla.org/mozilla-central/rev/cbe7670b4ae5
https://hg.mozilla.org/mozilla-central/rev/968efcd33efb
https://hg.mozilla.org/mozilla-central/rev/c56f48bc403a
https://hg.mozilla.org/mozilla-central/rev/06f11a04cc8a
https://hg.mozilla.org/mozilla-central/rev/38ebc27679a4
https://hg.mozilla.org/mozilla-central/rev/27c8e30bf969
https://hg.mozilla.org/mozilla-central/rev/b31f930a6e32
https://hg.mozilla.org/mozilla-central/rev/a362ed940cd7
https://hg.mozilla.org/mozilla-central/rev/35a19eee470b
Reporter | ||
Comment 28•2 years ago
|
||
In an intermediate version of D132710 we wanted to use RefPtr and in AutoSyncLoopHolder::Run the local WP
variable would have kept alive the WorkerPrivate during the sync loop. But downgrading for now to CheckedUnsafePtr
does not give us any lifecycle guarantees here anymore, so the naming was inappropriate.
Comment 29•2 years ago
|
||
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2ae62a58f031 Restore original naming of local WP variable in AutoSyncLoopHolder::Run r=dom-worker-reviewers,smaug
Comment 30•2 years ago
|
||
bugherder |
Updated•2 years ago
|
Reporter | ||
Comment 31•2 years ago
|
||
I think that the remaining work has been/will be done by Eden.
Assignee | ||
Comment 32•1 year ago
|
||
Comment 33•1 year ago
|
||
Comment on attachment 9304871 [details]
Bug 1744025 - Print out the creation stack and the last assignment stack of CheckedUnsafePtr when it is unsafe. r=#dom-worker-reviewers
Revision D162823 was moved to bug 1789399. Setting attachment 9304871 [details] to obsolete.
Comment 34•11 months ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:edenchuang, maybe it's time to close this bug?
For more information, please visit BugBot documentation.
Assignee | ||
Comment 35•8 months ago
|
||
Remove the leave-open keyword and set it as resolved-worksforme.
We don't have any further implementation for CheckedUnsafePtr here.
Comment 36•8 months ago
|
||
(Changing to fixed since a number of patches did land here.)
Description
•