Crash in [@ mozilla::ipc::IProtocol::ActorDealloc] from PRemoteWorkerControllerParent.cpp:517:52', void, const mozilla::dom::ServiceWorkerOpResult&>::_Delete_this
Categories
(Core :: DOM: Service Workers, defect, P1)
Tracking
()
People
(Reporter: aryx, Assigned: nika)
References
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [post-critsmash-triage][sec-survey][adv-main86+r])
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
tjr
:
approval-mozilla-beta+
tjr
:
sec-approval+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/1bc352e0-d3dd-4e1f-b1c8-156f60201219
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::IProtocol::ActorDealloc ipc/glue/ProtocolUtils.h:331
1 xul.dll mozilla::ipc::ActorLifecycleProxy::~ActorLifecycleProxy ipc/glue/ProtocolUtils.cpp:276
2 xul.dll std::_Func_impl_no_alloc<`lambda at /builds/worker/workspace/obj-build/ipc/ipdl/PRemoteWorkerControllerParent.cpp:517:52', void, const mozilla::dom::ServiceWorkerOpResult&>::_Delete_this vs2017_15.8.4/VC/include/functional:1240
3 xul.dll mozilla::MozPromise<mozilla::dom::ServiceWorkerOpResult, nsresult, 1>::ThenValue<`lambda at /builds/worker/checkouts/gecko/dom/serviceworkers/ServiceWorkerOp.cpp:412:7'>::DoResolveOrRejectInternal xpcom/threads/MozPromise.h:845
4 xul.dll mozilla::MozPromise<CopyableTArray<bool>, nsresult, 1>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:410
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1197
6 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:302
7 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
9 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:442
Comment 1•4 years ago
|
||
Another actor lifetime issue with an actor on a PBackground thread. Any ideas, Nika?
Comment 2•4 years ago
|
||
This seems like a ServiceWorker shutdown bug. We're in xpcom-shutdown-threads and this MozPromise should have been disconnected/discarded earlier in the shutdown process, or we're looking at a request issued after shutdown began that should have been forbidden. Obviously, if IPC magic can magically save us, that'd be great, of course.
Comment 3•4 years ago
|
||
I thought the idea of the lifecycle proxy was that it served as a weak reference, but it doesn't seem to be saving us here. Maybe there's a threading issue or something?
Comment 4•4 years ago
|
||
:tsmith has come to the rescue and independently discovered this in bug 1684331 and provided a pernosco trace there at https://bugzilla.mozilla.org/show_bug.cgi?id=1684331#c1. I'll cc you both because I have no clue who has sec-bits and who doesn't.
Assignee | ||
Comment 6•4 years ago
|
||
I took a quick look at the pernosco trace which :asuth linked from the other bug. It appears that this crash is being caused by the BackgroundChild
's thread going away before an actor managed on that thread has been ActorDealloc
-ed. This is probably happening due to the changes in bug 1676223, where async returns were switched to using an ActorLifecycleProxy
, as this means that ActorDealloc
will be delayed until the resolvers have occurred.
Even without that change, this was already thread-unsafe, as a reference to a non-threadsafe WeakPtr
was being held across threads, but it seems to have been made worse.
It seems like callers are assuming that the std::function
passed as an async returns value to IPDL functions is threadsafe, which it is not. I'm not sure what we can easily do to fix that incorrect assumption though.
If we want to be maximally defensive in the IPC code, we could theoretically make resolvers hold a threadsafe weak pointer (which will probably either require some MFBT changes to add destruction customization to mozilla::detail::RefCounted
, or adding support for this feature to XPCOM refcounting) to the ActorLifecycleProxy
, (which itself sometimes acts like a weak pointer to the actor), and do a weak pointer upgrade + thread safety check + ActorLifecycleProxy access in the resolver code, making sure we're on the correct thread, and that no object has died while the resolver was in-use.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 7•4 years ago
|
||
Nika, it's not clear to me why you are referring to questions of thread safety in comment 6. At least in the crash from Bug 1684331, the destruction of the ChildImpl happens on the same thread that later tries to access it. What is the reason for calling BackgroundChild::CloseForCurrentThread
when there are still outstanding events in https://searchfox.org/mozilla-release/rev/4585f47ffd5244d276b33420fc30cd6647a0a37b/xpcom/threads/nsThread.cpp#444-463?
Assignee | ||
Comment 8•4 years ago
|
||
(In reply to Simon Giesecke [:sg] [he/him] from comment #7)
Nika, it's not clear to me why you are referring to questions of thread safety in comment 6. At least in the crash from Bug 1684331, the destruction of the ChildImpl happens on the same thread that later tries to access it.
I can't remember specifically why I had brought up thread safety here. I think I was probably primed to be thinking about it due to the other thread safety issues I had been running into around ActorLifecycleProxy
being held alive after our manager was deallocated. Given that the threadsafety issues I was worrying about aren't actually an issue, I think it should be possible to fix this bug by making the Background{Parent,Child} actors refcounted
, which should be a small patch. I'll quickly post it up.
What is the reason for calling
BackgroundChild::CloseForCurrentThread
when there are still outstanding events in https://searchfox.org/mozilla-release/rev/4585f47ffd5244d276b33420fc30cd6647a0a37b/xpcom/threads/nsThread.cpp#444-463?
I'm guessing this is done so that async dispatched tasks queued from received messages have a chance to be flushed after the actor has been disconnected from the remote.
Assignee | ||
Comment 9•4 years ago
|
||
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
Comment on attachment 9198193 [details]
Bug 1683490 - Make PBackground be a refcounted actor,
Security Approval Request
- How easily could an exploit be constructed based on the patch?: The fix is fairly distant from the specific instance which is causing issues. While it may be possible to figure out that there are issues with UAF on the
PBackground
actor, finding a specific exploit would require a large amount of specific knowledge and investigation to find specific actors and then figure out how to exploit them. - Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: Fx84+
- If not all supported branches, which bug introduced the flaw?: Bug 1676223
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: I think it is likely that this patch will apply cleanly to all supported branches as-is.
- How likely is this patch to cause regressions; how much testing does it need?: I think this patch is unlikely to cause regressions, although it has not been run through try due to its status as a security bug.
Updated•4 years ago
|
Comment 11•4 years ago
|
||
Comment on attachment 9198193 [details]
Bug 1683490 - Make PBackground be a refcounted actor,
Approved to land and uplift to 86
![]() |
Reporter | |
Comment 12•4 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/c5dd5d053cb14df3d0d95c758aa1f77bbe85ef07
https://hg.mozilla.org/mozilla-central/rev/c5dd5d053cb1
Comment 13•4 years ago
|
||
Updated•4 years ago
|
Comment 14•4 years ago
|
||
As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.
Please visit this google form to reply.
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Should we dupe & hide bug 1708359, too?
Updated•4 years ago
|
Updated•4 years ago
|
Updated•3 years ago
|
Description
•