Closed Bug 1545577 Opened 5 years ago Closed 5 years ago

Use-after-free of WorkerPrivate involving ModifyBusyCountRunnable which runs after TopLevelWorkerFinishedRunnable

Categories

(Core :: DOM: Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID
Tracking Status
firefox-esr60 --- wontfix
firefox66 --- wontfix
firefox67 + wontfix
firefox68 + wontfix

People

(Reporter: mstange, Unassigned)

References

(Regression)

Details

(Keywords: csectype-uaf, regression, sec-high)

[Tracking Requested - why for this release]: Crash / potential security bug

My local build crashes when I go to https://www.google.com/maps .
This build is compiled with --enable-tasktracer, which may affect event loop ordering (though it really shouldn't). It's possible that bug is not present in the binaries we ship.

I've debugged this a bit and what happens is that a ModifyBusyCountRunnable runnable runs on the main thread whose mWorkerPrivate has already been destroyed.

The ModifyBusyCountRunnable is dispatched from this DispatchToMainThread call in WorkerControlRunnable::DispatchInternal(). Its mBehavior is ParentThreadUnchangedBusyCount.

Does that call just need to be changed to DispatchToMainThreadForMessaging? Does the same need to be done for other types of runnables?

Here's the crash stack:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x000000010d67d409 XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] std::__1::__atomic_base<PRThread const*, false>::load(this=<unavailable>, __m=memory_order_acquire) const at atomic:926 [opt]
    frame #1: 0x000000010d67d409 XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] mozilla::detail::IntrinsicMemoryOps<PRThread const*, (mozilla::MemoryOrdering)1, (mozilla::recordreplay::Behavior)1>::load(aPtr=<unavailable>) at Atomics.h:220 [opt]
    frame #2: 0x000000010d67d3f4 XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] mozilla::detail::AtomicBaseIncDec<PRThread const*, (mozilla::MemoryOrdering)1, (mozilla::recordreplay::Behavior)1>::operator PRThread const*(this=0xe5e5e5e5e5e5eb35) const at Atomics.h:376 [opt]
    frame #3: 0x000000010d67d3f4 XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::IsCorrectThread() const at ThreadBound.h:128 [opt]
    frame #4: 0x000000010d67d3ed XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::AssertIsCorrectThread() const at ThreadBound.h:135 [opt]
    frame #5: 0x000000010d67d3ed XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::Accessor<true>::Accessor(mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible> const&) at ThreadBound.h:100 [opt]
    frame #6: 0x000000010d67d3ed XUL`mozilla::dom::WorkerPrivate::GlobalScope() const [inlined] mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible>::Accessor<true>::Accessor(mozilla::ThreadBound<mozilla::dom::WorkerPrivate::WorkerThreadAccessible> const&) at ThreadBound.h:99 [opt]
    frame #7: 0x000000010d67d3ed XUL`mozilla::dom::WorkerPrivate::GlobalScope(this=0xe5e5e5e5e5e5e5e5) const at WorkerPrivate.h:308 [opt]
    frame #8: 0x000000010df852be XUL`mozilla::dom::WorkerRunnable::Run(this=0x0000000117818540) at WorkerRunnable.cpp:294 [opt]
    frame #9: 0x000000010bb8549a XUL`mozilla::ThrottledEventQueue::Inner::ExecuteRunnable(this=<unavailable>) at ThrottledEventQueue.cpp:243 [opt]
    frame #10: 0x000000010bb836cd XUL`mozilla::ThrottledEventQueue::Inner::Executor::Run(this=<unavailable>) at ThrottledEventQueue.cpp:80 [opt]
    frame #11: 0x000000010f2f423f XUL`mozilla::tasktracer::TracedRunnable::Run(this=0x00000001056c91f0) at TracedTaskCommon.cpp:101 [opt]
    frame #12: 0x000000010bb7ad09 XUL`nsThread::ProcessNextEvent(this=0x0000000104562d00, aMayWait=<unavailable>, aResult=0x00007fffffffffff) at nsThread.cpp:1180 [opt]
    frame #13: 0x000000010bb7d3a6 XUL`NS_ProcessNextEvent(aThread=<unavailable>, aMayWait=false) at nsThreadUtils.cpp:486 [opt]
    frame #14: 0x000000010c0b20d6 XUL`mozilla::ipc::MessagePump::Run(this=0x000000010456b5b0, aDelegate=0x00007fff5bc71d18) at MessagePump.cpp:88 [opt]
    frame #15: 0x000000010c070669 XUL`MessageLoop::Run() [inlined] MessageLoop::RunInternal() at message_loop.cc:315 [opt]
    frame #16: 0x000000010c07065a XUL`MessageLoop::Run() [inlined] MessageLoop::RunHandler() at message_loop.cc:308 [opt]
    frame #17: 0x000000010c07065a XUL`MessageLoop::Run(this=<unavailable>) at message_loop.cc:290 [opt]
    frame #18: 0x000000010e1d2349 XUL`nsBaseAppShell::Run(this=0x00000001045ed700) at nsBaseAppShell.cpp:137 [opt]
    frame #19: 0x000000010e23daeb XUL`nsAppShell::Run(this=0x00000001045ed700) at nsAppShell.mm:704 [opt]
    frame #20: 0x000000010f5c6e18 XUL`XRE_RunAppShell() at nsEmbedFunctions.cpp:919 [opt]
    frame #21: 0x000000010c070669 XUL`MessageLoop::Run() [inlined] MessageLoop::RunInternal() at message_loop.cc:315 [opt]
    frame #22: 0x000000010c07065a XUL`MessageLoop::Run() [inlined] MessageLoop::RunHandler() at message_loop.cc:308 [opt]
    frame #23: 0x000000010c07065a XUL`MessageLoop::Run(this=<unavailable>) at message_loop.cc:290 [opt]
    frame #24: 0x000000010f5c6bc5 XUL`XRE_InitChildProcess(aArgc=<unavailable>, aArgv=<unavailable>, aChildData=<unavailable>) at nsEmbedFunctions.cpp:757 [opt]
    frame #25: 0x0000000103f8df17 plugin-container`main [inlined] content_process_main(bootstrap=0x00000001045041c0, argc=<unavailable>, argv=0x00007fff5bc71fa0) at plugin-container.cpp:56 [opt]
    frame #26: 0x0000000103f8deeb plugin-container`main(argc=<unavailable>, argv=0x00007fff5bc71fa0) at MozillaRuntimeMain.cpp:23 [opt]
    frame #27: 0x00007fff8b30b235 libdyld.dylib`start + 1

Here's the stack from which the WorkerPrivate gets destroyed (earlier):

    frame #1: 0x0000000114b6b595 XUL`mozilla::dom::WorkerPrivate::~WorkerPrivate(this=0x000000010dfb9800) at WorkerPrivate.cpp:2178 [opt]
    frame #2: 0x0000000114b7eee4 XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] mozilla::dom::WorkerPrivate::~WorkerPrivate(this=<unavailable>) at WorkerPrivate.cpp:2167 [opt]
    frame #3: 0x0000000114b7eedc XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] mozilla::dom::WorkerPrivate::Release(this=0x000000010dfb9800) at WorkerPrivate.h:102 [opt]
    frame #4: 0x0000000114b7eece XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] mozilla::RefPtrTraits<mozilla::dom::WorkerPrivate>::Release(aPtr=0x000000010dfb9800) at RefPtr.h:46 [opt]
    frame #5: 0x0000000114b7eece XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] RefPtr<mozilla::dom::WorkerPrivate>::ConstRemovingRefPtrTraits<mozilla::dom::WorkerPrivate>::Release(aPtr=0x000000010dfb9800) at RefPtr.h:363 [opt]
    frame #6: 0x0000000114b7eece XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] RefPtr<mozilla::dom::WorkerPrivate>::assign_assuming_AddRef(this=<unavailable>) at RefPtr.h:65 [opt]
    frame #7: 0x0000000114b7eeb7 XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] RefPtr<mozilla::dom::WorkerPrivate>::operator=(this=<unavailable>) at RefPtr.h:156 [opt]
    frame #8: 0x0000000114b7eeb7 XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run() [inlined] mozilla::dom::WorkerPrivate::ClearSelfAndParentEventTargetRef(this=0x000000010dfb9800) at WorkerPrivate.h:125 [opt]
    frame #9: 0x0000000114b7ee9a XUL`mozilla::dom::(anonymous namespace)::TopLevelWorkerFinishedRunnable::Run(this=0x00000001315ad4c0) at WorkerPrivate.cpp:273 [opt]
    frame #10: 0x0000000112773c9a XUL`mozilla::ThrottledEventQueue::Inner::ExecuteRunnable(this=<unavailable>) at ThrottledEventQueue.cpp:243 [opt]
    frame #11: 0x0000000112771ecd XUL`mozilla::ThrottledEventQueue::Inner::Executor::Run(this=<unavailable>) at ThrottledEventQueue.cpp:80 [opt]
    frame #12: 0x0000000112773c9a XUL`mozilla::ThrottledEventQueue::Inner::ExecuteRunnable(this=<unavailable>) at ThrottledEventQueue.cpp:243 [opt]
    frame #13: 0x0000000112771ecd XUL`mozilla::ThrottledEventQueue::Inner::Executor::Run(this=<unavailable>) at ThrottledEventQueue.cpp:80 [opt]
    frame #14: 0x0000000112758e6f XUL`mozilla::SchedulerGroup::Runnable::Run(this=0x000000010df6c080) at SchedulerGroup.cpp:295 [opt]
    frame #15: 0x0000000115ee723f XUL`mozilla::tasktracer::TracedRunnable::Run(this=0x00000001437de060) at TracedTaskCommon.cpp:101 [opt]
    frame #16: 0x0000000112769509 XUL`nsThread::ProcessNextEvent(this=0x000000010af62b80, aMayWait=<unavailable>, aResult=0x00007fffffffffff) at nsThread.cpp:1180 [opt]
    frame #17: 0x000000011276bba6 XUL`NS_ProcessNextEvent(aThread=<unavailable>, aMayWait=false) at nsThreadUtils.cpp:486 [opt]
    frame #18: 0x0000000112ca08d6 XUL`mozilla::ipc::MessagePump::Run(this=0x000000010af6b5b0, aDelegate=0x00007fff55210de8) at MessagePump.cpp:88 [opt]
    frame #19: 0x0000000112c5ee69 XUL`MessageLoop::Run() [inlined] MessageLoop::RunInternal() at message_loop.cc:315 [opt]
    frame #20: 0x0000000112c5ee5a XUL`MessageLoop::Run() [inlined] MessageLoop::RunHandler() at message_loop.cc:308 [opt]
    frame #21: 0x0000000112c5ee5a XUL`MessageLoop::Run(this=<unavailable>) at message_loop.cc:290 [opt]
    frame #22: 0x0000000114dc5349 XUL`nsBaseAppShell::Run(this=0x000000010c1d71f0) at nsBaseAppShell.cpp:137 [opt]
    frame #23: 0x0000000114e30aeb XUL`nsAppShell::Run(this=0x000000010c1d71f0) at nsAppShell.mm:704 [opt]
    frame #24: 0x00000001161b9e18 XUL`XRE_RunAppShell() at nsEmbedFunctions.cpp:919 [opt]
    frame #25: 0x0000000112c5ee69 XUL`MessageLoop::Run() [inlined] MessageLoop::RunInternal() at message_loop.cc:315 [opt]
    frame #26: 0x0000000112c5ee5a XUL`MessageLoop::Run() [inlined] MessageLoop::RunHandler() at message_loop.cc:308 [opt]
    frame #27: 0x0000000112c5ee5a XUL`MessageLoop::Run(this=<unavailable>) at message_loop.cc:290 [opt]
    frame #28: 0x00000001161b9bc5 XUL`XRE_InitChildProcess(aArgc=<unavailable>, aArgv=<unavailable>, aChildData=<unavailable>) at nsEmbedFunctions.cpp:757 [opt]
    frame #29: 0x000000010a9eef17 plugin-container`main [inlined] content_process_main(bootstrap=0x000000010af041b0, argc=<unavailable>, argv=0x00007fff55211070) at plugin-container.cpp:56 [opt]
    frame #30: 0x000000010a9eeeeb plugin-container`main(argc=<unavailable>, argv=0x00007fff55211070) at MozillaRuntimeMain.cpp:23 [opt]
    frame #31: 0x00007fff8b30b235 libdyld.dylib`start + 1
Flags: needinfo?(bugs)
Group: core-security → dom-core-security

Tracking as this is a 67 regression and a potential security issue. Olli do you think that an upliftable fix is doable in the 67 timeframe? Thanks

This involves workers. Yaron, maybe you could take a look? Thanks.

Flags: needinfo?(ytausky)

I'll try to find time still this week to look at his.

control runnables shouldn't use DispatchToMainThreadForMessaging

Flags: needinfo?(ytausky)

So there is task tracer stuff on stack, and that is known to be broken
https://bugzilla.mozilla.org/show_bug.cgi?id=1524594#c0

mstange, you mentioned that you can reproduce with some other testcase too? Or did I misunderstand.

Flags: needinfo?(bugs) → needinfo?(mstange)

The other testcase is opening and closing the devtools. It crashes in the same stack. That's also in a build with --enable-tasktracer.

But yeah, if prioritization is broken, then that would explain it. But independently from that, I question whether we should rely on event loop ordering to prevent use-after-free bugs.

Flags: needinfo?(mstange)

(In reply to Markus Stange [:mstange] from comment #5)

But yeah, if prioritization is broken, then that would explain it. But independently from that, I question whether we should rely on event loop ordering to prevent use-after-free bugs.

I think there's consensus that we should not. However, cleaning up the workers implementation is somewhat slow going. (WorkerRefs are replacing busy-count type things and WorkerHolders and otherwise attempting to eliminate use of bare WorkerPrivate pointer references.) That said, it's likely that we would intentionally trigger a fatal crash in cases like this where fundamental invariants are violated.

As I understand the situation here, the TopLevelWorkerFinishedRunnable is getting dispatched to the main thread via DispatchToMainThreadForMessaging[1] while the ModifyBusyCountRunnable is a WorkerControlRunnable which uses the normal DispatchToMainThread[2]. The intent of the choice of thread for the TopLevelWorkerFinishedRunnable is that the normal mMainThreadEventTarget will be drained in its entirety in preference to mMainThreadEventTargetForMessaging.

And that's indeed what the logic at https://searchfox.org/mozilla-central/rev/444ee13e14fe30451651c0f62b3979c76766ada4/xpcom/threads/PrioritizedEventQueue.cpp#108 does. (We'd have a problem if we are using the 'high' priority queue which has support for letting the other queues go every other time.) But it seems task tracer fails to implement nsIRunnablePriority so when it wraps a runnable it breaks the logic in ThreadEventQueue::PutEventInternal[3] that depends on priorities. I assume this is what Olli meant at https://bugzilla.mozilla.org/show_bug.cgi?id=1524594#c0. (I assume this also means that task tracer traces should only be used for causality analyses.)

I'm marking this invalid as this appears indeed to solely be an issue induced by task-tracer and I don't believe that we want busy-count-modifications to take the slow path because of their potential to stack up.

1: https://searchfox.org/mozilla-central/rev/444ee13e14fe30451651c0f62b3979c76766ada4/dom/workers/WorkerPrivate.cpp#3187
2: https://searchfox.org/mozilla-central/rev/444ee13e14fe30451651c0f62b3979c76766ada4/dom/workers/WorkerRunnable.cpp#520
3: https://searchfox.org/mozilla-central/rev/444ee13e14fe30451651c0f62b3979c76766ada4/xpcom/threads/ThreadEventQueue.cpp#83

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID
Group: dom-core-security
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.