Closed Bug 1798773 Opened 2 years ago Closed 2 years ago

Assertion failure: data (Cycle collected object used on a thread without a cycle collector.), at /xpcom/base/nsCycleCollector.cpp:3799

Categories

(Core :: DOM: File, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
109 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox107 --- disabled
firefox108 --- disabled
firefox109 --- fixed

People

(Reporter: jkratzer, Assigned: janv)

References

(Blocks 2 open bugs)

Details

(Keywords: sec-moderate, testcase, Whiteboard: [post-critsmash-triage])

Attachments

(4 files, 3 obsolete files)

Testcase found while fuzzing mozilla-central rev 2db9822e6dd3 (built with: --enable-address-sanitizer --enable-fuzzing).

Note, the testcase is not very reliable and may require several attempts before triggering the issue.

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 2db9822e6dd3 --asan --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10 --relaunch 1
Assertion failure: data (Cycle collected object used on a thread without a cycle collector.), at /xpcom/base/nsCycleCollector.cpp:3799

    =================================================================
    ==169089==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7fce45b7997e bp 0x7fce2f6baed0 sp 0x7fce2f6bae00 T20)
    ==169089==The signal is caused by a WRITE memory access.
    ==169089==Hint: address points to the zero page.
        #0 0x7fce45b7997e in NS_CycleCollectorSuspect3 /xpcom/base/nsCycleCollector.cpp:3797:3
        #1 0x7fce4bab5704 in incr<&NS_CycleCollectorSuspect3> /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:248:7
        #2 0x7fce4bab5704 in incr<&NS_CycleCollectorSuspect3> /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:234:12
        #3 0x7fce4bab5704 in mozilla::DOMEventTargetHelper::AddRef() /dom/events/DOMEventTargetHelper.cpp:81:1
        #4 0x7fce4e3fbf4f in nsCOMPtr /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:525:7
        #5 0x7fce4e3fbf4f in ScriptSettingsStackEntry /dom/script/ScriptSettings.cpp:138:7
        #6 0x7fce4e3fbf4f in mozilla::dom::AutoJSAPI::AutoJSAPI(nsIGlobalObject*, bool, mozilla::dom::ScriptSettingsStackEntry::Type) /dom/script/ScriptSettings.cpp:391:7
        #7 0x7fce4e3fbbee in mozilla::dom::AutoEntryScript::AutoEntryScript(nsIGlobalObject*, char const*, bool) /dom/script/AutoEntryScript.cpp:66:7
        #8 0x7fce45f5929a in MaybeSomething<mozilla::ErrorResult> /builds/worker/workspace/obj-build/dist/include/mozilla/dom/Promise.h:407:21
        #9 0x7fce45f5929a in mozilla::dom::Promise::MaybeReject(mozilla::ErrorResult&&) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/Promise.h:106:5
        #10 0x7fce4bd523c8 in MaybeRejectWithUnknownError /builds/worker/workspace/obj-build/dist/include/mozilla/dom/DOMExceptionNames.h:45:1
        #11 0x7fce4bd523c8 in MaybeRejectWithUnknownError<23> /builds/worker/workspace/obj-build/dist/include/mozilla/dom/DOMExceptionNames.h:45:1
        #12 0x7fce4bd523c8 in operator()<nsresult> /dom/fs/child/FileSystemRequestHandler.cpp:341:18
        #13 0x7fce4bd523c8 in std::_Function_handler<void (nsresult), mozilla::dom::fs::FileSystemRequestHandler::GetRootHandle(RefPtr<mozilla::dom::FileSystemManager>, RefPtr<mozilla::dom::Promise>, mozilla::ErrorResult&)::$_3>::_M_invoke(std::_Any_data const&, nsresult&&) /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2
        #14 0x7fce4bd39f60 in std::function<void (nsresult)>::operator()(nsresult) const /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
        #15 0x7fce4bd3c90e in operator() /dom/fs/api/FileSystemManager.cpp:111:15
        #16 0x7fce4bd3c90e in InvokeMethod<(lambda at /dom/fs/api/FileSystemManager.cpp:102:11), void ((lambda at /dom/fs/api/FileSystemManager.cpp:102:11)::*)(const mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue &) const, const mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue &> /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:630:12
        #17 0x7fce4bd3c90e in InvokeCallbackMethod<false, (lambda at /dom/fs/api/FileSystemManager.cpp:102:11), void ((lambda at /dom/fs/api/FileSystemManager.cpp:102:11)::*)(const mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue &) const, const mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue &, RefPtr<mozilla::MozPromise<bool, nsresult, false>::Private> > /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:661:5
        #18 0x7fce4bd3c90e in mozilla::MozPromise<bool, nsresult, false>::ThenValue<mozilla::dom::FileSystemManager::BeginRequest(std::function<void (RefPtr<mozilla::dom::FileSystemManagerChild> const&)>&&, std::function<void (nsresult)>&&)::$_2>::DoResolveOrRejectInternal(mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue&) /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:914:7
        #19 0x7fce46cbd60e in mozilla::MozPromise<bool, nsresult, false>::ThenValueBase::ResolveOrRejectRunnable::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:487:21
        #20 0x7fce4dfc6231 in mozilla::dom::(anonymous namespace)::ExternalRunnableWrapper::Cancel() /dom/workers/WorkerPrivate.cpp:221:13
        #21 0x7fce4dfb4d6f in mozilla::dom::WorkerRunnable::Run() /dom/workers/WorkerRunnable.cpp:247:5
        #22 0x7fce45d5cb8e in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1198:16
        #23 0x7fce45d553f7 in NS_ProcessPendingEvents(nsIThread*, unsigned int) /xpcom/threads/nsThreadUtils.cpp:430:19
        #24 0x7fce4dfa206a in mozilla::dom::WorkerPrivate::ClearMainEventQueue(mozilla::dom::WorkerPrivate::WorkerRanOrNot) /dom/workers/WorkerPrivate.cpp:3831:5
        #25 0x7fce4df8ef54 in mozilla::dom::WorkerPrivate::ScheduleDeletion(mozilla::dom::WorkerPrivate::WorkerRanOrNot) /dom/workers/WorkerPrivate.cpp:3650:3
        #26 0x7fce4df741bc in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2116:19
        #27 0x7fce45d5cb8e in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1198:16
        #28 0x7fce45d66e14 in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:465:10
        #29 0x7fce4751e895 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:300:20
        #30 0x7fce47399ed1 in RunInternal /ipc/chromium/src/base/message_loop.cc:381:10
        #31 0x7fce47399ed1 in RunHandler /ipc/chromium/src/base/message_loop.cc:374:3
        #32 0x7fce47399ed1 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:356:3
        #33 0x7fce45d53ce8 in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:383:10
        #34 0x7fce6dcd8b7e in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #35 0x7fce6e49db42 in start_thread nptl/./nptl/pthread_create.c:442:8
        #36 0x7fce6e52f9ff  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV /xpcom/base/nsCycleCollector.cpp:3797:3 in NS_CycleCollectorSuspect3
    Thread T20 created by T0 (Isolated Web Co) here:
        #0 0x5607559f049c in __interceptor_pthread_create /builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:208:3
        #1 0x7fce6dcc8c2c in _PR_CreateThread /nsprpub/pr/src/pthreads/ptthread.c:458:14
        #2 0x7fce6dcb9fce in PR_CreateThread /nsprpub/pr/src/pthreads/ptthread.c:533:12
        #3 0x7fce45d56c55 in nsThread::Init(nsTSubstring<char> const&) /xpcom/threads/nsThread.cpp:617:18
        #4 0x7fce4dfc3aaa in mozilla::dom::WorkerThread::Create(mozilla::dom::WorkerThreadFriendKey const&) /dom/workers/WorkerThread.cpp:102:7
        #5 0x7fce4df4df65 in mozilla::dom::workerinternals::RuntimeService::ScheduleWorker(mozilla::dom::WorkerPrivate&) /dom/workers/RuntimeService.cpp:1323:37
        #6 0x7fce4df4cfeb in mozilla::dom::workerinternals::RuntimeService::RegisterWorker(mozilla::dom::WorkerPrivate&) /dom/workers/RuntimeService.cpp:1205:19
        #7 0x7fce4df977a7 in mozilla::dom::WorkerPrivate::Constructor(JSContext*, nsTSubstring<char16_t> const&, bool, mozilla::dom::WorkerKind, nsTSubstring<char16_t> const&, nsTSubstring<char> const&, mozilla::dom::WorkerLoadInfo*, mozilla::ErrorResult&, nsTString<char16_t>) /dom/workers/WorkerPrivate.cpp:2588:24
        #8 0x7fce4df5df65 in mozilla::dom::Worker::Constructor(mozilla::dom::GlobalObject const&, nsTSubstring<char16_t> const&, mozilla::dom::WorkerOptions const&, mozilla::ErrorResult&) /dom/workers/Worker.cpp:43:41
        #9 0x7fce4a8d41d4 in mozilla::dom::Worker_Binding::_constructor(JSContext*, unsigned int, JS::Value*) /builds/worker/workspace/obj-build/dom/bindings/WorkerBinding.cpp:1107:52
        #10 0x7fce559891c7 in CallJSNative /js/src/vm/Interpreter.cpp:459:13
        #11 0x7fce559891c7 in CallJSNativeConstructor /js/src/vm/Interpreter.cpp:475:8
        #12 0x7fce559891c7 in InternalConstruct(JSContext*, js::AnyConstructArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:694:10
        #13 0x7fce559751ac in Interpret(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:3360:16
        #14 0x7fce5595a77e in js::RunScript(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:431:13
        #15 0x7fce559869f5 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:579:13
        #16 0x7fce5598849e in InternalCall /js/src/vm/Interpreter.cpp:614:10
        #17 0x7fce5598849e in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:646:8
        #18 0x7fce53f852c5 in JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /js/src/vm/CallAndConstruct.cpp:117:10
        #19 0x7fce4ac930d9 in mozilla::dom::EventListener::HandleEvent(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::Event&, mozilla::ErrorResult&) /builds/worker/workspace/obj-build/dom/bindings/EventListenerBinding.cpp:62:8
        #20 0x7fce4bb36ef4 in void mozilla::dom::EventListener::HandleEvent<mozilla::dom::EventTarget*>(mozilla::dom::EventTarget* const&, mozilla::dom::Event&, mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObject::ExceptionHandling, JS::Realm*) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/EventListenerBinding.h:65:12
        #21 0x7fce4bb369b0 in mozilla::EventListenerManager::HandleEventSubType(mozilla::EventListenerManager::Listener*, mozilla::dom::Event*, mozilla::dom::EventTarget*) /dom/events/EventListenerManager.cpp:1310:43
        #22 0x7fce4bb37f6b in mozilla::EventListenerManager::HandleEventInternal(nsPresContext*, mozilla::WidgetEvent*, mozilla::dom::Event**, mozilla::dom::EventTarget*, nsEventStatus*, bool) /dom/events/EventListenerManager.cpp:1506:17
        #23 0x7fce4bb260ee in mozilla::EventTargetChainItem::HandleEvent(mozilla::EventChainPostVisitor&, mozilla::ELMCreationDetector&) /dom/events/EventDispatcher.cpp:348:17
        #24 0x7fce4bb24951 in mozilla::EventTargetChainItem::HandleEventTargetChain(nsTArray<mozilla::EventTargetChainItem>&, mozilla::EventChainPostVisitor&, mozilla::EventDispatchingCallback*, mozilla::ELMCreationDetector&) /dom/events/EventDispatcher.cpp:550:16
        #25 0x7fce4bb28b35 in mozilla::EventDispatcher::Dispatch(nsISupports*, nsPresContext*, mozilla::WidgetEvent*, mozilla::dom::Event*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*) /dom/events/EventDispatcher.cpp:1119:11
        #26 0x7fce4bb2e4b1 in mozilla::EventDispatcher::DispatchDOMEvent(nsISupports*, mozilla::WidgetEvent*, mozilla::dom::Event*, nsPresContext*, nsEventStatus*) /dom/events/EventDispatcher.cpp
        #27 0x7fce493c37f4 in nsINode::DispatchEvent(mozilla::dom::Event&, mozilla::dom::CallerType, mozilla::ErrorResult&) /dom/base/nsINode.cpp:1373:17
        #28 0x7fce48cb670f in nsContentUtils::DispatchEvent(mozilla::dom::Document*, nsISupports*, nsTSubstring<char16_t> const&, mozilla::CanBubble, mozilla::Cancelable, mozilla::Composed, mozilla::Trusted, bool*, mozilla::ChromeOnlyDispatch) /dom/base/nsContentUtils.cpp:4500:28
        #29 0x7fce48cb63d6 in nsContentUtils::DispatchTrustedEvent(mozilla::dom::Document*, nsISupports*, nsTSubstring<char16_t> const&, mozilla::CanBubble, mozilla::Cancelable, mozilla::Composed, bool*) /dom/base/nsContentUtils.cpp:4470:10
        #30 0x7fce4900ce33 in mozilla::dom::Document::DispatchContentLoadedEvents() /dom/base/Document.cpp:7843:3
        #31 0x7fce4910833d in applyImpl<mozilla::dom::Document, void (mozilla::dom::Document::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1147:12
        #32 0x7fce4910833d in apply<mozilla::dom::Document, void (mozilla::dom::Document::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1153:12
        #33 0x7fce4910833d in mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1200:13
        #34 0x7fce45d1ef5f in mozilla::SchedulerGroup::Runnable::Run() /xpcom/threads/SchedulerGroup.cpp:140:20
        #35 0x7fce45d32b62 in mozilla::RunnableTask::Run() /xpcom/threads/TaskController.cpp:538:16
        #36 0x7fce45d29ac7 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:851:26
        #37 0x7fce45d26d58 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:683:15
        #38 0x7fce45d27480 in mozilla::TaskController::ProcessPendingMTTask(bool) /xpcom/threads/TaskController.cpp:461:36
        #39 0x7fce45d39111 in operator() /xpcom/threads/TaskController.cpp:187:37
        #40 0x7fce45d39111 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_2>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
        #41 0x7fce45d5c368 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1204:16
        #42 0x7fce45d66e14 in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:465:10
        #43 0x7fce4751d1ef in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:85:21
        #44 0x7fce47399ed1 in RunInternal /ipc/chromium/src/base/message_loop.cc:381:10
        #45 0x7fce47399ed1 in RunHandler /ipc/chromium/src/base/message_loop.cc:374:3
        #46 0x7fce47399ed1 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:356:3
        #47 0x7fce4e89e1d7 in nsBaseAppShell::Run() /widget/nsBaseAppShell.cpp:150:27
        #48 0x7fce53aaec37 in XRE_RunAppShell() /toolkit/xre/nsEmbedFunctions.cpp:884:20
        #49 0x7fce47399ed1 in RunInternal /ipc/chromium/src/base/message_loop.cc:381:10
        #50 0x7fce47399ed1 in RunHandler /ipc/chromium/src/base/message_loop.cc:374:3
        #51 0x7fce47399ed1 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:356:3
        #52 0x7fce53aadd23 in XRE_InitChildProcess(int, char**, XREChildData const*) /toolkit/xre/nsEmbedFunctions.cpp:743:34
        #53 0x560755a446b5 in content_process_main(mozilla::Bootstrap*, int, char**) /browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
        #54 0x560755a44b07 in main /browser/app/nsBrowserApp.cpp:357:18
        #55 0x7fce6e432d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    
    ==169089==ABORTING
Attached file Testcase (obsolete) —
Severity: -- → S3
Priority: -- → P2

So here we are actually on a worker thread which has a cycle collector, but at the point we are in here it has been already shut down.

My assumption is that during GC/CC some other event (here for a promise) is scheduled but executed only after shutting down the cycle collector.

This looks somehow related to D150942 (assuming that ensuring the GC/CC to run before exiting our inner loop would have helped to execute that runnable, too).

Edit: That patch alone would not be enough, as it triggers mayNeedFinalGCCC only when we have a WorkerRef active. I assume we should just always GC/CC when we are supposed to shutdown and our event queue is empty (so removing HasActiveWorkerRefs() from that patch should do the trick).

There might be an OPFS related piece in it here that would avoid this situation, but IMHO the worker should not be tripped by such a usage.

Flags: needinfo?(bugmail)

Hi Jason, would it be possible to get a pernosco trace here? Thank you!

Flags: needinfo?(jkratzer)

This looks similar to bug 1792245. I don't know if sync XHR was involved in that bug as well.
Sync XHR usually triggers nasty edge cases.

It would be interesting to know if mShutdown is true or false at this line

(In reply to Jens Stutte [:jstutte] from comment #2)

My assumption is that during GC/CC some other event (here for a promise) is scheduled but executed only after shutting down the cycle collector.

Talked with Olli, it might just turn out that we keep mCreateFileSystemManagerChildPromiseRequestHolder longer than expected.

There might be an OPFS related piece in it here that would avoid this situation, but IMHO the worker should not be tripped by such a usage.

(In reply to Jan Varga [:janv] from comment #5)

It would be interesting to know if mShutdown is true or false at this line

Yes, and even if it was not in shutdown there, if we ever called FileSystemManager::Shutdown actually.

Flags: needinfo?(bugmail)

Yeah, I suspect that there's a bug in worker shutdown code and FileSystemManager::Shutdown is not called in some situations.

I wondered on Matrix if we end up invalidating WorkerNavigator and after that still manage to access StorageManager from it.

Attached file testcase.html
Attachment #9301626 - Attachment is obsolete: true

The test case should be the same. I just made it a single file.

(In reply to Tyson Smith [:tsmith] from comment #13)

The test case should be the same. I just made it a single file.

It is indeed, sorry for the noise.

I looked a bit into the pernosco session and I wondered if in general we should try to handle a MessageEventRunnable when a worker is already set into Canceling state. We currently check this only if ParentThreadUnchangedBusyCount and that is not the case here as we arrive on the target worker thread.
So IIUC we are handling the onmessage even though we are Cancelingand that makes all the JS inside the onmessage block run as one microtask inside the same single runnable, with all that can happen from that.

In a nutshell: after we do "postMessage to worker" (happening before we cancel) there seems to be a general race possible between "cancel the worker" and "handle the onmessage on the worker". And when the Canceling wins, the onmessage handler will do things that are not wanted/expected anymore.

I did some try pushes like this, and though there are quite some (known) intermittents I was not able to pinpoint one where the new canceling was actually triggered.

Unfortunately I was not able to trigger the testcase locally. Tyson, would you mind doing a test with the builds from this try?

In case it helps I am far from sure if

  1. we should actually do what the try push proposes and if Cancel() the MessageEventRunnable is the right thing to do, though it might seem wanted that we avoid any worker's Javascript execution once canceled
  2. if there is something bogus going on with the FileSystemManager handling of promises itself that might bite us in other occasions
Flags: needinfo?(twsmith)

(In reply to Jan Varga [:janv] from comment #7)

Yeah, I suspect that there's a bug in worker shutdown code and FileSystemManager::Shutdown is not called in some situations.

IIUC from the pernosco session, the order of things makes it happen that we do not yet have a StorageManager and thus a FileSystemManager we could shut down while WorkerNavigator::Invalidate() runs, but then we want to create one.

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #8)

I wondered on Matrix if we end up invalidating WorkerNavigator and after that still manage to access StorageManager from it.

As above, I wonder if we should check for Canceling inside WorkerNavigator::Storage before creating a new one. And maybe even each WEBIDL implementing function of the StorageManager should check for the worker state, too?

Flags: needinfo?(smaug)
Flags: needinfo?(jvarga)

If a worker is canceled right after issuing a postMessage there is a possible race between the onmessage handling and the NotifyRunnable on the worker thread. If the NotifyRunnable wins, the execution of the onmessage block is pointless, unexpected and in some cases even dangerous as it might initialize new JS objects that are normally uninitialized when the worker is notified to cancel. If there are additional dispatches involved (like in this bug), this initialization might be executed even after the worker has completely torn down the cycle collector.

To follow the specs, WorkerNavigator should keep the storagemanager alive as long as JS can access it. And if it is accessed first time very late, perhaps Shutdown() should be called immediately. That hopefully makes the manager effectively no-op (at least it should make it such).

Flags: needinfo?(smaug)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #18)

To follow the specs, WorkerNavigator should keep the storagemanager alive as long as JS can access it. And if it is accessed first time very late, perhaps Shutdown() should be called immediately. That hopefully makes the manager effectively no-op (at least it should make it such).

But the StorageManager is not alive when the worker is asked to cancel, it gets initialized only during the (late) onmessage handling, so WorkerNavigator cannot keep it alive? We could have a check inside WorkerNavigator::Storage if the worker is canceling, not sure what the callers will think of a nullptr there, though.

Assuming that we see the "terminate" case of the spec here I read there:

  1. If there are any tasks queued in the WorkerGlobalScope object's relevant agent's event loop's task queues, discard them without processing them.
  2. Abort the script currently running in the worker.

my interpretation would be, that the entire onmessage should not be run anymore.

There might be a question here, if the order between

  worker.postMessage([], [])
  setTimeout(() => {window.location.reload(true)})

should be always respected in executing the events on the worker thread, but IIUC cancel() can be issued anytime asynchronously, in general.

Can't we get the state to Canceling when we're handling that sync XHR? And JS can execute after sync XHR call.

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #20)

Can't we get the state to Canceling when we're handling that sync XHR? And JS can execute after sync XHR call.

Yeah, the proposed patch might not be enough for all edge cases.

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #20)

Can't we get the state to Canceling when we're handling that sync XHR? And JS can execute after sync XHR call.

So looking a bit at sync loops, it seems they already have a notion of when they should not run anymore but IIUC this seems not to have consequences on WorkerPrivate::RunCurrentSyncLoop but only on their start ?

Flags: needinfo?(bugmail)
Attached file try_debug_99c33fac.txt

Here are the results from running test case with the try builds.

Flags: needinfo?(twsmith)

I believe this is a variation of bug 1750870 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1750870#c17 onwards).

The tl;dr of that is: The worker cancellation wants to throw an uncatachable error that unwinds the entire content stack, but we don't have a mechanism that does that. In that bug I have an outstanding action item to do a lot of work related to our WebIDL bindings to support this, I believe.

In terms of FileSystemManager.h and its mCreateFileSystemManagerChildPromiseRequestHolder (mentioned in comment 6) that in manually disconnects in its Shutdown method it's worth noting that DOMMozPromiseRequestHolder generally exists to automatically do this by masquerading as a DOMEventTargetHelper Arguably we might want to augment DOMMozPromiseRequestHolder to have a defense-in-depth sort of check of nsIGlobalObject::IsDying at its point of creation so its correctness does not depend on the DOMEventTargetHelper falling edge but also handles the global already being dying.

Group: core-security
Depends on: 1750870
Flags: needinfo?(bugmail)

Re: the patch, I don't think https://phabricator.services.mozilla.com/D161987 will meaningfully help with this situation we're seeing in this bug. The call to DispatchDOMEvent that we skip making will already be precluded by existing checks against nsIGlobalObject by CallbackObject.h.

Group: core-security → dom-core-security

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #25)

I believe this is a variation of bug 1750870 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1750870#c17 onwards).

The tl;dr of that is: The worker cancellation wants to throw an uncatachable error that unwinds the entire content stack, but we don't have a mechanism that does that. In that bug I have an outstanding action item to do a lot of work related to our WebIDL bindings to support this, I believe.

So Olli was right in saying that there is more to this, but such a huge amount of things comes a bit unexpected.

In terms of FileSystemManager.h and its mCreateFileSystemManagerChildPromiseRequestHolder (mentioned in comment 6) that in manually disconnects in its Shutdown method it's worth noting that DOMMozPromiseRequestHolder generally exists to automatically do this by masquerading as a DOMEventTargetHelper Arguably we might want to augment DOMMozPromiseRequestHolder to have a defense-in-depth sort of check of nsIGlobalObject::IsDying at its point of creation so its correctness does not depend on the DOMEventTargetHelper falling edge but also handles the global already being dying.

Is this a doable, independent smaller thing that would help us in getting OPFS out of the door?

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #26)

Re: the patch, I don't think https://phabricator.services.mozilla.com/D161987 will meaningfully help with this situation we're seeing in this bug. The call to DispatchDOMEvent that we skip making will already be precluded by existing checks against nsIGlobalObject by CallbackObject.h.

At least in the original pernosco session I examined, the DispatchDOMEvent is executed entirely and synchronously on the same stack we received the MessageEventRunnable on, so I am not sure I am following here, or does this refer to a future situation after we addressed bug 1750870 comment 17 onwards?

My understanding of the worker shutdown is still partial, but I have the gut feeling that the NotifyInternal(Killing); should not happen when there are runnables pending (those runnables will/should most probably just handle the various rejections caused by Canceling), as it will precede any pending runnables, leaving them around too long in the thread queue.

I am not saying that I believe, this would be enough to help here, but maybe it is worth a try.

(In reply to Tyson Smith [:tsmith] from comment #24)

Created attachment 9303333 [details]
try_debug_99c33fac.txt

Here are the results from running test case with the try builds.

Thanks a ton!

We assert !globalScopeAlive here, which just confirms that we keep references to the global (probably as part of some runnable in the queue?).

In any case IIUC I see a direct path from:

WorkerPrivate::NotifyInternal(Canceling)
-> WorkerGlobalScope::NoteTerminating()
-> StorageManager::Shutdown()

that will dispatch runnables while executing WorkerPrivate::ProcessAllControlRunnablesLocked() such that we would find normalRunnablesPending true when we do NotifyInteral(Killing);.

Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Attachment #9303269 - Attachment is obsolete: true

Jan is working on an OPFS specific fix for this crash.

Assignee: jstutte → jvarga
Flags: needinfo?(jvarga)
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Attachment #9303512 - Attachment is obsolete: true
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: