Hit MOZ_CRASH(Promise not thread-safe) at /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:41
Categories
(Core :: DOM: Workers, defect, P2)
Tracking
()
People
(Reporter: tsmith, Assigned: asuth)
References
(Blocks 1 open bug)
Details
(6 keywords, Whiteboard: [bugmon:bisected,confirmed][adv-main85+r][adv-esr78.7+r])
Attachments
(4 files)
I'm not sure if the belongs in this competent or a worker competent.
Hit MOZ_CRASH(Promise not thread-safe) at /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:41
#0 0x7f42e9f6a48a in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:254:3
#1 0x7f42e9f6a48a in nsAutoOwningThread::AssertCurrentThreadOwnsMe(char const*) const src/xpcom/base/nsISupportsImpl.cpp:41:5
#2 0x7f42e9f50e58 in AssertOwnership<24> src/xpcom/base/nsISupportsImpl.h:60:5
#3 0x7f42e9f50e58 in mozilla::dom::Promise::Release() /builds/worker/workspace/obj-build/dist/include/mozilla/dom/Promise.h:44:3
#4 0x7f42ebb3a5bd in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:50:40
#5 0x7f42ebb3a5bd in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:381:36
#6 0x7f42ebb3a5bd in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:81:7
#7 0x7f42ebb3a5bd in mozilla::dom::BodyConsumer::~BodyConsumer() src/dom/base/BodyConsumer.cpp:373:29
#8 0x7f42ebb3ea9b in mozilla::dom::BodyConsumer::Release() src/dom/base/BodyConsumer.cpp:812:1
#9 0x7f42ebb52f2b in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:50:40
#10 0x7f42ebb52f2b in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:381:36
#11 0x7f42ebb52f2b in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:81:7
#12 0x7f42ebb52f2b in ~ src/dom/base/BodyConsumer.cpp:775:9
#13 0x7f42ebb52f2b in ~RunnableFunction /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:568:7
#14 0x7f42ebb52f2b in mozilla::detail::RunnableFunction<mozilla::dom::BodyConsumer::ShutDownMainThreadConsuming()::$_3>::~RunnableFunction() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:568:7
#15 0x7f42ea0376f7 in mozilla::Runnable::Release() src/xpcom/threads/nsThreadUtils.cpp:68:1
#16 0x7f42ea058392 in assign_assuming_AddRef /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:428:7
#17 0x7f42ea058392 in operator= /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:697:5
#18 0x7f42ea058392 in mozilla::ThrottledEventQueue::Inner::ExecuteRunnable() src/xpcom/threads/ThrottledEventQueue.cpp:257:11
#19 0x7f42ea0538c1 in mozilla::ThrottledEventQueue::Inner::Executor::Run() src/xpcom/threads/ThrottledEventQueue.cpp:81:15
#20 0x7f42ea02e5ef in mozilla::RunnableTask::Run() src/xpcom/threads/TaskController.cpp:450:16
#21 0x7f42ea02cc5a in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:720:26
#22 0x7f42ea02bd04 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:579:15
#23 0x7f42ea02beb7 in mozilla::TaskController::ProcessPendingMTTask(bool) src/xpcom/threads/TaskController.cpp:373:36
#24 0x7f42ea031e46 in operator() src/xpcom/threads/TaskController.cpp:120:37
#25 0x7f42ea031e46 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_3>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:577:5
#26 0x7f42ea0433c7 in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1197:14
#27 0x7f42ea04910a in NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:513:10
#28 0x7f42ea9346c6 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:87:21
#29 0x7f42ea8a65b3 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:334:10
#30 0x7f42ea8a64cd in RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3
#31 0x7f42ea8a64cd in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3
#32 0x7f42ee59bde8 in nsBaseAppShell::Run() src/widget/nsBaseAppShell.cpp:137:27
#33 0x7f42efda12e3 in XRE_RunAppShell() src/toolkit/xre/nsEmbedFunctions.cpp:913:20
#34 0x7f42ea935489 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:237:9
#35 0x7f42ea8a65b3 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:334:10
#36 0x7f42ea8a64cd in RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3
#37 0x7f42ea8a64cd in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3
#38 0x7f42efda0ec8 in XRE_InitChildProcess(int, char**, XREChildData const*) src/toolkit/xre/nsEmbedFunctions.cpp:744:34
#39 0x5561aeb1b687 in content_process_main src/browser/app/../../ipc/contentproc/plugin-container.cpp:56:28
#40 0x5561aeb1b687 in main src/browser/app/nsBrowserApp.cpp:304:18
Comment 1•2 years ago
|
||
I this possibly a recent regression?
Assignee | ||
Comment 2•2 years ago
|
||
Is it possible to get a pernosco repro of this? This looks very interesting, and by interesting I mean complex!
Comment 3•2 years ago
|
||
Bugmon Analysis:
Verified bug as reproducible on mozilla-central 20201030160944-a7b7d089d5c3.
Failed to bisect testcase (Start build crashes!):
Start: e8b7c48d4e7ed1b63aeedff379b51e566ea499d9 (20191107015224)
End: d54d820d6a8d24a86d6e03d976280182cbd7758d (20201029040710)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=False, coverage=False, valgrind=False)
Comment 4•2 years ago
|
||
I think it's unlikely this is a recent regression. I haven't landed the patch in bug 1660555 that alters BodyConsumer
functionality, which might or might not actually be landable now, out of caution over a leak if the entire stack is landed. The existing code is somewhat blasé about how AbortFollower
deals with threads (and therefore how the things it keeps alive deal with threads). The patches in bug 1660555 that I have landed so far, do not, I believe, change how anything gets threaded around.
The patches in bug 1660555 (if they didn't have that final leak) would at end of day likely resolve this, by forcing followers to be single-threaded. But that full stack of patches would be...unpleasant...to backport anywhere.
A non-failing bisection would definitely help here. If this predates bug 1660555, it is probably not recent.
Reporter | ||
Comment 5•2 years ago
•
|
||
A Pernosco session is available here: https://pernos.co/debug/b4v-gSp-UbVFk0RngzSVqQ/index.html
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
Thank you for the pernosco trace!
Despite the presence of self.close() in the repro which is a source of other bugs due to premature main-event-queue-clearing, it doesn't appear to be (directly) that.
There's a gap in state machine handling here. Basically:
- The creation of the BodyConsumer creates mConsumePromise.
- mConsumePromise gets resolved/rejected (and nulled out) during BodyConsumer::ContinueConsumeBlobBody (on the worker thread).
- This only happens when the mConsumeBodyPump completes, (indirectly via nsStreamLoader::OnStopRequest) calling ConsumeBodyDoneObserver::OnStreamComplete.
- In this trace, neither nsStreamLoader::OnStartRequest nor nsStreamLoader::OnStopRequest ever run because the nsInputStreamPump is still in the process of AsyncWait-ing the RemoteLazyInputStream which has had to issue a StreamNeeded request via PRemoteLazyInputStream and is waiting for
StreamReady
. (The RecvStreamReady is only received after all the bad stuff in the test case happens.)- Note that there does seem to be some explicit logic in the case the BodyConsumer is producing a Blob to explicitly invoke ContinueConsumeBlobBody in the abort case.
- In this pernosco trace, the body gets canceled on the main thread by BodyConsumer::ShutDownMainThreadConsuming as a result of the WorkerRef callback triggering shutdown of the BodyConsumer.
- There is nothing holding the BodyConsumer alive at this point other than the shutdown process, so when the shutdown process runs on the main thread, all references are torn down.
It seems like the most straightforward fix here is to have the StrongWorkerRef callback null out mConsumePromise
and set mBodyConsumed
to true when invoked so as to cause both variants of ContinueConsumeBody to exit without trying to resolve/reject the promise if they end up getting called. It's guaranteed to be invoked on the worker thread and it's guaranteed that the content code is no longer allowed to run on the global (so we don't need to resolve/reject it). (And ShutDownMainThreadConsuming is also invoked in non-global-shutdown cases, specifically BodyConsumer::RunAbortAlgorithm which calls it before calling ContinueConsumeBody.)
Assignee | ||
Comment 8•2 years ago
|
||
Okay, so I think I have evolved a fix, will moz-phab it momentarily.
When reproducing using https://github.com/MozillaSecurity/ffpuppet and the provided prefs file[1] using a file scheme, however, I noticed we run into the problem that there's an unsafe use of FileCreatorHelper::CreateFile as part of a clever local-file specialization that ends up trying to use the Worker's global on the main thread, which is not okay and will fail a nsAutoOwningThread::AssertCurrentThreadOwnsMe(char const*)
check done by mozilla::DOMEventTargetHelper::AddRef()
. I disabled this fast-path when workers are involved so that doesn't happen, but it was a nice fast-path, so at the very least we'd want a follow-up to eventually be filed to add a correct fast-path.
1: Which does user_pref("security.fileuri.strict_origin_policy", false);
which is required for the sibling-pathed worker to be launched.
Assignee | ||
Comment 9•2 years ago
|
||
Updated•2 years ago
|
Updated•2 years ago
|
![]() |
||
Comment 10•2 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/0b2b633a3b2cb27f4433e8fa9f16ffaf15fe5d4f
https://hg.mozilla.org/mozilla-central/rev/0b2b633a3b2c
Comment 11•2 years ago
|
||
Bugmon Analysis:
Verified bug as fixed on rev mozilla-central 20201205093858-7ce95b6cde26.
Removing bugmon keyword as no further action possible.
Please review the bug and re-add the keyword for further analysis.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 13•2 years ago
|
||
Comment on attachment 9187960 [details]
Bug 1674278 - Improve BodyConsumer worker shutdown. r=ytausky,#dom-workers-and-storage
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's a simple, non-risky sec-moderate patch.
- User impact if declined: Probably not much, it's a sec-moderate.
- Fix Landed on Version: 85
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It's a simple patch that's been on mozilla-central for a while without any problems.
- String or UUID changes made by this patch:
Comment 14•2 years ago
|
||
Comment on attachment 9187960 [details]
Bug 1674278 - Improve BodyConsumer worker shutdown. r=ytausky,#dom-workers-and-storage
Approved for 78.7esr.
Comment 15•2 years ago
|
||
uplift |
Comment 16•2 years ago
|
||
Comment 17•2 years ago
|
||
For some reason the crashtest I wrote doesn't register as a crash on my machine, even though the process does crash. :jgraham, can you make something of this log file?
Comment 18•2 years ago
|
||
So I'm pretty sure this is only happening with asan builds. It's a bug and I'll investigate further, but it shouldn't stop us landing the patch for now and expecting it to fail on CI if there's a regression (side note: crash tests can be shared with other vendors they don't have to be mozilla-only. History suggests a crash in one engine will be a crash in another more often than you'd expect).
Assignee | ||
Comment 19•2 years ago
|
||
TSAN in bug 1686156 is concerned about the setting of mShuttingDown
at https://searchfox.org/mozilla-central/rev/014fe72eaba26dcf6082fb9bbaf208f97a38594e/dom/base/BodyConsumer.cpp#302. Setting the flag wasn't essential to the fix but seemed appropriate as we expected that the flag would otherwise never be set in this particular failure case triggering this problem. Specifically, we only needed to drop the mConsumePromise
promise and set mBodyConsumed
to true in order to ensure that the promise was freed on the right thread and the state machine would not advance further.
mShuttingDown
will end up getting set on the main thread by the call to ShutDownMainThreadConsuming()
so this was really just an extra level of paranoia.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 20•1 year ago
|
||
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/364cf5dc03d6 Add crashtest to prevent regressions r=asuth
Comment 21•1 year ago
|
||
bugherder |
Description
•