Closed Bug 1674278 Opened 2 years ago Closed 2 years ago

Hit MOZ_CRASH(Promise not thread-safe) at /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:41

Categories

(Core :: DOM: Workers, defect, P2)

defect

Tracking

()

VERIFIED FIXED
85 Branch
Tracking Status
firefox-esr78 85+ fixed
firefox83 --- wontfix
firefox84 --- wontfix
firefox85 + verified

People

(Reporter: tsmith, Assigned: asuth)

References

(Blocks 1 open bug)

Details

(6 keywords, Whiteboard: [bugmon:bisected,confirmed][adv-main85+r][adv-esr78.7+r])

Attachments

(4 files)

Attached file testcase.zip

I'm not sure if the belongs in this competent or a worker competent.

Hit MOZ_CRASH(Promise not thread-safe) at /builds/worker/checkouts/gecko/xpcom/base/nsISupportsImpl.cpp:41

#0 0x7f42e9f6a48a in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:254:3
#1 0x7f42e9f6a48a in nsAutoOwningThread::AssertCurrentThreadOwnsMe(char const*) const src/xpcom/base/nsISupportsImpl.cpp:41:5
#2 0x7f42e9f50e58 in AssertOwnership<24> src/xpcom/base/nsISupportsImpl.h:60:5
#3 0x7f42e9f50e58 in mozilla::dom::Promise::Release() /builds/worker/workspace/obj-build/dist/include/mozilla/dom/Promise.h:44:3
#4 0x7f42ebb3a5bd in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:50:40
#5 0x7f42ebb3a5bd in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:381:36
#6 0x7f42ebb3a5bd in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:81:7
#7 0x7f42ebb3a5bd in mozilla::dom::BodyConsumer::~BodyConsumer() src/dom/base/BodyConsumer.cpp:373:29
#8 0x7f42ebb3ea9b in mozilla::dom::BodyConsumer::Release() src/dom/base/BodyConsumer.cpp:812:1
#9 0x7f42ebb52f2b in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:50:40
#10 0x7f42ebb52f2b in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:381:36
#11 0x7f42ebb52f2b in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:81:7
#12 0x7f42ebb52f2b in ~ src/dom/base/BodyConsumer.cpp:775:9
#13 0x7f42ebb52f2b in ~RunnableFunction /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:568:7
#14 0x7f42ebb52f2b in mozilla::detail::RunnableFunction<mozilla::dom::BodyConsumer::ShutDownMainThreadConsuming()::$_3>::~RunnableFunction() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:568:7
#15 0x7f42ea0376f7 in mozilla::Runnable::Release() src/xpcom/threads/nsThreadUtils.cpp:68:1
#16 0x7f42ea058392 in assign_assuming_AddRef /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:428:7
#17 0x7f42ea058392 in operator= /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:697:5
#18 0x7f42ea058392 in mozilla::ThrottledEventQueue::Inner::ExecuteRunnable() src/xpcom/threads/ThrottledEventQueue.cpp:257:11
#19 0x7f42ea0538c1 in mozilla::ThrottledEventQueue::Inner::Executor::Run() src/xpcom/threads/ThrottledEventQueue.cpp:81:15
#20 0x7f42ea02e5ef in mozilla::RunnableTask::Run() src/xpcom/threads/TaskController.cpp:450:16
#21 0x7f42ea02cc5a in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:720:26
#22 0x7f42ea02bd04 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:579:15
#23 0x7f42ea02beb7 in mozilla::TaskController::ProcessPendingMTTask(bool) src/xpcom/threads/TaskController.cpp:373:36
#24 0x7f42ea031e46 in operator() src/xpcom/threads/TaskController.cpp:120:37
#25 0x7f42ea031e46 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_3>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:577:5
#26 0x7f42ea0433c7 in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1197:14
#27 0x7f42ea04910a in NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:513:10
#28 0x7f42ea9346c6 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:87:21
#29 0x7f42ea8a65b3 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:334:10
#30 0x7f42ea8a64cd in RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3
#31 0x7f42ea8a64cd in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3
#32 0x7f42ee59bde8 in nsBaseAppShell::Run() src/widget/nsBaseAppShell.cpp:137:27
#33 0x7f42efda12e3 in XRE_RunAppShell() src/toolkit/xre/nsEmbedFunctions.cpp:913:20
#34 0x7f42ea935489 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:237:9
#35 0x7f42ea8a65b3 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:334:10
#36 0x7f42ea8a64cd in RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3
#37 0x7f42ea8a64cd in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3
#38 0x7f42efda0ec8 in XRE_InitChildProcess(int, char**, XREChildData const*) src/toolkit/xre/nsEmbedFunctions.cpp:744:34
#39 0x5561aeb1b687 in content_process_main src/browser/app/../../ipc/contentproc/plugin-container.cpp:56:28
#40 0x5561aeb1b687 in main src/browser/app/nsBrowserApp.cpp:304:18

I this possibly a recent regression?

Keywords: bugmon

Is it possible to get a pernosco repro of this? This looks very interesting, and by interesting I mean complex!

Flags: needinfo?(twsmith)

Bugmon Analysis:
Verified bug as reproducible on mozilla-central 20201030160944-a7b7d089d5c3.
Failed to bisect testcase (Start build crashes!):

Start: e8b7c48d4e7ed1b63aeedff379b51e566ea499d9 (20191107015224)
End: d54d820d6a8d24a86d6e03d976280182cbd7758d (20201029040710)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=False, coverage=False, valgrind=False)

Whiteboard: [bugmon:bisected,confirmed]

I think it's unlikely this is a recent regression. I haven't landed the patch in bug 1660555 that alters BodyConsumer functionality, which might or might not actually be landable now, out of caution over a leak if the entire stack is landed. The existing code is somewhat blasé about how AbortFollower deals with threads (and therefore how the things it keeps alive deal with threads). The patches in bug 1660555 that I have landed so far, do not, I believe, change how anything gets threaded around.

The patches in bug 1660555 (if they didn't have that final leak) would at end of day likely resolve this, by forcing followers to be single-threaded. But that full stack of patches would be...unpleasant...to backport anywhere.

A non-failing bisection would definitely help here. If this predates bug 1660555, it is probably not recent.

A Pernosco session is available here: https://pernos.co/debug/b4v-gSp-UbVFk0RngzSVqQ/index.html

Flags: needinfo?(twsmith)
Component: DOM: Core & HTML → DOM: Workers

Pernosco ping to Andrew.

Flags: needinfo?(bugmail)

Thank you for the pernosco trace!

Despite the presence of self.close() in the repro which is a source of other bugs due to premature main-event-queue-clearing, it doesn't appear to be (directly) that.

There's a gap in state machine handling here. Basically:

  • The creation of the BodyConsumer creates mConsumePromise.
  • mConsumePromise gets resolved/rejected (and nulled out) during BodyConsumer::ContinueConsumeBlobBody (on the worker thread).
    • This only happens when the mConsumeBodyPump completes, (indirectly via nsStreamLoader::OnStopRequest) calling ConsumeBodyDoneObserver::OnStreamComplete.
    • In this trace, neither nsStreamLoader::OnStartRequest nor nsStreamLoader::OnStopRequest ever run because the nsInputStreamPump is still in the process of AsyncWait-ing the RemoteLazyInputStream which has had to issue a StreamNeeded request via PRemoteLazyInputStream and is waiting for StreamReady. (The RecvStreamReady is only received after all the bad stuff in the test case happens.)
      • Note that there does seem to be some explicit logic in the case the BodyConsumer is producing a Blob to explicitly invoke ContinueConsumeBlobBody in the abort case.
    • In this pernosco trace, the body gets canceled on the main thread by BodyConsumer::ShutDownMainThreadConsuming as a result of the WorkerRef callback triggering shutdown of the BodyConsumer.
      • There is nothing holding the BodyConsumer alive at this point other than the shutdown process, so when the shutdown process runs on the main thread, all references are torn down.

It seems like the most straightforward fix here is to have the StrongWorkerRef callback null out mConsumePromise and set mBodyConsumed to true when invoked so as to cause both variants of ContinueConsumeBody to exit without trying to resolve/reject the promise if they end up getting called. It's guaranteed to be invoked on the worker thread and it's guaranteed that the content code is no longer allowed to run on the global (so we don't need to resolve/reject it). (And ShutDownMainThreadConsuming is also invoked in non-global-shutdown cases, specifically BodyConsumer::RunAbortAlgorithm which calls it before calling ContinueConsumeBody.)

Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)

Okay, so I think I have evolved a fix, will moz-phab it momentarily.

When reproducing using https://github.com/MozillaSecurity/ffpuppet and the provided prefs file[1] using a file scheme, however, I noticed we run into the problem that there's an unsafe use of FileCreatorHelper::CreateFile as part of a clever local-file specialization that ends up trying to use the Worker's global on the main thread, which is not okay and will fail a nsAutoOwningThread::AssertCurrentThreadOwnsMe(char const*) check done by mozilla::DOMEventTargetHelper::AddRef(). I disabled this fast-path when workers are involved so that doesn't happen, but it was a nice fast-path, so at the very least we'd want a follow-up to eventually be filed to add a correct fast-path.

1: Which does user_pref("security.fileuri.strict_origin_policy", false); which is required for the sibling-pathed worker to be launched.

Severity: -- → S3
Priority: -- → P2
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

Bugmon Analysis:
Verified bug as fixed on rev mozilla-central 20201205093858-7ce95b6cde26.
Removing bugmon keyword as no further action possible.
Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon
Whiteboard: [bugmon:bisected,confirmed] → [bugmon:bisected,confirmed][adv-main85+r]

NI myself so I remember to write the test.

Flags: needinfo?(ytausky)

Comment on attachment 9187960 [details]
Bug 1674278 - Improve BodyConsumer worker shutdown. r=ytausky,#dom-workers-and-storage

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's a simple, non-risky sec-moderate patch.
  • User impact if declined: Probably not much, it's a sec-moderate.
  • Fix Landed on Version: 85
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It's a simple patch that's been on mozilla-central for a while without any problems.
  • String or UUID changes made by this patch:
Flags: needinfo?(ytausky)
Attachment #9187960 - Flags: approval-mozilla-esr78?

Comment on attachment 9187960 [details]
Bug 1674278 - Improve BodyConsumer worker shutdown. r=ytausky,#dom-workers-and-storage

Approved for 78.7esr.

Attachment #9187960 - Flags: approval-mozilla-esr78? → approval-mozilla-esr78+

For some reason the crashtest I wrote doesn't register as a crash on my machine, even though the process does crash. :jgraham, can you make something of this log file?

Flags: needinfo?(james)

So I'm pretty sure this is only happening with asan builds. It's a bug and I'll investigate further, but it shouldn't stop us landing the patch for now and expecting it to fail on CI if there's a regression (side note: crash tests can be shared with other vendors they don't have to be mozilla-only. History suggests a crash in one engine will be a crash in another more often than you'd expect).

Flags: needinfo?(james)

TSAN in bug 1686156 is concerned about the setting of mShuttingDown at https://searchfox.org/mozilla-central/rev/014fe72eaba26dcf6082fb9bbaf208f97a38594e/dom/base/BodyConsumer.cpp#302. Setting the flag wasn't essential to the fix but seemed appropriate as we expected that the flag would otherwise never be set in this particular failure case triggering this problem. Specifically, we only needed to drop the mConsumePromise promise and set mBodyConsumed to true in order to ensure that the promise was freed on the right thread and the state machine would not advance further.

mShuttingDown will end up getting set on the main thread by the call to ShutDownMainThreadConsuming() so this was really just an extra level of paranoia.

Whiteboard: [bugmon:bisected,confirmed][adv-main85+r] → [bugmon:bisected,confirmed][adv-main85+r][adv-esr78.7+r]
Group: core-security-release
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/364cf5dc03d6
Add crashtest to prevent regressions r=asuth
You need to log in before you can comment on or make changes to this bug.