Open Bug 1872774 Opened 2 years ago Updated 1 year ago

Assertion failure: mTransferredPorts.IsEmpty(), at /builds/worker/checkouts/gecko/dom/base/StructuredCloneHolder.cpp:354

Categories

(Core :: DOM: postMessage, defect)

defect

Tracking

()

Tracking Status
firefox123 --- affected

People

(Reporter: tsmith, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, pernosco, testcase, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(1 file)

Attached file testcase.html

Found while fuzzing m-c 20231208-31a6430ad25b (--enable-debug --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay.bugzilla ./firefox/firefox <bugid>

Assertion failure: mTransferredPorts.IsEmpty(), at /builds/worker/checkouts/gecko/dom/base/StructuredCloneHolder.cpp:354

#0 0x7fdd4085e524 in mozilla::dom::StructuredCloneHolder::~StructuredCloneHolder() /builds/worker/checkouts/gecko/dom/base/StructuredCloneHolder.cpp:354:3
#1 0x7fdd439cf3e1 in ~MessageEventRunnable /builds/worker/checkouts/gecko/dom/workers/MessageEventRunnable.h:20:7
#2 0x7fdd439cf3e1 in mozilla::dom::MessageEventRunnable::~MessageEventRunnable() /builds/worker/checkouts/gecko/dom/workers/MessageEventRunnable.h:20:7
#3 0x7fdd439ff76b in mozilla::dom::WorkerRunnable::Release() /builds/worker/checkouts/gecko/dom/workers/WorkerRunnable.cpp:192:1
#4 0x7fdd3ea7b046 in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:49:40
#5 0x7fdd3ea7b046 in assign_assuming_AddRef /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:322:7
#6 0x7fdd3ea7b046 in operator= /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:597:5
#7 0x7fdd3ea7b046 in mozilla::ThrottledEventQueue::Inner::ExecuteRunnable() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:257:11
#8 0x7fdd3ea77de1 in mozilla::ThrottledEventQueue::Inner::Executor::Run() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:81:15
#9 0x7fdd3ea7b020 in mozilla::ThrottledEventQueue::Inner::ExecuteRunnable() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:254:22
#10 0x7fdd3ea77de1 in mozilla::ThrottledEventQueue::Inner::Executor::Run() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:81:15
#11 0x7fdd3ea4a337 in mozilla::RunnableTask::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:568:16
#12 0x7fdd3ea3faa6 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:895:26
#13 0x7fdd3ea3e287 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:718:15
#14 0x7fdd3ea3e705 in mozilla::TaskController::ProcessPendingMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:504:36
#15 0x7fdd3ea4e349 in operator() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:225:37
#16 0x7fdd3ea4e349 in mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_1>::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.h:548:5
#17 0x7fdd3ea63642 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1199:16
#18 0x7fdd3ea6a78d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#19 0x7fdd3f73da73 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:107:5
#20 0x7fdd3f657591 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#21 0x7fdd3f657591 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#22 0x7fdd43f84248 in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:148:27
#23 0x7fdd44041218 in nsAppShell::Run() /builds/worker/checkouts/gecko/widget/gtk/nsAppShell.cpp:470:33
#24 0x7fdd45e74bbb in XRE_RunAppShell() /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:721:20
#25 0x7fdd3f73e9a6 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:235:9
#26 0x7fdd3f657591 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#27 0x7fdd3f657591 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#28 0x7fdd45e74422 in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:656:34
#29 0x55f9bdd27156 in content_process_main /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#30 0x55f9bdd27156 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:375:18
#31 0x7fdd53029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#32 0x7fdd53029e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#33 0x55f9bdcfce88 in _start (/home/user/workspace/browsers/m-c-20240102161658-fuzzing-debug/firefox-bin+0x58e88) (BuildId: 507e52af05b747dbd8efbe2c70d2b11980ccc847)
Flags: in-testsuite?

Verified bug as reproducible on mozilla-central 20240103050624-cc67c788cded.
Unable to bisect testcase (Testcase reproduces on start build!):

Start: 616a6f1689dc99810e5d7e6465b781a49b6430c8 (20230104042941)
End: 31a6430ad25b20a75a4b299efea5d8ca94c2dfe9 (20231208151401)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)

Whiteboard: [bugmon:bisected,confirmed]
Component: DOM: Core & HTML → DOM: Workers

Successfully recorded a pernosco session. A link to the pernosco session will be added here shortly.

A pernosco session for this bug can be found here.

A pernosco session for this bug can be found here.

The testcase seems to do an endless loop of self.postMessage (and as such is a DoS attack) that somehow makes it possible that we destruct a StructuredCloneHolder on the main thread without ever having run TakeTransferredPortsAsSequence(ports) on the worker thread.

:smaug, you were involved in bug 1186307 long time ago, do you think this assertion shows a real problem?

Flags: needinfo?(smaug)

Not sure why TakeTransferredPortsAsSequence is interesting here.
A port is passed as transferred object, but never used for anything.

And I don't see any DoS here. There is just a single message.

The problem is new Uint8ClampedArray(16). Because of that the main thread doesn't ever get the message.
I would expect postMessage to throw or succeed, but it doesn't do either.

sfink, you might know about this.

(But the actual assertion is rather safe. Either we should remove that or clear the array somewhere explicitly)

Flags: needinfo?(smaug) → needinfo?(sphink)
Component: DOM: Workers → DOM: postMessage

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #6)

Not sure why TakeTransferredPortsAsSequence is interesting here.

My understanding is that the only way of clearing the mTransferredPorts array is this std::move. (Actually looking at this again I'd want to confirm, that std::move really reliably clears the underlying array header and not just moves the underlying dynamic memory, but I'd assume it does.)

Edit: To be more precise, TakeTransferredPortsAsSequence is only one of the potential callers of StructuredCloneHolder::TakeTransferredPorts, and the pernosco session shows also a few calls to nsContentUtils::StructuredClone and quite some to JSActorManager::ReceiveRawMessage. So yes, it is not clear, which of these calls being there or not really matters here.

A port is passed as transferred object, but never used for anything.

And I don't see any DoS here. There is just a single message.

The pernosco session shows that postMessage is called pretty often on the worker until the assertion triggers. My understanding is that the self.postMessage would lead to an endless loop, assuming that self refers to the worker here. (If that is necessarily to be called a DoS might be questionable, though.) My theory is that this loop is interrupted only due to some race that triggers the assertion.

FWIW, I reactivated the pernosco session, if anybody else wants to take a look.

self.postMessage() just posts a message to the other side. (I wasn't looking at the pernosco, just looking at the testcase and testing what it does)

Actually looking at this again I'd want to confirm, that std::move really reliably clears the underlying array header and not just moves the underlying dynamic memory, but I'd assume it does.

Probably a side track and not the cause of this assertion at all, but IIUC it seems that semantically we should not make any assumptions about an object's state after std::move:

note that -in the standard library- moving implies that the moved-from object is left in a valid but unspecified state. Which means that, after such an operation, the value of the moved-from object should only be destroyed or assigned a new value; accessing it otherwise yields an unspecified value.

I did not even try to check how nsTArray really behaves wtr, but generally spoken we should either not assert or use something more explicit than std::move to move&clear the array.

Edit: Apparently for nsTArray this assert is a safe check, if I tracked the move assignment down to nsTArray_base correctly.

Maybe time to take another look...

Flags: needinfo?(sphink) → needinfo?(jstutte)
Flags: needinfo?(jstutte)
Whiteboard: [bugmon:bisected,confirmed] → [bugmon:confirm]

From the old pernosco session from comment 4 it seems to me that we fail a structured clone inside ImageData::ReadStructuredClone for a failure in JS_ReadTypedArray.

Any failure like this may break the asserted assumption that all ports have been transferred given the earlier bail out during the runnable processing. I do not know what makes the cloning fail and probably that should be investigated, but I think this particular assertion is a bit over-ambitious in case of errors, while we might miss a better assertion (or error handling) on the cloning being successful somewhere ?

Flags: needinfo?(smaug)

Verified bug as reproducible on mozilla-central 20241104092716-b26d51f8c917.
Unable to bisect testcase (Testcase reproduces on start build!):

Start: 925231a8fb5eccf055afb3e9921596d8f990128b (20231106094018)
End: 31a6430ad25b20a75a4b299efea5d8ca94c2dfe9 (20231208151401)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)

Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]
Flags: needinfo?(smaug)

Testcase crashes using the initial build (mozilla-central 20240112173417-734e2e027196) but not with tip (mozilla-central 20250111091648-dbf9beb835a2.)

The bug appears to have been fixed in the following build range:

Start: e89e8ca472e6a63d8b0b15fcb412c1ecc64ddb22 (20250107202216)
End: e11e64eee86355c44e4f0ef4db03cd31d60862b3 (20250107222131)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e89e8ca472e6a63d8b0b15fcb412c1ecc64ddb22&tochange=e11e64eee86355c44e4f0ef4db03cd31d60862b3

tsmith, can you confirm that the above bisection range is responsible for fixing this issue?
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Flags: needinfo?(sphink) → needinfo?(twsmith)
Keywords: bugmon

resetting ni?

I can confirm this is no longer reproducible but I'm not sure if there are additional changes to be made. Also should the attached test case be added as a crash test?

Flags: needinfo?(twsmith) → needinfo?(sphink)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: