Assertion failure: mTransferredPorts.IsEmpty(), at /builds/worker/checkouts/gecko/dom/base/StructuredCloneHolder.cpp:354
Categories
(Core :: DOM: postMessage, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox123 | --- | affected |
People
(Reporter: tsmith, Unassigned, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: assertion, pernosco, testcase, Whiteboard: [bugmon:bisected,confirmed])
Attachments
(1 file)
|
518 bytes,
text/html
|
Details |
Found while fuzzing m-c 20231208-31a6430ad25b (--enable-debug --enable-fuzzing)
To reproduce via Grizzly Replay:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay.bugzilla ./firefox/firefox <bugid>
Assertion failure: mTransferredPorts.IsEmpty(), at /builds/worker/checkouts/gecko/dom/base/StructuredCloneHolder.cpp:354
#0 0x7fdd4085e524 in mozilla::dom::StructuredCloneHolder::~StructuredCloneHolder() /builds/worker/checkouts/gecko/dom/base/StructuredCloneHolder.cpp:354:3
#1 0x7fdd439cf3e1 in ~MessageEventRunnable /builds/worker/checkouts/gecko/dom/workers/MessageEventRunnable.h:20:7
#2 0x7fdd439cf3e1 in mozilla::dom::MessageEventRunnable::~MessageEventRunnable() /builds/worker/checkouts/gecko/dom/workers/MessageEventRunnable.h:20:7
#3 0x7fdd439ff76b in mozilla::dom::WorkerRunnable::Release() /builds/worker/checkouts/gecko/dom/workers/WorkerRunnable.cpp:192:1
#4 0x7fdd3ea7b046 in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:49:40
#5 0x7fdd3ea7b046 in assign_assuming_AddRef /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:322:7
#6 0x7fdd3ea7b046 in operator= /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:597:5
#7 0x7fdd3ea7b046 in mozilla::ThrottledEventQueue::Inner::ExecuteRunnable() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:257:11
#8 0x7fdd3ea77de1 in mozilla::ThrottledEventQueue::Inner::Executor::Run() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:81:15
#9 0x7fdd3ea7b020 in mozilla::ThrottledEventQueue::Inner::ExecuteRunnable() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:254:22
#10 0x7fdd3ea77de1 in mozilla::ThrottledEventQueue::Inner::Executor::Run() /builds/worker/checkouts/gecko/xpcom/threads/ThrottledEventQueue.cpp:81:15
#11 0x7fdd3ea4a337 in mozilla::RunnableTask::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:568:16
#12 0x7fdd3ea3faa6 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:895:26
#13 0x7fdd3ea3e287 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:718:15
#14 0x7fdd3ea3e705 in mozilla::TaskController::ProcessPendingMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:504:36
#15 0x7fdd3ea4e349 in operator() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:225:37
#16 0x7fdd3ea4e349 in mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_1>::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.h:548:5
#17 0x7fdd3ea63642 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1199:16
#18 0x7fdd3ea6a78d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#19 0x7fdd3f73da73 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:107:5
#20 0x7fdd3f657591 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#21 0x7fdd3f657591 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#22 0x7fdd43f84248 in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:148:27
#23 0x7fdd44041218 in nsAppShell::Run() /builds/worker/checkouts/gecko/widget/gtk/nsAppShell.cpp:470:33
#24 0x7fdd45e74bbb in XRE_RunAppShell() /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:721:20
#25 0x7fdd3f73e9a6 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:235:9
#26 0x7fdd3f657591 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#27 0x7fdd3f657591 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#28 0x7fdd45e74422 in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:656:34
#29 0x55f9bdd27156 in content_process_main /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#30 0x55f9bdd27156 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:375:18
#31 0x7fdd53029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#32 0x7fdd53029e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#33 0x55f9bdcfce88 in _start (/home/user/workspace/browsers/m-c-20240102161658-fuzzing-debug/firefox-bin+0x58e88) (BuildId: 507e52af05b747dbd8efbe2c70d2b11980ccc847)
Comment 1•2 years ago
|
||
Verified bug as reproducible on mozilla-central 20240103050624-cc67c788cded.
Unable to bisect testcase (Testcase reproduces on start build!):
Start: 616a6f1689dc99810e5d7e6465b781a49b6430c8 (20230104042941)
End: 31a6430ad25b20a75a4b299efea5d8ca94c2dfe9 (20231208151401)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)
Updated•2 years ago
|
Updated•2 years ago
|
Comment 2•2 years ago
|
||
Successfully recorded a pernosco session. A link to the pernosco session will be added here shortly.
Comment 5•2 years ago
|
||
The testcase seems to do an endless loop of self.postMessage (and as such is a DoS attack) that somehow makes it possible that we destruct a StructuredCloneHolder on the main thread without ever having run TakeTransferredPortsAsSequence(ports) on the worker thread.
:smaug, you were involved in bug 1186307 long time ago, do you think this assertion shows a real problem?
Comment 6•2 years ago
|
||
Not sure why TakeTransferredPortsAsSequence is interesting here.
A port is passed as transferred object, but never used for anything.
And I don't see any DoS here. There is just a single message.
The problem is new Uint8ClampedArray(16). Because of that the main thread doesn't ever get the message.
I would expect postMessage to throw or succeed, but it doesn't do either.
sfink, you might know about this.
(But the actual assertion is rather safe. Either we should remove that or clear the array somewhere explicitly)
Updated•2 years ago
|
Comment 7•2 years ago
•
|
||
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #6)
Not sure why TakeTransferredPortsAsSequence is interesting here.
My understanding is that the only way of clearing the mTransferredPorts array is this std::move. (Actually looking at this again I'd want to confirm, that std::move really reliably clears the underlying array header and not just moves the underlying dynamic memory, but I'd assume it does.)
Edit: To be more precise, TakeTransferredPortsAsSequence is only one of the potential callers of StructuredCloneHolder::TakeTransferredPorts, and the pernosco session shows also a few calls to nsContentUtils::StructuredClone and quite some to JSActorManager::ReceiveRawMessage. So yes, it is not clear, which of these calls being there or not really matters here.
A port is passed as transferred object, but never used for anything.
And I don't see any DoS here. There is just a single message.
The pernosco session shows that postMessage is called pretty often on the worker until the assertion triggers. My understanding is that the self.postMessage would lead to an endless loop, assuming that self refers to the worker here. (If that is necessarily to be called a DoS might be questionable, though.) My theory is that this loop is interrupted only due to some race that triggers the assertion.
FWIW, I reactivated the pernosco session, if anybody else wants to take a look.
Comment 8•2 years ago
•
|
||
self.postMessage() just posts a message to the other side. (I wasn't looking at the pernosco, just looking at the testcase and testing what it does)
Comment 9•2 years ago
•
|
||
Actually looking at this again I'd want to confirm, that std::move really reliably clears the underlying array header and not just moves the underlying dynamic memory, but I'd assume it does.
Probably a side track and not the cause of this assertion at all, but IIUC it seems that semantically we should not make any assumptions about an object's state after std::move:
note that -in the standard library- moving implies that the moved-from object is left in a valid but unspecified state. Which means that, after such an operation, the value of the moved-from object should only be destroyed or assigned a new value; accessing it otherwise yields an unspecified value.
I did not even try to check how but generally spoken we should either not assert or use something more explicit than nsTArray really behaves wtr,std::move to move&clear the array.
Edit: Apparently for nsTArray this assert is a safe check, if I tracked the move assignment down to nsTArray_base correctly.
Comment 10•1 year ago
|
||
Maybe time to take another look...
Updated•1 year ago
|
Comment 11•1 year ago
|
||
From the old pernosco session from comment 4 it seems to me that we fail a structured clone inside ImageData::ReadStructuredClone for a failure in JS_ReadTypedArray.
Any failure like this may break the asserted assumption that all ports have been transferred given the earlier bail out during the runnable processing. I do not know what makes the cloning fail and probably that should be investigated, but I think this particular assertion is a bit over-ambitious in case of errors, while we might miss a better assertion (or error handling) on the cloning being successful somewhere ?
Comment 12•1 year ago
|
||
Verified bug as reproducible on mozilla-central 20241104092716-b26d51f8c917.
Unable to bisect testcase (Testcase reproduces on start build!):
Start: 925231a8fb5eccf055afb3e9921596d8f990128b (20231106094018)
End: 31a6430ad25b20a75a4b299efea5d8ca94c2dfe9 (20231208151401)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)
Comment 13•1 year ago
•
|
||
Jens and I were looking and it seems like the problem the trace is seeing is that JS_ReadTypedArray sees a SCTAG_BACK_REFERENCE_OBJECT but doesn't know what to do with it. In the code it's the manually written mozilla::dom::ImageData::ReadStructuredClone calling JS_ReadTypedArray matching the call in mozilla::dom::ImageData::WriteStructuredClone to JS_WriteTypedArray. But JS_WriteTypedArray is all "hey, we serialize graphs here, let's not directly write the typed array" which ends up in JSStructuredCloneWriter::startObject and reuses the existing memory that gives us the SCTAG_BACK_REFERENCE_OBJECT.
:sfink, should the ImageData binding be doing something more clever here?
Updated•1 year ago
|
Comment 14•1 year ago
|
||
Testcase crashes using the initial build (mozilla-central 20240112173417-734e2e027196) but not with tip (mozilla-central 20250111091648-dbf9beb835a2.)
The bug appears to have been fixed in the following build range:
Start: e89e8ca472e6a63d8b0b15fcb412c1ecc64ddb22 (20250107202216)
End: e11e64eee86355c44e4f0ef4db03cd31d60862b3 (20250107222131)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e89e8ca472e6a63d8b0b15fcb412c1ecc64ddb22&tochange=e11e64eee86355c44e4f0ef4db03cd31d60862b3
tsmith, can you confirm that the above bisection range is responsible for fixing this issue?
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Comment 15•1 year ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=1937694 is probably what fixed this bug.
| Reporter | ||
Comment 16•1 year ago
|
||
resetting ni?
I can confirm this is no longer reproducible but I'm not sure if there are additional changes to be made. Also should the attached test case be added as a crash test?
Description
•