Assertion failure: sInServoTraversal || NS_IsMainThread(), at /builds/worker/workspace/obj-build/dist/include/mozilla/ServoUtils.h:33
Categories
(Core :: DOM: Workers, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox-esr128 | --- | unaffected |
firefox134 | --- | wontfix |
firefox135 | --- | verified |
firefox136 | --- | verified |
People
(Reporter: jkratzer, Assigned: asuth)
References
(Blocks 2 open bugs, Regression)
Details
(Keywords: pernosco, regression, testcase, Whiteboard: [bugmon:bisected,confirmed])
Crash Data
Attachments
(4 files)
Testcase found while fuzzing mozilla-central rev 5904a2d552f2 (built with: --enable-debug --enable-fuzzing).
Testcase can be reproduced using the following commands:
$ pip install fuzzfetch grizzly-framework --upgrade
$ python -m fuzzfetch --build 5904a2d552f2 --debug --fuzzing -n firefox
$ python -m grizzly.replay.bugzilla ./firefox/firefox <bugid>
Assertion failure: sInServoTraversal || NS_IsMainThread(), at /builds/worker/workspace/obj-build/dist/include/mozilla/ServoUtils.h:33
==579159==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f5c5e337717 bp 0x7f5c4b55a330 sp 0x7f5c4b55a330 T579215)
==579159==The signal is caused by a WRITE memory access.
==579159==Hint: address points to the zero page.
#0 0x7f5c5e337717 in IsInServoTraversal /builds/worker/workspace/obj-build/dist/include/mozilla/ServoUtils.h:33:3
#1 0x7f5c5e337717 in IsInServoTraversal /builds/worker/workspace/obj-build/dist/include/mozilla/ServoStyleSet.h:119:45
#2 0x7f5c5e337717 in gfxFontUtils::IsInServoTraversal() /gfx/thebes/gfxFontUtils.cpp:1764:21
#3 0x7f5c6295429d in mozilla::dom::FontFace::~FontFace() /layout/style/FontFace.cpp:80:3
#4 0x7f5c6298f95c in DeleteCycleCollectable /layout/style/FontFace.cpp:71:1
#5 0x7f5c6298f95c in mozilla::dom::FontFace::cycleCollection::DeleteCycleCollectable(void*) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/FontFace.h:43:3
#6 0x7f5c5cbb38b5 in SuspectAfterShutdown(void*, nsCycleCollectionParticipant*, nsCycleCollectingAutoRefCnt*, bool*) /xpcom/base/nsCycleCollector.cpp:4003:12
#7 0x7f5c629541d1 in decr /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:291:7
#8 0x7f5c629541d1 in decr /builds/worker/workspace/obj-build/dist/include/nsISupportsImpl.h:279:12
#9 0x7f5c629541d1 in mozilla::dom::FontFace::Release() /layout/style/FontFace.cpp:71:1
#10 0x7f5c5cb9a982 in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:49:40
#11 0x7f5c5cb9a982 in ~nsCOMPtr /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:344:7
#12 0x7f5c5cb9a982 in ~SegmentImpl /builds/worker/workspace/obj-build/dist/include/mozilla/SegmentedVector.h:78:21
#13 0x7f5c5cb9a982 in mozilla::SegmentedVector<nsCOMPtr<nsISupports>, 4096ul, mozilla::MallocAllocPolicy>::PopLastN(unsigned int) /builds/worker/workspace/obj-build/dist/include/mozilla/SegmentedVector.h:253:14
#14 0x7f5c5cb83302 in mozilla::dom::DeferredFinalizerImpl<nsISupports>::DeferredFinalize(unsigned int, void*) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/BindingUtils.h:2919:15
#15 0x7f5c5cb8391a in mozilla::IncrementalFinalizeRunnable::ReleaseNow(bool) /xpcom/base/CycleCollectedJSRuntime.cpp:1689:17
#16 0x7f5c5cb83f71 in mozilla::CycleCollectedJSRuntime::FinalizeDeferredThings(mozilla::CycleCollectedJSRuntime::DeferredFinalizeType) /xpcom/base/CycleCollectedJSRuntime.cpp:1768:24
#17 0x7f5c5cb81a9f in mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) /xpcom/base/CycleCollectedJSRuntime.cpp:1855:7
#18 0x7f5c6416c121 in js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) /js/src/gc/GC.cpp:4406:3
#19 0x7f5c6416d1f3 in ~AutoCallGCCallbacks /js/src/gc/GC.cpp:4379:32
#20 0x7f5c6416d1f3 in js::gc::GCRuntime::gcCycle(bool, JS::SliceBudget const&, JS::GCReason) /js/src/gc/GC.cpp:4507:1
#21 0x7f5c6416e983 in js::gc::GCRuntime::collect(bool, JS::SliceBudget const&, JS::GCReason) /js/src/gc/GC.cpp:4690:9
#22 0x7f5c64153839 in js::gc::GCRuntime::gc(JS::GCOptions, JS::GCReason) /js/src/gc/GC.cpp:4769:3
#23 0x7f5c63af0da3 in JSRuntime::destroyRuntime() /js/src/vm/Runtime.cpp:254:8
#24 0x7f5c63996faf in js::DestroyContext(JSContext*) /js/src/vm/JSContext.cpp:227:7
#25 0x7f5c5cb7571a in mozilla::CycleCollectedJSContext::~CycleCollectedJSContext() /xpcom/base/CycleCollectedJSContext.cpp:106:3
#26 0x7f5c6210713c in ~WorkerJSContext /dom/workers/RuntimeService.cpp:904:3
#27 0x7f5c6210713c in operator() /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:460:5
#28 0x7f5c6210713c in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:302:7
#29 0x7f5c6210713c in ~UniquePtr /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:250:18
#30 0x7f5c6210713c in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2245:5
#31 0x7f5c5cca675c in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1153:16
#32 0x7f5c5ccad24f in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:480:10
#33 0x7f5c5d85a788 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:299:20
#34 0x7f5c5d7a9fe1 in RunHandler /ipc/chromium/src/base/message_loop.cc:362:3
#35 0x7f5c5d7a9fe1 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:344:3
#36 0x7f5c5cca1ca3 in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:366:10
#37 0x7f5c6ce1c9df in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:191:3
#38 0x7f5c6cec1a93 in start_thread nptl/pthread_create.c:447:8
#39 0x7f5c6cf4ec3b in clone3 misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
==579159==Register values:
rax = 0x00007f5c58d3b7c2 rbx = 0x00007f5b900824b0 rcx = 0x000063dbb6bf6a20 rdx = 0x00007f5c6d029563
rdi = 0x00007f5c6d02a700 rsi = 0x0000000000000000 rbp = 0x00007f5c4b55a330 rsp = 0x00007f5c4b55a330
r8 = 0x0000000000000000 r9 = 0x0000000000000003 r10 = 0x0000000000000000 r11 = 0x0000000000000293
r12 = 0x0000000000000003 r13 = 0x00007f5b900696a8 r14 = 0x00007f5b900824b0 r15 = 0x00007f5b900824f8
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV /builds/worker/workspace/obj-build/dist/include/mozilla/ServoUtils.h:33:3 in IsInServoTraversal
==579159==ABORTING
Reporter | ||
Comment 1•28 days ago
|
||
Updated•27 days ago
|
Comment 2•27 days ago
|
||
Verified bug as reproducible on mozilla-central 20250115215720-f7524feb52aa.
The bug appears to have been introduced in the following build range:
Start: 2b86d382dab2e182131315e79fa26bb32affa790 (20241024013316)
End: 7936ca01a900402531a01de23474744fde9bbe1c (20241024040951)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=2b86d382dab2e182131315e79fa26bb32affa790&tochange=7936ca01a900402531a01de23474744fde9bbe1c
Comment 3•26 days ago
|
||
Emilio, do you have thoughts on this? In triage we're not sure this is graphics.
Comment 4•26 days ago
|
||
So this is a FontFace
getting destroyed in a worker. We're asserting that we're not in the servo traversal, but dom::GetCurrentThreadWorkerPrivate
seems to be returning null now here.
So the good thing is that it seems "bening", in the sense that it's only breaking an assertion by accident (since we're presumably still in a worker thread).
Andrew, it seems likely one of your worker changes in comment 2 caused this, any thoughts?
Assignee | ||
Comment 5•26 days ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)
Andrew, it seems likely one of your worker changes in comment 2 caused this, any thoughts?
So, bug 1131324 exposed navigator.serviceWorker
which is what the o[1]
through o[3]
are touching. Without bug 1131324 the test-case will be doing new WeakRef(undefined)
and it seems likely this is significant. I suspect/hope this test case would reproduce prior to my landing if it were to use navigator.locks
instead of navigator.serviceWorker
, since that has been around for a longer time.
What's bizarre is that the FontFace is being destroyed as a fresh deferred finalize (it's not like a IncrementalFinalizeRunnable
got wedged). The reference made it through the GC+CC cleanup gambit where we unroot the global scope, run at least one full shutdown GC and spin the event loop, and then call nsCycleCollector_shutdown() and spin the loop, and only have the teardown of the WorkerJSContext be where the finalizer runnable is called at the last moment (and efforts to get the WorkerPrivate from TLS will fail).
This is definitely a troublesome scenario, as dom::GetCurrentThreadWorkerPrivate
is correctly returning null, but no code should be put in the position where it's being freed so late that it has to deal with that.
That said, one non-bizarre-GC possibility is that something wacky with multi-threaded ownership (mozilla::dom::FontFaceSetImpl::DispatchToOwningThread does exist) is running a runnable incredibly late on the worker thread and that is dropping the last strong reference to the FontFace on the worker which then interacts with the WeakRef somehow? But for a race like that, I would expect that there would have been a whole family of other assertions/crashes for the cases where the runnable doesn't make it to the worker in time.
I'm going to set the pernosco-wanted flag because if pernosco can reproduce this, it would be very nice. If we can't get a pernosco repro, getting a log from around the crash point with MOZ_LOG set to WorkerThread:5
and possibly WorkerPrivate:5
too could be quite interesting and help us rule out FontFace multithreading lifetime issues. (If we can rule those out, then we should move this to the Workers component.)
I'm also going to set a needinfo on :smaug for his CC expertise, I may be missing something obvious here.
Comment 6•26 days ago
|
||
Successfully recorded a pernosco session. A link to the pernosco session will be added here shortly.
Comment 7•26 days ago
|
||
Not sure what the CC related question here is. We're deleting an object very very late, after CC has been shut down for the thread.
Updated•26 days ago
|
Comment 9•25 days ago
|
||
The model how WorkerRefs are used seems a bit suspicious, but haven't yet looked at it more closely.
https://searchfox.org/mozilla-central/rev/5ef39aaccb8f660fb9c9c98ee76e0985ab6398d2/layout/style/FontFaceSetWorkerImpl.cpp#39-49
https://searchfox.org/mozilla-central/rev/5ef39aaccb8f660fb9c9c98ee76e0985ab6398d2/layout/style/FontFaceSetWorkerImpl.cpp#136,139
If I comment out that mWorkerRef = nullptr; I can't reproduce the crash anymore.
Comment 10•23 days ago
|
||
This bug has been marked as a regression. Setting status flag for Nightly to affected
.
Updated•23 days ago
|
Assignee | ||
Comment 11•22 days ago
|
||
This isn't a regression. The test case is just over-constrained to depend on something that was first exposed in bug 1131324 but with high probability if the test case used navigator.locks the same thing would happen.
Comment 12•19 days ago
|
||
Andrew, do you think there is a more appropriate component for this than Graphics? The team isn't really feeling that it's something on our plate.
Assignee | ||
Comment 13•19 days ago
|
||
Indeed, the pernosco trace is showing that the refcount is getting dropped to 1 at a reasonable time during DisconnectGlobalTeardownObservers but we're not finalizing the FontFace until we destroy the runtime and so then we don't perform the final release until slightly later in the process of destroying the runtime. Which is weird given that the worker runtime is being very classy and calling nsCycleCollector_shutdown.
This is somewhere in the interaction of Workers, the Cycle Collector, or the JS engine, so I'm going to move this to Workers for now.
:smaug, :mccr8, do you have ideas on why we're not sweeping the FontFace during nsCycleCollector_shutdown?
Assignee | ||
Comment 14•19 days ago
•
|
||
So a definite problem is that WorkerNavigator is not nulling out mServiceWorkerContainer in mozilla::dom::WorkerNavigator::Invalidate or tracing it, which is definitely my bad and definitely a regression. This means the WeakRef gets to stay alive and I guess thanks to the complexities of GC (or limitations of), it probably results in keeping the (presumably closed over) array siblings alive too late. If we fix the failure to aggressively null out the strong ref and the CC tracing, presumably the problem will go away.
Assignee | ||
Comment 15•19 days ago
•
|
||
For posterity, here's a screenshot of a portion of a class diagram of WorkerNavigator made by searchfox that shows that all the other fields are cc-traversed. (They are indirectly unlinked, however, which is why the almost documented emoji don't show the unlink emoji sequence. Searchfox doesn't currently pierce the indirect unlink logic.) (The diagram is semi-ironic because I added the functionality to help avoid situations like this, although searchfox doesn't run against pull requests.)
Assignee | ||
Comment 16•19 days ago
|
||
Updated•19 days ago
|
Updated•19 days ago
|
Comment 17•19 days ago
|
||
Assignee | ||
Comment 18•19 days ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D235398
Updated•19 days ago
|
Comment 19•19 days ago
|
||
beta Uplift Approval Request
- User impact if declined: Reduces chance for us seeing weird worker crashes or memory leaks in the wild. More importantly, if we do see memory leaks or weird crashes in the wild and this was uplifted, we can know it's not from this bug. If we don't uplift and see weird stuff, we have to worry about this bug. The fuzzer scenario itself I think is hopefully somewhat unlikely, but if sites turn on more ServiceWorkers for Firefox, we could see interesting variants.
- Code covered by automated testing: yes
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: n/a
- Risk associated with taking this patch: Very low
- Explanation of risk level: This moves cleanup from a sketchy/too-late time to the correct, very safe time.
- String changes made/needed: none
- Is Android affected?: yes
Comment 20•18 days ago
|
||
bugherder |
Updated•18 days ago
|
Comment 21•18 days ago
|
||
uplift |
Updated•18 days ago
|
Comment 22•16 days ago
|
||
Verified bug as fixed on rev mozilla-central 20250125093830-ef11a9be8e75.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Description
•