Closed Bug 1896973 Opened 6 months ago Closed 5 months ago

Assertion failure: ok (Incremental marking verification failed), at /builds/worker/checkouts/gecko/js/src/gc/Verifier.cpp:762

Categories

(Core :: JavaScript: GC, defect, P1)

defect

Tracking

()

RESOLVED FIXED
129 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox127 --- wontfix
firefox128 --- wontfix
firefox129 --- fixed

People

(Reporter: tsmith, Assigned: jonco)

References

(Blocks 2 open bugs)

Details

(Keywords: assertion, pernosco, Whiteboard: [fuzzblocker])

Attachments

(3 files)

Found while fuzzing m-c 20240514-b2c1906d3f6e (--enable-debug --enable-fuzzing)

This has been reported may times by fuzzers after gczeal prefs were added. No test cases have been reliable enough to reproduce so far.

javascript.options.mem.gc_zeal.mode=11 was set.

A Pernosco session is available here: https://pernos.co/debug/sG_oSIaPYrMidbEY7p8VtQ/index.html

Assertion failure: ok (Incremental marking verification failed), at /builds/worker/checkouts/gecko/js/src/gc/Verifier.cpp:762

#0 0x7ffbb996702e in js::gc::MarkingValidator::validate() /builds/worker/checkouts/gecko/js/src/gc/Verifier.cpp:762:3
#1 0x7ffbb994f3dc in validateIncrementalMarking /builds/worker/checkouts/gecko/js/src/gc/Verifier.cpp:778:23
#2 0x7ffbb994f3dc in js::gc::GCRuntime::beginSweepingSweepGroup(JS::GCContext*, js::SliceBudget&) /builds/worker/checkouts/gecko/js/src/gc/Sweeping.cpp:1550:3
#3 0x7ffbb9981e90 in sweepaction::SweepActionSequence::run(js::gc::SweepAction::Args&) /builds/worker/checkouts/gecko/js/src/gc/Sweeping.cpp:2179:23
#4 0x7ffbb9978d2e in sweepaction::SweepActionForEach<js::gc::SweepGroupsIter, JSRuntime*>::run(js::gc::SweepAction::Args&) /builds/worker/checkouts/gecko/js/src/gc/Sweeping.cpp:2214:19
#5 0x7ffbb9954541 in js::gc::GCRuntime::performSweepActions(js::SliceBudget&) /builds/worker/checkouts/gecko/js/src/gc/Sweeping.cpp:2362:53
#6 0x7ffbb9892e03 in js::gc::GCRuntime::incrementalSlice(js::SliceBudget&, JS::GCReason, bool) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:3824:11
#7 0x7ffbb9896220 in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:4338:3
#8 0x7ffbb9897953 in js::gc::GCRuntime::collect(bool, js::SliceBudget const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:4529:9
#9 0x7ffbb3589f6d in GarbageCollectImpl(JS::GCReason, nsJSContext::IsShrinking, js::SliceBudget const&) /builds/worker/checkouts/gecko/dom/base/nsJSEnvironment.cpp:1059:5
#10 0x7ffbb358a1c0 in nsJSContext::RunIncrementalGCSlice(JS::GCReason, nsJSContext::IsShrinking, js::SliceBudget&) /builds/worker/checkouts/gecko/dom/base/nsJSEnvironment.cpp:1096:3
#11 0x7ffbb318a643 in mozilla::CCGCScheduler::GCRunnerFiredDoGC(mozilla::TimeStamp, mozilla::GCRunnerStep const&) /builds/worker/checkouts/gecko/dom/base/CCGCScheduler.cpp:469:3
#12 0x7ffbb31899f4 in mozilla::CCGCScheduler::GCRunnerFired(mozilla::TimeStamp) /builds/worker/checkouts/gecko/dom/base/CCGCScheduler.cpp:428:10
#13 0x7ffbb15e7841 in operator() /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:687:14
#14 0x7ffbb15e7841 in mozilla::IdleTaskRunner::Run() /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:124:14
#15 0x7ffbb15e840e in mozilla::IdleTaskRunnerTask::Run() /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:45:15
#16 0x7ffbb15f7096 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:907:26
#17 0x7ffbb15f59de in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:772:15
#18 0x7ffbb15f5cf5 in mozilla::TaskController::ProcessPendingMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:516:36
#19 0x7ffbb16059c6 in operator() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:234:37
#20 0x7ffbb16059c6 in mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.h:548:5
#21 0x7ffbb161acf2 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1199:16
#22 0x7ffbb1621e3d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#23 0x7ffbb232b235 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:85:21
#24 0x7ffbb22410a1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#25 0x7ffbb22410a1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#26 0x7ffbb6cf8868 in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:148:27
#27 0x7ffbb6dba628 in nsAppShell::Run() /builds/worker/checkouts/gecko/widget/gtk/nsAppShell.cpp:469:33
#28 0x7ffbb8c4c03b in XRE_RunAppShell() /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:712:20
#29 0x7ffbb232c116 in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:235:9
#30 0x7ffbb22410a1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#31 0x7ffbb22410a1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#32 0x7ffbb8c4b862 in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:647:34
#33 0x55f34635c496 in content_process_main /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#34 0x55f34635c496 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:378:18
#35 0x7ffbc659bd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#36 0x7ffbc659be3f in __libc_start_main csu/../csu/libc-start.c:392:3
#37 0x55f3463321c8 in _start (/home/worker/builds/m-c-20240515032356-fuzzing-debug/firefox-bin+0x591c8) (BuildId: cd63beb80afcf153b4d505f337606914124f9b4d)

I'm going to look into bug 1897013. Steve can you take this one?

Flags: needinfo?(sphink)

(In reply to Jon Coppeard (:jonco) from comment #1)
Since this is marking validator issue the patch in bug 1897013 might fix this as well.

I should be able to confirm once the patch lands since the fuzzers are hitting this frequently.

Blocks: GC.stability
Severity: -- → S3
Priority: -- → P1
Has STR: --- → no
Keywords: testcase-wanted

The test case was just <!DOCTYPE html> so I think it is likely the prefs shaking out issues.

The common prefs seem to be:

javascript.options.ion.threshold = 10
javascript.options.mem.gc_zeal.mode = 11

I'm going to mark this as a fuzzblocker because it is easily triggered when javascript.options.mem.gc_zeal.mode = 11 is set.

Keywords: testcase-wanted
Whiteboard: [fuzzblocker]

This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:willyelm, could you consider increasing the severity?

For more information, please visit BugBot documentation.

Flags: needinfo?(wmedina)

(In reply to Tyson Smith [:tsmith] from comment #4)
I'm assuming this is still failing. Can you get a pernosco session for this since bug 1897013 landed?

A new Pernosco session is available here: https://pernos.co/debug/oTsozLkxZ9SN-GwHLbiMEg/index.html

I used m-c 20240528-046da0f065e9.

So far, I'm seeing that the incremental mark skips calling traceEmbeddingGrayRoots and the verification pass does not, resulting in an mStack field of a cycle-collected object only getting marked during the verification.

If I am reading this correctly, in the incremental GC we do beginMarkingSweepGroup then markGrayRootsInCurrentGroup, but the nonincremental validation happens in beginSweep which happens before either (and in fact happens before we've even computed the sweep groups). But that puts me in a "how does this ever work?" position, if we verify a GC before marking gray, but the verification does mark gray. Why does the zeal even matter? Shouldn't this always fail?

But I think it might be faster for jonco to take this now. You can use my notebook in the original Pernosco recording. I probably ought to learn how incremental verification and gray marking coordinate, but I think you'll probably recognize what's happening here immediately.

Flags: needinfo?(sphink) → needinfo?(jcoppeard)

(In reply to Steve Fink [:sfink] [:s:] from comment #8)

So far, I'm seeing that the incremental mark skips calling traceEmbeddingGrayRoots and the verification pass does not

Yes, we mark the black roots at the beginning and mark gray roots later, as we transition zone groups from marking to sweeping.

the nonincremental validation happens in beginSweep which happens before either (and in fact happens before we've even computed the sweep groups). But that puts me in a "how does this ever work?" position

We do a non-incremental mark in beginSweep to compute the mark bits we will compare against, but we don't do the actual comparison until later on when we have finished marking for the zone group. So this is fine.

Flags: needinfo?(jcoppeard)

What is happening is that the CC is deleting a promise during an incremental GC and the incremental marking validation is complaining that it is not being marked. This is in fact a violation of the snapshot-at-the-beginning invariant which is what we use to make incremental marking work (i.e. not free reachable things).

However, we don't have SATB for the CCed heap any more because we relaxed this with bug 1581574. We currently use a read barrier to make incremental marking work instead. So this is actually fine. There is no problem here and we can't delete reachable things, but the validator isn't seeing what it expects.

Some possible solutions:

  • don't validate gray bits (bad)
  • re-add the write barrier as well as the read barrier (wasteful, probably fine though)
  • go back to using a write barrier only and add/fix all the missing ExposeToActiveJS calls

Removing security sensitive flag.

I'm going to go with re-adding the write barrier here because it allows us to validate incremental gray marking and the cost is likely to be negligible compared to that of the existing read barrier.

Group: javascript-core-security

This renames address() and unsafeGet() methods to be const and non-const
versions of unsafeAddress.

The unnecessary operator bool() overload is removed.

Assignee: nobody → jcoppeard
Status: NEW → ASSIGNED

We don't need the read barrier when copying/assigning between Heap<T>s any more
if we are going to have a write barrier.

This reverses the changes to remove this in bug 1581574.

Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/95f73c01ac79 Part 1: Tidy some JS::Heap methods r=sfink https://hg.mozilla.org/integration/autoland/rev/54de0db07700 Part 2: Remove unnecessary read barrier when copying Heap<T> r=sfink https://hg.mozilla.org/integration/autoland/rev/8286bb22f140 Part 3: Add a write barrier for JS::Heap<T> r=sfink https://hg.mozilla.org/integration/autoland/rev/4930bc86aecc apply code formatting via Lando
Flags: needinfo?(wmedina)

The patch landed in nightly and beta is affected.
:jonco, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox128 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(jcoppeard)

This is quite a large change and so we should let it ride the trains. The issue does not affect users.

Flags: needinfo?(jcoppeard)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: