Closed Bug 1747643 Opened 3 years ago Closed 8 months ago

Intermittent Assertion failure: zone->needsIncrementalBarrier(), at /builds/worker/checkouts/gecko/js/src/gc/Marking.cpp:1050

Categories

(Core :: JavaScript: GC, defect, P5)

defect

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox123 --- wontfix
firefox124 --- wontfix
firefox125 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(3 files)

Filed by: abutkovits [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=362391844&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/IEZ7i6UOSEW8CQBym22kBw/runs/0/artifacts/public/logs/live_backing.log


[task 2021-12-26T11:40:24.360Z] 11:40:24     INFO - TEST-START | toolkit/components/passwordmgr/test/browser/browser_test_changeContentInputValue.js
[task 2021-12-26T11:40:24.388Z] 11:40:24     INFO - GECKO(7700) | [Child 2140: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 28070809000 == 3 [pid = 2140] [id = 109]
[task 2021-12-26T11:40:24.397Z] 11:40:24     INFO - GECKO(7700) | [Child 2140: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 5 (280703edac0) [pid = 2140] [serial = 329] [outer = 0]
[task 2021-12-26T11:40:24.399Z] 11:40:24     INFO - GECKO(7700) | [Child 2140: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 6 (28070809c00) [pid = 2140] [serial = 330] [outer = 280703edac0]
[task 2021-12-26T11:40:24.427Z] 11:40:24     INFO - GECKO(7700) | [Child 2140: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 7 (28070812800) [pid = 2140] [serial = 331] [outer = 280703edac0]
[task 2021-12-26T11:40:24.481Z] 11:40:24     INFO - GECKO(7700) | [Child 2140, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0xC1F30001 (NS_ERROR_NOT_INITIALIZED): file /builds/worker/checkouts/gecko/layout/forms/nsTextControlFrame.cpp:334
[task 2021-12-26T11:40:24.482Z] 11:40:24     INFO - GECKO(7700) | [Child 2140, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0xC1F30001 (NS_ERROR_NOT_INITIALIZED): file /builds/worker/checkouts/gecko/layout/forms/nsTextControlFrame.cpp:334
[task 2021-12-26T11:40:24.881Z] 11:40:24     INFO - GECKO(7700) | Assertion failure: zone->needsIncrementalBarrier(), at /builds/worker/checkouts/gecko/js/src/gc/Marking.cpp:1050
[task 2021-12-26T11:40:24.950Z] 11:40:24     INFO - GECKO(7700) | #01: js::gc::PerformIncrementalPreWriteBarrier(js::gc::TenuredCell*) [js/src/gc/Marking.cpp:1050]
[task 2021-12-26T11:40:24.960Z] 11:40:24     INFO - GECKO(7700) | #02: JSFreeOp::delete_<js::IndirectBindingMap>(js::gc::Cell*, js::IndirectBindingMap*, js::MemoryUse) [js/src/gc/FreeOp.h:106]
[task 2021-12-26T11:40:24.960Z] 11:40:24     INFO - GECKO(7700) | #03: static js::ModuleObject::finalize(JSFreeOp*, JSObject*) [js/src/builtin/ModuleObject.cpp:862]
[task 2021-12-26T11:40:24.961Z] 11:40:24     INFO - GECKO(7700) | #04: JSObject::finalize(JSFreeOp*) [js/src/vm/JSObject-inl.h:100]
[task 2021-12-26T11:40:24.962Z] 11:40:24     INFO - GECKO(7700) | #05: js::gc::Arena::finalize<JSObject>(JSFreeOp*, js::gc::AllocKind, unsigned long long) [js/src/gc/Sweeping.cpp:128]
[task 2021-12-26T11:40:24.962Z] 11:40:24     INFO - GECKO(7700) | #06: FinalizeTypedArenas<JSObject>(JSFreeOp*, js::gc::ArenaList&, js::gc::SortedArenaList&, js::gc::AllocKind, js::SliceBudget&) [js/src/gc/Sweeping.cpp:196]
[task 2021-12-26T11:40:24.963Z] 11:40:24     INFO - GECKO(7700) | #07: js::gc::GCRuntime::backgroundFinalize(JSFreeOp*, JS::Zone*, js::gc::AllocKind, js::gc::Arena**) [js/src/gc/Sweeping.cpp:261]
[task 2021-12-26T11:40:24.963Z] 11:40:24     INFO - GECKO(7700) | #08: js::gc::GCRuntime::sweepBackgroundThings(js::gc::ZoneList&) [js/src/gc/Sweeping.cpp:336]
[task 2021-12-26T11:40:24.964Z] 11:40:24     INFO - GECKO(7700) | #09: js::gc::GCRuntime::sweepFromBackgroundThread(js::AutoLockHelperThreadState&) [js/src/gc/Sweeping.cpp:418]
[task 2021-12-26T11:40:24.964Z] 11:40:24     INFO - GECKO(7700) | #10: js::gc::BackgroundSweepTask::run(js::AutoLockHelperThreadState&) [js/src/gc/Sweeping.cpp:404]
[task 2021-12-26T11:40:24.965Z] 11:40:24     INFO - GECKO(7700) | #11: js::GCParallelTask::runTask(js::AutoLockHelperThreadState&) [js/src/gc/GCParallelTask.cpp:176]
[task 2021-12-26T11:40:24.965Z] 11:40:24     INFO - GECKO(7700) | #12: js::GCParallelTask::runHelperThreadTask(js::AutoLockHelperThreadState&) [js/src/gc/GCParallelTask.cpp:164]
[task 2021-12-26T11:40:24.966Z] 11:40:24     INFO - GECKO(7700) | #13: js::GlobalHelperThreadState::runTaskLocked(js::HelperThreadTask*, js::AutoLockHelperThreadState&) [js/src/vm/HelperThreads.cpp:2251]
[task 2021-12-26T11:40:24.966Z] 11:40:24     INFO - GECKO(7700) | #14: js::GlobalHelperThreadState::runOneTask(js::AutoLockHelperThreadState&) [js/src/vm/HelperThreads.cpp:2218]
[task 2021-12-26T11:40:24.967Z] 11:40:24     INFO - GECKO(7700) | #15: JS::RunHelperThreadTask() [js/src/vm/HelperThreads.cpp:2205]
[task 2021-12-26T11:40:24.967Z] 11:40:24     INFO - GECKO(7700) | #16: HelperThreadTaskHandler::Run() [js/xpconnect/src/XPCJSContext.cpp:1107]
[task 2021-12-26T11:40:24.967Z] 11:40:24     INFO - GECKO(7700) | #17: mozilla::TaskController::RunPoolThread() [xpcom/threads/TaskController.cpp:257]
[task 2021-12-26T11:40:25.166Z] 11:40:25     INFO - GECKO(7700) | #18: PR_NativeRunThread(void*) [nsprpub/pr/src/threads/combined/pruthr.c:408]
[task 2021-12-26T11:40:25.173Z] 11:40:25     INFO - GECKO(7700) | #19: pr_root(void*) [nsprpub/pr/src/md/windows/w95thred.c:140]
[task 2021-12-26T11:40:25.197Z] 11:40:25     INFO -  fix-stacks: error: failed to read debug info file `ucrtbase.pdb` for `C:\Windows\System32\ucrtbase.dll`
[task 2021-12-26T11:40:25.203Z] 11:40:25     INFO -  fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2021-12-26T11:40:25.203Z] 11:40:25     INFO -  fix-stacks: The system cannot find the file specified. (os error 2)
[task 2021-12-26T11:40:25.203Z] 11:40:25     INFO - GECKO(7700) | #20: configthreadlocale [C:\Windows\System32\ucrtbase.dll + 0x21bb2]
[task 2021-12-26T11:40:25.204Z] 11:40:25     INFO - GECKO(7700) | #21: BaseThreadInitThunk [C:\Windows\System32\KERNEL32.DLL + 0x17034]
[task 2021-12-26T11:40:25.266Z] 11:40:25     INFO - GECKO(7700) | #22: patched_BaseThreadInitThunk(int, void*, void*) [toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp:570]
[task 2021-12-26T11:40:25.267Z] 11:40:25     INFO - GECKO(7700) | #23: RtlUserThreadStart [C:\Windows\SYSTEM32\ntdll.dll + 0x52651]
[task 2021-12-26T11:40:25.267Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.268Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.269Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.269Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.270Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.270Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.271Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
[task 2021-12-26T11:40:25.271Z] 11:40:25     INFO - GECKO(7700) | [Parent 4060, IPC I/O Parent] WARNING: [1.1]: Rejecting introduction request from 'C73919DD514B4F85.1B2BC14A746DA2B7' for unknown peer '20F410F32531BFD4.3DB3B9E4483E3336': file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:668
[task 2021-12-26T11:40:25.272Z] 11:40:25     INFO - GECKO(7700) | [GPU 7028, IPC I/O Child] WARNING: [C73919DD514B4F85.1B2BC14A746DA2B7]: Could not be introduced to peer 20F410F32531BFD4.3DB3B9E4483E3336: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:599
[task 2021-12-26T11:40:25.273Z] 11:40:25     INFO - GECKO(7700) | ###!!! [Parent][RunMessage] Error: Channel error: cannot send/recv
[task 2021-12-26T11:40:25.273Z] 11:40:25     INFO - GECKO(7700) | ###!!! [Parent][RunMessage] Error: Channel error: cannot send/recv

Update:

There have been 33 total failures within the last 7 days:

  • 13 failures on Linux 18.04 x64 WebRender debug
  • 4 failures on OS X 10.15 WebRender debug
  • 9 failures on Windows 11 x86 22H2 WebRender debug
  • 7 failures on Windows 11 x64 22H2 WebRender debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=419722755&repo=autoland&lineNumber=42129

[task 2023-06-17T17:57:29.575Z] 17:57:29     INFO - GECKO(9768) | Assertion failure: zone->needsIncrementalBarrier(), at /builds/worker/checkouts/gecko/js/src/gc/Marking.cpp:988
[task 2023-06-17T17:57:29.669Z] 17:57:29     INFO - GECKO(9768) | #01: js::gc::PerformIncrementalPreWriteBarrier(js::gc::TenuredCell*) [js/src/gc/Marking.cpp:988]
[task 2023-06-17T17:57:29.674Z] 17:57:29     INFO - GECKO(9768) | #02: mozilla::Vector<js::ExportEntry,0,js::SystemAllocPolicy>::~Vector() [mfbt/Vector.h:999]
[task 2023-06-17T17:57:29.675Z] 17:57:29     INFO - GECKO(9768) | #03: js::CyclicModuleFields::~CyclicModuleFields() [js/src/builtin/ModuleObject.cpp:623]
[task 2023-06-17T17:57:29.675Z] 17:57:29     INFO - GECKO(9768) | #04: JS::GCContext::delete_<js::CyclicModuleFields>(js::gc::Cell*, js::CyclicModuleFields*, unsigned int, js::MemoryUse) [js/src/gc/GCContext.h:164]
[task 2023-06-17T17:57:29.676Z] 17:57:29     INFO - GECKO(9768) | #05: js::ModuleObject::finalize(JS::GCContext*, JSObject*) [js/src/builtin/ModuleObject.cpp:890]
[task 2023-06-17T17:57:29.676Z] 17:57:29     INFO - GECKO(9768) | #06: JSObject::finalize(JS::GCContext*) [js/src/vm/JSObject-inl.h:99]
[task 2023-06-17T17:57:29.677Z] 17:57:29     INFO - GECKO(9768) | #07: js::gc::Arena::finalize<JSObject>(JS::GCContext*, js::gc::AllocKind, unsigned int) [js/src/gc/Sweeping.cpp:133]
[task 2023-06-17T17:57:29.679Z] 17:57:29     INFO - GECKO(9768) | #08: FinalizeTypedArenas<JSObject>(JS::GCContext*, js::gc::ArenaList&, js::gc::SortedArenaList&, js::gc::AllocKind, js::SliceBudget&) [js/src/gc/Sweeping.cpp:204]
[task 2023-06-17T17:57:29.682Z] 17:57:29     INFO - GECKO(9768) | #09: FinalizeArenas(JS::GCContext*, js::gc::ArenaList&, js::gc::SortedArenaList&, js::gc::AllocKind, js::SliceBudget&) [js/src/gc/Sweeping.cpp:241]
[task 2023-06-17T17:57:29.683Z] 17:57:29     INFO - GECKO(9768) | #10: js::gc::GCRuntime::backgroundFinalize(JS::GCContext*, JS::Zone*, js::gc::AllocKind, js::gc::Arena**) [js/src/gc/Sweeping.cpp:274]
[task 2023-06-17T17:57:29.683Z] 17:57:29     INFO - GECKO(9768) | #11: js::gc::GCRuntime::sweepBackgroundThings(js::gc::ZoneList&) [js/src/gc/Sweeping.cpp:351]
[task 2023-06-17T17:57:29.684Z] 17:57:29     INFO - GECKO(9768) | #12: js::gc::GCRuntime::sweepFromBackgroundThread(js::AutoLockHelperThreadState&) [js/src/gc/Sweeping.cpp:434]
[task 2023-06-17T17:57:29.685Z] 17:57:29     INFO - GECKO(9768) | #13: js::GCParallelTask::runTask(JS::GCContext*, js::AutoLockHelperThreadState&) [js/src/gc/GCParallelTask.cpp:210]
[task 2023-06-17T17:57:29.686Z] 17:57:29     INFO - GECKO(9768) | #14: js::GCParallelTask::runHelperThreadTask(js::AutoLockHelperThreadState&) [js/src/gc/GCParallelTask.cpp:195]
[task 2023-06-17T17:57:29.687Z] 17:57:29     INFO - GECKO(9768) | #15: js::GlobalHelperThreadState::runTaskLocked(js::HelperThreadTask*, js::AutoLockHelperThreadState&) [js/src/vm/HelperThreads.cpp:2716]
[task 2023-06-17T17:57:29.690Z] 17:57:29     INFO - GECKO(9768) | #16: js::GlobalHelperThreadState::runOneTask(js::AutoLockHelperThreadState&) [js/src/vm/HelperThreads.cpp:2683]
[task 2023-06-17T17:57:29.693Z] 17:57:29     INFO - GECKO(9768) | #17: JS::RunHelperThreadTask() [js/src/vm/HelperThreads.cpp:2670]
[task 2023-06-17T17:57:29.696Z] 17:57:29     INFO - GECKO(9768) | #18: HelperThreadTaskHandler::Run() [js/xpconnect/src/XPCJSContext.cpp:1177]
[task 2023-06-17T17:57:29.698Z] 17:57:29     INFO - GECKO(9768) | #19: mozilla::TaskController::RunPoolThread() [xpcom/threads/TaskController.cpp:356]
[task 2023-06-17T17:57:29.904Z] 17:57:29     INFO - GECKO(9768) | #20: _PR_NativeRunThread(void*) [nsprpub/pr/src/threads/combined/pruthr.c:399]
[task 2023-06-17T17:57:29.913Z] 17:57:29     INFO - GECKO(9768) | #21: pr_root(void*) [nsprpub/pr/src/md/windows/w95thred.c:139]
[task 2023-06-17T17:57:29.915Z] 17:57:29     INFO -  fix-stacks: error: failed to read debug info file `ucrtbase.pdb` for `C:\Windows\System32\ucrtbase.dll`
[task 2023-06-17T17:57:29.915Z] 17:57:29     INFO -  fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2023-06-17T17:57:29.916Z] 17:57:29     INFO -  fix-stacks: The system cannot find the file specified. (os error 2)
[task 2023-06-17T17:57:29.916Z] 17:57:29     INFO - GECKO(9768) | #22: time32 [C:\Windows\System32\ucrtbase.dll + 0x459d3]
[task 2023-06-17T17:57:29.918Z] 17:57:29     INFO - GECKO(9768) | #23: BaseThreadInitThunk [C:\Windows\System32\KERNEL32.DLL + 0x17d49]
[task 2023-06-17T17:57:30.013Z] 17:57:30     INFO - GECKO(9768) | #24: DllBlocklist_Initialize [Z:\task_168701970999290\build\application\firefox\mozglue.dll + 0x80833]
[task 2023-06-17T17:57:30.022Z] 17:57:30     INFO - GECKO(9768) | #25: RtlInitializeExceptionChain [C:\Windows\SYSTEM32\ntdll.dll + 0x6b74b]
[task 2023-06-17T17:57:30.023Z] 17:57:30     INFO - GECKO(9768) | #26: RtlClearBits [C:\Windows\SYSTEM32\ntdll.dll + 0x6b6cf]

Will, can you help us assign this to someone?
Thank you.

Flags: needinfo?(wmedina)
Whiteboard: [stockwell needswork:owner]
Flags: needinfo?(wmedina) → needinfo?(sfink)

I encountered this assert while fuzzing on git commit 151afcc1c9c44b3634bc76e2bcd40b8e40eb0992. The reproducer requires two files to be send via the reprl interface and is rather flakey. Still, I managed to capture a trace https://pernos.co/debug/XKOqLGF26iglL3y-f2cIEQ/index.html

Attached file reproducer_1.js
Attached file reproducer_2.js

I attached the two files which are sent in the captured trace (via the reprl interface)

The Pernosco recording looks like it's more than enough to figure this out. I'm shaky on a lot of this stuff here. Jon, I have a notebook populated in the Pernosco recording that describes what is happening, but in brief:

  • helper thread is running a background sweep
  • main thread is running the mutator
  • helper thread checks needsIncrementalBarriers(), gets true, calls PerformIncrementalPreWriteBarrier
  • main thread runs a GC slice, which temporarily turns off barriers, but then finishes and turns them back on
  • main thread runs another GC slice, which temporarily turns off barriers
  • helper thread asserts that barriers are on (after all, it's in a barrier!), fall down go boom

So the bug is rare because the main thread needs to resume in between the two needsIncrementalBarrier() checks and then suspend in the middle of a GC slice with barriers turned off.

The write barrier is triggered by the destruction of a module's ExportEntry, in particular one of its HeapPtr<JSAtom*> fields. It looks like this will check cell->isMarkedBlack() and possibly isAtomsZone() (which will be true), and either way return before doing anything else. So it seems like this is expected behavior, and perhaps the assertion needs to be after the atoms zone check?

Followup question: couldn't this also happen in ~ExportEntry while it is destroying its HeapPtr<ModuleRequestObject*> field? Is this really a background-sweepable class? The ExportEntry is inside of a CyclicModuleFields, which is JSCLASS_BACKGROUND_FINALIZE, but ExportEntry is defined with DEFINE_NATIVE_CLASS (at least for the shell), which does not seem to set that flag (and flags is 512 in the debugger, which indeed does not have that flag set).

Isn't this background-finalizing a non-background-finalizable class? I guess because it's in a GCVector, whose destructor deletes its elements. Should a background-finalizable class with a Vector field containing structs with GC pointers do something to forget its contents without deleting anything if it's running on a helper thread or something? How should this all work?

Flags: needinfo?(sfink) → needinfo?(jcoppeard)

(In reply to Steve Fink [:sfink] [:s:] from comment #121)
Thanks for looking into this.

The write barrier is triggered by the destruction of a module's ExportEntry, in particular one of its HeapPtr<JSAtom*> fields. It looks like this will check cell->isMarkedBlack() and possibly isAtomsZone() (which will be true), and either way return before doing anything else. So it seems like this is expected behavior, and perhaps the assertion needs to be after the atoms zone check?

Yes, this is expected behaviour, and I think that is the right solution.

In fact we don't need barriers at all when finalizing, only when we destroyed something outside of GC. We put barriers in the HeapPtr destructor so that we can easily use them in containers like vector that can move their contents and thus may destroy them outside GC.

Followup question: couldn't this also happen in ~ExportEntry while it is destroying its HeapPtr<ModuleRequestObject*> field?

This issue only affects the atoms zone because usually we don't start finalizing the contents of a zone while needsIncrementalBarriers() is still true (step 3 above). However any zone can contain pointers into the atoms zone so this can still have barriers enabled when we start sweeping other zones. One possibility would be to finish all marking before we start finalizing any zones, but we can finish GC earlier if we start finalizing zones as soon as they are ready.

Is this really a background-sweepable class?

Yes it's fine, we should be able to background finalize things that only contain GC pointers in the background.

Flags: needinfo?(jcoppeard)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c0301d36d0e1 fix race condition with overly strong assertion zone->needsIncrementalBarrier() r=jonco
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: