Closed Bug 2015359 Opened 1 month ago Closed 24 days ago

Assertion failure: element, at ds/SinglyLinkedList.h:77

Categories

(Core :: JavaScript: GC, defect, P2)

All
macOS
defect

Tracking

()

RESOLVED FIXED
150 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr140 --- fixed
firefox147 --- wontfix
firefox148 --- wontfix
firefox149 --- fixed
firefox150 --- fixed

People

(Reporter: gkw, Assigned: jandem)

References

(Blocks 3 open bugs)

Details

(Keywords: regression, reporter-external, testcase)

Attachments

(6 files)

Attached file both testcases

See the attachment. Extract both files into the same directory and run testcase.js with the CLI parameters below.

(lldb) bt
* thread #287, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001006160e4 js-dbg-64-darwin-arm64-e849bd9a2174-606940`MOZ_CrashSequence(aAddress=0x0000000000000000, aLine=77) at Assertions.h:242:3 [opt] [inlined]
    frame #1: 0x00000001006160e4 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::getFirst(this=<unavailable>) const at SinglyLinkedList.h:77:5 [opt] [inlined]
    frame #2: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::Iterator::Iterator(this=<unavailable>, list=<unavailable>) at SinglyLinkedList.h:209:18 [opt] [inlined]
    frame #3: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::Iterator::Iterator(this=<unavailable>, list=<unavailable>) at SinglyLinkedList.h:209:52 [opt] [inlined]
    frame #4: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::iter(this=<unavailable>) const at SinglyLinkedList.h:226:34 [opt] [inlined]
    frame #5: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::gc::BackgroundUnmarkTask::unmark(this=0x000000c5366aea30) at GC.cpp:2940:32 [opt]
/snip

Run with --fuzzing-safe --fast-warmup --ion-eager --more-compartments --gc-zeal=12 --no-baseline, compile with AR=ar sh ~/trees/firefox/js/src/configure --enable-debug --enable-debug-symbols --with-ccache --enable-nspr-build --enable-ctypes --enable-gczeal --enable-rust-simd --disable-tests, tested on gh rev e849bd9a2174718b9d1ca0a1ec756ba38c2f06c8.

Some other assertion failures seen are:

Assertion failure: cell, at /Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/objdir-js/dist/include/js/HeapAPI.h:730
Assertion failure: checkValue == FreeRegionCheckValue, at /Users/p2m/trees/firefox/js/src/gc/BufferAllocatorInternals.h:463
Assertion failure: slots == calculateDynamicSlots(), at /Users/p2m/trees/firefox/js/src/vm/JSObject-inl.h:36
Assertion failure: slotSpanSlow() == span, at /Users/p2m/trees/firefox/js/src/vm/Shape.h:582

and also:

Assertion failure: isInList(), at /Users/p2m/trees/firefox/js/src/ds/SlimLinkedList.h:98
#01: js::gc::LinkedListIter<js::gc::BufferAllocator::FreeRegion>::next()[/Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/js-dbg-64-darwin-arm64-e849bd9a2174-606940 +0x5f848c]
#02: js::NestedIterator<js::gc::BufferAllocator::FreeLists::FreeListIter, js::gc::LinkedListIter<js::gc::BufferAllocator::FreeRegion>>::next()[/Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/js-dbg-64-darwin-arm64-e849bd9a2174-606940 +0x5d77a4]
#03: js::gc::BufferAllocator::verifyChunk(js::gc::BufferChunk*, bool)[/Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/js-dbg-64-darwin-arm64-e849bd9a2174-606940 +0x5d5ba4]

More bisection information:

Rev ff0a6dce73e7 (from Jan 26, 2026) results in any of these:

Assertion failure: found(), at /Users/p2m/shell-cache/js-dbg-64-darwin-arm64-ff0a6dce73e7-605409/objdir-js/dist/include/mozilla/HashTable.h:1329
Assertion failure: element, at /Users/p2m/trees/firefox/js/src/ds/SinglyLinkedList.h:77


Rev 73cbb9ff0fdbf8b13f38d078ce01ef6ec0794f9c (earliest known rev) results in any of these:

Assertion failure: offset % Align == 0, at /Users/p2m/trees/firefox/js/src/gc/BufferAllocatorInternals.h:307
Assertion failure: element, at /Users/p2m/trees/firefox/js/src/ds/SinglyLinkedList.h:77

Note that this testcase is very resistant to reduction, somewhat intermittent (run a few times and lldb will catch it) and I tested this on macOS 26.2.

This seems to go back prior to gh rev https://github.com/mozilla-firefox/firefox/commit/73cbb9ff0fdbf8b13f38d078ce01ef6ec0794f9c and I am guessing it may be related to bug 1994023 again, as per bug 2011069.

Jon, is bug 1994023 a likely regressor?

Flags: sec-bounty?
Flags: needinfo?(jcoppeard)

Set release status flags based on info from the regressing bug 1994023

Group: core-security → javascript-core-security

I'm not sure what's going on with this. I can reproduce this on macOS but it doesn't always crash in the same way. I can't reproduce it on Linux at all.

The testcase spawns 6000 threads and uses 26GB of memory so I'm wondering if it's some interaction with whatever macOS does when it gets low on memory.

The crashes do seem to often be caused by unexpected null pointers. I tried adding extra poisoning when the buffer allocator decommits/recommits memory, but the poisoning didn't show up in crashes.

I don't think it's likely bug 1994023 is related, but I don't know what the regressor might be until I know more about what's going wrong.

Keywords: sec-low

:jonco, please ping me if this turns out to be something other than a nullptr.

I don't think there's a way to progress this at the moment.

Flags: needinfo?(jcoppeard)
Blocks: GC.stability
Severity: -- → S3
Keywords: stalled

> I tested this on macOS 26.2

Gary, do you know if this reproduces on earlier versions of macOS / other operating systems?

Flags: needinfo?(nth10sd)

I have been testing on previous versions and didn't notice anything weird in general, but not specifically with this testcase.

It isn't easy downgrading macOS versions either...

Flags: needinfo?(nth10sd)
Priority: -- → P2

I'm looking into this because it matches one of the signatures for which we've seen a recent-ish increase in crashes on Mac.

It's hard to reproduce but I'm fairly sure this is a macOS kernel bug with MADV_FREE_REUSABLE + MADV_FREE_REUSE. When the system is under very high memory pressure, the kernel sometimes still zeroes memory pages after we've called madvise with MADV_FREE_REUSE on them. This can then cause all kinds of GC crashes.

I have a stand-alone C++ test case that reproduces the issue intermittently and I'm now trying to make it more reliable.

I think we can open this up. This is a kernel issue that's very hard to trigger. I've reported it to Apple with a stand-alone test case.

For now it might be best to switch back to MADV_FREE instead of MADV_FREE_REUSABLE + MADV_FREE_REUSE (revert bug 1567366). This will increase our RSS numbers unfortunately because the pages will only be freed by the OS when needed, but it should fix the crashes.

We also use MADV_FREE_REUSABLE + MADV_FREE_REUSE for our JIT code allocator. That code is more complex and we also use other syscalls there so I think we should start with the GC commit/decommit code, also because the crash signatures are GC-related, and see what that does to our crash data.
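
To make the two strategies discussed above concrete, here is a minimal sketch of a decommit/recommit pair and a pattern check. It is not the SpiderMonkey code and not the stand-alone test case mentioned in this bug. The names `DecommitPolicy`, `MarkUnused`, `MarkInUse`, `FillPattern` and `FirstCorruptByte` are hypothetical, and the fallback to `MADV_FREE` or `MADV_DONTNEED` on platforms without the Darwin-only flags is an assumption made so that the sketch compiles elsewhere.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical policy switch for this sketch; not a SpiderMonkey name.
enum class DecommitPolicy { FreeReusable, Free };

// Decommit: tell the kernel the contents of these pages are no longer needed.
inline bool MarkUnused(void* region, size_t length, DecommitPolicy policy) {
#if defined(MADV_FREE_REUSABLE)
  if (policy == DecommitPolicy::FreeReusable) {
    // Darwin only: pages stop counting towards the process footprint.
    return madvise(region, length, MADV_FREE_REUSABLE) == 0;
  }
#endif
  (void)policy;
#if defined(MADV_FREE)
  // Pages are reclaimed lazily, only when the OS needs the memory.
  return madvise(region, length, MADV_FREE) == 0;
#else
  return madvise(region, length, MADV_DONTNEED) == 0;
#endif
}

// Recommit: announce that the pages are about to be used again.
inline bool MarkInUse(void* region, size_t length, DecommitPolicy policy) {
#if defined(MADV_FREE_REUSE)
  if (policy == DecommitPolicy::FreeReusable) {
    return madvise(region, length, MADV_FREE_REUSE) == 0;
  }
#endif
  (void)region;
  (void)length;
  (void)policy;
  return true;  // MADV_FREE needs no explicit recommit call.
}

// Write a known byte value over the whole region after recommitting it.
inline void FillPattern(void* region, size_t length, uint8_t value) {
  assert(region != nullptr);
  memset(region, value, length);
}

// Returns the offset of the first byte that lost the pattern, or `length`
// if the region is intact. Data written after recommit must never change.
inline size_t FirstCorruptByte(const void* region, size_t length,
                               uint8_t value) {
  const uint8_t* bytes = static_cast<const uint8_t*>(region);
  for (size_t i = 0; i < length; i++) {
    if (bytes[i] != value) {
      return i;
    }
  }
  return length;
}
```

A checker built on this would call `MarkUnused`, `MarkInUse` and `FillPattern` in a loop under memory pressure and treat any result from `FirstCorruptByte` other than `length` as data lost after recommit, which is the symptom described above.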

Group: javascript-core-security
No longer regressed by: 1994023
See Also: → 1567366

MacOS Tahoe has a bug where (very intermittently) pages can be zeroed by the kernel
after going through madvise with MADV_FREE_REUSABLE and then MADV_FREE_REUSE.

The fuzz bug triggers this issue in the JS shell and we've also seen an increase in crashes
on Mac after the release of Tahoe, with very similar signatures. I've been able to
reproduce this with a stand-alone C++ test case and the issue has been reported to Apple.

I haven't been able to reproduce this bug with MADV_FREE so this patch switches back to
that, reverting bug 1567366 and bug 1682947. Unfortunately this means our RSS numbers are
likely to increase.

The JIT code allocator also uses MADV_FREE_REUSABLE and MADV_FREE_REUSE but it's
more complex and it's not clear if it's affected the same way, so let's start with the
GC memory allocator and see what the effect is on crash rates.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

(In reply to Jan de Mooij [:jandem] from comment #11)

> I haven't been able to reproduce this bug with MADV_FREE so this patch switches back to
> that, reverting bug 1567366 and bug 1682947. Unfortunately this means our RSS numbers are
> likely to increase.

It's a bit of a hassle, but something mozjemalloc does is first use MADV_FREE, then later munmap and mmap to force the pages out. The OS accounting is accurate after that. I was going to switch to MADV_REUSABLE but didn't follow through. Bug 1760254.
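
The two-step approach described in this comment could look roughly like the sketch below. This is not mozjemalloc's actual code. `LazyFree` and `ForcePurge` are hypothetical names, and the sketch swaps the separate munmap and mmap calls for a single `mmap` with `MAP_FIXED` over the same range, so that no other thread can claim the address in between. That substitution is an assumption of this example.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Step 1: cheap, lazy release. The kernel may keep the pages resident
// (and counted in RSS) until it actually needs the memory.
inline bool LazyFree(void* region, size_t length) {
#if defined(MADV_FREE)
  return madvise(region, length, MADV_FREE) == 0;
#else
  return madvise(region, length, MADV_DONTNEED) == 0;
#endif
}

// Step 2: force the pages out by replacing the mapping with fresh
// anonymous memory at the same address. The old pages are dropped
// immediately, so the OS accounting is accurate afterwards, and the
// region reads back as zeroes.
inline bool ForcePurge(void* region, size_t length) {
  assert(region != nullptr);
  void* fresh = mmap(region, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
  return fresh == region;  // MAP_FAILED never equals a valid region.
}
```

The trade-off is that the second step costs a syscall that rewrites the address space, which is why it would be done later and less often than the `madvise` call.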

Status: ASSIGNED → RESOLVED
Closed: 24 days ago
Resolution: --- → FIXED
Target Milestone: --- → 150 Branch

The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?

For more information, please visit BugBot documentation.

Flags: needinfo?(jdemooij)

This still reproduced with my stand-alone test case on version 26.3.1

Original Revision: https://phabricator.services.mozilla.com/D285226

Attachment #9550942 - Flags: approval-mozilla-beta?

firefox-beta Uplift Approval Request

  • User impact if declined: Should fix an increase in number of crashes on macOS Tahoe.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: Change itself has pretty low risk but this might increase our macOS RSS memory usage numbers because we're switching back to MADV_FREE now and the kernel's memory usage accounting is a bit worse (delayed) for that.
  • String changes made/needed: N/A
  • Is Android affected?: no

Original Revision: https://phabricator.services.mozilla.com/D285226

Attachment #9550943 - Flags: approval-mozilla-esr140?

firefox-esr140 Uplift Approval Request

  • User impact if declined: Should fix an increase in number of crashes on macOS Tahoe.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: Change itself has pretty low risk but this might increase our macOS RSS memory usage numbers because we're switching back to MADV_FREE now and the kernel's memory usage accounting is a bit worse (delayed) for that.
  • String changes made/needed: N/A
  • Is Android affected?: no
Attachment #9550942 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Regressions: 2022329
Flags: sec-bounty? → sec-bounty-
Attachment #9550943 - Flags: approval-mozilla-esr140? → approval-mozilla-esr140+
Flags: needinfo?(jdemooij)
QA Whiteboard: [qa-triage-done-c150/b149]