Assertion failure: element, at ds/SinglyLinkedList.h:77
Categories (Core :: JavaScript: GC, defect, P2)
People (Reporter: gkw, Assigned: jandem)
References (Blocks 3 open bugs)
Details (Keywords: regression, reporter-external, testcase)
Attachments (6 files)
- 51.67 KB, application/x-compressed
- 20.04 KB, text/plain
- 17.53 KB, text/plain
- 48 bytes, text/x-phabricator-request
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-beta+)
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-esr140+)
See the attachment: extract both files into the same directory, then run testcase.js with the CLI parameters below.
(lldb) bt
* thread #287, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x00000001006160e4 js-dbg-64-darwin-arm64-e849bd9a2174-606940`MOZ_CrashSequence(aAddress=0x0000000000000000, aLine=77) at Assertions.h:242:3 [opt] [inlined]
frame #1: 0x00000001006160e4 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::getFirst(this=<unavailable>) const at SinglyLinkedList.h:77:5 [opt] [inlined]
frame #2: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::Iterator::Iterator(this=<unavailable>, list=<unavailable>) at SinglyLinkedList.h:209:18 [opt] [inlined]
frame #3: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::Iterator::Iterator(this=<unavailable>, list=<unavailable>) at SinglyLinkedList.h:209:52 [opt] [inlined]
frame #4: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::SinglyLinkedList<js::gc::Arena>::iter(this=<unavailable>) const at SinglyLinkedList.h:226:34 [opt] [inlined]
frame #5: 0x00000001006160b0 js-dbg-64-darwin-arm64-e849bd9a2174-606940`js::gc::BackgroundUnmarkTask::unmark(this=0x000000c5366aea30) at GC.cpp:2940:32 [opt]
/snip
Run with --fuzzing-safe --fast-warmup --ion-eager --more-compartments --gc-zeal=12 --no-baseline.
Compile with: AR=ar sh ~/trees/firefox/js/src/configure --enable-debug --enable-debug-symbols --with-ccache --enable-nspr-build --enable-ctypes --enable-gczeal --enable-rust-simd --disable-tests
Tested on gh rev e849bd9a2174718b9d1ca0a1ec756ba38c2f06c8.
Some other assertion failures seen are:
Assertion failure: cell, at /Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/objdir-js/dist/include/js/HeapAPI.h:730
Assertion failure: checkValue == FreeRegionCheckValue, at /Users/p2m/trees/firefox/js/src/gc/BufferAllocatorInternals.h:463
Assertion failure: slots == calculateDynamicSlots(), at /Users/p2m/trees/firefox/js/src/vm/JSObject-inl.h:36
Assertion failure: slotSpanSlow() == span, at /Users/p2m/trees/firefox/js/src/vm/Shape.h:582
and also:
Assertion failure: isInList(), at /Users/p2m/trees/firefox/js/src/ds/SlimLinkedList.h:98
#01: js::gc::LinkedListIter<js::gc::BufferAllocator::FreeRegion>::next()[/Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/js-dbg-64-darwin-arm64-e849bd9a2174-606940 +0x5f848c]
#02: js::NestedIterator<js::gc::BufferAllocator::FreeLists::FreeListIter, js::gc::LinkedListIter<js::gc::BufferAllocator::FreeRegion>>::next()[/Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/js-dbg-64-darwin-arm64-e849bd9a2174-606940 +0x5d77a4]
#03: js::gc::BufferAllocator::verifyChunk(js::gc::BufferChunk*, bool)[/Users/p2m/shell-cache/js-dbg-64-darwin-arm64-e849bd9a2174-606940/js-dbg-64-darwin-arm64-e849bd9a2174-606940 +0x5d5ba4]
More bisection information:
Rev ff0a6dce73e7 (from Jan 26, 2026) results in any of these:
Assertion failure: found(), at /Users/p2m/shell-cache/js-dbg-64-darwin-arm64-ff0a6dce73e7-605409/objdir-js/dist/include/mozilla/HashTable.h:1329
Assertion failure: element, at /Users/p2m/trees/firefox/js/src/ds/SinglyLinkedList.h:77
Rev 73cbb9ff0fdbf8b13f38d078ce01ef6ec0794f9c (earliest known rev) results in any of these:
Assertion failure: offset % Align == 0, at /Users/p2m/trees/firefox/js/src/gc/BufferAllocatorInternals.h:307
Assertion failure: element, at /Users/p2m/trees/firefox/js/src/ds/SinglyLinkedList.h:77
Note that this testcase is very resistant to reduction and somewhat intermittent (run it a few times and lldb will catch it). I tested this on macOS 26.2.
This seems to predate gh rev https://github.com/mozilla-firefox/firefox/commit/73cbb9ff0fdbf8b13f38d078ce01ef6ec0794f9c, and I am guessing it may again be related to bug 1994023, as with bug 2011069.
Jon, is bug 1994023 a likely regressor?
Comment 1•1 month ago (Reporter)
Comment 2•1 month ago (Reporter)
Comment 3•1 month ago
Set release status flags based on info from the regressing bug 1994023
Updated•1 month ago
Comment 4•1 month ago
I'm not sure what's going on with this. I can reproduce this on macOS but it doesn't always crash in the same way. I can't reproduce it on Linux at all.
The testcase spawns 6000 threads and uses 26GB of memory so I'm wondering if it's some interaction with whatever macOS does when it gets low on memory.
The crashes often do seem to be caused by unexpected null pointers. I tried adding extra poisoning when the buffer allocator decommits/recommits memory, but the poisoning didn't show up in the crashes.
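To make the poisoning experiment concrete, here is a minimal C++ sketch. The poison byte 0xDB and the names PoisonBeforeDecommit, Classify and RegionState are hypothetical, not SpiderMonkey's actual poisoning code. It assumes the allocator poisons a region just before decommitting it: if the GC were reading memory that was still decommitted, crash dumps would tend to show the poison pattern, whereas all-zero memory points at pages that were discarded and zero-filled by the kernel instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical poison byte written over memory just before it is decommitted.
constexpr std::uint8_t kDecommitPoison = 0xDB;

// Fill a region with the poison pattern before handing it back to the kernel.
void PoisonBeforeDecommit(void* p, std::size_t len) {
  std::memset(p, kDecommitPoison, len);
}

enum class RegionState { AllPoison, AllZero, Mixed };

// Classify what a region looks like when it is inspected later, e.g. in a
// crash dump. AllPoison suggests stale (still decommitted) memory was used;
// AllZero suggests the pages were dropped and refilled with zeroes.
RegionState Classify(const void* p, std::size_t len) {
  const auto* b = static_cast<const std::uint8_t*>(p);
  bool allPoison = true;
  bool allZero = true;
  for (std::size_t i = 0; i < len; i++) {
    if (b[i] != kDecommitPoison) allPoison = false;
    if (b[i] != 0) allZero = false;
  }
  if (allPoison) return RegionState::AllPoison;
  if (allZero) return RegionState::AllZero;
  return RegionState::Mixed;
}
```

Seeing nulls rather than the poison pattern in crashes is what made a use-after-decommit bug in the allocator itself look less likely.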
I don't think it's likely bug 1994023 is related, but I don't know what the regressor might be until I know more about what's going wrong.
Updated•1 month ago
Updated•1 month ago
Comment 5•1 month ago
:jonco, please ping me if this turns out to be something other than a nullptr.
Updated•1 month ago
Comment 6•1 month ago
I don't think there's a way to progress this at the moment.
Updated•1 month ago
Comment 7•1 month ago
I tested this on macOS 26.2
Gary, do you know if this reproduces on earlier versions of macOS / other operating systems?
Comment 8•1 month ago (Reporter)
I have been testing on previous versions and didn't notice anything weird in general, but not specifically with this testcase.
It isn't easy downgrading macOS versions either...
Updated•1 month ago
Comment 9•1 month ago (Assignee)
I'm looking into this because it matches one of the signatures for which we've seen a recent-ish increase in crashes on Mac.
It's hard to reproduce, but I'm fairly sure this is a macOS kernel bug with MADV_FREE_REUSABLE + MADV_FREE_REUSE. When the system is under very high memory pressure, the kernel sometimes still zeroes memory pages after we've used MADV_FREE_REUSE. This can then cause all kinds of GC crashes.
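The sequence described above can be sketched as a small C++ loop. This is not the stand-alone test case mentioned in this bug; CountZeroedPages, MarkReusable and MarkReused are hypothetical names, and the sketch only illustrates the syscall order (MADV_FREE_REUSABLE, then MADV_FREE_REUSE, then writing to the pages) and the check that should never fail. The macOS-only advice values are guarded by __APPLE__; elsewhere the sketch falls back to MADV_DONTNEED so it still compiles and runs. It applies no memory pressure, so on its own it is not expected to trigger the bug.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Mark pages as unused: MADV_FREE_REUSABLE on macOS, MADV_DONTNEED elsewhere.
static bool MarkReusable(void* p, size_t len) {
#if defined(__APPLE__)
  return madvise(p, len, MADV_FREE_REUSABLE) == 0;
#else
  return madvise(p, len, MADV_DONTNEED) == 0;
#endif
}

// Tell the kernel the pages are in use again (macOS only; no-op elsewhere).
static bool MarkReused(void* p, size_t len) {
#if defined(__APPLE__)
  return madvise(p, len, MADV_FREE_REUSE) == 0;
#else
  (void)p;
  (void)len;
  return true;
#endif
}

// Runs `rounds` cycles of reusable -> reuse -> write, then checks that every
// marker written after the reuse step is still there. Returns the number of
// pages found zeroed, or SIZE_MAX if a syscall failed.
size_t CountZeroedPages(size_t numPages, int rounds) {
  const size_t pageSize = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t len = numPages * pageSize;
  void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  if (mem == MAP_FAILED) return SIZE_MAX;
  auto* bytes = static_cast<unsigned char*>(mem);
  size_t zeroed = 0;
  for (int r = 0; r < rounds; r++) {
    if (!MarkReusable(mem, len) || !MarkReused(mem, len)) {
      munmap(mem, len);
      return SIZE_MAX;
    }
    // After the reuse step the pages are ours again, so these writes must
    // stick. A real reproducer would apply heavy memory pressure before the
    // check below; this sketch checks immediately.
    for (size_t i = 0; i < numPages; i++) bytes[i * pageSize] = 0xA5;
    for (size_t i = 0; i < numPages; i++) {
      if (bytes[i * pageSize] == 0) zeroed++;
    }
  }
  munmap(mem, len);
  return zeroed;
}
```

On a correctly behaving kernel CountZeroedPages returns 0; a nonzero result would mean pages lost data written after the reuse step.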
I have a stand-alone C++ test case that reproduces the issue intermittently and I'm now trying to make it more reliable.
Comment 10•1 month ago (Assignee)
I think we can open this up. This is a kernel issue that's very hard to trigger. I've reported it to Apple with a stand-alone test case.
For now it might be best to switch back to MADV_FREE instead of MADV_FREE_REUSABLE + MADV_FREE_REUSE (reverting bug 1567366). This will unfortunately increase our RSS numbers, because the pages will only be freed by the OS when needed, but it should fix the crashes.
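A sketch of what switching schemes means for a decommit/recommit pair, in C++. DecommitScheme, DecommitPages, RecommitPages and RoundTrip are hypothetical names, not the functions the actual patch touches. With MADV_FREE there is no explicit recommit call: per the Linux madvise(2) documentation, writing to a page cancels the pending free, and a page the kernel already reclaimed reads back as zeroes; the sketch assumes macOS behaves the same way. Where MADV_FREE is unavailable it falls back to MADV_DONTNEED.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical switch between the two decommit schemes discussed here.
enum class DecommitScheme {
  FreeReusable,  // MADV_FREE_REUSABLE, later MADV_FREE_REUSE (macOS only)
  Free,          // MADV_FREE, no explicit recommit step
};

bool DecommitPages(void* p, size_t len, DecommitScheme scheme) {
#if defined(__APPLE__)
  if (scheme == DecommitScheme::FreeReusable) {
    // Pages are dropped from the process's memory accounting right away.
    return madvise(p, len, MADV_FREE_REUSABLE) == 0;
  }
#else
  (void)scheme;
#endif
#if defined(MADV_FREE)
  // Pages are reclaimed lazily, only when the OS needs memory, so they keep
  // counting toward RSS until then.
  return madvise(p, len, MADV_FREE) == 0;
#else
  return madvise(p, len, MADV_DONTNEED) == 0;
#endif
}

bool RecommitPages(void* p, size_t len, DecommitScheme scheme) {
#if defined(__APPLE__)
  if (scheme == DecommitScheme::FreeReusable) {
    return madvise(p, len, MADV_FREE_REUSE) == 0;
  }
#endif
  // With MADV_FREE a later write cancels the pending free, so there is
  // nothing to do here; callers must treat old contents as undefined.
  (void)p;
  (void)len;
  (void)scheme;
  return true;
}

// Usage: decommit and recommit a few pages, then check a fresh write sticks.
bool RoundTrip(DecommitScheme scheme) {
  const size_t len = 4 * static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  if (mem == MAP_FAILED) return false;
  auto* bytes = static_cast<unsigned char*>(mem);
  bytes[0] = 1;
  bool ok = DecommitPages(mem, len, scheme) && RecommitPages(mem, len, scheme);
  bytes[0] = 42;
  ok = ok && bytes[0] == 42;
  munmap(mem, len);
  return ok;
}
```

The trade-off from this comment shows up in the two comments inside DecommitPages: the reusable pair keeps accounting accurate, while plain MADV_FREE leaves freed pages counted until the OS actually reclaims them.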
We also use MADV_FREE_REUSABLE + MADV_FREE_REUSE for our JIT code allocator. That code is more complex and also uses other syscalls, so I think we should start with the GC commit/decommit code (the crash signatures are GC-related too) and see what that does to our crash data.
Comment 11•1 month ago (Assignee)
MacOS Tahoe has a bug where (very intermittently) pages can be zeroed by the kernel
after going through madvise with MADV_FREE_REUSABLE and then MADV_FREE_REUSE.
The fuzz bug triggers this issue in the JS shell and we've also seen an increase in crashes
on Mac after the release of Tahoe, with very similar signatures. I've been able to
reproduce this with a stand-alone C++ test case and the issue has been reported to Apple.
I haven't been able to reproduce this bug with MADV_FREE so this patch switches back to
that, reverting bug 1567366 and bug 1682947. Unfortunately this means our RSS numbers are
likely to increase.
The JIT code allocator also uses MADV_FREE_REUSABLE and MADV_FREE_REUSE but it's
more complex and it's not clear if it's affected the same way, so let's start with the
GC memory allocator and see what the effect is on crash rates.
Updated•1 month ago
Comment 12•28 days ago
(In reply to Jan de Mooij [:jandem] from comment #11)
I haven't been able to reproduce this bug with MADV_FREE so this patch switches back to
that, reverting bug 1567366 and bug 1682947. Unfortunately this means our RSS numbers are
likely to increase.
It's a bit of a hassle, but what mozjemalloc does is first use MADV_FREE and then later munmap and mmap to force the pages out. The OS accounting is accurate after that. I was going to switch to MADV_REUSABLE but didn't follow through; see bug 1760254.
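The two-stage purge described here can be sketched as follows. This is an illustration under assumptions, not mozjemalloc's code: LazyPurge, ForcePurge and TwoStagePurgeLeavesZeroes are hypothetical names, and stage 2 uses a single mmap with MAP_FIXED over the range in place of the separate munmap and mmap calls mentioned in the comment. Mapping over the range in one step releases the old pages without leaving a window in which another thread could map something into the gap.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Stage 1 (cheap): let the kernel reclaim the pages lazily.
bool LazyPurge(void* p, size_t len) {
#if defined(MADV_FREE)
  return madvise(p, len, MADV_FREE) == 0;
#else
  return madvise(p, len, MADV_DONTNEED) == 0;
#endif
}

// Stage 2 (later, eager): map fresh anonymous memory over the same range.
// MAP_FIXED replaces the old mapping, so its physical pages are released
// immediately and the OS accounting reflects that.
bool ForcePurge(void* p, size_t len) {
  void* r = mmap(p, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
  return r == p;
}

// Usage: after both stages the range is still mapped, writable and zeroed.
bool TwoStagePurgeLeavesZeroes() {
  const size_t len = 2 * static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  if (mem == MAP_FAILED) return false;
  auto* bytes = static_cast<unsigned char*>(mem);
  bytes[0] = 0x5A;
  bool ok = LazyPurge(mem, len) && ForcePurge(mem, len) && bytes[0] == 0;
  munmap(mem, len);
  return ok;
}
```

Stage 2 is the expensive part, so an allocator would presumably run it only occasionally over ranges that stage 1 already marked as free.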
Comment 13•24 days ago
Comment 14•24 days ago (bugherder)
Comment 15•23 days ago
The patch landed in nightly and beta is affected.
:jandem, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- See https://wiki.mozilla.org/Release_Management/Requesting_an_Uplift for documentation on how to request an uplift.
- If no, please set status-firefox149 to wontfix.
For more information, please visit BugBot documentation.
Comment 16•20 days ago (Assignee)
This still reproduced with my stand-alone test case on macOS 26.3.1.
Comment 17•20 days ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D285226
Updated•20 days ago
Comment 18•20 days ago
firefox-beta Uplift Approval Request
- User impact if declined: Should fix an increase in number of crashes on macOS Tahoe.
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: Change itself has pretty low risk but this might increase our macOS RSS memory usage numbers because we're switching back to MADV_FREE now and the kernel's memory usage accounting is a bit worse (delayed) for that.
- String changes made/needed: N/A
- Is Android affected?: no
Comment 19•20 days ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D285226
Updated•20 days ago
Comment 20•20 days ago
firefox-esr140 Uplift Approval Request
- User impact if declined: Should fix an increase in number of crashes on macOS Tahoe.
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing:
- Risk associated with taking this patch: low
- Explanation of risk level: Change itself has pretty low risk but this might increase our macOS RSS memory usage numbers because we're switching back to MADV_FREE now and the kernel's memory usage accounting is a bit worse (delayed) for that.
- String changes made/needed: N/A
- Is Android affected?: no
Updated•20 days ago
Updated•20 days ago
Comment 21•20 days ago (uplift)
Updated•18 days ago
Updated•17 days ago
Updated•17 days ago
Comment 22•17 days ago (uplift)
Updated•13 days ago (Assignee)
Updated•12 days ago