Open Bug 1896604 Opened 2 months ago Updated 1 month ago

SIGSEGV in js::gc::MapAlignedPages (SpiderMonkey 91)

Categories

(Core :: JavaScript Engine, defect, P5)

UNCONFIRMED

People

(Reporter: dch, Unassigned)

References

(Blocks 1 open bug)

Details

Steps to reproduce:

Using embedded SpiderMonkey on aarch64 Ampere eMag and Ampere Altra CPUs, the JIT-enabled runtime dumps core with SIGSEGV during GC. Observed on FreeBSD.

Reproducibility:

  • get an aarch64 cpu with FreeBSD 14.0-RELEASE or similar
  • install npm-node20, spidermonkey91, gmake, elixir, erlang, git, libtool, ncurses
  • unpack apache couchdb 3.x tarball
  • ./configure --spidermonkey-version 91 && gmake && gmake eunit
  • check for SIGSEGV in dmesg
  • check for couchjs.core files after test suite completion

Actual results:

dumps core:

  • frame #0: 0x000000008321f3a0 libmozjs-91.so`js::gc::MapAlignedPages(unsigned long, unsigned long) + 1232
    frame #1: 0x00000000831e56f0 libmozjs-91.so`js::gc::GCRuntime::pickChunk(js::AutoLockGCBgAlloc&) + 132
    frame #2: 0x00000000831e54a4 libmozjs-91.so`js::gc::ArenaLists::refillFreeListAndAllocate(js::gc::FreeLists&, js::gc::AllocKind, js::gc::ShouldCheckThresholds) + 308
    frame #3: 0x00000000831e4638 libmozjs-91.so`js::jit::JitCode* js::Allocate<js::jit::JitCode, (js::AllowGC)0>(JSContext*) + 128
    frame #4: 0x0000000083478618 libmozjs-91.so`js::jit::JitCode* js::jit::JitCode::New<(js::AllowGC)0>(JSContext*, unsigned char*, unsigned int, unsigned int, js::jit::ExecutablePool*, js::jit::CodeKind) + 44
    frame #5: 0x00000000834a27c8 libmozjs-91.so`js::jit::Linker::newCode(JSContext*, js::jit::CodeKind) + 284
    frame #6: 0x00000000832922c8 libmozjs-91.so`js::jit::BaselineCacheIRCompiler::compile() + 15424
    frame #7: 0x000000008329b6e0 libmozjs-91.so`js::jit::AttachBaselineCacheIRStub(JSContext*, js::jit::CacheIRWriter const&, js::jit::CacheKind, JSScript*, js::jit::ICScript*, js::jit::ICFallbackStub*, bool*) + 212
    frame #8: 0x00000000832cf2a0 libmozjs-91.so`___lldb_unnamed_symbol30268 + 276
    frame #9: 0x00000000832cf070 libmozjs-91.so`js::jit::DoGetPropFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, JS::MutableHandle<JS::Value>, JS::MutableHandle<JS::Value>) + 444
    frame #10: 0x000014c25c2ea554
    thread #2, name = 'JS Helper', stop reason = signal SIGSEGV
    frame #0: 0x00000000859c95c4 libc.so.7`__sys__umtx_op at _umtx_op.S:4
    frame #1: 0x000000008c3e01cc libthr.so.3`_thr_umtx_timedwait_uint [inlined] _umtx_op_err(obj=<unavailable>, op=<unavailable>, val=<unavailable>, uaddr=<unavailable>, uaddr2=<unavailable>) at thr_umtx.c:37:6
    frame #2: 0x000000008c3e01c0 libthr.so.3`_thr_umtx_timedwait_uint(mtx=<unavailable>, id=2364333648, clockid=<unavailable>, abstime=<unavailable>, shared=-1930633648) at thr_umtx.c:234:10
    frame #3: 0x000000008c3d5584 libthr.so.3`_thr_sleep(curthread=<unavailable>, clockid=<unavailable>, abstime=<unavailable>) at thr_kern.c:197:9 [artificial]
    frame #4: 0x000000008c3d02b0 libthr.so.3`cond_wait_common [inlined] cond_wait_user(cvp=0x000072cf28e1a0e0, mp=0x000072cf29400008, abstime=0x0000000000000000, cancel=1) at thr_cond.c:318:11
    frame #5: 0x000000008c3d0210 libthr.so.3`cond_wait_common(cond=<unavailable>, mutex=<unavailable>, abstime=0x0000000000000000, cancel=1) at thr_cond.c:378:11
    frame #6: 0x0000000083600fe4 libmozjs-91.so`mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) + 84
    frame #7: 0x0000000082f0fa44 libmozjs-91.so`js::HelperThread::threadLoop(js::InternalThreadPool*) + 232
    frame #8: 0x0000000082f0f8e0 libmozjs-91.so`js::HelperThread::ThreadMain(js::InternalThreadPool*, js::HelperThread*) + 92
    frame #9: 0x0000000082f130f8 libmozjs-91.so`js::detail::ThreadTrampoline<void (&)(js::InternalThreadPool*, js::HelperThread*), js::InternalThreadPool*&, js::HelperThread*>::Start(void*) + 44
    frame #10: 0x000000008c3d134c libthr.so.3`thread_start(curthread=0x000072cf28e12700) at thr_create.c:290:16
    ... 3 more threads in same state as thread 2

Expected results:

no GC bugs \o/

The Bugbug bot thinks this bug should belong to the 'Core::JavaScript Engine' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Version: other → unspecified
Severity: -- → S3
Priority: -- → P5
Summary: SIGSEGV in js::gc::MapAlignedPages → SIGSEGV in js::gc::MapAlignedPages (SpiderMonkey 91)

It's worth seeing if this persists when catching up to something more modern -- SpiderMonkey 91 is -very old- at this point, and so it's unlikely we'd want to fix it directly.

This stack is also consistent with some sort of resource exhaustion. MapAlignedPages is trying to allocate new memory meeting certain criteria. If the operating system can't give it to us, then at least some of the time we give up and crash. I know that in some pathological cases on Linux, we can exceed the number of mmap mappings that the kernel is willing to create and crash even though there's still plenty of memory remaining.

This looks like it might be related to bug 1876632, which is also crashing in similar code on arm64 FreeBSD. Unfortunately, FreeBSD is a tier-3 target, by which I mean we don't have access to the hardware needed to debug this. I suspect that there's some interaction between FreeBSD and arm64 hardware that isn't working well here.

This comment linked from the other bug speculates that it might have something to do with addresses outside the 48-bit address space. To make NaN-boxing work, SpiderMonkey requires that the memory it allocates must be in the lowest 2^48 bytes of the address space. This is not a problem on most hardware, since even 64-bit machines typically only support 48-bit addresses (requiring the top 16 bits to be all 0s or all 1s), but it looks like there are at least a few arm64 machines with an LVA (Large Virtual Addressing) extension. I can't find any indication online that either of the Ampere machines has LVA, but if you are running on such a machine, that might be contributing to the problem. I have no idea what FreeBSD's support for LVA looks like.
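
To illustrate why NaN-boxing constrains where memory can live, here is a minimal sketch that packs a tag and an address into one 64-bit word. The layout (16 tag bits above 48 payload bits) and all the names are assumptions made for illustration; SpiderMonkey's actual value encoding differs in its details. The point is only that any address with bits set above the payload width cannot be stored without corrupting the tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: 16 tag bits on top,
 * 48 payload bits below. This is not SpiderMonkey's real encoding. */
#define PAYLOAD_BITS 48
#define PAYLOAD_MASK ((UINT64_C(1) << PAYLOAD_BITS) - 1)

/* True if the address fits entirely in the payload bits. */
static bool fits_in_payload(uint64_t addr) {
    return (addr & ~PAYLOAD_MASK) == 0;
}

/* Pack a 16-bit tag and an address into one 64-bit word.
 * An address that does not fit would overwrite tag bits, so reject it. */
static uint64_t box_pointer(uint16_t tag, uint64_t addr) {
    assert(fits_in_payload(addr));
    return ((uint64_t)tag << PAYLOAD_BITS) | addr;
}

/* Recover the address from a boxed word. */
static uint64_t unbox_pointer(uint64_t boxed) {
    return boxed & PAYLOAD_MASK;
}

/* Recover the tag from a boxed word. */
static uint16_t unbox_tag(uint64_t boxed) {
    return (uint16_t)(boxed >> PAYLOAD_BITS);
}
```

With this layout, an address such as 0x0001000000000000 fails `fits_in_payload`, which is the kind of allocation an engine using this scheme has to avoid or reject.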

If you can debug the problem locally and get a better idea of precisely where/how the crash is happening, that might shed some light on the issue.

See Also: → 1876632

The following demonstrates that, on FreeBSD, arm64 has one more bit of user address space than amd64 (at least for LA48 amd64). This can also be seen in
VM_MAXUSER_ADDRESS in machine/*

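#include <sys/mman.h>
#include <stdio.h>

int main(void) {
    /* Ask for one page at exactly 1 << 47, the first address past the 47-bit range. */
    void *p = mmap((void *)(1UL << 47), 0x1000, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
    fprintf(stderr, "%p\n", p);
    return 0;
}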
On amd64 it returns MAP_FAILED, and on arm64 it returns a valid pointer. So you can get an allocation out of bounds: the limit is uint64_t maxJSAddress = UINT64_C(0x00007fffffffffff), which does not allow for the extra bit on arm64. That is, on arm64, 0x800000000000 and above are valid addresses.
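
As a rough sketch of the bounds check this implies, the helper below tests whether a mapped region stays at or below the limit quoted above. The constant value comes from the comment; the names `kMaxJSAddress` and `region_is_js_addressable` are hypothetical and are not taken from the SpiderMonkey source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limit quoted in the comment above; the helper names are hypothetical. */
static const uint64_t kMaxJSAddress = UINT64_C(0x00007fffffffffff);

/* True if the whole region [base, base + len) lies at or below the limit. */
static bool region_is_js_addressable(uint64_t base, size_t len) {
    if (len == 0) {
        return base <= kMaxJSAddress;
    }
    uint64_t last = base + (uint64_t)len - 1;
    if (last < base) {
        return false; /* the region wrapped around the address space */
    }
    return last <= kMaxJSAddress;
}
```

Under this check, the page that the arm64 kernel hands back at 0x800000000000 is rejected, while the last page below that address is accepted.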

@original poster: if you disable ASLR, do the tests pass? That would indicate that this is similar to the Firefox bug 1876632.
