Crash in [@ js::jit::MBasicBlock::isMarked]
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox125 | --- | affected |
People
(Reporter: release-mgmt-account-bot, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, stalled)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/fe015da6-26b7-4b0b-ac5d-2b3bb0240223
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll js::jit::MBasicBlock::isMarked const js/src/jit/MIRGraph.h:471
0 xul.dll js::jit::PruneUnusedBranches js/src/jit/IonAnalysis.cpp:393
0 xul.dll js::jit::OptimizeMIR js/src/jit/Ion.cpp:999
1 xul.dll js::jit::CompileBackEnd js/src/jit/Ion.cpp:1605
2 xul.dll js::jit::IonCompileTask::runTask js/src/jit/IonCompileTask.cpp:52
2 xul.dll js::jit::IonCompileTask::runHelperThreadTask js/src/jit/IonCompileTask.cpp:30
3 xul.dll js::GlobalHelperThreadState::runTaskLocked js/src/vm/HelperThreads.cpp:1728
3 xul.dll js::GlobalHelperThreadState::runOneTask js/src/vm/HelperThreads.cpp:1689
4 xul.dll JS::RunHelperThreadTask js/src/vm/HelperThreads.cpp:1684
4 xul.dll HelperThreadTaskHandler::Run js/xpconnect/src/XPCJSContext.cpp:1118
By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:
- First crash report: 2023-12-21
- Process type: Content
- Is startup crash: No
- Has user comments: No
- Is null crash: Yes - all crashes happened on null or near null memory address
Reporter | ||
Comment 1•11 months ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::JavaScript Engine: JIT' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•11 months ago
|
||
I grabbed the last week of crashes. 6/10 crashes in isMarked were in PruneUnusedBranches. All of those claim to crash on this line. This is a little interesting, because there are several calls to isMarked, but not that interesting, because it's the first time we traverse the graph, so it would run into any data corruption first. 5/6 are null or near-null. The sixth crashes with a non-canonical address, with an obvious bitflip in the high bits of a pointer.
Looking at the crashing disassembly, there are a few different patterns, but this one shows up four times:
cmp %rdx,%rax
jne 0x7ffb1f23098b
mov 0x80(%rcx),%rbx
cmpb $0x0,0xc4(%rbx) <- failing instruction
jne 0x7ffb95f24ead
movb $0x1,0xc4(%rbx)
cmpb $0x0,0x11(%rbx)
In each case, rdx and rax are some pointer value, and they're always equal (as expected, since otherwise we would take the branch). I'm pretty sure that 0xc4 is the offset of the isMarked boolean inside MBasicBlock: it doesn't quite match up with my debug build, but it's in the right ballpark, and there are some debug-only members. I'm pretty sure this code and the following code disassembles to
if (succ->isMarked()) {
continue;
}
JitSpew(JitSpew_Prune, "Reaches block %u", succ->id());
if (!markReachable(succ)) {
return false;
}
where markReachable
is a lambda defined above as
auto markReachable = [&](MBasicBlock* block) -> bool {
block->mark();
numMarked++;
if (block->alwaysBails()) {
needsTrim = true;
}
return worklist.append(block);
};
So that mostly confirms that the crash report isn't lying to us. We're crashing because succ is null.
The problem is that I'm a little unclear about how we're getting our hands on succ. In the source, we call MBasicBlock* succ = block->getSuccessor(i)
, which should be calling a virtual function on the control instruction for the block, and I can't find anything that looks like a virtual call. Staring at the code for a while, I think I've convinced myself that this is clang doing speculative devirtualization. Attempted partial decompilation:
// r12 is block
7ffb95f24dfd: cmpb $0x0,0x11(%r12) if block->alwaysbails
7ffb95f24e03: jne 0x7ffb95f24dd5 continue
7ffb95f24e05: mov 0x30(%r12),%rax rax = block->instructions_->rBegin
7ffb95f24e0a: lea -0x48(%rax),%rcx (vtable nonsense happens here?)
7ffb95f24e0e: test %rax,%rax ...
7ffb95f24e11: cmove %rax,%rcx ...
7ffb95f24e15: mov (%rcx),%rax ...
7ffb95f24e18: mov 0xf0(%rax),%rax ...
7ffb95f24e1f: cmp %r13,%rax compare to a baked-in constant we loaded above
7ffb95f24e22: je 0x7ffb95f24e34 if it matches, skip next check
7ffb95f24e24: lea -0x12ef0b(%rip),%rdx otherwise, load a different constant
7ffb95f24e2b: cmp %rdx,%rax if it doesn't match
7ffb95f24e2e: jne 0x7ffb95f24fc9 jump to some code far away (failed devirt?)
7ffb95f24e34: mov 0x30(%r12),%rax r12 = block, rax = block->instructions_->rBegin
7ffb95f24e39: lea -0x48(%rax),%rcx (more vtable nonsense?)
7ffb95f24e3d: test %rax,%rax ...
7ffb95f24e40: cmove %rax,%rcx ...
7ffb95f24e44: mov (%rcx),%rax ...
7ffb95f24e47: mov 0xf8(%rax),%rax ...
7ffb95f24e4e: lea -0x65ff5(%rip),%rdx load a constant
7ffb95f24e55: cmp %rdx,%rax compare whatever we got
7ffb95f24e58: jne 0x7ffb95f24f3d jump somewhere else
7ffb95f24e5e: mov 0x80(%rcx),%rbx load the successor from a known offset (maybe this is MGoto?)
7ffb95f24e65: cmpb $0x0,0xc4(%rbx) succ->isMarked() <- this is the failing instruction
After all this staring, nothing is jumping out at me as a problem. It's tempting to think that maybe this is a bug in clang's devirtualization, but that seems pretty unlikely. Glancing at the other crash that doesn't match this pattern, it certainly seems possible that it is one of the other branches of the devirtualization. It crashes on these instructions:
mov (%rcx,%rdi,8),%rbx
cmpb $0x0,0xc4(%rbx)
Those are what I would expect to see in MTest::getSuccessor. So these crashes are consistent with the hypothesis that sometimes we just have null pointers where our successors should be. It's hard to imagine what sort of bug in our code would lead to that happening at such a low frequency. Maybe we're missing some OOM handling somewhere? Otherwise it's maybe just a downstream consequence of weird bitflips in WarpBuilder.
The crash rate here is quite low. I'm going to mark this as stalled unless we can somehow reproduce this crash in a fuzzer.
Description
•