Open Bug 1906020 Opened 7 months ago Updated 6 months ago

Crash in [@ js::InlineList<T>::insertAfterUnchecked]

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

Other
All
defect

Tracking

()

Tracking Status
firefox129 --- affected

People

(Reporter: release-mgmt-account-bot, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/83b4a81f-cb02-4797-b2a3-58f710240628

Reason: SIGSEGV / SI_KERNEL

Top 10 frames of crashing thread:

0  libxul.so  js::InlineList<js::jit::MUse>::insertAfterUnchecked  js/src/jit/InlineList.h:313
0  libxul.so  js::InlineList<js::jit::MUse>::pushFrontUnchecked  js/src/jit/InlineList.h:272
0  libxul.so  js::jit::MDefinition::addUseUnchecked  js/src/jit/MIR.h:820
0  libxul.so  js::jit::MUse::initUnchecked  js/src/jit/MIR.h:9054
0  libxul.so  js::jit::MUse::MUse  js/src/jit/MIR.h:234
0  libxul.so  mozilla::detail::VectorImpl<js::jit::MUse,   mfbt/Vector.h:154
0  libxul.so  mozilla::Vector<js::jit::MUse,   mfbt/Vector.h:768
0  libxul.so  js::jit::MPhi::addInputSlow  js/src/jit/MIR.h:5773
0  libxul.so  js::jit::MBasicBlock::addPredecessorPopN  js/src/jit/MIRGraph.cpp:1007
1  libxul.so  js::jit::MBasicBlock::addPredecessor  js/src/jit/MIRGraph.cpp:977

By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:

  • First crash report: 2024-05-13
  • Process type: Content
  • Is startup crash: No
  • Has user comments: No
  • Is null crash: Yes - 2 out of 3 crashes happened on null or near null memory address

The Bugbug bot thinks this bug should belong to the 'Core::JavaScript Engine: JIT' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: General → JavaScript Engine: JIT

Looking at the linked crash, we see these crashing instructions:

shl    $0x5,%rcx
shl    $0x5,%rax
lea    (%r8,%rax,1),%rsi
movups %xmm0,(%r8,%rax,1)
mov    %r12,0x18(%r8,%rax,1)                  
mov    %r15,-0x10(%r8,%rcx,1)                 
lea    0x10(%r15),%rcx                        
mov    0x10(%r15),%rdi                        
mov    %rdi,(%r8,%rax,1)                      
mov    %rcx,0x8(%r8,%rax,1)                   
mov    %rsi,0x8(%rdi)
^^^^^^^^^^^^^^^^^^^^^^                        
mov    %rsi,0x10(%r15)    

The last four instructions correspond pretty directly to js::InlineList::insertAfterUnchecked: rcx points to at, rdi points to atNext, (r8,rax,1) points to item. The interesting part is the value of rdi. Relevant registers include:

rdi = 0x779c06803000007f
rcx = 0x00007f779c06887d
r15 = 0x00007f779c06886d
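
To make the instruction-to-source mapping above concrete, here is a minimal sketch of an unchecked insert-after on a circular doubly linked list. It is a simplified illustration, not the actual contents of InlineList.h: the Node struct and the demoInsert helper are hypothetical, and the field order (next at +0x0, prev at +0x8) is assumed from the offsets in the disassembly.

```cpp
#include <cassert>

// Hypothetical stand-in for an intrusive list node. Field order is assumed
// from the disassembly: next at offset 0x0, prev at offset 0x8.
struct Node {
  Node* next;
  Node* prev;
};

// Simplified sketch: link item into the list directly after at.
// Comments give the instruction from the crash that appears to match.
inline void insertAfterUnchecked(Node* at, Node* item) {
  Node* atNext = at->next;  // mov 0x10(%r15),%rdi
  item->next = atNext;      // mov %rdi,(%r8,%rax,1)
  item->prev = at;          // mov %rcx,0x8(%r8,%rax,1)
  atNext->prev = item;      // mov %rsi,0x8(%rdi)    <- faulting store
  at->next = item;          // mov %rsi,0x10(%r15)
}

// Hypothetical usage: start from an empty circular list (sentinel only)
// and insert one node after the sentinel.
inline bool demoInsert() {
  Node sentinel;
  sentinel.next = &sentinel;
  sentinel.prev = &sentinel;
  Node item{nullptr, nullptr};
  insertAfterUnchecked(&sentinel, &item);
  assert(sentinel.next == &item && item.prev == &sentinel);
  assert(item.next == &sentinel && sentinel.prev == &item);
  return true;
}
```

The store through atNext is the first write that dereferences the value loaded from at->next, so a garbage at->next faults exactly there.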

rdi should be the address of a Node. Instead, it looks like it was loaded from an array of Node pointers at a three-byte offset:

00007f779c06886d......
......779c06803000007f
................00007f779c06886d
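
The byte picture above can be reproduced with a short sketch. It assumes a little-endian host and two adjacent 8-byte pointer slots. The second slot value (0x00007f779c068030) is inferred from the high bytes of rdi, and the low five bytes of the first slot are unknown, so the value used here is a placeholder. The helper names are hypothetical. The sketch also checks whether the result is a canonical address under 48-bit virtual addressing, which is an assumption about the crashing machine; a non-canonical address would be consistent with the SIGSEGV / SI_KERNEL reason reported for this crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read 8 bytes starting at an arbitrary byte offset, as a misaligned load would.
inline uint64_t loadAt(const unsigned char* base, size_t offset) {
  uint64_t value;
  std::memcpy(&value, base + offset, sizeof(value));
  return value;
}

// True if addr is canonical assuming 48-bit virtual addresses:
// bits 63..47 must be all zeros or all ones.
inline bool isCanonical48(uint64_t addr) {
  uint64_t top = addr >> 47;
  return top == 0 || top == 0x1FFFF;
}

// Rebuild the value seen in rdi from two adjacent pointer slots.
// Assumes a little-endian host. Only the top three bytes of the first slot
// (00 00 7f) matter; its low five bytes are a placeholder.
inline uint64_t reconstructRdi() {
  const uint64_t slots[2] = {0x00007f779c068000ULL, 0x00007f779c068030ULL};
  unsigned char bytes[sizeof(slots)];
  std::memcpy(bytes, slots, sizeof(slots));
  // Start 5 bytes into the first slot, i.e. 3 bytes before the second.
  return loadAt(bytes, 5);
}
```

On a little-endian machine, reconstructRdi() yields 0x779c06803000007f, the value of rdi in the crash, and isCanonical48 rejects it while accepting r15 and rcx.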

In what I'm sure is not a coincidence, r15 is misaligned by 3 bytes. Unfortunately, the instructions that load the value of r15 are outside the context window of the crash, so I can't see where it's from. However, looking at the backtrace, I think at is the uses_ field of producer, which means that r15 is producer itself (rcx is r15 + 0x10, and the offset matches).

So we have a Node with a bogus uses_ list. It looks like it's coming from addPredecessorPopN.

One other thing that I notice is that rax in this case is 0x1800, which I think implies that we have a very large phi here, and we're hitting a particularly round number. It's not clear why that could ever affect the producer's inline list, but it still feels a little bit like a Clue.
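
As a quick check on the size of the phi, the arithmetic can be sketched as follows. It assumes that rax is a byte offset into the phi's operand array and that each element is 32 bytes, which is what the shl $0x5 in the listing suggests. The names kElemShift, elementIndex, and byteOffset are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Element size implied by `shl $0x5`: each slot is 1 << 5 = 32 bytes (assumed).
constexpr uint64_t kElemShift = 5;

// Convert a byte offset into the operand array to an element index.
constexpr uint64_t elementIndex(uint64_t offsetInBytes) {
  return offsetInBytes >> kElemShift;
}

// Convert an element index back to a byte offset (what the shl computes).
constexpr uint64_t byteOffset(uint64_t index) {
  return index << kElemShift;
}

// rax = 0x1800 in the crash corresponds to element index 192.
static_assert(elementIndex(0x1800) == 192, "0x1800 / 32 == 192");
static_assert(byteOffset(192) == 0x1800, "192 * 32 == 0x1800");
```

Under those assumptions, the new use is being written at index 192, 6144 bytes past the start of the array.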

Looking at other crashes with the same signature, I see multiple crashes with privileged instructions, which can only be some sort of binary corruption. Another crash looks like a null uses_ field, but that has a different stack trace and is presumably unrelated. Narrowing down to the 17 of 379 crashes in the last 6 months that involve addPredecessorPopN, I don't see any other crashes with the same kind of weird misalignment.

This code is hot enough, the crashes here seem disparate and weird enough, and the crash rate is low enough for me to suspect that this is just flaky hardware.

Severity: -- → S3
Priority: -- → P3