Closed Bug 1914821 Opened 6 months ago Closed 6 months ago

Assertion failure: cx_->hadResourceExhaustion(), at /builds/worker/checkouts/gecko/js/src/jit/WarpOracle.cpp:206

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
132 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox-esr128 --- wontfix
firefox130 --- wontfix
firefox131 --- wontfix
firefox132 --- fixed

People

(Reporter: tsmith, Assigned: iain)

References

(Blocks 2 open bugs, )

Details

(Keywords: assertion, pernosco)

Attachments

(1 file)

Found with m-c 20240824-4da5ac911a89 (--enable-debug --enable-fuzzing)

This was found by visiting a live website with a debug build.

STR:

  • Launch browser and visit site

This issue was triggered by visiting http://www.tandfonline.com/doi/epdf/10.1080/01603477.2024.2355452.

Assertion failure: cx_->hadResourceExhaustion(), at /builds/worker/checkouts/gecko/js/src/jit/WarpOracle.cpp:206

#0 0x75d575ecd9bc in js::jit::WarpOracle::createSnapshot() /builds/worker/checkouts/gecko/js/src/jit/WarpOracle.cpp:205:5
#1 0x75d57636af73 in js::jit::CreateWarpSnapshot(JSContext*, js::jit::MIRGenerator*, JS::Handle<JSScript*>) /builds/worker/checkouts/gecko/js/src/jit/Ion.cpp:1653:48
#2 0x75d57632a236 in IonCompile /builds/worker/checkouts/gecko/js/src/jit/Ion.cpp:1723:41
#3 0x75d57632a236 in js::jit::Compile(JSContext*, JS::Handle<JSScript*>, js::jit::BaselineFrame*, unsigned char*) /builds/worker/checkouts/gecko/js/src/jit/Ion.cpp:1916:24
#4 0x75d57632b56d in BaselineCanEnterAtEntry /builds/worker/checkouts/gecko/js/src/jit/Ion.cpp:2048:25
#5 0x75d57632b56d in IonCompileScriptForBaseline(JSContext*, js::jit::BaselineFrame*, unsigned char*) /builds/worker/checkouts/gecko/js/src/jit/Ion.cpp:2173:12
#6 0x875c6f8f9d5  ([anon:js-executable-memory]+0x129d5)

A Pernosco session is available here: https://pernos.co/debug/UzwlKYcvsmyfOH8Zn58ATQ/index.html

Keywords: pernosco

Thanks for the pernosco recording. It is a bit tricky to debug bailout jitcode with an optimized build, but I think I've worked out roughly what's going on. At the very least, here's a testcase that triggers the assertion in the shell.

var arr = [];
for (var i = 0; i < 8; i++) {
  arr.push({x: 1, ["y" + i]: 1});
}

function foo(o, o2, cond) {
  var result = 0;
  for (var i = 0; i < 2; i++) {
    if (cond) {
      result += o.x; // Potentially invalid unbox
    }
  }
  result += o2.x; // Stub folded GetProp
  return result;
}

with ({}) {}
// Ion-compile
for (var i = 0; i < 2000; i++) {
  foo({x: 1}, arr[i%6], i % 2 == 0);
}

// Bail out 
for (var i = 0; i < 10; i++) {
  foo(undefined, arr[6], false);
}
foo({x:1}, arr[7], false);

for (var i = 0; i < 2000; i++) {
  foo({x: 1}, arr[i%6], i % 2 == 0);
}

This assertion doesn't affect correctness. It's just a way of catching potential performance problems due to unnecessary bailout loops..

The issue is a combination of our handling of LICM-induced bailouts and stub folding. We Ion-compile foo. Inside a conditional, inside a loop, we have an MUnbox that is speculatively hoisted outside the loop. After the loop, we have a getprop that is called with multiple shapes, in a way that allows us to do stub folding.

After we've compiled, we call with a carefully crafted set of arguments. The hoisted unbox fails (because undefined is not an object). We bail out with BailoutKind::LICM, updating licmState from NeverBailed to Bailed. Because cond is false, we don't actually execute the corresponding unbox in baseline. Normally, this would cause us to invalidate next time we did an LICM bail. However, after exiting the loop, we see a new shape and do stub folding. This accomplishes two things. First, it calls js::jit::MaybeNotifyWarp to inform Warp that we hit a fallback stub, which updates licmState to BailedAndHitFallback. Secondly, it eats the bailout in a way that doesn't change the CacheIR.

On the next iteration, we bail out at the same unbox. This time, though, because licmState is BailedAndHitFallback, we treat it like a regular bailout. We increment the numFixableBailouts counter and bail out to blinterp. In blinterp, nothing gets updated, because we don't hit the unbox and have already folded in the new shape.

This continues until numFixableBailouts hits 9. This time, we pass in a new shape and don't trigger the unbox failure. We bail out. Normally, we would bail out, fold the shape into the getprop stub, and avoid invalidating. However, because we've been messing around with LICM bailouts, numFixableBailouts hits 10 and triggers an immediate invalidation. We fold the stub in blinterp, but there's no IonScript left to notify. When we recompile, the only thing that's changed is the number of shapes in the stub, which isn't included in the hash, so we trigger an assertion.

We did actually make progress here, so this isn't a real bailout loop. I don't think this should be a performance issue in practice. I haven't managed to figure out how the bug corresponds to the website's source code, although I can say that we inline this function:

function nextChar() { return this.currentChar = this.stream.getByte() }

inside this function:

function getObj() {
  for (var e = !1, t = this.currentChar;;) {
    if (t < 0) return n.EOF;
    if (e) 10 !== t && 13 !== t || (e = !1);
    else if (37 === t) e = !0;
    else if (1 !== w[t]) break;
    t = this.nextChar()
  }
  /* a bunch of code omitted here, but it doesn't include loops so I think it's mostly irrelevant? */
}

(both from the script here).

Not entirely sure what the best fix is. I'll have to think about it.

Blocks: sm-opt-jits
Severity: -- → S4
Priority: -- → P1

After thinking about this, I think the best fix is to just clear out the failed IC hash if we fold a transpiled stub and there's no IonScript.

The ultimate problem is that, when we bail out from an LICM-hoisted guard and check to see if we hit a fallback in baseline, we don't have sufficiently precise tracking to validate that the fallback corresponds to the failing guard. If we hit a different fallback, we get a false negative on the "was this bailout caused by LICM?" question. I was temporarily optimistic that we could check whether we hit a fallback in subsequent bailouts and deduce something from that, but upon reflection we expect to see the same thing in both the good and bad cases, so it doesn't work.

This situation is a bit awkward, but has limited downside: we end up with one extra cycle of invalidation/recompilation before we disable LICM. Given how rarely we expect to end up in this case, I think that's basically fine.

So the remaining problem is just that we report this as a bailout loop because we don't think we've made any progress, and we don't think we've made any progress because we save the hash before stub folding has a chance to reset the counter. If we just plug that hole, I think everything works out.

Assignee: nobody → iireland
Status: NEW → ASSIGNED

The patch here may need some help in landing.

Thanks -- I have hit the button. Fingers crossed it sticks :P

Pushed by mgaudet@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fb1823163a73 Clear IC hash when folding stub r=jandem
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch

The patch landed in nightly and beta is affected.
:iain, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox131 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(iireland)

Given comment 2: “This assertion doesn't affect correctness. It's just a way of catching potential performance problems due to unnecessary bailout loops..”
I guess this can wait if we are not confident.

Jan, what do you think, should this follow the train or be uplifted? (see comment 9)

Flags: needinfo?(iireland) → needinfo?(jdemooij)
Flags: needinfo?(jdemooij)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: