Closed Bug 1942881 (CVE-2025-1934) Opened 11 months ago Closed 10 months ago

Assertion failure: !cx->suppressGC, at js/src/vm/Interpreter.cpp:463

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
137 Branch
Tracking Status
firefox-esr115 - wontfix
firefox-esr128 136+ fixed
firefox135 --- wontfix
firefox136 + fixed
firefox137 + fixed

People

(Reporter: sm-bugs, Assigned: iain)

References

(Blocks 2 open bugs)

Details

(Keywords: csectype-jit, reporter-external, sec-moderate, Whiteboard: [adv-main136+][adv-esr128.8+])

Attachments

(3 files)

Steps to reproduce:

Version: d55e89d48a8053ce45a74b0ec92c0ff6a9dcc43d
Args: js --fuzzing-safe --fast-warmup <test-case>
Test case:

for (a = 0; ; a++) {
    for (b = true; b; b = !inIon())
        c = 0
    function d() {
        if (c++ < 10)
            interruptIf(true)
        return true
    }
    setInterruptCallback(d)
    d()
    const e = "".substring().match("0*[]")
    if (a >= 50)
        e.f
}

Actual results:

Assertion failure: !cx->suppressGC, at /home/user/fuzzilli-ng/targets/spidermonkey/src/js/src/vm/Interpreter.cpp:463

#0 0x55b27a2da39c in js::RunScript(JSContext*, js::RunState&) js/src/vm/Interpreter.cpp:463:3
#1 0x55b27a2db75b in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) js/src/vm/Interpreter.cpp:660:13
#2 0x55b27a2dd84a in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) js/src/vm/Interpreter.cpp:727:8
#3 0x55b27a49fa9f in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) js/src/vm/CallAndConstruct.cpp:55:10
#4 0x55b27a1ec3a2 in ShellInterruptCallback(JSContext*) js/src/shell/js.cpp:1158:16
#5 0x55b27a834dad in InvokeInterruptCallbacks(JSContext*) js/src/vm/Runtime.cpp:393:10
#6 0x55b27a834dad in HandleInterrupt(JSContext*, bool) js/src/vm/Runtime.cpp:430:12
#7 0x55b27a834dad in JSContext::handleInterrupt() js/src/vm/Runtime.cpp:502:12
#8 0x55b27a82b2b3 in js::RegExpShared::execute(JSContext*, JS::MutableHandle<js::RegExpShared*>, JS::Handle<JSLinearString*>, unsigned long, js::VectorMatchPairs*) js/src/vm/RegExpObject.cpp:804:14
#9 0x55b27a2d088a in ExecuteRegExpImpl(JSContext*, js::RegExpStatics*, JS::MutableHandle<js::RegExpShared*>, JS::Handle<JSLinearString*>, unsigned long, js::VectorMatchPairs*) js/src/builtin/RegExp.cpp:330:7
#10 0x55b27a2d088a in ExecuteRegExp(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSString*>, int, js::VectorMatchPairs*) js/src/builtin/RegExp.cpp:1432:7
#11 0x55b27a2b999c in RegExpMatcherImpl(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSString*>, int, JS::MutableHandle<JS::Value>) js/src/builtin/RegExp.cpp:1454:7
#12 0x55b27b31dc54 in js::jit::RRegExpMatcher::recover(JSContext*, js::jit::SnapshotIterator&) const js/src/jit/Recover.cpp:1834:8
#13 0x55b27b9dea6c in js::jit::SnapshotIterator::computeInstructionResults(JSContext*, js::jit::RInstructionResults*) const js/src/jit/JitFrames.cpp:2270:29
#14 0x55b27b9dc9a9 in js::jit::SnapshotIterator::initInstructionResults(js::jit::MaybeReadFallback&) js/src/jit/JitFrames.cpp:2223:12
#15 0x55b27b27a2e2 in js::jit::BailoutIonToBaseline(JSContext*, js::jit::JitActivation*, js::jit::JSJitFrameIter const&, js::jit::BaselineBailoutInfo**, js::jit::ExceptionBailoutInfo const*, js::jit::BailoutReason) js/src/jit/BaselineBailouts.cpp:1646:17
#16 0x55b27b279975 in js::jit::Bailout(js::jit::BailoutStack*, js::jit::BaselineBailoutInfo**) js/src/jit/Bailouts.cpp:146:7
#17 0x3e98d04073e1  (<unknown module>)
Blocks: 1903968
Group: firefox-core-security → core-security
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Version: Firefox 133 → Trunk
Group: core-security → javascript-core-security

Steve, any idea why we have not caught this one with some static analysis?

The problem is that when we execute recover instructions, we disable GC execution for simplicity, and running regexp execution in recover instruction trigger the interruption callback checks which can re-enter JS … and as such want to execute GCs.

This is bad in terms of JavaScript semantic, but this is nothing new from the interrupt callback.
However, I have no idea of the full spectrum of consequences of executing JS code within a bailout given that the stack is not fully recovered yet.

Blocks: sm-opt-jits
Severity: -- → S2
Flags: needinfo?(sphink)
Flags: needinfo?(iireland)
Priority: -- → P1

downgrading to S3, given that outside the JS shell and the chrome of Firefox, nobody has control over the interrupt handler code.

edit: Hum … do we still have nested event loops introduced by interruption callbacks?

Severity: S2 → S3

Hmm. Fuzz testcases that depend on running arbitrary code in the interrupt handler are usually not that exciting, but this one might be a little bit more interesting, because it's not really doing anything particularly fancy. I think you could probably trigger this bug in the browser, at least to the point of running the interrupt handler mid-bailout.

It's not immediately obvious to me what the best fix is. In general it's important for regexp execution to be interruptible, because it's pretty easy to write a regexp that does pathological backtracking and needs to be interrupted. We need to handle interrupts to remain responsive. But then the existence of RRegExpMatcher implies that we have to support interrupts in the middle of a bailout.

I wonder if the best fix is just to stop recovering RegExpMatcher. Is it important in practice? How often do we evaluate a regexp without using it?

On the other hand: can our real interrupt handlers trigger a GC? If they can't, then maybe this isn't a problem in practice. We keep seeing interrupt-related bugs where we get in unclear amounts of trouble because it's not clear what interrupt handlers can/can't do, and it's hard to enforce. Unfortunately, the real interrupt handlers in the browser look too complicated for us to eg move them inside SM.

Flags: needinfo?(iireland)

(In reply to Iain Ireland [:iain] from comment #3)

I wonder if the best fix is just to stop recovering RegExpMatcher. Is it important in practice? How often do we evaluate a regexp without using it?

We've had similar issues with it before (see bug 1921215 comment 7 for instance) and it is a bit surprising to execute all of this code during a bailout.

On the other hand: can our real interrupt handlers trigger a GC?

Yes we use interrupts to trigger a GC in some places. In this case we're executing JS while suppressing GC which is a bad idea in general.

If we knew this was short-running code we could suppress interrupts, but regular expressions can be very slow so we don't want to do that here.

IMO the right fix is to either stop suppressing GC here (probably too difficult and not worth it) or we should stop running code that is potentially slow and needs interrupt support (this recover instruction).

The interrupt callback is already known to mess-up with JS semantic, an alternative would be to check if we are running under a recover instruction and instead of interrupting the execution we shorten it and return a dummy/wrong value. The interrupt callback can then be called once we finished the bailout.

There's a lot unknown so we'll call it sec-moderate for now, but if it can be demonstrated in Firefox itself we'll raise it to sec-high unless we can show this is harmless.

Keywords: sec-moderate

Based on some quick testing, it doesn't look like RRegExpMatcher is ever used in Jetstream or Speedometer. I added a crash in MRegExpMatcher::writeRecoverData, and it didn't fire in jetstream locally, or in a SP3 run on try (including the PGO profiling run). That makes sense; in general, I would not expect code to evaluate a regexp match and ignore the result. So the value of this optimization is very low, and the simplest fix is probably to remove it.

Here's a cleaned-up testcase:

// |jit-test| --no-threads; --fast-warmup

function handler() { return true; }
setInterruptCallback(handler);

function foo(i) {
  let e = "12a".match("2*[a]");
  interruptIf(true);
  if (i >= 50) {
    return e[0]; // Bail out.
  }
}
for (var i = 0; i < 100; i++) {
  foo(i);
}

I initially thought it would be very difficult to make this trigger in the browser, because you need the interrupt and the bailout to line up, but on reflection, it's actually pretty straightforward:

function foo(bailout, s) {
  let e = s.match("(a+)*b"); // exponential backtracking 
  if (bailout) {
    return e[0];
  }
}

// Ion-compile foo with the regexp pushed to
// the bailout path
with ({}) {}
for (var i = 0; i < 5000; i++) {
  foo(false, "ab");
}

// Trigger a bailout with a pathological input.
// This will run for a long time.
// Wait for an interrupt to happen naturally.
foo(true, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac");

So the big question is whether triggering the Gecko interrupt handlers while we're in the middle of bailing out is a problem. My instinct is that it's probably okay in practice, but it's really hard to prove that.

Flags: needinfo?(sphink)
Assignee: nobody → iireland
Status: NEW → ASSIGNED
Flags: sec-bounty?

(In reply to Iain Ireland [:iain] from comment #7)

So the big question is whether triggering the Gecko interrupt handlers while we're in the middle of bailing out is a problem. My instinct is that it's probably okay in practice, but it's really hard to prove that.

The problem is that the interruption callback can be used to start a nested event loop. It can GC, which is the problem reported here, but the worrying part to me was the unknown around the stack walking logic as we never considered that possible during the design of recover instructions and during the design of bailout logic.

Recover instructions can be executed either due to stack introspection (ahead of frame invalidation) or during bailouts. I am not worried about the introspection case as this is triggered within a ordinary function call while resolving what the stack looks like. However the bailout and exception cases are more worrying as we do not follow the logic we have with exit frames.

Reading the code, all functions which are reconstructing the stack are making use of BailoutFrameInfo which happen to set the bailoutData_ field. For example, cases such as JS::ProfilingFrameIterator::iteratorConstruct() should probably not check explicitly for exit frame and use a wrapper which handles both exit frame and bailing frames. I suspect we might have similar cases elsewhere.

(In reply to Nicolas B. Pierron [:nbp] from comment #1)

Steve, any idea why we have not caught this one with some static analysis?

The problem is that when we execute recover instructions, we disable GC execution for simplicity, and running regexp execution in recover instruction trigger the interruption callback checks which can re-enter JS … and as such want to execute GCs.

Or the interrupt callback could directly do a GC without running any JS.

We don't have a static analysis for this. Suppressing the GC makes the hazard analysis happy; it knows it doesn't need to worry about a GC happening, so it doesn't bother to look for unrooted pointers. And in fact, I think GC is not a problem here -- if you try to allocate something, you'll get an OOM instead of a GC. I don't know how this code handles an OOM, but in theory it shouldn't break anything.

So I think the problem here is running JS, not doing a GC.

Is there something for a static analysis to check here? Ordinarily, I would say that you could drop in an AutoCheckCannotGC into Bailout or something, but the hazard analysis will (correctly) say that you've suppressed GC so there's no problem. Even if you put it deeper in the stack, eg SnapshotIterator::computeInstructionResults, the hazard analysis will not be impressed -- it'll say that since that function is only called within the RAII scope of a GC suppression, there's still no problem.

We could add an AutoCheckCannotRunJS that forbids running JS even with GC suppressed. I've long been meaning to complete the "can run JS" analysis. Usually, "can GC" and "can run JS" are close enough that it doesn't matter.

But to use that here, we'd need to fix this case first.

Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 137 Branch
Flags: sec-bounty? → sec-bounty+
Keywords: csectype-jit

The code that was removed has been around a while so the two ESRs are likely affected.

Not convinced this is worth an ESR115 backport given sec-moderate and S3, but it'll need a rebased patch if the answer is yes.

The patch landed in nightly and beta is affected.
:iain, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox136 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(iireland)

Comment on attachment 9462669 [details]
Bug 1942881: Remove RRegExpMatcher r=jandem

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: It is possible to trigger the browser's interrupt handler while we're in the middle of bailing out. We're not aware of any specific exploit that this enables, but we also can't prove that one doesn't exist. Bailouts are delicate, and normally we can't GC / run script in the middle.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This just deletes a performance optimization that almost never kicks in anyway.
  • String changes made/needed: None
  • Is Android affected?: Yes
Flags: needinfo?(iireland)
Attachment #9462669 - Flags: approval-mozilla-beta?

Comment on attachment 9462669 [details]
Bug 1942881: Remove RRegExpMatcher r=jandem

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's a sec-moderate because we're unclear about the exploitability. The patch is very safe, though, so maybe it's better safe than sorry.
  • User impact if declined: Maybe some clever attacker figures out something clever and this does turn out to be exploitable.
  • Fix Landed on Version: 137
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch deletes an optimization.
Attachment #9462669 - Flags: approval-mozilla-esr128?

Comment on attachment 9462669 [details]
Bug 1942881: Remove RRegExpMatcher r=jandem

Approved for 136.0b4

Attachment #9462669 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-

Comment on attachment 9462669 [details]
Bug 1942881: Remove RRegExpMatcher r=jandem

Approved for 128.8esr

Attachment #9462669 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+
Whiteboard: [adv-main136+][adv-esr128.8+]
Attached file advisory.txt
Alias: CVE-2025-1934
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: