Closed Bug 1735157 Opened 4 months ago Closed 3 months ago

Assertion failure: cx_->hadNondeterministicException(), at js/src/jit/WarpOracle.cpp:188

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
95 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- wontfix
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- verified

People

(Reporter: decoder, Assigned: iain)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: assertion, regression, testcase, Whiteboard: [bugmon:update,bisected,confirmed])

Attachments

(3 files)

The following testcase crashes on mozilla-central revision 20211011-27315087fdc5 (debug build, run with --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off --ion-warmup-threshold=0 --baseline-eager):

const alphabet = [
  "a", "b", "c", "d", "e", "f", "g", "h", /x/,
  "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
  "v", "w", "x", "y", "z"
];
function getRandomStringArray(size) {
    for (let i76 = 0; i76 < size; ++i76) {
        let value = "";
        value += alphabet[Math.random() * 26 | 0];
    }
}
getRandomStringArray(512);

Backtrace:

received signal SIGSEGV, Segmentation fault.
0x0000555557787dc8 in js::jit::WarpOracle::createSnapshot() ()
#0  0x0000555557787dc8 in js::jit::WarpOracle::createSnapshot() ()
#1  0x0000555557a48866 in js::jit::CreateWarpSnapshot(JSContext*, js::jit::MIRGenerator*, JS::Handle<JSScript*>) ()
#2  0x0000555557a29892 in js::jit::Compile(JSContext*, JS::Handle<JSScript*>, js::jit::BaselineFrame*, unsigned char*) ()
#3  0x0000555557a2a4b9 in IonCompileScriptForBaseline(JSContext*, js::jit::BaselineFrame*, unsigned char*) ()
#4  0x0000555557a2ac8a in js::jit::IonCompileScriptForBaselineOSR(JSContext*, js::jit::BaselineFrame*, unsigned int, unsigned char*, js::jit::IonOsrTempData**) ()
#5  0x000033c1c803c727 in ?? ()
[...]
#9  0x0000000000000000 in ?? ()
rax	0x555555736c15	93824994208789
rbx	0x0	0
rcx	0x55555815cf10	93825038405392
rdx	0x0	0
rsi	0x7ffff7105770	140737338431344
rdi	0x7ffff7104540	140737338426688
rbp	0x7fffffffb370	140737488335728
rsp	0x7fffffffb2d0	140737488335568
r8	0x7ffff7105770	140737338431344
r9	0x7ffff7f99840	140737353717824
r10	0x0	0
r11	0x0	0
r12	0x7ffff6018dd0	140737320685008
r13	0x7fffffffb390	140737488335760
r14	0x7ffff60c3628	140737321383464
r15	0xcd8f6adf	3448728287
rip	0x555557787dc8 <js::jit::WarpOracle::createSnapshot()+1528>
=> 0x555557787dc8 <_ZN2js3jit10WarpOracle14createSnapshotEv+1528>:	movl   $0xbc,0x0
   0x555557787dd3 <_ZN2js3jit10WarpOracle14createSnapshotEv+1539>:	callq  0x555556b12f6e <abort>

I tried quite some things to replace the Math.random call in this test (even tried some custom PRNGs and tried to find seeds that reproduce, but no luck). Hence this test is slightly intermittent.

Attached file Testcase

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20211011152735-5640485486b0.
The bug appears to have been introduced in the following build range:

Start: f4af0087a1b49c221f54143a10b7bebca35db49c (20210111195436)
End: febd0fad07331284c49334bab4d9c653f2c80275 (20210111195806)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f4af0087a1b49c221f54143a10b7bebca35db49c&tochange=febd0fad07331284c49334bab4d9c653f2c80275

Whiteboard: [bugmon:update,bisect] → [bugmon:update,bisected,confirmed]

Iain, would you be able to reproduce in rr, and investigate what might be going wrong?

Blocks: 1729509
Severity: -- → S4
Flags: needinfo?(iireland)
Priority: -- → P2

Yep, already looking at this.

Flags: needinfo?(iireland)

Oh, this is a fascinating one.

The issue is that we have an IC that has a small random chance of failing. In this case, it's the JSOp::Add in value += alphabet[Math.random() * 26 | 0], and it fails when we randomly select /x/ instead of a string. If we warp-compile before we see the failure case, then we generate a warp script with an inlined call to Math.random. When that inlined call returns index 8, then we bail out, but we don't rewind Math.random. After the bailout, we generate a new random index, which means we don't hit the failure case. When we recompile (which in this case happens immediately, because of --ion-warmup-threshold=0), we have the same CacheIR.

My first thought was that this was only a problem for the bailout detection code, but on second thought I don't think that's right. This can be user-visible for code that bails out using resume points that skip backwards past low-probability Math.random calls. If we compile without ever seeing the low-probability case, then bail out and recompile every time the random branch triggers, then the measured frequency of the branch could be lower than expected. For example:

const EPSILON = 0.00001;
const N = 1_000_000;

print("Expected: " + EPSILON * N);

var unbiased = 0;
for (var i = 0; i < N; i++) {
    // This doesn't bail out.
    unbiased += (Math.random() < EPSILON) | 0
}
print("Unbiased: " + unbiased);

var arr = ["", {toString: () => "x"}];
var s = "";
for (var i = 0; i < N; i++) {
    // If we access the second element, we bail out and get a new random number.
    s += arr[(Math.random() < EPSILON) | 0];
}
print("Biased: " + s.length)

Running without any options, this ~consistently gives me "Biased: 0", implying that we've bailed out and swallowed several failures. The effect goes away when EPSILON is large enough.

Right now MRandom is recoverable unless differential testing is enabled. We may need to add code to track whether a compiled script inlined a call to Math.random and subsequently bailed out, and disable recovery of Math.random calls in that case.

If we bail out of Warp based on the result of an inlined call to Math.random and then resume prior to the call, the RNG state will be updated and we won't generate the same value when we resume in baseline. This can lead to repeated bailouts and in extreme cases skew the distribution of random numbers. The most straightforward fix is to make MRandom effectful, with a new alias set representing the RNG state. Since MRandom already isn't movable, I think this doesn't change much; there are only a few ops with AliasSet::Load(AliasSet::Any) that can no longer be moved past MRandom.

Assignee: nobody → iireland
Status: NEW → ASSIGNED

:iain, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(iireland)
Flags: needinfo?(iireland)
Regressed by: 1673497
Has Regression Range: --- → yes
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch

Bugmon Analysis
Verified bug as fixed on rev mozilla-central 20211018214442-3b1b07d0c956.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon

Set release status flags based on info from the regressing bug 1673497

You need to log in before you can comment on or make changes to this bug.