Closed Bug 1401014 Opened 2 years ago Closed 2 years ago

JIT reorders assignment statement, causing logic errors


(Core :: JavaScript Engine: JIT, defect, P1)




Tracking Status
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 + fixed


(Reporter: mcav, Assigned: tcampbell)



(2 files)

Attached image debugger-screenshot.png
In [1], Firefox's JIT appears to reorder statements in a way that causes logic errors.

Simplified Example

The error occurs in a function structured like this:

function f(obj) {
    var x = obj.x; // obj.x is non-null
    var y = obj.y;
    var z = obj.y = obj.x; // ERROR: obj.x is null here.
    obj.x = null;

(This is a simplified example, not a test case.) In this example, `z` and `x` should be the same non-null value. But with the JIT enabled, `obj.x = null` appears to be swapped with the previous line, assigning `null` to `obj.x` before the previous statement has executed.

In-Vitro Reproduction

A screenshot of the debugger (attached) shows the logic error in practice, on real code. Per the annotations, vars `window.a` and `b` should be identical, but they are not. (Please excuse the noisiness of the screenshot.)

This issue surfaced within a JS library, MobX, and the thread tracking their descriptions can be found here[1]. This error has been reproduced by several different people, as noted in the thread.

Test Case

Unfortunately, we have been unable to isolate the issue to a minimal test case. I've described repro steps in [1] that involve downloading a project and manually triggering the case, but reproducibility seems to depend on the optimizations being triggered in the JS engine.

A commenter has noted that this bug does not occur when `javascript.options.baselinejit` is disabled.

Perhaps someone on the JIT team would have insight as to how to build a test case that invokes the JIT deterministically (or simply insight as to what might cause this reordering).

I'm unable to build the sensor web branch. Are you able to email me a tarball of the build output when it fails for you?
Flags: needinfo?(m)
Managed to reproduce now. Disabling baseline makes it go away.
Assignee: nobody → tcampbell
Flags: needinfo?(m)
Priority: -- → P2
Would rewriting the nested assignment as 

var z = obj.x // (or y)
obj.y = z

Prevent the issue?
(In reply to mweststrate from comment #3)
> Would rewriting the nested assignment as 
> ```
> var z = obj.x // (or y)
> obj.y = z
> ```
> Prevent the issue?

No, I tried that but it did not help.
Note that disabling baseline disables Ion as well. Asmjs and WASM should be unaffected, but those probably aren't relevant here, so it effectively disables the JITs altogether. It might be interesting to try with javascript.options.ion set to false to see if the baseline compiler is implicated.
Thanks for the bug report. Let's get this fixed in 57.
Regression range was a busy day for the JIT...

The javascript.options.ion flag fixes, so it is likely an Ion problem. I'm trying to reduce and capture a reduced case.
Consider the following simplification:

> function f(obj) {
>     var z = obj.x;
>     obj.x = null;
>     bailout();
> }

If |IonBuilder::setPropTryInlineAccess| is used to emit |obj.x = null|, a resume point does not get generated. As a result, when bailout-to-Baseline occurs (in the real case, due to a rangecheck) the execution will resume from the start of function f. This requires z to reload obj.x which has since been clobbered.

I'll add the missing resume point and hopefully this doesn't regress any perf.
Ignorant and just curious question (feel free to ignore). Do I understand it correctly that this relates to error handling / recovery (given the bailout?). Cause not everybody using the library with FF seems to be running into the original issue. If it relates to error handling that would explain why.
The term "bailout" inside SpiderMonkey refers de-optimizing the JIT code to something more general case after detecting tricky cases (like exceptions). In this sensorweb case the observing array needed to perform an allocation to resize (it was lazy before) triggering the de-optimization.

One reason that others may not experience is if they messing with the |derivation| object. We only hit this broken code path because we were able to do a lot of analysis about |derivation| and implement it in an efficient way. There are many reasons that would prevent us from optimization the memory layout of |derivation| and as such wouldn't hit this bug.
Bug is that there is a missing resume point, resulting in statements re-executing even after their RHS has been clobbered. This patch adds the missing resume point. This issue goes back to FF49 at least.

We should land this for 57 and backport to ESR-52 since js frameworks hit this in the wild.

(Sorry, the test case is pretty terrible :(
Priority: P2 → P1
Comment on attachment 8910120 [details]
Bug 1401014 - Fix resume point in IonBuilder::setPropTryInlineAccess

Good catch, thanks for fixing.

I wonder if we should MOZ_ASSERT_IF(ins->isEffectful(), ins->resumePoint()) somewhere (there might be some false positives we have to whitelist). Please file a follow-up bug if you agree.
Attachment #8910120 - Flags: review?(jdemooij) → review+
(In reply to Ted Campbell [:tcampbell] from comment #12)
> We should land this for 57 and backport to ESR-52 since js frameworks hit
> this in the wild.

Note, IonBuilder.cpp changed to use MOZ_TRY in 53, you will have to make a special patch for esr52.
Pushed by
Fix resume point in IonBuilder::setPropTryInlineAccess r=jandem
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Too late for 56. Mass won't fix for 56.
You need to log in before you can comment on or make changes to this bug.