Behavior regression in z3 prover js demo

RESOLVED FIXED in Firefox 66

Status

()

defect
P2
normal
RESOLVED FIXED
5 months ago
5 months ago

People

(Reporter: bbouvier, Assigned: bbouvier)

Tracking

({regression})

unspecified
mozilla67
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox64- wontfix, firefox65- wontfix, firefox66+ fixed, firefox67 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

Assignee

Description

5 months ago
Posted file vendor.js

Reported on irc on the #wasm channel. Could make a reduced test case and I confirm it shows different behavior when run with --no-ion vs no flags.

See https://benj.me/pub/z3em.wasm for the wasm file (too big to put on Bugzilla).

JS file is made from Emscripten, and I adapted the initial test case so it appears in the lines at the bottom.

This is a regression from bug 1437065, so requesting tracking on all Firefox, even though it is not likely to make it to release.

Assignee

Updated

5 months ago
Summary: Behavior regression in einstein js demo → Behavior regression in z3 prover js demo
Assignee

Comment 1

5 months ago

Current investigation: the inlining of function with index 11874 is the cause of the error. Not inlining the function displays the expected output (which is "sat unsat" and not "unknown" in the results).

This function's signature is (i32, i32, i32, i32) -> void.

I'm now trying to see where the behavior diverges, by observing the passed arguments to the wasm function, when it's inlined. To do this, I've disassembled the wasm file into a wat file, added a new function for debugging, and re-assembled it. This is assuming that everything else works correctly in wasm...

This gives me a divergence at the 9064th call of this function (and this is reproducible, in both debug and debug-optimized modes, despite the non-deterministic Ion compilation), so I can edit the wat again and put an unreachable there that's going to be thrown for the 9064th call, so I can catch it in a rr debugger and see where the values come from.

That being said, rr is terribly slow on my machine, for some reason (gdb --args on the program takes less than a minute to finish, while running under rr record doesn't show any output after 5 minutes).

Assignee

Comment 2

5 months ago

(Forgot to add: I wonder if inlining just made the issue visible, and that there's another problem with an opcode implementation, either in the js jit or in wasm).

Too late for 65 at this point, but we can keep it on the radar for 66.

Assignee

Comment 4

5 months ago

Good news: I could have rr working.

However, what it showed is quite surprising. Although the wasm function is at some point directly called from an inline path in the JIT, this path isn't taken when the wasm function receives the incorrect value for the first time. Instead, the generic jit entry is used (so the one that's called by the jit using the generic MCall path).

Playing with IONFLAGS, I can see that there's an invalidation bailout just before this generic function call to the wasm function, so I suspect there might be something wrong there (the information the baseline reads from the stack seems to be incorrect). This is admittedly something that is overlooked in testing (viz. bailouts during calls to inlined wasm function calls); I assumed that tests would cover this properly.

Still need to investigate more to understand what's going on.

Assignee

Updated

5 months ago
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Assignee

Comment 5

5 months ago
Posted patch disable-fastpaths.patch (obsolete) — Splinter Review

This is not to be checked in as is, but we could have it with JitOptions or an env variable that controls all of this. I think this is sufficient to disable the 3 fast paths (wasm to jits; ion to wasm; jits to wasm; here jits means ion or js baseline).

Flags: needinfo?(lhansen)
Comment on attachment 9039758 [details] [diff] [review]
disable-fastpaths.patch

The patch makes my problem go away in bug 1521939, at least, so I'll make that bug block on this one.

getenv() is a problem for testing in the Android browser, from what I've been able to figure out so far, not just here of course but in general.  We should fix this because Android is important and not all bugs are reproducible in the shell and we eg want to disable poisoning for benchmarking nightly.  If we can't get getenv() to work then we probably want an option, javascript.options.environment, that carries eg a comma-separated list of name=value pairs with possibly quoted values, that has the same functionality.
Flags: needinfo?(lhansen)
Attachment #9039758 - Flags: feedback+

Updated

5 months ago
Blocks: 1521939
Priority: -- → P2
Assignee

Comment 7

5 months ago
Posted patch fix.patch (obsolete) — Splinter Review

It always has to be like this, a one-liner after one week of investigation :)

Thanks a lot nbp for the help provided on irc!

The spewing in bug 1523993 helped me figure out at which point the wasm call traces diverged. Diff'ing with the working mode, I could spot that one wasm call was repeated, while it should not be. An on-stack invalidation would cause a bailout, and because of the missing resumeAfter() call after inlining the wasm call, the wasm call would be repeated.

I've checked with the full test-case and this does solve it.

Attachment #9039758 - Attachment is obsolete: true
Attachment #9040462 - Flags: review?(nicolas.b.pierron)
Assignee

Updated

5 months ago
Attachment #9040462 - Attachment is obsolete: true
Attachment #9040462 - Flags: review?(nicolas.b.pierron)

Unfortunately this does not fix my problem on bug 1521939 :( so more work there...

Comment 10

5 months ago
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/49c7253d955d
Add a resume point after wasm calls inlined in Ion; r=nbp
Assignee

Updated

5 months ago
No longer blocks: 1521939
Assignee

Comment 11

5 months ago

Comment on attachment 9040478 [details]
Bug 1522458: Add a resume point after wasm calls inlined in Ion; r?nbp

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1437065

User impact if declined

Incorrect behavior of WebAssembly programs based on optimizations.

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Only one line changed. (Would be nice to have in release as a follow-along, if possible; doesn't justify a chemspill though).

String changes made/needed

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration

User impact if declined

Fix Landed on Version

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

String or UUID changes made by this patch

GeckoView Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for consideration

User impact if declined

Fix Landed on Version

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

String or UUID changes made by this patch

Attachment #9040478 - Flags: approval-mozilla-release?
Attachment #9040478 - Flags: approval-mozilla-geckoview65?
Attachment #9040478 - Flags: approval-mozilla-geckoview64?
Attachment #9040478 - Flags: approval-mozilla-esr60?
Attachment #9040478 - Flags: approval-mozilla-beta?

Comment 12

5 months ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Flags: in-testsuite+

Comment on attachment 9040478 [details]
Bug 1522458: Add a resume point after wasm calls inlined in Ion; r?nbp

Issue has been around since 63, so canceling the ESR60 request since it shouldn't be needed there. Also, I'm not seeing the urgency to uplift this to 65 in a dot release either nor to the GeckoView relbranches which are mostly EOL at this point. Will leave the Beta decision to Liz since that's her release.

Attachment #9040478 - Flags: approval-mozilla-release?
Attachment #9040478 - Flags: approval-mozilla-release-
Attachment #9040478 - Flags: approval-mozilla-geckoview65?
Attachment #9040478 - Flags: approval-mozilla-geckoview65-
Attachment #9040478 - Flags: approval-mozilla-geckoview64?
Attachment #9040478 - Flags: approval-mozilla-geckoview64-
Attachment #9040478 - Flags: approval-mozilla-esr60?

Yup looks like ESR60 is not affected. I was just asking cpeterson about the geckoview flags. We're actually removing those from the approval request form soon.

Comment on attachment 9040478 [details]
Bug 1522458: Add a resume point after wasm calls inlined in Ion; r?nbp

Fix for WASM, includes new tests.
OK to uplift for Monday's beta 6 build.

Attachment #9040478 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.