Closed Bug 1596499 Opened 4 years ago Closed 4 years ago

wasm baseline assertion errors and test failures with C++17

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: froydnj, Assigned: rhunt)

References

Details

Attachments

(1 file)

I'm seeing a number of errors like the following on SM compacting builds with clang 7 and C++17:

Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861
Exit code: -11
FAIL - debug/Script-format-01.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/debug/Script-format-01.js | Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861 (code -11, args "") [0.4 s]
INFO exit-status     : -11
INFO timed-out       : False
INFO stderr         2> Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861
Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861
Exit code: -11
FAIL - debug/Script-format-01.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/debug/Script-format-01.js | Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861 (code -11, args "--baseline-eager") [0.3 s]
INFO exit-status     : -11
INFO timed-out       : False
INFO stderr         2> Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861
Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861
Exit code: -11
FAIL - debug/Script-format-01.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/debug/Script-format-01.js | Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861 (code -11, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.4 s]
INFO exit-status     : -11
INFO timed-out       : False
INFO stderr         2> Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:861

I see similar errors in the SM ARM64 job, only there are a lot more of them.

There's also things like, in the sm-plain job:

Exit code: -11
FAIL - wasm/baseline-opt.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/baseline-opt.js | Unknown (code -11, args "--wasm-compiler=baseline") [0.1 s]
INFO exit-status     : -11
INFO timed-out       : False
Exit code: -11
FAIL - wasm/baseline-opt.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/baseline-opt.js | Unknown (code -11, args "--test-wasm-await-tier2") [0.1 s]
INFO exit-status     : -11
INFO timed-out       : False
TEST-PASS | js/src/jit-test/tests/wasm/baseline-opt.js | Success (code 0, args "--disable-wasm-huge-memory") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "") [0.2 s]
TEST-PASS | js/src/jit-test/tests/v8-v5/check-splay.js | Success (code 0, args "--no-blinterp --no-baseline --no-ion --more-compartments") [1.9 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--baseline-eager") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--no-blinterp --no-baseline --no-ion --more-compartments") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--blinterp-eager") [0.3 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--wasm-compiler=ion") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.3 s]
Exit code: -11
FAIL - wasm/basic.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/basic.js | Unknown (code -11, args "--wasm-compiler=baseline") [0.2 s]
INFO exit-status     : -11
INFO timed-out       : False
Exit code: -11
FAIL - wasm/basic.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/basic.js | Unknown (code -11, args "--test-wasm-await-tier2") [0.2 s]
INFO exit-status     : -11
INFO timed-out       : False
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.4 s]
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/basic.js | Success (code 0, args "--disable-wasm-huge-memory") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "--baseline-eager") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "--no-blinterp --no-baseline --no-ion --more-compartments") [0.2 s]
Exit code: -11
FAIL - wasm/bce.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/bce.js | Unknown (code -11, args "--wasm-compiler=baseline") [0.2 s]
INFO exit-status     : -11
INFO timed-out       : False
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "--blinterp-eager") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "") [0.1 s]
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "--wasm-compiler=ion") [0.3 s]
Exit code: -11
FAIL - wasm/bce.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/bce.js | Unknown (code -11, args "--test-wasm-await-tier2") [0.2 s]
INFO exit-status     : -11
INFO timed-out       : False
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.1 s]
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "--disable-wasm-huge-memory") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.1 s]
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "--baseline-eager") [0.2 s]
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "--no-blinterp --no-baseline --no-ion --more-compartments") [0.1 s]
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "--blinterp-eager") [0.1 s]
TEST-PASS | js/src/jit-test/tests/wasm/big-resize.js | Success (code 0, args "--wasm-compiler=ion") [0.1 s]
TEST-PASS | js/src/jit-test/tests/wasm/bce.js | Success (code 0, args "--ion-eager --ion-offthread-compile=off --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [0.5 s]
Exit code: -11
FAIL - wasm/big-resize.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/big-resize.js | Unknown (code -11, args "--test-wasm-await-tier2") [0.1 s]
INFO exit-status     : -11
INFO timed-out       : False
Exit code: -11
FAIL - wasm/big-resize.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/wasm/big-resize.js | Unknown (code -11, args "--wasm-compiler=baseline") [0.2 s]
INFO exit-status     : -11
INFO timed-out       : False

and some random ion failures as well.

Full logs can be seen at https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6cbe22a4172905aba69c1aab751453055b3dc55 (please ignore all the windows failures...)

Blocks: cxx17

Extremely worrisome. Nathan, are you trying to track this down or do we need to find somebody with time on their hands? Time is very short now.

Flags: needinfo?(nfroyd)
Priority: -- → P1

I'm happy to volunteer if nobody else has time (but don't make me your first choice since it will take me some time to pull out the linux tools).

I'll try to take a look at this today.

Assignee: nobody → rhunt

Wasn't able to reproduce the issue on my linux desktop with that branch. Tried debug ARM64-sim and debug-plain.

I based a logging commit on that branch and submitted a try run [1].

The interesting part seems to be here [2]: (looking at linux x64-debug failure)

] knownGPRs: rbx rsi r8 r9 r11 r12 r13 
] knownGPRs: 3b48
] allGPRs: rax 
] allGPRs: 1
] knownFPUs: %xmm4.s %xmm5.s %xmm7.s %xmm8.s %xmm10.s %xmm11.s %xmm12.s %xmm14.s %xmm1.d %xmm7.d %xmm10.d %xmm12.d %xmm13.d %xmm14.d %xmm15.d %xmm0.i4 %xmm1.i4 %xmm2.i4 %xmm3.i4 %xmm4.i4 %xmm5.i4 %xmm6.i4 %xmm7.i4 %xmm8.i4 %xmm9.i4 %xmm10.i4 %xmm11.i4 %xmm12.i4 %xmm13.i4 %xmm14.i4 
] allFPUs: %xmm4.s %xmm7.s %xmm9.s %xmm11.s %xmm12.s %xmm13.s %xmm14.s %xmm15.s %xmm0.d %xmm1.d %xmm3.d %xmm5.d %xmm6.d %xmm7.d %xmm10.d %xmm12.d %xmm13.d %xmm14.d %xmm15.d %xmm0.i4 %xmm1.i4 %xmm2.i4 %xmm3.i4 %xmm4.i4 %xmm5.i4 %xmm6.i4 %xmm7.i4 %xmm8.i4 %xmm9.i4 %xmm10.i4 %xmm11.i4 %xmm12.i4 %xmm13.i4 %xmm14.i4 
] Assertion failure: knownGPR_.bits() == ra.allGPR.bits(), at /builds/worker/workspace/build/src/js/src/wasm/WasmBaselineCompile.cpp:896
] Exit code: -11
] FAIL - debug/Script-format-01.js
] TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/debug/Script-format-01.js | knownGPRs: rbx rsi r8 r9 r11 r12 r13  (code -11, args "") [0.1 s]

Somehow, the allGPRs bit flag is being reset to '0x1'. I haven't looked at the code yet to determine if this is possible or implies a miscompilation.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=fcae8e6369ece4cf82eb89aee67b20cc1cb9c3d2&selectedJob=276509640
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276509640&repo=try&lineNumber=69822

Does it repro if you take the js binary from Taskcluster and run it locally under rr?

Good question, I'll try that today.

Taking the binary from Taskcluster and running it locally with rr works!

It appears that BaseRegAlloc isn't being initialized in the BaseCompiler() constructor. Setting a data watchpoint on any member of it and running from the beginning of BaselineCompileFunctions to when the segfault happens (which is after the constructor) shows no writes. I can confirm BaseCompiler() is being called correctly and can step through it. Viewing the disassembly of BaseCompiler() shows no reference/call to a BaseRegAlloc(). Looking at the symbol table [1] I don't see any constructor function making me think it's inlined.

Because BaseCompiler is allocated on the stack, that means the values of BaseRegAlloc are garbage stack values from whatever calls happened before BaselineCompileFunctions.

I don't know why BaseRegAlloc's members wouldn't be initialized. It requires a non-trivial constructor [2], that is called by the non-trivial constructor of BaseCompiler, which is run [3].

[1]

rhunt@rhunt-desktop-arch ~/s/m/artifact> nm js | grep 'BaseRegAlloc'
0000000000f26fc0 t _ZN2js4wasm12BaseRegAlloc11freeTempPtrENS0_6RegPtrEb
0000000000f271a0 t _ZN2js4wasm12BaseRegAlloc11needTempPtrENS0_6RegPtrEPb
0000000000f1bd70 t _ZN2js4wasm12BaseRegAlloc7needF32ENS0_6RegF32E
0000000000f1bc90 t _ZN2js4wasm12BaseRegAlloc7needF32Ev
0000000000f1bef0 t _ZN2js4wasm12BaseRegAlloc7needF64ENS0_6RegF64E
0000000000f1be00 t _ZN2js4wasm12BaseRegAlloc7needF64Ev
0000000000f1b800 t _ZN2js4wasm12BaseRegAlloc7needI32ENS0_6RegI32E
0000000000f1b730 t _ZN2js4wasm12BaseRegAlloc7needI32Ev
0000000000f1b960 t _ZN2js4wasm12BaseRegAlloc7needI64ENS0_6RegI64E
0000000000f1b890 t _ZN2js4wasm12BaseRegAlloc7needI64Ev
0000000000f1bac0 t _ZN2js4wasm12BaseRegAlloc7needPtrENS0_6RegPtrE
0000000000f1b9f0 t _ZN2js4wasm12BaseRegAlloc7needPtrEv
0000000000f1bf80 t _ZN2js4wasm12BaseRegAlloc9LeakCheckD1Ev
0000000000f1bf80 t _ZN2js4wasm12BaseRegAlloc9LeakCheckD2Ev
0000000000f4d410 t _ZN2js4wasm14BaseStackFrame10zeroLocalsEPNS0_12BaseRegAllocE

[2] https://searchfox.org/mozilla-central/rev/524bed6dfbc5ae21c62632d83b7573448b29e0ac/js/src/wasm/WasmBaselineCompile.cpp#642
[3] https://searchfox.org/mozilla-central/rev/524bed6dfbc5ae21c62632d83b7573448b29e0ac/js/src/wasm/WasmBaselineCompile.cpp#12674

(I just discovered that WinDbgX can open ELF files, this is amazing!)

Looking at the disassembly of js!js::wasm::BaseCompiler::BaseCompiler, I can see it initializing fields in order as expected, until it reaches:

00000000`0103a0b1 4c8993d0030000  mov     qword ptr [rbx+3D0h],r10
00000000`0103a0b8 4c899308040000  mov     qword ptr [rbx+408h],r10

It leaves the region between 0x3d0 and 0x408 untouched, and that's exactly where ra lives:

0:000> dt js!js::wasm::BaseCompiler
...
   +0x3d0 masm             : Ptr64 js::jit::MacroAssembler
   +0x3d8 ra               : js::wasm::BaseRegAlloc
   +0x408 fr               : js::wasm::BaseStackFrame

Are we running into issues about how one can or cannot refer to this during constructors? See e.g.

http://eel.is/c++draft/class.init#class.cdtor-2
http://eel.is/c++draft/class.init#class.cdtor-3

I am not 100% sure of the standardese, but it sure sounds like:

https://searchfox.org/mozilla-central/source/js/src/wasm/WasmBaselineCompile.cpp#12674

is possibly not good according to p3 above, even if you're just taking a reference and not accessing any members. I thought we had a static analysis for catching that sort of thing.

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #9)

Are we running into issues about how one can or cannot refer to this during constructors? See e.g.

http://eel.is/c++draft/class.init#class.cdtor-2
http://eel.is/c++draft/class.init#class.cdtor-3

I am not 100% sure of the standardese, but it sure sounds like:

https://searchfox.org/mozilla-central/source/js/src/wasm/WasmBaselineCompile.cpp#12674

is possibly not good according to p3 above, even if you're just taking a reference and not accessing any members. I thought we had a static analysis for catching that sort of thing.

Ah, that might be it then. Here's a try run that (1) changes the ptr to a reference, and (2) inits the pointer after the constructor. It appears that resolved the issue.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=276810529&revision=dac2556f606da3f8d14646648a86d1e93d94c2f3

I'm not sure if it will help clear up our understanding of the problem, but I tried coaxing the compiler into telling me what constructor it was calling:

  public:
-  explicit BaseRegAlloc(BaseCompilerInterface& bc)
+  BaseRegAlloc() = delete;
+  BaseRegAlloc(const BaseRegAlloc& aOther) = delete;
+  BaseRegAlloc(BaseRegAlloc&& aOther) = delete;
+  void operator=(const BaseRegAlloc& aOther) = delete;
+  explicit BaseRegAlloc(float NotThisOne, BaseCompilerInterface& bc)

and it said...

e:/mc/js/src/wasm/WasmBaselineCompile.cpp(12315,7): error: call to deleted constructor of 'js::wasm::BaseRegAlloc'
      ra(*this),
      ^  ~~~~~
e:/mc/arm/js/src/wasm/WasmBaselineCompile.cpp(643,3): note: 'BaseRegAlloc' has been explicitly marked deleted here
  BaseRegAlloc(const BaseRegAlloc& aOther) = delete;
  ^

What in the world? Since when is a BaseCompiler a BaseRegAlloc? Or maybe the point of the standardese that Nathan pasted is that we can't rely on types at this moment?

I think what's happening there is that the compiler is picking constructors in two phases:

  1. See what the possibly-applicable overloads are by argument count.
  2. Resolve overloads by matching argument types.

I have never really grokked the overload resolution algorithm in its full generality, but I think this is an accurate high-level description based on:

http://eel.is/c++draft/over.match.viable#2

So, going from the top:

  • BaseRegAlloc() = delete;

Zero arguments can't be a match.

  • BaseRegAlloc(BaseRegAlloc&& aOther) = delete;

This is weird, because this is a single-argument function, but the viable functions algorithm includes a check for implicit conversions (http://eel.is/c++draft/over.match.viable#4), which I think is failing here due to not having rvalues.

  • explicit BaseRegAlloc(float NotThisOne, BaseCompilerInterface& bc)

Two arguments can't be a match.

Leaving:

  • BaseRegAlloc(const BaseRegAlloc& aOther) = delete;

which would seem to fall under the same constraint as the above move constructor--so you ought to get complaints about not being able to convert things...but maybe there's some other step that clang is including ("check for deleted constructors first" or somesuch) that's not in the spec, but would give the same results as-if it was? And we just happen to get bad error messages from it?

(In reply to Emilio Cobos Álvarez (:emilio) from comment #13)

We have similar patterns in the tree (https://searchfox.org/mozilla-central/rev/131338e5017bc0283d86fb73844407b9a2155c98/dom/base/Document.cpp#1199, https://searchfox.org/mozilla-central/rev/131338e5017bc0283d86fb73844407b9a2155c98/layout/generic/nsGfxScrollFrame.cpp#191, ...)

Would be good to confirm they're not problematic.

The nsGfxScrollFrame instance looks OK, because we're not derefencing this in any way (I think?).

The Document one looks almost exactly like the problematic case here, but tests suggest that it's OK for now:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=540fb2bc155a7626517a6d2435d6b2de967c2af9

Or we have terrible coverage of shadow DOM...?

No, pretty sure it should be covered by tests. Tons of non-shadow-dom stuff also use code living in DocumentOrShadowRoot.

FWIW, we don't dereference it as-in "access any member", we just store it as a reference: https://searchfox.org/mozilla-central/rev/131338e5017bc0283d86fb73844407b9a2155c98/dom/base/DocumentOrShadowRoot.h#238. But we use the star operator.

See Also: → 1597715

(In reply to Nathan Froyd [:froydnj] from comment #16)

Dereferencing and storing a reference was exactly what the wasm compiler was doing, cf.:

I guess the main difference is base-class vs. member initialization? Not sure if there are any special rules, but anyhow just to be safe I filed bug 1597715.

Oh, and inline vs. out of line (DocumentOrShadowRoot constructors are OOL).

The details are little fuzzy here. According to one reading of the spec,
dereferencing/creating a reference to a base class during construction is
undefined. This leads us to miscompile the baseline compiler.

Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/autoland/rev/00dde038e9bf
Avoid UB in BaseCompiler constructor. r=sstangl
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: