Closed Bug 1315596 Opened 3 years ago Closed Last year

Assertion failure: !minimalBundle(bundle) in real-world wasm test case

Categories

(Core :: JavaScript Engine: JIT, defect, P1)
Platform: x86 Linux

Tracking

RESOLVED DUPLICATE of bug 1325450
Tracking Status
firefox52 --- wontfix
firefox-esr52 --- disabled
firefox53 --- affected
firefox54 --- affected

People

(Reporter: alonzakai, Assigned: bhackett)

References

Details

(Keywords: stale-bug)

Attachments

(5 files)

Attached file a.out.wasm
User Agent: Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20161106030203

Steps to reproduce:

Running the attached .js file hangs while loading the wasm. This is from binaryen2.test_i64 in the emscripten test suite.


Actual results:

The hang appears in the JS shell, on latest trunk.
Attached file a.out.js
OS: Unspecified → Linux
Hardware: Unspecified → x86
That seems like something we want to have fixed in this release, right Benjamin?
Note: the merge is next week, so there is not a lot of time left.
Flags: needinfo?(bbouvier)
Priority: -- → P1
Indeed we'd like to have this fixed as soon as possible.

That being said, I can't repro with a debug build on x64 nor x86. Alon, which version of the tree are you using? Which specific flags do you use for configure / compiling?
Flags: needinfo?(bbouvier) → needinfo?(azakai)
This is on latest inbound of last night (321349:4b1de4471e4b) but before updating I saw it on an earlier commit too, from at least a week ago, so it's not a super-recent thing.

This is with --enable-optimize --disable-debug.

32-bit linux, building with the system compiler (gcc 4.8.2).
This is on a slow machine with 2 cores btw, maybe timing related.

Meanwhile, it looks like it's not a true hang: it allocates memory very slowly and eventually (after minutes) it OOMs.

gdb says main thread is in

#0  0xb7fdd428 in __kernel_vsyscall ()
#1  0xb7faad4b in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:187
#2  0x080fbbe2 in wait (lock=..., this=0xb7a04334)
    at /mozilla-inbound/js/src/threading/posix/ConditionVariable.cpp:118
#3  js::ConditionVariable::wait_for (this=0xb7a04334, lock=..., a_rel_time=...)
    at /mozilla-inbound/js/src/threading/posix/ConditionVariable.cpp:134
#4  0x08496eb1 in js::GlobalHelperThreadState::wait (this=0xb7a04240, locked=..., which=which@entry=js::GlobalHelperThreadState::CONSUMER, 
    timeout=...) at /mozilla-inbound/js/src/vm/HelperThreads.cpp:786
#5  0x08619edb in js::wasm::ModuleGenerator::finishOutstandingTask (this=this@entry=0xbfffcf34)
    at /mozilla-inbound/js/src/wasm/WasmGenerator.cpp:220

and a bunch of threads look like

#0  0xb7fdd428 in __kernel_vsyscall ()
#1  0xb7faad4b in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:187
#2  0x080fbbe2 in wait (lock=..., this=0xb7a04364)
    at /mozilla-inbound/js/src/threading/posix/ConditionVariable.cpp:118
#3  js::ConditionVariable::wait_for (this=this@entry=0xb7a04364, lock=..., a_rel_time=...)
    at /mozilla-inbound/js/src/threading/posix/ConditionVariable.cpp:134
#4  0x0849a7ee in wait (timeout=..., which=js::GlobalHelperThreadState::PRODUCER, locked=..., this=0xb7a04240)
    at /mozilla-inbound/js/src/vm/HelperThreads.cpp:786
#5  js::HelperThread::threadLoop (this=0xb7a20000)
    at /mozilla-inbound/js/src/vm/HelperThreads.cpp:1864

while the running thread is

#0  js::jit::BacktrackingAllocator::minimalBundle (this=this@entry=0xb5ffaf80, bundle=bundle@entry=0xb4453e40, 
    pfixed=pfixed@entry=0xb5ffac0c)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:2438
#1  0x086c2932 in js::jit::BacktrackingAllocator::computeSpillWeight (this=this@entry=0xb5ffaf80, bundle=0xb4453e40)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:2498
#2  0x086c2a9e in js::jit::BacktrackingAllocator::maximumSpillWeight (this=this@entry=0xb5ffaf80, bundles=...)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:2556
#3  0x086c7ca2 in js::jit::BacktrackingAllocator::tryAllocateRegister (this=this@entry=0xb5ffaf80, r=..., bundle=bundle@entry=0xab7bf890, 
    success=success@entry=0xb5ffad44, pfixed=pfixed@entry=0xb5ffad40, conflicting=...)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:1457
#4  0x086c7f20 in tryAllocateRegister (conflicting=..., pfixed=<optimized out>, success=0xb5ffad44, bundle=<optimized out>, r=..., 
    this=<optimized out>)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:1392
#5  js::jit::BacktrackingAllocator::tryAllocateNonFixed (this=this@entry=0xb5ffaf80, bundle=bundle@entry=0xab7bf890, requirement=..., 
    hint=..., success=success@entry=0xb5ffad44, pfixed=pfixed@entry=0xb5ffad40, conflicting=...)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:1223
#6  0x086dd884 in js::jit::BacktrackingAllocator::processBundle (this=this@entry=0xb5ffaf80, mir=0xb5ffb4cc, bundle=bundle@entry=0xab7bf890)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:1293
#7  0x086dda5f in js::jit::BacktrackingAllocator::go (this=this@entry=0xb5ffaf80)
    at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:837
#8  0x081e2d5b in js::jit::GenerateLIR (mir=0xb5ffb4cc)
    at /mozilla-inbound/js/src/jit/Ion.cpp:1954
#9  0x0863f158 in js::wasm::IonCompileFunction (task=task@entry=0xb4344a0c)
    at /mozilla-inbound/js/src/wasm/WasmIonCompile.cpp:3769
#10 0x086459ae in js::wasm::CompileFunction (task=task@entry=0xb4344a0c)
    at /mozilla-inbound/js/src/wasm/WasmIonCompile.cpp:3791
#11 0x08499308 in js::HelperThread::handleWasmWorkload (this=this@entry=0xb7a2075c, locked=...)
    at /mozilla-inbound/js/src/vm/HelperThreads.cpp:1425
#12 0x0849a78a in js::HelperThread::threadLoop (this=0xb7a2075c)
    at /mozilla-inbound/js/src/vm/HelperThreads.cpp:1872
#13 0x084aaadc in callMain<0u> (this=0xb7a19098)
    at /mozilla-inbound/js/src/threading/Thread.h:234
#14 js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start (aPack=0xb7a19098)
    at /mozilla-inbound/js/src/threading/Thread.h:227
#15 0xb7fa6f70 in start_thread (arg=0xb5ffcb40) at pthread_create.c:312
#16 0xb7d71bee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129
I can actually reproduce with an optimized build, but not a debug build (be it debug only, or debug&optimized). This suggests an uninitialized variable somewhere...

I went back to just after 0xd in SpiderMonkey (before that, the given test case would not validate), and I can reproduce even at that point.

Alon, any chance you could provide the same binary compiled with 0xc, please? It won't be an apples-to-apples comparison anymore, but with some luck it would still reproduce.

In the meantime, I will try to have a look at compiler warnings and Coverity issues under wasm/ and jit/.
This small patch helps trigger the issue with a debug build. This is an infinite loop in regalloc, caused by trying to split a bundle that's already minimal.
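To illustrate why splitting an already-minimal bundle cannot make progress, here is a deliberately simplified model — not SpiderMonkey's actual code; `Bundle`, `isMinimal`, and `split` are invented stand-ins for the real `BacktrackingAllocator` types. A minimal bundle covers a single use, so a split hands back the same range, and an optimized build (with the assertion compiled out) retries the same failing allocation forever:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, pared-down model of a live-range bundle.
struct Bundle {
    int from, to;              // half-open range [from, to)
    bool isMinimal() const {   // minimal = covers a single use
        return to - from <= 2; // (one instruction = two positions)
    }
};

// Splitting a minimal bundle would just hand back the same range,
// so the allocator would make no progress. A debug build catches
// this up front:
//   Assertion failure: !minimalBundle(bundle)
std::vector<Bundle> split(const Bundle& b) {
    assert(!b.isMinimal());
    int mid = (b.from + b.to) / 2;
    return { {b.from, mid}, {mid, b.to} };
}
```

In an optimized build the `assert` is compiled out, which is why the symptom there is a slow, allocating "hang" rather than an assertion failure.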
Attached file regalloc-spew
Assertion failure: !minimalBundle(bundle), at /mozilla-inbound/js/src/jit/BacktrackingAllocator.cpp:1320

Jan, this is an instance of an infinite loop in the regalloc, and you've fixed one recently in bug 1309903. Moreover, it seems related to a WasmCallI64, so the same range of things you've touched in this other bug. Maybe you have an idea of what's going on here? Here's the output of IONFLAGS=regalloc, if that can help. (note you need the previous patch to reproduce on a debug build).
Flags: needinfo?(azakai) → needinfo?(jdemooij)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Hang in loading wasm file → Assertion failure: !minimalBundle(bundle) in real-world wasm test case
WasmCallI64 is a red herring, the actual problem is here:

...
[138,139 SubI64] [def v58<g>:tied(0)] [def v59<g>:tied(1)] [use v23:r?] [use v24:r?] [use v28:r] [use v29:r]
[140,141 MulI64] [def v61<g>:%eax] [def v62<g>:%edx] [temp v60<g>] [use v28:%eax] [use v29:%edx] [use v23:r] [use v24:r]
[142,143 WasmStoreI64] [use v2:r] [use v58:r] [use v59:r]
...

We're trying to allocate a register for the MulI64's use of v24. v24 has range [140,142).

All registers are in use, so the register allocator tries some things:

(1) Evict v13 [46,47). This frees up eax \o/ but eax is also needed for MulI64's fixed use/def, so this doesn't get us anywhere.

(2) Evict v29 [132,141). This frees up edx \o/ but like eax, it doesn't let us make progress.

Then we reach MAX_ATTEMPTS (== 2), so we give up and try to split v24, but this fails because v24 is minimal.

Bumping MAX_ATTEMPTS to 3 "fixes" this, because at that point ecx is the register with the lowest spill weight and that one we *can* take after splitting v2.
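The control flow described above can be sketched as follows. This is a hedged, simplified model: the real `processBundle` lives in `js/src/jit/BacktrackingAllocator.cpp` with a different signature, and `Step` and `tryAllocate` here are invented for illustration. Each pass either succeeds, evicts a conflicting bundle and retries, or — after `MAX_ATTEMPTS` eviction rounds — falls back to splitting, which is exactly where a minimal bundle dead-ends:

```cpp
#include <cassert>
#include <functional>

// Simplified stand-ins for the real BacktrackingAllocator types.
struct Bundle { bool minimal; };

enum class Step { Allocated, Evicted, Conflict };

// Modeled retry loop: MAX_ATTEMPTS eviction rounds, then a split
// as the fallback.
bool processBundle(Bundle& bundle,
                   const std::function<Step(Bundle&)>& tryAllocate) {
    const int MAX_ATTEMPTS = 2;
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        Step step = tryAllocate(bundle);
        if (step == Step::Allocated)
            return true;          // got a register
        if (step == Step::Conflict)
            break;                // nothing evictable can help
        // Step::Evicted: a conflicting bundle was evicted; retry.
    }
    // Fallback: split. A minimal bundle can't be split further, which
    // is where the !minimalBundle(bundle) assertion fires.
    assert(!bundle.minimal);
    return false; // (the real code would re-queue the split pieces)
}
```

In the trace above, both eviction rounds (v13, then v29) freed a register that the MulI64's fixed uses/defs immediately re-pinned, so the loop burned through `MAX_ATTEMPTS` without progress and fell into the split path with a minimal v24.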

bhackett, what do you think?
Flags: needinfo?(jdemooij) → needinfo?(bhackett1024)
See Also: → 1316385
(In reply to Jan de Mooij [:jandem] from comment #9)
> WasmCallI64 is a red herring, the actual problem is here:
> 
> ...
> [138,139 SubI64] [def v58<g>:tied(0)] [def v59<g>:tied(1)] [use v23:r?] [use
> v24:r?] [use v28:r] [use v29:r]
> [140,141 MulI64] [def v61<g>:%eax] [def v62<g>:%edx] [temp v60<g>] [use
> v28:%eax] [use v29:%edx] [use v23:r] [use v24:r]
> [142,143 WasmStoreI64] [use v2:r] [use v58:r] [use v59:r]
> ...
> 
> We're trying to allocate a register for the MulI64's use of v24. v24 has
> range [140,142)
> 
> All registers are in use, so the register allocator tries some things:
> 
> (1) Evict v13 [46,47). This frees up eax \o/ but eax is also needed for
> MulI64's fixed use/def, so this doesn't get us anywhere.
> 
> (2) Evict v29 [132,141). This frees up edx \o/ but like eax, it doesn't let
> us make progress.
> 
> Then we reach MAX_ATTEMPTS (== 2), so we give up and try to split v24, but
> this fails because v24 is minimal.
> 
> Bumping MAX_ATTEMPTS to 3 "fixes" this, because at that point ecx is the
> register with the lowest spill weight and that one we *can* take after
> splitting v2.
> 
> bhackett, what do you think?

Hmm, I don't think this fix would hold up in general.  I think the basic problem here is that the regalloc is wasting its time evicting bundles which won't allow it to actually successfully allocate the current bundle, due to fixed register uses/defs of the evicted bundle which overlap the current bundle.  Under the 'if (!aliasedConflicting.empty()) { ... }' branch in BacktrackingAllocator::tryAllocateRegister, we could check for fixed uses/defs in the |aliasedConflicting| bundles and avoid updating |conflicting| if any of them overlap the current bundle.
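The check proposed here can be sketched roughly as follows. This is an illustrative model only — `Range`, `Bundle`, and `evictionCanHelp` are invented names with simplified types; the actual logic would go under the `if (!aliasedConflicting.empty()) { ... }` branch in `BacktrackingAllocator::tryAllocateRegister`. The idea: an eviction candidate set is useless if any member has a fixed register use/def overlapping the bundle we're trying to allocate, because the register gets re-pinned at that position anyway:

```cpp
#include <vector>

// Hypothetical, pared-down model of a live-range bundle with
// optional fixed-register positions.
struct Range { int from, to; };          // half-open [from, to)
struct Bundle {
    Range range;
    std::vector<Range> fixedPositions;   // uses/defs pinned to a register
};

static bool overlaps(const Range& a, const Range& b) {
    return a.from < b.to && b.from < a.to;
}

// The proposed filter: evicting |aliasedConflicting| only helps if none
// of its bundles has a fixed use/def inside |current|'s range --
// otherwise the register is re-pinned immediately and we make no
// progress toward allocating |current|.
bool evictionCanHelp(const Bundle& current,
                     const std::vector<Bundle>& aliasedConflicting) {
    for (const Bundle& b : aliasedConflicting) {
        for (const Range& fixed : b.fixedPositions) {
            if (overlaps(fixed, current.range))
                return false;
        }
    }
    return true;
}
```

In the example from comment 9, v24 has range [140,142) and v29 carries a fixed %edx use at the MulI64 inside that range, so evicting v29 would be rejected up front instead of wasting one of the `MAX_ATTEMPTS` rounds.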
Flags: needinfo?(bhackett1024)
Attached patch patch (Splinter Review)
Assignee: nobody → bhackett1024
Attachment #8809133 - Flags: review?(sunfish)
Looks like binaryen2.test_i64_precise hits the same issue. If another testcase would help, let me know and I'll attach it.
(In reply to Alon Zakai (:azakai) from comment #12)
> Looks like binaryen2.test_i64_precise hits what looks like the same issue.
> If another testcase would help let me know and I'll attach it.

Does the patch in comment 11 fix the issue?
Yes, it does, nice.
Comment on attachment 8809133 [details] [diff] [review]
patch

Review of attachment 8809133 [details] [diff] [review]:
-----------------------------------------------------------------

Just a suggestion: the new code here would work really well refactored into a separate function, so that it can return early once it's found a fixed overlap instead of testing hasFixedOverlap at each loop level. And it would keep tryAllocateRegister tidier.
Attachment #8809133 - Flags: review?(sunfish) → review+
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1e14d92676fb
Don't evict bundles which won't help with allocating the target bundle, r=sunfish.
Since this seems to appear pretty easily with wasm, could we land this on 52 (which should be the first release with wasm, and branches to m-a on Monday)?
Flags: needinfo?(bhackett1024)
Comment on attachment 8809133 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]: fairly old, but only seems to affect wasm
[User impact if declined]: infinite loop / assertion failure on affected pages
[Risks and why]: low risk
Flags: needinfo?(bhackett1024)
Attachment #8809133 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/1e14d92676fb
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8809133 [details] [diff] [review]
patch

jit register allocation tweak needed by wasm, take in aurora52
Attachment #8809133 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 1338642
This was backed out for causing bug 1338642.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla53 → ---
Does this still happen?
Flags: needinfo?(bhackett1024)
I think that bug 1325450 added a better and more robust fix than the patch in this bug.  See also bug 1338642 comment 2.
Flags: needinfo?(bhackett1024)
Status: REOPENED → RESOLVED
Closed: 3 years ago → Last year
Resolution: --- → DUPLICATE
Duplicate of bug: 1325450