Closed Bug 1231024 Opened 4 years ago Closed 4 years ago

Assertion failure: !minimalBundle(bundle) at jit/BacktrackingAllocator.cpp:1297

Categories

(Core :: JavaScript Engine, defect)

x86
Unspecified
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla46
Tracking Status
firefox45 + verified
firefox46 + verified
firefox47 --- affected
firefox48 --- affected
firefox49 --- affected
firefox-esr45 --- affected
firefox50 --- affected

People

(Reporter: bc, Assigned: lth)

References

(Blocks 1 open bug, )

Details

(4 keywords)

Crash Data

Attachments

(10 files, 5 obsolete files)

134.67 KB, text/plain
Details
126.39 KB, text/plain
Details
939.71 KB, text/plain
Details
1.48 MB, application/x-gzip
Details
6.49 KB, text/html
Details
6.13 KB, text/x-c
Details
1.57 KB, application/x-javascript
Details
86.52 KB, text/plain
Details
918 bytes, patch
jonco
: review+
Details | Diff | Splinter Review
1.70 KB, patch
jandem
: review+
Details | Diff | Splinter Review
Attached file minidump crash report
1. https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/08/05/how-to-maximise-cpu-performance-for-the-oracle-database-on-linux

2. Assertion failure: !minimalBundle(bundle), at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jit/BacktrackingAllocator.cpp:1297

Windows, Linux Nightly/45 at least

reproduced with https://hg.mozilla.org/mozilla-central/rev/cc9c6cd756cb744596ba039dcc5ad3065a7cc3ea though that is not the regressor.

I can reproduce with a saved version of the page. I'll try to reduce.
Brian do you have some cycles to look at this? Thanks!
Flags: needinfo?(bhackett1024)
If this can be reduced and is reproducible then a blame cset would be good.  This assertion is often a bug in the lowering phase.
It is reproducible and I am working on reducing it but it will take time. mozregression doesn't support debug builds for a long enough range and I don't know when I will be able to manually bisect it.
Alice0775, CC'ing you in case you're bored and looking for a bug to bisect :)
(In reply to Jan de Mooij [:jandem] from comment #4)
> Alice0775, CC'ing you in case you're bored and looking for a bug to bisect :)

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=21730adb78b62b34cd4ed45eb08b9841bb1524ef&tochange=cd35e12d8f91f410f71b6f5d3023a40e9071e24b

Suspect: Bug 1108290
32-bit or 64-bit?
Flags: needinfo?(bob)
Looks like 32bit only. Windows 7 and RHEL6.

Crashes opt too:

bp-85b1bd8a-4cb0-4c51-a15a-1c2c62151208
bp-ff17cd4a-f147-4c8a-b78f-0a0bf2151208

[@ js::jit::BacktrackingAllocator::splitAt ]
Crash Signature: [@ js::jit::BacktrackingAllocator::splitAt ]
Has Regression Range: --- → yes
Flags: needinfo?(bob)
Hardware: Unspecified → x86
Bughunter did see an additional opt signature on Windows

js::jit::TempAllocator::allocateInfallible(unsigned int) with an opt assertion?

Assertion failure: result ([OOM] Is it really infallible?), at c:\builds\moz2_slave\m-cen-w32-00000000000000000000\build\src\js\src\ds/LifoAlloc.h:281
Disregarding the possible OOM issue:  The register management for that optimization on 32-bit x86 is particularly unpleasant, because there are too few registers to go around and some do double duty.  It's possible there's a reg assignment bug or (more likely) that there's a regalloc invariant I wasn't aware of.
(In reply to Lars T Hansen [:lth] from comment #9)
> Disregarding the possible OOM issue:  The register management for that
> optimization on 32-bit x86 is particularly unpleasant, because there are too
> few registers to go around and some do double duty.  It's possible there's a
> reg assignment bug or (more likely) that there's a regalloc invariant I
> wasn't aware of.

If someone can attach the output from setting IONFLAGS=regalloc for the compilation where it's crashing, that should help with diagnosing the problem.
Attached file 1231024.log
With a debug build from today with IONFLAGS=regalloc  I get

Assertion failure: int32_t(outlen) > 0, at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jsprf.cpp:999

on startup without even attempting to load the url.
(In reply to Bob Clary [:bc:] from comment #11)
> Created attachment 8697020 [details]
> 1231024.log
> 
> With a debug build from today with IONFLAGS=regalloc  I get
> 
> Assertion failure: int32_t(outlen) > 0, at
> c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jsprf.
> cpp:999
> 
> on startup without even attempting to load the url.

That sounds like bug 1231581, making that a blocker for the time being.
Depends on: 1231581
Depends on: 1231575
(In reply to Lars T Hansen [:lth] from comment #12)
> (In reply to Bob Clary [:bc:] from comment #11)
> > Created attachment 8697020 [details]
> > 1231024.log
> > 
> > With a debug build from today with IONFLAGS=regalloc  I get
> > 
> > Assertion failure: int32_t(outlen) > 0, at
> > c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jsprf.
> > cpp:999
> > 
> > on startup without even attempting to load the url.
> 
> That sounds like bug 1231581, making that a blocker for the time being.

Drive-by comment:

liveRange dump has a relatively small buffer for JitSpew, which has only 2000 bytes currently.
I tried to enlarge it to 4096, and crashed again. turns out the liveBundle and liveRange could be
huge, and this is the largest result I profiled agains kraken benchmark:

LiveRange usesCount: 760
LiveRange usesCount: 762
LiveRange usesCount: 762
LiveRange usesCount: 762
LiveRange usesCount: 770
LiveRange usesCount: 770
LiveRange usesCount: 770
LiveBundle usesCount: 7
LiveBundle usesCount: 7
LiveBundle usesCount: 7

the easiest way is to try enlarge the buffer size (both LiveRange::toString() and LiveBundle::toString()), or you can replace the MOZ_CRASH() with 'break', if it is urgent.
Attached patch bug1231024.workarounds.patch (obsolete) — Splinter Review
Hi,

so the simplest workarounds it to add the buffer size up to 20000.
(the buffers are in static var area, not on the stack, so 20k is ok.)
Tested on kraken benchmark. Hope it helps :)
Flags: needinfo?(bob)
Wei Wu: I did a Windows 32bit try build with your patch. You can find it at https://treeherder.mozilla.org/#/jobs?repo=try&revision=408fd6ef12a3&selectedJob=14557749

Running the debug build on Windows 7 64bit still asserts: Assertion failure: !minimalBundle(bundle), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/BacktrackingAllocator.cpp:1297

Running the build on Windows 7 32bit also still asserts: Assertion failure: !minimalBundle(bundle), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/BacktrackingAllocator.cpp:1297
Flags: needinfo?(bob)
Attached patch bug1231024.truncates.patch (obsolete) — Splinter Review
(In reply to Bob Clary [:bc:] from comment #15)
> Wei Wu: I did a Windows 32bit try build with your patch. You can find it at
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=408fd6ef12a3&selectedJob=14557749
> 
> Running the debug build on Windows 7 64bit still asserts: Assertion failure:
> !minimalBundle(bundle), at
> c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/
> BacktrackingAllocator.cpp:1297
> 
> Running the build on Windows 7 32bit also still asserts: Assertion failure:
> !minimalBundle(bundle), at
> c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/
> BacktrackingAllocator.cpp:1297

ok seems like the dump is realy big.. this patch outputs "truncates" when the buffer is full instead of crash. Hope this patch will work :)
Flags: needinfo?(bob)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dd6dda17a654 build failure. Please needinfo me when you have a good try debug build I can download.
Flags: needinfo?(bob)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9845ea11946f
looks green now.

FYI it was built on top of mozilla-central. if you want to build in mozilla-inbound,
try apply https://hg.mozilla.org/try/rev/9845ea11946f.
Flags: needinfo?(bob)
Comment on attachment 8697954 [details] [diff] [review]
bug1231024.truncates.patch

obsolete old patches.
Attachment #8697954 - Attachment is obsolete: true
Comment on attachment 8697589 [details] [diff] [review]
bug1231024.workarounds.patch

obsolete old patches.
Attachment #8697589 - Attachment is obsolete: true
Same assertion, sorry.
Flags: needinfo?(bob)
(In reply to Bob Clary [:bc:] from comment #22)
> Same assertion, sorry.
so can we get IONFLAGS=regalloc jitspew logs?
Attached file ionflags.txt.gz
Worked this time. :-)
(In reply to Bob Clary [:bc:] from comment #24)
> Created attachment 8698062 [details]
> ionflags.txt.gz
> 
> Worked this time. :-)

Thank you :)
Attached file testcase.html
Keywords: testcase
Attached file Shell testcase
Bob, thanks a lot for the reduced testcase! Here's a version that runs in the shell.

It asserts with --no-threads on x86.

We also have a crash in BacktrackingAllocator::splitAt on crash-stats, might be the same issue.
Brian, Lars: could one of you please take a look?

[Tracking Requested - why for this release]:
At issue is this snippet on lines 14-16:

  e = "function" == typeof w ? w.apply(void 0, e) : w, a.a(function(s, p) {
      p && s && (b[s] = p, h(s))
  }, o, e)

more clearly written as 

  if (typeof w == "function")
     e = w.apply(void 0, e);
  else
     e = w;
  a.a(function(s, p) { p && s && (b[s] = p, h(s)) }, o, e)

where only the second line matters.  Without looking more deeply yet it could appear that the output of the call requires a register while at the same time the implementation of apply uses all the available registers for its own purposes.  I'll try to construct a smaller test case to demonstrate this.
I actually think that this additionally is the result of some inlining which ends up creating a function with two inlined apply instances, which then seem to conflict in their allocations, but I'm not yet adept at reading the regalloc spew to the point where I understand it fully (though I have a much smaller log than the one previously offered).
The problem is the following deobfuscated function:

    function define_() {
        var args = [].slice.call(arguments);
        var key = "string" == typeof args[0] ? args.shift() : void 0;
        var keys = args.length > 1 ? args.shift() : [];
	let w = args[0];	
	for (let x=0;;) {
	    let t = function ()                       // *A*
	    {
		let values = [];
                for (let k=0; k < keys.length ; k++)  // *B*
		    values.push(definitions[keys[k]]) // *C*
                if (typeof w == "function") {
		    values = w.apply(void 0, values);
		}
		else
		    values = w;
		if (values && key)
		    definitions[key] = values;
	    }
	    t();                                      // *D*
	    if (x >= keys.length)
		break;
	    x++;
	}
    }

This fails with the same assertion (within the context of the program, I'll attach an updated TC).  It can be made to work in two ways: either by commenting out the lines at *A* and *D* so that the function is in-line, or by moving the two lines at *B* and *C* into the "then" arm of the following "if", where they belong.
Attached file Updated shell test case (obsolete) —
Run this with --no-threads --ion-warmup-threshold=100.
Attached file Updated shell test case (much smaller) (obsolete) —
Attachment #8705099 - Attachment is obsolete: true
Last go-around, this is the smallest I've been able to make it.  It appears that some amount of polymorphism at the call site is necessary; if there is not a sufficiently large number of functions then the assertion is not triggered.  And there needs to be some diversity in control paths through the function.

However, inlining is not necessary.  This still fails in a 32-bit Mac OS X build with --no-threads --ion-warmup-threshold=100 --ion-inlining=off.
Attachment #8705122 - Attachment is obsolete: true
Attached file reglog
regalloc spew for a run of the final test case with the flags from my last comment
Based on my inspection of the spew last week it could look like this fails on x86 because some values are Values and need more than one register, but I'm still not familiar enough with regalloc to say.  It's unfortunately likely that I won't have time to figure this out before the train leaves.  I'll offer a patch to disable the optimization on X86 for now.
Attachment #8706245 - Flags: review?(jdemooij)
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
(In reply to Lars T Hansen [:lth] from comment #37)
> Created attachment 8706245 [details] [diff] [review]
> disable fn.apply(...,array) optimization on x86

If we do that, I will recommend to do it based on the value representation instead of the architecture.  Such that we don't get similar issues on ARM.
(In reply to Nicolas B. Pierron [:nbp] from comment #38)
> (In reply to Lars T Hansen [:lth] from comment #37)
> > Created attachment 8706245 [details] [diff] [review]
> > disable fn.apply(...,array) optimization on x86
> 
> If we do that, I will recommend to do it based on the value representation
> instead of the architecture.  Such that we don't get similar issues on ARM.

That seems reasonable, even if I think ARM should have enough registers to be OK here.

The problem is that I have very limited time right before the FF46 train leaves, and what I have is not a very deep diagnosis of the problem - I could be wrong about what's going on.

(It might be safer just to disable the optimization on all platforms until we have a diagnosis.)
Attached patch fix spewSplinter Review
Unfortunately, the regalloc spew is still messed up.
Attachment #8707580 - Flags: review?(jcoppeard)
With the above patch, it's clearer what the problem is from the regalloc spew.  Looking at the assertion failure:

[RegAlloc] Allocating  v125 [298,300) v125:%ecx@298 [priority 2] [weight 2000000]
[RegAlloc]   Requirement %ecx, due to use at 298
[RegAlloc]   %ecx collides with  v139 [299,302) %ecx (def) v139:*@301 [weight 2000000]
Assertion failure: !minimalBundle(bundle)

The [298,300) is the instruction range being allocated, and looking at the instructions from the beginning of the register allocation:

[298,299 ApplyArrayGeneric] [def v139<t>:%ecx] [def v140<p>:%edx] [temp v135<g>:%eax] [temp v136<g>:%ebx] [use v125:%ecx] [use v134:%edi] [use v137:%esi] [use v138:%edx] [use v84:*] [use v84:*] [use v85:*] [use v86:*] [use v87:*] [use v91:*] [use v84:*] [use v99:*] [use v101:*] [use v125:*] [use v130:*] [use v131:*]

v139 is a fixed definition in %ecx, and v125 is a fixed use in %ecx.  This would be OK if v125 was usedAtStart by the instruction, in which case its live range covers only the input of the instruction and not the output (this permits the register allocator to reuse the input's register for an output).  Unfortunately you can't tell from the 'use' above whether the use is AtStart or not, but looking at the source for LIRGenerator::visitApplyArray shows the two uses are useFixed instead of useFixedAtStart, and changing the first one to useFixedAtStart fixes this assertion.  I don't know if doing that will require other changes to the ApplyArray codegen though, the latter looks pretty complicated.
Flags: needinfo?(bhackett1024)
Thanks, that's very helpful.  That code is hairy so I'll need to look into it, but at least there's no indication of a regalloc bug, ie, we can disable the opt on x86 if we want to, cf pending patch, if there's not enough time to fix this before the train leaves.
Attachment #8707580 - Flags: review?(jcoppeard) → review+
https://hg.mozilla.org/mozilla-central/rev/a79e9b6b0da5
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
That patch was not a fix, just a correction to regalloc spew.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
I believe both uses could be marked AtStart, trying that now.
Attachment #8709040 - Flags: review?(jdemooij)
Attachment #8706245 - Attachment is obsolete: true
Attachment #8706245 - Flags: review?(jdemooij)
assert crash, taking it.
Comment on attachment 8709040 [details] [diff] [review]
narrow the live range for values

Review of attachment 8709040 [details] [diff] [review]:
-----------------------------------------------------------------

Since all registers are fixed, do you know why this didn't fail all the time?
Attachment #8709040 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #50)
> Comment on attachment 8709040 [details] [diff] [review]
> narrow the live range for values
> 
> Review of attachment 8709040 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Since all registers are fixed, do you know why this didn't fail all the time?

I don't.  Given my experiments with the test case (comment 34) it seems that some amount of polymorhism is required to have been detected at the time the code is being jitted, for example, there needs to be a certain number of distinct functions that are applied at the call site; my guess is that this increases register pressure or further constrains register assignment and thus triggers the failure.  (Not much of an analysis.)
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/9c8c14902800
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Lars, could you request the uplift to beta? Thanks
Flags: needinfo?(lhansen)
Jan, this is a top crash of 45. Could you fill the uplift request for Lars? Thanks
Flags: needinfo?(jdemooij)
Keywords: topcrash
Comment on attachment 8709040 [details] [diff] [review]
narrow the live range for values

Approval Request Comment
[Feature/regressing bug #]: Corrects bug in bug 1108290
[User impact if declined]: Crashes
[Describe test coverage new/current, TreeHerder]: None in patch
[Risks and why]: Zero, to the best of my knowledge, corrects regalloc bug
[String/UUID change made/needed]: None
Flags: needinfo?(lhansen)
Attachment #8709040 - Flags: approval-mozilla-beta?
Jan, I took care of this.
Flags: needinfo?(jdemooij)
Comment on attachment 8709040 [details] [diff] [review]
narrow the live range for values

thanks
Should be in 45 beta 3
Attachment #8709040 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Reproduced the issue on Nightly 2015-12-08 debug build, Win 7 x64.
Verified fixed 45b7, 46.0a2 (2016-02-19) debug builds.
Status: RESOLVED → VERIFIED
Crash volume for signature 'js::jit::BacktrackingAllocator::splitAt':
 - nightly (version 50): 1 crash from 2016-06-06.
 - aurora  (version 49): 16 crashes from 2016-06-07.
 - beta    (version 48): 357 crashes from 2016-06-06.
 - release (version 47): 805 crashes from 2016-05-31.
 - esr     (version 45): 283 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          1          0          0          0          0          0
 - aurora           0          3          1          4          3          4          0
 - beta            65         49         57         56         51         45         17
 - release        105        102        121        124        124        130         50
 - esr             19         30         35         38         25         29         28

Affected platforms: Windows, Mac OS X, Linux
Re the crash reports above, those are likely something else than this bug.  (This bug was x86-only, for one thing; I think we don't build 32-bit for Mac OS X any longer.)
Flags: needinfo?(jdemooij)
Filed bug 1293592 to track this.
Flags: needinfo?(jdemooij)
You need to log in before you can comment on or make changes to this bug.