Closed
Bug 1231024
Opened 9 years ago
Closed 9 years ago
Assertion failure: !minimalBundle(bundle) at jit/BacktrackingAllocator.cpp:1297
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
People
(Reporter: bc, Assigned: lth)
References
()
Details
(4 keywords)
Crash Data
Attachments
(10 files, 5 obsolete files)
134.67 KB,
text/plain
|
Details | |
126.39 KB,
text/plain
|
Details | |
939.71 KB,
text/plain
|
Details | |
1.48 MB,
application/x-gzip
|
Details | |
6.49 KB,
text/html
|
Details | |
6.13 KB,
text/x-c
|
Details | |
1.57 KB,
application/x-javascript
|
Details | |
86.52 KB,
text/plain
|
Details | |
918 bytes,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
1.70 KB,
patch
|
jandem
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
1. https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/08/05/how-to-maximise-cpu-performance-for-the-oracle-database-on-linux
2. Assertion failure: !minimalBundle(bundle), at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jit/BacktrackingAllocator.cpp:1297
Windows, Linux Nightly/45 at least
reproduced with https://hg.mozilla.org/mozilla-central/rev/cc9c6cd756cb744596ba039dcc5ad3065a7cc3ea though that is not the regressor.
I can reproduce with a saved version of the page. I'll try to reduce.
Comment 1•9 years ago
|
||
Brian do you have some cycles to look at this? Thanks!
Flags: needinfo?(bhackett1024)
Comment 2•9 years ago
|
||
If this can be reduced and is reproducible then a blame cset would be good. This assertion is often a bug in the lowering phase.
Reporter | ||
Comment 3•9 years ago
|
||
It is reproducible and I am working on reducing it but it will take time. mozregression doesn't support debug builds for a long enough range and I don't know when I will be able to manually bisect it.
Comment 4•9 years ago
|
||
Alice0775, CC'ing you in case you're bored and looking for a bug to bisect :)
Comment 5•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #4)
> Alice0775, CC'ing you in case you're bored and looking for a bug to bisect :)
Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=21730adb78b62b34cd4ed45eb08b9841bb1524ef&tochange=cd35e12d8f91f410f71b6f5d3023a40e9071e24b
Suspect: Bug 1108290
Reporter | ||
Comment 7•9 years ago
|
||
Looks like 32bit only. Windows 7 and RHEL6.
Crashes opt too:
bp-85b1bd8a-4cb0-4c51-a15a-1c2c62151208
bp-ff17cd4a-f147-4c8a-b78f-0a0bf2151208
[@ js::jit::BacktrackingAllocator::splitAt ]
Crash Signature: [@ js::jit::BacktrackingAllocator::splitAt ]
Has Regression Range: --- → yes
Flags: needinfo?(bob)
Hardware: Unspecified → x86
Reporter | ||
Comment 8•9 years ago
|
||
Bughunter did see an additional opt signature on Windows
js::jit::TempAllocator::allocateInfallible(unsigned int) with an opt assertion?
Assertion failure: result ([OOM] Is it really infallible?), at c:\builds\moz2_slave\m-cen-w32-00000000000000000000\build\src\js\src\ds/LifoAlloc.h:281
Assignee | ||
Comment 9•9 years ago
|
||
Disregarding the possible OOM issue: The register management for that optimization on 32-bit x86 is particularly unpleasant, because there are too few registers to go around and some do double duty. It's possible there's a reg assignment bug or (more likely) that there's a regalloc invariant I wasn't aware of.
Comment 10•9 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #9)
> Disregarding the possible OOM issue: The register management for that
> optimization on 32-bit x86 is particularly unpleasant, because there are too
> few registers to go around and some do double duty. It's possible there's a
> reg assignment bug or (more likely) that there's a regalloc invariant I
> wasn't aware of.
If someone can attach the output from setting IONFLAGS=regalloc for the compilation where it's crashing, that should help with diagnosing the problem.
Reporter | ||
Comment 11•9 years ago
|
||
With a debug build from today with IONFLAGS=regalloc I get
Assertion failure: int32_t(outlen) > 0, at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jsprf.cpp:999
on startup without even attempting to load the url.
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #11)
> Created attachment 8697020 [details]
> 1231024.log
>
> With a debug build from today with IONFLAGS=regalloc I get
>
> Assertion failure: int32_t(outlen) > 0, at
> c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jsprf.
> cpp:999
>
> on startup without even attempting to load the url.
That sounds like bug 1231581, making that a blocker for the time being.
Depends on: 1231581
Comment 13•9 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #12)
> (In reply to Bob Clary [:bc:] from comment #11)
> > Created attachment 8697020 [details]
> > 1231024.log
> >
> > With a debug build from today with IONFLAGS=regalloc I get
> >
> > Assertion failure: int32_t(outlen) > 0, at
> > c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jsprf.
> > cpp:999
> >
> > on startup without even attempting to load the url.
>
> That sounds like bug 1231581, making that a blocker for the time being.
Drive-by comment:
liveRange dump has a relatively small buffer for JitSpew, which has only 2000 bytes currently.
I tried to enlarge it to 4096, and crashed again. turns out the liveBundle and liveRange could be
huge, and this is the largest result I profiled agains kraken benchmark:
LiveRange usesCount: 760
LiveRange usesCount: 762
LiveRange usesCount: 762
LiveRange usesCount: 762
LiveRange usesCount: 770
LiveRange usesCount: 770
LiveRange usesCount: 770
LiveBundle usesCount: 7
LiveBundle usesCount: 7
LiveBundle usesCount: 7
the easiest way is to try enlarge the buffer size (both LiveRange::toString() and LiveBundle::toString()), or you can replace the MOZ_CRASH() with 'break', if it is urgent.
Comment 14•9 years ago
|
||
Hi,
so the simplest workarounds it to add the buffer size up to 20000.
(the buffers are in static var area, not on the stack, so 20k is ok.)
Tested on kraken benchmark. Hope it helps :)
Flags: needinfo?(bob)
Reporter | ||
Comment 15•9 years ago
|
||
Wei Wu: I did a Windows 32bit try build with your patch. You can find it at https://treeherder.mozilla.org/#/jobs?repo=try&revision=408fd6ef12a3&selectedJob=14557749
Running the debug build on Windows 7 64bit still asserts: Assertion failure: !minimalBundle(bundle), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/BacktrackingAllocator.cpp:1297
Running the build on Windows 7 32bit also still asserts: Assertion failure: !minimalBundle(bundle), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/BacktrackingAllocator.cpp:1297
Flags: needinfo?(bob)
Comment 16•9 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #15)
> Wei Wu: I did a Windows 32bit try build with your patch. You can find it at
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=408fd6ef12a3&selectedJob=14557749
>
> Running the debug build on Windows 7 64bit still asserts: Assertion failure:
> !minimalBundle(bundle), at
> c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/
> BacktrackingAllocator.cpp:1297
>
> Running the build on Windows 7 32bit also still asserts: Assertion failure:
> !minimalBundle(bundle), at
> c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/js/src/jit/
> BacktrackingAllocator.cpp:1297
ok seems like the dump is realy big.. this patch outputs "truncates" when the buffer is full instead of crash. Hope this patch will work :)
Flags: needinfo?(bob)
Reporter | ||
Comment 17•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dd6dda17a654 build failure. Please needinfo me when you have a good try debug build I can download.
Flags: needinfo?(bob)
Comment 18•9 years ago
|
||
Comment 19•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9845ea11946f
looks green now.
FYI it was built on top of mozilla-central. if you want to build in mozilla-inbound,
try apply https://hg.mozilla.org/try/rev/9845ea11946f.
Flags: needinfo?(bob)
Comment 20•9 years ago
|
||
Comment on attachment 8697954 [details] [diff] [review]
bug1231024.truncates.patch
obsolete old patches.
Attachment #8697954 -
Attachment is obsolete: true
Comment 21•9 years ago
|
||
Comment on attachment 8697589 [details] [diff] [review]
bug1231024.workarounds.patch
obsolete old patches.
Attachment #8697589 -
Attachment is obsolete: true
Comment 23•9 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #22)
> Same assertion, sorry.
so can we get IONFLAGS=regalloc jitspew logs?
Reporter | ||
Comment 24•9 years ago
|
||
Worked this time. :-)
Comment 25•9 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #24)
> Created attachment 8698062 [details]
> ionflags.txt.gz
>
> Worked this time. :-)
Thank you :)
Reporter | ||
Comment 26•9 years ago
|
||
Comment 27•9 years ago
|
||
Bob, thanks a lot for the reduced testcase! Here's a version that runs in the shell.
It asserts with --no-threads on x86.
We also have a crash in BacktrackingAllocator::splitAt on crash-stats, might be the same issue.
Comment 28•9 years ago
|
||
Brian, Lars: could one of you please take a look?
[Tracking Requested - why for this release]:
Assignee | ||
Comment 29•9 years ago
|
||
At issue is this snippet on lines 14-16:
e = "function" == typeof w ? w.apply(void 0, e) : w, a.a(function(s, p) {
p && s && (b[s] = p, h(s))
}, o, e)
more clearly written as
if (typeof w == "function")
e = w.apply(void 0, e);
else
e = w;
a.a(function(s, p) { p && s && (b[s] = p, h(s)) }, o, e)
where only the second line matters. Without looking more deeply yet it could appear that the output of the call requires a register while at the same time the implementation of apply uses all the available registers for its own purposes. I'll try to construct a smaller test case to demonstrate this.
Assignee | ||
Comment 30•9 years ago
|
||
I actually think that this additionally is the result of some inlining which ends up creating a function with two inlined apply instances, which then seem to conflict in their allocations, but I'm not yet adept at reading the regalloc spew to the point where I understand it fully (though I have a much smaller log than the one previously offered).
Assignee | ||
Comment 31•9 years ago
|
||
The problem is the following deobfuscated function:
function define_() {
var args = [].slice.call(arguments);
var key = "string" == typeof args[0] ? args.shift() : void 0;
var keys = args.length > 1 ? args.shift() : [];
let w = args[0];
for (let x=0;;) {
let t = function () // *A*
{
let values = [];
for (let k=0; k < keys.length ; k++) // *B*
values.push(definitions[keys[k]]) // *C*
if (typeof w == "function") {
values = w.apply(void 0, values);
}
else
values = w;
if (values && key)
definitions[key] = values;
}
t(); // *D*
if (x >= keys.length)
break;
x++;
}
}
This fails with the same assertion (within the context of the program, I'll attach an updated TC). It can be made to work in two ways: either by commenting out the lines at *A* and *D* so that the function is in-line, or by moving the two lines at *B* and *C* into the "then" arm of the following "if", where they belong.
Assignee | ||
Comment 32•9 years ago
|
||
Run this with --no-threads --ion-warmup-threshold=100.
Assignee | ||
Comment 33•9 years ago
|
||
Attachment #8705099 -
Attachment is obsolete: true
Assignee | ||
Comment 34•9 years ago
|
||
Last go-around, this is the smallest I've been able to make it. It appears that some amount of polymorphism at the call site is necessary; if there is not a sufficiently large number of functions then the assertion is not triggered. And there needs to be some diversity in control paths through the function.
However, inlining is not necessary. This still fails in a 32-bit Mac OS X build with --no-threads --ion-warmup-threshold=100 --ion-inlining=off.
Attachment #8705122 -
Attachment is obsolete: true
Assignee | ||
Comment 35•9 years ago
|
||
regalloc spew for a run of the final test case with the flags from my last comment
Assignee | ||
Comment 36•9 years ago
|
||
Based on my inspection of the spew last week it could look like this fails on x86 because some values are Values and need more than one register, but I'm still not familiar enough with regalloc to say. It's unfortunately likely that I won't have time to figure this out before the train leaves. I'll offer a patch to disable the optimization on X86 for now.
Assignee | ||
Comment 37•9 years ago
|
||
Attachment #8706245 -
Flags: review?(jdemooij)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Comment 38•9 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #37)
> Created attachment 8706245 [details] [diff] [review]
> disable fn.apply(...,array) optimization on x86
If we do that, I will recommend to do it based on the value representation instead of the architecture. Such that we don't get similar issues on ARM.
Assignee | ||
Comment 39•9 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #38)
> (In reply to Lars T Hansen [:lth] from comment #37)
> > Created attachment 8706245 [details] [diff] [review]
> > disable fn.apply(...,array) optimization on x86
>
> If we do that, I will recommend to do it based on the value representation
> instead of the architecture. Such that we don't get similar issues on ARM.
That seems reasonable, even if I think ARM should have enough registers to be OK here.
The problem is that I have very limited time right before the FF46 train leaves, and what I have is not a very deep diagnosis of the problem - I could be wrong about what's going on.
(It might be safer just to disable the optimization on all platforms until we have a diagnosis.)
Comment 40•9 years ago
|
||
Unfortunately, the regalloc spew is still messed up.
Attachment #8707580 -
Flags: review?(jcoppeard)
Comment 41•9 years ago
|
||
With the above patch, it's clearer what the problem is from the regalloc spew. Looking at the assertion failure:
[RegAlloc] Allocating v125 [298,300) v125:%ecx@298 [priority 2] [weight 2000000]
[RegAlloc] Requirement %ecx, due to use at 298
[RegAlloc] %ecx collides with v139 [299,302) %ecx (def) v139:*@301 [weight 2000000]
Assertion failure: !minimalBundle(bundle)
The [298,300) is the instruction range being allocated, and looking at the instructions from the beginning of the register allocation:
[298,299 ApplyArrayGeneric] [def v139<t>:%ecx] [def v140<p>:%edx] [temp v135<g>:%eax] [temp v136<g>:%ebx] [use v125:%ecx] [use v134:%edi] [use v137:%esi] [use v138:%edx] [use v84:*] [use v84:*] [use v85:*] [use v86:*] [use v87:*] [use v91:*] [use v84:*] [use v99:*] [use v101:*] [use v125:*] [use v130:*] [use v131:*]
v139 is a fixed definition in %ecx, and v125 is a fixed use in %ecx. This would be OK if v125 was usedAtStart by the instruction, in which case its live range covers only the input of the instruction and not the output (this permits the register allocator to reuse the input's register for an output). Unfortunately you can't tell from the 'use' above whether the use is AtStart or not, but looking at the source for LIRGenerator::visitApplyArray shows the two uses are useFixed instead of useFixedAtStart, and changing the first one to useFixedAtStart fixes this assertion. I don't know if doing that will require other changes to the ApplyArray codegen though, the latter looks pretty complicated.
Flags: needinfo?(bhackett1024)
Assignee | ||
Comment 42•9 years ago
|
||
Thanks, that's very helpful. That code is hairy so I'll need to look into it, but at least there's no indication of a regalloc bug, ie, we can disable the opt on x86 if we want to, cf pending patch, if there's not enough time to fix this before the train leaves.
Updated•9 years ago
|
Attachment #8707580 -
Flags: review?(jcoppeard) → review+
Comment 43•9 years ago
|
||
Comment 44•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Assignee | ||
Comment 45•9 years ago
|
||
That patch was not a fix, just a correction to regalloc spew.
Assignee | ||
Comment 46•9 years ago
|
||
Assignee | ||
Comment 47•9 years ago
|
||
I believe both uses could be marked AtStart, trying that now.
Assignee | ||
Comment 48•9 years ago
|
||
Attachment #8709040 -
Flags: review?(jdemooij)
Assignee | ||
Updated•9 years ago
|
Attachment #8706245 -
Attachment is obsolete: true
Attachment #8706245 -
Flags: review?(jdemooij)
Assignee | ||
Updated•9 years ago
|
Comment 50•9 years ago
|
||
Comment on attachment 8709040 [details] [diff] [review]
narrow the live range for values
Review of attachment 8709040 [details] [diff] [review]:
-----------------------------------------------------------------
Since all registers are fixed, do you know why this didn't fail all the time?
Attachment #8709040 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 51•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #50)
> Comment on attachment 8709040 [details] [diff] [review]
> narrow the live range for values
>
> Review of attachment 8709040 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Since all registers are fixed, do you know why this didn't fail all the time?
I don't. Given my experiments with the test case (comment 34) it seems that some amount of polymorhism is required to have been detected at the time the code is being jitted, for example, there needs to be a certain number of distinct functions that are applied at the call site; my guess is that this increases register pressure or further constrains register assignment and thus triggers the failure. (Not much of an analysis.)
Assignee | ||
Comment 52•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c8c14902800d80d248d8adf451c2f635fd611d2
Bug 1231024 - narrow the live range for values. r=jandem
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Comment 53•9 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Comment 54•9 years ago
|
||
Lars, could you request the uplift to beta? Thanks
Flags: needinfo?(lhansen)
Comment 55•9 years ago
|
||
Jan, this is a top crash of 45. Could you fill the uplift request for Lars? Thanks
Flags: needinfo?(jdemooij)
Keywords: topcrash
Assignee | ||
Comment 56•9 years ago
|
||
Comment on attachment 8709040 [details] [diff] [review]
narrow the live range for values
Approval Request Comment
[Feature/regressing bug #]: Corrects bug in bug 1108290
[User impact if declined]: Crashes
[Describe test coverage new/current, TreeHerder]: None in patch
[Risks and why]: Zero, to the best of my knowledge, corrects regalloc bug
[String/UUID change made/needed]: None
Flags: needinfo?(lhansen)
Attachment #8709040 -
Flags: approval-mozilla-beta?
Comment 58•9 years ago
|
||
Comment on attachment 8709040 [details] [diff] [review]
narrow the live range for values
thanks
Should be in 45 beta 3
Attachment #8709040 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 59•9 years ago
|
||
bugherder uplift |
Updated•9 years ago
|
Flags: qe-verify+
Comment 60•9 years ago
|
||
Reproduced the issue on Nightly 2015-12-08 debug build, Win 7 x64.
Verified fixed 45b7, 46.0a2 (2016-02-19) debug builds.
Comment 61•8 years ago
|
||
Crash volume for signature 'js::jit::BacktrackingAllocator::splitAt':
- nightly (version 50): 1 crash from 2016-06-06.
- aurora (version 49): 16 crashes from 2016-06-07.
- beta (version 48): 357 crashes from 2016-06-06.
- release (version 47): 805 crashes from 2016-05-31.
- esr (version 45): 283 crashes from 2016-04-07.
Crash volume on the last weeks:
Week N-1 Week N-2 Week N-3 Week N-4 Week N-5 Week N-6 Week N-7
- nightly 0 1 0 0 0 0 0
- aurora 0 3 1 4 3 4 0
- beta 65 49 57 56 51 45 17
- release 105 102 121 124 124 130 50
- esr 19 30 35 38 25 29 28
Affected platforms: Windows, Mac OS X, Linux
status-firefox47:
--- → affected
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox-esr45:
--- → affected
Assignee | ||
Comment 62•8 years ago
|
||
Re the crash reports above, those are likely something else than this bug. (This bug was x86-only, for one thing; I think we don't build 32-bit for Mac OS X any longer.)
Flags: needinfo?(jdemooij)
You need to log in
before you can comment on or make changes to this bug.
Description
•